Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Gerd Hoffmann
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
> 
> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

For completeness: older -rc kernels only, mixing -rc8 with 2.6.25 and
older is fine.

cheers,
  Gerd

-- 
http://kraxel.fedorapeople.org/xenner/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[PATCH] KVM: update gitignore

2008-06-25 Thread Amit Shah
The new coalesced_mmio.[ch] files need to be ignored in the userspace
kernel/ directory.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
---
 .gitignore |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 48edae1..fee15ed 100644
--- a/.gitignore
+++ b/.gitignore
@@ -50,6 +50,8 @@ kernel/mmu.h
 kernel/modules.order
 kernel/tss.h
 kernel/x86.c
+kernel/coalesced_mmio.c
+kernel/coalesced_mmio.h
 qemu/pc-bios/extboot.bin
 qemu/qemu-doc.html
 qemu/*.1
-- 
1.5.4.3



Re: KVM: pvmmu breakage with gcc 4.3.0

2008-06-25 Thread Anthony Liguori

Marcelo Tosatti wrote:

Some pvmmu functions store their commands on stack, and newer GCC
versions conclude that these commands are unused.

So stick an inline asm statement to convince the compiler otherwise.
  


I think a better fix is to add a "memory" clobber to the hypercalls.
This isn't really a GCC bug: GCC has no way of knowing that hypercalls
can touch memory.
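
For illustration only, here is a minimal userspace sketch of the "memory" clobber idea. It is not the attached patch: the asm body is empty because a real vmcall cannot run in an ordinary process, and toy_mmu_op, toy_hypercall and toy_flush are hypothetical names. It assumes GCC-style extended asm.

```c
#include <string.h>

/* Hypothetical command layout; the real pvmmu structures differ. */
struct toy_mmu_op {
	unsigned long header;
	unsigned long addr;
};

/*
 * Stand-in for a hypercall. The "memory" clobber tells the compiler
 * that the statement may read or write any memory, so the stores that
 * fill *buffer cannot be dropped or moved past it.
 */
static inline long toy_hypercall(void *buffer, unsigned long len)
{
	long ret = 0;

	__asm__ __volatile__("" : "+r"(ret) : "r"(buffer), "r"(len) : "memory");
	return ret;
}

long toy_flush(unsigned long addr)
{
	struct toy_mmu_op op;	/* lives on the stack, like the pvmmu commands */

	memset(&op, 0, sizeof(op));
	op.header = 1;
	op.addr = addr;
	return toy_hypercall(&op, sizeof(op));
}
```

With the clobber on the hypercall itself, every caller is covered, rather than only kvm_mmu_op().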


See the attached patch.

Avi: please push this for 2.6.26 if possible.

Regards,

Anthony Liguori


Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>


diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8b7a3cf..c892752 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -55,6 +55,12 @@ static void kvm_mmu_op(void *buffer, unsigned len)
int r;
unsigned long a1, a2;
 
+	/*

+* GCC 4.3.0 concludes that on-stack kvm_mmu_op* is unused and
+* optimizes its initialization away.
+*/
+asm ("" : : "p" (buffer));
+
do {
a1 = __pa(buffer);
a2 = 0;   /* on i386 __pa() always returns <4G */
  




hypercall-memory-clobber.patch
Description: application/mbox


KVM: pvmmu breakage with gcc 4.3.0

2008-06-25 Thread Marcelo Tosatti

Some pvmmu functions store their commands on stack, and newer GCC
versions conclude that these commands are unused.

So stick an inline asm statement to convince the compiler otherwise.

Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>


diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8b7a3cf..c892752 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -55,6 +55,12 @@ static void kvm_mmu_op(void *buffer, unsigned len)
int r;
unsigned long a1, a2;
 
+   /*
+* GCC 4.3.0 concludes that on-stack kvm_mmu_op* is unused and
+* optimizes its initialization away.
+*/
+asm ("" : : "p" (buffer));
+
do {
a1 = __pa(buffer);
a2 = 0;   /* on i386 __pa() always returns <4G */


Re: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb

2008-06-25 Thread Yang, Sheng
On Wednesday 25 June 2008 20:02:17 Avi Kivity wrote:
> Yang, Sheng wrote:
> > From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
> > From: Sheng Yang <[EMAIL PROTECTED]>
> > Date: Wed, 18 Jun 2008 11:23:13 +0800
> > Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb
> >
> > Fix a potential issue caused by kvm_mmu_slot_remove_write_access().
> > The old behavior doesn't sync the EPT TLB with the modified EPT entry,
> > which results in inconsistent content of the EPT TLB and the EPT table.
> >
> >
> > @@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
> >  static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
> >  {
> > vpid_sync_vcpu_all(to_vmx(vcpu));
> > +   if (vm_need_ept())
> > +   ept_sync_context(to_vmx(vcpu));
> >  }
>
> So we're flushing both the vpid tlb and the ept context?  What does an
> ept context flush mean exactly?  tlb entries for gpa->hpa?

Yeah, the entries for gpa->hpa. If we don't do this, the cpu may still see a
stale rw entry rather than the ro one, and then write to the page directly
rather than falling into KVM.
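
To make the stale-entry problem concrete, here is a toy software model, not KVM code. It assumes a single gpa->hpa entry with one cached copy, and every name in it (toy_mmu, toy_write_protect, ...) is hypothetical.

```c
#include <stdbool.h>

/* Toy model: one guest-physical page, one table entry, one cached copy. */
struct toy_entry {
	unsigned long hpa;
	bool writable;
};

struct toy_mmu {
	struct toy_entry table;	/* authoritative gpa->hpa entry */
	struct toy_entry tlb;	/* cached copy used for accesses */
	bool tlb_valid;
};

/* Look up through the cache, refilling from the table on a miss. */
static struct toy_entry toy_translate(struct toy_mmu *m)
{
	if (!m->tlb_valid) {
		m->tlb = m->table;
		m->tlb_valid = true;
	}
	return m->tlb;
}

/* Returns true if a guest write would go through without a fault. */
bool toy_guest_write_allowed(struct toy_mmu *m)
{
	return toy_translate(m).writable;
}

/* Write-protect the table entry, optionally flushing the cached copy. */
void toy_write_protect(struct toy_mmu *m, bool flush)
{
	m->table.writable = false;
	if (flush)
		m->tlb_valid = false;
}
```

Calling toy_write_protect(m, false) after a translation has been cached leaves toy_guest_write_allowed() returning true, which is the case the flush is meant to close.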

-- 
Thanks
Yang, Sheng


[PATCH 3 of 3] mmu-notifier-core

2008-06-25 Thread Andrea Arcangeli
From: Andrea Arcangeli <[EMAIL PROTECTED]>

With KVM/GRU/XPMEM there isn't just the primary CPU MMU pointing to
pages. There are secondary MMUs (with secondary sptes and secondary
tlbs) too. sptes in the kvm case are shadow pagetables, but when I say
spte in mmu-notifier context, I mean "secondary pte". In GRU case
there's no actual secondary pte and there's only a secondary tlb
because the GRU secondary MMU has no knowledge about sptes and every
secondary tlb miss event in the MMU always generates a page fault that
has to be resolved by the CPU (this is not the case of KVM where a
secondary tlb miss will walk sptes in hardware and it will refill the
secondary tlb transparently to software if the corresponding spte is
present). The same way zap_page_range has to invalidate the pte before
freeing the page, the spte (and secondary tlb) must also be
invalidated before any page is freed and reused.

Currently we take a page_count pin on every page mapped by sptes, but
that means the pages can't be swapped whenever they're mapped by any
spte because they're part of the guest working set. Furthermore a spte
unmap event can immediately lead to a page being freed when the pin is
released (so requiring the same complex and relatively slow tlb_gather
smp safe logic we have in zap_page_range and that can be avoided
completely if the spte unmap event doesn't require an unpin of the
page previously mapped in the secondary MMU).

The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and
know when the VM is swapping or freeing or doing anything on the
primary MMU so that the secondary MMU code can drop sptes before the
pages are freed, avoiding all page pinning and allowing 100% reliable
swapping of guest physical address space. Furthermore, the code that
tears down the mappings of the secondary MMU no longer has to implement
tlb_gather-like logic as in zap_page_range, which would require many IPIs
to flush other cpu tlbs for each fixed number of sptes unmapped.
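
As a rough illustration of the idea only (this is not the API added by the patch), a secondary MMU can hang a callback off the mm and drop its spte when told to. All toy_* names below are hypothetical, and locking and RCU are left out.

```c
#include <stddef.h>

struct toy_notifier;

struct toy_notifier_ops {
	/* Called before the page backing 'addr' may be freed. */
	void (*invalidate_page)(struct toy_notifier *n, unsigned long addr);
};

struct toy_notifier {
	const struct toy_notifier_ops *ops;
	struct toy_notifier *next;
};

struct toy_mm {
	struct toy_notifier *notifiers;
};

void toy_register(struct toy_mm *mm, struct toy_notifier *n)
{
	n->next = mm->notifiers;
	mm->notifiers = n;
}

/* Primary MMU side: tell every secondary MMU about the unmap. */
void toy_unmap_page(struct toy_mm *mm, unsigned long addr)
{
	struct toy_notifier *n;

	for (n = mm->notifiers; n != NULL; n = n->next)
		if (n->ops->invalidate_page)
			n->ops->invalidate_page(n, addr);
	/* ...only after this point may the page be freed and reused */
}

/* Secondary MMU side: a single-entry "spte" that maps one address. */
struct toy_secondary {
	struct toy_notifier notifier;	/* must be first for the cast below */
	unsigned long mapped_addr;
	int spte_present;
};

static void toy_drop_spte(struct toy_notifier *n, unsigned long addr)
{
	struct toy_secondary *s = (struct toy_secondary *)n;

	if (s->spte_present && s->mapped_addr == addr)
		s->spte_present = 0;
}

const struct toy_notifier_ops toy_secondary_ops = {
	.invalidate_page = toy_drop_spte,
};
```

Because the spte is gone before the page can be freed, the secondary MMU needs no page_count pin of its own.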

To give an example: if what happens on the primary MMU is a protection
downgrade (from writeable to wrprotect), the secondary MMU mappings
will be invalidated, and the next secondary-mmu-page-fault will call
get_user_pages. If it calls get_user_pages with write=1, that triggers
a do_wp_page and re-establishes an updated spte or
secondary-tlb-mapping on the copied page. If it's a guest-read and it
calls get_user_pages with write=0, it will set up a readonly spte or
readonly tlb mapping instead. This is just an example.

This allows mapping any page pointed to by any pte (and in turn visible in
the primary CPU MMU) into a secondary MMU (be it a pure tlb like GRU,
or a full MMU with both sptes and secondary-tlb like the
shadow-pagetable layer with kvm), or a remote DMA in software like
XPMEM (hence the need to schedule in XPMEM code to send the invalidate
to the remote node, while there's no need to schedule in kvm/gru as it's an
immediate event like invalidating a primary-mmu pte).

At least for KVM without this patch it's impossible to swap guests
reliably. And having this feature and removing the page pin allows
several other optimizations that simplify life considerably.

Dependencies:

1) mm_take_all_locks() to register the mmu notifier when the whole VM
   isn't doing anything with "mm". This allows mmu notifier users to
   keep track if the VM is in the middle of the
   invalidate_range_begin/end critical section with an atomic counter
   increased in range_begin and decreased in range_end. No secondary
   MMU page fault is allowed to map any spte or secondary tlb
   reference, while the VM is in the middle of range_begin/end as any
   page returned by get_user_pages in that critical section could
   later immediately be freed without any further ->invalidate_page
   notification (invalidate_range_begin/end works on ranges and
   ->invalidate_page isn't called immediately before freeing the
   page). To stop all page freeing and pagetable overwrites the
   mmap_sem must be taken in write mode and all other anon_vma/i_mmap
   locks must be taken too.

2) It'd be a waste to add branches in the VM if nobody could possibly
   run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only be enabled
   if CONFIG_KVM=m/y. In the current kernel kvm won't yet take
   advantage of mmu notifiers, but this already allows to compile a
   KVM external module against a kernel with mmu notifiers enabled and
   from the next pull from kvm.git we'll start using them. And
   GRU/XPMEM will also be able to continue the development by enabling
   KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code
   to the mainline kernel. Then they can also enable MMU_NOTIFIERS in
   the same way KVM does it (even if KVM=n). This guarantees nobody
   selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n.
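
The counter described in dependency 1 could be sketched in userspace C11 as follows. This is a simplified, hypothetical model (toy_* names are made up): a real driver also needs a lock or sequence count to close the window between the check and the mapping, which is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct toy_secondary_mmu {
	atomic_int invalidate_count;	/* > 0 while inside range_begin/end */
	bool spte_present;
};

void toy_range_begin(struct toy_secondary_mmu *m)
{
	atomic_fetch_add(&m->invalidate_count, 1);
	m->spte_present = false;	/* drop mappings covered by the range */
}

void toy_range_end(struct toy_secondary_mmu *m)
{
	atomic_fetch_sub(&m->invalidate_count, 1);
}

/*
 * Secondary page fault: refuse to establish a mapping while an
 * invalidation is in flight; the caller is expected to retry later.
 */
bool toy_page_fault(struct toy_secondary_mmu *m)
{
	if (atomic_load(&m->invalidate_count) > 0)
		return false;
	m->spte_present = true;
	return true;
}
```

A fault that arrives between toy_range_begin() and toy_range_end() returns false and maps nothing, so no spte can point at a page that is about to be freed.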

The mmu_notifier_register call can fail because mm_take_all_locks may
be interrupted by a signal and return -EINTR. Because
mmu_notifier_register is used when a driver 

[PATCH 1 of 3] list_del_init_rcu

2008-06-25 Thread Andrea Arcangeli
From: Andrea Arcangeli <[EMAIL PROTECTED]>

Introduces list_del_init_rcu and documents it (fixes a comment for
list_del_rcu too).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
---

diff -r 98f755616212 -r 5e8c41d283cc include/linux/list.h
--- a/include/linux/list.h  Tue Jun 24 11:23:35 2008 -0700
+++ b/include/linux/list.h  Wed Jun 25 03:34:11 2008 +0200
@@ -747,7 +747,7 @@ static inline void hlist_del(struct hlis
  * or hlist_del_rcu(), running on this same list.
  * However, it is perfectly legal to run concurrently with
  * the _rcu list-traversal primitives, such as
- * hlist_for_each_entry().
+ * hlist_for_each_entry_rcu().
  */
 static inline void hlist_del_rcu(struct hlist_node *n)
 {
@@ -760,6 +760,34 @@ static inline void hlist_del_init(struct
if (!hlist_unhashed(n)) {
__hlist_del(n);
INIT_HLIST_NODE(n);
+   }
+}
+
+/**
+ * hlist_del_init_rcu - deletes entry from hash list with re-initialization
+ * @n: the element to delete from the hash list.
+ *
+ * Note: hlist_unhashed() on the node returns true after this. It is
+ * useful for RCU based read lockfree traversal if the writer side
+ * must know if the list entry is still hashed or already unhashed.
+ *
+ * In particular, it means that we can not poison the forward pointers
+ * that may still be used for walking the hash list and we can only
+ * zero the pprev pointer so list_unhashed() will return true after
+ * this.
+ *
+ * The caller must take whatever precautions are necessary (such as
+ * holding appropriate locks) to avoid racing with another
+ * list-mutation primitive, such as hlist_add_head_rcu() or
+ * hlist_del_rcu(), running on this same list.  However, it is
+ * perfectly legal to run concurrently with the _rcu list-traversal
+ * primitives, such as hlist_for_each_entry_rcu().
+ */
+static inline void hlist_del_init_rcu(struct hlist_node *n)
+{
+   if (!hlist_unhashed(n)) {
+   __hlist_del(n);
+   n->pprev = NULL;
}
 }
 
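
For readers without the kernel tree at hand, here is a userspace sketch of what the new helper in the patch above does to the pointers. The toy_* types mirror the shape of hlist nodes but are hypothetical, and the memory barriers and RCU grace periods that the real primitive relies on are not modelled.

```c
#include <stddef.h>

struct toy_hlist_node {
	struct toy_hlist_node *next, **pprev;
};

struct toy_hlist_head {
	struct toy_hlist_node *first;
};

int toy_hlist_unhashed(const struct toy_hlist_node *n)
{
	return n->pprev == NULL;
}

void toy_hlist_add_head(struct toy_hlist_node *n, struct toy_hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Unlink n but leave n->next alone, so a reader that is currently
 * standing on n can still walk forward. Only pprev is cleared, which is
 * what makes toy_hlist_unhashed() report the node as removed.
 */
void toy_hlist_del_init_keep_next(struct toy_hlist_node *n)
{
	if (!toy_hlist_unhashed(n)) {
		struct toy_hlist_node *next = n->next;

		*n->pprev = next;
		if (next)
			next->pprev = n->pprev;
		n->pprev = NULL;
	}
}
```

After removal the node still points at its old successor, and calling the helper a second time is a no-op.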


[PATCH 2 of 3] mm_take_all_locks

2008-06-25 Thread Andrea Arcangeli
From: Andrea Arcangeli <[EMAIL PROTECTED]>

mm_take_all_locks holds off reclaim from an entire mm_struct. This allows mmu
notifiers to register into the mm at any time with the guarantee that no mmu
operation is in progress on the mm.

This operation locks against the VM for all pte/vma/mm related operations that
could ever happen on a certain mm. This includes vmtruncate, try_to_unmap, and
all page faults.

The caller must take the mmap_sem in write mode before calling
mm_take_all_locks(). The caller isn't allowed to release the mmap_sem until
mm_drop_all_locks() returns.

mmap_sem in write mode is required in order to block all operations that could
modify pagetables and free pages without need of altering the vma layout (for
example populate_range() with nonlinear vmas). It's also needed in write mode
to prevent new anon_vmas from being associated with existing vmas.

A single task can't take more than one mm_take_all_locks() in a row or it would
deadlock.

mm_take_all_locks() and mm_drop_all_locks are expensive operations
that may have to take thousands of locks.

mm_take_all_locks() can fail if it's interrupted by signals.

When mmu_notifier_register returns, we must be sure that the driver is notified
if some task is in the middle of a vmtruncate for the 'mm' where the mmu
notifier was registered (mmu_notifier_invalidate_range_start/end is run around
the vmtruncation but mmu_notifier_register can run after
mmu_notifier_invalidate_range_start and before
mmu_notifier_invalidate_range_end). Same problem for rmap paths. And we have to
remove page pinning to avoid replicating the tlb_gather logic inside KVM (and
GRU doesn't work well with page pinning regardless of needing tlb_gather), so
without mm_take_all_locks when vmtruncate frees the page, kvm would have no way
to notice that it mapped into sptes a page that is going into the freelist
without a chance of any further mmu_notifier notification.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Acked-by: Linus Torvalds <[EMAIL PROTECTED]>
---

diff -r 5e8c41d283cc -r 167f154fa536 include/linux/mm.h
--- a/include/linux/mm.h	Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/mm.h	Wed Jun 25 03:34:14 2008 +0200
@@ -1068,6 +1068,9 @@ extern struct vm_area_struct *copy_vma(s
unsigned long addr, unsigned long len, pgoff_t pgoff);
 extern void exit_mmap(struct mm_struct *);
 
+extern int mm_take_all_locks(struct mm_struct *mm);
+extern void mm_drop_all_locks(struct mm_struct *mm);
+
 #ifdef CONFIG_PROC_FS
 /* From fs/proc/base.c. callers must _not_ hold the mm's exe_file_lock */
 extern void added_exe_file_vma(struct mm_struct *mm);
diff -r 5e8c41d283cc -r 167f154fa536 include/linux/pagemap.h
--- a/include/linux/pagemap.h   Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/pagemap.h   Wed Jun 25 03:34:14 2008 +0200
@@ -19,6 +19,7 @@
  */
 #define	AS_EIO		(__GFP_BITS_SHIFT + 0)	/* IO error on async write */
 #define AS_ENOSPC	(__GFP_BITS_SHIFT + 1)	/* ENOSPC on async write */
+#define AS_MM_ALL_LOCKS	(__GFP_BITS_SHIFT + 2)	/* under mm_take_all_locks() */
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
 {
diff -r 5e8c41d283cc -r 167f154fa536 include/linux/rmap.h
--- a/include/linux/rmap.h  Wed Jun 25 03:34:11 2008 +0200
+++ b/include/linux/rmap.h  Wed Jun 25 03:34:14 2008 +0200
@@ -26,6 +26,14 @@
  */
 struct anon_vma {
spinlock_t lock;/* Serialize access to vma list */
+   /*
+* NOTE: the LSB of the head.next is set by
+* mm_take_all_locks() _after_ taking the above lock. So the
+* head must only be read/written after taking the above lock
+* to be sure to see a valid next pointer. The LSB bit itself
+* is serialized by a system wide lock only visible to
+* mm_take_all_locks() (mm_all_locks_mutex).
+*/
struct list_head head;  /* List of private "related" vmas */
 };
 
diff -r 5e8c41d283cc -r 167f154fa536 mm/mmap.c
--- a/mm/mmap.c Wed Jun 25 03:34:11 2008 +0200
+++ b/mm/mmap.c Wed Jun 25 03:34:14 2008 +0200
@@ -2261,3 +2261,161 @@ int install_special_mapping(struct mm_st
 
return 0;
 }
+
+static DEFINE_MUTEX(mm_all_locks_mutex);
+
+static void vm_lock_anon_vma(struct anon_vma *anon_vma)
+{
+   if (!test_bit(0, (unsigned long *) &anon_vma->head.next)) {
+   /*
+* The LSB of head.next can't change from under us
+* because we hold the mm_all_locks_mutex.
+*/
+   spin_lock(&anon_vma->lock);
+   /*
+* We can safely modify head.next after taking the
+* anon_vma->lock. If some other vma in this mm shares
+* the same anon_vma we won't take it again.
+*
+* No need of atomic instructions here, head.next
+* can't change from under us thanks to the
+* anon_vma->lock.
+*/
+  
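
The diff is cut off above. As a hedged illustration of the trick its comments describe (using the low bit of an aligned pointer as an "already locked" mark, so that a lock shared by several vmas is taken only once), here is a userspace sketch. The toy_* names are hypothetical and a plain counter stands in for the spinlock.

```c
#include <stdint.h>
#include <stdbool.h>

struct toy_node {
	struct toy_node *next;	/* aligned pointer, so bit 0 is normally 0 */
	int lock_depth;		/* stand-in for a real spinlock */
};

static bool toy_is_marked(const struct toy_node *n)
{
	return ((uintptr_t)n->next & 1) != 0;
}

/* Take the lock only the first time this node is visited. */
void toy_lock_once(struct toy_node *n)
{
	if (!toy_is_marked(n)) {
		n->lock_depth++;	/* "spin_lock()" */
		n->next = (struct toy_node *)((uintptr_t)n->next | 1);
	}
}

/* Clear the mark and release the lock, restoring the original pointer. */
void toy_unlock_once(struct toy_node *n)
{
	if (toy_is_marked(n)) {
		n->next = (struct toy_node *)((uintptr_t)n->next & ~(uintptr_t)1);
		n->lock_depth--;	/* "spin_unlock()" */
	}
}
```

While the mark is set the pointer must not be dereferenced as-is, which is why the rmap.h comment in the patch requires taking the lock before reading or writing the list head.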

[PATCH 0 of 3] mmu notifier v18 for -mm

2008-06-25 Thread Andrea Arcangeli
Hello,

Christoph suggested that I repost v18 for merging in -mm, to give it more
exposure before the .27 merge window opens. There's no code change compared to
the previous v18 submission (the only change is the correction in the comment
in the mm_take_all_locks patch rightfully pointed out by Linus).

Full patchset including other XPMEM support patches can be found here:


http://www.kernel.org/pub/linux/kernel/people/andrea/patches/v2.6/2.6.26-rc7/mmu-notifier-v18

Only the three patches of the patchset I'm submitting here by email are ready
for merging. The rest, which you can find on the website, is not ready for
merging yet because of various performance degradations: many of the XPMEM
patches need to be reworked to avoid any slowdown in the non-XPMEM case. I
keep maintaining them to make life easier for current XPMEM development, and
later we can keep working on them to make them suitable for inclusion without
any performance degradation risk.

(The fourth patch in the series at the above url is not strictly related to
mmu notifiers, but it's good at least for me to keep it in the same tree: it
lets me test a pci-passthrough capable guest running on reserved-ram at the
same time as two regular guests swapping heavily with mmu notifiers, which
tends to exercise both spte models at once. If you find this confusing I'll
remove it from any later upload, but xpmem users can totally ignore it, as it
only touches x86-64 code.)

Thanks a lot.
Andrea


Re: [PATCH 4 of 4] Add initial PowerPC libcflat files & make file changes

2008-06-25 Thread Hollis Blanchard
On Wed, 2008-06-25 at 15:39 -0500, Jerone Young wrote:
> 4 files changed, 124 insertions(+), 1 deletion(-)
> user/config-powerpc.mak   |   10 ++-
> user/test/lib/powerpc/44x/map.c   |   51 +
> user/test/lib/powerpc/44x/tlbwe.S |   29 +
> user/test/lib/powerpc/io.c|   35 +

Without user/test/powerpc/cstart.S (and exit.c which uses it), most of
this code is totally unused. That should at least be an additional patch
in this series; see
http://marc.info/?l=kvm-ppc-devel&m=120043765909206&w=2

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: [PATCH 0 of 4] [kvm-userspace][test] consolidate test libs to libcflat

2008-06-25 Thread Hollis Blanchard
On Wed, 2008-06-25 at 15:39 -0500, Jerone Young wrote:
> This set of patches are to consolidate test libraries into a single
> library archive. This lib archive is libcflat. This will allow common
> code to be shared among archs.
> 
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

I think patches 1-3 should be combined.

By the way, I assume you've built x86 with these changes, but have you
run it too?

-- 
Hollis Blanchard
IBM Linux Technology Center



another kvm-70 compile bug with rhel/centos 5.2

2008-06-25 Thread Farkas Levente

Hi,
I'm trying to recompile kvm-70 against the latest centos-5.2 (aka
rhel-5.2) kernel, but I get a new compile error:
---
make KDIR=/usr/src/kernels/2.6.18-92.1.1.el5-x86_64
make -C /usr/src/kernels/2.6.18-92.1.1.el5-x86_64 M=`pwd` \
LINUXINCLUDE="-I`pwd`/include -Iinclude -I`pwd`/include-compat \
-include include/linux/autoconf.h \
-include `pwd`/external-module-compat.h"
make[1]: Entering directory `/usr/src/kernels/2.6.18-92.1.1.el5-x86_64'
  LD  /home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/built-in.o
  CC [M]  /home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/svm.o
In file included from :2:
/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/external-module-compat.h:351: error: redefinition of typedef 'bool'
include/linux/types.h:36: error: previous declaration of 'bool' was here
make[2]: *** [/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel/svm.o] Error 1
make[1]: *** [_module_/home/robot/rpm/BUILD/kvm-kmod-70/_kmod_build_/kernel] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-92.1.1.el5-x86_64'
make: *** [all] Error 2
---
yours.

--
  Levente   "Si vis pacem para bellum!"


[PATCH 3 of 4] Remove old x86 test libs, that are now a part of libcflat

2008-06-25 Thread Jerone Young
8 files changed, 411 deletions(-)
user/test/x86/lib/apic.h   |   14 ---
user/test/x86/lib/exit.c   |5 -
user/test/x86/lib/printf.c |  195 
user/test/x86/lib/printf.h |2 
user/test/x86/lib/smp.c|  151 --
user/test/x86/lib/smp.h|   16 ---
user/test/x86/lib/string.c |   21 
user/test/x86/lib/string.h |7 -


Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff --git a/user/test/x86/lib/apic.h b/user/test/x86/lib/apic.h
deleted file mode 100644
--- a/user/test/x86/lib/apic.h
+++ /dev/null
@@ -1,14 +0,0 @@
-#ifndef SILLY_APIC_H
-#define SILLY_APIC_H
-
-#define APIC_BASE 0x1000
-#define APIC_SIZE 0x100
-
-#define APIC_REG_NCPU0x00
-#define APIC_REG_ID  0x04
-#define APIC_REG_SIPI_ADDR   0x08
-#define APIC_REG_SEND_SIPI   0x0c
-#define APIC_REG_IPI_VECTOR  0x10
-#define APIC_REG_SEND_IPI0x14
-
-#endif
diff --git a/user/test/x86/lib/exit.c b/user/test/x86/lib/exit.c
deleted file mode 100644
--- a/user/test/x86/lib/exit.c
+++ /dev/null
@@ -1,5 +0,0 @@
-
-void exit(int code)
-{
-   asm volatile("out %0, %1" : : "a"(code), "d"((short)0xf4));
-}
diff --git a/user/test/x86/lib/printf.c b/user/test/x86/lib/printf.c
deleted file mode 100644
--- a/user/test/x86/lib/printf.c
+++ /dev/null
@@ -1,195 +0,0 @@
-#include "printf.h"
-#include "smp.h"
-#include <stdarg.h>
-#include "string.h"
-
-static struct spinlock lock;
-
-void print(const char *s);
-
-typedef struct pstream {
-char *buffer;
-int remain;
-int added;
-} pstream_t;
-
-static void addchar(pstream_t *p, char c)
-{
-if (p->remain) {
-   *p->buffer++ = c;
-   --p->remain;
-}
-++p->added;
-}
-
-void print_str(pstream_t *p, const char *s)
-{
-while (*s)
-   addchar(p, *s++);
-}
-
-static char digits[16] = "0123456789abcdef";
-
-void print_int(pstream_t *ps, long long n, int base)
-{
-char buf[sizeof(long) * 3 + 2], *p = buf;
-int s = 0, i;
-
-if (n < 0) {
-   n = -n;
-   s = 1;
-}
-
-while (n) {
-   *p++ = digits[n % base];
-   n /= base;
-}
-
-if (s)
-   *p++ = '-';
-
-if (p == buf)
-   *p++ = '0';
-
-for (i = 0; i < (p - buf) / 2; ++i) {
-   char tmp;
-
-   tmp = buf[i];
-   buf[i] = p[-1-i];
-   p[-1-i] = tmp;
-}
-
-*p = 0;
-
-print_str(ps, buf);
-}
-
-void print_unsigned(pstream_t *ps, unsigned long long n, int base)
-{
-char buf[sizeof(long) * 3 + 1], *p = buf;
-int i;
-
-while (n) {
-   *p++ = digits[n % base];
-   n /= base;
-}
-
-if (p == buf)
-   *p++ = '0';
-
-for (i = 0; i < (p - buf) / 2; ++i) {
-   char tmp;
-
-   tmp = buf[i];
-   buf[i] = p[-1-i];
-   p[-1-i] = tmp;
-}
-
-*p = 0;
-
-print_str(ps, buf);
-}
-
-int vsnprintf(char *buf, int size, const char *fmt, va_list va)
-{
-pstream_t s;
-
-s.buffer = buf;
-s.remain = size - 1;
-s.added = 0;
-while (*fmt) {
-   char f = *fmt++;
-   int nlong = 0;
-
-   if (f != '%') {
-   addchar(&s, f);
-   continue;
-   }
-morefmt:
-   f = *fmt++;
-   switch (f) {
-   case '%':
-   addchar(&s, '%');
-   break;
-   case '\0':
-   --fmt;
-   break;
-   case 'l':
-   ++nlong;
-   goto morefmt;
-   case 'd':
-   switch (nlong) {
-   case 0:
-   print_int(&s, va_arg(va, int), 10);
-   break;
-   case 1:
-   print_int(&s, va_arg(va, long), 10);
-   break;
-   default:
-   print_int(&s, va_arg(va, long long), 10);
-   break;
-   }
-   break;
-   case 'x':
-   switch (nlong) {
-   case 0:
-   print_unsigned(&s, va_arg(va, unsigned), 16);
-   break;
-   case 1:
-   print_unsigned(&s, va_arg(va, unsigned long), 16);
-   break;
-   default:
-   print_unsigned(&s, va_arg(va, unsigned long long), 16);
-   break;
-   }
-   break;
-   case 'p':
-   print_str(&s, "0x");
-   print_unsigned(&s, (unsigned long)va_arg(va, void *), 16);
-   break;
-   case 's':
-   print_str(&s, va_arg(va, const char *));
-   break;
-   default:
-   addchar(&s, f);
-   break;
-   }
-}
-*s.buffer = 0;
-++s.added;
-return s.added;
-}
-
-
-int snprintf(char *buf, int size, const char *fmt, ...)
-{
-va_list va;
-int r;
-
-va_start(va, fmt);
-r = vsnprintf(buf, size, fmt, va);
-va_end(va);
-return r;
-}
-
-void print_serial(const char *buf)
-{
-unsigned long len = strlen(buf);
-
-asm volatile ("rep/outsb" : "+S"(buf), "+c"(len) : "d"(0xf1));
-}
-
-int printf(const char *fmt, ...)
-{
-va_list va;
-char buf[2000];
-int r;
-
-va_start(va, f

[PATCH 4 of 4] Add initial PowerPC libcflat files & make file changes

2008-06-25 Thread Jerone Young
4 files changed, 124 insertions(+), 1 deletion(-)
user/config-powerpc.mak   |   10 ++-
user/test/lib/powerpc/44x/map.c   |   51 +
user/test/lib/powerpc/44x/tlbwe.S |   29 +
user/test/lib/powerpc/io.c|   35 +


Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff --git a/user/config-powerpc.mak b/user/config-powerpc.mak
--- a/user/config-powerpc.mak
+++ b/user/config-powerpc.mak
@@ -3,6 +3,14 @@
 CFLAGS += -I $(KERNELDIR)/include
 # for some reaons binutils hates tlbsx unless we say we're 405  :(
 CFLAGS += -Wa,-mregnames,-m405
+
+cflatobjs += \
+   test/lib/powerpc/io.o \
+   test/lib/powerpc/44x/map.o \
+   test/lib/powerpc/44x/tlbwe.o
+
+$(libcflat): LDFLAGS += -nostdlib
+$(libcflat): CFLAGS += -ffreestanding -I test/lib -I test/lib/powerpc/44x
 
 %.bin: %.o
$(OBJCOPY) -O binary $^ $@
@@ -18,7 +26,7 @@
 
 tests := $(addprefix test/powerpc/, $(testobjs))
 
-all: kvmtrace kvmctl $(tests)
+all: kvmtrace kvmctl $(libcflat) $(tests)
 
 kvmctl_objs = main-ppc.o iotable.o ../libkvm/libkvm.a
 
diff --git a/user/test/lib/powerpc/44x/map.c b/user/test/lib/powerpc/44x/map.c
new file mode 100644
--- /dev/null
+++ b/user/test/lib/powerpc/44x/map.c
@@ -0,0 +1,51 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[EMAIL PROTECTED]>
+ */
+
+#include "libcflat.h"
+
+#define TLB_SIZE 64
+
+extern void tlbwe(unsigned int index,
+  unsigned char tid,
+  unsigned int word0,
+  unsigned int word1,
+  unsigned int word2);
+
+unsigned int next_free_index;
+
+#define PAGE_SHIFT 12
+#define PAGE_MASK (~((1<= TLB_SIZE)
+   panic("TLB overflow");
+
+   w0 = (vaddr & PAGE_MASK) | V;
+   w1 = paddr & PAGE_MASK;
+   w2 = 0x3;
+
+   tlbwe(next_free_index, 0, w0, w1, w2);
+}
diff --git a/user/test/lib/powerpc/44x/tlbwe.S 
b/user/test/lib/powerpc/44x/tlbwe.S
new file mode 100644
--- /dev/null
+++ b/user/test/lib/powerpc/44x/tlbwe.S
@@ -0,0 +1,29 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[EMAIL PROTECTED]>
+ */
+
+#define SPRN_MMUCR 0x3b2
+
+/* tlbwe(uint index, uint8_t tid, uint word0, uint word1, uint word2) */
+.global tlbwe
+tlbwe:
+   mtspr   SPRN_MMUCR, r4
+   tlbwe   r5, r3, 0
+   tlbwe   r6, r3, 1
+   tlbwe   r7, r3, 2
+   blr
diff --git a/user/test/lib/powerpc/io.c b/user/test/lib/powerpc/io.c
new file mode 100644
--- /dev/null
+++ b/user/test/lib/powerpc/io.c
@@ -0,0 +1,35 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[EMAIL PROTECTED]>
+ */
+
+#include "libcflat.h"
+
+#define BASE 0xf000
+#define _putc ((volatile char *)(BASE))
+#define _exit ((volatile char *)(BASE+1))
+
+void puts(const char *s)
+{
+   while (*s != '\0')
+   *_putc = *s++;
+}
+
+void exit(int code)
+{
+   *_exit = code;
+}
-

[PATCH 1 of 4] Consolidate libcflat for x86 to single lib for all archs

2008-06-25 Thread Jerone Young
8 files changed, 453 insertions(+)
user/test/lib/libcflat.h |   37 +
user/test/lib/panic.c|   13 +++
user/test/lib/printf.c   |  179 ++
user/test/lib/string.c   |   21 +
user/test/lib/x86/apic.h |   14 +++
user/test/lib/x86/io.c   |   23 +
user/test/lib/x86/smp.c  |  150 ++
user/test/lib/x86/smp.h  |   16 


This patch lays the groundwork for a single libcflat library, initially usable
on x86. It also allows other archs to share common code and build a single
archive so that tests can use libcflat functions.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff --git a/user/test/lib/libcflat.h b/user/test/lib/libcflat.h
new file mode 100644
--- /dev/null
+++ b/user/test/lib/libcflat.h
@@ -0,0 +1,37 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[EMAIL PROTECTED]>
+ */
+
+#ifndef __LIBCFLAT_H
+#define __LIBCFLAT_H
+
+#include <stdarg.h>
+
+extern int main(void);
+extern void exit(int code);
+extern void panic(char *fmt, ...);
+
+extern unsigned long strlen(const char *buf);
+extern char *strcat(char *dest, const char *src);
+
+extern int printf(const char *fmt, ...);
+extern int vsnprintf(char *buf, int size, const char *fmt, va_list va);
+
+extern void puts(const char *s);
+
+#endif
diff --git a/user/test/lib/panic.c b/user/test/lib/panic.c
new file mode 100644
--- /dev/null
+++ b/user/test/lib/panic.c
@@ -0,0 +1,13 @@
+#include "libcflat.h"
+
+void panic(char *fmt, ...)
+{
+   va_list va;
+   char buf[2000];
+
+   va_start(va, fmt);
+   vsnprintf(buf, sizeof(buf), fmt, va);
+   va_end(va);
+   puts(buf);
+   exit(-1);
+}
diff --git a/user/test/lib/printf.c b/user/test/lib/printf.c
new file mode 100644
--- /dev/null
+++ b/user/test/lib/printf.c
@@ -0,0 +1,179 @@
+#include "libcflat.h"
+
+typedef struct pstream {
+    char *buffer;
+    int remain;
+    int added;
+} pstream_t;
+
+static void addchar(pstream_t *p, char c)
+{
+    if (p->remain) {
+        *p->buffer++ = c;
+        --p->remain;
+    }
+    ++p->added;
+}
+
+void print_str(pstream_t *p, const char *s)
+{
+    while (*s)
+        addchar(p, *s++);
+}
+
+static char digits[16] = "0123456789abcdef";
+
+void print_int(pstream_t *ps, long long n, int base)
+{
+    char buf[sizeof(long) * 3 + 2], *p = buf;
+    int s = 0, i;
+
+    if (n < 0) {
+        n = -n;
+        s = 1;
+    }
+
+    while (n) {
+        *p++ = digits[n % base];
+        n /= base;
+    }
+
+    if (s)
+        *p++ = '-';
+
+    if (p == buf)
+        *p++ = '0';
+
+    for (i = 0; i < (p - buf) / 2; ++i) {
+        char tmp;
+
+        tmp = buf[i];
+        buf[i] = p[-1-i];
+        p[-1-i] = tmp;
+    }
+
+    *p = 0;
+
+    print_str(ps, buf);
+}
+
+void print_unsigned(pstream_t *ps, unsigned long long n, int base)
+{
+    char buf[sizeof(long) * 3 + 1], *p = buf;
+    int i;
+
+    while (n) {
+        *p++ = digits[n % base];
+        n /= base;
+    }
+
+    if (p == buf)
+        *p++ = '0';
+
+    for (i = 0; i < (p - buf) / 2; ++i) {
+        char tmp;
+
+        tmp = buf[i];
+        buf[i] = p[-1-i];
+        p[-1-i] = tmp;
+    }
+
+    *p = 0;
+
+    print_str(ps, buf);
+}
+
+int vsnprintf(char *buf, int size, const char *fmt, va_list va)
+{
+    pstream_t s;
+
+    s.buffer = buf;
+    s.remain = size - 1;
+    s.added = 0;
+    while (*fmt) {
+        char f = *fmt++;
+        int nlong = 0;
+
+        if (f != '%') {
+            addchar(&s, f);
+            continue;
+        }
+    morefmt:
+        f = *fmt++;
+        switch (f) {
+        case '%':
+            addchar(&s, '%');
+            break;
+        case '\0':
+            --fmt;
+            break;
+        case 'l':
+            ++nlong;
+            goto morefmt;
+        case 'd':
+            switch (nlong) {
+            case 0:
+                print_int(&s, va_arg(va, int), 10);
+                break;
+            case 1:
+                print_int(&s, va_arg(va, long), 10);
+                break;
+            default:
+                print_int(&s, va_arg(va, long long), 10);
+                break;
+            }
+            break;
+        case 'x':
+            switch (nlong) {
+            case 0:
+                print_unsigned(&s, va_arg(va, unsigned), 16);
+                break;
+            case 1:
+ 

[PATCH 2 of 4] Add Makefile and test changes required for x86 to use libcflat

2008-06-25 Thread Jerone Young
6 files changed, 25 insertions(+), 15 deletions(-)
user/Makefile  |   11 ++-
user/config-x86-common.mak |   16 ++--
user/main.c|2 +-
user/test/x86/port80.c |3 +--
user/test/x86/smptest.c|5 ++---
user/test/x86/tsc.c|3 +--


Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

diff --git a/user/Makefile b/user/Makefile
--- a/user/Makefile
+++ b/user/Makefile
@@ -9,6 +9,12 @@
 CFLAGS =
 
 libgcc := $(shell $(CC) --print-libgcc-file-name)
+
+libcflat := test/lib/libcflat.a
+cflatobjs := \
+   test/lib/panic.o \
+   test/lib/printf.o \
+   test/lib/string.o
 
 #include architecure specific make rules
 include config-$(ARCH).mak
@@ -41,10 +47,13 @@
 kvmtrace: $(kvmtrace_objs)
$(CC) $(LDFLAGS) $^ -o $@
 
+$(libcflat): $(cflatobjs)
+   ar rcs $@ $^
+
 %.o: %.S
$(CC) $(CFLAGS) -c -nostdlib -o $@ $^
 
 -include .*.d
 
 clean: arch_clean
-   $(RM) kvmctl kvmtrace *.o *.a .*.d
+   $(RM) kvmctl kvmtrace *.o *.a .*.d $(libcflat) $(cflatobjs)
diff --git a/user/config-x86-common.mak b/user/config-x86-common.mak
--- a/user/config-x86-common.mak
+++ b/user/config-x86-common.mak
@@ -5,7 +5,15 @@
 kvmctl_objs= main.o iotable.o ../libkvm/libkvm.a
 balloon_ctl: balloon_ctl.o
 
-FLATLIBS = $(TEST_DIR)/libcflat.a $(libgcc)
+cflatobjs += \
+   test/lib/x86/io.o \
+   test/lib/x86/smp.o
+
+$(libcflat): LDFLAGS += -nostdlib
+$(libcflat): CFLAGS += -ffreestanding -I test/lib
+
+
+FLATLIBS = test/lib/libcflat.a $(libgcc)
 %.flat: %.o $(FLATLIBS)
$(CC) $(CFLAGS) -nostdlib -o $@ -Wl,-T,flat.lds $^ $(FLATLIBS)
 
@@ -15,7 +23,7 @@
 
 test_cases: $(tests-common) $(tests)
 
-$(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I$(TEST_DIR)/lib
+$(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I test/lib -I test/lib/x86
  
 $(TEST_DIR)/bootstrap: $(TEST_DIR)/bootstrap.o
$(CC) -nostdlib -o $@ -Wl,-T,bootstrap.lds $^
@@ -41,10 +49,6 @@
 
 $(TEST_DIR)/tsc.flat: $(cstart.o) $(TEST_DIR)/tsc.o
 
-$(TEST_DIR)/libcflat.a: $(TEST_DIR)/lib/exit.o $(TEST_DIR)/lib/printf.o \
-   $(TEST_DIR)/lib/smp.o $(TEST_DIR)/lib/string.o
-   ar rcs $@ $^
-
 arch_clean:
$(RM) $(TEST_DIR)/bootstrap $(TEST_DIR)/*.o $(TEST_DIR)/*.flat \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/user/main.c b/user/main.c
--- a/user/main.c
+++ b/user/main.c
@@ -17,7 +17,7 @@
 #define _GNU_SOURCE
 
 #include 
-#include "test/x86/lib/apic.h"
+#include "test/lib/x86/apic.h"
 #include "test/x86/ioram.h"
 
 #include 
diff --git a/user/test/x86/port80.c b/user/test/x86/port80.c
--- a/user/test/x86/port80.c
+++ b/user/test/x86/port80.c
@@ -1,5 +1,4 @@
-
-#include "printf.h"
+#include 
 
 int main()
 {
diff --git a/user/test/x86/smptest.c b/user/test/x86/smptest.c
--- a/user/test/x86/smptest.c
+++ b/user/test/x86/smptest.c
@@ -1,6 +1,5 @@
-
-#include "smp.h"
-#include "printf.h"
+#include 
+#include 
 
 static void ipi_test(void *data)
 {
diff --git a/user/test/x86/tsc.c b/user/test/x86/tsc.c
--- a/user/test/x86/tsc.c
+++ b/user/test/x86/tsc.c
@@ -1,5 +1,4 @@
-
-#include "printf.h"
+#include 
 
 typedef unsigned long long u64;
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0 of 4] [kvm-userspace][test] consolidate test libs to libcflat

2008-06-25 Thread Jerone Young
This set of patches consolidates the test libraries into a single library
archive, libcflat. This will allow common code to be shared among archs.

Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

26 files changed, 602 insertions(+), 427 deletions(-)
user/Makefile |   11 +-
user/config-powerpc.mak   |   10 +
user/config-x86-common.mak|   16 +--
user/main.c   |2 
user/test/lib/libcflat.h  |   37 +++
user/test/lib/panic.c |   13 ++
user/test/lib/powerpc/44x/map.c   |   51 +
user/test/lib/powerpc/44x/tlbwe.S |   29 +
user/test/lib/powerpc/io.c|   35 ++
user/test/lib/printf.c|  179 +
user/test/lib/string.c|   21 +++
user/test/lib/x86/apic.h  |   14 ++
user/test/lib/x86/io.c|   23 
user/test/lib/x86/smp.c   |  150 
user/test/lib/x86/smp.h   |   16 +++
user/test/x86/lib/apic.h  |   14 --
user/test/x86/lib/exit.c  |5 
user/test/x86/lib/printf.c|  195 -
user/test/x86/lib/printf.h|2 
user/test/x86/lib/smp.c   |  151 
user/test/x86/lib/smp.h   |   16 ---
user/test/x86/lib/string.c|   21 ---
user/test/x86/lib/string.h|7 -
user/test/x86/port80.c|3 
user/test/x86/smptest.c   |5 
user/test/x86/tsc.c   |3 


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Bernd Schubert
On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> On Wed, Jun 25, 2008 at 04:17:23PM -0300, Marcelo Tosatti wrote:
> > > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > > boot fine.
> >
> > You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
> >
> > You should upgrade the guest kernel to the git tree, kvm clock changes
> > break compatibility with older kernels.
>
> Err, I mean you have to upgrade the host kernel too.

Ah, unfortunately I can't do this easily :(
I'm presently on vacation and the host is my desktop at work; the risk of
breaking something remotely is too high. Even when I'm back it is a problem:
unfortunately the host is an nvidia system and dual display only works with
the binary-only driver (*grr* why did I have to get a stupid nvidia system :( ).


Thanks,
Bernd


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Marcelo Tosatti
On Wed, Jun 25, 2008 at 04:17:23PM -0300, Marcelo Tosatti wrote:
> > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > boot fine.
> 
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.
> 
> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

Err, I mean you have to upgrade the host kernel too.


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Bernd Schubert
On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> On Wed, Jun 25, 2008 at 08:54:44PM +0200, Bernd Schubert wrote:
> > On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> > > On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> > > > Avi Kivity wrote:
> > > > > Linus, please pull from the repo and branch at:
> > > > >
> > > > >   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > > > > kvm-updates-2.6.26
> > > >
> > > > I just pulled from Linus and now it stalls to boot at
> > > >
> > > > [0.616031] pnp: PnP ACPI init
> > > > [0.628031] ACPI: bus type pnp registered
> > > > [0.640031] pnp: PnP ACPI: found 7 devices
> > > > [0.652031] ACPI: ACPI bus type pnp unregistered
> > > > [0.660031] SCSI subsystem initialized
> > > > [0.664031] PCI: Using ACPI for IRQ routing
> > > > [0.692031] PCI-GART: No AMD northbridge found.
> > > >
> > > > The kvm process is at 100% time. Taking the many problems I already
> > > > reported about, 2.6.26 probably will be entirely broken regarding kvm
> > > > :(
> > >
> > > Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ?
> > >
> > > There is a known problem with CONFIG_KVM_GUEST being worked on.
> > >
> > > Can you provide these details, and pinpoint which option is the culprit?
> >
> > Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does
> > boot fine.
>
> You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.

Oh sorry, this was supposed to be "With CONFIG_KVM_GUEST=y it boots fine". So 
yes, CONFIG_KVM_CLOCK is the culprit.

>
> You should upgrade the guest kernel to the git tree, kvm clock changes
> break compatibility with older kernels.

This __is__ the result after updating to recent git (before that my problem
was different). Latest commit is

commit 543cf4cb3fe6f6cae3651ba918b9c56200b257d0
Author: Linus Torvalds <[EMAIL PROTECTED]>
Date:   Tue Jun 24 18:58:20 2008 -0700

Linux 2.6.26-rc8



Thanks,
Bernd



Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Marcelo Tosatti
On Wed, Jun 25, 2008 at 08:54:44PM +0200, Bernd Schubert wrote:
> On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> > On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> > > Avi Kivity wrote:
> > > > Linus, please pull from the repo and branch at:
> > > >
> > > >   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > > > kvm-updates-2.6.26
> > >
> > > I just pulled from Linus and now it stalls to boot at
> > >
> > > [0.616031] pnp: PnP ACPI init
> > > [0.628031] ACPI: bus type pnp registered
> > > [0.640031] pnp: PnP ACPI: found 7 devices
> > > [0.652031] ACPI: ACPI bus type pnp unregistered
> > > [0.660031] SCSI subsystem initialized
> > > [0.664031] PCI: Using ACPI for IRQ routing
> > > [0.692031] PCI-GART: No AMD northbridge found.
> > >
> > > The kvm process is at 100% time. Taking the many problems I already
> > > reported about, 2.6.26 probably will be entirely broken regarding kvm :(
> >
> > Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ?
> >
> > There is a known problem with CONFIG_KVM_GUEST being worked on.
> >
> > Can you provide these details, and pinpoint which option is the culprit?
> 
> Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does boot 
> fine.

You mean with CONFIG_KVM_CLOCK=n it boots fine, I suppose.

You should upgrade the guest kernel to the git tree, kvm clock changes
break compatibility with older kernels.



Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Bernd Schubert
On Wednesday 25 June 2008, Marcelo Tosatti wrote:
> On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> > Avi Kivity wrote:
> > > Linus, please pull from the repo and branch at:
> > >
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > > kvm-updates-2.6.26
> >
> > I just pulled from Linus and now it stalls to boot at
> >
> > [0.616031] pnp: PnP ACPI init
> > [0.628031] ACPI: bus type pnp registered
> > [0.640031] pnp: PnP ACPI: found 7 devices
> > [0.652031] ACPI: ACPI bus type pnp unregistered
> > [0.660031] SCSI subsystem initialized
> > [0.664031] PCI: Using ACPI for IRQ routing
> > [0.692031] PCI-GART: No AMD northbridge found.
> >
> > The kvm process is at 100% time. Taking the many problems I already
> > reported about, 2.6.26 probably will be entirely broken regarding kvm :(
>
> Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ?
>
> There is a known problem with CONFIG_KVM_GUEST being worked on.
>
> Can you provide these details, and pinpoint which option is the culprit?

Just found out, it is CONFIG_KVM_CLOCK. With CONFIG_KVM_CLOCK=y it does boot 
fine.

Thanks,
Bernd




Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Marcelo Tosatti
On Wed, Jun 25, 2008 at 06:13:05PM +0200, Bernd Schubert wrote:
> Avi Kivity wrote:
> 
> > Linus, please pull from the repo and branch at:
> > 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> > kvm-updates-2.6.26
> 
> I just pulled from Linus and now it stalls to boot at
> 
> [0.616031] pnp: PnP ACPI init
> [0.628031] ACPI: bus type pnp registered
> [0.640031] pnp: PnP ACPI: found 7 devices
> [0.652031] ACPI: ACPI bus type pnp unregistered
> [0.660031] SCSI subsystem initialized
> [0.664031] PCI: Using ACPI for IRQ routing
> [0.692031] PCI-GART: No AMD northbridge found.
> 
> The kvm process is at 100% time. Taking the many problems I already reported
> about, 2.6.26 probably will be entirely broken regarding kvm :(

Do you have CONFIG_KVM_CLOCK or CONFIG_KVM_GUEST enabled ?

There is a known problem with CONFIG_KVM_GUEST being worked on.

Can you provide these details, and pinpoint which option is the culprit?


Re: [PATCH] Avoid fragment virtio-blk transfers by copying

2008-06-25 Thread Anthony Liguori

Avi Kivity wrote:
> Anthony Liguori wrote:
>> A major source of performance loss for virtio-blk has been the fact that we
>> split transfers into multiple requests.  This is particularly harmful if you
>> have striped storage beneath your virtual machine.
>>
>> This patch copies the request data into a single contiguous buffer to ensure
>> that we don't split requests.  This improves performance from about 80 MB/sec
>> to about 155 MB/sec with my fibre channel link.  185 MB/sec is what we get on
>> native so this gets us pretty darn close.
>
> If the guest issues a request for a terabyte of memory, the host will
> try to allocate it and drop to swap/oom.

An unprivileged user should not be able to OOM the kernel by simply
doing memory allocations.  Likewise, while it will start swapping, that
should primarily affect the application allocating memory (although yes,
it's consuming IO bandwidth to do the swapping).

As long as we properly handle memory allocation failures, it's my
contention that allowing a guest to allocate unbounded amounts of virtual
memory is safe.

> So we need to either fragment beyond some size, or to avoid copying
> and thus the need for allocation.

As Marcelo mentioned, there are very practical limitations on how much
memory can be in-flight on any given queue.  A malicious guest could
construct a nasty queue that basically pointed to all of the guest
physical memory for each entry in the queue but that's still a bounded size.
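
As a rough illustration of the approach discussed above (copying a
scatter-gather request into one contiguous buffer while treating allocation
failure as an ordinary error), here is a minimal sketch. It is not the code
from the patch; linearize_iov is a hypothetical helper name, and plain
malloc stands in for whatever allocator the backend would really use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper, not taken from the patch: copy a scatter-gather
 * list into one contiguous buffer so that a single request can be issued.
 * Returns NULL if the summed lengths overflow or the allocation fails,
 * so the caller can fail the request instead of crashing. */
static void *linearize_iov(const struct iovec *iov, int iovcnt, size_t *total)
{
    size_t len = 0;

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > SIZE_MAX - len)
            return NULL;              /* guest-supplied lengths overflowed */
        len += iov[i].iov_len;
    }

    unsigned char *buf = malloc(len ? len : 1);
    if (buf == NULL)
        return NULL;                  /* allocation failure is reported */

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > 0)
            memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }

    *total = len;
    return buf;
}
```

A caller would check for NULL, complete the request with an error in that
case, and free() the buffer once the I/O has finished.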


Regards,

Anthony Liguori



Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Bernd Schubert
Avi Kivity wrote:

> Linus, please pull from the repo and branch at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git
> kvm-updates-2.6.26

I just pulled from Linus and now it stalls to boot at

[0.616031] pnp: PnP ACPI init
[0.628031] ACPI: bus type pnp registered
[0.640031] pnp: PnP ACPI: found 7 devices
[0.652031] ACPI: ACPI bus type pnp unregistered
[0.660031] SCSI subsystem initialized
[0.664031] PCI: Using ACPI for IRQ routing
[0.692031] PCI-GART: No AMD northbridge found.

The kvm process is at 100% time. Taking the many problems I already reported
about, 2.6.26 probably will be entirely broken regarding kvm :(


Thanks,
Bernd






Re: [PATCH] Avoid fragment virtio-blk transfers by copying

2008-06-25 Thread Marcelo Tosatti
On Wed, Jun 25, 2008 at 01:44:35PM +0300, Avi Kivity wrote:
> Anthony Liguori wrote:
>> A major source of performance loss for virtio-blk has been the fact that we
>> split transfers into multiple requests.  This is particularly harmful if you
>> have striped storage beneath your virtual machine.
>>
>> This patch copies the request data into a single contiguous buffer to ensure
>> that we don't split requests.  This improves performance from about 80 MB/sec
>> to about 155 MB/sec with my fibre channel link.  185 MB/sec is what we get on
>> native so this gets us pretty darn close.
>>
>>   
>
> If the guest issues a request for a terabyte of memory, the host will  
> try to allocate it and drop to swap/oom.  So we need to either fragment  
> beyond some size, or to avoid copying and thus the need for allocation.

The maximum request size for Linux guests is 512K (after tuning the
virtio-blk guest driver; the current max is 124K). I'm not sure what the max
number of requests is, but I guess it is between 128 and 1024, Anthony?

So with the current configuration your concern is not an issue. BTW,
what is the maximum request size for the Windows driver?

Point is that the guest is responsible for limiting the amount of data
in-flight. A malicious guest can only hurt itself by attempting to DoS
the host, with proper memory limits in place. IMO this issue should not
be handled in the virtio-blk backend.
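
To put numbers on the bound described above, the following small sketch
multiplies a per-request size by a queue depth. The function name is made
up for illustration, and the queue depths are only the guessed range from
this message, not confirmed values.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the data in flight for one queue, assuming every slot
 * holds a maximum-sized request. Both inputs are illustrative. */
static size_t max_in_flight_bytes(size_t max_request_bytes, size_t queue_entries)
{
    return max_request_bytes * queue_entries;
}
```

With the current 124K limit and 128 entries this gives 15.5 MiB; with the
tuned 512K limit and 1024 entries it gives 512 MiB.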



Re: [PATCH] [v2] Remove use of bit fields in kvm trace structure

2008-06-25 Thread Hollis Blanchard
On Tue, 2008-06-24 at 13:54 -0500, Jerone Young wrote:
> 2 files changed, 30 insertions(+), 11 deletions(-)
> include/linux/kvm.h  |   17 ++---
> virt/kvm/kvm_trace.c |   24 
> 
> 
>  *Updates:
>   Create global definitions for setting trace records as opposed to
>   explicitly setting them inside of a function.
> 
> This patch fixes kvmtrace use on big endian systems. When using bit
> fields the compiler will lay data out in the wrong order expected when
> laid down into a file. This fixes it by using one variable instead of
> using bit fields.
> 
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

Acked-by: Hollis Blanchard <[EMAIL PROTECTED]>

Avi, if this is OK now, please also apply Jerone's earlier patches:
[PATCH 2 of 3] Move KVM TRACE DEFINITIONS to common header
[PATCH 3 of 3] Add new KVM TRACE events
(and add my Acked-by to those as well).
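
For readers following the bit-field discussion in the quoted patch
description: the order in which a C compiler allocates bit fields within a
word is implementation-defined, so the same struct can produce different
bit positions on big and little endian targets. A minimal sketch of the
alternative, packing the fields into one variable with explicit shifts and
masks, follows. The 28/3/1 field split and all names here are assumptions
made for illustration, not the definitions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: 28-bit event id,
 * 3-bit argument count, 1-bit flag, all in one 32-bit word. */
#define REC_EVENT_MASK  0x0fffffffu
#define REC_NARGS_SHIFT 28
#define REC_NARGS_MASK  0x7u
#define REC_FLAG_SHIFT  31

/* Pack the three fields at fixed bit positions, independent of how the
 * compiler would have ordered bit fields. */
static uint32_t rec_pack(uint32_t event, uint32_t nargs, uint32_t flag)
{
    return (event & REC_EVENT_MASK)
         | ((nargs & REC_NARGS_MASK) << REC_NARGS_SHIFT)
         | ((flag & 1u) << REC_FLAG_SHIFT);
}

static uint32_t rec_event(uint32_t v) { return v & REC_EVENT_MASK; }
static uint32_t rec_nargs(uint32_t v) { return (v >> REC_NARGS_SHIFT) & REC_NARGS_MASK; }
static uint32_t rec_flag(uint32_t v)  { return v >> REC_FLAG_SHIFT; }
```

The byte order of the whole word in a file still follows the host, but a
reader only has to byte-swap one 32-bit value rather than guess the
compiler's bit-field layout.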

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Anthony Liguori

Kevin Wolf wrote:
> Anthony Liguori wrote:
>> Kevin Wolf wrote:
>>> Anthony Liguori wrote:
>>>> Kevin Wolf wrote:
>>>>> Anthony Liguori wrote:
>>>>> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
>>>>> was using. Obviously we missed something in our patch and I'd like to
>>>>> fix that. Did the error occur on raw images or something like qcow2?
>>>>
>>>> It's a raw image and the calls are being made via
>>>> bdrv_aio_read/bdrv_aio_write.  It doesn't occur with a qcow2 but then
>>>> cache=off doesn't seem to do what it's supposed to with cache=off (I
>>>> believe the underlying backing file is not opened O_DIRECT?).
>>>
>>> This is really strange. In raw_aio_read/write there is a check like this:
>>>
>>> if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) {
>>> // emulate it using raw_pread/write which uses
>>> // s->aligned_buf for the request then
>>> }
>>
>> Something is goofy then.
>>
>>> For qcow2 I think O_DIRECT actually is in effect. Otherwise it would
>>> have worked even without our patch, and it didn't. And indeed, looking
>>> at the code, it passes flags to bdrv_file_open when it opens the image
>>> file.
>>
>> Something's broken then.  Maybe -snapshot doesn't pick up the
>> O_DIRECT'ness?  I'll have to check again.  I was definitely seeing page
>> cache behavior with cache=off.
>
> Right, qemu seems to drop the flags for the backing file when using
> BDRV_O_SNAPSHOT (bdrv2_open in block.c opens the file). So O_DIRECT
> applies only to new data.
>
> Have you been using -snapshot when you had trouble with the unaligned
> buffer, too? I don't think I have tested this one when I made the patch...

Nope.  I was using a raw image.  Actually, an LVM partition.

Regards,

Anthony Liguori

> Kevin




Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Kevin Wolf
Anthony Liguori wrote:
> Kevin Wolf wrote:
>> Anthony Liguori wrote:
>>
>>> Kevin Wolf wrote:
>>>
>>>> Anthony Liguori wrote:
>>>>
>>>> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
>>>> was using. Obviously we missed something in our patch and I'd like to
>>>> fix that. Did the error occur on raw images or something like qcow2?
>>>>
>>> It's a raw image and the calls are being made via
>>> bdrv_aio_read/bdrv_aio_write.  It doesn't occur with a qcow2 but then
>>> cache=off doesn't seem to do what it's supposed to with cache=off (I
>>> believe the underlying backing file is not opened O_DIRECT?).
>>> 
>>
>> This is really strange. In raw_aio_read/write there is a check like this:
>>
>> if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) {
>> // emulate it using raw_pread/write which uses
>> // s->aligned_buf for the request then
>> }
>>   
> 
> Something is goofy then.
> 
>> For qcow2 I think O_DIRECT actually is in effect. Otherwise it would
>> have worked even without our patch, and it didn't. And indeed, looking
>> at the code, it passes flags to bdrv_file_open when it opens the image
>> file.
>>   
> 
> Something's broken then.  Maybe -snapshot doesn't pick up the
> O_DIRECT'ness?  I'll have to check again.  I was definitely seeing page
> cache behavior with cache=off.

Right, qemu seems to drop the flags for the backing file when using
BDRV_O_SNAPSHOT (bdrv2_open in block.c opens the file). So O_DIRECT
applies only to new data.

Have you been using -snapshot when you had trouble with the unaligned
buffer, too? I don't think I have tested this one when I made the patch...

Kevin


Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Anthony Liguori

Kevin Wolf wrote:
> Anthony Liguori wrote:
>> Kevin Wolf wrote:
>>> Anthony Liguori wrote:
>>> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
>>> was using. Obviously we missed something in our patch and I'd like to
>>> fix that. Did the error occur on raw images or something like qcow2?
>>
>> It's a raw image and the calls are being made via
>> bdrv_aio_read/bdrv_aio_write.  It doesn't occur with a qcow2 but then
>> cache=off doesn't seem to do what it's supposed to with cache=off (I
>> believe the underlying backing file is not opened O_DIRECT?).
>
> This is really strange. In raw_aio_read/write there is a check like this:
>
> if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) {
> // emulate it using raw_pread/write which uses
> // s->aligned_buf for the request then
> }

Something is goofy then.

> For qcow2 I think O_DIRECT actually is in effect. Otherwise it would
> have worked even without our patch, and it didn't. And indeed, looking
> at the code, it passes flags to bdrv_file_open when it opens the image file.

Something's broken then.  Maybe -snapshot doesn't pick up the
O_DIRECT'ness?  I'll have to check again.  I was definitely seeing page
cache behavior with cache=off.

> Kevin




Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Kevin Wolf
Laurent Vivier wrote:
> Generally EINVAL with O_DIRECT opened files means there is an alignment
> problem with offset, buffer address or size to read (must be multiple of
> 512).

Apparently the qemu_memalign for the buffer helps, so it's the buffer
address in this case.

Kevin


Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Laurent Vivier
On Wednesday 25 June 2008 at 16:15 +0200, Kevin Wolf wrote:
> Anthony Liguori wrote:
> > Kevin Wolf wrote:
> >> Anthony Liguori wrote:
> >>
> >>> I guess the main block code is not as defensive as I thought it was.
> >>> This patch uses qemu_memalign to allocate the buffers for IO so that
> >>> you don't get errors when using O_DIRECT.
> >>>
> >>
> >> Actually, the block code should be able to deal with unaligned buffers
> >> since qemu rev. 4599. This change seems to be present in current KVM.
> >>   
> > 
> > That was what I thought at first too.
> > 
> >> Can you tell exactly which operation failed?
> > 
> > The aio requests fail with -22 (EINVAL).
> 
> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
> was using. Obviously we missed something in our patch and I'd like to
> fix that. Did the error occur on raw images or something like qcow2?

Generally, EINVAL on a file opened with O_DIRECT means there is an alignment
problem with the offset, the buffer address, or the size to read (each must
be a multiple of 512).
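
The three constraints above can be checked up front. Below is a minimal
sketch, assuming 512-byte sectors as in this thread; the helper names are
made up for illustration, and posix_memalign stands in for qemu_memalign.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define SECTOR_SIZE 512   /* assumption: 512-byte alignment, as in the thread */

/* Returns nonzero if buffer address, length and file offset are all
 * multiples of the sector size. */
static int odirect_request_ok(const void *buf, size_t len, off_t offset)
{
    return ((uintptr_t)buf % SECTOR_SIZE) == 0
        && (len % SECTOR_SIZE) == 0
        && (offset % SECTOR_SIZE) == 0;
}

/* Allocate a sector-aligned buffer; returns NULL on failure. */
static void *alloc_aligned(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, SECTOR_SIZE, len) != 0)
        return NULL;
    return p;
}
```

A buffer from alloc_aligned() passes the address check; one from plain
malloc() may or may not, which is why the failure can look intermittent.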

Regards,
Laurent
-- 
- [EMAIL PROTECTED] ---
"The best way to predict the future is to invent it."
- Alan Kay



Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Kevin Wolf
Anthony Liguori wrote:
> Kevin Wolf wrote:
>> Anthony Liguori wrote:
>>
>> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
>> was using. Obviously we missed something in our patch and I'd like to
>> fix that. Did the error occur on raw images or something like qcow2?
>>   
> 
> It's a raw image and the calls are being made via
> bdrv_aio_read/bdrv_aio_write.  It doesn't occur with a qcow2 but then
> cache=off doesn't seem to do what it's supposed to with cache=off (I
> believe the underlying backing file is not opened O_DIRECT?).

This is really strange. In raw_aio_read/write there is a check like this:

if (unlikely(s->aligned_buf != NULL && ((uintptr_t) buf % 512))) {
// emulate it using raw_pread/write which uses
// s->aligned_buf for the request then
}

For qcow2 I think O_DIRECT actually is in effect. Otherwise it would
have worked even without our patch, and it didn't. And indeed, looking
at the code, it passes flags to bdrv_file_open when it opens the image file.
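
The fallback described by the check above (route an unaligned request
through an aligned buffer and copy) can be sketched as follows. This is
not qemu's implementation: pread_bounced is a hypothetical name, and the
sketch allocates a temporary bounce buffer per call rather than reusing a
persistent one such as s->aligned_buf.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512   /* assumption: 512-byte alignment, as in the thread */

/* Hypothetical helper: read len bytes at offset off into dst. If dst is
 * not sector-aligned, read into a temporary aligned buffer and copy. */
static ssize_t pread_bounced(int fd, void *dst, size_t len, off_t off)
{
    if (((uintptr_t)dst % SECTOR_SIZE) == 0)
        return pread(fd, dst, len, off);   /* aligned: direct path */

    void *bounce = NULL;
    if (posix_memalign(&bounce, SECTOR_SIZE, len) != 0) {
        errno = ENOMEM;
        return -1;
    }

    ssize_t n = pread(fd, bounce, len, off);
    if (n > 0)
        memcpy(dst, bounce, (size_t)n);

    int saved = errno;                     /* keep pread's errno across free */
    free(bounce);
    errno = saved;
    return n;
}
```

The extra copy costs some throughput, which is presumably why allocating
aligned buffers in the first place is the preferred fix.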

Kevin


Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Anthony Liguori

Kevin Wolf wrote:
> Anthony Liguori wrote:
>> Kevin Wolf wrote:
>>> Anthony Liguori wrote:
>>>> I guess the main block code is not as defensive as I thought it was.
>>>> This patch uses qemu_memalign to allocate the buffers for IO so that
>>>> you don't get errors when using O_DIRECT.
>>>
>>> Actually, the block code should be able to deal with unaligned buffers
>>> since qemu rev. 4599. This change seems to be present in current KVM.
>>
>> That was what I thought at first too.
>>
>>> Can you tell exactly which operation failed?
>>
>> The aio requests fail with -22 (EINVAL).
>
> Yes, if it fails, the EINVAL is no surprise. I meant what code path it
> was using. Obviously we missed something in our patch and I'd like to
> fix that. Did the error occur on raw images or something like qcow2?

It's a raw image and the calls are being made via
bdrv_aio_read/bdrv_aio_write.  It doesn't occur with a qcow2 but then
cache=off doesn't seem to do what it's supposed to with cache=off (I
believe the underlying backing file is not opened O_DIRECT?).

Regards,

Anthony Liguori

> Kevin



Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Kevin Wolf
Anthony Liguori wrote:
> Kevin Wolf wrote:
>> Anthony Liguori wrote:
>>
>>> I guess the main block code is not as defensive as I thought it was.
>>> This patch uses qemu_memalign to allocate the buffers for IO so that
>>> you don't get errors when using O_DIRECT.
>>>
>>
>> Actually, the block code should be able to deal with unaligned buffers
>> since qemu rev. 4599. This change seems to be present in current KVM.
>>   
> 
> That was what I thought at first too.
> 
>> Can you tell exactly which operation failed?
> 
> The aio requests fail with -22 (EINVAL).

Yes, if it fails, the EINVAL is no surprise. I meant what code path it
was using. Obviously we missed something in our patch and I'd like to
fix that. Did the error occur on raw images or something like qcow2?

Kevin


Re: Memory image of kvm guest

2008-06-25 Thread Jan Kiszka
Tomas Kouba wrote:
> Hello,
> is it possible to see and modify guest memory of the guest running under
> kvm?
> For example when I know the address of a kernel symbol, can I read the
> memory
> of the symbol in my application running on host?
> 
> (I am quite new to KVM but similar things are possible in XEN via
> xenctrl library calls).

Even better: Inherited from QEMU, KVM provides a full-blown gdb backend.
So you can do source-level debugging of your guest very comfortably. If
you just want to get the content of some memory chunk: QEMU monitor, 'x'
(as known from gdb, see also qemu/qemu-doc.html). But modification
requires a gdb frontend again.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Anthony Liguori

Kevin Wolf wrote:
> Anthony Liguori wrote:
>> I guess the main block code is not as defensive as I thought it was.
>> This patch uses qemu_memalign to allocate the buffers for IO so that
>> you don't get errors when using O_DIRECT.
>
> Actually, the block code should be able to deal with unaligned buffers
> since qemu rev. 4599. This change seems to be present in current KVM.

That was what I thought at first too.

> Can you tell exactly which operation failed?

The aio requests fail with -22 (EINVAL).

> But apart from that, qemu_memalign is the right thing to do, because
> copying from/into an aligned buffer in the block code costs performance
> (don't know how much, though).

Regards,

Anthony Liguori

> Kevin



Memory image of kvm guest

2008-06-25 Thread Tomas Kouba

Hello,
is it possible to see and modify guest memory of the guest running under kvm?
For example when I know the address of a kernel symbol, can I read the memory
of the symbol in my application running on host?

(I am quite new to KVM but similar things are possible in XEN via xenctrl 
library calls).

Thank you,

--
Tomas Kouba



[ kvm-Bugs-2001452 ] Restarted Windows 2003 Server guests have disk corruption

2008-06-25 Thread SourceForge.net
Bugs item #2001452, was opened at 2008-06-24 00:27
Message generated for change (Comment added) made by iggy_cav
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: gwachs (gerdwachs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Restarted Windows 2003 Server guests have disk corruption

Initial Comment:
I have a number of Windows 2003 32Bit guests.

I use them to perform installation and configuration
tests of a large software product.

During these tests, the guests are restarted.

Randomly, the guests produce disk corruption messages
after a restart.

The following are two examples :

---
Windows  Registry Hive Recovered

Registry hive (file): SOFTWARE was corrupted and it has
been recovered. Some data might have been lost.
---
The system cannot log on due to the following error:
Unable to complete the requested operation because of
either a catastrophic media failure or a data structure
corruption on the disk.
---

OS : Ubuntu 8.04 x86_64

Kernel : 2.6.24-18-server #1 SMP x86_64 GNU/Linux

KVM: kvm-70

CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz

flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 
ssse3 cx16 xt

Start Command : sudo /usr/local/kvm/bin/qemu-system-x86_64 -hda asit51ascs.img \
   -m 1024 -std-vga -boot c -k sv -usb -usbdevice tablet -snapshot -vnc :51 \
   -net nic,vlan=0,macaddr=00:16:3e:00:51:00 -net tap,vlan=0,script=/etc/qemu-ifup-br0 \
   -net nic,vlan=1,macaddr=00:16:3e:00:51:01 -net tap,vlan=1,script=/etc/qemu-ifup-br1

-no-kvm : Cannot be tested due to the loss of performance.
 The tests take 7 hours to execute with kvm.




--

Comment By: Brian Jackson (iggy_cav)
Date: 2008-06-25 07:59

Message:
Logged In: YES 
user_id=611130
Originator: NO

Can you try to revert (patch -R) the virtio async feature? Someone else in
the irc channel that was having fs corruption had luck doing that.
http://people.redhat.com/~mtosatti/virtioblk-async.patch

Otherwise, just stick with the kvm-69 userspace until it's fixed.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-25 02:26

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Windows 2003 Guests also have random BSOD

The problems in this bug report put a stop to running Windows 2003 Server
on kvm at this point in time.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 03:33

Message:
Logged In: YES 
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 
does not occur when using the option : -no-kvm-irqchip
Will continue testing.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 03:15

Message:
Logged In: YES 
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 

only occurs when the guest is started using kvm.

i.e does not occur with the -no-kvm option.

When using the -no-acpi option, the guest does not start, with or without kvm.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 02:32

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Noted that I get the following in the linux console :

apic write: bad size=1 fee00030


--



Re: [PATCH 2/3] KVM: Introduce a callback routine for IOAPIC ack handling

2008-06-25 Thread Amit Shah
On Wednesday 25 June 2008 12:49:39 Amit Shah wrote:
> This will be useful for acking irqs of assigned devices
>
> Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
> ---
>  virt/kvm/ioapic.c |3 +++
>  virt/kvm/ioapic.h |1 +
>  2 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index 9d02136..4759d77 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
>   ent->fields.remote_irr = 0;
>   if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
>   ioapic_deliver(ioapic, gsi);
> +
> + if (ioapic->callback)
> + ioapic->callback(ioapic->kvm, gsi);

I don't intend to keep the name 'callback' for this function, but I'm not
sure what to call it either: ack_notifier, maybe?

Also, guests using the PIC aren't supported yet (they never have been so
far). I do have a patch ready for that, but it still needs testing.
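
To make the naming question concrete, here is a minimal user-space sketch of
such an ack hook. It is only an illustration: `ioapic_sketch`, `ack_notifier`,
`record_ack` and `sketch_update_eoi` are hypothetical names, not taken from
the patch, and the real struct kvm_ioapic carries far more state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the IOAPIC state; not the real struct kvm_ioapic. */
struct ioapic_sketch {
    void *owner;                                 /* opaque pointer handed back to the hook */
    void (*ack_notifier)(void *owner, int gsi);  /* optional hook, may be NULL */
};

/* Example hook: remembers the last GSI the guest acknowledged. */
int last_acked_gsi = -1;

void record_ack(void *owner, int gsi)
{
    (void)owner;
    last_acked_gsi = gsi;
}

/* Called when the guest signals end-of-interrupt for a GSI. */
void sketch_update_eoi(struct ioapic_sketch *ioapic, int gsi)
{
    /* Redelivery of a still-pending level interrupt would happen here. */
    if (ioapic->ack_notifier)
        ioapic->ack_notifier(ioapic->owner, gsi);
}
```

The hook is optional, so code that never registers one pays only for a NULL
check on each EOI.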


Re: PCI PT: irq issue

2008-06-25 Thread Amit Shah
On Wednesday 25 June 2008 15:03:11 Han, Weidong wrote:
> Amit Shah wrote:
> > On Monday 23 June 2008 20:46:18 Han, Weidong wrote:
> >> Amit Shah wrote:
> >>> On Saturday 21 June 2008 09:41:18 Han, Weidong wrote:
> >>>> Amit Shah wrote:
> >>>>> A couple of notes for the VT-d patch:
> >>>>> - The pci_dev struct is now available in the pci_pt kernel
> >>>>> structure, so just use that information each time you want to add
> >>>>> a device instead of searching for it each time.
> >>>>> - The kernel with KVM VT-d patches doesn't build on the
> >>>>> kvm-userspace.git tree. Please fix that.
> >>>>
> >>>> I pulled the latest VT-d branch, and it works fine for me.
> >>>
> >>> I mean the 'vtd' branch in the kernel tree with 'master' branch of
> >>> the userspace tree aren't compatible.
>
> Amit, why do you want them to be compatible? The 'vtd' branch of the
> kernel tree and the 'vtd' branch of the userspace tree are meant to be
> used together, right? The 'master' branch of the userspace tree may not
> have some of the patches that are in its 'vtd' branch.

We need this for compatibility: old kernel with new userspace and new kernel 
with old userspace should all work fine.

Amit.


Re: qemu: remove overzelaous virtio-net printf

2008-06-25 Thread Avi Kivity

Marcelo Tosatti wrote:
> When two virtio devices share an interrupt, virtio-net floods the console
> with the "this should not happen" message.
>
> As Anthony points out, this is not a fatal condition: it's possible that
> the guest consumed all ring elements between the can_receive check and
> the actual net_receive call.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH][REPOST] KVM: VMX: Fix a wrong usage of vmcs_config

2008-06-25 Thread Avi Kivity

Yang, Sheng wrote:
> From a1c929709718c015686b0c23046cc08b8bc47a62 Mon Sep 17 00:00:00 2001
> From: Sheng Yang <[EMAIL PROTECTED]>
> Date: Wed, 18 Jun 2008 14:43:38 +0800
> Subject: [PATCH] KVM: VMX: Fix a wrong usage of vmcs_config
>
> The function ept_update_paging_mode_cr0() writes to
> CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's
> wrong because the variable may not be consistent with the content of the
> CPU_BASED_VM_EXEC_CONTROL VMCS field.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb

2008-06-25 Thread Avi Kivity

Yang, Sheng wrote:
> From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
> From: Sheng Yang <[EMAIL PROTECTED]>
> Date: Wed, 18 Jun 2008 11:23:13 +0800
> Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb
>
> Fix a potential issue caused by kvm_mmu_slot_remove_write_access().
> The old behavior doesn't sync the EPT TLB with the modified EPT entry,
> which results in inconsistent content between the EPT TLB and the EPT
> table.
>
> @@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
>  static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
>  {
> 	vpid_sync_vcpu_all(to_vmx(vcpu));
> +	if (vm_need_ept())
> +		ept_sync_context(to_vmx(vcpu));
>  }

So we're flushing both the vpid tlb and the ept context?  What does an
ept context flush mean exactly?  tlb entries for gpa->hpa?



--
error compiling committee.c: too many arguments to function



Re: [PATCH] configure: remove configure warning against not using gcc3

2008-06-25 Thread Avi Kivity

Carlo Marcelo Arenas Belon wrote:
> complement ff5396cfeacf74ad9611a35e882ff100b10126aci, removing
> the warning printed by ./configure --help which recommended
> at configure time against using gcc4 as it wasn't supported by
> dyngen.
>
> @@ -28,8 +28,6 @@ usage() {
>
>  	Any additional option is given to qemu's configure verbatim; including:
>
> -	--disable-gcc-check	don't insist on gcc-3.x
> -	  CAUTION: this will break running without kvm

You've orphaned the "including:" above!

--
error compiling committee.c: too many arguments to function



Re: [PATCH] irq assignment

2008-06-25 Thread Avi Kivity

(copying qemu-devel)

Xu, Anthony wrote:
> Subject: [PATCH] Irq assignment
>
> 1. use bimodal _PRT
> 2. pci device can use irq > 15, reduce interrupt sharing
> 3. test by running linux guest in kvm-ia64, kvm-i32(w/ wo/ -no-kvm)
>
> +
> +static int ioapic_irq_count[IOAPIC_NUM_PINS];
> +
>  void ioapic_set_irq(void *opaque, int vector, int level)
>  {
>      IOAPICState *s = opaque;
> -
> +    if( vector >= 16 ){
> +        if( level )
> +            ioapic_irq_count[vector] += 1;
> +        else
> +            ioapic_irq_count[vector] -= 1;
> +        level = (ioapic_irq_count[vector] != 0);
> +    }
> +#ifdef KVM_CAP_IRQCHIP
> +    if (kvm_enabled())
> +        if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
> +            return;
> +#endif

It's legal to call ioapic_set_irq(vector, 1) twice, which will screw up
the level calculation.

We need qemu_irq_or(), similar to qemu_irq_invert():

   qemu_irq qemu_irq_or(qemu_irq irqs[], int nr);

Also, this is not the place for doing the or.  The ioapic does not know
which interrupts are level connected and which are not.  This belongs on
the pci level (or the mainboard level).
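
The difference between counting and or-ing can be shown with a small
stand-alone sketch. It is not the proposed qemu_irq_or() and does not use any
QEMU types; `irq_or_gate`, `irq_or_set` and `MAX_SOURCES` are hypothetical
names chosen for illustration. Each source owns one bit, so raising the same
source twice is harmless, which a shared counter cannot guarantee.

```c
#include <stdint.h>

#define MAX_SOURCES 32

/* Hypothetical OR gate for several level-triggered sources sharing one line. */
struct irq_or_gate {
    uint32_t asserted;   /* bit n is set while source n holds the line high */
};

/* Record the level of one source and return the resulting line level (0 or 1). */
int irq_or_set(struct irq_or_gate *gate, unsigned source, int level)
{
    if (source >= MAX_SOURCES)
        return gate->asserted != 0;   /* ignore out-of-range sources */
    if (level)
        gate->asserted |= (uint32_t)1 << source;
    else
        gate->asserted &= ~((uint32_t)1 << source);
    return gate->asserted != 0;
}
```

With a counter, the sequence raise, raise, lower on one source leaves the
count at 1 and the line stuck high; with per-source bits the line drops as
soon as that source lowers it.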


--
error compiling committee.c: too many arguments to function



Re: [PATCH] make qemu compile for kvm/ia64

2008-06-25 Thread Avi Kivity

Xu, Anthony wrote:
> Thanks for your comments.
> Here is the revised patch.
>
> From 1a30adfc5ded3608ac2f09499b42234cf7d54a19 Mon Sep 17 00:00:00 2001
> From: Anthony Xu  <[EMAIL PROTECTED]>
> Date: Fri, 20 Jun 2008 10:45:13 -0400
> Subject: [PATCH] Make qemu compile for kvm-ia64
>
> Since merging with Qemu upstream, it can't be compiled
> for kvm-ia64

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] Avoid fragment virtio-blk transfers by copying

2008-06-25 Thread Avi Kivity

Anthony Liguori wrote:
> A major source of performance loss for virtio-blk has been the fact that we
> split transfers into multiple requests.  This is particularly harmful if you
> have striped storage beneath your virtual machine.
>
> This patch copies the request data into a single contiguous buffer to ensure
> that we don't split requests.  This improves performance from about 80 MB/sec
> to about 155 MB/sec with my fibre channel link.  185 MB/sec is what we get on
> native so this gets us pretty darn close.

If the guest issues a request for a terabyte of memory, the host will
try to allocate it and drop to swap/oom.  So we need to either fragment
beyond some size, or to avoid copying and thus the need for allocation.
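
The first option, fragmenting beyond some size, could look roughly like the
sketch below. It only shows the size arithmetic; `MAX_BOUNCE_BYTES`,
`next_chunk_len` and `chunk_count` are hypothetical names, and the 1 MiB cap
is an arbitrary assumption rather than a value from the patch.

```c
#include <stddef.h>

/* Hypothetical cap on the bounce buffer; the value is an assumption. */
#define MAX_BOUNCE_BYTES ((size_t)1 << 20)

/* Size of the next chunk to copy and submit, given the bytes still outstanding. */
size_t next_chunk_len(size_t remaining)
{
    return remaining < MAX_BOUNCE_BYTES ? remaining : MAX_BOUNCE_BYTES;
}

/* Number of host requests needed to serve a guest request of `total` bytes. */
size_t chunk_count(size_t total)
{
    size_t n = 0;

    while (total > 0) {
        total -= next_chunk_len(total);
        n++;
    }
    return n;
}
```

Requests up to the cap still go out as a single contiguous transfer, so the
common case keeps the speedup while the host allocation stays bounded.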



--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2] x86: Add "virt flag" in /proc/cpuinfo

2008-06-25 Thread Avi Kivity

(copying Ingo)


Yang, Sheng wrote:
> From 54b1bb9fe5d2fe40fc047b43dd4e1a480d41a977 Mon Sep 17 00:00:00 2001
> From: Sheng Yang <[EMAIL PROTECTED]>
> Date: Tue, 24 Jun 2008 17:03:17 +0800
> Subject: [PATCH] x86: Add "virt flag" in /proc/cpuinfo
>
> Hardware virtualization technology evolves very fast, but currently
> it's hard to tell whether your CPU supports a certain kind of HW
> technology without digging into the source code.
>
> The patch adds a new item under /proc/cpuinfo, named "virt flag". The
> "virt flag" has a similar function to "flag": it is used to indicate
> which features this CPU supports. It doesn't cover all features, only
> the important ones.

Ingo, do you prefer this as a separate 'virt flags' line or as addition
to the 'flag' line?

> The current implementation only covers the Intel VMX side.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kernel/cpu/proc.c   |   28 
 include/asm-x86/cpufeature.h |9 +
 2 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 0d0d905..03b30d0 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -77,6 +77,31 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)

 }
 #endif

+static void show_cpuinfo_vmx_virtflag(struct seq_file *m)
+{
+   u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
+
+   seq_printf(m, "\nvirt flag\t:");
+   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
+   msr_ctl = 0x & vmx_msr_high | vmx_msr_low;
+   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW)
+   seq_printf(m, " tpr_shadow");
+   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_VNMI)
+   seq_printf(m, " vnmi");
+   if (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS) {
+   rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
+ vmx_msr_low, vmx_msr_high);
+   msr_ctl2 = 0x & vmx_msr_high | vmx_msr_low;
+   if ((msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC) &&
+   (msr_ctl & X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW))
+   seq_printf(m, " flexpriority");
+   if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_EPT)
+   seq_printf(m, " ept");
+   if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
+   seq_printf(m, " vpid");
+   }
+}
+
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
struct cpuinfo_x86 *c = v;
@@ -123,6 +148,9 @@ static int show_cpuinfo(struct seq_file *m, void *v)
if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
seq_printf(m, " %s", x86_cap_flags[i]);

+   if (cpu_has(c, X86_FEATURE_VMX))
+   show_cpuinfo_vmx_virtflag(m);
+
seq_printf(m, "\nbogomips\t: %lu.%02lu\n",
   c->loops_per_jiffy/(50/HZ),
   (c->loops_per_jiffy/(5000/HZ)) % 100);
diff --git a/include/asm-x86/cpufeature.h b/include/asm-x86/cpufeature.h
index 0d609c8..87d8084 100644
--- a/include/asm-x86/cpufeature.h
+++ b/include/asm-x86/cpufeature.h
@@ -84,6 +84,7 @@
 #define X86_FEATURE_XMM3	(4*32+ 0) /* Streaming SIMD Extensions-3 */
 #define X86_FEATURE_MWAIT	(4*32+ 3) /* Monitor/Mwait support */
 #define X86_FEATURE_DSCPL	(4*32+ 4) /* CPL Qualified Debug Store */
+#define X86_FEATURE_VMX	(4*32+ 5) /* Virtual Machine eXtensions */
 #define X86_FEATURE_EST	(4*32+ 7) /* Enhanced SpeedStep */
 #define X86_FEATURE_TM2	(4*32+ 8) /* Thermal Monitor 2 */
 #define X86_FEATURE_CID	(4*32+10) /* Context ID */
@@ -113,6 +114,14 @@
  */
 #define X86_FEATURE_IDA	(7*32+ 0) /* Intel Dynamic Acceleration */

+/* Intel VMX MSR indicated features */
+#define X86_VMX_FEATURE_PROC_CTLS_TPR_SHADOW	0x0020
+#define X86_VMX_FEATURE_PROC_CTLS_VNMI	0x0040
+#define X86_VMX_FEATURE_PROC_CTLS_2ND_CTLS	0x8000
+#define X86_VMX_FEATURE_PROC_CTLS2_VIRT_APIC	0x0001
+#define X86_VMX_FEATURE_PROC_CTLS2_EPT	0x0002
+#define X86_VMX_FEATURE_PROC_CTLS2_VPID	0x0020
+
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)

 #include 
--
1.5.5

  



--
error compiling committee.c: too many arguments to function



Re: [PATCH] KVM: VMX: Some defined name fix

2008-06-25 Thread Avi Kivity

Yang, Sheng wrote:
> From 0dae764c94f48bd05f796947df1c85028ade59fa Mon Sep 17 00:00:00 2001
> From: Sheng Yang <[EMAIL PROTECTED]>
> Date: Tue, 24 Jun 2008 17:02:38 +0800
> Subject: [PATCH] KVM: VMX: Some defined name fix
>
> MSR_IA32_FEATURE_LOCKED is in fact just a bit, so it shouldn't be
> prefixed with MSR_.
>
> The same goes for MSR_IA32_FEATURE_VMXON_ENABLED.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)

2008-06-25 Thread Avi Kivity

David S. Ahern wrote:
> RHEL3 is in Maintenance mode (for an explanation see
> http://www.redhat.com/security/updates/errata/) which means performance
> enhancement patches will not make it in.

Scratch that idea, then.

> Also, I'm going to be out of the office for a couple of weeks in July,
> so I will need to put this aside until mid-August or so. I'll reevaluate
> options then.

One thing I'm looking at is implementing out-of-sync like Xen, which
looks like it will obsolete the entire emulate vs flood thing at the
cost of making unshadowing a little more expensive and consuming more
memory.  See
http://thread.gmane.org/gmane.comp.emulators.xen.devel/52557 (and 58,
59, 60).


--
error compiling committee.c: too many arguments to function



RE: PCI PT: irq issue

2008-06-25 Thread Han, Weidong
Amit Shah wrote:
> On Monday 23 June 2008 20:46:18 Han, Weidong wrote:
>> Amit Shah wrote:
>>> On Saturday 21 June 2008 09:41:18 Han, Weidong wrote:
>>>> Amit Shah wrote:
>>>>> A couple of notes for the VT-d patch:
>>>>> - The pci_dev struct is now available in the pci_pt kernel
>>>>> structure, so just use that information each time you want to add
>>>>> a device instead of searching for it each time.
>>>>> - The kernel with KVM VT-d patches doesn't build on the
>>>>> kvm-userspace.git tree. Please fix that.
>>>>
>>>> I pulled the latest VT-d branch, and it works fine for me.
>>> 
>>> I mean the 'vtd' branch in the kernel tree with 'master' branch of
>>> the userspace tree aren't compatible.

Amit, why do you want them to be compatible? The 'vtd' branch of the
kernel tree and the 'vtd' branch of the userspace tree are meant to be
used together, right? The 'master' branch of the userspace tree may not
have some of the patches that are in its 'vtd' branch.

Randy (Weidong)

>> 
>> I tried it. It can build. Can you show me the errors?
> 
> If you compile kvm as external modules, the userspace tree doesn't
> contain the vtd.o file while linking, but kvm.ko refers to objects
> within it: 
> 
> 
>   Building modules, stage 2.
>   MODPOST 3 modules
> WARNING: "kvm_intel_iommu_found"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_unmap_guest"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_map_guest"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!
> WARNING: "kvm_iommu_map_pages"
> [/home/amit/src/kvm-userspace/kernel/kvm.ko] undefined!



Re: [PATCH] Use qemu_memalign instead of qemu_malloc

2008-06-25 Thread Kevin Wolf
Anthony Liguori wrote:
> I guess the main block code is not as defensive as I thought it was.
> This patch uses qemu_memalign to allocate the buffers for IO so that you
> don't get errors when using O_DIRECT.

Actually, the block code should be able to deal with unaligned buffers
since qemu rev. 4599. This change seems to be present in current KVM.
Can you tell exactly which operation failed?

But apart from that, qemu_memalign is the right thing to do, because
copying from/into an aligned buffer in the block code costs performance
(don't know how much, though).
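
For readers unfamiliar with the alignment requirement, here is a small sketch
of allocating an I/O buffer that O_DIRECT can accept without a bounce copy.
It uses POSIX posix_memalign() as a stand-in for qemu_memalign, and
`alloc_aligned` and `is_aligned` are hypothetical helper names. The alignment
passed in by the caller is an assumption: the real requirement depends on the
filesystem and device.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign under strict -std= modes */
#endif
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate `size` bytes aligned to `alignment`, which must be a power of two
 * and a multiple of sizeof(void *). Returns NULL on failure; release the
 * buffer with free().
 */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *ptr = NULL;

    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL;
    return ptr;
}

/* Non-zero if `ptr` sits on an `alignment`-byte boundary. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

A buffer from plain malloc() may or may not pass is_aligned() for a 512-byte
boundary, which is why unaligned callers either hit EINVAL or pay for a copy.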

Kevin


[PATCH][REPOST] KVM: VMX: Add ept_sync_context in flush_tlb

2008-06-25 Thread Yang, Sheng
From 54dc26e44f1c0aa460bef409b799f36dae56a911 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 11:23:13 +0800
Subject: [PATCH] KVM: VMX: Add ept_sync_context in flush_tlb

Fix a potential issue caused by kvm_mmu_slot_remove_write_access().
The old behavior doesn't sync the EPT TLB with the modified EPT entry,
which results in inconsistent content between the EPT TLB and the EPT
table.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..5e2a800 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -83,6 +83,7 @@ struct vcpu_vmx {
 		} irq;
 	} rmode;
 	int vpid;
+	u64 eptp;
 };

 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -364,24 +365,24 @@ static inline void ept_sync_global(void)
 		__invept(VMX_EPT_EXTENT_GLOBAL, 0, 0);
 }

-static inline void ept_sync_context(u64 eptp)
+static inline void ept_sync_context(struct vcpu_vmx *vmx)
 {
 	if (vm_need_ept()) {
 		if (cpu_has_vmx_invept_context())
-			__invept(VMX_EPT_EXTENT_CONTEXT, eptp, 0);
+			__invept(VMX_EPT_EXTENT_CONTEXT, vmx->eptp, 0);
 		else
 			ept_sync_global();
 	}
 }

-static inline void ept_sync_individual_addr(u64 eptp, gpa_t gpa)
+static inline void ept_sync_individual_addr(struct vcpu_vmx *vmx, gpa_t gpa)
 {
 	if (vm_need_ept()) {
 		if (cpu_has_vmx_invept_individual_addr())
 			__invept(VMX_EPT_EXTENT_INDIVIDUAL_ADDR,
-					eptp, gpa);
+					vmx->eptp, gpa);
 		else
-			ept_sync_context(eptp);
+			ept_sync_context(vmx);
 	}
 }

@@ -1407,6 +1408,8 @@ static void exit_lmode(struct kvm_vcpu *vcpu)
 static void vmx_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	vpid_sync_vcpu_all(to_vmx(vcpu));
+	if (vm_need_ept())
+		ept_sync_context(to_vmx(vcpu));
 }

 static void vmx_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
@@ -1517,12 +1520,15 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	unsigned long guest_cr3;
 	u64 eptp;
+	struct vcpu_vmx *vmx;

+	vmx = to_vmx(vcpu);
 	guest_cr3 = cr3;
 	if (vm_need_ept()) {
 		eptp = construct_eptp(cr3);
 		vmcs_write64(EPT_POINTER, eptp);
-		ept_sync_context(eptp);
+		vmx->eptp = eptp;
+		ept_sync_context(vmx);
 		ept_load_pdptrs(vcpu);
 		guest_cr3 = is_paging(vcpu) ? vcpu->arch.cr3 :
 			VMX_EPT_IDENTITY_PAGETABLE_ADDR;
--
1.5.5


[PATCH][REPOST] KVM: VMX: Fix a wrong usage of vmcs_config

2008-06-25 Thread Yang, Sheng
From a1c929709718c015686b0c23046cc08b8bc47a62 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 14:43:38 +0800
Subject: [PATCH] KVM: VMX: Fix a wrong usage of vmcs_config

The function ept_update_paging_mode_cr0() writes to
CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's
wrong because the variable may not be consistent with the content of the
CPU_BASED_VM_EXEC_CONTROL VMCS field.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..6a31406 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1438,7 +1438,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			 vmcs_config.cpu_based_exec_ctrl |
+			 vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) |
 			 (CPU_BASED_CR3_LOAD_EXITING |
 			  CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
@@ -1448,7 +1448,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	} else if (!is_paging(vcpu)) {
 		/* From nonpaging to paging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			 vmcs_config.cpu_based_exec_ctrl &
+			 vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
 			 ~(CPU_BASED_CR3_LOAD_EXITING |
 			   CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
--
1.5.5
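
The reason for reading the live field instead of the cached configuration can
be illustrated with a stand-alone sketch. Nothing here touches a real VMCS:
`live_ctrl`, `ctrl_read`, `ctrl_write` and `set_cr3_exiting` are hypothetical
names, and the bit positions are illustrative constants rather than the
kernel's definitions.

```c
#include <stdint.h>

/* Illustrative bit positions; the kernel's own CPU_BASED_* macros are not used here. */
#define CR3_LOAD_EXITING  (1u << 15)
#define CR3_STORE_EXITING (1u << 16)

/* Simulated control field standing in for the live VMCS value. */
uint32_t live_ctrl;

uint32_t ctrl_read(void)    { return live_ctrl; }
void ctrl_write(uint32_t v) { live_ctrl = v; }

/* Read-modify-write: only the two CR3 bits change, every other bit is kept. */
void set_cr3_exiting(int enable)
{
    uint32_t mask = CR3_LOAD_EXITING | CR3_STORE_EXITING;
    uint32_t v = ctrl_read();

    ctrl_write(enable ? (v | mask) : (v & ~mask));
}
```

Had the new value been built from a boot-time copy of the configuration, any
bit set in the field at run time would be silently cleared by the write.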

From a1c929709718c015686b0c23046cc08b8bc47a62 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 18 Jun 2008 14:43:38 +0800
Subject: [PATCH] KVM: VMX: Fix a wrong usage of vmcs_config

The function ept_update_paging_mode_cr0() write to
CPU_BASED_VM_EXEC_CONTROL based on vmcs_config.cpu_based_exec_ctrl. That's
wrong because the variable may not consistent with the content in the
CPU_BASE_VM_EXEC_CONTROL MSR.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e4278d..6a31406 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1438,7 +1438,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	if (!(cr0 & X86_CR0_PG)) {
 		/* From paging/starting to nonpaging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			 vmcs_config.cpu_based_exec_ctrl |
+			 vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) |
 			 (CPU_BASED_CR3_LOAD_EXITING |
 			  CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
@@ -1448,7 +1448,7 @@ static void ept_update_paging_mode_cr0(unsigned long *hw_cr0,
 	} else if (!is_paging(vcpu)) {
 		/* From nonpaging to paging */
 		vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
-			 vmcs_config.cpu_based_exec_ctrl &
+			 vmcs_read32(CPU_BASED_VM_EXEC_CONTROL) &
 			 ~(CPU_BASED_CR3_LOAD_EXITING |
 			   CPU_BASED_CR3_STORE_EXITING));
 		vcpu->arch.cr0 = cr0;
-- 
1.5.5
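
To illustrate why the patch reads the field back instead of reusing
vmcs_config, here is a minimal user-space sketch. It is not kernel code:
struct vcpu_model and both helper functions are hypothetical names made up
for this example, and the "live" member merely stands in for what
vmcs_read32()/vmcs_write32() would access. The bit values are the ones used
for the VMX primary processor-based controls.

```c
#include <stdint.h>

#define CPU_BASED_VIRTUAL_INTR_PENDING  0x00000004u
#define CPU_BASED_CR3_LOAD_EXITING      0x00008000u
#define CPU_BASED_CR3_STORE_EXITING     0x00010000u
#define CR3_EXITING_BITS \
	(CPU_BASED_CR3_LOAD_EXITING | CPU_BASED_CR3_STORE_EXITING)

/* Hypothetical model: one setup-time snapshot, one live control word. */
struct vcpu_model {
	uint32_t setup_snapshot;  /* plays the role of vmcs_config.cpu_based_exec_ctrl */
	uint32_t live_exec_ctrl;  /* plays the role of the VMCS field itself */
};

/* Pre-patch pattern: rebuild the control word from the snapshot.
 * Any bit that was toggled at runtime is silently dropped. */
static void enable_cr3_exiting_old(struct vcpu_model *v)
{
	v->live_exec_ctrl = v->setup_snapshot | CR3_EXITING_BITS;
}

/* Patched pattern: read-modify-write on the live value, so bits
 * changed after setup survive. */
static void enable_cr3_exiting_new(struct vcpu_model *v)
{
	v->live_exec_ctrl = v->live_exec_ctrl | CR3_EXITING_BITS;
}
```

If some other path had set CPU_BASED_VIRTUAL_INTR_PENDING in the live value
after setup, the old pattern would clear it as a side effect, while the
read-modify-write variant keeps it.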



[PATCH 2/2][REPOST] kvm: user: Add event mask support in kvmtrace.

2008-06-25 Thread Yang, Sheng
From 0456679ed9444d153c60e41dfc556b7ca4d6277f Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 25 Jun 2008 16:40:56 +0800
Subject: [PATCH] kvm: user: Add event mask support in kvmtrace.

Signed-off-by: Feng (Eric) Liu <[EMAIL PROTECTED]>
---
 user/kvmtrace.c |   56 --
 1 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/user/kvmtrace.c b/user/kvmtrace.c
index 876ac27..a7b2071 100644
--- a/user/kvmtrace.c
+++ b/user/kvmtrace.c
@@ -54,7 +54,7 @@ static char kvmtrace_version[] = "0.1";

 #define max(a, b)  ((a) > (b) ? (a) : (b))

-#define S_OPTS "r:o:w:?Vb:n:D:"
+#define S_OPTS "r:o:w:?Vb:n:D:e:"
 static struct option l_opts[] = {
{
.name = "relay",
@@ -99,6 +99,12 @@ static struct option l_opts[] = {
.val = 'D'
},
{
+   .name = "event_mask",
+   .has_arg = required_argument,
+   .flag = NULL,
+   .val = 'e'
+   },
+   {
.name = NULL,
}
 };
@@ -154,6 +160,8 @@ static char *output_dir;
 static int stop_watch;
 static unsigned long buf_size = BUF_SIZE;
 static unsigned long buf_nr = BUF_NR;
+static int cat_mask = ~0u;
+static unsigned long long act_bitmap[16];
 static unsigned int page_size;

 #define for_each_cpu_online(cpu) \
@@ -175,6 +183,13 @@ static void handle_sigint(__attribute__((__unused__)) int sig)
done = 1;
 }

+static inline int valid_mask_opt(int x)
+{
+   return ((1 << KVM_TRC_SHIFT) <= x) &&
+  (x < (1 << (KVM_TRC_CAT_NR_BITS + KVM_TRC_SHIFT))) &&
+  (0 <= KVM_TRC_ACT(x)) && (KVM_TRC_ACT(x) < 64);
+}
+
 static int get_lost_records()
 {
int fd;
@@ -473,6 +488,8 @@ static int start_trace(void)
memset(&kuts, 0, sizeof(kuts));
kuts.buf_size = trace_information.buf_size = buf_size;
kuts.buf_nr = trace_information.buf_nr = buf_nr;
+   kuts.cat_mask = cat_mask;
+   memcpy(kuts.act_bitmap, act_bitmap, sizeof(act_bitmap));

if (ioctl(trace_information.fd , KVM_TRACE_ENABLE, &kuts) < 0) {
perror("KVM_TRACE_ENABLE");
@@ -587,13 +604,21 @@ static void show_stats(void)

 static char usage_str[] = \
"[ -r debugfs path ] [ -D output dir ] [ -b buffer size ]\n" \
-   "[ -n number of buffers] [ -o  ] [ -w time  ] [ -V ]\n\n" \
+   "[ -n number of buffers] [ -o  ] [ -e event mask ]" \
+   "[ -w time ] [ -V ]\n\n" \
"\t-r Path to mounted debugfs, defaults to /sys/kernel/debug\n" \
"\t-o File(s) to send output to\n" \
"\t-D Directory to prepend to output file names\n" \
"\t-w Stop after defined time, in seconds\n" \
"\t-b Sub buffer size in KiB\n" \
"\t-n Number of sub buffers\n" \
+   "\t-e Only trace specified categories or actions.\n" \
+   "\t   kvmtrace defaults to collecting all events that can be traced.\n" \
+   "\t   To limit the events being captured, you can specify a filter.\n" \
+   "\t   If you want to trace all the actions of one category," \
+   " set action to zero.\n" \
+   "\t   eg: -e 0x00010000 -e 0x00020001 trace entryexit and PAGE_FAULT\n" \
+   "\t   -e 0x00020005 -e 0x00020006 trace IO_READ and IO_WRITE.\n" \
"\t-V Print program version info\n\n";

 static void show_usage(char *prog)
@@ -604,7 +629,7 @@ static void show_usage(char *prog)

 void parse_args(int argc, char **argv)
 {
-   int c;
+   int c, cat_mask_tmp = 0;

while ((c = getopt_long(argc, argv, S_OPTS, l_opts, NULL)) >= 0) {
switch (c) {
@@ -647,6 +672,26 @@ void parse_args(int argc, char **argv)
case 'D':
output_dir = optarg;
break;
+   case 'e': {
+   int index, mask;
+
+   if ((sscanf(optarg, "%x", &mask) != 1) ||
+   !valid_mask_opt(mask)) {
+   fprintf(stderr,
+   "Invalid event mask (%u)\n", mask);
+   exit(EXIT_FAILURE);
+   }
+   cat_mask_tmp |= KVM_TRC_CAT(mask);
+   index = ffs(KVM_TRC_CAT(mask)) - 1;
+   if (KVM_TRC_ACT(mask) == 0)
+   act_bitmap[index] = ~0ull;
+   else {
+   if (act_bitmap[index] == ~0ull)
+   act_bitmap[index] = 0;
+   act_bitmap[index] |= 1ull << KVM_TRC_ACT(mask);
+   }
+   break;
+   }
default:
show_usage(argv[0]);
}
@@ -654,12 +699,17 @@ void parse_args(int argc, char **argv)

if (optind < argc || output_name == NULL)
show_usage(argv[0]);
+
+ 

[PATCH 1/2][REPOST] KVM: trace: Add event mask support.

2008-06-25 Thread Yang, Sheng
From 5eb5fb6b40c45b1c140f46b00f5de9f73fc0bab6 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Wed, 25 Jun 2008 16:37:31 +0800
Subject: [PATCH] KVM: trace: Add event mask support.

Allow user space applications to specify one or more filter masks to limit
the events being captured.

Signed-off-by: Feng (Eric) Liu <[EMAIL PROTECTED]>
Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 include/linux/kvm.h  |7 +++
 virt/kvm/kvm_trace.c |   24 
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 0ea064c..f1d3d9e 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -18,6 +18,9 @@
 struct kvm_user_trace_setup {
__u32 buf_size; /* sub_buffer size of each per-cpu */
__u32 buf_nr; /* the number of sub_buffers of each per-cpu */
+   __u16 cat_mask; /* the tracing categories are enabled */
+   __u16 pad1[3];
+   __u64 act_bitmap[16]; /* the actions are enabled for each category */
 };

 /* for KVM_CREATE_MEMORY_REGION */
@@ -292,6 +295,7 @@ struct kvm_s390_interrupt {
 };

 #define KVM_TRC_SHIFT   16
+#define KVM_TRC_CAT_NR_BITS 12
 /*
  * kvm trace categories
  */
@@ -305,6 +309,9 @@ struct kvm_s390_interrupt {
 #define KVM_TRC_VMEXIT  (KVM_TRC_ENTRYEXIT + 0x02)
 #define KVM_TRC_PAGE_FAULT  (KVM_TRC_HANDLER + 0x01)

+#define KVM_TRC_CAT(evt)(((evt) >> KVM_TRC_SHIFT) & 0x0fff)
+#define KVM_TRC_ACT(evt)((evt) & (~0u >> KVM_TRC_SHIFT))
+
 #define KVM_TRC_HEAD_SIZE   12
 #define KVM_TRC_CYCLE_SIZE  8
 #define KVM_TRC_EXTRA_MAX   7
diff --git a/virt/kvm/kvm_trace.c b/virt/kvm/kvm_trace.c
index 58141f3..179e11f 100644
--- a/virt/kvm/kvm_trace.c
+++ b/virt/kvm/kvm_trace.c
@@ -26,6 +26,8 @@

 struct kvm_trace {
int trace_state;
+   u16 cat_mask;
+   u64 act_bitmap[16];
struct rchan *rchan;
struct dentry *lost_file;
atomic_t lost_records;
@@ -39,6 +41,23 @@ struct kvm_trace_probe {
marker_probe_func *probe_func;
 };

+static inline int check_event_mask(struct kvm_trace *kt, u32 event)
+{
+   unsigned long category;
+   int i;
+
+   category = KVM_TRC_CAT(event);
+   if (!(category & kt->cat_mask))
+   return 1;
+
+   i = find_first_bit(&category, KVM_TRC_CAT_NR_BITS);
+
+   if (!test_bit(KVM_TRC_ACT(event), &kt->act_bitmap[i]))
+   return 1;
+
+   return 0;
+}
+
 static inline int calc_rec_size(int cycle, int extra)
 {
int rec_size = KVM_TRC_HEAD_SIZE;
@@ -60,6 +79,9 @@ static void kvm_add_trace(void *probe_private, void *call_data,
return;

rec.event   = va_arg(*args, u32);
+   if (check_event_mask(kt, rec.event))
+   return;
+
vcpu= va_arg(*args, struct kvm_vcpu *);
rec.pid = current->tgid;
rec.vcpu_id = vcpu->vcpu_id;
@@ -175,6 +197,8 @@ static int do_kvm_trace_enable(struct kvm_user_trace_setup *kuts)
if (!kt->rchan)
goto err;

+   kt->cat_mask = kuts->cat_mask;
+   memcpy(kt->act_bitmap, kuts->act_bitmap, sizeof(kuts->act_bitmap));
kvm_trace = kt;

for (i = 0; i < ARRAY_SIZE(kvm_trace_probes); i++) {
--
1.5.5
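
The event encoding used by the patch (category in the bits above
KVM_TRC_SHIFT, action in the low 16 bits) can be sketched in user space as
follows. This is only an illustration under stated assumptions: the four
macros are copied from the patch, but struct trace_filter, filter_add() and
filter_drops() are hypothetical names, and filter_add() simplifies the
kvmtrace logic by not resetting a category that was previously set to "all
actions".

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define KVM_TRC_SHIFT        16
#define KVM_TRC_CAT_NR_BITS  12
#define KVM_TRC_CAT(evt)     (((evt) >> KVM_TRC_SHIFT) & 0x0fff)
#define KVM_TRC_ACT(evt)     ((evt) & (~0u >> KVM_TRC_SHIFT))

/* Hypothetical stand-in for the cat_mask/act_bitmap pair in the patch. */
struct trace_filter {
	uint16_t cat_mask;        /* one bit per enabled category */
	uint64_t act_bitmap[16];  /* per category: one bit per enabled action */
};

/* Enable one event, or a whole category when the action field is zero.
 * Returns 0 on success, -1 if the mask has no category or the action
 * does not fit in the 64-bit per-category bitmap. */
static int filter_add(struct trace_filter *f, uint32_t mask)
{
	uint32_t cat = KVM_TRC_CAT(mask);
	uint32_t act = KVM_TRC_ACT(mask);
	int idx;

	if (cat == 0 || act >= 64)
		return -1;

	idx = ffs((int)cat) - 1;          /* lowest set category bit */
	f->cat_mask |= (uint16_t)cat;
	if (act == 0)
		f->act_bitmap[idx] = ~0ull;   /* all actions of this category */
	else
		f->act_bitmap[idx] |= 1ull << act;
	return 0;
}

/* Mirrors the sense of check_event_mask(): nonzero means "drop it". */
static int filter_drops(const struct trace_filter *f, uint32_t event)
{
	uint32_t cat = KVM_TRC_CAT(event);
	uint32_t act = KVM_TRC_ACT(event);
	int idx;

	if (!(cat & f->cat_mask) || act >= 64)
		return 1;

	idx = ffs((int)cat) - 1;
	return !((f->act_bitmap[idx] >> act) & 1);
}
```

With a zero-initialised filter, adding 0x00020001 lets only that event of
the second category through; adding 0x00010000 afterwards admits every
action of the first category as well.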


Re: [PATCH] Ignore DEBUGCTL MSRs

2008-06-25 Thread Alexander Graf

Hi Joerg,

On Jun 24, 2008, at 3:40 PM, Joerg Roedel wrote:

> Hi Alex,
>
> On Tue, Jun 24, 2008 at 07:04:45AM +0200, Alexander Graf wrote:
>> Netware writes and reads to the DEBUGCTL and LAST*IP MSRs without
>> further checks and is really confused to receive a #GP during that. To
>> make it happy we should just make them stubs, which is exactly what
>> SVM already does.
>>
>> To support VMX too, I put these in the generic code. Maybe the SVM
>> code could be cleaned up to use generic code too.
>
> I would prefer if you put that into the VMX specific code. We can't move
> the SVM parts of it into generic code because Barcelona has hardware
> support to virtualize these registers. Therefore SVM don't need that
> in generic code.

Hum, I'd actually prefer not to handle this specifically in the VMX
code. MSRs should be handled in a generic way if they do not need
special treatment from the extension side, as in your case.
For example if a third virtualization extension might come to life, I
would rather see more things handled in x86.c than in all three targets.

Any objections to this? If not, please apply, as it makes Netware work
in KVM.

Alex





Signed-off-by: Alexander Graf <[EMAIL PROTECTED]>





diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc0721e..02f8490 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -609,6 +609,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)

pr_unimpl(vcpu, "%s: MSR_IA32_MCG_CTL 0x%llx, nop\n",
__func__, data);
break;
+   case MSR_IA32_DEBUGCTLMSR:
+   case MSR_IA32_LASTBRANCHFROMIP:
+   case MSR_IA32_LASTBRANCHTOIP:
+   case MSR_IA32_LASTINTFROMIP:
+   case MSR_IA32_LASTINTTOIP:
case MSR_IA32_UCODE_REV:
case MSR_IA32_UCODE_WRITE:
break;
@@ -705,6 +710,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)

case MSR_IA32_MC0_MISC+16:
case MSR_IA32_UCODE_REV:
case MSR_IA32_EBL_CR_POWERON:
+   case MSR_IA32_DEBUGCTLMSR:
+   case MSR_IA32_LASTBRANCHFROMIP:
+   case MSR_IA32_LASTBRANCHTOIP:
+   case MSR_IA32_LASTINTFROMIP:
+   case MSR_IA32_LASTINTTOIP:
data = 0;
break;
case MSR_MTRRcap:
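
The stub behaviour the patch describes (writes are ignored, reads return
zero, anything unknown still faults) can be sketched outside the kernel
like this. The MSR indices are the architectural ones; stub_set_msr() and
stub_get_msr() are hypothetical names for this example and only mimic the
return convention of the kvm_*_msr_common() functions, where a nonzero
return means the caller should inject #GP.

```c
#include <stdint.h>

#define MSR_IA32_DEBUGCTLMSR       0x000001d9
#define MSR_IA32_LASTBRANCHFROMIP  0x000001db
#define MSR_IA32_LASTBRANCHTOIP    0x000001dc
#define MSR_IA32_LASTINTFROMIP     0x000001dd
#define MSR_IA32_LASTINTTOIP       0x000001de

/* Hypothetical write handler: accept and discard the value. */
static int stub_set_msr(uint32_t msr, uint64_t data)
{
	(void)data;
	switch (msr) {
	case MSR_IA32_DEBUGCTLMSR:
	case MSR_IA32_LASTBRANCHFROMIP:
	case MSR_IA32_LASTBRANCHTOIP:
	case MSR_IA32_LASTINTFROMIP:
	case MSR_IA32_LASTINTTOIP:
		return 0;   /* silently ignored */
	default:
		return 1;   /* unhandled: caller would inject #GP */
	}
}

/* Hypothetical read handler: the stubbed MSRs always read as zero. */
static int stub_get_msr(uint32_t msr, uint64_t *pdata)
{
	switch (msr) {
	case MSR_IA32_DEBUGCTLMSR:
	case MSR_IA32_LASTBRANCHFROMIP:
	case MSR_IA32_LASTBRANCHTOIP:
	case MSR_IA32_LASTINTFROMIP:
	case MSR_IA32_LASTINTTOIP:
		*pdata = 0;
		return 0;
	default:
		return 1;   /* unhandled: caller would inject #GP */
	}
}
```

A guest that writes DEBUGCTL and reads it back therefore sees zero rather
than a fault, which is the behaviour the Netware case relies on.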






[ kvm-Bugs-2001452 ] Restarted Windows 2003 Server guests have disk corruption

2008-06-25 Thread SourceForge.net
Bugs item #2001452, was opened at 2008-06-24 07:27
Message generated for change (Comment added) made by gerdwachs
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2001452&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: gwachs (gerdwachs)
Assigned to: Nobody/Anonymous (nobody)
Summary: Restarted Windows 2003 Server guests have disk corruption

Initial Comment:
I have a number of Windows 2003 32Bit guests.

I use them to perform installation and configuration
tests of a large software product.

During these tests, the guests are restarted.

Randomly, the guests produce disk corruption messages
after a restart.

The following are two examples :

---
Windows  Registry Hive Recovered

Registry hive (file): SOFTWARE was corrupted and it has
been recovered. Some data might have been lost.
---
The system cannot log on due to the following error:
Unable to complete the requested operation because of
either a catastrophic media failure or a data structure
corruption on the disk.
---

OS : Ubuntu 8.04 x86_64

Kernel : 2.6.24-18-server #1 SMP x86_64 GNU/Linux

KVM: kvm-70

CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz

flags  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm 
constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 
ssse3 cx16 xt

Start Command : sudo /usr/local/kvm/bin/qemu-system-x86_64 -hda asit51ascs.img \
   -m 1024 -std-vga -boot c -k sv -usb -usbdevice tablet -snapshot -vnc :51 \
   -net nic,vlan=0,macaddr=00:16:3e:00:51:00 -net tap,vlan=0,script=/etc/qemu-ifup-br0 \
   -net nic,vlan=1,macaddr=00:16:3e:00:51:01 -net tap,vlan=1,script=/etc/qemu-ifup-br1

no-kvm : Cannot test with -no-kvm because of the loss of performance.
 The test run already takes 7 hours with kvm.




--

>Comment By: gwachs (gerdwachs)
Date: 2008-06-25 09:26

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Windows 2003 Guests also have random BSOD

The problems in this bug report put a stop to running Windows 2003 Server
on kvm
at this point in time.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 10:33

Message:
Logged In: YES 
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 
does not occur when using the option : -no-kvm-irqchip
Will continue testing.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 10:15

Message:
Logged In: YES 
user_id=2122332
Originator: YES

The message : apic write: bad size=1 fee00030 

only occurs when the guest is started using kvm.

i.e does not occur with the -no-kvm option.

When using the -no-acpi option, the guest does not start, with or without kvm.

--

Comment By: gwachs (gerdwachs)
Date: 2008-06-24 09:32

Message:
Logged In: YES 
user_id=2122332
Originator: YES

Noted that I get the following in the linux console :

apic write: bad size=1 fee00030


--



[PATCH 3/3] KVM: Handle device assignment to guests

2008-06-25 Thread Amit Shah
From: Amit Shah <[EMAIL PROTECTED]>
From: Ben-Ami Yassour <[EMAIL PROTECTED]>
From: Han, Weidong <[EMAIL PROTECTED]>

This patch adds support for handling PCI devices that are assigned to the
guest ("PCI passthrough").

The device to be assigned to the guest is registered in the host kernel and
interrupt delivery is handled. If a device is already assigned, or the device
driver for it is still loaded on the host, the assignment fails and -EBUSY is
returned to userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest. There has
to be some mechanism of translating guest DMA addresses into machine addresses.
This support comes from one of three approaches:

1. If you have recent Intel hardware with VT-d support, you can use the patches
in

git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git vtd
git.kernel.org/pub/scm/linux/kernel/git/amit/kvm-userspace.git vtd

These patches are for the host kernel.

2. For paravirtualised Linux guests, you can use the patches in

git.kernel.org/pub/scm/linux/kernel/git/amit/kvm.git pvdma
git.kernel.org/pub/scm/linux/kernel/git/amit/kvm-userspace.git pvdma

This kernel tree has patches for host as well as guest kernels.

3. 1-1 mapping of guest in host address space

The patches to do this are available on the kvm / lkml list archives:

http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |  291 
 include/asm-x86/kvm_host.h |   38 ++
 include/asm-x86/kvm_para.h |   16 +++-
 include/linux/kvm.h|3 +
 virt/kvm/ioapic.c  |   12 ++-
 5 files changed, 357 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0fbc032..b2f6c78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4,10 +4,12 @@
  * derived from drivers/kvm/kvm_main.c
  *
  * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright (C) 2008 Qumranet, Inc.
  *
  * Authors:
  *   Avi Kivity   <[EMAIL PROTECTED]>
  *   Yaniv Kamay  <[EMAIL PROTECTED]>
+ *   Amit Shah<[EMAIL PROTECTED]>
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -21,8 +23,10 @@
 #include "tss.h"
 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -95,6 +99,280 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+DEFINE_RWLOCK(kvm_pci_pt_lock);
+
+/*
+ * Used to find a registered host PCI device (a "passthrough" device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+   struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+   struct list_head *ptr;
+   struct kvm_pci_pt_dev_list *match;
+
+   list_for_each(ptr, head) {
+   match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+   switch (source) {
+   case KVM_PT_SOURCE_IRQ:
+   /*
+* Used to find a registered host device
+* during interrupt context on host
+*/
+   if (match->pt_dev.host.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_IRQ_ACK:
+   /*
+* Used to find a registered host device when
+* the guest acks an interrupt
+*/
+   if (match->pt_dev.guest.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_UPDATE:
+   if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+   (match->pt_dev.host.devfn == pt_pci_info->devfn))
+   return match;
+   break;
+   }
+   }
+   return NULL;
+}
+
+static DECLARE_BITMAP(pt_irq_handled, NR_IRQS);
+
+static void kvm_pci_pt_work_fn(struct work_struct *work)
+{
+   struct kvm_pci_pt_dev_list *match;
+   struct kvm_pci_pt_work *int_work;
+   int source;
+   unsigned long flags;
+   int guest_irq;
+   int host_irq;
+
+   int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+   source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&int_work->kvm->lock);
+   read_lock_irqsave(&kvm_pci_pt_lock, flags);
+   match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
+  

[PATCH 1/3] KVM: Introduce kvm_set_irq to inject interrupts in guests

2008-06-25 Thread Amit Shah
This function injects an interrupt into the guest given the kvm struct,
the (guest) irq number and the interrupt level.

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
---
 arch/x86/kvm/irq.c |   11 +++
 arch/x86/kvm/irq.h |2 ++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 76d736b..0d9e552 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -100,3 +100,14 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
__kvm_migrate_apic_timer(vcpu);
__kvm_migrate_pit_timer(vcpu);
 }
+
+/* This should be called with the kvm->lock mutex held */
+void kvm_set_irq(struct kvm *kvm, int irq, int level)
+{
+   /* Not possible to detect if the guest uses the PIC or the
+* IOAPIC.  So set the bit in both. The guest will ignore
+* writes to the unused one.
+*/
+   kvm_ioapic_set_irq(kvm->arch.vioapic, irq, level);
+   kvm_pic_set_irq(pic_irqchip(kvm), irq, level);
+}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 2a15be2..ba4e3bf 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -80,6 +80,8 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
 
 void kvm_pic_reset(struct kvm_kpic_state *s);
 
+void kvm_set_irq(struct kvm *kvm, int irq, int level);
+
 void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
-- 
1.5.4.3



[PATCH 2/3] KVM: Introduce a callback routine for IOAPIC ack handling

2008-06-25 Thread Amit Shah
This will be useful for acking irqs of assigned devices

Signed-off-by: Amit Shah <[EMAIL PROTECTED]>
---
 virt/kvm/ioapic.c |3 +++
 virt/kvm/ioapic.h |1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 9d02136..4759d77 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -295,6 +295,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
ent->fields.remote_irr = 0;
if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
ioapic_deliver(ioapic, gsi);
+
+   if (ioapic->callback)
+   ioapic->callback(ioapic->kvm, gsi);
 }
 
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector)
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index 7f16675..481740a 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -58,6 +58,7 @@ struct kvm_ioapic {
} redirtbl[IOAPIC_NUM_PINS];
struct kvm_io_device dev;
struct kvm *kvm;
+   void (*callback)(void *opaque, int irq);
 };
 
 #ifdef DEBUG
-- 
1.5.4.3
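
As a rough picture of how such an optional ack hook behaves, here is a
small user-space sketch. Only the callback signature is taken from the
patch; struct ioapic_model, ioapic_model_eoi(), record_ack() and
last_acked_gsi are hypothetical names for this example, and how the real
callback gets registered is not shown in this patch.

```c
#include <stddef.h>

/* Hypothetical cut-down model of the struct the patch extends. */
struct ioapic_model {
	void *owner;                              /* plays the role of ioapic->kvm */
	void (*callback)(void *opaque, int irq);  /* same signature as in the patch */
};

/* Mirrors the tail of the EOI path: the hook is optional, so it is
 * only invoked when somebody has installed one. */
static void ioapic_model_eoi(struct ioapic_model *io, int gsi)
{
	if (io->callback)
		io->callback(io->owner, gsi);
}

/* Example consumer: remember which line the guest acked last. */
static int last_acked_gsi = -1;

static void record_ack(void *opaque, int irq)
{
	(void)opaque;
	last_acked_gsi = irq;
}
```

Leaving the pointer NULL keeps the EOI path unchanged for guests without
assigned devices; installing record_ack() makes each EOI visible to the
consumer.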



PCI device assignment to guests

2008-06-25 Thread Amit Shah

This patchset introduces support for assigning PCI devices to guests (kernel 
part).

The main difference from the last version is reserving the PCI device for our 
exclusive use so that multiple device assignment will fail, as will device 
assignment when a driver for the device being assigned is loaded.

In this version, we also add support for autodetecting the IRQ assigned to the 
device on the host, so passing the IRQ number on the command line is no longer 
necessary.

With these changes, the kernel support is ready for merge after review.

The patches are compile- and run- tested with PVDMA as well as VT-d and are 
checkpatch-clean.

Amit.


Re: [GIT PULL] KVM fixes for 2.6.26-rc7

2008-06-25 Thread Avi Kivity

Linus Torvalds wrote:
> On Wed, 25 Jun 2008, Avi Kivity wrote:
>> Linus, please pull from the repo and branch at:
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git kvm-updates-2.6.26
>>
>> to receive kvm updates for 2.6.26-rc7.  The patches fix host oopses,
>> guest interrupt loss, and total kvm clock borkage.
>
> Avi, you _really_ need to start respecting the merge window.
>
> If you can't learn, I will have to just stop pulling from you. This is
> simply too big for this late in the game.
>
> I pulled, but I'm simply not going to continue doing this dance. I don't
> care much for virtualization, so I've let it slide, but you need to learn
> that
>
>> 18 files changed, 358 insertions(+), 266 deletions(-)
>
> is simply not acceptable this late.

You're right, I guess I should have disabled kvm clock for 2.6.26 (which
makes up the bulk of the changes) and re-enabled it for 2.6.27, but
ended up not resisting the temptation.


--
error compiling committee.c: too many arguments to function
