[kvm-devel] [PATCH] KVM: VMX: Fix invalid opcode of VPID

2008-01-29 Thread Yang, Sheng
From db6524fb36bbf1f297ae171f18de382c32ed6840 Mon Sep 17 00:00:00 2001
From: Sheng Yang <[EMAIL PROTECTED]>
Date: Tue, 29 Jan 2008 08:17:57 +0800
Subject: [PATCH] KVM: VMX: Fix invalid opcode of VPID

Add the missing "memory" clobber to the VPID clobber list; otherwise it causes an
invalid opcode on the host.

Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 00a00e4..3d8949a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -238,7 +238,7 @@ static inline void __invvpid(int ext, u16 vpid, gva_t gva)
 	asm volatile (ASM_VMX_INVVPID
 		  /* CF==1 or ZF==1 --> rc = -1 */
 		  "; ja 1f ; ud2 ; 1:"
-		  : : "a"(&operand), "c"(ext) : "cc");
+		  : : "a"(&operand), "c"(ext) : "cc", "memory");
 }

 static struct kvm_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
--
debian.1.5.3.7.1-dirty


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] KVM: VMX: Fix invalid opcode of VPID

2008-01-29 Thread Avi Kivity
Yang, Sheng wrote:
> From db6524fb36bbf1f297ae171f18de382c32ed6840 Mon Sep 17 00:00:00 2001
> From: Sheng Yang <[EMAIL PROTECTED]>
> Date: Tue, 29 Jan 2008 08:17:57 +0800
> Subject: [PATCH] KVM: VMX: Fix invalid opcode of VPID
>
> Add the missing "memory" in VPID clobber list, otherwise it would cause 
> invalid opcode on host.
>
> Signed-off-by: Sheng Yang <[EMAIL PROTECTED]>
> ---
>  arch/x86/kvm/vmx.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 00a00e4..3d8949a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -238,7 +238,7 @@ static inline void __invvpid(int ext, u16 vpid, gva_t gva)
>  asm volatile (ASM_VMX_INVVPID
> /* CF==1 or ZF==1 --> rc = -1 */
> "; ja 1f ; ud2 ; 1:"
> -   : : "a"(&operand), "c"(ext) : "cc");
> +   : : "a"(&operand), "c"(ext) : "cc", "memory");
>  }
>
>  static struct kvm_msr_entry *find_msr_entry(struct vcpu_vmx *vmx, u32 msr)
>   

Ah, yes.  That's my fault -- since invvpid doesn't modify memory, I 
assumed a memory clobber wasn't necessary, but it is, since otherwise gcc 
doesn't force the operand into memory.  Applied.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [RFC] VMX CR3 cache

2008-01-29 Thread Gerd Hoffmann
Marcelo Tosatti wrote:
> Hi,
> 
> The CR3 cache feature of VMX CPU's does not seem to increase
> context switch performance significantly as it did in the original
> implementation (http://lkml.org/lkml/2007/1/5/205).
> 
> The following is similar to the original, but it also caches roots for
> 4-level pagetables on x86-64, and clearing the cache is only performed
> in zap_page() instead of on every pagefault.

Hmm, what kvm version is this against?  latest git I guess?  After
applying to kvm-60 (and fixing up some trivial rejects) it doesn't build.

> Nowhere near the results achieved earlier (and kernel compilation and
> httperf seems slightly slower, probably due to paravirt overhead).

Even if it doesn't help much on native: with xenner it probably
gives a nice speedup, especially on 64-bit, where each guest syscall
involves a cr3 switch (not benchmarked yet, though).

>  #ifdef __KERNEL__
>  #include 
>  
> -#define KVM_PARA_FEATURES (1UL << KVM_FEATURE_NOP_IO_DELAY)
> +#define KVM_PARA_FEATURES ((1UL << KVM_FEATURE_NOP_IO_DELAY) |   \
> +(1UL << KVM_FEATURE_CR3_CACHE))
> +
> +#define KVM_MSR_SET_CR3_CACHE 0x87655678
> +
> +#define KVM_CR3_CACHE_SIZE 4
> +
> +struct kvm_cr3_cache_entry {
> + u64 guest_cr3;
> + u64 host_cr3;
> +};
> +
> +struct kvm_cr3_cache {
> + struct kvm_cr3_cache_entry entry[KVM_CR3_CACHE_SIZE];
> + u32 max_idx;
> +};

Can you move the structs out of #ifdef __KERNEL__ please?

thanks,
  Gerd





Re: [kvm-devel] [PATCH] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-29 Thread Avi Kivity
Hollis Blanchard wrote:
> This allows kvm_host.h to be #included even when struct preempt_notifier is
> undefined. This is needed to build asm-offsets.h.
>
>   

Applied, thanks.


-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [RFC] VMX CR3 cache

2008-01-29 Thread Gerd Hoffmann
Gerd Hoffmann wrote:

> Hmm, what kvm version is this against?  latest git I guess?  After
> applying to kvm-60 (and fixing up some trivial rejects) it doesn't build.

Looks like the mmu.h chunk is missing in the patch.

cheers,
  Gerd





Re: [kvm-devel] [RFC] VMX CR3 cache

2008-01-29 Thread Gerd Hoffmann
Gerd Hoffmann wrote:
> Gerd Hoffmann wrote:
> 
>> Hmm, what kvm version is this against?  latest git I guess?  After
>> applying to kvm-60 (and fixing up some trivial rejects) it doesn't build.
> 
> Looks like the mmu.h chunk is missing in the patch.

Hmm, and x86.c looks incomplete too.  vcpu->arch.mmu.root_hpa becomes an
array, but mmu.h and x86.c still use it the old way.

Can you double-check and resend the patch please?

thanks,
  Gerd





[kvm-devel] [PATCH] SVM: set NM intercept when enabling CR0.TS in the guest

2008-01-29 Thread Joerg Roedel
Explicitly enable the NM intercept in svm_set_cr0 if we enable TS in the guest
copy of CR0 for lazy FPU switching. This fixes guest SMP with Linux under SVM.
Without that patch Linux deadlocks or panics right after trying to boot the
other CPUs.

Signed-off-by: Joerg Roedel <[EMAIL PROTECTED]>
Signed-off-by: Markus Rechberger <[EMAIL PROTECTED]>
---
 arch/x86/kvm/svm.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7bdbe16..d1c7fcb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -792,8 +792,10 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
vcpu->arch.cr0 = cr0;
cr0 |= X86_CR0_PG | X86_CR0_WP;
cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
-   if (!vcpu->fpu_active)
+   if (!vcpu->fpu_active) {
+   svm->vmcb->control.intercept_exceptions |= (1 << NM_VECTOR);
cr0 |= X86_CR0_TS;
+   }
svm->vmcb->save.cr0 = cr0;
 }
 
-- 
1.5.3.7






Re: [kvm-devel] [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

2008-01-29 Thread Andrea Arcangeli
This should fix the aging bugs you introduced through the faulty cpp
expansion. This is hard for me to write, given that any time somebody
does a ptep_clear_flush_young without manually cpp-expanding "|
mmu_notifier_age_page" after it, it's a bug that needs fixing; similar
bugs can emerge over time for ptep_clear_flush too. What will happen is
that somebody will clean this up in 2.6.26+ and we'll be left with a
#ifdef KERNEL_VERSION() < 2.6.26 in ksm.c to call
mmu_notifier(invalidate_page) explicitly. Performance optimizations and
unnecessary invalidate_page calls are a red herring; it can be fully
optimized both ways. 99% of the time when somebody calls
ptep_clear_flush or ptep_clear_flush_young, the respective mmu
notifier can't be forgotten (and calling it once more, even if a
later invalidate_range is invoked, is always safer and preferable to
not calling it at all), so I fail to see how this won't be cleaned
up eventually, the same way the tlb flushes have been cleaned up
already. Nevertheless I back your implementation and I'm not
trying to change it, at the risk of slowing down merging.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/mm/rmap.c b/mm/rmap.c
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -285,10 +285,8 @@ static int page_referenced_one(struct pa
if (!pte)
goto out;
 
-   if (ptep_clear_flush_young(vma, address, pte))
-   referenced++;
-
-   if (mmu_notifier_age_page(mm, address))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
referenced++;
 
/* Pretend the page is referenced if the task has the
@@ -684,7 +682,7 @@ static int try_to_unmap_one(struct page 
 * skipped over this mm) then we should reactivate it.
 */
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-   (ptep_clear_flush_young(vma, address, pte) ||
+   (ptep_clear_flush_young(vma, address, pte) |
mmu_notifier_age_page(mm, address {
ret = SWAP_FAIL;
goto out_unmap;
@@ -818,10 +816,8 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));
 
-   if (ptep_clear_flush_young(vma, address, pte))
-   continue;
-
-   if (mmu_notifier_age_page(mm, address))
+   if (ptep_clear_flush_young(vma, address, pte) | 
+   mmu_notifier_age_page(mm, address))
continue;
 
/* Nuke the page table entry. */



Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Andrea Arcangeli
On Mon, Jan 28, 2008 at 12:28:41PM -0800, Christoph Lameter wrote:
> +struct mmu_notifier_head {
> + struct hlist_head head;
> +};
> +
>  struct mm_struct {
>   struct vm_area_struct * mmap;   /* list of VMAs */
>   struct rb_root mm_rb;
> @@ -219,6 +223,8 @@ struct mm_struct {
>   /* aio bits */
>   rwlock_tioctx_list_lock;
>   struct kioctx   *ioctx_list;
> +
> + struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
>  };

Not sure why you prefer to waste ram when MMU_NOTIFIER=n, this is a
regression (a minor one though).

> + /*
> +  * lock indicates that the function is called under spinlock.
> +  */
> + void (*invalidate_range)(struct mmu_notifier *mn,
> +  struct mm_struct *mm,
> +  unsigned long start, unsigned long end,
> +  int lock);
> +};

It's beyond me how you can be ok with lock=1. You said you have
to block; if you can deal with lock=1 once, why can't you deal with
lock=1 _always_?

> +/*
> + * Note that all notifiers use RCU. The updates are only guaranteed to be
> + * visible to other processes after a RCU quiescent period!
> + */
> +void __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> + hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifier.head);
> +}
> +EXPORT_SYMBOL_GPL(__mmu_notifier_register);
> +
> +void mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> + down_write(&mm->mmap_sem);
> + __mmu_notifier_register(mn, mm);
> + up_write(&mm->mmap_sem);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_register);

The down_write is garbage. The caller should put it around
mmu_notifier_register if something. The same way the caller should
call synchronize_rcu after mmu_notifier_register if it needs
synchronous behavior from the notifiers. The default version of
mmu_notifier_register shouldn't be cluttered with unnecessary locking.



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Carsten Otte
Andrea Arcangeli wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea4764b..9349160 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include 
> @@ -118,6 +119,7 @@ struct kvm {
>   struct kvm_io_bus pio_bus;
>   struct kvm_vm_stat stat;
>   struct kvm_arch arch;
> + struct mmu_notifier mmu_notifier;
>  };
> 
>  /* The guest did something we don't support. */
This should not be in struct kvm, it should go to x86's kvm_arch. This 
is x86 specific, we don't need a notifier since the core-vm will just 
page out our guest memory like regular userspace mem.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8fc12dc..bb4747c 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -165,6 +165,7 @@ static struct kvm *kvm_create_vm(void)
> 
>   kvm->mm = current->mm;
>   atomic_inc(&kvm->mm->mm_count);
> + mmu_notifier_register(&kvm->mmu_notifier, kvm->mm);
>   spin_lock_init(&kvm->mmu_lock);
>   kvm_io_bus_init(&kvm->pio_bus);
>   mutex_init(&kvm->lock);
to kvm_arch_create_vm please

> @@ -1265,7 +1266,11 @@ static int kvm_resume(struct sys_device *dev)
>  }
> 
>  static struct sysdev_class kvm_sysdev_class = {
> +#ifdef set_kset_name
>   set_kset_name("kvm"),
> +#else
> + .name = "kvm",
> +#endif
>   .suspend = kvm_suspend,
>   .resume = kvm_resume,
>  };

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4295623..a67e38f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -298,7 +299,15 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   memset(new.rmap, 0, npages * sizeof(*new.rmap));
> 
>   new.user_alloc = user_alloc;
> - new.userspace_addr = mem->userspace_addr;
> + /*
> +  * hva_to_rmmap() serializes with the mmu_lock and to be
> +  * safe it has to ignore memslots with !user_alloc &&
> +  * !userspace_addr.
> +  */
> + if (user_alloc)
> + new.userspace_addr = mem->userspace_addr;
> + else
> + new.userspace_addr = 0;
>   }
> 
>   /* Allocate page dirty bitmap if needed */
> @@ -311,14 +320,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   memset(new.dirty_bitmap, 0, dirty_bytes);
>   }
> 
> + spin_lock(&kvm->mmu_lock);
>   if (mem->slot >= kvm->nmemslots)
>   kvm->nmemslots = mem->slot + 1;
> 
>   *memslot = new;
> + spin_unlock(&kvm->mmu_lock);
> 
>   r = kvm_arch_set_memory_region(kvm, mem, old, user_alloc);
>   if (r) {
> + spin_lock(&kvm->mmu_lock);
>   *memslot = old;
> + spin_unlock(&kvm->mmu_lock);
>   goto out_free;
>   }
> 
> 
> 
This needs to go to arch too.





Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Robin Holt
I am going to separate my comments into individual replies to help
reduce the chance they are lost.

> +void mmu_notifier_release(struct mm_struct *mm)
...
> + hlist_for_each_entry_safe_rcu(mn, n, t,
> +   &mm->mmu_notifier.head, hlist) {
> + if (mn->ops->release)
> + mn->ops->release(mn, mm);
> + hlist_del(&mn->hlist);

This is a use-after-free issue.  The hlist_del_rcu needs to be done before
the callout as the structure containing the mmu_notifier structure will
need to be freed from within the ->release callout.

Thanks,
Robin



Re: [kvm-devel] [PATCH]

2008-01-29 Thread S.Çağlar Onur
Hi;

29 Oca 2008 Sal tarihinde, Izik Eidus şunları yazmıştı: 
> we better wait for qemu to merge it and then when we will merge with 
> qemu cvs we will have it

OK, fair enough. It seems they were submitted to qemu-devel yesterday by Aurelien Jarno
[1] [2], although Xen merged these 3 months ago :(

[1] http://lists.gnu.org/archive/html/qemu-devel/2008-01/msg00704.html
[2] http://lists.gnu.org/archive/html/qemu-devel/2008-01/msg00705.html

Cheers
-- 
S.Çağlar Onur <[EMAIL PROTECTED]>
http://cekirdek.pardus.org.tr/~caglar/

Linux is like living in a teepee. No Windows, no Gates and an Apache in house!




Re: [kvm-devel] [patch 6/6] mmu_notifier: Add invalidate_all()

2008-01-29 Thread Robin Holt
What is the status of getting invalidate_all adjusted to indicate a need
to also call _release?

Thanks,
Robin

On Mon, Jan 28, 2008 at 12:28:46PM -0800, Christoph Lameter wrote:
> When a task exits we can remove all external ptes at once. At that point the
> external mmu may also unregister itself from the mmu notifier chain to avoid
> future calls.
> 
> Note the complications because of RCU. Other processors may not see that the
> notifier was unlinked until a quiescent period has passed!
> 
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
> 
> ---
>  include/linux/mmu_notifier.h |4 
>  mm/mmap.c|1 +
>  2 files changed, 5 insertions(+)
> 
> Index: linux-2.6/include/linux/mmu_notifier.h
> ===
> --- linux-2.6.orig/include/linux/mmu_notifier.h	2008-01-28 11:43:03.0 -0800
> +++ linux-2.6/include/linux/mmu_notifier.h	2008-01-28 12:21:33.0 -0800
> @@ -62,6 +62,10 @@ struct mmu_notifier_ops {
>   struct mm_struct *mm,
>   unsigned long address);
>  
> + /* Dummy needed because the mmu_notifier() macro requires it */
> + void (*invalidate_all)(struct mmu_notifier *mn, struct mm_struct *mm,
> + int dummy);
> +
>   /*
>* lock indicates that the function is called under spinlock.
>*/
> Index: linux-2.6/mm/mmap.c
> ===
> --- linux-2.6.orig/mm/mmap.c  2008-01-28 11:47:53.0 -0800
> +++ linux-2.6/mm/mmap.c   2008-01-28 11:57:45.0 -0800
> @@ -2034,6 +2034,7 @@ void exit_mmap(struct mm_struct *mm)
>   unsigned long end;
>  
>   /* mm's last user has gone, and its about to be pulled down */
> + mmu_notifier(invalidate_all, mm, 0);
>   arch_exit_mmap(mm);
>  
>   lru_add_drain();
> 
> -- 



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Avi Kivity
Carsten Otte wrote:
>>  #include 
>> @@ -118,6 +119,7 @@ struct kvm {
>>  struct kvm_io_bus pio_bus;
>>  struct kvm_vm_stat stat;
>>  struct kvm_arch arch;
>> +struct mmu_notifier mmu_notifier;
>>  };
>>
>>  /* The guest did something we don't support. */
>> 
> This should not be in struct kvm, it should go to x86's kvm_arch. This 
> is x86 specific, we don't need a notifier since the core-vm will just 
> page out our guest memory like regular userspace mem.
>
>   

Every arch except s390 needs it.  An ugly #ifndef 
CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] [PATCH] Making SLIRP code more 64-bit clean

2008-01-29 Thread Scott Pakin

The attached patch corrects a bug in qemu/slirp/tcp_var.h that defines
the seg_next field in struct tcpcb to be 32 bits wide regardless of
32/64-bitness.  seg_next is assigned a pointer value in
qemu/slirp/tcp_subr.c, then cast back to a pointer in qemu/slirp/tcp_input.c
and dereferenced.  That produces a SIGSEGV on my system.

For more information, see the thread "[ 1881532 ] Network access seg faults
KVM on large-memory machine" on the KVM Bugs page on SourceForge
(http://tinyurl.com/2fxfbx).

Regards,
-- Scott

P.S.  Note: This message was sent to both qemu-devel and kvm-devel.
--- qemu/slirp/tcp_var.h.ORIG   2008-01-28 17:27:09.0 -0700
+++ qemu/slirp/tcp_var.h2008-01-28 17:27:20.0 -0700
@@ -40,11 +40,7 @@
 #include "tcpip.h"
 #include "tcp_timer.h"

-#if SIZEOF_CHAR_P == 4
- typedef struct tcpiphdr *tcpiphdrp_32;
-#else
- typedef u_int32_t tcpiphdrp_32;
-#endif
+typedef struct tcpiphdr *tcpiphdrp_32;

 /*
  * Tcp control block, one per tcp; fields:


Re: [kvm-devel] [PATCH] Remove unoptimal code from qemu dcr handles for powerpc

2008-01-29 Thread Hollis Blanchard
On Mon, 2008-01-28 at 23:38 -0600, Jerone Young wrote:
> A patch I submitted yesterday to use the call qemu_kvm_cpu_env() in
> the dcr handles is not needed since in kvm_arch_post_kvm_run variable
> cpu_single_env is set to env. So just use cpu_single_env to get env.
> 
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

-- 
Hollis Blanchard
IBM Linux Technology Center




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 06:24:12PM +0200, Avi Kivity wrote:
> Carsten Otte wrote:
>>>  #include 
>>> @@ -118,6 +119,7 @@ struct kvm {
>>> struct kvm_io_bus pio_bus;
>>> struct kvm_vm_stat stat;
>>> struct kvm_arch arch;
>>> +   struct mmu_notifier mmu_notifier;
>>>  };
>>>
>>>  /* The guest did something we don't support. */
>>> 
>> This should not be in struct kvm, it should go to x86's kvm_arch. This is 
>> x86 specific, we don't need a notifier since the core-vm will just page 
>> out our guest memory like regular userspace mem.
>>
>>   
>
> Every arch except s390 needs it.  An ugly #ifndef 
> CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.

Well, I already moved that bit to x86; at least that had a good reason
for being moved there, as it's really invisible code to s390. The memslots,
by contrast, are anything but invisible to s390, so the locking rules of
the memslots should stay common as long as memslots remain a common-code
data structure too, IMHO.



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
Didn't realize s390 doesn't need those at all. Do you think
mmu_notifier.h should also go in asm/mmu_notifier? We can always move
them there later after merging with some compat code if needed.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4086080..c527d7d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -18,6 +18,7 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on ARCH_SUPPORTS_KVM && EXPERIMENTAL
select PREEMPT_NOTIFIERS
+   select MMU_NOTIFIER
select ANON_INODES
---help---
  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 635e70c..80ebc19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -524,6 +524,110 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
kvm_flush_remote_tlbs(kvm);
 }
 
+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+   struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+   get_page(page);
+   rmap_remove(kvm, spte);
+   set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+   kvm_flush_remote_tlbs(kvm);
+   __free_page(page);
+}
+
+static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte, *curr_spte;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   BUG_ON(!(*spte & PT_PRESENT_MASK));
+   rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
+   curr_spte = spte;
+   spte = rmap_next(kvm, rmapp, spte);
+   kvm_unmap_spte(kvm, curr_spte);
+   }
+}
+
+void kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+   int i;
+
+   /*
+* If mmap_sem isn't taken, we can look up the memslots with only
+* the mmu_lock by skipping over the slots with userspace_addr == 0.
+*/
+   spin_lock(&kvm->mmu_lock);
+   for (i = 0; i < kvm->nmemslots; i++) {
+   struct kvm_memory_slot *memslot = &kvm->memslots[i];
+   unsigned long start = memslot->userspace_addr;
+   unsigned long end;
+
+   /* mmu_lock protects userspace_addr */
+   if (!start)
+   continue;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+   if (hva >= start && hva < end) {
+   gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+   kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]);
+   }
+   }
+   spin_unlock(&kvm->mmu_lock);
+}
+
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte;
+   int young = 0;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   int _young;
+   u64 _spte = *spte;
+   BUG_ON(!(_spte & PT_PRESENT_MASK));
+   _young = _spte & PT_ACCESSED_MASK;
+   if (_young) {
+   young = !!_young;
+   set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK);
+   }
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+   return young;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   int i;
+   int young = 0;
+
+   /*
+* If mmap_sem isn't taken, we can look up the memslots with only
+* the mmu_lock by skipping over the slots with userspace_addr == 0.
+*/
+   spin_lock(&kvm->mmu_lock);
+   for (i = 0; i < kvm->nmemslots; i++) {
+   struct kvm_memory_slot *memslot = &kvm->memslots[i];
+   unsigned long start = memslot->userspace_addr;
+   unsigned long end;
+
+   /* mmu_lock protects userspace_addr */
+   if (!start)
+   continue;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+   if (hva >= start && hva < end) {
+   gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+   young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]);
+   }
+   }
+   spin_unlock(&kvm->mmu_lock);
+
+   if (young)
+   kvm_flush_remote_tlbs(kvm);
+
+   return young;
+}
+
 #ifdef MMU_DEBUG
 static int is_empty_shadow_page(u64 *spt)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f94a0b..f556af6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3167,6 +3167,46 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.pio_data);
 }
 
+static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
+{
+   struct kvm_arch *kvm_arch;
+   kvm_arch = container_of(mn, struct kvm_arch, mmu_notifier);
+   return container_of(kvm_arch, struct kvm, arch);
+}
+
+void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
+   



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Izik Eidus
Andrea Arcangeli wrote:
> Hello,
>
> I'm testing KVM swapping on top of Christoph's latest patch
> series. However the host is hanging hard for me. Could others test it?
>   

I will ask Alexey to run it.

> I changed test-hardware, kernel version and kvm kernel version at the
> same time, so it might not be an issue with MMU Notifiers V2 but
> something else with my new test-setup (previously I was developing and
> testing on my workstation which was by far not ideal).
>   


-- 
woof.




[kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
Hello,

I'm testing KVM swapping on top of Christoph's latest patch
series. However the host is hanging hard for me. Could others test it?
I changed test-hardware, kernel version and kvm kernel version at the
same time, so it might not be an issue with MMU Notifiers V2 but
something else with my new test-setup (previously I was developing and
testing on my workstation which was by far not ideal).

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 4086080..c527d7d 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -18,6 +18,7 @@ config KVM
tristate "Kernel-based Virtual Machine (KVM) support"
depends on ARCH_SUPPORTS_KVM && EXPERIMENTAL
select PREEMPT_NOTIFIERS
+   select MMU_NOTIFIER
select ANON_INODES
---help---
  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 635e70c..80ebc19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -524,6 +524,110 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
kvm_flush_remote_tlbs(kvm);
 }
 
+static void kvm_unmap_spte(struct kvm *kvm, u64 *spte)
+{
+   struct page *page = pfn_to_page((*spte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT);
+   get_page(page);
+   rmap_remove(kvm, spte);
+   set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+   kvm_flush_remote_tlbs(kvm);
+   __free_page(page);
+}
+
+static void kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte, *curr_spte;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   BUG_ON(!(*spte & PT_PRESENT_MASK));
+   rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
+   curr_spte = spte;
+   spte = rmap_next(kvm, rmapp, spte);
+   kvm_unmap_spte(kvm, curr_spte);
+   }
+}
+
+void kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+   int i;
+
+   /*
+* If mmap_sem isn't taken, we can walk the memslots with only
+* the mmu_lock by skipping over the slots with userspace_addr == 0.
+*/
+   spin_lock(&kvm->mmu_lock);
+   for (i = 0; i < kvm->nmemslots; i++) {
+   struct kvm_memory_slot *memslot = &kvm->memslots[i];
+   unsigned long start = memslot->userspace_addr;
+   unsigned long end;
+
+   /* mmu_lock protects userspace_addr */
+   if (!start)
+   continue;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+   if (hva >= start && hva < end) {
+   gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+   kvm_unmap_rmapp(kvm, &memslot->rmap[gfn_offset]);
+   }
+   }
+   spin_unlock(&kvm->mmu_lock);
+}
+
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+   u64 *spte;
+   int young = 0;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+   int _young;
+   u64 _spte = *spte;
+   BUG_ON(!(_spte & PT_PRESENT_MASK));
+   _young = _spte & PT_ACCESSED_MASK;
+   if (_young) {
+   young = !!_young;
+   set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK);
+   }
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+   return young;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   int i;
+   int young = 0;
+
+   /*
+* If mmap_sem isn't taken, we can walk the memslots with only
+* the mmu_lock by skipping over the slots with userspace_addr == 0.
+*/
+   spin_lock(&kvm->mmu_lock);
+   for (i = 0; i < kvm->nmemslots; i++) {
+   struct kvm_memory_slot *memslot = &kvm->memslots[i];
+   unsigned long start = memslot->userspace_addr;
+   unsigned long end;
+
+   /* mmu_lock protects userspace_addr */
+   if (!start)
+   continue;
+
+   end = start + (memslot->npages << PAGE_SHIFT);
+   if (hva >= start && hva < end) {
+   gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+   young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]);
+   }
+   }
+   spin_unlock(&kvm->mmu_lock);
+
+   if (young)
+   kvm_flush_remote_tlbs(kvm);
+
+   return young;
+}
+
 #ifdef MMU_DEBUG
 static int is_empty_shadow_page(u64 *spt)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f94a0b..8954836 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3167,6 +3167,44 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.pio_data);
 }
 
+static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn)
+{
+   return container_of(mn,

Re: [kvm-devel] [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

2008-01-29 Thread Andrea Arcangeli
On Mon, Jan 28, 2008 at 12:28:44PM -0800, Christoph Lameter wrote:
>   if (!migration && ((vma->vm_flags & VM_LOCKED) ||
> - (ptep_clear_flush_young(vma, address, pte)))) {
> + (ptep_clear_flush_young(vma, address, pte) ||
> + mmu_notifier_age_page(mm, address)))) {

Here is an example of how inferior and error-prone it is to have
mmu_notifier_age_page and invalidate_page outside of pgtable.h: you
just managed to break it again with the above ||, go figure.
mmu_notifier_age_page has to be called unconditionally, regardless of
ptep_clear_flush_young's return value; we want to give only one
additional LRU scan to the referenced pages, not more than that, or the
KVM guest pages will get tons more priority than regular linux
anonymous memory.

>   ret = SWAP_FAIL;
>   goto out_unmap;
>   }
> @@ -688,6 +693,7 @@ static int try_to_unmap_one(struct page 
>   /* Nuke the page table entry. */
>   flush_cache_page(vma, address, page_to_pfn(page));
>   pteval = ptep_clear_flush(vma, address, pte);
> + mmu_notifier(invalidate_page, mm, address);
>  
>   /* Move the dirty bit to the physical page now the pte is gone. */
>   if (pte_dirty(pteval))
> @@ -815,9 +821,13 @@ static void try_to_unmap_cluster(unsigne
>   if (ptep_clear_flush_young(vma, address, pte))
>   continue;
>  
> + if (mmu_notifier_age_page(mm, address))
> + continue;
> +

Here the same exact aging regression compared to my code.



Re: [kvm-devel] [RFC] VMX CR3 cache

2008-01-29 Thread Marcelo Tosatti
On Tue, Jan 29, 2008 at 11:28:00AM +0100, Gerd Hoffmann wrote:
> Gerd Hoffmann wrote:
> 
> > Hmm, what kvm version is this against?  latest git I guess?  After
> > applying to kvm-60 (and fixing up some trivial rejects) it doesn't build.
> 
> Looks like the mmu.h chunk is missing in the patch.

Yes it is, sorry. It also misses the kernel/kvm.c guest bits.

And this is against a changed x86.git -mm tree (with pvops64 patches).
I'll send the PTE-write-via-hypercall patches soon and will rebase on
top of that (the CR3 cache needs more testing/tuning apparently).



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 07:04:59PM +0200, Avi Kivity wrote:
> Andrea Arcangeli wrote:
>> Didn't realize s390 doesn't need those at all. Do you think
>> mmu_notifier.h should also go in asm/mmu_notifier? We can always move
>> them there later after merging with some compat code if needed.
>>   
>
> s390 is the odd bird, not x86, so I'd like a solution that doesn't involve 

;)

> duplication for x86, ppc, and ia64.

There are a few lines of duplication, yes; the bulk of the code was
already only in x86.c and it has to stay there. It's only the few
lines of registration we're talking about, so I don't think the
#ifdef would save enough.



Re: [kvm-devel] [PATCH] Cleanup extern declarations for now removed vcpu_env in Qemu

2008-01-29 Thread Hollis Blanchard
On Mon, 2008-01-28 at 19:03 -0600, Jerone Young wrote:
> # HG changeset patch
> # User Jerone Young <[EMAIL PROTECTED]>
> # Date 1201568508 21600
> # Node ID a568d031723942e1baf77077031d2b77795cbd8a
> # Parent  5ce532cf9a1f711d1fecb42814d301abd37aa378
> Cleanup extern declarations for now removed vcpu_env in Qemu
> 
> This patch removes the extern declaration for vcpu_env that was recently
> removed for PowerPC & IA64 in KVM.
> 
> Signed-off-by: Jerone Young <[EMAIL PROTECTED]>

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

-- 
Hollis Blanchard
IBM Linux Technology Center




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Avi Kivity
Andrea Arcangeli wrote:
> Didn't realize s390 doesn't need those at all. Do you think
> mmu_notifier.h should also go in asm/mmu_notifier? We can always move
> them there later after merging with some compat code if needed.
>   

s390 is the odd bird, not x86, so I'd like a solution that doesn't 
involve duplication for x86, ppc, and ia64.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Avi Kivity
Carsten Otte wrote:
> Avi Kivity wrote:
>> Every arch except s390 needs it.  An ugly #ifndef 
>> CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
> BTW, from reading AMDs spec I don't expect NPT to need this vehicle 
> for swapping either. They can just let core-vm page out guest pages 
> and will receive a proper page fault in the host. Jörg can you confirm 
> that?
>

No, that doesn't work:

- even though npt can use the same pagetable for guest and host, that 
isn't workable for kvm as npt doesn't have an offset/size thing.  so kvm 
uses a separate pagetable for guest and host.
- npt doesn't have a dual-tagged tlb, where a host tlb invalidate also 
invalidates all guest tlbs that point to the same page

Try again in 25 years, maybe x86 will reach where s390 is now by that 
timeframe.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 06:17:51PM +0100, Carsten Otte wrote:
> Andrea Arcangeli wrote:
>> Well I already moved that bit to x86, at least that had a good reason
>> for being moved there, it's really invisible code to s390. The memslot
>> are all but invisible to s390 instead, and so the locking rules of the
>> memslots should be common as long as memslots remains a common-code
>> data structure too IMHO.
> That makes sense to me.

Ok thanks!



Re: [kvm-devel] [PATCH]

2008-01-29 Thread Izik Eidus
S.Çağlar Onur wrote:
> Hi;
>
> This patch (rediffed against kvm-60) from Tavis Ormandy <[EMAIL PROTECTED]> 
> fixes an infinite
> loop in the emulated SB16 device (See http://taviso.decsystem.org/virtsec.pdf 
> for more details.)
>
> I'm not sure why qemu upstream has not merged these, but Xen already did.
>
> [1] http://xenbits.xensource.com/xen-3.1-testing.hg?rev/4b22d472bda6
>
> diff -ur kvm-60.orig/qemu/hw/sb16.c kvm-60/qemu/hw/sb16.c
> --- kvm-60.orig/qemu/hw/sb16.c 2008-01-20 14:35:04.0 +0200
> +++ kvm-60/qemu/hw/sb16.c 2008-01-29 01:46:20.0 +0200
> @@ -1240,8 +1240,10 @@
>  s->block_size);
>  #endif
>  
> -while (s->left_till_irq <= 0) {
> -s->left_till_irq = s->block_size + s->left_till_irq;
> +if (s->block_size) {
> +while (s->left_till_irq <= 0) {
> +s->left_till_irq = s->block_size + s->left_till_irq;
> +}
>  }
>  
>  return dma_pos;
>
> Cheers
>   
We had better wait for qemu to merge it; then, when we merge with 
qemu cvs, we will get it as well.

thanks

-- 
woof.




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 05:35:34PM +0100, Carsten Otte wrote:
> Avi Kivity wrote:
> > Every arch except s390 needs it.  An ugly #ifndef 
> > CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
> BTW, from reading AMDs spec I don't expect NPT to need this vehicle 

By your conclusion I suppose you thought NPT maps guest physical to
host virtual. If that were the case, the cpu would have to walk three
layers of pagetables (each layer is an arrow): guest virtual -> guest
physical -> host virtual -> host physical. Instead it's just guest
virtual -> guest physical -> host physical. Or even less for shadow:
guest virtual -> host physical, which is why shadow should still be
faster for number crunching (and definitely slower for dbms).



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Avi Kivity
Andrea Arcangeli wrote:
> On Tue, Jan 29, 2008 at 05:35:34PM +0100, Carsten Otte wrote:
>   
>> Avi Kivity wrote:
>> 
>>> Every arch except s390 needs it.  An ugly #ifndef 
>>> CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
>>>   
>> BTW, from reading AMDs spec I don't expect NPT to need this vehicle 
>> 
>
> By your conclusion I suppose you thought NPT maps guest physical to
> host virtual. If that were the case, the cpu would have to walk three
> layers of pagetables (each layer is an arrow): guest virtual -> guest
> physical -> host virtual -> host physical. Instead it's just guest
> virtual -> guest physical -> host physical. Or even less for shadow:
> guest virtual -> host physical, which is why shadow should still be
> faster for number crunching (and definitely slower for dbms).
>   

If a hypervisor mandates (host virtual) == (guest physical), it would 
work.  x86 still misses the dual-tagged tlb, so mmu notifiers are needed 
regardless.  With s390, they have an additional offset parameter, so 
(host virtual) == (guest physical) + offset, which lets qemu coexist with 
the guest, plus a dual-tagged tlb, so that a host invalidate also evicts 
guest tlb entries.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Avi Kivity
Andrea Arcangeli wrote:
>
>> duplication for x86, ppc, and ia64.
>> 
>
> There are a few lines of duplication, yes; the bulk of the code was
> already only in x86.c and it has to stay there. It's only the few
> lines of registration we're talking about, so I don't think the
> #ifdef would save enough.
>   

Okay; fine by me.

-- 
error compiling committee.c: too many arguments to function




[kvm-devel] How-to use paravirt layer for network and block devices

2008-01-29 Thread Cameron Macdonell

Hi,

What are the command-line options necessary to get the guest devices  
to use the paravirt layer?

Thanks,
Cam

---
A. Cameron Macdonell
Ph.D. Student
Department of Computing Science
University of Alberta
[EMAIL PROTECTED]






Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Joerg Roedel
On Tue, Jan 29, 2008 at 07:02:52PM +0200, Avi Kivity wrote:
> Carsten Otte wrote:
> >Avi Kivity wrote:
> >>Every arch except s390 needs it.  An ugly #ifndef 
> >>CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating 
> >>the code.
> >BTW, from reading AMDs spec I don't expect NPT to need this vehicle for 
> >swapping either. They can just let 
> >core-vm page out guest pages and will receive a proper page fault in the 
> >host. Jörg can you confirm that?
> >
> 
> No, that doesn't work:
> 
> - even though npt can use the same pagetable for guest and host, that isn't 
> workable for kvm as npt doesn't 
> have an offset/size thing.  so kvm uses a separate pagetable for guest and 
> host.

Right. We can't reuse page tables from the Linux MM for Nested Paging.

> - npt doesn't have a dual-tagged tlb, where a host tlb invalidate also 
> invalidates all guest tlbs that point 
> to the same page

Since we have our own page table for Nested Paging this is also true.

Joerg

-- 
   |   AMD Saxony Limited Liability Company & Co. KG
 Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany
 System|  Register Court Dresden: HRA 4896
 Research  |  General Partner authorized to represent:
 Center| AMD Saxony LLC (Wilmington, Delaware, US)
   | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy





Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Carsten Otte
Andrea Arcangeli wrote:
> Didn't realize s390 doesn't need those at all. Do you think
> mmu_notifier.h should also go in asm/mmu_notifier? We can always move
> them there later after merging with some compat code if needed.
No, I think mmu_notifier.h is fine in include/linux. I just think kvm 
should only _use_ it on archs that do require assistance.

On s390, we use the same page table to translate host.user -> 
host.phys and guest.phys -> host.phys. Using storage keys, the host 
memory management takes into account dirty and reference operations 
done by the guest when making its swapping decisions. The host 
invalidates a page table entry by using a magic "invalidate page table 
entry" instruction. Running virtual cpus are guaranteed not to rely on 
tlb data once the page table entry has been invalidated by that 
instruction. Maybe we should just fix other hardware ;-).



[kvm-devel] large page support for kvm

2008-01-29 Thread Avi Kivity
The npt patches started me thinking about large page support (2MB/4MB 
pages), and I think we can implement them even when npt/ept are not 
available.

Here's a rough sketch of my proposal:

- For every memory slot, allocate an array containing one int for every 
potential large page included within that memory slot.  Each entry in 
the array contains the number of write-protected 4KB pages within the 
large page frame corresponding to that entry.

For example, if we have a memory slot for gpas 1MB-1GB, we'd have an 
array of size 511, corresponding to the 511 2MB pages from 2MB upwards.  
If we shadow a pagetable at address 4MB+8KB, we'd increment the entry 
corresponding to the large page at 4MB.  When we unshadow that page, 
decrement the entry.

- If we attempt to shadow a large page (either a guest pse pte, or a 
real-mode pseudo pte), we check if the host page is a large page.  If 
so, we also check the write-protect count array.  If the result is zero, 
we create a shadow pse pte.

- Whenever we write-protect a page, also zap any large-page mappings for 
that page.  This means rmap will need some extension to handle pde rmaps 
in addition to pte rmaps.

- qemu is extended to have a command-line option to use large pages to 
back guest memory.

Large pages should improve performance significantly, both with 
traditional shadow and npt/ept.  Hopefully we will have transparent 
Linux support for them one day.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH 2/2] virtio reset support

2008-01-29 Thread Avi Kivity
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>>
>>> PCI-e has a common reset concept (warm and cold).  I've been looking 
>>> around and I can't seem to find any common reset mechanism for PCI.  
>>> Is FLR something that is per-device or a standard PCI mechanism?  If 
>>> it's the former, then we've basically implemented FLR using this bit 
>>> in the config space.
>>>
>>
>> I believe it is a standard mechanism, albeit new, so perhaps many 
>> devices don't implement it.
>
> I don't have a copy of the PCI specification handy.  Can you dig up 
> how it's implemented?  I don't see any references in my local 
> documentation and I couldn't find anything in Linux that referenced it 
> at the PCI level.
>

I think it's NDA material, so even if I found it, I couldn't publish 
it.  It can probably be reverse-engineered from the Xen patches, or 
perhaps your employer is a PCI-SIG member so you can get access to it.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [PATCH] SVM: set NM intercept when enabling CR0.TS in the guest

2008-01-29 Thread Avi Kivity
Joerg Roedel wrote:
> Explicitly enable the NM intercept in svm_set_cr0 if we enable TS in the guest
> copy of CR0 for lazy FPU switching. This fixes guest SMP with Linux under SVM.
> Without that patch Linux deadlocks or panics right after trying to boot the
> other CPUs.
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function




Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 02:59:14PM +0100, Andrea Arcangeli wrote:
> The down_write is garbage. The caller should put it around
> mmu_notifier_register if something. The same way the caller should
> call synchronize_rcu after mmu_notifier_register if it needs
> synchronous behavior from the notifiers. The default version of
> mmu_notifier_register shouldn't be cluttered with unnecessary locking.

Oops, my spinlock was gone from the notifier head, so the above
comment is wrong, sorry! I thought down_write was needed to serialize
against some _external_ event, not to serialize the list updates in
place of my explicit lock. The critical section is so small that a
semaphore is the wrong locking choice, that's why I assumed it was for
an external event. Anyway RCU won't be optimal for a huge flood of
register/unregister, I agree the down_write shouldn't create much
contention and it saves 4 bytes from each mm_struct, and we can always
change it to a proper spinlock later if needed.



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Joerg Roedel
On Tue, Jan 29, 2008 at 05:35:34PM +0100, Carsten Otte wrote:
> Avi Kivity wrote:
> > Every arch except s390 needs it.  An ugly #ifndef 
> > CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
> BTW, from reading AMDs spec I don't expect NPT to need this vehicle 
> for swapping either. They can just let core-vm page out guest pages 
> and will receive a proper page fault in the host. Jörg can you confirm 
> that?

Since NPT uses the host page table format it is in theory possible to
add the pagetable to the Linux MM rmap. In this case it would not be
necessary to use MMU notifiers. But I think this would complicate the
NPT support code significantly.

Joerg



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Carsten Otte
Avi Kivity wrote:
> Every arch except s390 needs it.  An ugly #ifndef 
> CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
Yea I guess you've got a point there. The struct should be in struct 
kvm, but the call to initialize/destroy it should go out to arch_init.



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Carsten Otte
Andrea Arcangeli wrote:
> Well I already moved that bit to x86, at least that had a good reason
> for being moved there, it's really invisible code to s390. The memslot
> are all but invisible to s390 instead, and so the locking rules of the
> memslots should be common as long as memslots remains a common-code
> data structure too IMHO.
That makes sense to me.



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Mon, Jan 28, 2008 at 12:28:42PM -0800, Christoph Lameter wrote:
> Index: linux-2.6/mm/fremap.c
> ===
> --- linux-2.6.orig/mm/fremap.c 2008-01-25 19:31:05.0 -0800
> +++ linux-2.6/mm/fremap.c 2008-01-25 19:32:49.0 -0800
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -211,6 +212,7 @@ asmlinkage long sys_remap_file_pages(uns
>   spin_unlock(&mapping->i_mmap_lock);
>   }
>  
> + mmu_notifier(invalidate_range, mm, start, start + size, 0);
>   err = populate_range(mm, vma, start, size, pgoff);

How can it be right to invalidate_range _before_ ptep_clear_flush?

> @@ -1634,6 +1639,8 @@ gotten:
>   /*
>* Re-check the pte - we dropped the lock
>*/
> + mmu_notifier(invalidate_range, mm, address,
> + address + PAGE_SIZE - 1, 0);
>   page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>   if (likely(pte_same(*page_table, orig_pte))) {
>   if (old_page) {

What's the point of invalidate_range when the size is PAGE_SIZE? And
how can it be right to invalidate_range _before_ ptep_clear_flush?



Re: [kvm-devel] [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-29 Thread Robin Holt
I don't understand how this is intended to work.  I think the page flag
needs to be maintained by the mmu_notifier subsystem.

Let's assume we have a mapping that has a grant from xpmem and an
additional grant from kvm.  The exporters are not important, the fact
that there may be two is.

Assume that the user revokes the grant from xpmem (we call that
xpmem_remove).  As far as xpmem is concerned, there are no longer any
exports of that page so the page should no longer have its exported
flag set.  Note: This is not a process exit, but a function of xpmem.

In that case, at the remove time, we have no idea whether the flag should
be cleared.

For the invalidate_page side, I think we should have:
> @@ -473,6 +474,10 @@ int page_mkclean(struct page *page)
>   struct address_space *mapping = page_mapping(page);
>   if (mapping) {
>   ret = page_mkclean_file(mapping, page);
> + if (unlikely(PageExternalRmap(page))) {
> + mmu_rmap_notifier(invalidate_page, page);
> + ClearPageExternalRmap(page);
> + }
>   if (page_test_dirty(page)) {
>   page_clear_dirty(page);
>   ret = 1;

I would assume we would then want a function which sets the page flag.

Additionally, I would think we would want some intervention in the
freeing of the page side to ensure the page flag is cleared as well.

Thanks,
Robin



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Carsten Otte
Avi Kivity wrote:
> Every arch except s390 needs it.  An ugly #ifndef 
> CONFIG_KVM_HARDWARE_TLB_SYNC is preferred to duplicating the code.
BTW, from reading AMD's spec I don't expect NPT to need this vehicle 
for swapping either. They can just let the core VM page out guest pages, 
and the host will receive a proper page fault. Jörg, can you confirm 
that?




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
Christoph, the below patch should fix the current leak of the pinned
pages. I hope the page-pin that should be dropped by the
invalidate_range op is enough to prevent the "physical page" mapped
on that "mm+address" to change before invalidate_range returns. If
that would ever happen, there would be a coherency loss between the
guest VM writes and the writes coming from userland on the same
mm+address from a different thread (qemu, whatever). invalidate_page
before PT lock was obviously safe. Now we rely entirely on the pin to
prevent the page from changing before invalidate_range returns. If the pte
is unmapped and the page is mapped back in with a minor fault that's
ok, as long as the physical page remains the same for that mm+address,
until all sptes are gone.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

diff --git a/mm/fremap.c b/mm/fremap.c
--- a/mm/fremap.c
+++ b/mm/fremap.c
@@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns
spin_unlock(&mapping->i_mmap_lock);
}
 
+   err = populate_range(mm, vma, start, size, pgoff);
mmu_notifier(invalidate_range, mm, start, start + size, 0);
-   err = populate_range(mm, vma, start, size, pgoff);
if (!err && !(flags & MAP_NONBLOCK)) {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1639,8 +1639,6 @@ gotten:
/*
 * Re-check the pte - we dropped the lock
 */
-   mmu_notifier(invalidate_range, mm, address,
-   address + PAGE_SIZE - 1, 0);
page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
if (likely(pte_same(*page_table, orig_pte))) {
if (old_page) {
@@ -1676,6 +1674,8 @@ gotten:
page_cache_release(old_page);
 unlock:
pte_unmap_unlock(page_table, ptl);
+   mmu_notifier(invalidate_range, mm, address,
+   address + PAGE_SIZE - 1, 0);
if (dirty_page) {
if (vma->vm_file)
file_update_time(vma->vm_file);



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 11:55:10AM -0800, Christoph Lameter wrote:
> I am not sure. AFAICT you wrote that code.

Actually I didn't need to change a single line in do_wp_page, because
ptep_clear_flush was already doing everything transparently for
me. This was the memory.c part of the last patch I posted; it only
touches zap_page_range, remap_pfn_range and apply_to_page_range.

diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -889,6 +889,7 @@ unsigned long zap_page_range(struct vm_a
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
if (tlb)
tlb_finish_mmu(tlb, address, end);
+   mmu_notifier(invalidate_range, mm, address, end);
return end;
 }
 
@@ -1317,7 +1318,7 @@ int remap_pfn_range(struct vm_area_struc
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + PAGE_ALIGN(size);
+   unsigned long start = addr, end = addr + PAGE_ALIGN(size);
struct mm_struct *mm = vma->vm_mm;
int err;
 
@@ -1358,6 +1359,7 @@ int remap_pfn_range(struct vm_area_struc
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range, mm, start, end);
return err;
 }
 EXPORT_SYMBOL(remap_pfn_range);
@@ -1441,7 +1443,7 @@ int apply_to_page_range(struct mm_struct
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + size;
+   unsigned long start = addr, end = addr + size;
int err;
 
BUG_ON(addr >= end);
@@ -1452,6 +1454,7 @@ int apply_to_page_range(struct mm_struct
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range, mm, start, end);
return err;
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);

> It seems to be okay to invalidate range if you hold mmap_sem writably. In 
> that case no additional faults can happen that would create new ptes.

In that place the mmap_sem is taken, but in readonly mode. I never rely
on the mmap_sem in the mmu notifier methods. Not invoking the notifier
before releasing the PT lock adds quite some uncertainty about the smp
safety of the spte invalidates, because the pte may be unmapped and
remapped by a minor fault before invalidate_range is invoked, but I
haven't found a kernel-crashing race yet, thanks to the pin we take
through get_user_pages (and only thanks to it). The requirement is
that invalidate_range is invoked after the last ptep_clear_flush, or it
leaks pins; that's why I had to move it to the end.



Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Avi Kivity
Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
>
>   
>>> +   struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
>>>  };
>>>   
>> Not sure why you prefer to waste ram when MMU_NOTIFIER=n, this is a
>> regression (a minor one though).
>> 
>
> Andrew does not like #ifdefs and it makes it possible to verify calling 
> conventions if !CONFIG_MMU_NOTIFIER.
>
>   

You could define mmu_notifier_head as an empty struct in that case.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 01:35:58PM -0800, Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> 
> > > It seems to be okay to invalidate range if you hold mmap_sem writably. In 
> > > that case no additional faults can happen that would create new ptes.
> > 
> > In that place the mmap_sem is taken but in readonly mode. I never rely
> > on the mmap_sem in the mmu notifier methods. Not invoking the notifier
> 
> Well it seems that we have to rely on mmap_sem otherwise concurrent faults 
> can occur. The mmap_sem seems to be acquired for write there.
 ^
> 
>   if (!has_write_lock) {
> up_read(&mm->mmap_sem);
> down_write(&mm->mmap_sem);
> has_write_lock = 1;
> goto retry;
> }


hmm, "there" where? When I said it was taken in readonly mode I meant
for the quoted code (it would be at the top if it wasn't cut), so I
quote below again:

> > +   mmu_notifier(invalidate_range, mm, address,
> > +   address + PAGE_SIZE - 1, 0);
> > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > if (likely(pte_same(*page_table, orig_pte))) {
> > if (old_page) {

The "there" for me was do_wp_page.

Even for the code you quoted from fremap.c, has_write_lock is set
to 1 _only_ the very first time you call sys_remap_file_pages on a
VMA. Only the transition of the VMA from linear to nonlinear
requires the mmap_sem in write mode. So you can be sure all the fremap
code, 99% of the time, is populating (overwriting) already present
ptes with only the mmap_sem in readonly mode, like do_wp_page. It
would be unnecessary to populate the nonlinear range with the mmap_sem
in write mode. Only the "vma" mangling requires the mmap_sem in write
mode; the pte modifications require only the PT lock plus the mmap_sem
in read mode.

Effectively the first invocation of populate_range runs with the
mmap_sem in write mode; I wonder why, as there seems to be no good
reason for that. I guess it's a bit that should be optimized, by
calling downgrade_write before calling populate_range even the first
time the vma switches from linear to nonlinear (after the vma has been
fully updated to the new status). But for sure all later invocations
run populate_range with the semaphore readonly, like the rest of the
VM does when instantiating ptes in the page faults.

> > before releasing the PT lock adds quite some uncertainty on the smp
> > safety of the spte invalidates, because the pte may be unmapped and
> > remapped by a minor fault before invalidate_range is invoked, but I
> > didn't figure out a kernel crashing race yet thanks to the pin we take
> > through get_user_pages (and only thanks to it). The requirement is
> > that invalidate_range is invoked after the last ptep_clear_flush or it
> > leaks pins that's why I had to move it at the end.
>  
> So "pins" means a reference count right? I still do not get why you 

Yes.

> have refcount problems. You take a refcount when you export the page 
> through KVM and then drop the refcount in invalidate page right?

Yes.

> So you walk through the KVM ptes and drop the refcount for each spte you 
> encounter?

Yes.

All pins are gone by the time invalidate_page/range returns. But there
is no critical section between invalidate_page and the _later_
ptep_clear_flush. So get_user_pages is free to run and take the PT
lock before the ptep_clear_flush, find the linux pte still
instantiated, and create a new spte before ptep_clear_flush runs.

Think of why the tlb flushes are called at the end of
ptep_clear_flush. The mmu notifier invalidate has to be called after,
for the exact same reason.

Perhaps somebody else should explain this; I started exposing this
smp race the moment I saw the backwards ordering being proposed in
export-notifier-v1. Sorry if I'm not clear enough.



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 12:30:06PM -0800, Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> 
> > diff --git a/mm/fremap.c b/mm/fremap.c
> > --- a/mm/fremap.c
> > +++ b/mm/fremap.c
> > @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns
> > spin_unlock(&mapping->i_mmap_lock);
> > }
> >  
> > +   err = populate_range(mm, vma, start, size, pgoff);
> > mmu_notifier(invalidate_range, mm, start, start + size, 0);
> > -   err = populate_range(mm, vma, start, size, pgoff);
> > if (!err && !(flags & MAP_NONBLOCK)) {
> > if (unlikely(has_write_lock)) {
> > downgrade_write(&mm->mmap_sem);
> 
> We invalidate the range *after* populating it? Isnt it okay to establish 
> references while populate_range() runs?

It's not ok, because that function can very well overwrite existing and
present ptes (it's actually the nonlinear common-case fast path for
databases). With your code, the sptes created between invalidate_range
and populate_range will keep pointing forever to the old physical page
instead of the newly populated one.

I'm also asking myself if it's an smp race not to call
mmu_notifier(invalidate_page) between ptep_clear_flush and set_pte_at
in install_file_pte. Probably not, because the guest VM running in a
different thread would need to serialize outside the install_file_pte
code with the task running install_file_pte, if it wants to be sure to
write either all its data to the old or the new page. Certainly doing
the invalidate_page inside the PT lock was obviously safe, but I hope
this is safe too and can accommodate your needs as well.

> > diff --git a/mm/memory.c b/mm/memory.c
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1639,8 +1639,6 @@ gotten:
> > /*
> >  * Re-check the pte - we dropped the lock
> >  */
> > -   mmu_notifier(invalidate_range, mm, address,
> > -   address + PAGE_SIZE - 1, 0);
> > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > if (likely(pte_same(*page_table, orig_pte))) {
> > if (old_page) {
> 
> What we did is to invalidate the page (?!) before taking the pte lock. In 
> the lock we replace the pte to point to another page. This means that we 
> need to clear stale information. So we zap it before. If another reference 
> is established after taking the spinlock then the pte contents have 
> changed and the critical section fails.
> 
> Before the critical section starts we have gotten an extra refcount on the 
> original page so the page cannot vanish from under us.

The problem is the missing invalidate_page/range _after_
ptep_clear_flush. If a spte is built between invalidate_range and
pte_offset_map_lock, it will remain pointing to the old page
forever. Nothing will be called to invalidate that stale spte built
between invalidate_page/range and ptep_clear_flush. This is why for
the last few days I kept saying the mmu notifiers have to be invoked
_after_ ptep_clear_flush and never before (remember the export
notifier?). No idea how you can deal with this in your code; certainly
for KVM sptes that's a backwards and unworkable ordering of operations
(exactly as backwards as doing the tlb flush before pte_clear in
ptep_clear_flush: think of an spte as a tlb entry; you can't flush the
tlb before clearing/updating the pte, or it's smp-unsafe).

> > @@ -1676,6 +1674,8 @@ gotten:
> > page_cache_release(old_page);
> >  unlock:
> > pte_unmap_unlock(page_table, ptl);
> > +   mmu_notifier(invalidate_range, mm, address,
> > +   address + PAGE_SIZE - 1, 0);
> > if (dirty_page) {
> > if (vma->vm_file)
> > file_update_time(vma->vm_file);
> 
> Now we invalidate the page after the transaction is complete. This means 
> external pte can persist while we change the pte? Possibly even dirty the 
> page?

Yes, and the only reason this can be safe is the one explained at the
top of the email: if the other cpu wants to serialize to be sure to
write in the "new" page, it has to serialize with the page fault, and
to serialize it has to wait for the page fault to return (example:
we're not going to call futex code until the page fault returns).



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> diff --git a/mm/fremap.c b/mm/fremap.c
> --- a/mm/fremap.c
> +++ b/mm/fremap.c
> @@ -212,8 +212,8 @@ asmlinkage long sys_remap_file_pages(uns
>   spin_unlock(&mapping->i_mmap_lock);
>   }
>  
> + err = populate_range(mm, vma, start, size, pgoff);
>   mmu_notifier(invalidate_range, mm, start, start + size, 0);
> - err = populate_range(mm, vma, start, size, pgoff);
>   if (!err && !(flags & MAP_NONBLOCK)) {
>   if (unlikely(has_write_lock)) {
>   downgrade_write(&mm->mmap_sem);

We invalidate the range *after* populating it? Isn't it okay to establish 
references while populate_range() runs?

> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1639,8 +1639,6 @@ gotten:
>   /*
>* Re-check the pte - we dropped the lock
>*/
> - mmu_notifier(invalidate_range, mm, address,
> - address + PAGE_SIZE - 1, 0);
>   page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
>   if (likely(pte_same(*page_table, orig_pte))) {
>   if (old_page) {

What we did is to invalidate the page (?!) before taking the pte lock. In 
the lock we replace the pte to point to another page. This means that we 
need to clear stale information, so we zap it before. If another reference 
is established after taking the spinlock then the pte contents have 
changed and the critical section fails.

Before the critical section starts we have gotten an extra refcount on the 
original page, so the page cannot vanish from under us.

> @@ -1676,6 +1674,8 @@ gotten:
>   page_cache_release(old_page);
>  unlock:
>   pte_unmap_unlock(page_table, ptl);
> + mmu_notifier(invalidate_range, mm, address,
> + address + PAGE_SIZE - 1, 0);
>   if (dirty_page) {
>   if (vma->vm_file)
>   file_update_time(vma->vm_file);

Now we invalidate the page after the transaction is complete. This means 
an external pte can persist while we change the pte? Possibly even dirty 
the page?






Re: [kvm-devel] [patch 6/6] mmu_notifier: Add invalidate_all()

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Robin Holt wrote:

> What is the status of getting invalidate_all adjusted to indicate a need
> to also call _release?

Release is only called if the mmu_notifier is still registered. If you 
take it out on invalidate_all then there will be no call to release 
(provided you deal with the RCU issues).




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> > +   mmu_notifier(invalidate_range, mm, address,
> > +   address + PAGE_SIZE - 1, 0);
> > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > if (likely(pte_same(*page_table, orig_pte))) {
> > if (old_page) {
> 
> What's the point of invalidate_range when the size is PAGE_SIZE? And
> how can it be right to invalidate_range _before_ ptep_clear_flush?

I am not sure. AFAICT you wrote that code.

It seems to be okay to invalidate range if you hold mmap_sem writably. In 
that case no additional faults can happen that would create new ptes.






Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 07:19:18PM +0100, Joerg Roedel wrote:
> Since NPT uses the host page table format it is in theory possible to
> add the pagetable to the Linux MM rmap. In this case it would not be
> necessary to use MMU notifiers. But I think this would complicate the
> NPT support code significantly.

The Linux rmap isn't like Christoph's secondary rmap, nor similar to
the KVM rmap. The difference is that the Linux rmap requires zero ram
in rmap structures for each new page allocated by a Linux page fault,
while the KVM rmap requires a few bytes for each new page allocated
with get_user_pages and mapped/cached in some spte. So we can't
represent NPT pagetables in the Linux rmap, and the current mmu
notifier model is quite optimal for it.

What instead may be possible with NPT, given the radix tree format, is
to make a KVM rmap implementation for NPT similar to the one in the
Linux VM, to avoid losing 64 bits of ram for each new NPT pagetable
entry allocated and mapped, so the mmu notifier may be able to
reverse-map from host virtual address to NPT pagetable without having
to use any metadata, by just walking the NPT tree for the VM. I'm not
sure if it's feasible though.



Re: [kvm-devel] swapping with MMU Notifiers V2

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 08:05:20PM +0200, Avi Kivity wrote:
> If a hypervisor mandates (host virtual) == (guest physical), it would work. 
>  x86 still misses the dual-tagged tlb, so mmu notifiers are needed 
> regardless.  With s390, they have an additional offset parameter, so (host 

Yep. NPT is certainly better, given two levels are bad enough. The mmu
notifiers aren't really a performance problem, unlike the three levels
would be (the bigger cost is likely the IPI for the tlb flushes with an
smp guest on an smp host; they cost nothing on a UP guest).

> virtual) == (guest physical) + offset, so qemu can coexist with the guest, 
> and dual tagged tlb so that a host invalidate also evicts guest tlb 
> entries.

Thanks!



Re: [kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> > +   struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
> >  };
> 
> Not sure why you prefer to waste ram when MMU_NOTIFIER=n, this is a
> regression (a minor one though).

Andrew does not like #ifdefs and it makes it possible to verify calling 
conventions if !CONFIG_MMU_NOTIFIER.

> It's beyond me how you can be ok with lock=1. You said you have
> to block; if you can deal with lock=1 once, why can't you deal with
> lock=1 _always_?

Not sure yet. We may have to do more in that area. Need to have feedback 
from Robin.



Re: [kvm-devel] [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

2008-01-29 Thread Christoph Lameter
Thanks, I will put that into V3.




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> > It seems to be okay to invalidate range if you hold mmap_sem writably. In 
> > that case no additional faults can happen that would create new ptes.
> 
> In that place the mmap_sem is taken but in readonly mode. I never rely
> on the mmap_sem in the mmu notifier methods. Not invoking the notifier

Well it seems that we have to rely on mmap_sem otherwise concurrent faults 
can occur. The mmap_sem seems to be acquired for write there.

  if (!has_write_lock) {
up_read(&mm->mmap_sem);
down_write(&mm->mmap_sem);
has_write_lock = 1;
goto retry;
}


> before releasing the PT lock adds quite some uncertainty on the smp
> safety of the spte invalidates, because the pte may be unmapped and
> remapped by a minor fault before invalidate_range is invoked, but I
> didn't figure out a kernel crashing race yet thanks to the pin we take
> through get_user_pages (and only thanks to it). The requirement is
> that invalidate_range is invoked after the last ptep_clear_flush or it
> leaks pins that's why I had to move it at the end.
 
So "pins" means a reference count right? I still do not get why you 
have refcount problems. You take a refcount when you export the page 
through KVM and then drop the refcount in invalidate page right?

So you walk through the KVM ptes and drop the refcount for each spte you 
encounter?





Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 01:53:05PM -0800, Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> 
> > > We invalidate the range *after* populating it? Isnt it okay to establish 
> > > references while populate_range() runs?
> > 
> > It's not ok because that function can very well overwrite existing and
> > present ptes (it's actually the nonlinear common case fast path for
> > db). With your code the sptes created between invalidate_range and
> > populate_range, will keep pointing forever to the old physical page
> > instead of the newly populated one.
> 
> Seems though that the mmap_sem is taken for regular vmas writably and will 
> hold off new mappings.

It's taken writable because the code is inefficient the first time;
all later times populate_range overwrites ptes with the mmap_sem in
readonly mode (finally, rightfully so). The first remap_file_pages is,
I guess, irrelevant to optimize: the whole point of nonlinear is to
call remap_file_pages zillions of times on the same vma, overwriting
present ptes the whole time, so if the first time the semaphore is not
readonly it probably doesn't make a difference.

get_user_pages, invoked by the kvm spte fault, can happen between
invalidate_range and populate_range. If it can't happen, certainly
nobody has pointed out a good reason why not. The kvm page fault also
rightfully takes the mmap_sem only in readonly mode, so get_user_pages
is only called internally to gfn_to_page with the readonly semaphore.

With my approach, ptep_clear_flush was not only invalidating sptes
after clearing the pte, it was also invalidating them inside the PT
lock, so it was totally obvious there could be no race vs
get_user_pages.

> > I'm also asking myself if it's a smp race not to call
> > mmu_notifier(invalidate_page) between ptep_clear_flush and set_pte_at
> > in install_file_pte. Probably not because the guest VM running in a
> > different thread would need to serialize outside the install_file_pte
> > code with the task running install_file_pte, if it wants to be sure to
> > write either all its data to the old or the new page. Certainly doing
> > the invalidate_page inside the PT lock was obviously safe but I hope
> > this is safe and this can accommodate your needs too.
> 
> But that would be doing two invalidates on one pte. One range and one page 
> invalidate.

Yes, but it could have been micro-optimized later if you really cared,
by simply changing ptep_clear_flush to __ptep_clear_flush; no big
deal. Definitely all methods must be robust against being called
multiple times, even if the rmap finds no spte mapping such a host
virtual address.

> Hmmm... So we could only do an invalidate_page here? Drop the strange 
> invalidate_range()?

That's a question you should answer.

> > > > @@ -1676,6 +1674,8 @@ gotten:
> > > > page_cache_release(old_page);
> > > >  unlock:
> > > > pte_unmap_unlock(page_table, ptl);
> > > > +   mmu_notifier(invalidate_range, mm, address,
> > > > +   address + PAGE_SIZE - 1, 0);
> > > > if (dirty_page) {
> > > > if (vma->vm_file)
> > > > file_update_time(vma->vm_file);
> > > 
> > > Now we invalidate the page after the transaction is complete. This means 
> > > external pte can persist while we change the pte? Possibly even dirty the 
> > > page?
> > 
> > Yes, and the only reason this can be safe is for the reason explained
> > at the top of the email, if the other cpu wants to serialize to be
> > sure to write in the "new" page, it has to serialize with the
> > page-fault but to serialize it has to wait the page fault to return
> > (example: we're not going to call futex code until the page fault
> > returns).
> 
> Serialize how? mmap_sem?

No, that's a different angle.

But now I think there may be an issue with a third thread that may
show the removal of invalidate_page from ptep_clear_flush to be
unsafe.

A third thread writing to a page through the linux pte, and the guest
VM writing to the same page through the sptes, will be writing to the
same physical page concurrently, using a userspace spinlock without
ever entering the kernel. With your patch, which runs invalidate_range
after dropping the PT lock, the third thread may start writing on the
new page while the guest is still writing to the old page through the
sptes. This couldn't happen with my patch.

So really, in light of the third thread, it seems your approach is
smp-racey, and ptep_clear_flush should invalidate_page as the last
thing before returning. My patch was enforcing that ptep_clear_flush
would stop the third thread in a linux page fault, and drop the spte,
before the new mapping could be instantiated in both the linux pte and
the sptes. The PT lock provided the needed serialization. This ensured
the third thread and the guest VM would always write to the same
physical page, even if the first thread runs a flood of
remap_file_pages on that same vma.

Re: [kvm-devel] How-to use paravirt layer for network and block devices

2008-01-29 Thread Dor Laor

On Tue, 2008-01-29 at 10:50 -0700, Cameron Macdonell wrote:
> Hi,
> 
> What are the command-line options necessary to get the guest devices  
> to use the paravirt layer?
> 

For network you use '-net nic,model=virtio'; I hope to write a wiki
page for it tomorrow.

> Thanks,
> Cam
> 
> ---
> A. Cameron Macdonell
> Ph.D. Student
> Department of Computing Science
> University of Alberta
> [EMAIL PROTECTED]
> 
> 
> 
> 




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> > We invalidate the range *after* populating it? Isnt it okay to establish 
> > references while populate_range() runs?
> 
> It's not ok because that function can very well overwrite existing and
> present ptes (it's actually the nonlinear common case fast path for
> db). With your code the sptes created between invalidate_range and
> populate_range, will keep pointing forever to the old physical page
> instead of the newly populated one.

Seems though that the mmap_sem is taken for regular vmas writably and will 
hold off new mappings.

> I'm also asking myself if it's a smp race not to call
> mmu_notifier(invalidate_page) between ptep_clear_flush and set_pte_at
> in install_file_pte. Probably not because the guest VM running in a
> different thread would need to serialize outside the install_file_pte
> code with the task running install_file_pte, if it wants to be sure to
> write either all its data to the old or the new page. Certainly doing
> the invalidate_page inside the PT lock was obviously safe but I hope
> this is safe and this can accommodate your needs too.

But that would be doing two invalidates on one pte. One range and one page 
invalidate.

> > > diff --git a/mm/memory.c b/mm/memory.c
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1639,8 +1639,6 @@ gotten:
> > >   /*
> > >* Re-check the pte - we dropped the lock
> > >*/
> > > - mmu_notifier(invalidate_range, mm, address,
> > > - address + PAGE_SIZE - 1, 0);
> > >   page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > >   if (likely(pte_same(*page_table, orig_pte))) {
> > >   if (old_page) {
> > 
> > What we did is to invalidate the page (?!) before taking the pte lock. In 
> > the lock we replace the pte to point to another page. This means that we 
> > need to clear stale information. So we zap it before. If another reference 
> > is established after taking the spinlock then the pte contents have 
> > changed at the cirtical section fails.
> > 
> > Before the critical section starts we have gotten an extra refcount on the 
> > original page so the page cannot vanish from under us.
> 
> The problem is the missing invalidate_page/range _after_
> ptep_clear_flush. If a spte is built between invalidate_range and
> pte_offset_map_lock, it will remain pointing to the old page
> forever. Nothing will be called to invalidate that stale spte built
> between invalidate_page/range and ptep_clear_flush. This is why for
> the last few days I kept saying the mmu notifiers have to be invoked
> _after_ ptep_clear_flush and never before (remember the export
> notifier?). No idea how you can deal with this in your code, certainly
> for KVM sptes that's backwards and unworkable ordering of operation
> (exactly as backwards are doing the tlb flush before pte_clear in
> ptep_clear_flush, think spte as a tlb, you can't flush the tlb before
> clearing/updating the pte or it's smp unsafe).

Hmmm... So we could only do an invalidate_page here? Drop the strange 
invalidate_range()?

> 
> > > @@ -1676,6 +1674,8 @@ gotten:
> > >   page_cache_release(old_page);
> > >  unlock:
> > >   pte_unmap_unlock(page_table, ptl);
> > > + mmu_notifier(invalidate_range, mm, address,
> > > + address + PAGE_SIZE - 1, 0);
> > >   if (dirty_page) {
> > >   if (vma->vm_file)
> > >   file_update_time(vma->vm_file);
> > 
> > Now we invalidate the page after the transaction is complete. This means 
> > external pte can persist while we change the pte? Possibly even dirty the 
> > page?
> 
> Yes, and the only reason this can be safe is for the reason explained
> at the top of the email, if the other cpu wants to serialize to be
> sure to write in the "new" page, it has to serialize with the
> page-fault but to serialize it has to wait the page fault to return
> (example: we're not going to call futex code until the page fault
> returns).

Serialize how? mmap_sem?
 

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> hmm, "there" where? When I said it was taken in readonly mode I meant
> for the quoted code (it would be at the top if it wasn't cut), so I
> quote below again:
> 
> > > +   mmu_notifier(invalidate_range, mm, address,
> > > +   address + PAGE_SIZE - 1, 0);
> > > page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> > > if (likely(pte_same(*page_table, orig_pte))) {
> > > if (old_page) {
> 
> The "there" for me was do_wp_page.

Maybe we better focus on one call at a time?

> Even for the code you quoted in fremap.c, the has_write_lock is set
> to 1 _only_ for the very first time you call sys_remap_file_pages on a
> VMA. Only the transition of the VMA between linear to nonlinear
> requires the mmap in write mode. So you can be sure all fremap code
> 99% of the time is populating (overwriting) already present ptes with
> only the mmap_sem in readonly mode like do_wp_page. It would be
> unnecessary to populate the nonlinear range with the mmap in write
> mode. Only the "vma" mangling requires the mmap_sem in write mode, the
> pte modifications only requires the PT_lock + mmap_sem in read mode.
> 
> Effectively the first invocation of populate_range runs with the
> mmap_sem in write mode; I wonder why, there seems to be no good reason
> for that. I guess it's a bit that should be optimized, by calling
> downgrade_write before calling populate_range even for the first time
> the vma switches from linear to nonlinear (after the vma has been
> fully updated to the new status). But for sure all later invocations
> run populate_range with the semaphore readonly like the rest of the
> VM does when instantiating ptes in the page faults.

If it does not run in write mode then concurrent faults are permissible 
while we remap pages. Weird. Maybe we better handle this like individual
page operations? Put the invalidate_page back into zap_pte. But then there 
would be no callback w/o lock as required by Robin. Doing the 
invalidate_range after populate allows access to memory for which ptes 
were zapped and the refcount was released.

> All pins are gone by the time invalidate_page/range returns. But there
> is no critical section between invalidate_page and the _later_
> ptep_clear_flush. So get_user_pages is free to run and take the PT
> lock before the ptep_clear_flush, find the linux pte still
> instantiated, and to create a new spte, before ptep_clear_flush runs.

Hmmm... Right. Did not consider get_user_pages. A write to the page that 
is not marked dirty would typically require a fault that will serialize.




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Andrea Arcangeli wrote:

> But now I think there may be an issue with a third thread that may
> show unsafe the removal of invalidate_page from ptep_clear_flush.
> 
> A third thread writing to a page through the linux-pte and the guest
> VM writing to the same page through the sptes, will be writing on the
> same physical page concurrently and using an userspace spinlock w/o
> ever entering the kernel. With your patch that invalidate_range after
> dropping the PT lock, the third thread may start writing on the new
> page, when the guest is still writing to the old page through the
> sptes. While this couldn't happen with my patch.

A user space spinlock plays into this??? That is irrelevant to the kernel. 
And we are discussing "your" placement of the invalidate_range not mine.

This is the scenario that I described before. You just need two threads.
One thread is in do_wp_page and the other is writing through the spte. 
We are in do_wp_page, meaning the page is not writable. The writer will 
have to take a fault, which will properly serialize access. It's a bug if 
the spte would allow a write.




Re: [kvm-devel] [kvm-ppc-devel] [PATCH] Clean up KVM/QEMU interaction

2008-01-29 Thread Hollis Blanchard
On Tue, 2008-01-29 at 16:46 -0600, Anthony Liguori wrote:
> The following patch eliminates almost all uses of #ifdef USE_KVM by
> introducing a kvm_enabled() macro.  If USE_KVM is set, this macro
> evaluates to kvm_allowed. If USE_KVM isn't set, the macro evaluates to
> 0.

This is badly needed IMHO. Qemu seems to conform to the broken window
theory...

-- 
Hollis Blanchard
IBM Linux Technology Center




[kvm-devel] [PATCH]: Fix memory corruption in-kernel IOAPIC emulation

2008-01-29 Thread Chris Lalancette
All,
 Attached is a patch that fixes the first of at least a couple of migration
problems that I am running into.  Basically, using the setup I described in my
last post, I was always getting "Disabling IRQ #11" once the guest reached the
destination side, and then no further activity.  Dumping the APIC on both the
source and destination side revealed something interesting:

Source:
APIC 0x2
 (pad is 0x0)
 IOAPIC state:
base_address: 0xfec0
ioregsel: 0x2e
id:   0x0
irr:  0x0
pad:  0x0

Destination:
APIC 0x2
 (pad is 0x38)
 IOAPIC state:
base_address: 0xf2001000
ioregsel: 0x2e
id:   0x0
irr:  0x78872f3d
pad:  0x38

You'll notice that the base_address and irr are completely bogus on the
destination side.  Although KVM_CREATE_IRQCHIP does the right thing on the
destination side when first creating the "incoming" guest, the base_address and
other fields get blown away with bogus data during the restore.  The attached
patch fixes this by only restoring the bits that we know were saved on the
source side (i.e. what's in qemu/hw/apic.c:ioapic_save()).

Signed-off-by: Chris Lalancette <[EMAIL PROTECTED]>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f94a0b..b07ea3a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1314,6 +1314,9 @@ static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
 static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
 {
 	int r;
+	int i;
+	struct kvm_ioapic *kioapic;
+	struct kvm_ioapic_state *uioapic;
 
 	r = 0;
 	switch (chip->chip_id) {
@@ -1328,9 +1331,16 @@ static int kvm_vm_ioctl_set_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
 			sizeof(struct kvm_pic_state));
 		break;
 	case KVM_IRQCHIP_IOAPIC:
-		memcpy(ioapic_irqchip(kvm),
-			&chip->chip.ioapic,
-			sizeof(struct kvm_ioapic_state));
+		kioapic = ioapic_irqchip(kvm);
+		uioapic = &chip->chip.ioapic;
+
+		kioapic->id = uioapic->id;
+		kioapic->ioregsel = uioapic->ioregsel;
+
+		for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+			kioapic->redirtbl[i].bits = uioapic->redirtbl[i].bits;
+		}
+
 		break;
 	default:
 		r = -EINVAL;


Re: [kvm-devel] [PATCH] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-29 Thread Chris Lalancette
Hollis Blanchard wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -67,7 +67,9 @@ void kvm_io_bus_register_dev(struct kvm_
>  
>  struct kvm_vcpu {
>   struct kvm *kvm;
> +#ifdef CONFIG_PREEMPT_NOTIFIERS
>   struct preempt_notifier preempt_notifier;
> +#endif
>   int vcpu_id;
>   struct mutex mutex;
>   int   cpu;

Hm, this causes my build to fail on x86_64:

make -C /lib/modules/2.6.23.8-63.fc8/build M=`pwd` "$@"
make[2]: Entering directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
  LD  /tmp/kvm-userspace/kernel/built-in.o
  CC [M]  /tmp/kvm-userspace/kernel/svm.o
  CC [M]  /tmp/kvm-userspace/kernel/vmx.o
  CC [M]  /tmp/kvm-userspace/kernel/vmx-debug.o
  CC [M]  /tmp/kvm-userspace/kernel/kvm_main.o
/tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_load’:
/tmp/kvm-userspace/kernel/kvm_main.c:82: error: ‘struct kvm_vcpu’ has no member
named ‘preempt_notifier’
/tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_put’:
/tmp/kvm-userspace/kernel/kvm_main.c:91: error: ‘struct kvm_vcpu’ has no member
named ‘preempt_notifier’
/tmp/kvm-userspace/kernel/kvm_main.c: In function ‘kvm_vm_ioctl_create_vcpu’:
/tmp/kvm-userspace/kernel/kvm_main.c:749: error: ‘struct kvm_vcpu’ has no member
named ‘preempt_notifier’
/tmp/kvm-userspace/kernel/kvm_main.c: In function ‘preempt_notifier_to_vcpu’:
/tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
member named ‘preempt_notifier’
/tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: type defaults to ‘int’ in
declaration of ‘__mptr’
/tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: initialization from
incompatible pointer type
/tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
member named ‘preempt_notifier’
make[3]: *** [/tmp/kvm-userspace/kernel/kvm_main.o] Error 1
make[2]: *** [_module_/tmp/kvm-userspace/kernel] Error 2
make[2]: Leaving directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/tmp/kvm-userspace/kernel'
make: *** [kernel] Error 2

Reverting this patch makes the build succeed again.

Chris Lalancette



[kvm-devel] [PATCH] Remove unnecessary linux/kvm.h include

2008-01-29 Thread Anthony Liguori
This removes an unnecessary include of linux/kvm.h which happens to silence
a warning introduced by my previous patch :-)

Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index d798841..048054b 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -18,12 +18,6 @@
 #define __user /* temporary, until installed via make headers_install */
 #endif
 
-#if defined(__i386__) || defined(__x86_64__)
-#define CONFIG_X86
-#endif
-
-#include <linux/kvm.h>
-
 #define EXPECTED_KVM_API_VERSION 12
 
 #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 02:55:56PM -0800, Christoph Lameter wrote:
> On Tue, 29 Jan 2008, Andrea Arcangeli wrote:
> 
> > But now I think there may be an issue with a third thread that may
> > show unsafe the removal of invalidate_page from ptep_clear_flush.
> > 
> > A third thread writing to a page through the linux-pte and the guest
> > VM writing to the same page through the sptes, will be writing on the
> > same physical page concurrently and using an userspace spinlock w/o
> > ever entering the kernel. With your patch that invalidate_range after
> > dropping the PT lock, the third thread may start writing on the new
> > page, when the guest is still writing to the old page through the
> > sptes. While this couldn't happen with my patch.
> 
> A user space spinlock plays into this??? That is irrelevant to the kernel. 
> And we are discussing "your" placement of the invalidate_range not mine.

With "my" code, invalidate_range wasn't placed there at all; my
modification to ptep_clear_flush already covered it in an automatic
way. Grep for the word fremap in my latest patch and you won't find it,
like you won't find any change to do_wp_page. Not sure why you keep
thinking I added those invalidate_range calls when in fact you did.

The user space spinlock plays also in declaring rdtscp unworkable to
provide a monotone vgettimeofday w/o kernel locking.

My patch by calling invalidate_page inside ptep_clear_flush guaranteed
that both the thread writing through sptes and the thread writing
through linux ptes, couldn't possibly simultaneously write to two
different physical pages.

Your patch allows the thread writing through linux-pte to write to a
new populated page while the old thread writing through sptes still
writes to the old page. Is that safe? I don't know for sure. The fact
the physical page backing the virtual address could change back and
forth, perhaps invalidates the theory that somebody could possibly do
some useful locking out of it relying on all threads seeing the same
physical page at the same time.

Anyway as long as invalidate_page/range happens after ptep_clear_flush
things are mostly ok.

> This is the scenario that I described before. You just need two threads.
> One thread is in do_wp_page and the other is writing through the spte. 
> We are in do_wp_page. Meaning the page is not writable. The writer will 

Actually above I was describing remap_file_pages not do_wp_page.

> have to take fault which will properly serialize access. It a bug if the 
> spte would allow write.

In that scenario because write is forbidden (unlike remap_file_pages)
like you said things should be ok. The spte reader will eventually see
the updates happening in the new page, as long as the spte invalidate
happens after ptep_clear_flush (i.e. with my incremental fix applied
to your code, or with my latest patch).



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 02:39:00PM -0800, Christoph Lameter wrote:
> If it does not run in write mode then concurrent faults are permissible 
> while we remap pages. Weird. Maybe we better handle this like individual
> page operations? Put the invalidate_page back into zap_pte. But then there 
> would be no callback w/o lock as required by Robin. Doing the 

The Robin requirements and the need to schedule are the source of the
complications indeed.

I posted all the KVM patches using mmu notifiers, today I reposted the
ones to work with your V2 (which crashes my host unlike my last
simpler mmu notifier patch but I also changed a few other variable
besides your mmu notifier changes, so I can't yet be sure it's a bug
in your V2, and the SMP regressions I fixed so far sure can't explain
the crashes because my KVM setup could never run in do_wp_page nor
remap_file_pages so it's something else I need to find ASAP).

Robin, if you don't mind, could you please post or upload somewhere
your GPLv2 code that registers itself in Christoph's V2 notifiers? Or
is it top secret? I wouldn't mind to have a look so I can better
understand what's the exact reason you're sleeping besides attempting
GFP_KERNEL allocations. Thanks!

> invalidate_range after populate allows access to memory for which ptes 
> were zapped and the refcount was released.

The last refcount is released by the invalidate_range itself.
 
> > All pins are gone by the time invalidate_page/range returns. But there
> > is no critical section between invalidate_page and the _later_
> > ptep_clear_flush. So get_user_pages is free to run and take the PT
> > lock before the ptep_clear_flush, find the linux pte still
> > instantiated, and to create a new spte, before ptep_clear_flush runs.
> 
> Hmmm... Right. Did not consider get_user_pages. A write to the page that 
> is not marked dirty would typically require a fault that will serialize.

The pte is already marked dirty (and this is the case only for
get_user_pages; regular linux writes don't fault unless the pte is
explicitly write-protected, which is mandatory on a few archs, but not x86).



Re: [kvm-devel] [kvm-ppc-devel] [PATCH] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-29 Thread Hollis Blanchard
On Tue, 2008-01-29 at 18:22 -0500, Chris Lalancette wrote:
> Hollis Blanchard wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -67,7 +67,9 @@ void kvm_io_bus_register_dev(struct kvm_
> >  
> >  struct kvm_vcpu {
> > struct kvm *kvm;
> > +#ifdef CONFIG_PREEMPT_NOTIFIERS
> > struct preempt_notifier preempt_notifier;
> > +#endif
> > int vcpu_id;
> > struct mutex mutex;
> > int   cpu;
> 
> Hm, this causes my build to fail on x86_64:
> 
> make -C /lib/modules/2.6.23.8-63.fc8/build M=`pwd` "$@"
> make[2]: Entering directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
>   LD  /tmp/kvm-userspace/kernel/built-in.o
>   CC [M]  /tmp/kvm-userspace/kernel/svm.o
>   CC [M]  /tmp/kvm-userspace/kernel/vmx.o
>   CC [M]  /tmp/kvm-userspace/kernel/vmx-debug.o
>   CC [M]  /tmp/kvm-userspace/kernel/kvm_main.o
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_load’:
> /tmp/kvm-userspace/kernel/kvm_main.c:82: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_put’:
> /tmp/kvm-userspace/kernel/kvm_main.c:91: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘kvm_vm_ioctl_create_vcpu’:
> /tmp/kvm-userspace/kernel/kvm_main.c:749: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘preempt_notifier_to_vcpu’:
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
> member named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: type defaults to ‘int’ in
> declaration of ‘__mptr’
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: initialization from
> incompatible pointer type
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
> member named ‘preempt_notifier’
> make[3]: *** [/tmp/kvm-userspace/kernel/kvm_main.o] Error 1
> make[2]: *** [_module_/tmp/kvm-userspace/kernel] Error 2
> make[2]: Leaving directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/tmp/kvm-userspace/kernel'
> make: *** [kernel] Error 2

This seems to be an artifact of the hackage in external-module-compat.h,
since you're building with a pre-PREEMPT_NOTIFIERS kernel.

Maybe adding 
#define CONFIG_PREEMPT_NOTIFIERS
after
#ifndef CONFIG_PREEMPT_NOTIFIERS
in external-module-compat.h would "fix" it, since kvm_host.h would pick
up the define when it's included later.

The other hackful alternative would be this in kvm_host.h:
#ifdef CONFIG_PREEMPT_NOTIFIERS
struct preempt_notifier preempt_notifier;
#else
long preempt_notifier;
#endif

-- 
Hollis Blanchard
IBM Linux Technology Center




[kvm-devel] [PATCH] Remove -DCONFIG_X86 from qemu_cflags

2008-01-29 Thread Anthony Liguori
This is not really going to work out if we want to merge with QEMU.  We can't
have magic in QEMU that relies on some external define being set.

Since the define is needed by linux/kvm.h the solution is to define it as
needed before including linux/kvm.h.  This probably depends on my previous
patch.

Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>

diff --git a/configure b/configure
index 6b20c2f..418dbea 100755
--- a/configure
+++ b/configure
@@ -94,7 +94,7 @@ fi
 #set parameters compiling
 if [ "$arch" = "i386" -o "$arch" = "x86_64" ]; then
 target_exec="x86_64-softmmu"
-qemu_cflags="$qemu_cflags -DCONFIG_X86"
+qemu_cflags="$qemu_cflags"
 fi
 
 if [ "$arch" = "ia64" ]; then
diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index 45f58d6..d798841 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -18,6 +18,10 @@
 #define __user /* temporary, until installed via make headers_install */
 #endif
 
+#if defined(__i386__) || defined(__x86_64__)
+#define CONFIG_X86
+#endif
+
 #include <linux/kvm.h>
 
 #define EXPECTED_KVM_API_VERSION 12
diff --git a/libkvm/libkvm.h b/libkvm/libkvm.h
index 34d188b..097f520 100644
--- a/libkvm/libkvm.h
+++ b/libkvm/libkvm.h
@@ -11,6 +11,10 @@
 #define __user /* temporary, until installed via make headers_install */
 #endif
 
+#if defined(__i386__) || defined(__x86_64__)
+#define CONFIG_X86
+#endif
+
 #include <linux/kvm.h>
 
 #include 
diff --git a/qemu/hw/cirrus_vga.c b/qemu/hw/cirrus_vga.c
index 1915c73..f559def 100644
--- a/qemu/hw/cirrus_vga.c
+++ b/qemu/hw/cirrus_vga.c
@@ -2634,7 +2634,8 @@ int unset_vram_mapping(unsigned long begin, unsigned long end)
 
 return 0;
 }
-#ifdef CONFIG_X86
+
+#if defined(TARGET_I386)
 static void kvm_update_vga_alias(CirrusVGAState *s, int ok, int bank,
  unsigned long phys_addr)
 {
@@ -2675,7 +2676,7 @@ static void kvm_update_vga_aliases(CirrusVGAState *s, int ok)
 static void cirrus_update_memory_access(CirrusVGAState *s)
 {
 unsigned mode;
-#ifdef CONFIG_X86
+#if defined(TARGET_I386)
 int want_vga_alias = 0;
 #endif
 
@@ -2708,7 +2709,7 @@ static void cirrus_update_memory_access(CirrusVGAState *s)
 s->map_addr = s->cirrus_lfb_addr;
 s->map_end = s->cirrus_lfb_end;
 }
-#ifdef CONFIG_X86
+#if defined(TARGET_I386)
if (kvm_enabled()
&& !(s->cirrus_srcptr != s->cirrus_srcptr_end)
&& !((s->sr[0x07] & 0x01) == 0)
@@ -2740,7 +2741,7 @@ static void cirrus_update_memory_access(CirrusVGAState *s)
 s->cirrus_linear_write[2] = cirrus_linear_writel;
 }
 }
-#if defined(CONFIG_X86)
+#if defined(TARGET_I386)
 kvm_update_vga_aliases(s, want_vga_alias);
 #endif
 



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:

> On Wed, Jan 30, 2008 at 01:00:39AM +0100, Andrea Arcangeli wrote:
> > get_user_pages, regular linux writes don't fault unless it's
> > explicitly writeprotect, which is mandatory in a few archs, x86 not).
> 
> actually get_user_pages doesn't fault either but it calls into
> set_page_dirty, however get_user_pages (unlike a userland-write) at
> least requires mmap_sem in read mode and the PT lock as serialization,
> userland writes don't, they just go ahead and mark the pte in hardware
> w/o faults. Anyway anonymous memory these days always mapped with
> dirty bit set regardless, even for read-faults, after Nick finally
> rightfully cleaned up the zero-page trick.

That is only partially true. ptes are created write-protected in order to
track dirty state these days. The first write will lead to a fault that
switches the pte to writable. When the page undergoes writeback the page
again becomes write protected. Hence our need to deal effectively with
page_mkclean.




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Jack Steiner
On Tue, Jan 29, 2008 at 04:20:50PM -0800, Christoph Lameter wrote:
> On Wed, 30 Jan 2008, Andrea Arcangeli wrote:
> 
> > > invalidate_range after populate allows access to memory for which ptes 
> > > were zapped and the refcount was released.
> > 
> > The last refcount is released by the invalidate_range itself.
> 
> That is true for your implementation and to address Robin's issues. Jack: 
> Is that true for the GRU?

I'm not sure I understand the question. The GRU never (currently) takes
a reference on a page. It has no mechanism for tracking pages that
were exported to the external TLBs.

--- jack



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Wed, Jan 30, 2008 at 01:00:39AM +0100, Andrea Arcangeli wrote:
> get_user_pages, regular linux writes don't fault unless it's
> explicitly writeprotect, which is mandatory in a few archs, x86 not).

actually get_user_pages doesn't fault either but it calls into
set_page_dirty, however get_user_pages (unlike a userland-write) at
least requires mmap_sem in read mode and the PT lock as serialization,
userland writes don't, they just go ahead and mark the pte in hardware
w/o faults. Anyway anonymous memory these days always mapped with
dirty bit set regardless, even for read-faults, after Nick finally
rightfully cleaned up the zero-page trick.



Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:

> > A user space spinlock plays into this??? That is irrelevant to the kernel. 
> > And we are discussing "your" placement of the invalidate_range not mine.
> 
> With "my" code, invalidate_range wasn't placed there at all, my
> modification to ptep_clear_flush already covered it in a automatic
> way, grep from the word fremap in my latest patch you won't find it,
> like you won't find any change to do_wp_page. Not sure why you keep
> thinking I added those invalidate_range when infact you did.

Well you moved the code at minimum. Hmmm... according to
http://marc.info/?l=linux-kernel&m=120114755620891&w=2 it was Robin.

> The user space spinlock plays also in declaring rdtscp unworkable to
> provide a monotone vgettimeofday w/o kernel locking.

No idea what you are talking about.

> My patch by calling invalidate_page inside ptep_clear_flush guaranteed
> that both the thread writing through sptes and the thread writing
> through linux ptes, couldn't possibly simultaneously write to two
> different physical pages.

But then the ptep_clear_flush will issue invalidate_page() for ranges 
that were already covered by invalidate_range(). There are multiple calls 
to clear the same spte.
>
> Your patch allows the thread writing through linux-pte to write to a
> new populated page while the old thread writing through sptes still
> writes to the old page. Is that safe? I don't know for sure. The fact
> the physical page backing the virtual address could change back and
> forth, perhaps invalidates the theory that somebody could possibly do
> some useful locking out of it relaying on all threads seeing the same
> physical page at the same time.

This is referring to the remap issue, not do_wp_page, right?

> Actually above I was describing remap_file_pages not do_wp_page.

Ok.

The serialization of remap_file_pages does not seem that critical since we 
only take a read lock on mmap_sem here. There may already be concurrent 
access to pages from other processors while the ptes are remapped. So 
there is already some overlap.

We could take mmap_sem there writably and keep it writably for the case 
that we have an mmu notifier in the mm.





Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Tue, 29 Jan 2008, Jack Steiner wrote:

> > That is true for your implementation and to address Robin's issues. Jack: 
> > Is that true for the GRU?
> 
> I'm not sure I understand the question. The GRU never (currently) takes
> a reference on a page. It has no mechanism for tracking pages that
> were exported to the external TLBs.

That's what I was looking for. Thanks. KVM takes a refcount and so does 
XPmem.




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
On Wed, 30 Jan 2008, Andrea Arcangeli wrote:

> > invalidate_range after populate allows access to memory for which ptes 
> > were zapped and the refcount was released.
> 
> The last refcount is released by the invalidate_range itself.

That is true for your implementation and to address Robin's issues. Jack: 
Is that true for the GRU?




Re: [kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Andrea Arcangeli
On Tue, Jan 29, 2008 at 04:22:46PM -0800, Christoph Lameter wrote:
> That is only partially true. These days ptes are created write-protected in 
> order to track dirty state. The first write will lead to a fault that 
> switches the pte to writable. When the page undergoes writeback the page 
> again becomes write-protected. Thus our need to deal effectively with 
> page_mkclean.

Well I was talking about anonymous memory.
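For readers following the thread, the write-protect dirty-tracking cycle Christoph describes can be modeled in a few lines. This is a toy userspace sketch with invented names (`toy_pte`, `toy_write_fault`, `toy_mkclean`), not the kernel API:

```c
#include <stdbool.h>

/* Toy model of the dirty-tracking cycle: file-backed ptes start
 * write-protected; the first store faults and the handler makes the
 * pte writable and marks the page dirty; writeback (page_mkclean)
 * write-protects it again for the next cycle. */
struct toy_pte { bool writable; bool dirty; };

static void toy_write_fault(struct toy_pte *pte)
{
    pte->writable = true;   /* switch the pte to writable on first write */
    pte->dirty = true;      /* the page is now dirty */
}

static void toy_mkclean(struct toy_pte *pte)
{
    pte->writable = false;  /* write-protect again after writeback */
    pte->dirty = false;     /* the page has been cleaned */
}
```

Since page_mkclean write-protects the pte again after writeback, a subsystem holding external ptes needs a callback at that point too, which is what the discussion above is about.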



Re: [kvm-devel] [PATCH] Remove unnecessary linux/kvm.h include

2008-01-29 Thread Anthony Liguori
This patch sucks.  Let me finish up playing around with this stuff and 
I'll send out a better one.

Regards,

Anthony Liguori

Anthony Liguori wrote:
> This removes an unnecessary include of linux/kvm.h which happens to silence
> a warning introduced by my previous patch :-)
>
> Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>
>
> diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
> index d798841..048054b 100644
> --- a/libkvm/libkvm.c
> +++ b/libkvm/libkvm.c
> @@ -18,12 +18,6 @@
>  #define __user /* temporary, until installed via make headers_install */
>  #endif
>
> -#if defined(__i386__) || defined(__x86_64__)
> -#define CONFIG_X86
> -#endif
> -
> -#include 
> -
>  #define EXPECTED_KVM_API_VERSION 12
>
>  #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
>   




Re: [kvm-devel] [PATCH] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-29 Thread Hollis Blanchard

On Tue, 2008-01-29 at 18:22 -0500, Chris Lalancette wrote:
> Hollis Blanchard wrote:
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -67,7 +67,9 @@ void kvm_io_bus_register_dev(struct kvm_
> >  
> >  struct kvm_vcpu {
> > struct kvm *kvm;
> > +#ifdef CONFIG_PREEMPT_NOTIFIERS
> > struct preempt_notifier preempt_notifier;
> > +#endif
> > int vcpu_id;
> > struct mutex mutex;
> > int   cpu;
> 
> Hm, this causes my build to fail on x86_64:
> 
> make -C /lib/modules/2.6.23.8-63.fc8/build M=`pwd` "$@"
> make[2]: Entering directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
>   LD  /tmp/kvm-userspace/kernel/built-in.o
>   CC [M]  /tmp/kvm-userspace/kernel/svm.o
>   CC [M]  /tmp/kvm-userspace/kernel/vmx.o
>   CC [M]  /tmp/kvm-userspace/kernel/vmx-debug.o
>   CC [M]  /tmp/kvm-userspace/kernel/kvm_main.o
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_load’:
> /tmp/kvm-userspace/kernel/kvm_main.c:82: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘vcpu_put’:
> /tmp/kvm-userspace/kernel/kvm_main.c:91: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘kvm_vm_ioctl_create_vcpu’:
> /tmp/kvm-userspace/kernel/kvm_main.c:749: error: ‘struct kvm_vcpu’ has no 
> member
> named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c: In function ‘preempt_notifier_to_vcpu’:
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
> member named ‘preempt_notifier’
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: type defaults to ‘int’ in
> declaration of ‘__mptr’
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: warning: initialization from
> incompatible pointer type
> /tmp/kvm-userspace/kernel/kvm_main.c:1284: error: ‘struct kvm_vcpu’ has no
> member named ‘preempt_notifier’
> make[3]: *** [/tmp/kvm-userspace/kernel/kvm_main.o] Error 1
> make[2]: *** [_module_/tmp/kvm-userspace/kernel] Error 2
> make[2]: Leaving directory `/usr/src/kernels/2.6.23.8-63.fc8-x86_64'
> make[1]: *** [all] Error 2
> make[1]: Leaving directory `/tmp/kvm-userspace/kernel'
> make: *** [kernel] Error 2
> 
> Reverting this patch makes the build succeed again.
> 
> Chris Lalancette

Actually, I think this should do the trick:


Always use CONFIG_PREEMPT_NOTIFIERS in the external module hack.

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>

diff --git a/kernel/hack-module.awk b/kernel/hack-module.awk
--- a/kernel/hack-module.awk
+++ b/kernel/hack-module.awk
@@ -42,6 +42,8 @@
 
 { sub(/linux\/mm_types\.h/, "linux/mm.h") }
 
+/#ifdef CONFIG_PREEMPT_NOTIFIERS/ { $0 = "#if 1" }
+
 { print }
 
 /kvm_x86_ops->run/ {


-- 
Hollis Blanchard
IBM Linux Technology Center




Re: [kvm-devel] [kvm-ppc-devel] [PATCH] Clean up KVM/QEMU interaction

2008-01-29 Thread Zhang, Xiantao
Anthony Liguori wrote:
> This patch attempts to clean up the interactions between KVM and
> QEMU.  Sorry 
> for such a big patch, but I don't think there's a better way to
> approach this 
> such that it's still bisect friendly.  I think this is most of what's
> needed to 
> get basic KVM support into QEMU though.
> 
> Right now, there's a mix of #ifdef USE_KVM, if (kvm_allowed), and
> various 
> extern declarations.  It's all pretty fugly and there's a lot of
> mistakes due 
> to it.
> 
> The following patch eliminates almost all uses of #ifdef USE_KVM by
> introducing 
> a kvm_enabled() macro.  If USE_KVM is set, this macro evaluates to
> kvm_allowed. 
> If USE_KVM isn't set, the macro evaluates to 0.
> 
> Since GCC eliminates if (0) blocks, this is just as good as using
> #ifdef.  By 
> making sure that we never call into libkvm directly from QEMU, we can
> also just 
> not link qemu-kvm when USE_KVM isn't set instead of having the entire
> file 
> wrapped in a USE_KVM.
> 
> We also change the --enable-kvm configure option to --disable-kvm
> since KVM is 
> enabled by default.
> 
> I've tested this patch on x86 with 32-bit and 64-bit Linux guests and
> a 32-bit 
> Windows guest.  I've also tested with USE_KVM not set.  Jerone has
> also 
> verified that it doesn't break PPC.  My apologies if it breaks ia64 but I
> have no 
> way of testing that.

Hi, Anthony 
Good patch indeed! I have checked the ia64 side, and it shouldn't
break ia64. Thanks!
Xiantao
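The kvm_enabled() trick Anthony describes -- a macro that collapses to the constant 0 when USE_KVM is not set, so GCC eliminates the dead branches -- can be sketched like this. USE_KVM, kvm_allowed, and kvm_enabled() are the names from the mail; cpu_exec_mode() is an invented example call site:

```c
/* Sketch of the kvm_enabled() pattern from the patch description.
 * With USE_KVM defined it reads the runtime flag; without it the
 * macro is the constant 0, so "if (kvm_enabled())" blocks compile
 * away entirely -- no #ifdef needed at the call sites. */
#ifdef USE_KVM
static int kvm_allowed = 1;
#define kvm_enabled() (kvm_allowed)
#else
#define kvm_enabled() (0)
#endif

static const char *cpu_exec_mode(void)
{
    if (kvm_enabled())          /* dead code when USE_KVM is not set */
        return "kvm";
    return "tcg";
}
```

Because the compiler still parses the if (0) body, this also keeps the KVM call sites type-checked even in non-KVM builds, which plain #ifdef does not.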



[kvm-devel] [patch 5/6] mmu_notifier: Callbacks for xip_filemap.c

2008-01-29 Thread Christoph Lameter
Problem for external rmaps: There is no pagelock held on the page.

Signed-off-by: Robin Holt <[EMAIL PROTECTED]>

---
 mm/filemap_xip.c |5 +
 1 file changed, 5 insertions(+)

Index: linux-2.6/mm/filemap_xip.c
===
--- linux-2.6.orig/mm/filemap_xip.c 2008-01-25 19:39:04.0 -0800
+++ linux-2.6/mm/filemap_xip.c  2008-01-25 19:39:06.0 -0800
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include <linux/mmu_notifier.h>
 #include 
 #include 
 
@@ -183,6 +184,9 @@ __xip_unmap (struct address_space * mapp
if (!page)
return;
 
+   if (PageExternalRmap(page))
+   mmu_rmap_notifier(invalidate_page, page);
+
spin_lock(&mapping->i_mmap_lock);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
mm = vma->vm_mm;
@@ -194,6 +198,7 @@ __xip_unmap (struct address_space * mapp
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
page_remove_rmap(page, vma);
dec_mm_counter(mm, file_rss);
BUG_ON(pte_dirty(pteval));

-- 



[kvm-devel] [patch 3/6] mmu_notifier: invalidate_page callbacks for subsystems with rmap

2008-01-29 Thread Christoph Lameter
Callbacks to remove individual pages if the subsystem has an
rmap capability. The pagelock is held but no spinlocks are held.
The refcount of the page is elevated so that dropping the refcount
in the subsystem will not directly free the page.

The callbacks occur after the Linux rmaps have been walked.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |6 ++
 1 file changed, 6 insertions(+)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c2008-01-25 14:24:19.0 -0800
+++ linux-2.6/mm/rmap.c 2008-01-25 14:24:38.0 -0800
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include <linux/mmu_notifier.h>
 
 #include 
 
@@ -473,6 +474,8 @@ int page_mkclean(struct page *page)
struct address_space *mapping = page_mapping(page);
if (mapping) {
ret = page_mkclean_file(mapping, page);
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
if (page_test_dirty(page)) {
page_clear_dirty(page);
ret = 1;
@@ -971,6 +974,9 @@ int try_to_unmap(struct page *page, int 
else
ret = try_to_unmap_file(page, migration);
 
+   if (unlikely(PageExternalRmap(page)))
+   mmu_rmap_notifier(invalidate_page, page);
+
if (!page_mapped(page))
ret = SWAP_SUCCESS;
return ret;

-- 



[kvm-devel] [patch 6/6] mmu_notifier: Add invalidate_all()

2008-01-29 Thread Christoph Lameter
When a task exits we can remove all external ptes at once. At that point the
external mmu may also unregister itself from the mmu notifier chain to avoid
future calls.

Note the complications because of RCU. Other processors may not see that the
notifier was unlinked until a quiescent period has passed!

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 include/linux/mmu_notifier.h |4 
 mm/mmap.c|1 +
 2 files changed, 5 insertions(+)

Index: linux-2.6/include/linux/mmu_notifier.h
===
--- linux-2.6.orig/include/linux/mmu_notifier.h 2008-01-28 14:02:18.0 -0800
+++ linux-2.6/include/linux/mmu_notifier.h  2008-01-28 14:15:49.0 -0800
@@ -62,6 +62,10 @@ struct mmu_notifier_ops {
struct mm_struct *mm,
unsigned long address);
 
+   /* Dummy needed because the mmu_notifier() macro requires it */
+   void (*invalidate_all)(struct mmu_notifier *mn, struct mm_struct *mm,
+   int dummy);
+
/*
 * lock indicates that the function is called under spinlock.
 */
Index: linux-2.6/mm/mmap.c
===
--- linux-2.6.orig/mm/mmap.c2008-01-28 14:15:49.0 -0800
+++ linux-2.6/mm/mmap.c 2008-01-28 14:15:49.0 -0800
@@ -2034,6 +2034,7 @@ void exit_mmap(struct mm_struct *mm)
unsigned long end;
 
/* mm's last user has gone, and its about to be pulled down */
+   mmu_notifier(invalidate_all, mm, 0);
arch_exit_mmap(mm);
 
lru_add_drain();

-- 
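The mmu_notifier(invalidate_all, mm, 0) call added above expands to a walk over the notifiers registered on the mm, invoking the named hook on each. A minimal userspace sketch of that dispatch idea, using a plain singly linked list and invented toy_* names (the real code uses an hlist traversed under RCU):

```c
#include <stddef.h>

/* Userspace sketch of the notifier-chain dispatch behind the
 * mmu_notifier(op, mm, ...) macro: walk the registered notifiers
 * and call the named hook on each one that provides it. */
struct toy_mm;

struct toy_notifier {
    struct toy_notifier *next;
    void (*invalidate_all)(struct toy_notifier *n, struct toy_mm *mm);
    int calls;  /* instrumentation for the sketch only */
};

struct toy_mm { struct toy_notifier *head; };

static void toy_register(struct toy_mm *mm, struct toy_notifier *n)
{
    n->next = mm->head;     /* push onto the mm's notifier list */
    mm->head = n;
}

/* Equivalent of mmu_notifier(invalidate_all, mm, 0) */
static void toy_invalidate_all(struct toy_mm *mm)
{
    struct toy_notifier *n;

    for (n = mm->head; n; n = n->next)
        if (n->invalidate_all)
            n->invalidate_all(n, mm);
}

static void count_cb(struct toy_notifier *n, struct toy_mm *mm)
{
    (void)mm;
    n->calls++;             /* a real subsystem would drop its ptes here */
}
```

The RCU complication the changelog mentions is absent here: in the kernel, a notifier removed from the list may still be seen by concurrent walkers until a quiescent period passes, which is why release must go through call_rcu().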



[kvm-devel] [patch 4/6] MMU notifier: invalidate_page callbacks using Linux rmaps

2008-01-29 Thread Christoph Lameter
These notifiers here use the Linux rmaps to perform the callbacks.
In order to walk the rmaps locks must be held. Callbacks can therefore
only operate in an atomic context.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/rmap.c |   12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c2008-01-29 16:58:25.0 -0800
+++ linux-2.6/mm/rmap.c 2008-01-29 16:58:39.0 -0800
@@ -285,7 +285,8 @@ static int page_referenced_one(struct pa
if (!pte)
goto out;
 
-   if (ptep_clear_flush_young(vma, address, pte))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
referenced++;
 
/* Pretend the page is referenced if the task has the
@@ -435,6 +436,7 @@ static int page_mkclean_one(struct page 
 
flush_cache_page(vma, address, pte_pfn(*pte));
entry = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
entry = pte_wrprotect(entry);
entry = pte_mkclean(entry);
set_pte_at(mm, address, pte, entry);
@@ -680,7 +682,8 @@ static int try_to_unmap_one(struct page 
 * skipped over this mm) then we should reactivate it.
 */
if (!migration && ((vma->vm_flags & VM_LOCKED) ||
-   (ptep_clear_flush_young(vma, address, pte {
+   (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address {
ret = SWAP_FAIL;
goto out_unmap;
}
@@ -688,6 +691,7 @@ static int try_to_unmap_one(struct page 
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* Move the dirty bit to the physical page now the pte is gone. */
if (pte_dirty(pteval))
@@ -812,12 +816,14 @@ static void try_to_unmap_cluster(unsigne
page = vm_normal_page(vma, address, *pte);
BUG_ON(!page || PageAnon(page));
 
-   if (ptep_clear_flush_young(vma, address, pte))
+   if (ptep_clear_flush_young(vma, address, pte) |
+   mmu_notifier_age_page(mm, address))
continue;
 
/* Nuke the page table entry. */
flush_cache_page(vma, address, pte_pfn(*pte));
pteval = ptep_clear_flush(vma, address, pte);
+   mmu_notifier(invalidate_page, mm, address);
 
/* If nonlinear, store the file page offset in the pte. */
if (page->index != linear_page_index(vma, address))

-- 



[kvm-devel] [patch 1/6] mmu_notifier: Core code

2008-01-29 Thread Christoph Lameter
Core code for mmu notifiers.

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>

---
 include/linux/list.h |   14 ++
 include/linux/mm_types.h |6 +
 include/linux/mmu_notifier.h |  210 +++
 include/linux/page-flags.h   |   10 ++
 kernel/fork.c|2 
 mm/Kconfig   |4 
 mm/Makefile  |1 
 mm/mmap.c|2 
 mm/mmu_notifier.c|  101 
 9 files changed, 350 insertions(+)

Index: linux-2.6/include/linux/mm_types.h
===
--- linux-2.6.orig/include/linux/mm_types.h 2008-01-29 16:56:33.0 -0800
+++ linux-2.6/include/linux/mm_types.h  2008-01-29 16:56:36.0 -0800
@@ -153,6 +153,10 @@ struct vm_area_struct {
 #endif
 };
 
+struct mmu_notifier_head {
+   struct hlist_head head;
+};
+
 struct mm_struct {
struct vm_area_struct * mmap;   /* list of VMAs */
struct rb_root mm_rb;
@@ -219,6 +223,8 @@ struct mm_struct {
/* aio bits */
rwlock_tioctx_list_lock;
struct kioctx   *ioctx_list;
+
+   struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
 };
 
 #endif /* _LINUX_MM_TYPES_H */
Index: linux-2.6/include/linux/mmu_notifier.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6/include/linux/mmu_notifier.h  2008-01-29 16:56:36.0 -0800
@@ -0,0 +1,210 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+/*
+ * MMU notifier
+ *
+ * Notifier functions for hardware and software that establish external
+ * references to pages of a Linux system. The notifier calls ensure that
+ * the external mappings are removed when the Linux VM removes memory ranges
+ * or individual pages from a process.
+ *
+ * These fall into two classes
+ *
+ * 1. mmu_notifier
+ *
+ * These are callbacks registered with an mm_struct. If mappings are
+ * removed from an address space then callbacks are performed.
+ * Spinlocks must be held in order to walk the reverse maps, and the
+ * notifications are performed while the spinlock is held.
+ *
+ *
+ * 2. mmu_rmap_notifier
+ *
+ * Callbacks for subsystems that provide their own rmaps. These
+ * need to walk their own rmaps for a page. The invalidate_page
+ * callback is outside of locks so that we are not in a strictly
+ * atomic context (but we may be in a PF_MEMALLOC context if the
+ * notifier is called from reclaim code) and are able to sleep.
+ * Rmap notifiers need an extra page bit and are only available
+ * on 64 bit platforms. It is up to the subsystem to mark pages
+ * as PageExternalRmap as needed to trigger the callbacks. Pages
+ * must be marked dirty if dirty bits are set in the external
+ * pte.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+struct mmu_notifier_ops;
+
+struct mmu_notifier {
+   struct hlist_node hlist;
+   const struct mmu_notifier_ops *ops;
+};
+
+struct mmu_notifier_ops {
+   /*
+* Note: The mmu_notifier structure must be released with
+* call_rcu() since other processors are only guaranteed to
+* see the changes after a quiescent period.
+*/
+   void (*release)(struct mmu_notifier *mn,
+   struct mm_struct *mm);
+
+   int (*age_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+
+   void (*invalidate_page)(struct mmu_notifier *mn,
+   struct mm_struct *mm,
+   unsigned long address);
+
+   /*
+* lock indicates that the function is called under spinlock.
+*/
+   void (*invalidate_range)(struct mmu_notifier *mn,
+struct mm_struct *mm,
+unsigned long start, unsigned long end,
+int lock);
+};
+
+struct mmu_rmap_notifier_ops;
+
+struct mmu_rmap_notifier {
+   struct hlist_node hlist;
+   const struct mmu_rmap_notifier_ops *ops;
+};
+
+struct mmu_rmap_notifier_ops {
+   /*
+* Called with the page lock held after ptes are modified or removed
+* so that a subsystem with its own rmap's can remove remote ptes
+* mapping a page.
+*/
+   void (*invalidate_page)(struct mmu_rmap_notifier *mrn,
+   struct page *page);
+};
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the notifier is guaranteed to be visible to all threads
+ */
+extern void __mmu_notifier_register(struct mmu_notifier *mn,
+   

[kvm-devel] [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-01-29 Thread Christoph Lameter
The invalidation of address ranges in a mm_struct needs to be
performed when pages are removed or permissions etc change.
Most of the VM address space changes can use the range invalidate
callback.

invalidate_range() is generally called with mmap_sem held but
no spinlocks are active. If invalidate_range() is called with
locks held then we pass a flag into invalidate_range().

Comments state that mmap_sem must be held for
remap_pfn_range() but various drivers do not seem to do this.

Signed-off-by: Andrea Arcangeli <[EMAIL PROTECTED]>
Signed-off-by: Robin Holt <[EMAIL PROTECTED]>
Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

---
 mm/fremap.c  |2 ++
 mm/hugetlb.c |2 ++
 mm/memory.c  |   11 +--
 mm/mmap.c|1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6/mm/fremap.c
===
--- linux-2.6.orig/mm/fremap.c  2008-01-29 16:56:33.0 -0800
+++ linux-2.6/mm/fremap.c   2008-01-29 16:59:24.0 -0800
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include <linux/mmu_notifier.h>
 
 #include 
 #include 
@@ -212,6 +213,7 @@ asmlinkage long sys_remap_file_pages(uns
}
 
err = populate_range(mm, vma, start, size, pgoff);
+   mmu_notifier(invalidate_range, mm, start, start + size, 0);
if (!err && !(flags & MAP_NONBLOCK)) {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
Index: linux-2.6/mm/memory.c
===
--- linux-2.6.orig/mm/memory.c  2008-01-29 16:56:33.0 -0800
+++ linux-2.6/mm/memory.c   2008-01-29 16:59:24.0 -0800
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include <linux/mmu_notifier.h>
 
 #include 
 #include 
@@ -891,6 +892,8 @@ unsigned long zap_page_range(struct vm_a
end = unmap_vmas(&tlb, vma, address, end, &nr_accounted, details);
if (tlb)
tlb_finish_mmu(tlb, address, end);
+   mmu_notifier(invalidate_range, mm, address, end,
+   (details ? (details->i_mmap_lock != NULL)  : 0));
return end;
 }
 
@@ -1319,7 +1322,7 @@ int remap_pfn_range(struct vm_area_struc
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + PAGE_ALIGN(size);
+   unsigned long start = addr, end = addr + PAGE_ALIGN(size);
struct mm_struct *mm = vma->vm_mm;
int err;
 
@@ -1360,6 +1363,7 @@ int remap_pfn_range(struct vm_area_struc
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range, mm, start, end, 0);
return err;
 }
 EXPORT_SYMBOL(remap_pfn_range);
@@ -1443,7 +1447,7 @@ int apply_to_page_range(struct mm_struct
 {
pgd_t *pgd;
unsigned long next;
-   unsigned long end = addr + size;
+   unsigned long start = addr, end = addr + size;
int err;
 
BUG_ON(addr >= end);
@@ -1454,6 +1458,7 @@ int apply_to_page_range(struct mm_struct
if (err)
break;
} while (pgd++, addr = next, addr != end);
+   mmu_notifier(invalidate_range, mm, start, end, 0);
return err;
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);
@@ -1669,6 +1674,8 @@ gotten:
page_cache_release(old_page);
 unlock:
pte_unmap_unlock(page_table, ptl);
+   mmu_notifier(invalidate_range, mm, address,
+   address + PAGE_SIZE - 1, 0);
if (dirty_page) {
if (vma->vm_file)
file_update_time(vma->vm_file);
Index: linux-2.6/mm/mmap.c
===
--- linux-2.6.orig/mm/mmap.c2008-01-29 16:56:36.0 -0800
+++ linux-2.6/mm/mmap.c 2008-01-29 16:58:15.0 -0800
@@ -1748,6 +1748,7 @@ static void unmap_region(struct mm_struc
free_pgtables(&tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 next? next->vm_start: 0);
tlb_finish_mmu(tlb, start, end);
+   mmu_notifier(invalidate_range, mm, start, end, 0);
 }
 
 /*
Index: linux-2.6/mm/hugetlb.c
===
--- linux-2.6.orig/mm/hugetlb.c 2008-01-29 16:56:33.0 -0800
+++ linux-2.6/mm/hugetlb.c  2008-01-29 16:58:15.0 -0800
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include <linux/mmu_notifier.h>
 
 #include 
 #include 
@@ -763,6 +764,7 @@ void __unmap_hugepage_range(struct vm_ar
}
spin_unlock(&mm->page_table_lock);
flush_tlb_range(vma, start, end);
+   mmu_notifier(invalidate_range, mm, start, end, 1);
list_for_each_entry_safe(page, tmp, &page_list, lru) {
list_del(&page->lru);
put_page(page);

-- 


[kvm-devel] [patch 0/6] [RFC] MMU Notifiers V3

2008-01-29 Thread Christoph Lameter
This is a patchset implementing MMU notifier callbacks based on Andrea's
earlier work. These are needed if Linux pages are referenced by something
other than what is tracked by the kernel's rmaps. The known immediate users are

KVM (establishes a refcount to the page. External references called spte)

GRU (simple TLB shootdown without refcount. Has its own pagetable/tlb)

XPmem (uses its own reverse mappings and refcount. Remote ptes, Needs
to sleep when sending messages)

Issues:

- Feedback from users of the callbacks for KVM, RDMA, XPmem and GRU
  Early tests with the GRU were successful.

- Pages may be freed before the external mapping are torn down
  through invalidate_range() if no refcount on the page is taken.
  There is the chance that page content may be visible after
  they have been reallocated (mainly an issue for the GRU that
  takes no refcount).

- invalidate_range() callbacks are sometimes called under i_mmap_lock.
  These need to be dealt with or XPmem needs to be able to work around
  these.

- filemap_xip.c does not follow conventions for Rmap callbacks.
  We could depend on XIP support not being active to avoid the issue.

Things that we leave as is:

- RCU quiescent periods are required on registering and unregistering
  notifiers to guarantee visibility to other processors.
  Currently only mmu_notifier_release() does the correct thing.
  It is up to the user to provide RCU quiescent periods for
  register/unregister functions if they are called outside of the
  ->release method.

Andrea's mmu_notifier #4 -> RFC V1

- Merge subsystem rmap based with Linux rmap based approach
- Move Linux rmap based notifiers out of macro
- Try to account for what locks are held while the notifiers are
  called.
- Develop a patch sequence that separates out the different types of
  hooks so that we can review their use.
- Avoid adding include to linux/mm_types.h
- Integrate RCU logic suggested by Peter.

V1->V2:
- Improve RCU support
- Use mmap_sem for mmu_notifier register / unregister
- Drop invalidate_page from COW, mm/fremap.c and mm/rmap.c since we
  already have invalidate_range() callbacks there.
- Clean compile for !MMU_NOTIFIER
- Isolate filemap_xip strangeness into its own diff
- Pass a flag to invalidate_range() to indicate whether a spinlock
  is held.
- Add invalidate_all()

V2->V3:
- Further RCU fixes
- Fixes from Andrea to fixup aging and move invalidate_range() in do_wp_page
  and sys_remap_file_pages() after the pte clearing.

-- 



Re: [kvm-devel] [PATCH] Making SLIRP code more 64-bit clean

2008-01-29 Thread Zhang, Xiantao
Scott Pakin wrote:
> The attached patch corrects a bug in qemu/slirp/tcp_var.h that defines
> the seg_next field in struct tcpcb to be 32 bits wide regardless of
> 32/64-bitness.  seg_next is assigned a pointer value in
> qemu/slirp/tcp_subr.c, then cast back to a pointer in
> qemu/slirp/tcp_input.c and dereferenced.  That produces a SIGSEGV on
> my system. 


I still hit it on the IA64 platform with your patch, once it is configured
with slirp.
Thanks
Xiantao



[kvm-devel] [PATCH] Remove unnecessary linux/kvm.h include (v2)

2008-01-29 Thread Anthony Liguori
This removes an unnecessary include of linux/kvm.h which happens to silence
a warning introduced by my previous patch :-)  We also have to move the ABI
check until after we've included libkvm.h.

Signed-off-by: Anthony Liguori <[EMAIL PROTECTED]>

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index d798841..28df774 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -18,18 +18,6 @@
 #define __user /* temporary, until installed via make headers_install */
 #endif
 
-#if defined(__i386__) || defined(__x86_64__)
-#define CONFIG_X86
-#endif
-
-#include 
-
-#define EXPECTED_KVM_API_VERSION 12
-
-#if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
-#error libkvm: userspace and kernel version mismatch
-#endif
-
 #include 
 #include 
 #include 
@@ -40,6 +28,12 @@
 #include 
 #include "libkvm.h"
 
+#define EXPECTED_KVM_API_VERSION 12
+
+#if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
+#error libkvm: userspace and kernel version mismatch
+#endif
+
 #if defined(__x86_64__) || defined(__i386__)
 #include "kvm-x86.h"
 #endif
