Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Ulrich Drepper

Izik Eidus wrote:
 + if (!kvm_x86_ops->dirty_bit_support()) {
 + 	spin_lock(&kvm->mmu_lock);
 + 	/* remove_write_access() flushes the tlb */
 + 	kvm_mmu_slot_remove_write_access(kvm, log->slot);
 + 	spin_unlock(&kvm->mmu_lock);
 + } else {
 + 	kvm_flush_remote_tlbs(kvm);

It might not correspond to the common style, but I think a callback
function ->dirty_bit_support is overkill.  This is a function pointer
the compiler cannot see through, hence it's an indirect function call.
But the implementation is always a simple yes/no (it seems).  Indirect
calls are rather expensive (most of the time they cannot be predicted
correctly).

Why not instead have a read-only data constant and an inline function
that tests that value?  That means no function call and only one data
access.


Also, you're inconsistent in the use of integers and true/false in the
implementations of this function.  Either use 0/1 or false/true.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Izik Eidus

Avi Kivity wrote:

Izik Eidus wrote:
change the dirty page tracking to work with the dirty bit instead of
page faults.
right now the dirty page tracking works with the help of page faults:
when we want to track a page for being dirty, we write protect it and
mark it dirty when we get a write page fault. this code moves to
looking at the dirty bit of the spte.

  


I'm concerned about performance during the later stages of live 
migration.  Even if only 1000 pages are dirty, you still have to look 
at 2,000,000 or more ptes (for an 8GB guest).  That's a lot of overhead.


I think we need to use the page table hierarchy, write protect the 
upper page table so we know which page tables we need to look at.





Great idea, so I add another bitmap for the page directory?
+static int vmx_dirty_bit_support(void)
+{
+	return false;
+}
  


It's false only when ept is enabled.



Yeah, I found that out already.



Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Avi Kivity

Izik Eidus wrote:

Avi Kivity wrote:

Izik Eidus wrote:
change the dirty page tracking to work with the dirty bit instead of
page faults.
right now the dirty page tracking works with the help of page faults:
when we want to track a page for being dirty, we write protect it and
mark it dirty when we get a write page fault. this code moves to
looking at the dirty bit of the spte.

  


I'm concerned about performance during the later stages of live 
migration.  Even if only 1000 pages are dirty, you still have to look 
at 2,000,000 or more ptes (for an 8GB guest).  That's a lot of overhead.


I think we need to use the page table hierarchy, write protect the 
upper page table so we know which page tables we need to look at.





Great idea, so I add another bitmap for the page directory?


No, why?

You need to drop write access to the shadow root ptes.  When you get a 
fault, restore write access to the root ptes, but drop access from the 
L3 ptes, and so on until you reach the L1 ptes.  There you clear the 
dirty bits, and add the page to a list of pages that need to be checked 
for dirty bits.  This way you only check ptes that have a chance to be 
dirty.


I'm not sure that will be faster, but there's a good chance.



--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Marcelo Tosatti
On Thu, Jun 11, 2009 at 02:27:46PM +0300, Izik Eidus wrote:
 Marcelo Tosatti wrote:


 What i'm saying is with shadow and NPT (i believe) you can mark a spte
 writable but not dirty, which gives you the ability to know whether
 certain pages have been dirtied.
   

 Isn't this what this patch is doing?

Yes, I was confused for some reason I don't remember.

So making the dirty bit available to the host is a good idea, but one
would have to check things like faults on out-of-sync pagetables
(where the guest dirty bit might be cleared in parallel; maybe it's
OK, but I'm not sure), verify that the transfer of the dirty bit when
zapping is consistent everywhere, etc.

So it would be nicer to introduce an optimization to the way dirty bit
info is acquired, and then use that to optimize kvm's dirty log ioctl.

The link with KSM was that you can consult this dirty info, which is
fast, to know whether the content of pages has changed. But it may be
useless, I don't know.



Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-10 Thread Izik Eidus

Izik Eidus wrote:

+static int vmx_dirty_bit_support(void)
+{
+   return false;
+}
+
  



Again, an idiotic bug: this should be:
	return tdp_enable == false;


...




Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-10 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 07:23:25PM +0300, Izik Eidus wrote:
 change the dirty page tracking to work with the dirty bit instead of page faults.
 right now the dirty page tracking works with the help of page faults: when we
 want to track a page for being dirty, we write protect it and mark it dirty
 when we get a write page fault. this code moves to looking at the dirty bit
 of the spte.
 
 Signed-off-by: Izik Eidus iei...@redhat.com
 ---
  arch/ia64/kvm/kvm-ia64.c|4 +++
  arch/powerpc/kvm/powerpc.c  |4 +++
  arch/s390/kvm/kvm-s390.c|4 +++
  arch/x86/include/asm/kvm_host.h |3 ++
  arch/x86/kvm/mmu.c  |   42 --
  arch/x86/kvm/svm.c  |7 ++
  arch/x86/kvm/vmx.c  |7 ++
  arch/x86/kvm/x86.c  |   26 ---
  include/linux/kvm_host.h|1 +
  virt/kvm/kvm_main.c |6 -
  10 files changed, 96 insertions(+), 8 deletions(-)
 
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index 3199221..5914128 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -1809,6 +1809,10 @@ void kvm_arch_exit(void)
   kvm_vmm_info = NULL;
  }
  
 +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 +{
 +}
 +
  static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
   struct kvm_dirty_log *log)
  {
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 2cf915e..6beb368 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -418,6 +418,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
 kvm_dirty_log *log)
   return -ENOTSUPP;
  }

  

#ifndef KVM_ARCH_HAVE_DIRTY_LOG
 +void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 +{
 +}
 +
#endif

in virt/kvm/main.c


 index c7b0cc2..8a24149 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -527,6 +527,7 @@ struct kvm_x86_ops {
   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
   int (*get_tdp_level)(void);
   u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
 + int (*dirty_bit_support)(void);
  };
  
  extern struct kvm_x86_ops *kvm_x86_ops;
 @@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
  int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
  
 +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp);
 +
  #endif /* _ASM_X86_KVM_HOST_H */
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 809cce0..500e0e2 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644);
  #define ACC_USER_MASKPT_USER_MASK
  #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
  
 +#define SPTE_DONT_DIRTY (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
 +
  #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
  
  struct kvm_rmap_desc {
 @@ -629,6 +631,29 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
 *rmapp, u64 *spte)
   return NULL;
  }
  
 +int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp)
 +{
 + u64 *spte;
 + int dirty = 0;
 +
 + if (!shadow_dirty_mask)
 + return 0;
 +
 + spte = rmap_next(kvm, rmapp, NULL);
 + while (spte) {
 + 	if (*spte & PT_DIRTY_MASK) {
 + 		set_shadow_pte(spte, (*spte & ~PT_DIRTY_MASK) |
 + 			       SPTE_DONT_DIRTY);
 + dirty = 1;
 + break;
 + }
 + spte = rmap_next(kvm, rmapp, spte);
 + }
 +
 + return dirty;
 +}
 +
 +
  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
  {
   unsigned long *rmapp;
 @@ -1381,11 +1406,17 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
  static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
  {
   int ret;
 + int i;
 +
  ++kvm->stat.mmu_shadow_zapped;
   ret = mmu_zap_unsync_children(kvm, sp);
   kvm_mmu_page_unlink_children(kvm, sp);
   kvm_mmu_unlink_parents(kvm, sp);
   kvm_flush_remote_tlbs(kvm);
 + for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
 + 	if (sp->spt[i] & PT_DIRTY_MASK)
 + 		mark_page_dirty(kvm, sp->gfns[i]);
 + }

Also need to transfer dirty bit in other places probably.

  if (!sp->role.invalid && !sp->role.direct)
  	unaccount_shadowed(kvm, sp->gfn);
  if (sp->unsync)
 @@ -1676,7 +1707,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
 *shadow_pte,
* whether the guest actually used the pte (in order to detect
* demand paging).
*/
 - 	spte = shadow_base_present_pte | shadow_dirty_mask;
 + 	spte = shadow_base_present_pte;
 + 	if (!(spte & SPTE_DONT_DIRTY))
 + 		spte |= shadow_dirty_mask;
 +
   if (!speculative)
   

Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-10 Thread Izik Eidus

Marcelo Tosatti wrote:

On Wed, Jun 10, 2009 at 07:23:25PM +0300, Izik Eidus wrote:
  

change the dirty page tracking to work with the dirty bit instead of page faults.
right now the dirty page tracking works with the help of page faults: when we
want to track a page for being dirty, we write protect it and mark it dirty
when we get a write page fault. this code moves to looking at the dirty bit
of the spte.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 arch/ia64/kvm/kvm-ia64.c|4 +++
 arch/powerpc/kvm/powerpc.c  |4 +++
 arch/s390/kvm/kvm-s390.c|4 +++
 arch/x86/include/asm/kvm_host.h |3 ++
 arch/x86/kvm/mmu.c  |   42 --
 arch/x86/kvm/svm.c  |7 ++
 arch/x86/kvm/vmx.c  |7 ++
 arch/x86/kvm/x86.c  |   26 ---
 include/linux/kvm_host.h|1 +
 virt/kvm/kvm_main.c |6 -
 10 files changed, 96 insertions(+), 8 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3199221..5914128 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1809,6 +1809,10 @@ void kvm_arch_exit(void)
kvm_vmm_info = NULL;
 }
 
+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

+{
+}
+
 static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2cf915e..6beb368 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -418,6 +418,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
return -ENOTSUPP;
 }



  
 



#ifndef KVM_ARCH_HAVE_DIRTY_LOG
  

+void kvm_arch_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+}
+


#endif

in virt/kvm/main.c


  

index c7b0cc2..8a24149 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -527,6 +527,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+   int (*dirty_bit_support)(void);
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;

@@ -796,4 +797,6 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_age_hva(struct kvm *kvm, unsigned long hva);
 int cpuid_maxphyaddr(struct kvm_vcpu *vcpu);
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp);

+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 809cce0..500e0e2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -140,6 +140,8 @@ module_param(oos_shadow, bool, 0644);
 #define ACC_USER_MASKPT_USER_MASK
 #define ACC_ALL  (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
 
+#define SPTE_DONT_DIRTY (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)

+
 #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
 
 struct kvm_rmap_desc {

@@ -629,6 +631,29 @@ static u64 *rmap_next(struct kvm *kvm, unsigned long 
*rmapp, u64 *spte)
return NULL;
 }
 
+int is_dirty_and_clean_rmapp(struct kvm *kvm, unsigned long *rmapp)

+{
+   u64 *spte;
+   int dirty = 0;
+
+   if (!shadow_dirty_mask)
+   return 0;
+
+   spte = rmap_next(kvm, rmapp, NULL);
+   while (spte) {
+	if (*spte & PT_DIRTY_MASK) {
+		set_shadow_pte(spte, (*spte & ~PT_DIRTY_MASK) |
+			       SPTE_DONT_DIRTY);
+   dirty = 1;
+   break;
+   }
+   spte = rmap_next(kvm, rmapp, spte);
+   }
+
+   return dirty;
+}
+
+
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
unsigned long *rmapp;
@@ -1381,11 +1406,17 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
 static int kvm_mmu_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
int ret;
+   int i;
+
	++kvm->stat.mmu_shadow_zapped;
ret = mmu_zap_unsync_children(kvm, sp);
kvm_mmu_page_unlink_children(kvm, sp);
kvm_mmu_unlink_parents(kvm, sp);
kvm_flush_remote_tlbs(kvm);
+	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
+		if (sp->spt[i] & PT_DIRTY_MASK)
+			mark_page_dirty(kvm, sp->gfns[i]);
+	}



Also need to transfer dirty bit in other places probably.
  



Yes, I can think of some other cases, but maybe I can avoid them using
some trick.



  

	if (!sp->role.invalid && !sp->role.direct)
		unaccount_shadowed(kvm, sp->gfn);
	if (sp->unsync)
@@ -1676,7 +1707,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 
*shadow_pte,
 * whether the guest actually used the pte (in order to detect
 * demand paging).
 */
-   spte = shadow_base_present_pte | shadow_dirty_mask;
+   spte = 

Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-10 Thread Izik Eidus

Izik Eidus wrote:

Marcelo Tosatti wrote:



 
 /* Free page dirty bitmap if unneeded */

-	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
+	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) {
 		new.dirty_bitmap = NULL;
+		if (old.flags & KVM_MEM_LOG_DIRTY_PAGES)
+			kvm_arch_flush_shadow(kvm);
+	}



What's this for?
  


We have added all this SPTE_DONT_DIRTY handling; when we stop dirty
bit tracking, we want to go back to setting the dirty bit for the spte
inside set_spte(), so that writing to the page is faster.


Another way would be doing something like kvm_arch_clean_dont_dirty();
that might be better than flushing the whole shadow page tables.

