[PATCH 2/2] KVM: Remove KVM_REQ_MCLOCK_INPROGRESS to save a bit in vcpu->requests

2016-01-07 Thread Takuya Yoshikawa
Now that we are running out of bits in vcpu->requests, we should avoid
consuming one of them just so that kvm_make_all_cpus_request() can be
called with a valid request number.

This patch achieves this by making kvm_make_all_cpus_request() handle
an empty request.
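
For context, here is a minimal sketch of the convention this introduces
(the caller below is hypothetical, not part of the patch): vcpu->requests
is a single unsigned long used as a bitmap of KVM_REQ_* bit numbers, so a
negative request lets callers kick all VCPUs without consuming a bit.

	/* Roughly as defined in kvm_host.h: KVM_REQ_* values are bit numbers. */
	static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
	{
		set_bit(req, &vcpu->requests);
	}

	/* Hypothetical caller: kick every VCPU without setting any request bit. */
	static void kick_all_vcpus_no_request(struct kvm *kvm)
	{
		kvm_make_all_cpus_request(kvm, -1);	/* req < 0: no bit is set */
	}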

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/x86.c   |  2 --
 include/linux/kvm_host.h | 27 +--
 virt/kvm/kvm_main.c  |  8 +---
 3 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b6102c1..88260d0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1701,8 +1701,6 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 
/* guest entries allowed */
-   kvm_for_each_vcpu(i, vcpu, kvm)
-	clear_bit(KVM_REQ_MCLOCK_INPROGRESS, &vcpu->requests);
 
	spin_unlock(&ka->pvclock_gtod_sync_lock);
 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ca9b93e..bb9ae4f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -131,19 +131,18 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_PMI   15
 #define KVM_REQ_WATCHDOG  16
 #define KVM_REQ_MASTERCLOCK_UPDATE 17
-#define KVM_REQ_MCLOCK_INPROGRESS 18
-#define KVM_REQ_EPR_EXIT  19
-#define KVM_REQ_SCAN_IOAPIC   20
-#define KVM_REQ_GLOBAL_CLOCK_UPDATE 21
-#define KVM_REQ_ENABLE_IBS22
-#define KVM_REQ_DISABLE_IBS   23
-#define KVM_REQ_APIC_PAGE_RELOAD  24
-#define KVM_REQ_SMI   25
-#define KVM_REQ_HV_CRASH  26
-#define KVM_REQ_IOAPIC_EOI_EXIT   27
-#define KVM_REQ_HV_RESET  28
-#define KVM_REQ_HV_EXIT   29
-#define KVM_REQ_HV_STIMER 30
+#define KVM_REQ_EPR_EXIT  18
+#define KVM_REQ_SCAN_IOAPIC   19
+#define KVM_REQ_GLOBAL_CLOCK_UPDATE 20
+#define KVM_REQ_ENABLE_IBS21
+#define KVM_REQ_DISABLE_IBS   22
+#define KVM_REQ_APIC_PAGE_RELOAD  23
+#define KVM_REQ_SMI   24
+#define KVM_REQ_HV_CRASH  25
+#define KVM_REQ_IOAPIC_EOI_EXIT   26
+#define KVM_REQ_HV_RESET  27
+#define KVM_REQ_HV_EXIT   28
+#define KVM_REQ_HV_STIMER 29
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID   1
@@ -685,7 +684,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
 void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
-bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+bool kvm_make_all_cpus_request(struct kvm *kvm, int req);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be3cef1..0100a19 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -156,7 +156,7 @@ static void ack_flush(void *_completed)
 {
 }
 
-bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
+bool kvm_make_all_cpus_request(struct kvm *kvm, int req)
 {
int i, cpu, me;
cpumask_var_t cpus;
@@ -167,7 +167,9 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
 
me = get_cpu();
kvm_for_each_vcpu(i, vcpu, kvm) {
-   kvm_make_request(req, vcpu);
+   if (req >= 0)
+   kvm_make_request(req, vcpu);
+
cpu = vcpu->cpu;
 
/* Set ->requests bit before we read ->mode */
@@ -208,7 +210,7 @@ void kvm_reload_remote_mmus(struct kvm *kvm)
 
 void kvm_make_mclock_inprogress_request(struct kvm *kvm)
 {
-   kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+   kvm_make_all_cpus_request(kvm, -1);
 }
 
 void kvm_make_scan_ioapic_request(struct kvm *kvm)
-- 
2.1.0





[PATCH] KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()

2015-12-18 Thread Takuya Yoshikawa
Not just to clean up the code, but to make it faster by using enhanced
instructions: the initialization became 20-30% faster on our test
machine.
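
For reference, a quick sanity check (assuming the usual x86-64 values of
PT64_ENT_PER_PAGE and PAGE_SIZE) of why clear_page() covers exactly what
the removed loop initialized:

	/*
	 * sp->spt is one full page of 64-bit entries:
	 *   PT64_ENT_PER_PAGE * sizeof(u64) = 512 * 8 = 4096 = PAGE_SIZE
	 * so zeroing the page zeroes exactly the entries the loop wrote.
	 */
	clear_page(sp->spt);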

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a1a3d19..7f5a82b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2041,14 +2041,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
}
 }
 
-static void init_shadow_page_table(struct kvm_mmu_page *sp)
-{
-   int i;
-
-   for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-   sp->spt[i] = 0ull;
-}
-
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
sp->write_flooding_count = 0;
@@ -2128,7 +2120,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
account_shadowed(vcpu->kvm, sp);
}
sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
-   init_shadow_page_table(sp);
+   clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);
return sp;
 }
-- 
2.1.0





[PATCH V4 0/3] KVM: x86: MMU: Clean up x86's mmu code for future work - part2

2015-11-26 Thread Takuya Yoshikawa
Guests worked normally in shadow paging mode (ept=0) on my test machine.

Please check if the first two patches reflect what you meant correctly.

Takuya Yoshikawa (3):
  [1] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to 
link_shadow_page()
  [2] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  [3] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 arch/x86/kvm/mmu.c | 70 +++---
 arch/x86/kvm/paging_tmpl.h | 10 +++
 2 files changed, 26 insertions(+), 54 deletions(-)

-- 
2.1.0



[PATCH 1/3] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-26 Thread Takuya Yoshikawa
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code, because
parent_pte is passed to kvm_mmu_get_page() only for mark_unsync() and
mmu_page_add_parent_pte().

In addition, the patch avoids calling mark_unsync() for parents in the
sp->parent_ptes chain other than the newly added parent_pte, because they
were already there before the current page fault handling started.
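
For reference, this is roughly what link_shadow_page() looks like after
the patch, condensed from the hunks below (a sketch, not a literal copy):

	static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
				     struct kvm_mmu_page *sp)
	{
		u64 spte;

		spte = __pa(sp->spt) | PT_PRESENT_MASK | PT_WRITABLE_MASK |
		       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;

		mmu_spte_set(sptep, spte);	/* the entry now points to sp->spt */

		mmu_page_add_parent_pte(vcpu, sp, sptep);

		if (sp->unsync_children || sp->unsync)
			mark_unsync(sptep);	/* only the newly added parent */
	}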

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.x...@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 23 +--
 arch/x86/kvm/paging_tmpl.h |  6 ++
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..ec61b22 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2119,12 +2119,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-   if (sp->unsync_children) {
+   if (sp->unsync_children)
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-   kvm_mmu_mark_parents_unsync(sp);
-   } else if (sp->unsync)
-   kvm_mmu_mark_parents_unsync(sp);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
@@ -2135,8 +2131,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 
sp = kvm_mmu_alloc_page(vcpu, direct);
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
sp->gfn = gfn;
sp->role = role;
	hlist_add_head(&sp->hash_link,
@@ -2204,7 +2198,8 @@ static void shadow_walk_next(struct 
kvm_shadow_walk_iterator *iterator)
return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp)
 {
u64 spte;
 
@@ -2215,6 +2210,11 @@ static void link_shadow_page(u64 *sptep, struct 
kvm_mmu_page *sp)
   shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
 
mmu_spte_set(sptep, spte);
+
+   mmu_page_add_parent_pte(vcpu, sp, sptep);
+
+   if (sp->unsync_children || sp->unsync)
+   mark_unsync(sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2273,11 +2273,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *sptep;
@@ -2743,7 +2738,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);
 
-   link_shadow_page(iterator.sptep, sp);
+   link_shadow_page(vcpu, iterator.sptep, sp);
}
}
return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;
 
if (sp)
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
  true, direct_access, it.sptep);
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;
 
 out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
 }
-- 
2.1.0



[PATCH 2/3] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-26 Thread Takuya Yoshikawa
Now that kvm_mmu_get_page() no longer adds a parent pointer to the
sp->parent_ptes chain before the entry it points to has been set
properly, we can use the for_each_rmap_spte macro instead of
pte_list_walk().
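
For reference, a rough sketch of what the macro iterates over (the
encoding is the one described in mmu.c) and how it is used here:

	/*
	 * rmap_head->val, bit 0 clear: val itself is the single spte pointer.
	 * rmap_head->val, bit 0 set:   (val & ~1) points to a pte_list_desc,
	 *                              which holds up to PTE_LIST_EXT sptes
	 *                              plus a ->more link to the next desc.
	 */
	u64 *sptep;
	struct rmap_iterator iter;

	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep)
		mark_unsync(sptep);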

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.x...@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ec61b22..204c7d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct 
kvm_rmap_head *rmap_head)
}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-   struct pte_list_desc *desc;
-   int i;
-
-   if (!rmap_head->val)
-   return;
-
-   if (!(rmap_head->val & 1))
-   return fn((u64 *)rmap_head->val);
-
-   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-   while (desc) {
-   for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-   fn(desc->sptes[i]);
-   desc = desc->more;
-   }
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
   struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+   u64 *sptep;
+   struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+   mark_unsync(sptep);
+   }
 }
 
 static void mark_unsync(u64 *spte)
-- 
2.1.0



[PATCH 3/3] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

2015-11-26 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 20 +++-
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 204c7d4..a1a3d19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 gva_t gaddr,
 unsigned level,
 int direct,
-unsigned access,
-u64 *parent_pte)
+unsigned access)
 {
union kvm_mmu_page_role role;
unsigned quadrant;
@@ -2720,8 +2719,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
- iterator.level - 1,
- 1, ACC_ALL, iterator.sptep);
+ iterator.level - 1, 1, ACC_ALL);
 
link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3078,8 +3076,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
- 1, ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3091,9 +3088,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
- i << 30,
- PT32_ROOT_LEVEL, 1, ACC_ALL,
- NULL);
+   i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3130,7 +3125,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
- 0, ACC_ALL, NULL);
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3163,9 +3158,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, 0,
- ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
- false, access, it.sptep);
+ false, access);
}
 
/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
- true, direct_access, it.sptep);
+ true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
}
 
-

Re: [PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-25 Thread Takuya Yoshikawa

On 2015/11/26 1:32, Paolo Bonzini wrote:

On 20/11/2015 09:57, Xiao Guangrong wrote:



You can move this patch to the front of
[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of
pte_list_walk()

By moving kvm_mmu_mark_parents_unsync() to the behind of mmu_spte_set()
(then the parent
spte is present now), you can directly clean up for_each_rmap_spte().


So basically squash together the two patches (8/10 and 9/10) except the
change to kvm_mmu_mark_parents_unsync; then in the second patch switch
from pte_list_walk to for_each_rmap_spte.

That makes sense indeed.


Sorry for my being late to respond to Xiao's suggestions.  I could not
use my development machine for a while this week.

In short, this kvm_mmu_mark_parents_unsync() call in kvm_mmu_get_page()
should have been mark_unsync() for the new parent_pte only, because we
are constructing the mappings from/to it and other parents in the
sp->parent_ptes are not related to this fault?

As the code has been this way for some time, a bit scary to change it,
but I'll do some tests without that extra kvm_mmu_mark_parents_unsync()
with a guest (with ept=0) this afternoon.

  Takuya




Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-20 Thread Takuya Yoshikawa

On 2015/11/20 17:46, Xiao Guangrong wrote:


You just ignored my comment on the previous version...


I'm sorry but please read the explanation in patch 00.
I've read your comments and I'm not ignoring you.

Since this patch set has become larger than expected, I'm sending
this version so that patches 01-07 can be applied first.

For patches 08-10, I think we need to check further because there seems
to be some confusion between us.  You can also read the other discussions
between Marcelo, Paolo and me.

Anyway, since these three patches have been placed at the end of the
series now, I hope we can concentrate on them more easily than before.

Thanks,
  Takuya



[PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-20 Thread Takuya Yoshikawa
It seems like you are all busy now, so I've arranged this patch set so
that the mechanical and trivial changes come first.

V2->V3:
Patch 01: Rebased and moved here. Updated stale comments.
  We may also want to use a union, inside the struct, to eliminate casting to
  (u64 *) type when spte is in the head in the future.
Patch 02-05: No change.
  About patch 03: There was a comment on the usage of braces for a single line
  else-if statement from Xiao. As I answered, checkpatch did not complain about
  this, and when the corresponding if block has multiple lines, some developers
  prefer/recommend this style. Feel free to modify it if you don't like it.
Patch 06: Changed WARN_ON to BUG_ON as Marcelo suggested.
Patch 07: Removed unnecessary zero-initialization of sp->parent_ptes as Xiao
  suggested.

I think these seven patches are ready for inclusion.

Patch 08-10: No change now, though there were a few comments.
  This patch set is not intended to optimize anything, so these patches try to
  keep the way mark_unsync() gets called as much as possible: the only changes
  are when this gets called for the new parent_pte and when
  mmu_page_add_parent_pte() gets called.

For these three, I'm not sure what we should do now, still RFC?
We can also consider other approaches, e.g. moving link_shadow_page() into
kvm_mmu_get_page() as Paolo suggested before.

  Takuya

Takuya Yoshikawa (10):
  [01] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  [02] KVM: x86: MMU: Remove unused parameter of __direct_map()
  [03] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  [04] KVM: x86: MMU: Make mmu_set_spte() return emulate value
  [05] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  [06] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes
  [07] KVM: x86: MMU: Move initialization of parent_ptes out from 
kvm_mmu_alloc_page()
  [08] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  [09] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to 
link_shadow_page()
  [10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c| 370 ++
 arch/x86/kvm/mmu_audit.c  |  15 +-
 arch/x86/kvm/paging_tmpl.h|  20 +--
 5 files changed, 201 insertions(+), 216 deletions(-)

-- 
2.1.0



[PATCH 01/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct

2015-11-20 Thread Takuya Yoshikawa
New struct kvm_rmap_head makes the code type-safe to some extent.
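
A small illustration of the kind of mixup the new struct lets the
compiler catch (the caller mentioned in the comment is hypothetical):

	struct kvm_rmap_head {
		unsigned long val;
	};

	static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
				struct kvm_rmap_head *rmap_head);

	/*
	 * With a bare 'unsigned long *' parameter, passing any unrelated
	 * unsigned long (e.g. '&some_counter') compiled silently; with the
	 * struct, such a call now triggers an incompatible-pointer warning.
	 */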

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c  | 196 
 arch/x86/kvm/mmu_audit.c|  13 +--
 3 files changed, 113 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f608e17..8140077 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
};
 };
 
+struct kvm_rmap_head {
+   unsigned long val;
+};
+
 struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
bool unsync;
int root_count;  /* Currently serving as active root */
unsigned int unsync_children;
-   unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
+   struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen.  */
unsigned long mmu_valid_gen;
@@ -606,7 +610,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-   unsigned long *rmap[KVM_NR_PAGE_SIZES];
+   struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 276d2f2..d9a6801 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -909,36 +909,35 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
 }
 
 /*
- * Pte mapping structures:
+ * About rmap_head encoding:
  *
- * If pte_list bit zero is zero, then pte_list point to the spte.
- *
- * If pte_list bit zero is one, (then pte_list & ~1) points to a struct
+ * If the bit zero of rmap_head->val is clear, then it points to the only spte
+ * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
  * pte_list_desc containing more mappings.
- *
- * Returns the number of pte entries before the spte was added or zero if
- * the spte was not added.
- *
+ */
+
+/*
+ * Returns the number of pointers in the rmap chain, not counting the new one.
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-   unsigned long *pte_list)
+   struct kvm_rmap_head *rmap_head)
 {
struct pte_list_desc *desc;
int i, count = 0;
 
-   if (!*pte_list) {
+   if (!rmap_head->val) {
rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-   *pte_list = (unsigned long)spte;
-   } else if (!(*pte_list & 1)) {
+   rmap_head->val = (unsigned long)spte;
+   } else if (!(rmap_head->val & 1)) {
rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
desc = mmu_alloc_pte_list_desc(vcpu);
-   desc->sptes[0] = (u64 *)*pte_list;
+   desc->sptes[0] = (u64 *)rmap_head->val;
desc->sptes[1] = spte;
-   *pte_list = (unsigned long)desc | 1;
+   rmap_head->val = (unsigned long)desc | 1;
++count;
} else {
rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-   desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
desc = desc->more;
count += PTE_LIST_EXT;
@@ -955,8 +954,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-  int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+  struct pte_list_desc *desc, int i,
+  struct pte_list_desc *prev_desc)
 {
int j;
 
@@ -967,43 +967,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, 
struct pte_list_desc *desc,
if (j != 0)
return;
if (!prev_desc && !desc->more)
-   *pte_list = (unsigned long)desc->sptes[0];
+   rmap_head->val = (unsigned long)desc->sptes[0];
else
if (prev_desc)
prev_desc->more = desc->more;
else
-   *pte_list = (unsigned long)desc->more | 1;
+   rmap_head->val = (unsigned long)desc->more | 1;
mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *s

[PATCH 02/10] KVM: x86: MMU: Remove unused parameter of __direct_map()

2015-11-20 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9a6801..8a1593f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn,
-   bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+   int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
	transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
	spin_unlock(&vcpu->kvm->mmu_lock);
 
-
return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
	transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn, prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
	spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
2.1.0



[PATCH 04/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value

2015-11-20 Thread Takuya Yoshikawa
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.
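
A condensed sketch of the interface change on the caller side (mirroring
the hunks below, not a literal copy):

	/* Before: the result was reported through an optional pointer. */
	int emulate = 0;
	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
		     it.level, gw->gfn, pfn, prefault, map_writable);

	/* After: the caller reads the return value; prefetch paths ignore it. */
	emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
			       it.level, gw->gfn, pfn, prefault, map_writable);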

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 27 ++-
 arch/x86/kvm/paging_tmpl.h | 10 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9832bc9..74c120c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-unsigned pte_access, int write_fault, int *emulate,
-int level, gfn_t gfn, pfn_t pfn, bool speculative,
-bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned 
pte_access,
+int write_fault, int level, gfn_t gfn, pfn_t pfn,
+bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;
+   bool emulate = false;
 
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
  true, host_writable)) {
if (write_fault)
-   *emulate = 1;
+   emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
-   if (unlikely(is_mmio_spte(*sptep) && emulate))
-   *emulate = 1;
+   if (unlikely(is_mmio_spte(*sptep)))
+   emulate = true;
 
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
 
kvm_release_pfn_clean(pfn);
+
+   return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
 
for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, access, 0, NULL,
-sp->role.level, gfn, page_to_pfn(pages[i]),
-true, true);
+   mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+page_to_pfn(pages[i]), true, true);
 
return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
 
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
-   mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-		     write, &emulate, level, gfn, pfn,
-prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+  write, level, gfn, pfn, prefault,
+  map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d8fdc5c..11650ea 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
 * we call mmu_set_spte() with host_writable = true because
 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 */
-   mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-gfn, pfn, true, true);
+   mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+true, true);
 
return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
-   int top_level, emulate = 0;
+   int top_level, emulate;
 
direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}
 
clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-		     it.level, gw->gfn, pfn, prefault, map_writable);

[PATCH 03/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-20 Thread Takuya Yoshikawa
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line
code which clears a bit in the unsync child bitmap; the former places it
inside a loop block and uses a few goto statements to jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8a1593f..9832bc9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1809,6 +1809,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, 
struct kvm_mmu_page *sp,
return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+   --sp->unsync_children;
+   WARN_ON((int)sp->unsync_children < 0);
+   __clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
   struct kvm_mmu_pages *pvec)
 {
@@ -1818,8 +1825,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_page *child;
u64 ent = sp->spt[i];
 
-   if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-   goto clear_child_bitmap;
+   if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   }
 
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1828,28 +1837,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return -ENOSPC;
 
ret = __mmu_unsync_walk(child, pvec);
-   if (!ret)
-   goto clear_child_bitmap;
-   else if (ret > 0)
+   if (!ret) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   } else if (ret > 0) {
nr_unsync_leaf += ret;
-   else
+   } else
return ret;
} else if (child->unsync) {
nr_unsync_leaf++;
if (mmu_pages_add(pvec, child, i))
return -ENOSPC;
} else
-goto clear_child_bitmap;
-
-   continue;
-
-clear_child_bitmap:
-   __clear_bit(i, sp->unsync_child_bitmap);
-   sp->unsync_children--;
-   WARN_ON((int)sp->unsync_children < 0);
+   clear_unsync_child_bit(sp, i);
}
 
-
return nr_unsync_leaf;
 }
 
@@ -2012,9 +2014,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path 
*parents)
if (!sp)
return;
 
-   --sp->unsync_children;
-   WARN_ON((int)sp->unsync_children < 0);
-   __clear_bit(idx, sp->unsync_child_bitmap);
+   clear_unsync_child_bit(sp, idx);
level++;
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



[PATCH 05/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

2015-11-20 Thread Takuya Yoshikawa
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them without clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c   | 13 -
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74c120c..3104748 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-   return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;
 
-   WARN_ON(!is_rmap_spte(new_spte));
+   WARN_ON(!is_shadow_present_pte(new_spte));
 
if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-   if (!is_rmap_spte(old_spte))
+   if (!is_shadow_present_pte(old_spte))
return 0;
 
pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
 
-   if (is_rmap_spte(*sptep)) {
+   if (is_shadow_present_pte(*sptep)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t 
gva, int level,
 * If the mapping has been changed, let the vcpu fault on the
 * same address again.
 */
-   if (!is_rmap_spte(spte)) {
+   if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index f7b0488..1cee3ec 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct 
kvm_mmu_page *sp)
return;
 
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   if (!is_rmap_spte(sp->spt[i]))
+   if (!is_shadow_present_pte(sp->spt[i]))
continue;
 
inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



[PATCH 06/10] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes

2015-11-20 Thread Takuya Yoshikawa
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, are checked the same
way.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present (at least until drop_parent_pte()
actually unlinks them) and are not mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c| 26 +-
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt 
b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
 page cannot be destroyed.  See role.invalid.
   parent_ptes:
 The reverse mapping for the pte/ptes pointing at this page's spt. If
-parent_ptes bit 0 is zero, only one spte points at this pages and
+parent_ptes bit 0 is zero, only one spte points at this page and
 parent_ptes points at this single spte, otherwise, there exists multiple
 sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-structure with a list of parent_ptes.
+structure with a list of parent sptes.
   unsync:
 If true, then the translations in this page may not match the guest's
 translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3104748..5b249d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1098,17 +1098,23 @@ struct rmap_iterator {
 static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
   struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (!rmap_head->val)
return NULL;
 
if (!(rmap_head->val & 1)) {
iter->desc = NULL;
-   return (u64 *)rmap_head->val;
+   sptep = (u64 *)rmap_head->val;
+   goto out;
}
 
iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
iter->pos = 0;
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+out:
+   BUG_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 /*
@@ -1118,14 +1124,14 @@ static u64 *rmap_get_first(struct kvm_rmap_head 
*rmap_head,
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
-   u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
-   return sptep;
+   goto out;
}
 
iter->desc = iter->desc->more;
@@ -1133,17 +1139,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+   goto out;
}
}
 
return NULL;
+out:
+   BUG_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)
\
for (_spte_ = rmap_get_first(_rmap_head_, _iter_);  \
-_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \
-_spte_ = rmap_get_next(_iter_))
+_spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1367,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct 
kvm_rmap_head *rmap_head)
bool flush = false;
 
	while ((sptep = rmap_get_first(rmap_head, &iter))) {
-   BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
drop_spte(kvm, sptep);
-- 
2.1.0



[PATCH 07/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

2015-11-20 Thread Takuya Yoshikawa
Make kvm_mmu_alloc_page() do just what its name says, and remove
the extra allocation error check and zero-initialization of parent_ptes:
shadow page headers allocated by kmem_cache_zalloc() are always in the
per-VCPU pools.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b249d4..7f46e3e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1726,8 +1726,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-  u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int 
direct)
 {
struct kvm_mmu_page *sp;
 
@@ -1743,8 +1742,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu,
 * this feature. See the comments in kvm_zap_obsolete_pages().
 */
	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-   sp->parent_ptes.val = 0;
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
kvm_mod_used_mmu_pages(vcpu->kvm, +1);
return sp;
 }
@@ -2133,10 +2130,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
trace_kvm_mmu_get_page(sp, false);
return sp;
}
+
++vcpu->kvm->stat.mmu_cache_miss;
-   sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-   if (!sp)
-   return sp;
+
+   sp = kvm_mmu_alloc_page(vcpu, direct);
+
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
sp->gfn = gfn;
sp->role = role;
	hlist_add_head(&sp->hash_link,
-- 
2.1.0



[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-20 Thread Takuya Yoshikawa
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same as the for_each_rmap_spte macro.  The only difference
is that is_shadow_present_pte() checks cannot be placed there because
kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
works with no problem.
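
Condensed from the hunk below (the KVM_REQ_MMU_SYNC request is omitted
here), the ordering this relies on looks like:

	if (sp->unsync_children || sp->unsync) {
		kvm_mmu_mark_parents_unsync(sp);	/* existing parents, all present */
		if (parent_pte)
			mark_unsync(parent_pte);	/* new parent, not yet in the chain */
	}
	mmu_page_add_parent_pte(vcpu, sp, parent_pte);	/* chained only afterwards */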

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..4e29d9a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct 
kvm_rmap_head *rmap_head)
}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-   struct pte_list_desc *desc;
-   int i;
-
-   if (!rmap_head->val)
-   return;
-
-   if (!(rmap_head->val & 1))
-   return fn((u64 *)rmap_head->val);
-
-   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-   while (desc) {
-   for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-   fn(desc->sptes[i]);
-   desc = desc->more;
-   }
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
   struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+   u64 *sptep;
+   struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+   mark_unsync(sptep);
+   }
 }
 
 static void mark_unsync(u64 *spte)
@@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   } else if (sp->unsync)
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   } else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   }
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0



[PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-20 Thread Takuya Yoshikawa
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code, because
parent_pte is passed to kvm_mmu_get_page() only for mark_unsync() and
mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 22 --
 arch/x86/kvm/paging_tmpl.h |  6 ++
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e29d9a..b020323 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2107,14 +2107,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
} else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
}
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
@@ -2125,8 +2120,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 
sp = kvm_mmu_alloc_page(vcpu, direct);
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
sp->gfn = gfn;
sp->role = role;
hlist_add_head(>hash_link,
@@ -2194,7 +2187,8 @@ static void shadow_walk_next(struct 
kvm_shadow_walk_iterator *iterator)
return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp)
 {
u64 spte;
 
@@ -2205,6 +2199,11 @@ static void link_shadow_page(u64 *sptep, struct 
kvm_mmu_page *sp)
   shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
 
mmu_spte_set(sptep, spte);
+
+   if (sp->unsync_children || sp->unsync)
+   mark_unsync(sptep);
+
+   mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2263,11 +2262,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *sptep;
@@ -2733,7 +2727,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);
 
-   link_shadow_page(iterator.sptep, sp);
+   link_shadow_page(vcpu, iterator.sptep, sp);
}
}
return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;
 
if (sp)
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
  true, direct_access, it.sptep);
-   link_shadow_page(it.sptep, sp);
+   link_shadow_page(vcpu, it.sptep, sp);
}
 
clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;
 
 out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
 }
-- 
2.1.0



[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

2015-11-20 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 20 +++-
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b020323..9baf884 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 gva_t gaddr,
 unsigned level,
 int direct,
-unsigned access,
-u64 *parent_pte)
+unsigned access)
 {
union kvm_mmu_page_role role;
unsigned quadrant;
@@ -2724,8 +2723,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
- iterator.level - 1,
- 1, ACC_ALL, iterator.sptep);
+ iterator.level - 1, 1, ACC_ALL);
 
link_shadow_page(vcpu, iterator.sptep, sp);
}
@@ -3082,8 +3080,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
- 1, ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3095,9 +3092,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
- i << 30,
- PT32_ROOT_LEVEL, 1, ACC_ALL,
- NULL);
+   i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3134,7 +3129,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
- 0, ACC_ALL, NULL);
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3167,9 +3162,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
	spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, 0,
- ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
	spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
- false, access, it.sptep);
+ false, access);
}
 
/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
- true, direct_access, it.sptep);
+ true, direct_access);
link_shadow_page(vcpu, it.sptep, sp);
}
 
-

Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-18 Thread Takuya Yoshikawa

On 2015/11/18 11:44, Xiao Guangrong wrote:


On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote:

+if (!ret) {
+clear_unsync_child_bit(sp, i);
+continue;
+} else if (ret > 0) {
  nr_unsync_leaf += ret;


Just a single line here, braces are unnecessary.


-else
+} else
  return ret;


I know we can eliminate the braces, but that does not mean
we should do so: there seems to be no consensus about this
style issue and checkpatch accepts both ways.

Actually, some people prefer to put braces when one of the
if/else-if/else cases has multiple lines.  You can see
some examples in kernel/sched/core.c: see hrtick_start(),
sched_fork(), free_sched_domain().

In our case, I thought putting braces would align the else-if
and else and make the code look a bit nicer, but I know this
may be just a matter of personal feeling.

In short, unless the maintainer, Paolo for this file, has any
preference, both ways will be accepted.

  Takuya



Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct

2015-11-18 Thread Takuya Yoshikawa

On 2015/11/18 18:09, Paolo Bonzini wrote:


On 18/11/2015 04:21, Xiao Guangrong wrote:



On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:

@@ -1720,7 +1724,7 @@ static struct kvm_mmu_page
*kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
* this feature. See the comments in kvm_zap_obsolete_pages().
*/
   list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-sp->parent_ptes = 0;
+sp->parent_ptes.val = 0;


The sp is allocated from kmem_cache_zalloc() so explicitly initialize it
to zero is not needed.


Right, but it should be a separate patch.

Takuya, since you are going to send another version of this series, can
you also:


Yes, I'm preparing to do so.


1) move this patch either to the beginning or to the end

2) include "KVM: x86: always set accessed bit in shadow PTEs", also near
the beginning of the series?


Commit 1c9a5e19b1af8a2c ("KVM: x86: MMU: always set accessed bit
in shadow PTEs") will be the first.

Then, the ordering will become something like this:

02: Encapsulate the type of rmap-chain head in a new struct
03: Remove unused parameter of __direct_map()
04: Add helper function to clear a bit in unsync child bitmap
05: Make mmu_set_spte() return emulate value
06: Remove is_rmap_spte() and use is_shadow_present_pte()

These five seem to be easy ones for you to apply: since patch 02
touches many places, it should go first to become the base of the
following work.

07: Consolidate BUG_ON checks for reverse-mapped sptes

I will change the WARN_ON to BUG_ON.  // Marcelo's comment

08: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

In this patch, I will delete the "sp->parent_ptes.val = 0;" line since
this is a problem of kvm_mmu_alloc_page() itself, though not a new one.
  // Xiao's comment

09: Use for_each_rmap_spte macro instead of pte_list_walk()

There is some confusion between us: Paolo and I agreed that the
patch keeps the original way and calls mark_unsync() the same way
as before, but there are still comments from Marcelo and Xiao and
those comments seem to explain the code differently.

I will check again, but I may not change this one and the following
two patches in the next version.  If we can eliminate some of the
mark_unsync() calls, that will be kind of an optimization which this
series does not intend to achieve.

Anyway, by moving the non-trivial two patches (09 and 10) here,
reviewing will become easier and you can apply the other patches
separately.

10: Move parent_pte handling from kvm_mmu_get_page()
to link_shadow_page()
11: Remove unused parameter parent_pte from kvm_mmu_get_page()

Thanks,
  Takuya



Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-18 Thread Takuya Yoshikawa

On 2015/11/19 11:46, Xiao Guangrong wrote:


Actually, some people prefer to put braces when one of the
if/else-if/else cases has multiple lines.  You can see
some examples in kernel/sched/core.c: see hrtick_start(),
sched_fork(), free_sched_domain().

In our case, I thought putting braces would align the else-if
and else and make the code look a bit nicer, but I know this
may be just a matter of personal feeling.

In short, unless the maintainer, Paolo for this file, has any
preference, both ways will be accepted.


The reason why i pointed this out is that it is the style documented
in Documentation/CodingStyle:
| Do not unnecessarily use braces where a single statement will do.
|
|if (condition)
|action();
|


Ah, this is a different thing.  For this case, there is a consensus
and checkpatch will complain if we don't obey the rule.

What I explained was:

  if (condition) {
 line1;
 line2;  // multiple lines
  } else if {
 single-line-statement;  -- (*1)
  } else
 single-line-statement;  -- (*2)

For (*1) and (*2), especially for (*1), some people put braces.


Actually, Ingo Molnar strongly disliked this braces style and blamed
many developers who used it (including me, which is why I was
nervous to see this style :( ).


I think he likes the coding style of kernel/sched/core.c very much,
as you know.  Actually that is one reason why I took it as an example.

Let's just go with whichever way Paolo prefers this time; I don't
know which is better.

Thank you,
  Takuya




Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-15 Thread Takuya Yoshikawa

On 2015/11/14 18:20, Marcelo Tosatti wrote:


The actual issue is this: a higher level page that had, under its children,
no out of sync pages now, due to your addition, has a child that is unsync:

initial state:
level1

final state:

level1 -x-> level2 -x-> level3

Where -x-> are the links created by this pagefault fixing round.

If _any_ page under you is unsync (not necessarily the ones this
pagefault is accessing), you have to mark parents unsync.


I understand this, but I don't think my patch will break this.

What kvm_mmu_mark_parents_unsync() does is:

  for each p_i in sp->parent_ptes rmap chain
mark_unsync(p_i);

Then, mark_unsync() finds the parent sp including that p_i to
set ->unsync_child_bitmap and increment ->unsync_children if
necessary.  It may also call kvm_mmu_mark_parents_unsync()
recursively.

I understand we need to tell the parents "you have an unsync
child/descendant" until this information reaches the top level
by that recursive calls.

But since these recursive calls cannot come back to the starting sp
(the child->parent graph has no loop), each mark_unsync(p_i) will not
be affected by the other parents in that sp->parent_ptes rmap chain
from which we started the recursive calls.
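
Roughly, each call does something like this (a paraphrased sketch of
mark_unsync(), not the exact mmu.c code):

	static void mark_unsync(u64 *spte)
	{
		struct kvm_mmu_page *sp = page_header(__pa(spte));
		unsigned int index = spte - sp->spt;

		/* remember that this child is unsync ... */
		if (__test_and_set_bit(index, sp->unsync_child_bitmap))
			return;		/* ... unless it was already known */
		if (sp->unsync_children++)
			return;		/* the parents have been told already */
		/* otherwise propagate the information one more level up */
		kvm_mmu_mark_parents_unsync(sp);
	}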


As the following code shows, my patch does mark_unsync(parent_pte)
separately, and then mmu_page_add_parent_pte(vcpu, sp, parent_pte):


-   } else if (sp->unsync)
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   } else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   }
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);


So, as you worried, while each mark_unsync(p_i) is being processed,
this parent_pte does not exist in that sp->parent_ptes rmap chain.

But as I explained above, this does not change anything about what
each mark_unsync(p_i) call does, so keeps the original behaviour.


By the way, I think "kvm_mmu_mark_parents_unsync" and "mark_unsync"
do not convey well what they actually do.  When I first saw the names,
I thought they would just set the parents' sp->unsync.

To reflect the following meaning better, it should be
propagate_unsync(_to_parents) or something:

  Tell the parents "you have an unsync child/descendant"
  until this unsync information reaches the top level


Thanks,
  Takuya




Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-15 Thread Takuya Yoshikawa

On 2015/11/14 7:08, Marcelo Tosatti wrote:

On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:

At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check in rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.  In addition, change the BUG_ON to WARN_ON since killing the
whole host is the last thing that KVM should try.


It should be a BUG_ON; if KVM continues it will corrupt (more) memory.


In the sense that we cannot predict what kind of corruption it will
cause, I agree with you.

But if it can only corrupt that guest's memory, it is a bit sad to
kill unrelated guests, and the host, too.  Anyway, since we cannot say
for sure what a possible bug can cause, I agree with you now.

Thanks,
  Takuya



[PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value

2015-11-12 Thread Takuya Yoshikawa
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 27 ++-
 arch/x86/kvm/paging_tmpl.h | 10 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f3120aa..c229356 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-unsigned pte_access, int write_fault, int *emulate,
-int level, gfn_t gfn, pfn_t pfn, bool speculative,
-bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned 
pte_access,
+int write_fault, int level, gfn_t gfn, pfn_t pfn,
+bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;
+   bool emulate = false;
 
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
  true, host_writable)) {
if (write_fault)
-   *emulate = 1;
+   emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
-   if (unlikely(is_mmio_spte(*sptep) && emulate))
-   *emulate = 1;
+   if (unlikely(is_mmio_spte(*sptep)))
+   emulate = true;
 
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
 
kvm_release_pfn_clean(pfn);
+
+   return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
 
for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, access, 0, NULL,
-sp->role.level, gfn, page_to_pfn(pages[i]),
-true, true);
+   mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+page_to_pfn(pages[i]), true, true);
 
return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
 
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
-   mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-write, &emulate, level, gfn, pfn,
-prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+  write, level, gfn, pfn, prefault,
+  map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 3058a22..9d21b44 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
 * we call mmu_set_spte() with host_writable = true because
 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 */
-   mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-gfn, pfn, true, true);
+   mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+true, true);
 
return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
-   int top_level, emulate = 0;
+   int top_level, emulate;
 
direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}
 
clear_sp_write_flooding_count(it.sptep);
-   mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-it.level, gw-

[PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-12 Thread Takuya Yoshikawa
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line
code which clears a bit in the unsync child bitmap; the former places it
inside a loop block and uses a few goto statements to jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c3bbc82..f3120aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, 
struct kvm_mmu_page *sp,
return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+   --sp->unsync_children;
+   WARN_ON((int)sp->unsync_children < 0);
+   __clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_page *child;
u64 ent = sp->spt[i];
 
-   if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-   goto clear_child_bitmap;
+   if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   }
 
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return -ENOSPC;
 
ret = __mmu_unsync_walk(child, pvec);
-   if (!ret)
-   goto clear_child_bitmap;
-   else if (ret > 0)
+   if (!ret) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   } else if (ret > 0) {
nr_unsync_leaf += ret;
-   else
+   } else
return ret;
} else if (child->unsync) {
nr_unsync_leaf++;
if (mmu_pages_add(pvec, child, i))
return -ENOSPC;
} else
-goto clear_child_bitmap;
-
-   continue;
-
-clear_child_bitmap:
-   __clear_bit(i, sp->unsync_child_bitmap);
-   sp->unsync_children--;
-   WARN_ON((int)sp->unsync_children < 0);
+   clear_unsync_child_bit(sp, i);
}
 
-
return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path 
*parents)
if (!sp)
return;
 
-   --sp->unsync_children;
-   WARN_ON((int)sp->unsync_children < 0);
-   __clear_bit(idx, sp->unsync_child_bitmap);
+   clear_unsync_child_bit(sp, idx);
level++;
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

2015-11-12 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 20 +++-
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 33fe720..101e77d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2072,8 +2072,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
 gva_t gaddr,
 unsigned level,
 int direct,
-unsigned access,
-u64 *parent_pte)
+unsigned access)
 {
union kvm_mmu_page_role role;
unsigned quadrant;
@@ -2730,8 +2729,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
pseudo_gfn = base_addr >> PAGE_SHIFT;
sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
- iterator.level - 1,
- 1, ACC_ALL, iterator.sptep);
+ iterator.level - 1, 1, ACC_ALL);
 
link_shadow_page(vcpu, iterator.sptep, sp, true);
}
@@ -3088,8 +3086,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
- 1, ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3101,9 +3098,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
- i << 30,
- PT32_ROOT_LEVEL, 1, ACC_ALL,
- NULL);
+   i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3140,7 +3135,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
- 0, ACC_ALL, NULL);
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3173,9 +3168,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
spin_lock(&vcpu->kvm->mmu_lock);
make_mmu_pages_available(vcpu);
-   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
- PT32_ROOT_LEVEL, 0,
- ACC_ALL, NULL);
+   sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+ 0, ACC_ALL);
root = __pa(sp->spt);
++sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index f414ca6..ee9d211 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_shadow_present_pte(*it.sptep)) {
table_gfn = gw->table_gfn[it.level - 2];
sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
- false, access, it.sptep);
+ false, access);
}
 
/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
- true, direct_access, it.sptep);
+ true, direct_access);
link_shadow_page(vcpu, it.sptep, sp, PT_G

[PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

2015-11-12 Thread Takuya Yoshikawa
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them with no clear distinction just makes the code
confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c   | 13 -
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c229356..e8cfdc4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-   return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;
 
-   WARN_ON(!is_rmap_spte(new_spte));
+   WARN_ON(!is_shadow_present_pte(new_spte));
 
if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-   if (!is_rmap_spte(old_spte))
+   if (!is_shadow_present_pte(old_spte))
return 0;
 
pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
 
-   if (is_rmap_spte(*sptep)) {
+   if (is_shadow_present_pte(*sptep)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t 
gva, int level,
 * If the mapping has been changed, let the vcpu fault on the
 * same address again.
 */
-   if (!is_rmap_spte(spte)) {
+   if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct 
kvm_mmu_page *sp)
return;
 
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   if (!is_rmap_spte(sp->spt[i]))
+   if (!is_shadow_present_pte(sp->spt[i]))
continue;
 
inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



[PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

2015-11-12 Thread Takuya Yoshikawa
Make kvm_mmu_alloc_page() do just what its name says, and remove
the extra error check at its call site since the allocation cannot fail.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f4bbd..9273cd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1707,8 +1707,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-  u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int 
direct)
 {
struct kvm_mmu_page *sp;
 
@@ -1724,8 +1723,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu,
 * this feature. See the comments in kvm_zap_obsolete_pages().
 */
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-   sp->parent_ptes.val = 0;
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
kvm_mod_used_mmu_pages(vcpu->kvm, +1);
return sp;
 }
@@ -2124,10 +2121,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
trace_kvm_mmu_get_page(sp, false);
return sp;
}
+
++vcpu->kvm->stat.mmu_cache_miss;
-   sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-   if (!sp)
-   return sp;
+
+   sp = kvm_mmu_alloc_page(vcpu, direct);
+
+   sp->parent_ptes.val = 0;
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
sp->gfn = gfn;
sp->role = role;
hlist_add_head(&sp->hash_link,
-- 
2.1.0



[PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct

2015-11-12 Thread Takuya Yoshikawa
New struct kvm_rmap_head makes the code type-safe to some extent.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c  | 169 +---
 arch/x86/kvm/mmu_audit.c|  13 ++--
 3 files changed, 100 insertions(+), 90 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0535359..c5a0c4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
};
 };
 
+struct kvm_rmap_head {
+   unsigned long val;
+};
+
 struct kvm_mmu_page {
struct list_head link;
struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
bool unsync;
int root_count;  /* Currently serving as active root */
unsigned int unsync_children;
-   unsigned long parent_ptes;  /* Reverse mapping for parent_pte */
+   struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen.  */
unsigned long mmu_valid_gen;
@@ -604,7 +608,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-   unsigned long *rmap[KVM_NR_PAGE_SIZES];
+   struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ee7b101..85f4bbd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -916,24 +916,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
  *
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-   unsigned long *pte_list)
+   struct kvm_rmap_head *rmap_head)
 {
struct pte_list_desc *desc;
int i, count = 0;
 
-   if (!*pte_list) {
+   if (!rmap_head->val) {
rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-   *pte_list = (unsigned long)spte;
-   } else if (!(*pte_list & 1)) {
+   rmap_head->val = (unsigned long)spte;
+   } else if (!(rmap_head->val & 1)) {
rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
desc = mmu_alloc_pte_list_desc(vcpu);
-   desc->sptes[0] = (u64 *)*pte_list;
+   desc->sptes[0] = (u64 *)rmap_head->val;
desc->sptes[1] = spte;
-   *pte_list = (unsigned long)desc | 1;
+   rmap_head->val = (unsigned long)desc | 1;
++count;
} else {
rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-   desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+   desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
desc = desc->more;
count += PTE_LIST_EXT;
@@ -950,8 +950,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-  int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+  struct pte_list_desc *desc, int i,
+  struct pte_list_desc *prev_desc)
 {
int j;
 
@@ -962,43 +963,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, 
struct pte_list_desc *desc,
if (j != 0)
return;
if (!prev_desc && !desc->more)
-   *pte_list = (unsigned long)desc->sptes[0];
+   rmap_head->val = (unsigned long)desc->sptes[0];
else
if (prev_desc)
prev_desc->more = desc->more;
else
-   *pte_list = (unsigned long)desc->more | 1;
+   rmap_head->val = (unsigned long)desc->more | 1;
mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 {
struct pte_list_desc *desc;
struct pte_list_desc *prev_desc;
int i;
 
-   if (!*pte_list) {
+   if (!rmap_head->val) {
printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
BUG();
-   } else if (!(*pte_list & 1)) {
+   } else if (!(rmap_head->val & 1)) {
rmap_printk("pte_list_remove:  %p 1->0\n", spte);
-   if ((u64 *)*pte_list != spte) {
+   if ((u64 *)rmap_head->val != spte) {
printk(KERN_ERR "pte_list_remove:  %p 1-&g

[PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map()

2015-11-12 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7c2c14..c3bbc82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn,
-   bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+   int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
-
return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn, prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
2.1.0



[PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-12 Thread Takuya Yoshikawa
v1->v2:
  Patch 5 and 7 are added based on Paolo's suggestions.
  Patch 8-10 are new.

Patch 1/2/3/4: no change.
Patch 5: Needed a bit more work than I had expected.
Patch 6: Removed extra comment of v1 (patch 5 made it inappropriate).
Patch 7: As expected, many places needed to be converted.
Patch 8: This is new, but only a small change.

Patch 9: Kind of an RFC (though I have checked it to some extent).
  Following two places need to be carefully checked:
  - in kvm_mmu_get_page: "if (!direct)" block after kvm_mmu_alloc_page()
  - in FNAME(fetch): "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
Patch 10: Trivial cleanup, assuming that patch 9 is correct.


In summary: patches 1-7 are the result of updating v1 based on the suggestions.
  Although patch 5 does not look as nice as expected, this is the most
  conservative approach, and patches 8-10 try to alleviate that.

  Takuya


Takuya Yoshikawa (10):
  01:  KVM: x86: MMU: Remove unused parameter of __direct_map()
  02:  KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  03:  KVM: x86: MMU: Make mmu_set_spte() return emulate value
  04:  KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  05:  KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  06:  KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  07:  KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  08:  KVM: x86: MMU: Move initialization of parent_ptes out from 
kvm_mmu_alloc_page()
  09:  KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to 
link_shadow_page()
  10:  KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c| 357 ++
 arch/x86/kvm/mmu_audit.c  |  15 +-
 arch/x86/kvm/paging_tmpl.h|  20 +--
 5 files changed, 196 insertions(+), 208 deletions(-)

-- 
2.1.0



[PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-12 Thread Takuya Yoshikawa
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 21 -
 arch/x86/kvm/paging_tmpl.h |  6 ++
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9273cd4..33fe720 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
} else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
-   if (parent_pte)
-   mark_unsync(parent_pte);
}
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
@@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
sp = kvm_mmu_alloc_page(vcpu, direct);
 
sp->parent_ptes.val = 0;
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
sp->gfn = gfn;
sp->role = role;
@@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct 
kvm_shadow_walk_iterator *iterator)
return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool 
accessed)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+struct kvm_mmu_page *sp, bool accessed)
 {
u64 spte;
 
@@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct 
kvm_mmu_page *sp, bool accessed)
spte |= shadow_accessed_mask;
 
mmu_spte_set(sptep, spte);
+
+   if (sp->unsync_children || sp->unsync)
+   mark_unsync(sptep);
+
+   mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *sptep;
@@ -2738,7 +2733,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
  iterator.level - 1,
  1, ACC_ALL, iterator.sptep);
 
-   link_shadow_page(iterator.sptep, sp, true);
+   link_shadow_page(vcpu, iterator.sptep, sp, true);
}
}
return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9d21b44..f414ca6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;
 
if (sp)
-   link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+   link_shadow_page(vcpu, it.sptep, sp, 
PT_GUEST_ACCESSED_MASK);
}
 
for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 
sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
  true, direct_access, it.sptep);
-   link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+   link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
}
 
clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;
 
 out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
 }
-- 
2.1.0



[PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()

2015-11-12 Thread Takuya Yoshikawa
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same as the for_each_rmap_spte macro.  The only difference
is that is_shadow_present_pte() checks cannot be placed there because
kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
works with no problem.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 +---
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e8cfdc4..1691171 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long 
*pte_list)
}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
-{
-   struct pte_list_desc *desc;
-   int i;
-
-   if (!*pte_list)
-   return;
-
-   if (!(*pte_list & 1))
-   return fn((u64 *)*pte_list);
-
-   desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-   while (desc) {
-   for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-   fn(desc->sptes[i]);
-   desc = desc->more;
-   }
-}
-
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
struct kvm_memory_slot *slot)
 {
@@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct 
kvm_vcpu *vcpu,
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-   pte_list_walk(&sp->parent_ptes, mark_unsync);
+   u64 *sptep;
+   struct rmap_iterator iter;
+
+   for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+   mark_unsync(sptep);
+   }
 }
 
 static void mark_unsync(u64 *spte)
@@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
break;
 
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
if (sp->unsync_children) {
kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
kvm_mmu_mark_parents_unsync(sp);
-   } else if (sp->unsync)
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   } else if (sp->unsync) {
kvm_mmu_mark_parents_unsync(sp);
+   if (parent_pte)
+   mark_unsync(parent_pte);
+   }
+   mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
__clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0



[PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-12 Thread Takuya Yoshikawa
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which must not be
found in the reverse-mapping list.

Move this check in rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.  In addition, change the BUG_ON to WARN_ON since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and not mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c| 26 +-
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt 
b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
 page cannot be destroyed.  See role.invalid.
   parent_ptes:
 The reverse mapping for the pte/ptes pointing at this page's spt. If
-parent_ptes bit 0 is zero, only one spte points at this pages and
+parent_ptes bit 0 is zero, only one spte points at this page and
 parent_ptes points at this single spte, otherwise, there exists multiple
 sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-structure with a list of parent_ptes.
+structure with a list of parent sptes.
   unsync:
 If true, then the translations in this page may not match the guest's
 translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1691171..ee7b101 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1079,17 +1079,23 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (!rmap)
return NULL;
 
if (!(rmap & 1)) {
iter->desc = NULL;
-   return (u64 *)rmap;
+   sptep = (u64 *)rmap;
+   goto out;
}
 
iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
iter->pos = 0;
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+out:
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 /*
@@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct 
rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
-   u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
-   return sptep;
+   goto out;
}
 
iter->desc = iter->desc->more;
@@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+   goto out;
}
}
 
return NULL;
+out:
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
   for (_spte_ = rmap_get_first(*_rmap_, _iter_);   \
-   _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
-   _spte_ = rmap_get_next(_iter_))
+   _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long 
*rmapp)
bool flush = false;
 
while ((sptep = rmap_get_first(*rmapp, &iter))) {
-   BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
drop_spte(kvm, sptep);
-- 
2.1.0



Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()

2015-11-12 Thread Takuya Yoshikawa

On 2015/11/12 23:27, Paolo Bonzini wrote:


On 12/11/2015 12:56, Takuya Yoshikawa wrote:

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9d21b44..f414ca6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
goto out_gpte_changed;

if (sp)
-   link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
+   link_shadow_page(vcpu, it.sptep, sp, 
PT_GUEST_ACCESSED_MASK);
}



Here I think you can remove completely the

if (sp)
kvm_mmu_put_page(sp, it.sptep);

later in FNAME(fetch).  Apart from this nit, it's okay.


Yes, that's what this patch does below:


@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return emulate;

 out_gpte_changed:
-   if (sp)
-   kvm_mmu_put_page(sp, it.sptep);
kvm_release_pfn_clean(pfn);
return 0;
 }


Since this is the only user of kvm_mmu_put_page(), it also removes
the definition:


@@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-   mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *sptep;


Actually, I don't understand why this is named kvm_mmu_put_page() when
all it does is remove the parent_pte pointer from the sp->parent_ptes chain.



On to kvm_mmu_get_page...

 if (!direct) {
 if (rmap_write_protect(vcpu, gfn))
 kvm_flush_remote_tlbs(vcpu->kvm);
 if (level > PT_PAGE_TABLE_LEVEL && need_sync)
 kvm_sync_pages(vcpu, gfn);

This seems fishy.

need_sync is set if sp->unsync, but then the parents have not been
unsynced yet.


Reaching here means that kvm_mmu_get_page() could not return sp
from inside the for_each_gfn_sp() loop above, so even without
this patch, mark_unsync() has not been called.

Here, sp holds the new page allocated by kvm_mmu_alloc_page().
One confusing thing is that hlist_add_head() right before this
"if (!direct)" line has already added the new sp to the hash
list, so it will be found by for_each_gfn_indirect_valid_sp()
in kvm_sync_pages().

Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
will just skip it and look for other sp's whose ->unsync were found
to be set in the for_each_gfn_sp() loop.
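
Concretely, I mean the check at the top of that loop (a paraphrased
sketch, not the exact code):

	for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
		if (!s->unsync)
			continue;	/* the brand-new sp has unsync == 0,
					   so it is skipped here */

		/* only sp's already marked unsync get synced below */
	}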

I'm not 100% sure if the existence of the parent_pte pointer in the
newly created sp->parent_ptes chain alone makes any difference:

@@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
sp = kvm_mmu_alloc_page(vcpu, direct);

sp->parent_ptes.val = 0;
-   mmu_page_add_parent_pte(vcpu, sp, parent_pte);

sp->gfn = gfn;
sp->role = role;




On the other hand, all calls to kvm_mmu_get_page except for the
roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
you can call link_shadow_page directly from kvm_mmu_get_page.  The call
would go before the "if (!direct)" and it would subsume all the existing
calls.

We could probably also warn if

(parent_pte == NULL)
!= (level == vcpu->arch.mmu.root_level)

in kvm_mmu_get_page.


I think we should set the spte after init_shadow_page_table(), and
to make this subsume all the existing calls, we need to change the
"return sp;" in the for_each_gfn_sp() loop to a goto statement so
that the end of this function will become something like this:

init_shadow_page(sp);
out:
if (parent_pte) {
mmu_page_add_parent_pte(vcpu, sp, parent_pte);
link_shadow_page(parent_pte, sp, accessed);
}
trace_kvm_mmu_get_page(sp, created);
return sp;

So, "bool accessed" needs to be passed to kvm_mmu_get_page().
But anyway, we need to understand whether mmu_page_add_parent_pte()
really needs to be placed before the "if (!direct)" block.

  Takuya




Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-10 Thread Takuya Yoshikawa

On 2015/11/09 19:14, Paolo Bonzini wrote:

Can you also change kvm_mmu_mark_parents_unsync to use
for_each_rmap_spte instead of pte_list_walk?  It is the last use of
pte_list_walk, and it's nice if we have two uses of for_each_rmap_spte
with parent_ptes as the argument.


No problem, I will do.

Since parent_ptes is also explained as the "reverse mapping" list of
parent sptes (in mmu.txt and kvm_host.h), using rmap helpers will not
look so strange.


BTW, on my todo list is to change the rmap items to a struct (with a
single u64 inside) for type safety.  Since you are touching this code,
perhaps you can give it a shot?


Yes, almost done here (assuming that you mean 'unsigned long').
But I have some candidates for its name in mind:

1. struct kvm_rmap { unsigned long val; };
2. struct kvm_rmap_head { unsigned long val; };
3. struct kvm_rmap_list_head { unsigned long val; };
4. struct kvm_spte_list_head { unsigned long val; };

Since this is the head of the reverse mapping list of sptes, I thought
name 3 might be the best and first made a patch with it, but it was
a bit longer than I had hoped it to be.

I have changed it to name 2, and it looks a bit nicer now, but even
shorter, e.g. name 1, may be good as well.

Do you have any preference?

  Takuya



[PATCH 0/5] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-05 Thread Takuya Yoshikawa
Patch 1/2/3 are easy ones.

The following two, patches 4/5, may not be ideal solutions, but they
at least explain, or try to explain, the problems.

Takuya Yoshikawa (5):
  KVM: x86: MMU: Remove unused parameter of __direct_map()
  KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  KVM: x86: MMU: Make mmu_set_spte() return emulate value
  KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/kvm/mmu.c| 118 --
 arch/x86/kvm/mmu_audit.c  |   2 +-
 arch/x86/kvm/paging_tmpl.h|  10 ++--
 4 files changed, 70 insertions(+), 64 deletions(-)

-- 
2.1.0



[PATCH 2/5] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-05 Thread Takuya Yoshikawa
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line
code which clears a bit in the unsync child bitmap; the former places it
inside a loop block and uses a few goto statements to jump to it.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a76bc04..a9622a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, 
struct kvm_mmu_page *sp,
return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+   --sp->unsync_children;
+   WARN_ON((int)sp->unsync_children < 0);
+   __clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
struct kvm_mmu_page *child;
u64 ent = sp->spt[i];
 
-   if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-   goto clear_child_bitmap;
+   if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   }
 
child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
return -ENOSPC;
 
ret = __mmu_unsync_walk(child, pvec);
-   if (!ret)
-   goto clear_child_bitmap;
-   else if (ret > 0)
+   if (!ret) {
+   clear_unsync_child_bit(sp, i);
+   continue;
+   } else if (ret > 0) {
nr_unsync_leaf += ret;
-   else
+   } else
return ret;
} else if (child->unsync) {
nr_unsync_leaf++;
if (mmu_pages_add(pvec, child, i))
return -ENOSPC;
} else
-goto clear_child_bitmap;
-
-   continue;
-
-clear_child_bitmap:
-   __clear_bit(i, sp->unsync_child_bitmap);
-   sp->unsync_children--;
-   WARN_ON((int)sp->unsync_children < 0);
+   clear_unsync_child_bit(sp, i);
}
 
-
return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path 
*parents)
if (!sp)
return;
 
-   --sp->unsync_children;
-   WARN_ON((int)sp->unsync_children < 0);
-   __clear_bit(idx, sp->unsync_child_bitmap);
+   clear_unsync_child_bit(sp, idx);
level++;
} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0



[PATCH 1/5] KVM: x86: MMU: Remove unused parameter of __direct_map()

2015-11-05 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d85bca..a76bc04 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, 
u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn,
-   bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+   int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,8 +3017,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -3541,8 +3539,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn, prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
2.1.0



[PATCH 3/5] KVM: x86: MMU: Make mmu_set_spte() return emulate value

2015-11-05 Thread Takuya Yoshikawa
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 27 ++-
 arch/x86/kvm/paging_tmpl.h | 10 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a9622a2..69e7d20 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-unsigned pte_access, int write_fault, int *emulate,
-int level, gfn_t gfn, pfn_t pfn, bool speculative,
-bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned 
pte_access,
+int write_fault, int level, gfn_t gfn, pfn_t pfn,
+bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;
+   bool emulate = false;
 
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
  true, host_writable)) {
if (write_fault)
-   *emulate = 1;
+   emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
-   if (unlikely(is_mmio_spte(*sptep) && emulate))
-   *emulate = 1;
+   if (unlikely(is_mmio_spte(*sptep)))
+   emulate = true;
 
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
 
kvm_release_pfn_clean(pfn);
+
+   return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
 
for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, access, 0, NULL,
-sp->role.level, gfn, page_to_pfn(pages[i]),
-true, true);
+   mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+page_to_pfn(pages[i]), true, true);
 
return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, 
int map_writable,
 
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
-   mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-write, &emulate, level, gfn, pfn,
-prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+  write, level, gfn, pfn, prefault,
+  map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b41faa9..de24499 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp,
 * we call mmu_set_spte() with host_writable = true because
 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 */
-   mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-gfn, pfn, true, true);
+   mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+true, true);
 
return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
-   int top_level, emulate = 0;
+   int top_level, emulate;
 
direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}
 
clear_sp_write_flooding_count(it.sptep);
-   mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-it.level, gw-

[PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-05 Thread Takuya Yoshikawa
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which should not
be found in the reverse-mapping list.

Move this check in rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.  In addition, change the BUG_ON to WARN_ON since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and not mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c| 31 ++-
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt 
b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
 page cannot be destroyed.  See role.invalid.
   parent_ptes:
 The reverse mapping for the pte/ptes pointing at this page's spt. If
-parent_ptes bit 0 is zero, only one spte points at this pages and
+parent_ptes bit 0 is zero, only one spte points at this page and
 parent_ptes points at this single spte, otherwise, there exists multiple
 sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-structure with a list of parent_ptes.
+structure with a list of parent sptes.
   unsync:
 If true, then the translations in this page may not match the guest's
 translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c5e2363..353d752 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1099,17 +1099,28 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (!rmap)
return NULL;
 
if (!(rmap & 1)) {
iter->desc = NULL;
-   return (u64 *)rmap;
+   sptep = (u64 *)rmap;
+   goto out;
}
 
iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
iter->pos = 0;
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+out:
+   /*
+* Parent sptes found in sp->parent_ptes lists are also checked here
+* since kvm_mmu_unlink_parents() uses this function.  If the condition
+* needs to be changed for them, make another wrapper function.
+*/
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 /*
@@ -1119,14 +1130,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct 
rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
-   u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
-   return sptep;
+   goto out;
}
 
iter->desc = iter->desc->more;
@@ -1134,17 +1145,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+   goto out;
}
}
 
return NULL;
+out:
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
   for (_spte_ = rmap_get_first(*_rmap_, _iter_);   \
-   _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
-   _spte_ = rmap_get_next(_iter_))
+   _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1372,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long 
*rmapp)
bool flush = false;
 
while ((sptep = rmap_get_first(*rmapp, ))) {
-   BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
drop_spte(kvm, sptep);
-- 
2.1.0



[PATCH 4/5] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

2015-11-05 Thread Takuya Yoshikawa
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them without any clear distinction just makes the
code confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c   | 13 -
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 69e7d20..c5e2363 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-   return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;
 
-   WARN_ON(!is_rmap_spte(new_spte));
+   WARN_ON(!is_shadow_present_pte(new_spte));
 
if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-   if (!is_rmap_spte(old_spte))
+   if (!is_shadow_present_pte(old_spte))
return 0;
 
pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
 
-   if (is_rmap_spte(*sptep)) {
+   if (is_shadow_present_pte(*sptep)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t 
gva, int level,
 * If the mapping has been changed, let the vcpu fault on the
 * same address again.
 */
-   if (!is_rmap_spte(spte)) {
+   if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct 
kvm_mmu_page *sp)
return;
 
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   if (!is_rmap_spte(sp->spt[i]))
+   if (!is_shadow_present_pte(sp->spt[i]))
continue;
 
inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



[PATCH] KVM: x86: MMU: Initialize force_pt_level before calling mapping_level()

2015-10-19 Thread Takuya Yoshikawa
Commit fd1369021878 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
call in mapping_level()") forgot to initialize force_pt_level to false
in FNAME(page_fault)() before calling mapping_level() like
nonpaging_map() does.  This can sometimes result in forcing page table
level mapping unnecessarily.

Fix this and move the first *force_pt_level check in mapping_level()
before the kvm_vcpu_gfn_to_memslot() call to make it a bit clearer that
the variable must be initialized before mapping_level() gets called.

This change also avoids calling kvm_vcpu_gfn_to_memslot() when the
!check_hugepage_cache_consistency() check in tdp_page_fault() forces
page table level mapping.
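
A minimal sketch of the calling convention this fix restores
(illustrative only; it mirrors what nonpaging_map() already does and
what FNAME(page_fault)() does after this patch):

	bool force_pt_level = false;	/* the caller must initialize this */
	int level;

	/*
	 * mapping_level() may set force_pt_level and return
	 * PT_PAGE_TABLE_LEVEL early, without doing the memslot search.
	 */
	level = mapping_level(vcpu, gfn, &force_pt_level);
	if (likely(!force_pt_level))
		gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);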

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 7 ---
 arch/x86/kvm/paging_tmpl.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd2a7c6..7d85bca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -886,10 +886,11 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
int host_level, level, max_level;
struct kvm_memory_slot *slot;
 
-   slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
+   if (unlikely(*force_pt_level))
+   return PT_PAGE_TABLE_LEVEL;
 
-   if (likely(!*force_pt_level))
-   *force_pt_level = !memslot_valid_for_gpte(slot, true);
+   slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
+   *force_pt_level = !memslot_valid_for_gpte(slot, true);
if (unlikely(*force_pt_level))
return PT_PAGE_TABLE_LEVEL;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bf39d0f..b41faa9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
-   bool force_pt_level;
+   bool force_pt_level = false;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
 
-- 
2.1.0



[PATCH 0/5 v2] KVM: x86: MMU: Eliminate extra memory slot searches in page fault handlers

2015-10-16 Thread Takuya Yoshikawa
v2: Based on Paolo's suggestion, kept the common code as much as possible by
introducing memslot_valid_for_gpte().

Note: instead of joining all checks with boolean operators, the
no_dirty_log case was split off into a separate if-check because it
checks a clearly different thing than the rest.  See patch 4 for details.

In page fault handlers, both mapping_level_dirty_bitmap() and mapping_level()
do a memory slot search, a binary search, through kvm_vcpu_gfn_to_memslot(), whose
cost may not be negligible, especially for virtual machines with many memory slots.

With a bit of cleanup effort, the patch set reduces this overhead.
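
Roughly, the redundancy being removed looks like this (a simplified
sketch of the before/after call pattern, not the literal code):

	/* before: two binary searches through the memslots per fault */
	force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);	/* slot lookup #1 */
	if (!force_pt_level)
		level = mapping_level(vcpu, gfn);			/* slot lookup #2 */

	/* after: mapping_level() looks the slot up once and reuses it */
	level = mapping_level(vcpu, gfn, &force_pt_level);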

Takuya

Takuya Yoshikawa (5):
  KVM: x86: MMU: Make force_pt_level bool
  KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)()
  KVM: x86: MMU: Merge mapping_level_dirty_bitmap() into mapping_level()
  KVM: x86: MMU: Remove mapping_level_dirty_bitmap()
  KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()

 arch/x86/kvm/mmu.c | 70 +++---
 arch/x86/kvm/paging_tmpl.h | 19 ++---
 2 files changed, 50 insertions(+), 39 deletions(-)

-- 
2.1.0



[PATCH 5/5] KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()

2015-10-16 Thread Takuya Yoshikawa
Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 09833b0..dd2a7c6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -818,14 +818,11 @@ static void unaccount_shadowed(struct kvm *kvm, struct 
kvm_mmu_page *sp)
kvm->arch.indirect_shadow_pages--;
 }
 
-static int has_wrprotected_page(struct kvm_vcpu *vcpu,
-   gfn_t gfn,
-   int level)
+static int __has_wrprotected_page(gfn_t gfn, int level,
+ struct kvm_memory_slot *slot)
 {
-   struct kvm_memory_slot *slot;
struct kvm_lpage_info *linfo;
 
-   slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
if (slot) {
linfo = lpage_info_slot(gfn, slot, level);
return linfo->write_count;
@@ -834,6 +831,14 @@ static int has_wrprotected_page(struct kvm_vcpu *vcpu,
return 1;
 }
 
+static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+{
+   struct kvm_memory_slot *slot;
+
+   slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+   return __has_wrprotected_page(gfn, level, slot);
+}
+
 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 {
unsigned long page_size;
@@ -896,7 +901,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
 
for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
-   if (has_wrprotected_page(vcpu, large_gfn, level))
+   if (__has_wrprotected_page(large_gfn, level, slot))
break;
 
return level - 1;
-- 
2.1.0



[PATCH 4/5] KVM: x86: MMU: Remove mapping_level_dirty_bitmap()

2015-10-16 Thread Takuya Yoshikawa
Now that it has only one caller, and its name is not so helpful for
readers, remove it.  Instead, the new memslot_valid_for_gpte() function
makes it possible to share the common code.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 890cd69..09833b0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -851,6 +851,17 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
return ret;
 }
 
+static inline bool memslot_valid_for_gpte(struct kvm_memory_slot *slot,
+ bool no_dirty_log)
+{
+   if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+   return false;
+   if (no_dirty_log && slot->dirty_bitmap)
+   return false;
+
+   return true;
+}
+
 static struct kvm_memory_slot *
 gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
bool no_dirty_log)
@@ -858,25 +869,22 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t 
gfn,
struct kvm_memory_slot *slot;
 
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-   if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
- (no_dirty_log && slot->dirty_bitmap))
+   if (!memslot_valid_for_gpte(slot, no_dirty_log))
slot = NULL;
 
return slot;
 }
 
-static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
-{
-   return !gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
-}
-
 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 bool *force_pt_level)
 {
int host_level, level, max_level;
+   struct kvm_memory_slot *slot;
+
+   slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
 
if (likely(!*force_pt_level))
-   *force_pt_level = mapping_level_dirty_bitmap(vcpu, large_gfn);
+   *force_pt_level = !memslot_valid_for_gpte(slot, true);
if (unlikely(*force_pt_level))
return PT_PAGE_TABLE_LEVEL;
 
-- 
2.1.0



[PATCH 2/5] KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)()

2015-10-16 Thread Takuya Yoshikawa
As a bonus, an extra memory slot search can be eliminated when
is_self_change_mapping is true.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/paging_tmpl.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 07f1a4e..8ebc3a5 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -743,15 +743,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
  , user_fault, >arch.write_fault_to_shadow_pgtable);
 
-   if (walker.level >= PT_DIRECTORY_LEVEL)
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-  || is_self_change_mapping;
-   else
+   if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) {
+   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
+   if (!force_pt_level) {
+   level = min(walker.level, mapping_level(vcpu, 
walker.gfn));
+   walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) 
- 1);
+   }
+   } else
force_pt_level = true;
-   if (!force_pt_level) {
-   level = min(walker.level, mapping_level(vcpu, walker.gfn));
-   walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   }
 
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
-- 
2.1.0



[PATCH 3/5] KVM: x86: MMU: Merge mapping_level_dirty_bitmap() into mapping_level()

2015-10-16 Thread Takuya Yoshikawa
This is necessary to eliminate an extra memory slot search later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 29 ++---
 arch/x86/kvm/paging_tmpl.h |  6 +++---
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2262728..890cd69 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -870,10 +870,16 @@ static bool mapping_level_dirty_bitmap(struct kvm_vcpu 
*vcpu, gfn_t large_gfn)
return !gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
 }
 
-static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
+bool *force_pt_level)
 {
int host_level, level, max_level;
 
+   if (likely(!*force_pt_level))
+   *force_pt_level = mapping_level_dirty_bitmap(vcpu, large_gfn);
+   if (unlikely(*force_pt_level))
+   return PT_PAGE_TABLE_LEVEL;
+
host_level = host_mapping_level(vcpu->kvm, large_gfn);
 
if (host_level == PT_PAGE_TABLE_LEVEL)
@@ -2962,14 +2968,13 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t 
v, u32 error_code,
 {
int r;
int level;
-   bool force_pt_level;
+   bool force_pt_level = false;
pfn_t pfn;
unsigned long mmu_seq;
bool map_writable, write = error_code & PFERR_WRITE_MASK;
 
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
+   level = mapping_level(vcpu, gfn, _pt_level);
if (likely(!force_pt_level)) {
-   level = mapping_level(vcpu, gfn);
/*
 * This path builds a PAE pagetable - so we can map
 * 2mb pages at maximum. Therefore check if the level
@@ -2979,8 +2984,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
level = PT_DIRECTORY_LEVEL;
 
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   } else
-   level = PT_PAGE_TABLE_LEVEL;
+   }
 
if (fast_page_fault(vcpu, v, level, error_code))
return 0;
@@ -3495,20 +3499,15 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
if (r)
return r;
 
-   if (mapping_level_dirty_bitmap(vcpu, gfn) ||
-   !check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL))
-   force_pt_level = true;
-   else
-   force_pt_level = false;
-
+   force_pt_level = !check_hugepage_cache_consistency(vcpu, gfn,
+  PT_DIRECTORY_LEVEL);
+   level = mapping_level(vcpu, gfn, _pt_level);
if (likely(!force_pt_level)) {
-   level = mapping_level(vcpu, gfn);
if (level > PT_DIRECTORY_LEVEL &&
!check_hugepage_cache_consistency(vcpu, gfn, level))
level = PT_DIRECTORY_LEVEL;
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   } else
-   level = PT_PAGE_TABLE_LEVEL;
+   }
 
if (fast_page_fault(vcpu, gpa, level, error_code))
return 0;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8ebc3a5..bf39d0f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -744,9 +744,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
  , user_fault, >arch.write_fault_to_shadow_pgtable);
 
if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) {
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
-   if (!force_pt_level) {
-   level = min(walker.level, mapping_level(vcpu, 
walker.gfn));
+   level = mapping_level(vcpu, walker.gfn, _pt_level);
+   if (likely(!force_pt_level)) {
+   level = min(walker.level, level);
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) 
- 1);
}
} else
-- 
2.1.0



[PATCH 1/5] KVM: x86: MMU: Make force_pt_level bool

2015-10-16 Thread Takuya Yoshikawa
This will be passed to a function later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 8 
 arch/x86/kvm/paging_tmpl.h | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8482c0..2262728 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2962,7 +2962,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
 {
int r;
int level;
-   int force_pt_level;
+   bool force_pt_level;
pfn_t pfn;
unsigned long mmu_seq;
bool map_writable, write = error_code & PFERR_WRITE_MASK;
@@ -3476,7 +3476,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
pfn_t pfn;
int r;
int level;
-   int force_pt_level;
+   bool force_pt_level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
int write = error_code & PFERR_WRITE_MASK;
@@ -3497,9 +3497,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
 
if (mapping_level_dirty_bitmap(vcpu, gfn) ||
!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL))
-   force_pt_level = 1;
+   force_pt_level = true;
else
-   force_pt_level = 0;
+   force_pt_level = false;
 
if (likely(!force_pt_level)) {
level = mapping_level(vcpu, gfn);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 736e6ab..07f1a4e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
-   int force_pt_level;
+   bool force_pt_level;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
 
@@ -747,7 +747,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
   || is_self_change_mapping;
else
-   force_pt_level = 1;
+   force_pt_level = true;
if (!force_pt_level) {
level = min(walker.level, mapping_level(vcpu, walker.gfn));
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
-- 
2.1.0



[PATCH 1/5] KVM: x86: MMU: Make force_pt_level bool

2015-10-15 Thread Takuya Yoshikawa
This will be passed to a function later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |8 
 arch/x86/kvm/paging_tmpl.h |4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b8482c0..2262728 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2962,7 +2962,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
 {
int r;
int level;
-   int force_pt_level;
+   bool force_pt_level;
pfn_t pfn;
unsigned long mmu_seq;
bool map_writable, write = error_code & PFERR_WRITE_MASK;
@@ -3476,7 +3476,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
pfn_t pfn;
int r;
int level;
-   int force_pt_level;
+   bool force_pt_level;
gfn_t gfn = gpa >> PAGE_SHIFT;
unsigned long mmu_seq;
int write = error_code & PFERR_WRITE_MASK;
@@ -3497,9 +3497,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
 
if (mapping_level_dirty_bitmap(vcpu, gfn) ||
!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL))
-   force_pt_level = 1;
+   force_pt_level = true;
else
-   force_pt_level = 0;
+   force_pt_level = false;
 
if (likely(!force_pt_level)) {
level = mapping_level(vcpu, gfn);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 736e6ab..07f1a4e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
-   int force_pt_level;
+   bool force_pt_level;
unsigned long mmu_seq;
bool map_writable, is_self_change_mapping;
 
@@ -747,7 +747,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
   || is_self_change_mapping;
else
-   force_pt_level = 1;
+   force_pt_level = true;
if (!force_pt_level) {
level = min(walker.level, mapping_level(vcpu, walker.gfn));
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
-- 
1.7.9.5



[PATCH 0/5] KVM: x86: MMU: Eliminate extra memory slot searches in page fault handlers

2015-10-15 Thread Takuya Yoshikawa
In page fault handlers, both mapping_level_dirty_bitmap() and mapping_level()
do a memory slot search, a binary search, through kvm_vcpu_gfn_to_memslot(), whose
cost may not be negligible, especially for virtual machines with many memory slots.

With a bit of cleanup effort, the patch set reduces this overhead.

  [PATCH 1/5] KVM: x86: MMU: Make force_pt_level bool
  [PATCH 2/5] KVM: x86: MMU: Simplify force_pt_level calculation code in 
FNAME(page_fault)()
  [PATCH 3/5] KVM: x86: MMU: Merge mapping_level_dirty_bitmap() into 
mapping_level()
  [PATCH 4/5] KVM: x86: MMU: Remove mapping_level_dirty_bitmap()
  [PATCH 5/5] KVM: x86: MMU: Eliminate an extra memory slot search in 
mapping_level()

Takuya


[PATCH 2/5] KVM: x86: MMU: Simplify force_pt_level calculation code in FNAME(page_fault)()

2015-10-15 Thread Takuya Yoshikawa
As a bonus, an extra memory slot search can be eliminated when
is_self_change_mapping is true.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/paging_tmpl.h |   15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 07f1a4e..8ebc3a5 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -743,15 +743,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
  , user_fault, >arch.write_fault_to_shadow_pgtable);
 
-   if (walker.level >= PT_DIRECTORY_LEVEL)
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn)
-  || is_self_change_mapping;
-   else
+   if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) {
+   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
+   if (!force_pt_level) {
+   level = min(walker.level, mapping_level(vcpu, 
walker.gfn));
+   walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) 
- 1);
+   }
+   } else
force_pt_level = true;
-   if (!force_pt_level) {
-   level = min(walker.level, mapping_level(vcpu, walker.gfn));
-   walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   }
 
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();
-- 
1.7.9.5



[PATCH 3/5] KVM: x86: MMU: Merge mapping_level_dirty_bitmap() into mapping_level()

2015-10-15 Thread Takuya Yoshikawa
This is necessary to eliminate an extra memory slot search later.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   29 ++---
 arch/x86/kvm/paging_tmpl.h |6 +++---
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2262728..890cd69 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -870,10 +870,16 @@ static bool mapping_level_dirty_bitmap(struct kvm_vcpu 
*vcpu, gfn_t large_gfn)
return !gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
 }
 
-static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
+bool *force_pt_level)
 {
int host_level, level, max_level;
 
+   if (likely(!*force_pt_level))
+   *force_pt_level = mapping_level_dirty_bitmap(vcpu, large_gfn);
+   if (unlikely(*force_pt_level))
+   return PT_PAGE_TABLE_LEVEL;
+
host_level = host_mapping_level(vcpu->kvm, large_gfn);
 
if (host_level == PT_PAGE_TABLE_LEVEL)
@@ -2962,14 +2968,13 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t 
v, u32 error_code,
 {
int r;
int level;
-   bool force_pt_level;
+   bool force_pt_level = false;
pfn_t pfn;
unsigned long mmu_seq;
bool map_writable, write = error_code & PFERR_WRITE_MASK;
 
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
+   level = mapping_level(vcpu, gfn, _pt_level);
if (likely(!force_pt_level)) {
-   level = mapping_level(vcpu, gfn);
/*
 * This path builds a PAE pagetable - so we can map
 * 2mb pages at maximum. Therefore check if the level
@@ -2979,8 +2984,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, 
u32 error_code,
level = PT_DIRECTORY_LEVEL;
 
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   } else
-   level = PT_PAGE_TABLE_LEVEL;
+   }
 
if (fast_page_fault(vcpu, v, level, error_code))
return 0;
@@ -3495,20 +3499,15 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
if (r)
return r;
 
-   if (mapping_level_dirty_bitmap(vcpu, gfn) ||
-   !check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL))
-   force_pt_level = true;
-   else
-   force_pt_level = false;
-
+   force_pt_level = !check_hugepage_cache_consistency(vcpu, gfn,
+  PT_DIRECTORY_LEVEL);
+   level = mapping_level(vcpu, gfn, _pt_level);
if (likely(!force_pt_level)) {
-   level = mapping_level(vcpu, gfn);
if (level > PT_DIRECTORY_LEVEL &&
!check_hugepage_cache_consistency(vcpu, gfn, level))
level = PT_DIRECTORY_LEVEL;
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
-   } else
-   level = PT_PAGE_TABLE_LEVEL;
+   }
 
if (fast_page_fault(vcpu, gpa, level, error_code))
return 0;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8ebc3a5..bf39d0f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -744,9 +744,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr, u32 error_code,
  , user_fault, >arch.write_fault_to_shadow_pgtable);
 
if (walker.level >= PT_DIRECTORY_LEVEL && !is_self_change_mapping) {
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn);
-   if (!force_pt_level) {
-   level = min(walker.level, mapping_level(vcpu, 
walker.gfn));
+   level = mapping_level(vcpu, walker.gfn, _pt_level);
+   if (likely(!force_pt_level)) {
+   level = min(walker.level, level);
walker.gfn = walker.gfn & ~(KVM_PAGES_PER_HPAGE(level) 
- 1);
}
} else
-- 
1.7.9.5



[PATCH 4/5] KVM: x86: MMU: Remove mapping_level_dirty_bitmap()

2015-10-15 Thread Takuya Yoshikawa
Now that it has only one caller, and its name is not so helpful for
readers, just remove it.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 890cd69..78a3d08 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -851,6 +851,14 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
return ret;
 }
 
+static inline bool memslot_invalid(struct kvm_memory_slot *slot)
+{
+   if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+   return true;
+
+   return false;
+}
+
 static struct kvm_memory_slot *
 gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
bool no_dirty_log)
@@ -858,25 +866,22 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t 
gfn,
struct kvm_memory_slot *slot;
 
slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-   if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
- (no_dirty_log && slot->dirty_bitmap))
+   if (memslot_invalid(slot) || (no_dirty_log && slot->dirty_bitmap))
slot = NULL;
 
return slot;
 }
 
-static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
-{
-   return !gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
-}
-
 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 bool *force_pt_level)
 {
int host_level, level, max_level;
+   struct kvm_memory_slot *slot;
+
+   slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
 
if (likely(!*force_pt_level))
-   *force_pt_level = mapping_level_dirty_bitmap(vcpu, large_gfn);
+   *force_pt_level = memslot_invalid(slot) || slot->dirty_bitmap;
if (unlikely(*force_pt_level))
return PT_PAGE_TABLE_LEVEL;
 
-- 
1.7.9.5



[PATCH 5/5] KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()

2015-10-15 Thread Takuya Yoshikawa
Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 78a3d08..8d285dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -818,14 +818,11 @@ static void unaccount_shadowed(struct kvm *kvm, struct 
kvm_mmu_page *sp)
kvm->arch.indirect_shadow_pages--;
 }
 
-static int has_wrprotected_page(struct kvm_vcpu *vcpu,
-   gfn_t gfn,
-   int level)
+static int __has_wrprotected_page(gfn_t gfn, int level,
+ struct kvm_memory_slot *slot)
 {
-   struct kvm_memory_slot *slot;
struct kvm_lpage_info *linfo;
 
-   slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
if (slot) {
linfo = lpage_info_slot(gfn, slot, level);
return linfo->write_count;
@@ -834,6 +831,14 @@ static int has_wrprotected_page(struct kvm_vcpu *vcpu,
return 1;
 }
 
+static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+{
+   struct kvm_memory_slot *slot;
+
+   slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+   return __has_wrprotected_page(gfn, level, slot);
+}
+
 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 {
unsigned long page_size;
@@ -893,7 +898,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t 
large_gfn,
max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
 
for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
-   if (has_wrprotected_page(vcpu, large_gfn, level))
+   if (__has_wrprotected_page(large_gfn, level, slot))
break;
 
return level - 1;
-- 
1.7.9.5



Re: [PATCH 3/9] KVM: const-ify uses of struct kvm_userspace_memory_region

2015-05-20 Thread Takuya Yoshikawa
On 2015/05/20 2:25, Paolo Bonzini wrote:
 Architecture-specific helpers are not supposed to muck with
 struct kvm_userspace_memory_region contents.  Add const to
 enforce this.

Nice, this is what I wanted to do, but could not do, when I
cleaned up these functions a few years ago.

 
 In order to eliminate the only write in __kvm_set_memory_region,
 the cleaning of deleted slots is pulled up from update_memslots
 to __kvm_set_memory_region.

-   if (!npages)
-   mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;

Especially, removing this hard-to-understand lines is really
nice.

 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Reviewed-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp


Takuya




[PATCH] KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()

2015-03-17 Thread Takuya Yoshikawa
When no bits in mask are set,
kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do.  But since
it needs to be called from the generic code, it cannot be inlined, and
a few function calls, two when PML is enabled, are wasted.

Since it is common to see many pages remain clean, e.g. framebuffers can
stay idle for a long time, it is worth eliminating this overhead.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 virt/kvm/kvm_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a109370..420d8cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1061,9 +1061,11 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
mask = xchg(&dirty_bitmap[i], 0);
dirty_bitmap_buffer[i] = mask;
 
-   offset = i * BITS_PER_LONG;
-   kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset,
-   mask);
+   if (mask) {
+   offset = i * BITS_PER_LONG;
+   kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
+   offset, mask);
+   }
}
 
spin_unlock(&kvm->mmu_lock);
-- 
2.1.0



Re: [PATCH 0/3] KVM: simplification to the memslots code

2014-11-17 Thread Takuya Yoshikawa
On 2014/11/17 18:23, Paolo Bonzini wrote:
 
 
 On 17/11/2014 02:56, Takuya Yoshikawa wrote:
 here are a few small patches that simplify __kvm_set_memory_region
 and associated code.  Can you please review them?
 Ah, already queued.  Sorry for being late to respond.
 
 While they are not in kvm/next, there's time to add Reviewed-by's and
 all that.  kvm/queue basically means I want Fengguang to compile-test
 them, some testing done on x86_64.
 
 Paolo
 

OK.

I reviewed patch 2/3 and 3/3, and saw no problem, some
improvements, there.

Takuya




Re: [PATCH 2/3] kvm: commonize allocation of the new memory slots

2014-11-16 Thread Takuya Yoshikawa
On 2014/11/14 20:12, Paolo Bonzini wrote:
 The two kmemdup invocations can be unified.  I find that the new
 placement of the comment makes it easier to see what happens.

A lot easier to follow the logic.

Reviewed-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp

 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
   virt/kvm/kvm_main.c | 28 +++-
   1 file changed, 11 insertions(+), 17 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index c8ff99cc0ccb..7bfc842b96d7 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -865,11 +865,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
   goto out_free;
   }
   
 + slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
 + GFP_KERNEL);
 + if (!slots)
 + goto out_free;
 +
   if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
 - slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
 - GFP_KERNEL);
 - if (!slots)
 - goto out_free;
   slot = id_to_memslot(slots, mem->slot);
   slot->flags |= KVM_MEMSLOT_INVALID;
   
 @@ -885,6 +886,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
*  - kvm_is_visible_gfn (mmu_check_roots)
*/
   kvm_arch_flush_shadow_memslot(kvm, slot);
 +
 + /*
 +  * We can re-use the old_memslots from above, the only 
 difference
 +  * from the currently installed memslots is the invalid flag.  
 This
 +  * will get overwritten by update_memslots anyway.
 +  */
   slots = old_memslots;
   }
   
 @@ -892,19 +899,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
   if (r)
   goto out_slots;
   
 - r = -ENOMEM;
 - /*
 -  * We can re-use the old_memslots from above, the only difference
 -  * from the currently installed memslots is the invalid flag.  This
 -  * will get overwritten by update_memslots anyway.
 -  */
 - if (!slots) {
 - slots = kmemdup(kvm->memslots,
 - GFP_KERNEL);
 - if (!slots)
 - goto out_free;
 - }
 -
   /* actual memory is freed via old in kvm_free_physmem_slot below */
   if (change == KVM_MR_DELETE) {
   new.dirty_bitmap = NULL;
 




Re: [PATCH 0/3] KVM: simplification to the memslots code

2014-11-16 Thread Takuya Yoshikawa
On 2014/11/14 20:11, Paolo Bonzini wrote:
 Hi Igor and Takuya,
 
 here are a few small patches that simplify __kvm_set_memory_region
 and associated code.  Can you please review them?

Ah, already queued.  Sorry for being late to respond.

Takuya

 
 Thanks,
 
 Paolo
 
 Paolo Bonzini (3):
kvm: memslots: track id_to_index changes during the insertion sort
kvm: commonize allocation of the new memory slots
kvm: simplify update_memslots invocation
 
   virt/kvm/kvm_main.c | 87 
 ++---
   1 file changed, 36 insertions(+), 51 deletions(-)
 




Re: [PATCH v12 2/6] KVM: Add generic support for dirty page logging

2014-11-02 Thread Takuya Yoshikawa
On Thu, 30 Oct 2014 12:19:00 -0700
Mario Smarduch m.smard...@samsung.com wrote:

 On 10/30/2014 05:14 AM, Cornelia Huck wrote:
  On Wed, 22 Oct 2014 15:34:07 -0700
  Mario Smarduch m.smard...@samsung.com wrote:
  
  This patch defines KVM_GENERIC_DIRTYLOG, and moves dirty log read function
  to it's own file virt/kvm/dirtylog.c. x86 is updated to use the generic
  dirty log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and 
  makefile. No other architectures are affected, each uses it's own version.
  This changed from previous patch revision where non-generic architectures 
  were modified.
 
  In subsequent patch armv7 does samething. All other architectures continue
  use architecture defined version.
 
  
  Hm.
  
  The x86 specific version of dirty page logging is generic enough to be
  used by other architectures, noteably ARMv7. So let's move the x86 code
  under virt/kvm/ and make it depend on KVM_GENERIC_DIRTYLOG. Other
  architectures continue to use their own implementations.
  
  ?
 
 I'll update descriptions for both patches, with the more concise
 descriptions.

I don't think it's so generic.

Especially, the relationship between mmu_lock and TLB flushing has
been changed a few times for optimizing x86 mmu code, and it may change
in the future again.

Since how mmu_lock is used depends on each arch, it's not so
simple to make the function generic IMO.

Thanks,
Takuya

 
 Thanks.
 
  
 
  Signed-off-by: Mario Smarduch m.smard...@samsung.com
  ---
   arch/x86/include/asm/kvm_host.h |3 --
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/x86.c  |   86 --
   include/linux/kvm_host.h|4 ++
   virt/kvm/Kconfig|3 ++
   virt/kvm/dirtylog.c |  112 
  +++
   7 files changed, 121 insertions(+), 89 deletions(-)
   create mode 100644 virt/kvm/dirtylog.c
 
  
  diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
  new file mode 100644
  index 000..67a
  --- /dev/null
  +++ b/virt/kvm/dirtylog.c
  @@ -0,0 +1,112 @@
  +/*
  + * kvm generic dirty logging support, used by architectures that share
  + * comman dirty page logging implementation.
  
  s/comman/common/
  
  The approach looks sane to me, especially as it does not change other
  architectures needlessly.
  
 


-- 
Takuya Yoshikawa takuya.yoshik...@gmail.com


[PATCH] KVM: x86: Break kvm_for_each_vcpu loop after finding the VP_INDEX

2014-02-26 Thread Takuya Yoshikawa
No need to scan the entire VCPU array.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 BTW, it looks like hyperv support forces us to stick to the current
 implementation, which stores VCPUs in an array, or at least in something
 we can index; not a good thing.

 arch/x86/kvm/x86.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cca458..773eba7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2328,9 +2328,12 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 
msr, u64 *pdata)
case HV_X64_MSR_VP_INDEX: {
int r;
struct kvm_vcpu *v;
-   kvm_for_each_vcpu(r, v, vcpu->kvm)
-   if (v == vcpu)
+   kvm_for_each_vcpu(r, v, vcpu->kvm) {
+   if (v == vcpu) {
data = r;
+   break;
+   }
+   }
break;
}
case HV_X64_MSR_EOI:
-- 
1.7.9.5



[PATCH A or B] KVM: kvm->tlbs_dirty handling

2014-02-18 Thread Takuya Yoshikawa
Please take patch A or B.

Takuya


[PATCH A] KVM: Simplify kvm->tlbs_dirty handling

2014-02-18 Thread Takuya Yoshikawa
When this was introduced, kvm_flush_remote_tlbs() could be called
without holding mmu_lock.  It is now acknowledged that the function
must be called before releasing mmu_lock, and all callers have already
been changed to do so.

There is no need to use smp_mb() and cmpxchg() any more.
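
A minimal sketch of the locking rule this relies on (illustrative only;
the real callers are the shadow MMU paths that modify sptes under
mmu_lock):

	spin_lock(&kvm->mmu_lock);
	/* ... drop or overwrite sptes, possibly setting kvm->tlbs_dirty ... */
	kvm_flush_remote_tlbs(kvm);	/* must run before mmu_lock is released */
	spin_unlock(&kvm->mmu_lock);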

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 arch/x86/kvm/paging_tmpl.h |7 ---
 include/linux/kvm_host.h   |2 +-
 virt/kvm/kvm_main.c|   11 +++
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index cba218a..b1e6c1b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -913,7 +913,8 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu 
*vcpu, gva_t vaddr,
  *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
  *   used by guest then tlbs are not flushed, so guest is allowed to access the
  *   freed pages.
- *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
+ *   We set tlbs_dirty to let the notifier know this change and delay the flush
+ *   until such a case actually happens.
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -942,7 +943,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)
return -EINVAL;
 
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-   vcpu->kvm->tlbs_dirty++;
+   vcpu->kvm->tlbs_dirty = true;
continue;
}
 
@@ -957,7 +958,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)
 
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
-   vcpu->kvm->tlbs_dirty++;
+   vcpu->kvm->tlbs_dirty = true;
continue;
}
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f5937b8..ed1cc89 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -401,7 +401,7 @@ struct kvm {
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
 #endif
-   long tlbs_dirty;
+   bool tlbs_dirty;
struct list_head devices;
 };
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a9e999a..51744da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,12 +186,15 @@ static bool make_all_cpus_request(struct kvm *kvm, 
unsigned int req)
 
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
-   long dirty_count = kvm->tlbs_dirty;
-
-   smp_mb();
if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
-   cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+   /*
+* tlbs_dirty is used only for optimizing x86's shadow paging code with
+* mmu notifiers in mind, see the note on sync_page().  Since it is
+* always protected with mmu_lock there, should kvm_flush_remote_tlbs()
+* be called before releasing mmu_lock, this is safe.
+*/
+   kvm->tlbs_dirty = false;
 }
 EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
 
-- 
1.7.9.5



[PATCH B] KVM: Explain tlbs_dirty trick in kvm_flush_remote_tlbs()

2014-02-18 Thread Takuya Yoshikawa
When this was introduced, kvm_flush_remote_tlbs() could be called
without holding mmu_lock.  It is now acknowledged that the function
must be called before releasing mmu_lock, and all callers have already
been changed to do so.

This patch adds a comment explaining this.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 virt/kvm/kvm_main.c |9 +
 1 file changed, 9 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a9e999a..53521ea 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -184,6 +184,15 @@ static bool make_all_cpus_request(struct kvm *kvm, 
unsigned int req)
return called;
 }
 
+/*
+ * tlbs_dirty is used only for optimizing x86's shadow paging code with mmu
+ * notifiers in mind, see the note on sync_page().  Since it is always 
protected
+ * with mmu_lock there, should kvm_flush_remote_tlbs() be called before
+ * releasing mmu_lock, the trick using smp_mb() and cmpxchg() is not necessary.
+ *
+ * Currently, the assumption about kvm_flush_remote_tlbs() callers is true, but
+ * the code is kept as is for someone who may change the rule in the future.
+ */
 void kvm_flush_remote_tlbs(struct kvm *kvm)
 {
long dirty_count = kvm->tlbs_dirty;
-- 
1.7.9.5



Re: [PATCH A] KVM: Simplify kvm->tlbs_dirty handling

2014-02-18 Thread Takuya Yoshikawa

(2014/02/18 18:43), Xiao Guangrong wrote:

On 02/18/2014 04:22 PM, Takuya Yoshikawa wrote:

When this was introduced, kvm_flush_remote_tlbs() could be called
without holding mmu_lock.  It is now acknowledged that the function
must be called before releasing mmu_lock, and all callers have already
been changed to do so.



I have already posted the patch to moving flush tlb out of mmu-lock
when do dirty tracking:
KVM: MMU: flush tlb out of mmu lock when write-protect the sptes


Yes, I had your patch in mind, so made patch B.



Actually some patches (patch 1 - patch 4) in that patchset can be
pickup-ed separately and i should apologize for the patchset was
not updated for a long time since i was busy on other development.


Looking forward to seeing that work again!



And moving tlb flush out of mmu-lock should be going on in the
future, so i think the B patch is more acceptable.



Maybe.

Some time ago, someone working on non-x86 stuff asked us why
smp_mb() and cmpxchg() were used there.  So a brief comment
which points them to the note on sync_page() will be helpful.

Takuya



Re: [PATCH A] KVM: Simplify kvm->tlbs_dirty handling

2014-02-18 Thread Takuya Yoshikawa

(2014/02/18 18:07), Paolo Bonzini wrote:

On 18/02/2014 09:22, Takuya Yoshikawa wrote:

When this was introduced, kvm_flush_remote_tlbs() could be called
without holding mmu_lock.  It is now acknowledged that the function
must be called before releasing mmu_lock, and all callers have already
been changed to do so.

There is no need to use smp_mb() and cmpxchg() any more.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp


I prefer this patch, and in fact we can make it even simpler:

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ed1cc89..9816b68 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -401,7 +401,9 @@ struct kvm {
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
  #endif
+   /* Protected by mmu_lock */
bool tlbs_dirty;
+
struct list_head devices;
  };

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 51744da..f5668a4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -188,12 +188,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
  {
if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
-   /*
-* tlbs_dirty is used only for optimizing x86's shadow paging code with
-* mmu notifiers in mind, see the note on sync_page().  Since it is
-* always protected with mmu_lock there, should kvm_flush_remote_tlbs()
-* be called before releasing mmu_lock, this is safe.
-*/
-   kvm->tlbs_dirty = false;
  }
  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);


What do you think?


I agree.

But please check Xiao's comment and my reply to it.

Takuya



Paolo

---
  arch/x86/kvm/paging_tmpl.h |7 ---
  include/linux/kvm_host.h   |2 +-
  virt/kvm/kvm_main.c|   11 +++
  3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index cba218a..b1e6c1b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -913,7 +913,8 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu 
*vcpu, gva_t vaddr,
   *   and kvm_mmu_notifier_invalidate_range_start detect the mapping page isn't
   *   used by guest then tlbs are not flushed, so guest is allowed to access 
the
   *   freed pages.
- *   And we increase kvm->tlbs_dirty to delay tlbs flush in this case.
+ *   We set tlbs_dirty to let the notifier know this change and delay the flush
+ *   until such a case actually happens.
   */
  static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
  {
@@ -942,7 +943,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)
return -EINVAL;

if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-   vcpu->kvm->tlbs_dirty++;
+   vcpu->kvm->tlbs_dirty = true;
continue;
}

@@ -957,7 +958,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct 
kvm_mmu_page *sp)

if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
-   vcpu->kvm->tlbs_dirty++;
+   vcpu->kvm->tlbs_dirty = true;
continue;
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f5937b8..ed1cc89 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -401,7 +401,7 @@ struct kvm {
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
  #endif
-   long tlbs_dirty;
+   bool tlbs_dirty;
struct list_head devices;
  };

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a9e999a..51744da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -186,12 +186,15 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)

  void kvm_flush_remote_tlbs(struct kvm *kvm)
  {
-	long dirty_count = kvm->tlbs_dirty;
-
-	smp_mb();
	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.remote_tlb_flush;
-	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
+	/*
+	 * tlbs_dirty is used only for optimizing x86's shadow paging code with
+	 * mmu notifiers in mind, see the note on sync_page().  Since it is
+	 * always protected with mmu_lock there, should kvm_flush_remote_tlbs()
+	 * be called before releasing mmu_lock, this is safe.
+	 */
+	kvm->tlbs_dirty = false;
  }
  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);






[PATCH 1/2] KVM: Use cond_resched() directly and remove useless kvm_resched()

2013-12-12 Thread Takuya Yoshikawa
Since the commit 15ad7146 (KVM: Use the scheduler preemption notifiers
to make kvm preemptible), the remaining stuff in this function is a
simple cond_resched() call with an extra need_resched() check which was
there to avoid dropping VCPUs unnecessarily.  Now it is meaningless.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 arch/ia64/kvm/kvm-ia64.c |2 +-
 arch/powerpc/kvm/book3s_hv.c |2 +-
 arch/x86/kvm/x86.c   |2 +-
 include/linux/kvm_host.h |1 -
 virt/kvm/kvm_main.c  |8 
 5 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 985bf80..53f44be 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -702,7 +702,7 @@ again:
 out:
	srcu_read_unlock(&vcpu->kvm->srcu, idx);
	if (r < 0) {
-		kvm_resched(vcpu);
+		cond_resched();
		idx = srcu_read_lock(&vcpu->kvm->srcu);
goto again;
}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 072287f..3fa99b2 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1348,7 +1348,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvm_guest_exit();
 
preempt_enable();
-   kvm_resched(vcpu);
+   cond_resched();
 
	spin_lock(&vc->lock);
now = get_tb();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d004da..21748bb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6089,7 +6089,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
}
		if (need_resched()) {
			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-			kvm_resched(vcpu);
+			cond_resched();
			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
}
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9523d2a..4ecf107 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -583,7 +583,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 bool kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
-void kvm_resched(struct kvm_vcpu *vcpu);
 void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
 void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f588bc..3a33c98 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1710,14 +1710,6 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu)
 EXPORT_SYMBOL_GPL(kvm_vcpu_kick);
 #endif /* !CONFIG_S390 */
 
-void kvm_resched(struct kvm_vcpu *vcpu)
-{
-   if (!need_resched())
-   return;
-   cond_resched();
-}
-EXPORT_SYMBOL_GPL(kvm_resched);
-
 bool kvm_vcpu_yield_to(struct kvm_vcpu *target)
 {
struct pid *pid;
-- 
1.7.9.5



[PATCH 2/2] KVM: x86: Add comment on vcpu_enter_guest()'s return value

2013-12-12 Thread Takuya Yoshikawa
Giving proper names to the 0 and 1 was once suggested.  But since 0 is
returned to the userspace, giving it another name can introduce extra
confusion.  This patch just explains the meanings instead.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 arch/x86/kvm/x86.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21748bb..65499fc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5834,6 +5834,11 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
 }
 
+/*
+ * Returns 1 to let __vcpu_run() continue the guest execution loop without
+ * exiting to the userspace.  Otherwise, the value will be returned to the
+ * userspace.
+ */
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
int r;
-- 
1.7.9.5





[PATCH 0/2] RFC: KVM: Simple optimization based on Xiao's patch

2013-08-29 Thread Takuya Yoshikawa
I think this patch set answers Gleb's comment.

Takuya


[PATCH 2/2] KVM: Stop using extra buffer for copying dirty_bitmap to user-space

2013-08-29 Thread Takuya Yoshikawa
Now that mmu_lock is held only inside kvm_mmu_write_protect_pt_masked(),
we can use __put_user() to copy each 64/32-bit word of dirty bits to user space.

This eliminates the need to copy the whole bitmap to an extra buffer and
the resulting code is much more cache friendly than before.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
Cc: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 arch/x86/kvm/x86.c  |   18 --
 virt/kvm/kvm_main.c |6 +-
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1d1f6df..79e8ad0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3522,7 +3522,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
struct kvm_memory_slot *memslot;
unsigned long n, i;
unsigned long *dirty_bitmap;
-   unsigned long *dirty_bitmap_buffer;
+   unsigned long __user *p_user;
bool is_dirty = false;
 
	mutex_lock(&kvm->slots_lock);
@@ -3539,11 +3539,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
goto out;
 
n = kvm_dirty_bitmap_bytes(memslot);
+	r = -EFAULT;
+	if (clear_user(log->dirty_bitmap, n))
+		goto out;
 
-	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
-	memset(dirty_bitmap_buffer, 0, n);
-
-	for (i = 0; i < n / sizeof(long); i++) {
+	p_user = (unsigned long __user *)log->dirty_bitmap;
+	for (i = 0; i < n / sizeof(long); i++, p_user++) {
unsigned long mask;
gfn_t offset;
 
@@ -3553,7 +3554,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
is_dirty = true;
 
		mask = xchg(&dirty_bitmap[i], 0);
-		dirty_bitmap_buffer[i] = mask;
+		if (__put_user(mask, p_user))
+   goto out;
 
offset = i * BITS_PER_LONG;
kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
@@ -3561,10 +3563,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
if (is_dirty)
kvm_flush_remote_tlbs(kvm);
 
-	r = -EFAULT;
-	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
-		goto out;
-
r = 0;
 out:
	mutex_unlock(&kvm->slots_lock);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bf040c4..c919f58 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -626,14 +626,10 @@ static int kvm_vm_release(struct inode *inode, struct 
file *filp)
return 0;
 }
 
-/*
- * Allocation size is twice as large as the actual dirty bitmap size.
- * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
- */
 static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
 #ifndef CONFIG_S390
-	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
 
	memslot->dirty_bitmap = kvm_kvzalloc(dirty_bytes);
	if (!memslot->dirty_bitmap)
-- 
1.7.9.5



[PATCH 1/2] KVM: Take mmu_lock only while write-protecting pages in get_dirty_log

2013-08-29 Thread Takuya Yoshikawa
Xiao's "KVM: MMU: flush tlb if the spte can be locklessly modified"
allows us to release mmu_lock before flushing TLBs.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
Cc: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
 Xiao can change the remaining mmu_lock to RCU's read-side lock:
 The grace period will be reasonably limited.

 arch/x86/kvm/mmu.c |4 
 arch/x86/kvm/x86.c |4 
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d9efb1..c6da9ba 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1249,6 +1249,8 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 {
unsigned long *rmapp;
 
+	spin_lock(&kvm->mmu_lock);
+
	while (mask) {
		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
				      PT_PAGE_TABLE_LEVEL, slot);
@@ -1257,6 +1259,8 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
		/* clear the first set bit */
		mask &= mask - 1;
	}
+
+	spin_unlock(&kvm->mmu_lock);
 }
 
 static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5ca72a..1d1f6df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3543,8 +3543,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
	memset(dirty_bitmap_buffer, 0, n);

-	spin_lock(&kvm->mmu_lock);
-
	for (i = 0; i < n / sizeof(long); i++) {
		unsigned long mask;
		gfn_t offset;
@@ -3563,8 +3561,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
	if (is_dirty)
		kvm_flush_remote_tlbs(kvm);

-	spin_unlock(&kvm->mmu_lock);
-
	r = -EFAULT;
	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
goto out;
-- 
1.7.9.5



Re: [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

2013-08-07 Thread Takuya Yoshikawa
On Tue, 30 Jul 2013 21:02:08 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 @@ -2342,6 +2358,13 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
*/
   kvm_flush_remote_tlbs(kvm);
  
 +	if (kvm->arch.rcu_free_shadow_page) {
 +		sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
 +		list_del_init(invalid_list);
 +		call_rcu(&sp->rcu, free_pages_rcu);
 +		return;
 +	}
 +
 	list_for_each_entry_safe(sp, nsp, invalid_list, link) {
 		WARN_ON(!sp->role.invalid || sp->root_count);
 		kvm_mmu_free_page(sp);

Shouldn't we avoid calling call_rcu() when we are holding mmu_lock?

Takuya


Re: [RFC PATCH 00/12] KVM: MMU: locklessly wirte-protect

2013-08-02 Thread Takuya Yoshikawa
 up spte_write_protect
 
  arch/x86/include/asm/kvm_host.h |  10 +-
  arch/x86/kvm/mmu.c  | 442 
 
  arch/x86/kvm/mmu.h  |  28 +++
  arch/x86/kvm/x86.c  |  19 +-
  4 files changed, 356 insertions(+), 143 deletions(-)
 
 -- 
 1.8.1.4
 


-- 
Takuya Yoshikawa takuya.yoshik...@gmail.com


Re: [PATCH] kvm: reset arch memslot info on memslot creation

2013-07-11 Thread Takuya Yoshikawa
On Thu, 11 Jul 2013 10:41:53 +0300
Gleb Natapov g...@redhat.com wrote:

 On Wed, Jul 10, 2013 at 10:49:56PM +0900, Takuya Yoshikawa wrote:
  On Wed, 10 Jul 2013 11:24:39 +0300
  Michael S. Tsirkin m...@redhat.com wrote:
  
   On x86, kvm_arch_create_memslot assumes that rmap/lpage_info for the
   slot are zeroed out: if they weren't, error handling code after out_free
   label will free memory which wasn't allocated here.
   This always happens to be the case because on KVM_MR_DELETE we clear the
   whole arch structure.  So there's no bug, but it's cleaner not to rely
   on this here.
  
  Yes, the assumption is that the function is called only with zero-sized 
  slots.
  Since changing the size is not allowed, DELETE-CREATE is the only case we
  care about.
  
  But isn't it possible to make it explicit that zero-sized slots have always
  zero-cleared contents instead?  Otherwise, there would be many troubles.
  
 Do you have something in mind?
 

I remember that I once wrote code that assumed the flags field was cleared
before creating a new slot, and it was pointed out that such an assumption
might be dangerous: the field actually is cleared separately, but that is not
easy to notice.

So I want to make it clear what differentiates DELETE'd slots from the others.

If we only assume (npages == 0), CREATE should properly set everything,
keeping out_free troubles in mind as this patch does.  If we also assume the
other fields are zero, then DELETE is responsible for that assumption, and a
comment in the code may be helpful.

Actually, (rmap == NULL) was once used to check whether we needed to allocate
memory for a new slot, meaning that we assumed the latter.  I felt uneasy and
changed that to (npages == 0).

Let's make the underlying assumptions clear now.
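
As a purely illustrative sketch (hypothetical helper, not from any posted
patch), making that assumption explicit at the start of x86's
kvm_arch_create_memslot() could look like:

	/* Hypothetical helper, for illustration only. */
	static void check_new_slot_is_clean(struct kvm_memory_slot *slot)
	{
		/*
		 * A slot handed to CREATE is expected to be zero-cleared:
		 * DELETE wipes the whole arch area and resizing is not
		 * allowed.  Catch callers that break the assumption instead
		 * of relying on it silently in the out_free error path.
		 */
		WARN_ON_ONCE(slot->arch.rmap[0] || slot->arch.lpage_info[0]);
	}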

Takuya


Re: [PATCH] kvm: reset arch memslot info on memslot creation

2013-07-10 Thread Takuya Yoshikawa
On Wed, 10 Jul 2013 11:24:39 +0300
Michael S. Tsirkin m...@redhat.com wrote:

 On x86, kvm_arch_create_memslot assumes that rmap/lpage_info for the
 slot are zeroed out: if they weren't, error handling code after out_free
 label will free memory which wasn't allocated here.
 This always happens to be the case because on KVM_MR_DELETE we clear the
 whole arch structure.  So there's no bug, but it's cleaner not to rely
 on this here.

Yes, the assumption is that the function is called only with zero-sized slots.
Since changing the size is not allowed, DELETE -> CREATE is the only case we
care about.

But isn't it possible to make it explicit that zero-sized slots always have
zero-cleared contents instead?  Otherwise, there could be many troubles.

Takuya

 
 Make the code more robust by clearing the rmap/lpage_info explicitly.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  arch/x86/kvm/x86.c | 4 
  1 file changed, 4 insertions(+)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index e8ba99c..96e6eb4 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6922,6 +6922,10 @@ int kvm_arch_create_memslot(struct kvm_memory_slot 
 *slot, unsigned long npages)
  {
   int i;
  
 +	/* Reset in case slot had some rmap/lpage_info. */
 +	memset(slot->arch.rmap, 0, sizeof slot->arch.rmap);
 +	memset(slot->arch.lpage_info, 0, sizeof slot->arch.lpage_info);
 +
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		unsigned long ugfn;
 		int lpages;
 -- 
 MST


-- 
Takuya Yoshikawa takuya.yoshik...@gmail.com


[PATCH] KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

2013-07-03 Thread Takuya Yoshikawa
Since kvm_arch_prepare_memory_region() is called right after installing
the slot marked invalid, wraparound checking should be there to avoid
zapping mmio sptes when mmio generation is still MMIO_MAX_GEN - 1.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 This seems to be the simplest solution for fixing the off-by-one issue
 we discussed before.

 arch/x86/kvm/mmu.c |5 +
 arch/x86/kvm/x86.c |7 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0d094da..bf7af1e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4383,11 +4383,8 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
/*
 * The very rare case: if the generation-number is round,
 * zap all shadow pages.
-*
-* The max value is MMIO_MAX_GEN - 1 since it is not called
-* when mark memslot invalid.
 */
-	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1))) {
+	if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
		printk_ratelimited(KERN_INFO "kvm: zapping shadow pages for "
				   "mmio generation wraparound\n");
kvm_mmu_invalidate_zap_all_pages(kvm);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d71c0f..9ddd4ff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7046,6 +7046,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
memslot-userspace_addr = userspace_addr;
}
 
+	/*
+	 * In these cases, slots->generation has been increased for marking the
+	 * slot invalid, so we need wraparound checking here.
+	 */
+	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE))
+   kvm_mmu_invalidate_mmio_sptes(kvm);
+
return 0;
 }
 
-- 
1.7.9.5



Re: [PATCH] KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

2013-07-03 Thread Takuya Yoshikawa
On Wed, 03 Jul 2013 16:39:25 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 Please wait a while. I can not understand it very clearly.
 
 This conditional check will cause caching a overflow value into mmio spte.
 The simple case is that kvm adds new slots for many times, the mmio-gen is 
 easily
 more than MMIO_MAX_GEN.

Unconditional checking in commit_memory_region() is still there
to treat such cases.

Takuya


Re: [PATCH] KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

2013-07-03 Thread Takuya Yoshikawa
On Wed, 03 Jul 2013 10:53:51 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 Il 03/07/2013 10:50, Xiao Guangrong ha scritto:
   Please wait a while. I can not understand it very clearly.
   
   This conditional check will cause caching a overflow value into mmio 
   spte.
   The simple case is that kvm adds new slots for many times, the mmio-gen 
   is easily
   more than MMIO_MAX_GEN.
   
  Actually, the double zapping can be avoided by moving 
  kvm_mmu_invalidate_mmio_sptes to
  the end of install_new_memslots().
  
  
 
 Yes, the actual operation would be the same as this patch.  You can
 rename kvm_mmu_invalidate_mmio_sptes to kvm_arch_memslots_installed, or
 something like that.  But it would have to touch all architectures.

I tried to avoid introducing x86-centric code into the generic code.

If another arch can gain something from such a function, I'm willing to
touch all the arch code.

Takuya


Re: [PATCH] KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

2013-07-03 Thread Takuya Yoshikawa
On Wed, 3 Jul 2013 12:10:57 +0300
Gleb Natapov g...@redhat.com wrote:

  Yes, makes sense.  However, this patch is still an improvement because
  the current code is too easily mistaken for an off-by-one bug.
  
  Any improvements to the API can go on top.
  
 If Takuya will send the proper fix shortly I do not see any reason to
 apply this one. It does not fix any bug.

No problem.

Please give me a day or so.
Thank you everyone for reviewing this patch!

Takuya


[PATCH 0/2] KVM: Introduce kvm_arch_memslots_updated() and use it for x86's mmio optimization

2013-07-03 Thread Takuya Yoshikawa
Patch 1: KVM-arch maintainers, please review this one.
  {x86, power, s390, arm}-kvm maintainers CCed.
  Could not find mips-kvm maintainer in MAINTAINERS.

Patch 2: I did not move the body of kvm_mmu_invalidate_mmio_sptes() into
  x86.c because it looked like mmu details.

Takuya Yoshikawa (2):
  KVM: Introduce kvm_arch_memslots_updated()
  KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

 arch/arm/kvm/arm.c |4 
 arch/ia64/kvm/kvm-ia64.c   |4 
 arch/mips/kvm/kvm_mips.c   |4 
 arch/powerpc/kvm/powerpc.c |4 
 arch/s390/kvm/kvm-s390.c   |4 
 arch/x86/kvm/mmu.c |5 +
 arch/x86/kvm/x86.c |   14 +-
 include/linux/kvm_host.h   |1 +
 virt/kvm/kvm_main.c|5 -
 9 files changed, 35 insertions(+), 10 deletions(-)

-- 
1.7.9.5



[PATCH 1/2] KVM: Introduce kvm_arch_memslots_updated()

2013-07-03 Thread Takuya Yoshikawa
This is called right after the memslots are updated, i.e. when the result
of update_memslots() gets installed in install_new_memslots().  Since the
memslots need to be updated twice when we delete or move a memslot,
kvm_arch_commit_memory_region() does not correspond to this exactly.

In the following patch, x86 will use this new API to check if the mmio
generation has reached its maximum value, in which case mmio sptes need
to be flushed out.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 Removed the trailing space after "return old_memslots;" at this chance.

 arch/arm/kvm/arm.c |4 
 arch/ia64/kvm/kvm-ia64.c   |4 
 arch/mips/kvm/kvm_mips.c   |4 
 arch/powerpc/kvm/powerpc.c |4 
 arch/s390/kvm/kvm-s390.c   |4 
 arch/x86/kvm/x86.c |4 
 include/linux/kvm_host.h   |1 +
 virt/kvm/kvm_main.c|5 -
 8 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 327a1fb..51c3659 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -219,6 +219,10 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_memory_slot *memslot,
   struct kvm_userspace_memory_region *mem,
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 5b2dc0d..bdfd878 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1560,6 +1560,10 @@ int kvm_arch_create_memslot(struct kvm_memory_slot 
*slot, unsigned long npages)
return 0;
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_userspace_memory_region *mem,
diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index e0dad02..daa32ea 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -208,6 +208,10 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, 
unsigned long npages)
return 0;
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 struct kvm_memory_slot *memslot,
 struct kvm_userspace_memory_region *mem,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..ae63ae4 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -420,6 +420,10 @@ int kvm_arch_create_memslot(struct kvm_memory_slot *slot, 
unsigned long npages)
return kvmppc_core_create_memslot(slot, npages);
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_memory_slot *memslot,
   struct kvm_userspace_memory_region *mem,
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ba694d2..a3d797b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1056,6 +1056,10 @@ int kvm_arch_create_memslot(struct kvm_memory_slot 
*slot, unsigned long npages)
return 0;
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 /* Section: memory related */
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
   struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d71c0f..71912c9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7020,6 +7020,10 @@ out_free:
return -ENOMEM;
 }
 
+void kvm_arch_memslots_updated(struct kvm *kvm)
+{
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_userspace_memory_region *mem,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e3aae6d..1c1e9de 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -498,6 +498,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 void kvm_arch_free_memslot(struct kvm_memory_slot *free,
   struct kvm_memory_slot *dont);
 int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long 
npages);
+void kvm_arch_memslots_updated(struct kvm *kvm);
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
struct kvm_userspace_memory_region *mem,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1580dd4..4ed9d89 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -731,7 +731,10 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
	update_memslots(slots, new, kvm->memslots->generation);
	rcu_assign_pointer(kvm->memslots, slots

[PATCH v2 2/2] KVM: x86: Avoid zapping mmio sptes twice for generation wraparound

2013-07-03 Thread Takuya Yoshikawa
Now that kvm_arch_memslots_updated() catches every increment of the
memslots->generation, checking if the mmio generation has reached its
maximum value is enough.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 arch/x86/kvm/mmu.c |5 +
 arch/x86/kvm/x86.c |   10 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0d094da..bf7af1e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4383,11 +4383,8 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
/*
 * The very rare case: if the generation-number is round,
 * zap all shadow pages.
-*
-* The max value is MMIO_MAX_GEN - 1 since it is not called
-* when mark memslot invalid.
 */
-	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1))) {
+	if (unlikely(kvm_current_mmio_generation(kvm) >= MMIO_MAX_GEN)) {
		printk_ratelimited(KERN_INFO "kvm: zapping shadow pages for "
				   "mmio generation wraparound\n");
kvm_mmu_invalidate_zap_all_pages(kvm);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71912c9..9be6621 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7022,6 +7022,11 @@ out_free:
 
 void kvm_arch_memslots_updated(struct kvm *kvm)
 {
+	/*
+	 * memslots->generation has been incremented.
+	 * mmio generation may have reached its maximum value.
+	 */
+   kvm_mmu_invalidate_mmio_sptes(kvm);
 }
 
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
@@ -7084,11 +7089,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 */
	if ((change != KVM_MR_DELETE) && (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
		kvm_mmu_slot_remove_write_access(kvm, mem->slot);
-   /*
-* If memory slot is created, or moved, we need to clear all
-* mmio sptes.
-*/
-   kvm_mmu_invalidate_mmio_sptes(kvm);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
-- 
1.7.9.5



[PATCH] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
Without this information, users will just see unexpected performance
problems and there is little chance we will get good reports from them:
note that mmio generation is increased even when we just start, or stop,
dirty logging for some memory slot, in which case users should never
expect all shadow pages to be zapped.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 arch/x86/kvm/mmu.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c60c5da..bc8302f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4385,8 +4385,10 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
 * The max value is MMIO_MAX_GEN - 1 since it is not called
 * when mark memslot invalid.
 */
-	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
+	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1))) {
+		printk(KERN_INFO "kvm: zapping shadow pages for mmio generation wraparound");
kvm_mmu_invalidate_zap_all_pages(kvm);
+   }
 }
 
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
-- 
1.7.9.5



Re: [PATCH] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
On Thu, 20 Jun 2013 14:45:04 +0300
Gleb Natapov g...@redhat.com wrote:

 On Thu, Jun 20, 2013 at 12:59:54PM +0200, Paolo Bonzini wrote:
  Il 20/06/2013 10:59, Takuya Yoshikawa ha scritto:
   Without this information, users will just see unexpected performance
   problems and there is little chance we will get good reports from them:
   note that mmio generation is increased even when we just start, or stop,
   dirty logging for some memory slot, in which case users should never
   expect all shadow pages to be zapped.
   
   Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
   ---
arch/x86/kvm/mmu.c |4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
   
   diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
   index c60c5da..bc8302f 100644
   --- a/arch/x86/kvm/mmu.c
   +++ b/arch/x86/kvm/mmu.c
   @@ -4385,8 +4385,10 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
  * The max value is MMIO_MAX_GEN - 1 since it is not called
  * when mark memslot invalid.
  */
   - if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN - 1)))
   + if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN - 1))) {
   + printk(KERN_INFO kvm: zapping shadow pages for mmio generation 
   wraparound);
  
  This should at least be rate-limited, because it is guest triggerable.
  
 It will be hard for guest to triggers it 1  19 times too fast though.

I think guest-triggerable zap_all itself is a threat for the host, rather
than a matter of log flooding, even if it can be preempted.

 
  But why isn't the kvm_mmu_invalidate_zap_all_pages tracepoint enough?
  
 This one will trigger during slot deletion/move too.
 
 I would put it in to see if it actually triggers in some real world
 workloads (skipping the firs wraparound since it is intentional),
 we can always drop it if it will turn out to create a lot of noise.
  

This patch is not for developers but for end users: of course they do not
run tracers while running their services normally.

If they see mysterious performance problems induced by this wraparound, the
only way to find the cause later is this kind of information in the syslog.
So even the first wraparound may be better printed out, IMO.

I want to let administrators know the cause if possible; any better way?

Takuya


Re: [PATCH] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
On Thu, 20 Jun 2013 15:54:38 +0300
Gleb Natapov g...@redhat.com wrote:

 On Thu, Jun 20, 2013 at 09:28:37PM +0900, Takuya Yoshikawa wrote:
  On Thu, 20 Jun 2013 14:45:04 +0300
  Gleb Natapov g...@redhat.com wrote:
  
   On Thu, Jun 20, 2013 at 12:59:54PM +0200, Paolo Bonzini wrote:
Il 20/06/2013 10:59, Takuya Yoshikawa ha scritto:
 Without this information, users will just see unexpected performance
 problems and there is little chance we will get good reports from 
 them:
 note that mmio generation is increased even when we just start, or 
 stop,
 dirty logging for some memory slot, in which case users should never
 expect all shadow pages to be zapped.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
 ---
  arch/x86/kvm/mmu.c |4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index c60c5da..bc8302f 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -4385,8 +4385,10 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm 
 *kvm)
* The max value is MMIO_MAX_GEN - 1 since it is not called
* when mark memslot invalid.
*/
 - if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN 
 - 1)))
 + if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN 
 - 1))) {
 + printk(KERN_INFO kvm: zapping shadow pages for mmio 
 generation wraparound);

This should at least be rate-limited, because it is guest triggerable.

   It will be hard for guest to triggers it 1  19 times too fast though.
  
  I think guest-triggerable zap_all itself is a threat for the host, rather
  than a matter of log flooding, even if it can be preempted.
  
 It's not much we can do about it. Slot removal/creation is triggerable
 through HW emulation registers.

OK, I see.

 
   
But why isn't the kvm_mmu_invalidate_zap_all_pages tracepoint enough?

   This one will trigger during slot deletion/move too.
   
   I would put it in to see if it actually triggers in some real world
   workloads (skipping the firs wraparound since it is intentional),
   we can always drop it if it will turn out to create a lot of noise.

  
  This patch is not for developers but for end users: of course they do not
  use tracers during running their services normally.
  
  If they see mysterious peformance problems induced by this wraparound, the 
  only
  way to know the cause later is by this kind of information in the syslog.
  So even the first wraparound may better be printed out IMO.
 Think about starting hundreds VMs on a freshly booted host. You will see
 hundreds of those pretty quickly.

Yes.

 
  
  I want to let administrators know the cause if possible, any better way?
  
 Not that I can think of. Paolo what about print_once() and ignore first
 wraparound?

Assuming that the first one will be removed someday (it's for debugging
anyway), can we just do print_once() in the future?

That way, admins can check whether any guest did something problematic.

Takuya


Re: [PATCH] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
On Thu, 20 Jun 2013 15:14:42 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 Il 20/06/2013 14:54, Gleb Natapov ha scritto:
  If they see mysterious peformance problems induced by this wraparound, the 
  only
  way to know the cause later is by this kind of information in the syslog.
  So even the first wraparound may better be printed out IMO.
  Think about starting hundreds VMs on a freshly booted host. You will see
  hundreds of those pretty quickly.
 
 With the change I made to Xiao's patch (changing -13 to -150) you won't
 see it immediately after startup, but the first wraparound may still
 come very soon with a loop that reads the ROM.  (The second takes 5
 minutes).
 
  I want to let administrators know the cause if possible, any better way?
 
  Not that I can think of. Paolo what about print_once() and ignore first
  wraparound?
 
 printk_ratelimited is enough, even without ignoring the first
 wraparound.  It will handle the case of multiple VMs too.

OK, I'm now trying the printk_ratelimited() version.
I will send v2 if it works.

Takuya


[PATCH v2] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp

Without this information, users will just see unexpected performance
problems and there is little chance we will get good reports from them:
note that mmio generation is increased even when we just start, or stop,
dirty logging for some memory slot, in which case users cannot expect
all shadow pages to be zapped.

printk_ratelimited() is used for this taking into account the problems
that we can see the information many times when we start multiple VMs
and guests can trigger this by reading ROM in a loop for example.

Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
---
 Interestingly, I saw this information printed twice every time.
 Looks like current_mmio_gen can become mmio_max_gen...

 arch/x86/kvm/mmu.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c60c5da..54e3968 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4385,8 +4385,10 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
 * The max value is MMIO_MAX_GEN - 1 since it is not called
 * when mark memslot invalid.
 */
-	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
+	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1))) {
+		printk_ratelimited(KERN_INFO "kvm: zapping shadow pages for "
+				   "mmio generation wraparound\n");
kvm_mmu_invalidate_zap_all_pages(kvm);
+   }
 }
 
 static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
-- 
1.7.9.5



Re: [PATCH v2] KVM: MMU: Inform users of mmio generation wraparound

2013-06-20 Thread Takuya Yoshikawa
On Thu, 20 Jun 2013 23:29:22 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

  @@ -4385,8 +4385,10 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
   * The max value is MMIO_MAX_GEN - 1 since it is not called
   * when mark memslot invalid.
   */
  -   if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN - 1)))
  +   if (unlikely(kvm_current_mmio_generation(kvm) = (MMIO_MAX_GEN - 1))) {
 
 There is an off-by-one here (that I can fix in a separate patch).  The
 right test is >= MMIO_MAX_GEN, since we have:
 
 #define MMIO_MAX_GEN	((1 << MMIO_GEN_SHIFT) - 1)
 
 Paolo

As Xiao's comment says, this is intentional: when we delete a
memory slot, we increase slots->generation twice, first for setting
the invalid flag, then for actually committing the slot.

To avoid this double zap_all(), we need to adjust the generation to
make it really wrap around when we hit:

kvm_current_mmio_generation(kvm) == MMIO_MAX_GEN - 1
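
For illustration, roughly how the two values relate (hypothetical helper name;
simplified, not the exact code):

	/*
	 * The mmio generation is derived from slots->generation, and a
	 * DELETE or MOVE increments slots->generation twice: once to install
	 * the invalid copy and once to commit the final memslots.
	 */
	static unsigned int current_mmio_gen(struct kvm *kvm)
	{
		return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
	}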

 
  +   printk_ratelimited(KERN_INFO kvm: zapping shadow pages for 
  mmio generation wraparound\n);
  kvm_mmu_invalidate_zap_all_pages(kvm);
  +   }
   }

BTW, I'm now thinking of another way to get this information.

If it's possible to check slots->generation, plus current_mmio_gen if needed,
on some events, administrators can know how many times such wraparounds
happened before.

They do not need to do tracing all the time for that; just hitting one event
later makes it possible to know whether the guest was doing something
problematic before.

I'm now looking for the best trace point for that.  Probably somewhere in
arch_commit_memory().
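
Purely as a sketch of the idea (trace_printk() stands in for a real
tracepoint; not from any posted patch):

	/*
	 * In kvm_arch_commit_memory_region(): dump the generation so a trace
	 * captured later shows how far it has advanced, and hence how many
	 * wraparounds may have happened before.
	 */
	trace_printk("kvm: memslot committed, slots generation %llu\n",
		     (unsigned long long)kvm_memslots(kvm)->generation);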

Takuya


Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-14 Thread Takuya Yoshikawa
On Thu, 13 Jun 2013 21:08:21 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

 On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:

 - Where is the generation number increased?

Looks like it happens when a new slot is installed in update_memslots(),
because it's based on slots->generation.  This is not restricted to create
and move.

 - Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
 (picture guest with 512GB of RAM, even walking all those pages is
 expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
 - Is -13 enough to test wraparound? Its highly likely the guest has 
 not began executing by the time 13 kvm_set_memory_calls are made
 (so no sptes around). Perhaps -2000 is more sensible (should confirm
 though).

In the future, after we've tested enough, we should change the testing
code to be executed only for some debugging configs.  Especially if we
change zap_mmio_sptes() to zap_all_shadows(), very common guests, even
without huge memory like 512GB, can unnecessarily see the effect induced
by sudden page faults.

If necessary, developers can test the wraparound code by lowering the
max_gen itself anyway.
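
For example (hypothetical config name, not in the series), the wraparound path
could be exercised by shrinking the limit under a debug option instead of
shipping the -13/-150 start offset:

	#ifdef CONFIG_KVM_MMU_DEBUG_MMIO_WRAP		/* hypothetical */
	#define MMIO_MAX_GEN	16
	#else
	#define MMIO_MAX_GEN	((1 << MMIO_GEN_SHIFT) - 1)
	#endif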

 - Why remove if (change == KVM_MR_CREATE) || (change
 ==  KVM_MR_MOVE) from kvm_arch_commit_memory_region?
 Its instructive.

There may be a chance that we miss generation wraparounds if we don't
check other cases: seems unlikely, but theoretically possible.

In short, all memory slot changes make mmio sptes stored in shadow pages
obsolete, or zapped for wraparounds, in the new way -- am I right?

Takuya


Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

2013-06-10 Thread Takuya Yoshikawa
On Mon, 10 Jun 2013 10:57:50 +0300
Gleb Natapov g...@redhat.com wrote:

 On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:

  +
  +/*
  + * Return values of handle_mmio_page_fault_common:
  + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the 
  instruction
  + *  directly.
  + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
  + * RET_MMIO_PF_BUG: bug is detected.
  + */
  +enum {
  +   RET_MMIO_PF_EMULATE = 1,
  +   RET_MMIO_PF_RETRY = 0,
  +   RET_MMIO_PF_BUG = -1
  +};
 I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
 RET_MMIO_PF_ERROR, but no need to resend just for that.

Why not just let the compiler select the values?  It's an enum.
Any reason to make it start from -1?
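
The difference under discussion, as a trivial sketch (using Gleb's suggested
ordering and the ERROR name):

	/*
	 * Letting the compiler pick the values would number them 0, 1, 2 in
	 * declaration order; the explicit values keep the error case negative.
	 */
	enum {
		RET_MMIO_PF_ERROR = -1,
		RET_MMIO_PF_RETRY = 0,
		RET_MMIO_PF_EMULATE = 1,
	};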

Takuya


Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

2013-06-10 Thread Takuya Yoshikawa
On Mon, 10 Jun 2013 16:39:37 +0800
Xiao Guangrong xiaoguangrong.e...@gmail.com wrote:

 On 06/10/2013 03:56 PM, Gleb Natapov wrote:
  On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:

  Looks good to me, but doesn't tis obsolete kvm_mmu_zap_mmio_sptes() and
  sp-mmio_cached, so they should be removed as part of the patch series?
 
 Yes, i agree, they should be removed. :)

I'm fine with removing it, but please make it clear that you all agree
on the same basis.

Last time, Paolo mentioned the possibility of using some spte bits for
other things.  The suggestion there was to keep the sp->mmio_cached code
for the time when we would need to reduce the bits used for generation
numbers.

Do you think that zap_all() is now preemptible and can handle the
situation as reasonably well as the current kvm_mmu_zap_mmio_sptes()?

One downside is the need to zap unrelated shadow pages, but if this case
is really very rare, then yes, I agree, it should not be a problem: it
depends on how many bits we can use.

Just please reconfirm.

Takuya

 
 There is the patch to do these things:
 
 From bc1bc36e2640059f06c4860af802ecc74e1f3d2d Mon Sep 17 00:00:00 2001
 From: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 Date: Mon, 10 Jun 2013 16:28:55 +0800
 Subject: [PATCH 7/6] KVM: MMU: drop kvm_mmu_zap_mmio_sptes
 
 Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages
 instead to handle mmio generation number overflow
 
 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/include/asm/kvm_host.h |  1 -
  arch/x86/kvm/mmu.c  | 22 +-
  2 files changed, 1 insertion(+), 22 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 90d05ed..966f265 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -230,7 +230,6 @@ struct kvm_mmu_page {
  #endif
 
   int write_flooding_count;
 - bool mmio_cached;
  };
 
  struct kvm_pio_request {
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 35cd0b6..c87b19d 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -246,13 +246,11 @@ static unsigned int kvm_current_mmio_generation(struct 
 kvm *kvm)
  static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
  unsigned access)
  {
 -	struct kvm_mmu_page *sp =  page_header(__pa(sptep));
 	unsigned int gen = kvm_current_mmio_generation(kvm);
 	u64 mask = generation_mmio_spte_mask(gen);
 
 	access &= ACC_WRITE_MASK | ACC_USER_MASK;
 	mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
 -	sp->mmio_cached = true;
 
 	trace_mark_mmio_spte(sptep, gfn, access, gen);
 	mmu_spte_set(sptep, mask);
 @@ -4362,24 +4360,6 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
   spin_unlock(kvm-mmu_lock);
  }
 
 -static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
 -{
 -	struct kvm_mmu_page *sp, *node;
 -	LIST_HEAD(invalid_list);
 -
 -	spin_lock(&kvm->mmu_lock);
 -restart:
 -	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
 -		if (!sp->mmio_cached)
 -			continue;
 -		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
 -			goto restart;
 -	}
 -
 -	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 -	spin_unlock(&kvm->mmu_lock);
 -}
 -
  static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
  {
 	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
 @@ -4395,7 +4375,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
 	 * when mark memslot invalid.
 	 */
 	if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
 - kvm_mmu_zap_mmio_sptes(kvm);
 + kvm_mmu_invalidate_zap_all_pages(kvm);
  }
 
  static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
 -- 
 1.8.1.4
 
 


-- 
Takuya Yoshikawa takuya.yoshik...@gmail.com


Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

2013-05-30 Thread Takuya Yoshikawa
On Thu, 30 May 2013 03:53:38 +0300
Gleb Natapov g...@redhat.com wrote:

 On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
  On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
   On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
   the pages since other vcpus may be doing locklessly shadow
   page walking
  
   Ah, yes, i agree with you.
  
   We can introduce a list, say kvm-arch.obsolte_pages, to link all of the
   zapped-page, the page-shrink will free the page on that list first.
  
   Marcelo, if you do not have objection on  patch 1 ~ 8 and 11, could you 
   please
   let them merged first, and do add some comments and tlb optimization 
   later?
  
   Exclude patch 11 please, since it depends on the collapse optimization.
   
   I'm fine with patch 1 being merged. I think the remaining patches need 
   better
   understanding or explanation. The problems i see are:
   
   1) The magic number 10 to zap before considering reschedule is
   annoying. It would be good to understand why it is needed at all.
  
  ..
  
   
   But then again, the testcase is measuring kvm_mmu_zap_all performance
   alone which we know is not a common operation, so perhaps there is
   no need for that minimum-pages-to-zap-before-reschedule.
  
  Well. Although, this is not the common operation, but this operation
  can be triggered by VCPU - it one VCPU take long time on zap-all-pages,
  other vcpus is missing IPI-synce, or missing IO. This is easily cause
  soft lockups if the vcpu is doing memslot-releated things.
  
 +1. If it is trigarable by a guest it may slow down the guest, but we
 should not allow for it to slow down a host.
 

Well, I don't object to the minimum-pages-to-zap-before-reschedule idea
itself, but if you're going to take patch 4, please at least add a warning
in the changelog that the magic number 10 was selected without good enough
reasoning.

[ It improves kernel building 0.6% ~ 1% ] alone will make it hard for
others to change the number later.

I actually once tried to do a similar thing for other code.  So I have a
possible reasoning for this, and 10 should probably be changed later.

Takuya


Re: [PATCH v7 10/11] KVM: MMU: collapse TLB flushes when zap all pages

2013-05-30 Thread Takuya Yoshikawa
On Fri, 31 May 2013 01:24:43 +0900
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:

 On Thu, 30 May 2013 03:53:38 +0300
 Gleb Natapov g...@redhat.com wrote:
 
  On Wed, May 29, 2013 at 09:19:41PM +0800, Xiao Guangrong wrote:
   On 05/29/2013 08:39 PM, Marcelo Tosatti wrote:
On Wed, May 29, 2013 at 11:03:19AM +0800, Xiao Guangrong wrote:
the pages since other vcpus may be doing locklessly shadow
page walking
   
Ah, yes, i agree with you.
   
We can introduce a list, say kvm-arch.obsolte_pages, to link all of 
the
zapped-page, the page-shrink will free the page on that list first.
   
Marcelo, if you do not have objection on  patch 1 ~ 8 and 11, could 
you please
let them merged first, and do add some comments and tlb optimization 
later?
   
Exclude patch 11 please, since it depends on the collapse 
optimization.

I'm fine with patch 1 being merged. I think the remaining patches need 
better
understanding or explanation. The problems i see are:

1) The magic number 10 to zap before considering reschedule is
annoying. It would be good to understand why it is needed at all.
   
   ..
   

But then again, the testcase is measuring kvm_mmu_zap_all performance
alone which we know is not a common operation, so perhaps there is
no need for that minimum-pages-to-zap-before-reschedule.
   
   Well. Although, this is not the common operation, but this operation
   can be triggered by VCPU - it one VCPU take long time on zap-all-pages,
   other vcpus is missing IPI-synce, or missing IO. This is easily cause
   soft lockups if the vcpu is doing memslot-releated things.
   
  +1. If it is trigarable by a guest it may slow down the guest, but we
  should not allow for it to slow down a host.
  
 
 Well, I don't object to the minimum-pages-to-zap-before-reschedule idea
 itself, but if you're going to take patch 4, please at least add a warning
 in the changelog that the magic number 10 was selected without good enough
 reasoning.
 
 [ It improves kernel building 0.6% ~ 1% ] alone will make it hard for
 others to change the number later.
 
 I actually once tried to do a similar thing for other code.  So I have a
 possible reasoning for this, and 10 should probably be changed later.
 

In this case, the solution seems to be very simple: just drop spin_needbreak()
and leave need_resched() alone.

This way we can guarantee that zap-all will get a fair amount of CPU time for
each scheduling from the host scheduler's point of view.  Of course this can
block other VCPU threads waiting for mmu_lock during that time slice, but
should be much better than blocking them for some magical number of zappings.

We also need to remember that spin_needbreak() does not do anything for some
preempt config settings.
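
A minimal sketch of the difference (illustrative only; the real zapping loop
in the series is more involved):

	struct kvm_mmu_page *sp, *node;
	LIST_HEAD(invalid_list);

	/* Break mmu_lock only when the scheduler wants the CPU back, not
	 * merely because the lock happens to be contended. */
	spin_lock(&kvm->mmu_lock);
restart:
	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
			goto restart;
		if (need_resched()) {
			kvm_mmu_commit_zap_page(kvm, &invalid_list);
			spin_unlock(&kvm->mmu_lock);
			cond_resched();
			spin_lock(&kvm->mmu_lock);
			goto restart;
		}
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
	spin_unlock(&kvm->mmu_lock);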

Takuya


Re: [PATCH v5 0/8] KVM: MMU: fast zap all shadow pages

2013-05-16 Thread Takuya Yoshikawa
On Thu, 16 May 2013 20:17:45 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 Benchmark result:
 I have tested this patchset and the previous version that only zaps the
 pages linked on the invalid slot's rmap.  The benchmark, which has been
 attached, was written by myself; it writes a large amount of memory when
 doing PCI ROM reads.
 
 Host: Intel(R) Xeon(R) CPU X5690  @ 3.47GHz + 36G Memory
 Guest: 12 VCPU + 32G Memory
 
 Current code     This patchset    Previous version
 2405434959 ns    2323016424 ns    2368810003 ns
 
 The interesting thing is that the previous version is slower than this
 patchset; I guess the reason is that the former keeps lots of invalid
 pages in the mmu, which causes shadow pages to be reclaimed due to
 used-pages > request-pages or host memory shrinking.

This patch series looks very nice!

Minor issues may still need to be improved, but I really hope to see this
get merged during this cycle.

[for the future] Do you think that postponing some zapping/freeing of
obsolete (already invalidated) pages to make_mmu_pages_available() time
can improve the situation further? -- say, for big guests.

If the accounting is kept correct, make_mmu_pages_available() would only
need to free some obsolete pages instead of valid pages.
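
An untested sketch of that idea (prepare_zap_obsolete_page() is hypothetical
here; the other helpers exist in mmu.c today):

/* Called with mmu_lock held. */
static void make_mmu_pages_available_sketch(struct kvm_vcpu *vcpu)
{
	LIST_HEAD(invalid_list);

	while (kvm_mmu_available_pages(vcpu->kvm) < KVM_MIN_FREE_MMU_PAGES) {
		/* Prefer already-invalidated (obsolete) pages... */
		if (prepare_zap_obsolete_page(vcpu->kvm, &invalid_list))
			continue;
		/* ...and fall back to valid pages only when none are left. */
		if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list))
			break;
	}
	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
}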

Takuya

 
 Changlog:
 V5:
   1): rename is_valid_sp to is_obsolete_sp
   2): use lock-break technique to zap all old pages instead of only pages
   linked on invalid slot's rmap suggested by Marcelo.
   3): trace invalid pages and kvm_mmu_invalidate_memslot_pages() 
   4): rename kvm_mmu_invalid_memslot_pages to kvm_mmu_invalidate_memslot_pages
   according to Takuya's comments. 
 
 V4:
   1): drop unmapping invalid rmap out of mmu-lock and use lock-break technique
   instead. Thanks to Gleb's comments.
 
   2): needn't handle invalid-gen pages specially due to page table always
   switched by KVM_REQ_MMU_RELOAD. Thanks to Marcelo's comments.
 
 V3:
   completely redesign the algorithm, please see below.
 
 V2:
   - do not reset n_requested_mmu_pages and n_max_mmu_pages
   - batch free root shadow pages to reduce vcpu notification and mmu-lock
 contention
   - remove the first patch that introduce kvm-arch.mmu_cache since we only
 'memset zero' on hashtable rather than all mmu cache members in this
 version
   - remove unnecessary kvm_reload_remote_mmus after kvm_mmu_zap_all
 
 * Issue
 The current kvm_mmu_zap_all is really slow - it holds mmu-lock while
 walking and zapping all shadow pages one by one, and it also needs to zap
 every guest page's rmap and every shadow page's parent spte list.  Things
 get particularly bad if the guest uses more memory or vcpus.  It is not
 good for scalability.
 
 * Idea
 KVM maintains a global mmu invalid generation-number which is stored in
 kvm->arch.mmu_valid_gen, and every shadow page stores the current global
 generation-number into sp->mmu_valid_gen when it is created.
 
 When KVM needs to zap all shadow page sptes, it simply increases the
 global generation-number and then reloads the root shadow pages on all
 vcpus.  Each vcpu will create a new shadow page table according to the
 current generation-number.  This ensures the old pages are not used any
 more.
 
 Then the invalid-gen pages (sp->mmu_valid_gen != kvm->arch.mmu_valid_gen)
 are zapped by using the lock-break technique.
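
A simplified sketch of that scheme (not the actual patch; names roughly
follow the series, and kvm_zap_obsolete_pages() stands for the lock-break
zapper it introduces):

static bool is_obsolete_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
{
	return unlikely(sp->mmu_valid_gen != kvm->arch.mmu_valid_gen);
}

void kvm_mmu_invalidate_all_pages_sketch(struct kvm *kvm)
{
	spin_lock(&kvm->mmu_lock);

	/* Every existing shadow page becomes obsolete at once. */
	kvm->arch.mmu_valid_gen++;

	/*
	 * Force all vcpus to reload their roots, i.e. to build fresh
	 * shadow page tables tagged with the new generation-number.
	 */
	kvm_reload_remote_mmus(kvm);

	/* The stale pages can now be zapped with lock-breaks. */
	kvm_zap_obsolete_pages(kvm);

	spin_unlock(&kvm->mmu_lock);
}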
 
 Xiao Guangrong (8):
   KVM: MMU: drop unnecessary kvm_reload_remote_mmus
   KVM: MMU: delete shadow page from hash list in
 kvm_mmu_prepare_zap_page
   KVM: MMU: fast invalidate all pages
   KVM: x86: use the fast way to invalidate all pages
   KVM: MMU: make kvm_mmu_zap_all preemptable
   KVM: MMU: show mmu_valid_gen in shadow page related tracepoints
   KVM: MMU: add tracepoint for kvm_mmu_invalidate_memslot_pages
   KVM: MMU: zap pages in batch
 
  arch/x86/include/asm/kvm_host.h |2 +
  arch/x86/kvm/mmu.c  |  124 ++-
  arch/x86/kvm/mmu.h  |2 +
  arch/x86/kvm/mmutrace.h |   45 +++---
  arch/x86/kvm/x86.c  |9 +--
  5 files changed, 163 insertions(+), 19 deletions(-)
 
 -- 
 1.7.7.6
 


-- 
Takuya Yoshikawa takuya.yoshik...@gmail.com


Re: [PATCH 3/3] KVM: MMU: Consolidate common code in mmu_free_roots()

2013-05-13 Thread Takuya Yoshikawa
On Mon, 13 May 2013 21:02:10 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:

 On 05/13/2013 07:24 PM, Gleb Natapov wrote:

  I agree that this is mostly a code style issue, and with Takuya's patch
  the indentation is deeper.  Also, the structure of mmu_free_roots()
  currently resembles mmu_alloc_shadow_roots(); the latter has the same
  if(){return} pattern.  I also agree with Takuya that the locking
  symmetry can be improved.
  What about something like this (untested):
 
 It is good to me! ;)
 

I agree with you two!  Thank you for reviewing.

Gleb, could you please make it your official patch?

Takuya


[PATCH 0/3] KVM: MMU: Simple mmu cleanups

2013-05-09 Thread Takuya Yoshikawa
Found while documenting mmu_lock usage for myself.

Takuya Yoshikawa (3):
  KVM: MMU: Clean up set_spte()'s ACC_WRITE_MASK handling
  KVM: MMU: Use kvm_mmu_sync_roots() in kvm_mmu_load()
  KVM: MMU: Consolidate common code in mmu_free_roots()

 arch/x86/kvm/mmu.c |   48 +---
 1 files changed, 21 insertions(+), 27 deletions(-)

-- 
1.7.5.4


