Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Mon, Oct 13, 2014 at 05:52:38AM -0300, Marcelo Tosatti wrote:
> On Fri, Oct 10, 2014 at 04:09:29PM +0300, Gleb Natapov wrote:
> > On Wed, Oct 08, 2014 at 04:22:31PM -0300, Marcelo Tosatti wrote:
> > > > > Argh, let's try again:
> > > > >
> > > > > skip_pinned = true  -> mark page dirty, keep spte intact.
> > > > >                        Called from the get dirty log path.
> > > > >
> > > > > skip_pinned = false -> reload remote mmu, destroy pinned spte.
> > > > >                        Called from: dirty log enablement, rmap
> > > > >                        write protect (unused for pinned sptes).
> > > > >
> > > > > Note this behaviour is your suggestion:
> > > >
> > > > Yes, I remember that, and I thought we would not need this
> > > > skip_pinned at all. For the rmap write protect case there
> > > > shouldn't be any pinned pages, but why does dirty log enablement
> > > > set skip_pinned to false? Why not mark pinned pages as dirty just
> > > > like you do in the get dirty log path?
> > >
> > > Because if it's a large spte, it must be nuked (or marked
> > > read-only, which for pinned sptes, is not possible).
> >
> > If a large page has one small page pinned inside it, its spte will
> > be marked as pinned, correct?
>
> Correct.
>
> > We did nuke large ptes here until very recently (c126d94f2c90ed9d),
> > but we cannot drop a pte here anyway without kicking all vcpus out
> > of guest mode, so do you need the additional skip_pinned parameter?
> > Why not check if the spte is large instead?
>
> Nuke only if a large spte is found? Can do that, instead.
>
> > So why not have a per-slot pinned page list (Xiao suggested the
> > same) and do:
>
> The interface is per-vcpu (that is, registration of pinned pages is
> performed on a per-vcpu basis).

PEBS is per-cpu, but that does not mean that pinning should be per-cpu;
it can be done globally with refcounting.

> >   spte_write_protect()
> >   {
> >           if (is_pinned(spte)) {
> >                   if (large(spte))
> >                           // cannot drop while vcpus are running
> >                           mmu_reload_pinned_vcpus();
> >                   else
> >                           return false;
> >           }
> >   }
> >
> >   get_dirty_log()
> >   {
> >           for_each(pinned pages i)
> >                   mark_dirty(i);
> >   }
>
> That is effectively the same as this patchset does, except that the
> spte pinned bit is checked at spte_write_protect, instead of looping
> over a page pinned list. Fail to see a huge advantage there.

I think spte_write_protect is a strange place to mark pages dirty, but
otherwise yes, the effect is the same, so definitely not a huge
difference. If a global pinned list is a PITA in your opinion, leave it
as is.

> I'll drop the skip_pinned parameter and use an is_large_pte check
> instead.
>
> Thanks

--
        Gleb.
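For reference, the direction agreed on above (drop the skip_pinned
parameter, leave pinned sptes intact and report their gfn as dirty, and
force a vcpu reload only when the pinned spte is large) could look
roughly like the sketch below. This is a hedged reconstruction, not the
actual follow-up patch: is_pinned_spte() and mmu_reload_pinned_vcpus()
come from other patches in this series, is_large_pte() is an existing
mmu.c helper, and the function's existing "already read-only" early
return is elided.

/*
 * Sketch only -- not the posted follow-up patch.
 */
static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
{
	u64 spte = *sptep;

	/* (existing "nothing to do if already read-only" check elided) */

	if (is_pinned_spte(spte)) {
		if (is_large_pte(spte)) {
			/*
			 * A pinned large spte cannot be marked read-only, so
			 * it has to be dropped and rebuilt; kick the vcpus so
			 * the page is re-pinned at 4k granularity on the next
			 * vmentry.
			 */
			mmu_reload_pinned_vcpus(kvm);
		} else {
			struct kvm_mmu_page *sp = page_header(__pa(sptep));
			gfn_t gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);

			/* keep the spte writable, just report the page dirty */
			mark_page_dirty(kvm, gfn);
			return false;
		}
	}

	if (pt_protect)
		spte &= ~SPTE_MMU_WRITEABLE;
	spte &= ~PT_WRITABLE_MASK;

	return mmu_spte_update(sptep, spte);
}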
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Fri, Oct 10, 2014 at 04:09:29PM +0300, Gleb Natapov wrote:
> > > > Argh, let's try again:
> > > >
> > > > skip_pinned = true  -> mark page dirty, keep spte intact.
> > > >                        Called from the get dirty log path.
> > > >
> > > > skip_pinned = false -> reload remote mmu, destroy pinned spte.
> > > >                        Called from: dirty log enablement, rmap
> > > >                        write protect (unused for pinned sptes).
> > > >
> > > > Note this behaviour is your suggestion:
> > >
> > > Yes, I remember that, and I thought we would not need this
> > > skip_pinned at all. For the rmap write protect case there shouldn't
> > > be any pinned pages, but why does dirty log enablement set
> > > skip_pinned to false? Why not mark pinned pages as dirty just like
> > > you do in the get dirty log path?
> >
> > Because if it's a large spte, it must be nuked (or marked read-only,
> > which for pinned sptes, is not possible).
>
> If a large page has one small page pinned inside it, its spte will be
> marked as pinned, correct?

Correct.

> We did nuke large ptes here until very recently (c126d94f2c90ed9d), but
> we cannot drop a pte here anyway without kicking all vcpus out of guest
> mode, so do you need the additional skip_pinned parameter? Why not
> check if the spte is large instead?

Nuke only if a large spte is found? Can do that, instead.

> So why not have a per-slot pinned page list (Xiao suggested the same)
> and do:

The interface is per-vcpu (that is, registration of pinned pages is
performed on a per-vcpu basis).

>   spte_write_protect()
>   {
>           if (is_pinned(spte)) {
>                   if (large(spte))
>                           // cannot drop while vcpus are running
>                           mmu_reload_pinned_vcpus();
>                   else
>                           return false;
>           }
>   }
>
>   get_dirty_log()
>   {
>           for_each(pinned pages i)
>                   mark_dirty(i);
>   }

That is effectively the same as this patchset does, except that the spte
pinned bit is checked at spte_write_protect, instead of looping over a
page pinned list. Fail to see a huge advantage there.

I'll drop the skip_pinned parameter and use an is_large_pte check
instead.

Thanks
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Wed, Oct 08, 2014 at 04:22:31PM -0300, Marcelo Tosatti wrote:
> > > Argh, let's try again:
> > >
> > > skip_pinned = true  -> mark page dirty, keep spte intact.
> > >                        Called from the get dirty log path.
> > >
> > > skip_pinned = false -> reload remote mmu, destroy pinned spte.
> > >                        Called from: dirty log enablement, rmap write
> > >                        protect (unused for pinned sptes).
> > >
> > > Note this behaviour is your suggestion:
> >
> > Yes, I remember that, and I thought we would not need this skip_pinned
> > at all. For the rmap write protect case there shouldn't be any pinned
> > pages, but why does dirty log enablement set skip_pinned to false? Why
> > not mark pinned pages as dirty just like you do in the get dirty log
> > path?
>
> Because if it's a large spte, it must be nuked (or marked read-only,
> which for pinned sptes, is not possible).

If a large page has one small page pinned inside it, its spte will be
marked as pinned, correct? We did nuke large ptes here until very
recently (c126d94f2c90ed9d), but we cannot drop a pte here anyway
without kicking all vcpus out of guest mode, so do you need the
additional skip_pinned parameter? Why not check if the spte is large
instead?

So why not have a per-slot pinned page list (Xiao suggested the same)
and do:

  spte_write_protect()
  {
          if (is_pinned(spte)) {
                  if (large(spte))
                          // cannot drop while vcpus are running
                          mmu_reload_pinned_vcpus();
                  else
                          return false;
          }
  }

  get_dirty_log()
  {
          for_each(pinned pages i)
                  mark_dirty(i);
  }

--
        Gleb.
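To make the per-slot proposal above concrete, here is a rough
illustration. The type and field names (struct kvm_pinned_page, the
refcount, the list passed in) are invented for this example and are not
interfaces from the series; only mark_page_dirty() is existing KVM code.

/* Hypothetical per-slot pinned page tracking -- illustrative names only. */
struct kvm_pinned_page {
	struct list_head link;
	gfn_t gfn;
	int refcount;		/* number of vcpus that pinned this gfn */
};

/*
 * Dirty log collection would then simply report every pinned gfn in the
 * slot as dirty, without touching the sptes at all.
 */
static void mark_pinned_pages_dirty(struct kvm *kvm, struct list_head *pinned)
{
	struct kvm_pinned_page *p;

	list_for_each_entry(p, pinned, link)
		mark_page_dirty(kvm, p->gfn);
}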
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Mon, Oct 06, 2014 at 02:19:32PM -0300, Marcelo Tosatti wrote: On Sat, Oct 04, 2014 at 10:23:32AM +0300, Gleb Natapov wrote: On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). OK, but do they really have to be per-cpu? What's the advantage? The guest configures pinning on a per-cpu basis (that is, enabling PEBS is done on a per-cpu basis). Is it a problem to maintain global pinned pages list for each memslot too? + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). I guess what I wanted to ask is why do we need skip_pinned? As far as I see it is set to false in two cases: 1: page is write protected for shadow MMU needs, should not happen since the feature Correct. is not supported with shadow mmu (can happen with nested EPT, but page will be marked is accessed during next vcpu entry anyway, so how will it work)? PEBS is not supported on nested EPT. OK, so for this case we do not need skip_pinned. Assert if it happens. 2: when slot is marked as read only: such slot cannot have PEBS pages and if it will guest will die anyway during next guest entry, so why not maintain list of pinned pages per slot and kill aguest if slot with pinned pages is marked read only. 2: when slots pages have dirty logging enabled. In that case, the page is marked dirty immediately. This is the call from kvm_mmu_slot_remove_write_access. Right, my 2 is incorrect. So when enabling dirty logging, for pinned sptes: - maintain pinned spte intact. - mark gfn for which pinned spte represents as dirty in the dirty log. But you set skip_pinned to false in kvm_mmu_slot_remove_write_access(), so this is not what is happening. Did you mean to set it to true there? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Wed, Oct 08, 2014 at 09:56:36AM +0300, Gleb Natapov wrote: On Mon, Oct 06, 2014 at 02:19:32PM -0300, Marcelo Tosatti wrote: On Sat, Oct 04, 2014 at 10:23:32AM +0300, Gleb Natapov wrote: On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, + bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). OK, but do they really have to be per-cpu? What's the advantage? The guest configures pinning on a per-cpu basis (that is, enabling PEBS is done on a per-cpu basis). Is it a problem to maintain global pinned pages list for each memslot too? + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). I guess what I wanted to ask is why do we need skip_pinned? As far as I see it is set to false in two cases: 1: page is write protected for shadow MMU needs, should not happen since the feature Correct. is not supported with shadow mmu (can happen with nested EPT, but page will be marked is accessed during next vcpu entry anyway, so how will it work)? PEBS is not supported on nested EPT. OK, so for this case we do not need skip_pinned. Assert if it happens. 2: when slot is marked as read only: such slot cannot have PEBS pages and if it will guest will die anyway during next guest entry, so why not maintain list of pinned pages per slot and kill aguest if slot with pinned pages is marked read only. 2: when slots pages have dirty logging enabled. In that case, the page is marked dirty immediately. This is the call from kvm_mmu_slot_remove_write_access. Right, my 2 is incorrect. So when enabling dirty logging, for pinned sptes: - maintain pinned spte intact. - mark gfn for which pinned spte represents as dirty in the dirty log. But you set skip_pinned to false in kvm_mmu_slot_remove_write_access(), so this is not what is happening. Did you mean to set it to true there? Argh, lets try again: skip_pinned = true -- mark page dirty, keep spte intact called from get dirty log path. skip_pinned = false --- reload remote mmu destroy pinned spte. called from: dirty log enablement, rmap write protect (unused for pinned sptes) Note this behaviour is your suggestion: Why write protect it at all? 
mmu_reload_pinned_vcpus() will unprotect it anyway on the next vmentry.
Isn't it better to just report all pinned pages as dirty always?
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Wed, Oct 08, 2014 at 02:15:34PM -0300, Marcelo Tosatti wrote: On Wed, Oct 08, 2014 at 09:56:36AM +0300, Gleb Natapov wrote: On Mon, Oct 06, 2014 at 02:19:32PM -0300, Marcelo Tosatti wrote: On Sat, Oct 04, 2014 at 10:23:32AM +0300, Gleb Natapov wrote: On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). OK, but do they really have to be per-cpu? What's the advantage? The guest configures pinning on a per-cpu basis (that is, enabling PEBS is done on a per-cpu basis). Is it a problem to maintain global pinned pages list for each memslot too? + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). I guess what I wanted to ask is why do we need skip_pinned? As far as I see it is set to false in two cases: 1: page is write protected for shadow MMU needs, should not happen since the feature Correct. is not supported with shadow mmu (can happen with nested EPT, but page will be marked is accessed during next vcpu entry anyway, so how will it work)? PEBS is not supported on nested EPT. OK, so for this case we do not need skip_pinned. Assert if it happens. 2: when slot is marked as read only: such slot cannot have PEBS pages and if it will guest will die anyway during next guest entry, so why not maintain list of pinned pages per slot and kill aguest if slot with pinned pages is marked read only. 2: when slots pages have dirty logging enabled. In that case, the page is marked dirty immediately. This is the call from kvm_mmu_slot_remove_write_access. Right, my 2 is incorrect. So when enabling dirty logging, for pinned sptes: - maintain pinned spte intact. - mark gfn for which pinned spte represents as dirty in the dirty log. But you set skip_pinned to false in kvm_mmu_slot_remove_write_access(), so this is not what is happening. Did you mean to set it to true there? Argh, lets try again: skip_pinned = true -- mark page dirty, keep spte intact called from get dirty log path. skip_pinned = false --- reload remote mmu destroy pinned spte. 
called from: dirty log enablement, rmap write protect (unused for pinned
sptes). Note this behaviour is your suggestion:

Yes, I remember that, and I thought we would not need this skip_pinned at
all. For the rmap write protect case there shouldn't be any pinned pages,
but why does dirty log enablement set skip_pinned to false? Why not mark
pinned pages as dirty just like you do in the get dirty log path?

Why write protect it at all? mmu_reload_pinned_vcpus() will unprotect it
anyway on the next vmentry. Isn't it better to just report all pinned
pages as dirty always?

--
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
> > Argh, let's try again:
> >
> > skip_pinned = true  -> mark page dirty, keep spte intact.
> >                        Called from the get dirty log path.
> >
> > skip_pinned = false -> reload remote mmu, destroy pinned spte.
> >                        Called from: dirty log enablement, rmap write
> >                        protect (unused for pinned sptes).
> >
> > Note this behaviour is your suggestion:
>
> Yes, I remember that, and I thought we would not need this skip_pinned
> at all. For the rmap write protect case there shouldn't be any pinned
> pages, but why does dirty log enablement set skip_pinned to false? Why
> not mark pinned pages as dirty just like you do in the get dirty log
> path?

Because if it's a large spte, it must be nuked (or marked read-only,
which for pinned sptes, is not possible).
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Sat, Oct 04, 2014 at 10:23:32AM +0300, Gleb Natapov wrote: On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, + bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). OK, but do they really have to be per-cpu? What's the advantage? The guest configures pinning on a per-cpu basis (that is, enabling PEBS is done on a per-cpu basis). + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). I guess what I wanted to ask is why do we need skip_pinned? As far as I see it is set to false in two cases: 1: page is write protected for shadow MMU needs, should not happen since the feature Correct. is not supported with shadow mmu (can happen with nested EPT, but page will be marked is accessed during next vcpu entry anyway, so how will it work)? PEBS is not supported on nested EPT. 2: when slot is marked as read only: such slot cannot have PEBS pages and if it will guest will die anyway during next guest entry, so why not maintain list of pinned pages per slot and kill aguest if slot with pinned pages is marked read only. 2: when slots pages have dirty logging enabled. In that case, the page is marked dirty immediately. This is the call from kvm_mmu_slot_remove_write_access. So when enabling dirty logging, for pinned sptes: - maintain pinned spte intact. - mark gfn for which pinned spte represents as dirty in the dirty log. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). OK, but do they really have to be per-cpu? What's the advantage? + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). I guess what I wanted to ask is why do we need skip_pinned? As far as I see it is set to false in two cases: 1: page is write protected for shadow MMU needs, should not happen since the feature is not supported with shadow mmu (can happen with nested EPT, but page will be marked is accessed during next vcpu entry anyway, so how will it work)? 2: when slot is marked as read only: such slot cannot have PEBS pages and if it will guest will die anyway during next guest entry, so why not maintain list of pinned pages per slot and kill aguest if slot with pinned pages is marked read only. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
Ping? On Mon, Sep 22, 2014 at 02:19:14PM -0300, Marcelo Tosatti wrote: On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, + bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). Gleb, any further opinions on this ? Can you ack the patch ? TIA -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Tue, Sep 09, 2014 at 12:28:11PM -0300, Marcelo Tosatti wrote: On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). Gleb, any further opinions on this ? Can you ack the patch ? TIA -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Mon, Jul 21, 2014 at 04:14:24PM +0300, Gleb Natapov wrote: On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, + bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? The pinned gfns are per-vcpu. Requires looping all vcpus (not scalable). + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? Because if skip_pinned = false, we want vcpus to exit (called from enable dirty logging codepath). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Tue, Jul 22, 2014 at 05:55:20AM +0800, Xiao Guangrong wrote: On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, + bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; + } else + mmu_reload_pinned_vcpus(kvm); + } + + if (pt_protect) spte = ~SPTE_MMU_WRITEABLE; spte = spte ~PT_WRITABLE_MASK; This is also a window between marking spte readonly and re-ping… IIUC, I think all spte spte can not be zapped and write-protected at any time It is safe because mmu_lock is held by kvm_mmu_slot_remove_write_access ? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while populating dirty_bitmap_buffer? + } else + mmu_reload_pinned_vcpus(kvm); Can you explain why do you need this? + } + + if (pt_protect) spte = ~SPTE_MMU_WRITEABLE; spte = spte ~PT_WRITABLE_MASK; @@ -1226,7 +1243,7 @@ } static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, - bool pt_protect) + bool pt_protect, bool skip_pinned) { u64 *sptep; struct rmap_iterator iter; @@ -1235,7 +1252,7 @@ for (sptep = rmap_get_first(*rmapp, iter); sptep;) { BUG_ON(!(*sptep PT_PRESENT_MASK)); - flush |= spte_write_protect(kvm, sptep, pt_protect); + flush |= spte_write_protect(kvm, sptep, pt_protect, skip_pinned); sptep = rmap_get_next(iter); } @@ -1261,7 +1278,7 @@ while (mask) { rmapp = __gfn_to_rmap(slot-base_gfn + gfn_offset + __ffs(mask), PT_PAGE_TABLE_LEVEL, slot); - __rmap_write_protect(kvm, rmapp, false); + __rmap_write_protect(kvm, rmapp, false, true); /* clear the first set bit */ mask = mask - 1; @@ -1280,7 +1297,7 @@ for (i = PT_PAGE_TABLE_LEVEL; i PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) { rmapp = __gfn_to_rmap(gfn, i, slot); - write_protected |= __rmap_write_protect(kvm, rmapp, true); + write_protected |= __rmap_write_protect(kvm, rmapp, true, false); } return write_protected; @@ -4565,7 +4582,7 @@ for (index = 0; index = last_index; ++index, ++rmapp) { if (*rmapp) - __rmap_write_protect(kvm, rmapp, false); + __rmap_write_protect(kvm, rmapp, false, false); if (need_resched() || spin_needbreak(kvm-mmu_lock)) cond_resched_lock(kvm-mmu_lock); -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
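A sketch of the alternative Gleb asks about here: set the pinned gfns'
bits while kvm_vm_ioctl_get_dirty_log() builds the user-visible bitmap,
instead of re-marking them from spte_write_protect(). The per-vcpu list
and its field names below are hypothetical; the series registers pinned
pages per vcpu, which is what the "requires looping all vcpus" objection
in the follow-up refers to.

/*
 * Hypothetical helper, called from kvm_vm_ioctl_get_dirty_log() after
 * the harvested dirty bits have been copied into dirty_bitmap_buffer.
 * struct kvm_pinned_page and vcpu->arch.pinned_pages are illustrative
 * names, not the series' actual data structures.
 */
static void set_pinned_gfns_dirty(struct kvm *kvm,
				  struct kvm_memory_slot *memslot,
				  unsigned long *dirty_bitmap_buffer)
{
	struct kvm_vcpu *vcpu;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		struct kvm_pinned_page *p;

		list_for_each_entry(p, &vcpu->arch.pinned_pages, link) {
			if (p->gfn < memslot->base_gfn ||
			    p->gfn >= memslot->base_gfn + memslot->npages)
				continue;
			/* a pinned page can be written at any time */
			set_bit_le(p->gfn - memslot->base_gfn,
				   dirty_bitmap_buffer);
		}
	}
}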
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; + } else + mmu_reload_pinned_vcpus(kvm); + } + + if (pt_protect) spte = ~SPTE_MMU_WRITEABLE; spte = spte ~PT_WRITABLE_MASK; This is also a window between marking spte readonly and re-ping… IIUC, I think all spte spte can not be zapped and write-protected at any time -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before
deleting a pinned spte.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

---
 arch/x86/kvm/mmu.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
===================================================================
--- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-07-09 11:23:59.290744490 -0300
+++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-07-09 11:24:58.449632435 -0300
@@ -1208,7 +1208,8 @@
  *
  * Return true if tlb need be flushed.
  */
-static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
+static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect,
+			       bool skip_pinned)
 {
 	u64 spte = *sptep;
 
@@ -1218,6 +1219,22 @@
 
 	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);
 
+	if (is_pinned_spte(spte)) {
+		/* keep pinned spte intact, mark page dirty again */
+		if (skip_pinned) {
+			struct kvm_mmu_page *sp;
+			gfn_t gfn;
+
+			sp = page_header(__pa(sptep));
+			gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+			mark_page_dirty(kvm, gfn);
+			return false;
+		} else
+			mmu_reload_pinned_vcpus(kvm);
+	}
+
+
 	if (pt_protect)
 		spte &= ~SPTE_MMU_WRITEABLE;
 	spte = spte & ~PT_WRITABLE_MASK;
@@ -1226,7 +1243,7 @@
 }
 
 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
-				 bool pt_protect)
+				 bool pt_protect, bool skip_pinned)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1235,7 +1252,7 @@
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		flush |= spte_write_protect(kvm, sptep, pt_protect);
+		flush |= spte_write_protect(kvm, sptep, pt_protect, skip_pinned);
 
 		sptep = rmap_get_next(&iter);
 	}
@@ -1261,7 +1278,7 @@
 	while (mask) {
 		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_write_protect(kvm, rmapp, false);
+		__rmap_write_protect(kvm, rmapp, false, true);
 
 		/* clear the first set bit */
 		mask &= mask - 1;
@@ -1280,7 +1297,7 @@
 	for (i = PT_PAGE_TABLE_LEVEL;
 	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
 		rmapp = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(kvm, rmapp, true);
+		write_protected |= __rmap_write_protect(kvm, rmapp, true, false);
 	}
 
 	return write_protected;
@@ -4565,7 +4582,7 @@
 
 	for (index = 0; index <= last_index; ++index, ++rmapp) {
 		if (*rmapp)
-			__rmap_write_protect(kvm, rmapp, false);
+			__rmap_write_protect(kvm, rmapp, false, false);
 
 		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
 			cond_resched_lock(&kvm->mmu_lock);
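The patch relies on mmu_reload_pinned_vcpus(), which is introduced
elsewhere in this series and is not shown here. A plausible minimal form,
assuming it simply reuses the existing remote MMU reload machinery (the
real helper may instead target only the vcpus that actually have pinned
pages), would be:

/*
 * Assumption: force every vcpu out of guest mode with a reload request,
 * so pinned sptes can be rebuilt -- and the pages re-pinned -- on the
 * next vmentry. Sketch only; not taken from the series.
 */
static void mmu_reload_pinned_vcpus(struct kvm *kvm)
{
	kvm_reload_remote_mmus(kvm);	/* raises KVM_REQ_MMU_RELOAD on all vcpus */
}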