Re: [Qemu-devel] [PATCH 0/7] KVM: MMU: fast write protect
On 06/05/2017 03:36 PM, Jay Zhou wrote:

>>   /* enable ucontrol for s390 */
>>   struct kvm_s390_ucas_mapping {
>> diff --git a/memory.c b/memory.c
>> index 4c95aaf..b836675 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -809,6 +809,13 @@ static void address_space_update_ioeventfds(AddressSpace *as)
>>       flatview_unref(view);
>>   }
>>
>> +static write_protect_all_fn write_func;
>
> I think there should be a declaration in memory.h,
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 7fc3f48..31f3098 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1152,6 +1152,9 @@ void memory_global_dirty_log_start(void);
>    */
>   void memory_global_dirty_log_stop(void);
>
> +typedef void (*write_protect_all_fn)(bool write);
> +void memory_register_write_protect_all(write_protect_all_fn func);
> +
>   void mtree_info(fprintf_function mon_printf, void *f);

Thanks for your suggestion, Jay!

This code just demonstrates how to enable this feature in QEMU. I will
carefully consider it and merge your suggestion when the formal patch
is posted.
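The header/source split that Jay asks for can be sketched roughly as below. This is only an illustration and not the posted patch: the typedef, the registration prototype and the static write_func come from the quoted diffs, while the body of memory_register_write_protect_all(), the helper demo_invoke_write_protect_all() and the backend demo_write_protect_all() are hypothetical names and code added here so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Part that would go into the header (memory.h in the thread). */
typedef void (*write_protect_all_fn)(bool write);
void memory_register_write_protect_all(write_protect_all_fn func);

/* Part that would stay in the source file (memory.c in the thread). */
static write_protect_all_fn write_func;

/* Assumed body: the thread only shows the prototype, not the definition. */
void memory_register_write_protect_all(write_protect_all_fn func)
{
    write_func = func;
}

/*
 * Hypothetical helper: call the registered hook, if any.
 * Returns false when no backend has registered a callback.
 */
static bool demo_invoke_write_protect_all(bool write)
{
    if (write_func == NULL) {
        return false;
    }
    write_func(write);
    return true;
}

/* Hypothetical backend standing in for the KVM-side implementation. */
static int demo_calls;
static bool demo_last_write;

static void demo_write_protect_all(bool write)
{
    demo_calls++;
    demo_last_write = write;
}
```

With the typedef and the prototype in the header, a backend can register its callback without seeing the static variable, which stays private to the source file.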
Re: [Qemu-devel] [PATCH 0/7] KVM: MMU: fast write protect
On 2017/5/3 18:52, guangrong.x...@gmail.com wrote:

> From: Xiao Guangrong
>
> Background
> ==========
> The original idea of this patchset is from Avi, who raised it on the
> mailing list during my vMMU development some years ago.
>
> This patchset introduces an extremely fast way to write protect all
> of the guest memory. Compared with the ordinary algorithm, which write
> protects the last-level sptes one by one based on the rmap, it simply
> updates the generation number to ask all vCPUs to reload their root
> page tables. In particular, this can be done outside of mmu-lock, so it
> does not hurt the vMMU's parallelism. It is an O(1) algorithm that does
> not depend on the size of the guest's memory or the number of the
> guest's vCPUs.
>
> Implementation
> ==============
> When write protection of all guest memory is required, we update the
> global generation number and ask the vCPUs to reload their root page
> tables by calling kvm_reload_remote_mmus(). The global number is
> protected by slots_lock.
>
> While reloading its root page table, the vCPU compares the root page
> table's generation number with the current global number. If they do
> not match, it makes all the entries in the shadow page read-only and
> goes directly back to the VM. Read accesses therefore continue smoothly
> without KVM's involvement, and write accesses trigger page faults.
>
> If the page fault is triggered by a write operation, KVM moves the
> write protection from the upper-level page to the lower-level page: it
> first makes all the entries in the lower-level page read-only, then
> makes the upper-level entry writable. This operation is repeated until
> we reach the last-level spte.
>
> In order to speed up the process of making all entries read-only, we
> introduce possible_writable_spte_bitmap, which indicates the writable
> sptes, and possiable_writable_sptes, which is a counter of the number
> of writable sptes in the shadow page. They work very efficiently, as
> usually only one entry in the PML4 (< 512 G), a few entries in the PDPT
> (one entry maps 1G of memory), and the PDEs and PTEs need to be write
> protected in the worst case.
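The scheme described in the Implementation section can be illustrated with a small user-space toy model. It is a sketch under simplifying assumptions and not KVM code: the table has 3 levels with 4 entries each, all names (shadow_page, write_protect_all, reload_root, handle_write_fault, can_write, build_tree) are hypothetical, and the toy walks every entry of a page instead of using the bitmap and counter mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FANOUT 4   /* toy fan-out; real x86 page tables have 512 entries */
#define LEVELS 3   /* toy depth; the deepest pages hold the last-level entries */
#define POOL_PAGES (1 + FANOUT + FANOUT * FANOUT)

struct shadow_page {
    unsigned long generation;           /* only consulted on the root */
    bool writable[FANOUT];
    struct shadow_page *child[FANOUT];  /* NULL in last-level pages */
};

static unsigned long global_generation;
static struct shadow_page pool[POOL_PAGES];
static size_t pool_used;

/* Build a fully populated, fully writable tree from the static pool. */
static struct shadow_page *build_tree(int level)
{
    assert(pool_used < POOL_PAGES);
    struct shadow_page *sp = &pool[pool_used++];

    sp->generation = global_generation;
    for (int i = 0; i < FANOUT; i++) {
        sp->writable[i] = true;
        sp->child[i] = (level > 1) ? build_tree(level - 1) : NULL;
    }
    return sp;
}

/* The O(1) request: bump the generation, walk nothing. */
static void write_protect_all(void)
{
    global_generation++;
}

static void make_all_readonly(struct shadow_page *sp)
{
    for (int i = 0; i < FANOUT; i++) {
        sp->writable[i] = false;
    }
}

/* Root reload: a root with a stale generation gets all entries read-only. */
static void reload_root(struct shadow_page *root)
{
    if (root->generation != global_generation) {
        make_all_readonly(root);
        root->generation = global_generation;
    }
}

/*
 * Write fault on the entry addressed by path[0..LEVELS-1]: push the write
 * protection down one level at a time. Returns how many entries had to be
 * made writable again.
 */
static int handle_write_fault(struct shadow_page *root, const int *path)
{
    struct shadow_page *sp = root;
    int fixed = 0;

    for (int d = 0; d < LEVELS && sp != NULL; d++) {
        int idx = path[d];

        if (!sp->writable[idx]) {
            if (sp->child[idx] != NULL) {
                make_all_readonly(sp->child[idx]);  /* lower page first */
            }
            sp->writable[idx] = true;               /* then the upper entry */
            fixed++;
        }
        sp = sp->child[idx];
    }
    return fixed;
}

/* A write succeeds only if every entry along the path is writable. */
static bool can_write(const struct shadow_page *root, const int *path)
{
    const struct shadow_page *sp = root;

    for (int d = 0; d < LEVELS && sp != NULL; d++) {
        if (!sp->writable[path[d]]) {
            return false;
        }
        sp = sp->child[path[d]];
    }
    return true;
}
```

In this model, write_protect_all() followed by reload_root() blocks every write while touching only the root page. The first write fault on a path repairs one entry per level, and a later fault on a sibling entry in the same last-level page repairs only that one entry.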
>
> Note, the number of page faults and TLB flushes is the same as with the
> ordinary algorithm.
>
> Performance Data
> ================
> Case 1) For a VM which has 3G memory and 12 vCPUs, we noticed that:
>    a: the time required for dirty log (ns)
>        before           after
>        64289121         137654           +46603%
>
>    b: the performance of memory writes after dirty log, i.e., the dirty
>       log path does not run in parallel with page faults; the time
>       required to write all 3G of memory for all vCPUs in the VM (ns):
>        before           after
>        281735017        291150923        -3%
>       We think the impact, 3%, is acceptable, particularly as mmu-lock
>       contention is not taken into account in this case.
>
> Case 2) For a VM which has 30G memory and 8 vCPUs, we do a live
>    migration while, at the same time, a test case greedily and
>    repeatedly writes 3000M of memory in the VM.
>
>    2.1) for a newly booted VM, i.e., page faults are required to map
>         guest memory in, we noticed that:
>         a: the dirty page rate (pages):
>             before       after
>             333092       497266       +49%
>            that means the performance of the VM being migrated is hugely
>            improved, as the contention on mmu-lock is reduced
>
>         b: the time to complete live migration (ms):
>             before       after
>             12532        18467        -47%
>            not surprisingly, the time required to complete live migration
>            increases, as the VM is able to generate more dirty pages
>
>    2.2) pre-write the VM first, then run the test case and do the live
>         migration, i.e., not many page faults are needed to map guest
>         memory in, we noticed that:
>         a: the dirty page rate (pages):
>             before       after
>             447435       449284       +0%
>
>         b: the time to complete live migration (ms):
>             before       after
>             31068        28310        +10%
>
>         in this case, we also noticed that the first dirty log takes
>         156 ms before the patchset and only 6 ms after it
>
> The patch applied to QEMU
> =========================
> The draft patch is attached to enable this functionality in QEMU:
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 90b8573..9ebe1ac 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -122,6 +122,7 @@ bool kvm_direct_msi_allowed;
>   bool kvm_ioeventfd_any_length_allowed;
>   bool kvm_msi_use_devid;
>   static bool kvm_immediate_exit;
> +static bool kvm_write_protect_all;
>
>   static const KVMCapabilityInfo kvm_required_capabilites[] = {
>       KVM_CAP_INFO(USER_MEMORY),
> @@ -440,6 +441,26 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
>   #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
>
> +static bool kvm_write_protect_all_is_supported(KVMState *s)
> +{
> +    return kvm_check_extension(s, KVM_CAP_X86_WRITE_PROTECT_ALL_MEM) &&
> +           kvm_check_extension(s, KVM_CAP_X86_DIRTY_LOG_WITHOUT_WRITE_PROTECT);
> +}
> +
> +static void kvm_write_protect_all_mem(bool