Re: [PATCH v2 0/7] KVM: MMU: fast write protect

2017-07-03 Thread Xiao Guangrong



On 07/03/2017 11:47 PM, Paolo Bonzini wrote:
> On 03/07/2017 16:39, Xiao Guangrong wrote:
>> On 06/20/2017 05:15 PM, guangrong.x...@gmail.com wrote:
>>> From: Xiao Guangrong 
>>>
>>> Changelog in v2:
>>> thanks to Paolo's review, this version disables write-protect-all if
>>> PML is supported
>>
>> Hi Paolo,
>>
>> Do you have time to have a look at this new version? ;)
>> Or should I wait until the dirty ring-buffer patchset is merged?
>
> I will look at it soon, but I still plan to merge dirty ring buffer first.
>
> Thanks for your understanding,

Sure, I fully understand, thank you for bearing with my pushing. :)


Re: [PATCH v2 0/7] KVM: MMU: fast write protect

2017-07-03 Thread Paolo Bonzini


On 03/07/2017 16:39, Xiao Guangrong wrote:
> 
> 
> On 06/20/2017 05:15 PM, guangrong.x...@gmail.com wrote:
>> From: Xiao Guangrong 
>>
>> Changelog in v2:
>> thanks to Paolo's review, this version disables write-protect-all if
>> PML is supported
> 
> Hi Paolo,
> 
> Do you have time to have a look at this new version? ;)
> Or should I wait until the dirty ring-buffer patchset is merged?

I will look at it soon, but I still plan to merge dirty ring buffer first.

Thanks for your understanding,

Paolo


Re: [PATCH v2 0/7] KVM: MMU: fast write protect

2017-07-03 Thread Xiao Guangrong



On 06/20/2017 05:15 PM, guangrong.x...@gmail.com wrote:
> From: Xiao Guangrong 
>
> Changelog in v2:
> thanks to Paolo's review, this version disables write-protect-all if
> PML is supported

Hi Paolo,

Do you have time to have a look at this new version? ;)
Or should I wait until the dirty ring-buffer patchset is merged?

Thanks!


[PATCH v2 0/7] KVM: MMU: fast write protect

2017-06-20 Thread guangrong . xiao
From: Xiao Guangrong 

Changelog in v2:
Thanks to Paolo's review, this version disables write-protect-all if
PML is supported.

Background
==========
The original idea of this patchset is from Avi, who raised it on
the mailing list during my vMMU development some years ago.

This patchset introduces an extremely fast way to write-protect
all the guest memory. Compared with the ordinary algorithm, which
write-protects last-level sptes one by one based on the rmap, it
simply updates the generation number to ask all vCPUs to reload
their root page tables. Notably, this can be done outside of
mmu-lock, so it does not hurt the vMMU's parallelism. It is an
O(1) algorithm that depends on neither the size of guest memory
nor the number of guest vCPUs.

Implementation
==============
When write protection for all guest memory is required, we update
the global generation number and ask the vCPUs to reload their root
page tables by calling kvm_reload_remote_mmus(); the global number
is protected by slots_lock.
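
As a rough sketch of this trigger path (the indicator field and the
function name below are illustrative, not necessarily the exact ones
used in the patches):

/* Hedged sketch: field and function names are illustrative. */
void kvm_mmu_write_protect_all_pages(struct kvm *kvm)
{
	mutex_lock(&kvm->slots_lock);

	/* Roots tagged with an older number become stale. */
	kvm->arch.mmu_write_protect_all_indicator++;

	/*
	 * Make every vCPU reload its root page table; the reload path
	 * notices the generation mismatch and write protects the root.
	 */
	kvm_reload_remote_mmus(kvm);

	mutex_unlock(&kvm->slots_lock);
}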

While reloading its root page table, a vCPU compares the root page
table's generation number with the current global number; if they do
not match, it makes all the entries in the shadow page readonly and
directly enters the guest. So read accesses still proceed smoothly
without KVM's involvement, while write accesses trigger page faults.
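
A minimal sketch of that check on the reload path (the helper and the
generation fields are assumed names, not the exact ones from the
patches):

/* Hedged sketch: helper and field names are illustrative. */
static void mmu_check_root_generation(struct kvm_vcpu *vcpu,
				      struct kvm_mmu_page *root)
{
	u64 global = vcpu->kvm->arch.mmu_write_protect_all_indicator;

	if (root->mmu_write_protect_gen == global)
		return;

	/* Clear the writable bit on every present entry in the root. */
	mmu_make_page_readonly(root);
	root->mmu_write_protect_gen = global;
	/* Reads still hit the old mappings; writes now fault into KVM. */
}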

If the page fault is triggered by a write operation, KVM moves the
write protection from the upper-level page to the lower-level page:
it makes all the entries in the lower page readonly first, then makes
the upper-level entry writable. This operation is repeated until we
reach the last-level spte.
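
In sketch form, one step of that write-fault path could look like the
following (spte_to_child_page() is an assumed helper for resolving the
shadow page a non-leaf spte points to):

/* Hedged sketch of moving write protection one level down. */
static void mmu_push_write_protect_down(u64 *sptep)
{
	struct kvm_mmu_page *child = spte_to_child_page(*sptep);

	/* The lower level becomes readonly first ... */
	mmu_make_page_readonly(child);
	/* ... then the upper-level entry may become writable again. */
	*sptep |= PT_WRITABLE_MASK;
}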

In order to speed up the process of making all entries readonly, we
introduce possible_writable_spte_bitmap, which indicates the writable
sptes, and possible_writable_sptes, a counter of the number of
writable sptes in the shadow page. They work very efficiently: in the
worst case, usually only one entry in the PML4 (for guests with less
than 512G of memory), a few entries in the PDPT (one entry covers 1G
of memory), and a few PDEs and PTEs need to be write-protected. Note
that the number of page faults and TLB flushes is the same as with
the ordinary algorithm.
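
A sketch of how the bitmap and the counter keep this cheap (512
entries per shadow page; the two field names follow the description
above, everything else is illustrative):

/* Hedged sketch: only the possibly-writable entries are visited. */
static void mmu_make_page_readonly(struct kvm_mmu_page *sp)
{
	unsigned int i;

	for_each_set_bit(i, sp->possible_writable_spte_bitmap, 512) {
		sp->spt[i] &= ~PT_WRITABLE_MASK;
		sp->possible_writable_sptes--;
	}
	bitmap_zero(sp->possible_writable_spte_bitmap, 512);
}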

Performance Data
================

Case 1) For a VM which has 3G memory and 12 vCPUs, we noticed that:
   a: the time required for dirty log (ns):
      before     after
      64289121   137654      +46603%

   b: the performance of memory write after dirty log, i.e., the dirty
      log path is not parallel with page faults; the time required to
      write all 3G memory from all vCPUs in the VM (ns):
      before      after
      281735017   291150923   -3%
      We think the impact, 3%, is acceptable, particularly as mmu-lock
      contention is not taken into account in this case.

Case 2) For a VM which has 30G memory and 8 vCPUs, we do live
   migration while, at the same time, a test case greedily and
   repeatedly writes 3000M of memory in the VM.

   2.1) For a newly booted VM, i.e., page faults are required to map
guest memory in, we noticed that:
a: the dirty page rate (pages):
before   after
333092   497266   +49%
That means the performance of the VM being migrated is hugely
improved, as the contention on mmu-lock is reduced.

b: the time to complete live migration (ms):
before   after
12532    18467    -47%
Not surprisingly, the time required to complete live migration
is increased, as the VM is able to generate more dirty pages.

   2.2) Pre-write the VM first, then run the test case and do live
migration, i.e., not many page faults are needed to map guest
memory in; we noticed that:
a: the dirty page rate (pages):
before   after
447435   449284   +0%

b: the time to complete live migration (ms):
before   after
31068    28310    +10%
In this case, we also noticed that the first dirty-log operation
took 156 ms before the patchset; afterwards, only 6 ms is needed.
   
The patch applied to QEMU
=========================
The draft patch is attached to enable this functionality in QEMU:

diff --git a/kvm-all.c b/kvm-all.c
index 90b8573..9ebe1ac 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -122,6 +122,7 @@ bool kvm_direct_msi_allowed;
 bool kvm_ioeventfd_any_length_allowed;
 bool kvm_msi_use_devid;
 static bool kvm_immediate_exit;
+static bool kvm_write_protect_all;
 
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
 KVM_CAP_INFO(USER_MEMORY),
@@ -440,6 +441,26 @@ static int kvm_get_dirty_pages_log_range(MemoryRegionSection *section,
 
 #define ALIGN(x, y)  (((x)+(y)-1) & ~((y)-1))
 
+static bool kvm_write_protect_all_is_supported(KVMState *s)
+{
+   return kvm_check_extension(s, KVM_CAP_X86_WRITE_PROTECT_ALL_MEM) &&
+   kvm_check_extension(s, KVM_CAP_X86_DIRTY_LOG_WITHOUT_WRITE_PROTECT);
+}
+
+static void
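
The draft is truncated here. Purely as an assumed sketch (not taken
from the patch above), the probe could be cached once at init time and
the cached flag consulted on the dirty-log path:

/* Hedged sketch, not part of the attached draft. */
static void kvm_write_protect_all_init(KVMState *s)
{
    kvm_write_protect_all = kvm_write_protect_all_is_supported(s);
}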