On 2017/5/17 13:47, Wanpeng Li wrote:
Hi Zhoujian,
2017-05-17 10:20 GMT+08:00 Zhoujian (jay) <jianjay.z...@huawei.com>:
Hi Wanpeng,
On 11/05/2017 14:07, Zhoujian (jay) wrote:
- * Scan sptes if dirty logging has been stopped, dropping those
- * which can be collapsed into a single large-page spte. Later
- * page faults will create the large-page sptes.
+ * Reset each vcpu's mmu, then page faults will create the large-page
+ * sptes later.
*/
if ((change != KVM_MR_DELETE) &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
- !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_zap_collapsible_sptes(kvm, new);
This is an unlikely branch (taken only when guest live migration fails and
the guest continues to run on the source machine) rather than a hot path.
Do you have any performance numbers for your real workloads?
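For readers following the thread, here is a minimal sketch of when this branch is taken. It is a Python model of the condition in the hunk above, not kernel code; the names Change, LOG_DIRTY_PAGES and dirty_logging_stopped are illustrative stand-ins for enum kvm_mr_change, KVM_MEM_LOG_DIRTY_PAGES and the if-condition.

```python
from enum import Enum, auto


class Change(Enum):
    """Illustrative stand-in for the kernel's enum kvm_mr_change."""
    CREATE = auto()
    DELETE = auto()
    MOVE = auto()
    FLAGS_ONLY = auto()


# Stand-in for KVM_MEM_LOG_DIRTY_PAGES; only the bit test matters here.
LOG_DIRTY_PAGES = 1 << 0


def dirty_logging_stopped(change, old_flags, new_flags):
    """Model of the quoted condition: the slot is not being deleted,
    dirty logging was enabled on it before, and is disabled now."""
    return (change is not Change.DELETE
            and bool(old_flags & LOG_DIRTY_PAGES)
            and not (new_flags & LOG_DIRTY_PAGES))


if __name__ == "__main__":
    # Dirty logging switched off on an existing slot: branch is taken.
    print(dirty_logging_stopped(Change.FLAGS_ONLY, LOG_DIRTY_PAGES, 0))
    # Dirty logging still on: branch is skipped.
    print(dirty_logging_stopped(Change.FLAGS_ONLY, LOG_DIRTY_PAGES, LOG_DIRTY_PAGES))
```

In this model the branch fires only on the enabled-to-disabled transition of dirty logging, which matches the point above that it is reached after a failed migration rather than during normal operation.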
Sorry to bother you again.
Recently, I tested the performance before migration and after a migration
failure using SPEC CPU2006 (https://www.spec.org/cpu2006/), which is a
standard performance evaluation tool.
These are the results:
******
Before migration the score is 153, and the TLB miss statistics of the qemu
process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
Performance counter stats for process id '26463':
    698,938  dTLB-load-misses   # 0.13% of all dTLB cache hits  (50.46%)
543,303,875  dTLB-loads                                         (50.43%)
    199,597  dTLB-store-misses                                  (16.51%)
 60,128,561  dTLB-stores                                        (16.67%)
     69,986  iTLB-load-misses   # 6.17% of all iTLB cache hits  (16.67%)
  1,134,097  iTLB-loads                                         (33.33%)
10.000684064 seconds time elapsed
After migration failure the score is 149, and the TLB miss statistics of
the qemu process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 26463 sleep 10
Performance counter stats for process id '26463':
    765,400  dTLB-load-misses   # 0.14% of all dTLB cache hits  (50.50%)
540,972,144  dTLB-loads                                         (50.47%)
    207,670  dTLB-store-misses                                  (16.50%)
 58,363,787  dTLB-stores                                        (16.67%)
    109,772  iTLB-load-misses   # 9.52% of all iTLB cache hits  (16.67%)
  1,152,784  iTLB-loads                                         (33.32%)
10.000703078 seconds time elapsed
******
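To make the before/after comparison easier to read, here is a small sketch that recomputes the miss rates from the raw counters posted above. The dictionaries and the miss_rate helper are illustrative names, not part of any tool. Note that the percentages in parentheses in the perf output mean the counters were multiplexed, so the counts are scaled estimates.

```python
# Counter values copied from the two perf stat runs above (pid 26463).
before = {
    "dTLB-load-misses": 698_938, "dTLB-loads": 543_303_875,
    "dTLB-store-misses": 199_597, "dTLB-stores": 60_128_561,
    "iTLB-load-misses": 69_986, "iTLB-loads": 1_134_097,
}
after_failure = {
    "dTLB-load-misses": 765_400, "dTLB-loads": 540_972_144,
    "dTLB-store-misses": 207_670, "dTLB-stores": 58_363_787,
    "iTLB-load-misses": 109_772, "iTLB-loads": 1_152_784,
}


def miss_rate(stats, kind):
    """Return misses / accesses in percent for kind, e.g. 'dTLB-load'."""
    return 100.0 * stats[f"{kind}-misses"] / stats[f"{kind}s"]


if __name__ == "__main__":
    for kind in ("dTLB-load", "dTLB-store", "iTLB-load"):
        b = miss_rate(before, kind)
        a = miss_rate(after_failure, kind)
        print(f"{kind:11s} before {b:.3f}%  after failure {a:.3f}%")
```

The load miss rates computed this way agree with the percentages perf itself prints (0.13% and 0.14% for dTLB loads, 6.17% and 9.52% for iTLB loads).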
Could you comment out the original "lazy collapse small sptes into
large sptes" code in the function kvm_arch_commit_memory_region() and
post the results here?
With the patch below,
diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..e0288d5 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8548,10 +8548,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
* which can be collapsed into a single large-page spte. Later
* page faults will create the large-page sptes.
*/
- if ((change != KVM_MR_DELETE) &&
- (old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
- !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_zap_collapsible_sptes(kvm, new);
/*
* Set up write protection and/or dirty logging for the new slot.
After migration failure the score is 148, and the TLB miss statistics
of the qemu process are:
linux-sjrfac:/mnt/zhoujian # perf stat -e dTLB-load-misses,dTLB-loads,dTLB-store-misses,dTLB-stores,iTLB-load-misses,iTLB-loads -p 12432 sleep 10
Performance counter stats for process id '12432':
  1,052,697  dTLB-load-misses   # 0.19% of all dTLB cache hits  (50.45%)
551,828,702  dTLB-loads                                         (50.46%)
    147,228  dTLB-store-misses                                  (16.55%)
 60,427,834  dTLB-stores                                        (16.50%)
     93,793  iTLB-load-misses   # 7.43% of all iTLB cache hits  (16.67%)
  1,262,137  iTLB-loads                                         (33.33%)
10.000709900 seconds time elapsed
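As a rough way to put the three scores in this thread side by side, the sketch below expresses each post-failure score as a relative change from the pre-migration score of 153. The labels and the relative_change helper are illustrative only, and single runs like these say nothing about run-to-run noise.

```python
# Scores reported in this thread for 429.mcf.
BASELINE = 153  # before migration
after_failure_scores = {
    "KVM_REQ_MMU_RELOAD patch": 149,
    "zap call removed": 148,
}


def relative_change(baseline, score):
    """Return the change of score relative to baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


if __name__ == "__main__":
    for label, score in after_failure_scores.items():
        print(f"{label}: {score} ({relative_change(BASELINE, score):+.2f}%)")
```

Both post-failure scores come out within a few percent of the baseline, and within one point of each other.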
Regards,
Jay Zhou
Regards,
Wanpeng Li
These are the steps:
======
(1) The kmod version is 4.4.11 (slightly modified) and the qemu version is
2.6.0 (slightly modified). The kmod has the following patch applied,
following Paolo's advice:
diff --git a/source/x86/x86.c b/source/x86/x86.c
index 054a7d3..75a4bb3 100644
--- a/source/x86/x86.c
+++ b/source/x86/x86.c
@@ -8550,8 +8550,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
*/
if ((change != KVM_MR_DELETE) &&
(old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
- !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- kvm_mmu_zap_collapsible_sptes(kvm, new);
+ !(new->flags & KVM_MEM_LOG_DIRTY_PAGES)) {
+ printk(KERN_ERR "zj make KVM_REQ_MMU_RELOAD request\n");
+ kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
+ }
/*
* Set up write protection and/or dirty logging for the new slot.
(2) I started a 10G VM (suse11sp3) with its memory preoccupied, meaning its
RES column in top shows 10G, so that the EPT tables are set up in advance.
(3) Then I ran the 429.mcf test case of SPEC CPU2006 before migration and
after migration failure. 429.mcf is a memory-intensive workload, and the
migration failure is constructed deliberately with the following qemu patch:
diff --git a/migration/migration.c b/migration/migration.c
index 5d725d0..88dfc59 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -625,6 +625,9 @@ static void process_incoming_migration_co(void *opaque)
MIGRATION_STATUS_ACTIVE);
ret = qemu_loadvm_state(f);
+ // deliberately construct the migration failure
+ exit(EXIT_FAILURE);
+
ps = postcopy_state_get();
trace_process_incoming_migration_co_end(ret, ps);
if (ps != POSTCOPY_INCOMING_NONE) {
======
The scores and TLB miss rates are almost the same, and I am confused.
May I ask which tool you use to evaluate the performance?
If my test steps are wrong, please let me know. Thank you.
Regards,
Jay Zhou