While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large amount of CPU time.
Thanks,
Takuya
=
perf top -t ${QEMU_TID}
=
51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
16.73%  qemu-system-x86_64  [.] ram_save_remaining
 7.25%  qemu-system-x86_64  [.] cpu_physical_memory_reset_dirty
 3.49%  [kvm]               [k] __rmap_write_protect
 2.85%  [kvm]               [k] mmu_spte_update
 2.20%  [kernel]            [k] copy_user_generic_string
 2.16%  libc-2.13.so        [.] 0x874e9
 1.71%  qemu-system-x86_64  [.] memory_region_set_dirty
 1.20%  qemu-system-x86_64  [.] kvm_physical_sync_dirty_bitmap
 1.00%  [kernel]            [k] __lock_acquire.isra.31
 0.66%  [kvm]               [k] rmap_get_next
 0.58%  [kvm]               [k] rmap_get_first
 0.54%  [kvm]               [k] kvm_mmu_write_protect_pt_masked
 0.54%  [kvm]               [k] spte_has_volatile_bits
 0.42%  [kernel]            [k] lock_release
 0.37%  [kernel]            [k] tcp_sendmsg
 0.33%  [kernel]            [k] alloc_pages_current
 0.29%  [kernel]            [k] native_read_tsc
 0.29%  qemu-system-x86_64  [.] ram_save_block
 0.25%  [kernel]            [k] lock_is_held
 0.25%  [kernel]            [k] __ticket_spin_trylock
 0.21%  [kernel]            [k] lock_acquire
On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa wrote:
> 1. Problem
> During live migration, if the guest tries to take mmu_lock at the same
> time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
> forced to wait a long time; this is not restricted to page faults caused
> by GET_DIRTY_LOG's write protection.
>
> 2. Measurement
> - Server:
> Xeon: 8 cores(2 CPUs), 24GB memory
>
> - One VM was being migrated locally to the opposite NUMA node:
> Source(active) VM: bound to node 0
> Target(incoming) VM: bound to node 1
>
> This binding was done to reduce extra noise.
>
> - The guest inside it:
> 3 VCPUs, 11GB memory
>
> - Workload:
> On VCPUs 2 and 3, there were 3 threads, each endlessly writing to 3GB
> of anonymous memory (9GB in total) at its maximum speed.
>
> I checked that GET_DIRTY_LOG was forced to write protect more than
> 2 million pages, so the 9GB of memory was almost always kept dirty and
> waiting to be sent.
>
> In parallel, on VCPU 1, I checked memory write latency: how long it
> takes to write to one byte of each page in 1GB anonymous memory.
>
> - Result:
> With the current KVM, I could see 1.5ms worst case latency: this
> corresponds well with the expected mmu_lock hold time.
>
> Here, you may think that this is too small compared to the numbers I
> reported before using dirty-log-perf, but that was done on a 32-bit
> host on a Core i3 box, which was much slower than server machines.
>
>
> Although having 10GB of dirty memory pages is a bit extreme for guests
> with less than 16GB of memory, much larger guests, e.g. 128GB guests,
> may see latency longer than 1.5ms.
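
For reference, the kind of probe described above (write one byte to each
page and record the slowest write) can be sketched roughly as below. This
is not the program used for the numbers in this mail; the names
probe_write_latency() and struct probe_result are made up for
illustration, and the mapping size is left to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type: pages touched and slowest single write. */
struct probe_result {
	size_t pages;
	uint64_t worst_ns;
};

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Map `bytes` of anonymous memory, write one byte to each page, and
 * report the worst latency of a single write. Returns pages == 0 if
 * the mapping cannot be created.
 */
static struct probe_result probe_write_latency(size_t bytes)
{
	struct probe_result res = { 0, 0 };
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *mem;
	size_t off;

	mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return res;

	for (off = 0; off < bytes; off += page) {
		uint64_t t0 = now_ns();
		uint64_t dt;

		/* volatile keeps the compiler from dropping the store */
		*(volatile unsigned char *)(mem + off) = 1;
		dt = now_ns() - t0;

		if (dt > res.worst_ns)
			res.worst_ns = dt;
		res.pages++;
	}

	munmap(mem, bytes);
	return res;
}
```

Calling probe_write_latency() in a loop inside the guest while migration
is running would give a worst-case figure comparable in spirit to the one
above. Note that the first pass over a fresh mapping also includes the
guest's own page-fault cost, so repeated passes over an already touched
buffer are probably the more interesting case.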
>
> 3. Solution
> GET_DIRTY_LOG time is very limited compared to other work in QEMU,
> so we should focus on alleviating the worst case latency first.
>
> The solution is very simple and originally suggested by Marcelo:
> "Conditionally reschedule when there is a contention."
>
> With this rescheduling (see the following patch), the worst case latency
> dropped from 1.5ms to 800us for the same test.
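
To illustrate the idea of "conditionally reschedule when there is
contention", here is a small userspace analogy. It is not the patch and
does not use the kernel's primitives: struct break_lock, its waiters
counter and process_with_breaks() are all hypothetical names, and a simple
atomic-flag spinlock stands in for mmu_lock. The long-running holder checks
after each unit of work whether anybody is waiting, and if so drops the
lock, yields, and takes it again.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinlock that also counts how many threads are waiting. */
struct break_lock {
	atomic_flag held;
	atomic_int waiters;
};

static void break_lock_init(struct break_lock *l)
{
	atomic_flag_clear(&l->held);
	atomic_init(&l->waiters, 0);
}

static void break_lock_acquire(struct break_lock *l)
{
	/* Announce ourselves so the current holder can see the contention. */
	atomic_fetch_add(&l->waiters, 1);
	while (atomic_flag_test_and_set_explicit(&l->held,
						 memory_order_acquire))
		sched_yield();
	atomic_fetch_sub(&l->waiters, 1);
}

static void break_lock_release(struct break_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Example unit of work: just count how many items were processed. */
static void count_item(size_t idx, void *arg)
{
	(void)idx;
	++*(size_t *)arg;
}

/*
 * Process n items under the lock, but give the lock up whenever another
 * thread is waiting for it. Returns how many times the lock was dropped.
 */
static size_t process_with_breaks(struct break_lock *l, size_t n,
				  void (*work)(size_t idx, void *arg),
				  void *arg)
{
	size_t breaks = 0;
	size_t i;

	break_lock_acquire(l);
	for (i = 0; i < n; i++) {
		work(i, arg);

		if (atomic_load(&l->waiters) > 0) {
			break_lock_release(l);
			sched_yield();		/* let a waiter run */
			break_lock_acquire(l);
			breaks++;
		}
	}
	break_lock_release(l);

	return breaks;
}
```

Without contention the loop never drops the lock, so the uncontended path
costs only one extra load per item. The lock here is not fair, so the
holder may win it back before a waiter gets in; in that case it simply
breaks again on the next item.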
>
> 4. TODO
> The patch handles only kvm_vm_ioctl_get_dirty_log(), so the write
> protection by kvm_mmu_slot_remove_write_access(), which is called when
> we enable dirty page logging, can still cause the same problem.
>
> My plan is to replace it with rmap-based protection after this.
>
>
> Thanks,
> Takuya
>
> ---
> Takuya Yoshikawa (1):
> KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
>
> arch/x86/include/asm/kvm_host.h |  6 +++---
> arch/x86/kvm/mmu.c              | 12 +---
> arch/x86/kvm/x86.c              | 22 +-
> 3 files changed, 29 insertions(+), 11 deletions(-)
>
> --
> 1.7.5.4
>
--
Takuya Yoshikawa