Hi, Peter On 10/12/2025 05:10, Peter Xu wrote: > On Mon, Dec 08, 2025 at 08:09:52PM +0800, Chuang Xu wrote: >> From: xuchuangxclwt <[email protected]> >> >> When the addresses processed are not aligned, a large number of >> clear_dirty ioctl occur (e.g. a 16MB misaligned memory can generate >> 4096 clear_dirty ioctls), which increases the time required for >> bitmap_sync and makes it more difficult for dirty pages to converge. >> >> Attempt to merge those fragmented clear_dirty ioctls. > (besides separate perf results I requested as in the cover letter reply..) > > Could you add something into the commit log explaining at least one example > that you observe? E.g. what is the VM setup, how many ramblocks are the > ones not aligned? > > Have you considered setting rb->clear_bmap when it's available? It'll > postpone the remote clear even further until page sent. Logically it > should be more efficient, but it may depend on the size of unaligned > ramblocks that you're hitting indeed, as clear_bmap isn't PAGE_SIZE based > but it can be much bigger. Some discussion around this would be nice on > how you chose the current solution. >
On my Intel(R) Xeon(R) 6986P-C(previous tests were based on Cascade Lake), I add some logs. Here are some examples of unaligned memory I observed: size 1966080: system.flash0 size 131072: /rom@etc/acpi/tables, isa-bios, system.flash1, pc.rom size 65536: cirrus_vga.rom Taking system.flash0 as an example, judging from its size, this should be the OVMF I'm using. This memory segment will trigger clear_dirty in both memory_listener "kvm-memory" and memory_listener "kvm-smram" simultaneously, ultimately resulting in a total of 960 kvm_ioctl calls. If a larger OVMF is used, this number will obviously worsen. On the machine I tested, clear system.flash0 took a total of 49ms, and clear all unaligned memory took a total of 61ms. Regarding why the current solution was chosen, because I'm not sure if there were any special considerations in the initial design of clear_dirty_log for not applying unaligned memory paths. Therefore, I chose to keep most of the logic the same as the existing one, only extracting and merging the actual clear_dirty operations. Thanks.
