Hi, Peter

On 10/12/2025 05:10, Peter Xu wrote:
> On Mon, Dec 08, 2025 at 08:09:52PM +0800, Chuang Xu wrote:
>> From: xuchuangxclwt <[email protected]>
>>
>> When the addresses processed are not aligned, a large number of
>> clear_dirty ioctl occur (e.g. a 16MB misaligned memory can generate
>> 4096 clear_dirty ioctls), which increases the time required for
>> bitmap_sync and makes it more difficult for dirty pages to converge.
>>
>> Attempt to merge those fragmented clear_dirty ioctls.
> (besides separate perf results I requested as in the cover letter reply..)
>
> Could you add something into the commit log explaining at least one example
> that you observe?  E.g. what is the VM setup, how many ramblocks are the
> ones not aligned?
>
> Have you considered setting rb->clear_bmap when it's available?  It'll
> postpone the remote clear even further until page sent.  Logically it
> should be more efficient, but it may depend on the size of unaligned
> ramblocks that you're hitting indeed, as clear_bmap isn't PAGE_SIZE based
> but it can be much bigger.  Some discussion around this would be nice on
> how you chose the current solution.
>

On my Intel(R) Xeon(R) 6986P-C (previous tests were based on Cascade Lake),
I added some logging. Here are some examples of unaligned memory I observed:
size 1966080: system.flash0
size 131072: /rom@etc/acpi/tables, isa-bios, system.flash1, pc.rom
size 65536: cirrus_vga.rom

Taking system.flash0 as an example: judging from its size, this should
be the OVMF image I'm using.
This memory region triggers clear_dirty in both the "kvm-memory" and the
"kvm-smram" memory listeners, ultimately resulting in a total of
960 kvm_ioctl calls.
If a larger OVMF image is used, this number will grow accordingly.
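
For reference, here is a rough sketch of how the 960 figure can be
reproduced. It assumes a 4 KiB page size and one clear ioctl per page per
listener, which is consistent with the numbers above but is my reading
rather than something taken from the code; clear_ioctl_count() is a
hypothetical helper, not a QEMU function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def clear_ioctl_count(size_bytes, listeners=2, page_size=PAGE_SIZE):
    """Estimate clear ioctls if every page is cleared once per listener."""
    pages = -(-size_bytes // page_size)  # ceiling division
    return pages * listeners


# system.flash0: 1966080 bytes = 480 pages, seen by kvm-memory and kvm-smram
print(clear_ioctl_count(1966080))  # 960
```

The same arithmetic gives 4096 ioctls for 16MB of misaligned memory with a
single listener, matching the example in the commit message.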

On the machine I tested, clearing system.flash0 took a total of 49ms,
and clearing all unaligned memory took a total of 61ms.

As for why I chose the current solution: I'm not sure whether there were
any special considerations in the initial design of clear_dirty_log
regarding the unaligned memory paths.
Therefore, I chose to keep most of the logic the same as the existing code,
only extracting and merging the actual clear_dirty operations.
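
To illustrate the idea of merging only (this is not the patch itself), a
minimal sketch that coalesces adjacent page-sized ranges so that a single
clear can cover them. merge_ranges() and the (start, length) tuples are
hypothetical names chosen for the example.

```python
def merge_ranges(ranges):
    """Merge (start, length) ranges that touch or overlap."""
    merged = []
    for start, length in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Extend the previous range instead of emitting a new one.
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


# 480 contiguous 4 KiB pages (the size of system.flash0) collapse to one range
pages = [(0x1000 * i, 0x1000) for i in range(480)]
print(len(merge_ranges(pages)))  # 1
```

Ranges separated by a gap stay separate, so only truly contiguous
fragments are combined.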

Thanks.
