On Wed, Dec 10, 2025 at 10:18:21PM +0800, Chuang Xu wrote:
> Hi, Peter
>
> On 10/12/2025 05:10, Peter Xu wrote:
> > On Mon, Dec 08, 2025 at 08:09:52PM +0800, Chuang Xu wrote:
> >> From: xuchuangxclwt <[email protected]>
> >>
> >> When the addresses processed are not aligned, a large number of
> >> clear_dirty ioctls occur (e.g. a 16MB misaligned memory region can
> >> generate 4096 clear_dirty ioctls), which increases the time required
> >> for bitmap_sync and makes it harder for dirty pages to converge.
> >>
> >> Attempt to merge those fragmented clear_dirty ioctls.
> > (besides separate perf results I requested as in the cover letter reply..)
> >
> > Could you add something to the commit log explaining at least one example
> > that you observed?  E.g. what is the VM setup, and how many ramblocks are
> > not aligned?
> >
> > Have you considered setting rb->clear_bmap when it's available?  It'll
> > postpone the remote clear even further, until the page is sent.  Logically
> > it should be more efficient, but it may depend on the size of the unaligned
> > ramblocks you're hitting, as clear_bmap isn't PAGE_SIZE based and its
> > granularity can be much bigger.  Some discussion on how you chose the
> > current solution would be nice.
> >
>
> On my Intel(R) Xeon(R) 6986P-C (previous tests were based on Cascade Lake),
> I added some logging.  Here are some examples of unaligned memory I observed:
> size 1966080: system.flash0
> size 131072: /rom@etc/acpi/tables, isa-bios, system.flash1, pc.rom
> size 65536: cirrus_vga.rom
>
> Taking system.flash0 as an example, judging from its size, this should
> be the OVMF image I'm using.
> This memory region triggers clear_dirty in both the "kvm-memory" and the
> "kvm-smram" memory listeners simultaneously, resulting in a total of
> 960 kvm_ioctl calls (1966080 bytes is 480 4KiB pages, and each page is
> cleared once per listener).
> If a larger OVMF is used, this number will obviously get worse.
>
> On the machine I tested, clearing system.flash0 took a total of 49ms,
> and clearing all unaligned memory took a total of 61ms.

Thanks for the numbers.  It might be more helpful if you also shared the
other relevant numbers, e.g. referring to your cover letter definitions:

- total bitmap sync time (x)
- iteration transfer time (t)

That'll provide a full picture of how much time is wasted on per-page
log_clear().  You can compare those against the time the clear()
operations take after your patch is applied; I think that might be a
better way to justify the patch.

In general, it looks reasonable to me.

Having a bool parameter for the old physical_memory_test_and_clear_dirty()
is fine, but it might be slightly error-prone for new callers if they set
it to false by accident.

Another way to do it: since physical_memory_test_and_clear_dirty()
always takes a range, we could pass the destination bitmap ("dest") into
it when available, and change the retval to be num_dirty:

(1) when dest is provided, only account a page when its bit in "dest"
    wasn't already set
(2) when dest == NULL, count all dirty bits into num_dirty

Maybe that'll look better; you can double check.
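
A rough sketch of the shape described above (illustrative only: the helper
name, the per-page loop and the bit-index convention are made up here, and
a real helper should keep the existing word-based bitmap walk and still
batch the actual log_clear):

/*
 * Sketch only, not current QEMU code.  A range-based helper that
 * optionally fills a destination bitmap and returns how many pages were
 * newly accounted as dirty.  It would sit next to the existing helpers
 * in include/exec/ram_addr.h; test_and_set_bit() is from qemu/bitops.h.
 */
static uint64_t
physical_memory_test_and_clear_dirty_range(ram_addr_t start,
                                           ram_addr_t length,
                                           unsigned client,
                                           unsigned long *dest)
{
    uint64_t num_dirty = 0;
    ram_addr_t addr;

    for (addr = start; addr < start + length; addr += TARGET_PAGE_SIZE) {
        if (!cpu_physical_memory_test_and_clear_dirty(addr,
                                                      TARGET_PAGE_SIZE,
                                                      client)) {
            continue;
        }
        if (dest) {
            /* (1) only count pages whose bit in "dest" wasn't set yet;
             * the bit index is relative to "start" for illustration */
            unsigned long page = (addr - start) >> TARGET_PAGE_BITS;

            if (!test_and_set_bit(page, dest)) {
                num_dirty++;
            }
        } else {
            /* (2) no dest bitmap: every dirty page counts */
            num_dirty++;
        }
    }
    return num_dirty;
}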

Looks like patch 1 will be dropped.  You can repost this patch alone
whenever you feel ready, in whatever way you see fit.

Thanks,
>
> Regarding why the current solution was chosen: I'm not sure whether there
> were any special considerations in the initial design of clear_dirty_log
> for not handling unaligned memory paths.
> Therefore, I chose to keep most of the logic the same as the existing code,
> only extracting and merging the actual clear_dirty operations.
>
> Thanks.
>
--
Peter Xu