On Wed, Dec 10, 2025 at 10:18:21PM +0800, Chuang Xu wrote:
> Hi, Peter
> 
> On 10/12/2025 05:10, Peter Xu wrote:
> > On Mon, Dec 08, 2025 at 08:09:52PM +0800, Chuang Xu wrote:
> >> From: xuchuangxclwt <[email protected]>
> >>
> >> When the addresses processed are not aligned, a large number of
> >> clear_dirty ioctls occur (e.g. a 16MB misaligned memory region can
> >> generate 4096 clear_dirty ioctls), which increases the time required
> >> for bitmap_sync and makes it more difficult for dirty pages to converge.
> >>
> >> Attempt to merge those fragmented clear_dirty ioctls.
> > (besides separate perf results I requested as in the cover letter reply..)
> >
> > Could you add something to the commit log explaining at least one
> > example that you observed?  E.g. what is the VM setup, and how many
> > ramblocks are not aligned?
> >
> > Have you considered setting rb->clear_bmap when it's available?  It'll
> > postpone the remote clear even further, until the page is sent.
> > Logically it should be more efficient, but it may depend on the size of
> > the unaligned ramblocks you're hitting, as clear_bmap isn't PAGE_SIZE
> > based and its granularity can be much bigger.  Some discussion on how
> > you chose the current solution would be nice.
> >
> 
> On my Intel(R) Xeon(R) 6986P-C (previous tests were based on Cascade Lake),
> I added some logs. Here are some examples of unaligned memory I observed:
> size 1966080: system.flash0
> size 131072: /rom@etc/acpi/tables, isa-bios, system.flash1, pc.rom
> size 65536: cirrus_vga.rom
> 
> Taking system.flash0 as an example, judging from its size, this should be
> the OVMF I'm using.
> This memory segment will trigger clear_dirty in both the "kvm-memory" and
> the "kvm-smram" memory_listeners simultaneously, ultimately resulting in
> a total of 960 kvm_ioctl calls.
> If a larger OVMF is used, this number will obviously worsen.
> 
> On the machine I tested, clearing system.flash0 took a total of 49ms,
> and clearing all unaligned memory took a total of 61ms.

Thanks for the numbers.  It might be more helpful if you share the other
relevant numbers, e.g. referring to your cover letter definitions,

  - total bitmap sync time (x)
  - iteration transfer time (t)

that'll provide a full picture of how much time is wasted on per-page
log_clear().

You can compare those against the time the clear() operations take after
your patch is applied.  I think that might be a better way to justify the
patch.

In general, it looks reasonable to me.

Having a bool parameter for the old physical_memory_test_and_clear_dirty()
is fine, but it might be slightly error-prone for new callers if they set
it to false by accident.
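
Just to illustrate that concern (the call below is hypothetical, and the
trailing comment is only my guess at what the flag would control, not the
patch's actual parameter name), a bare bool is opaque at the call site:

  physical_memory_test_and_clear_dirty(start, length,
                                       DIRTY_MEMORY_MIGRATION,
                                       false /* merge the clears? */);

A reader has to jump to the prototype to learn what "false" disables.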

Another way to do it: since physical_memory_test_and_clear_dirty() always
takes a range, we could pass the bitmap ("dest") into it when available,
and change the retval to be num_dirty:

  (1) when dest is provided, only account a page when its bit in "dest"
      wasn't already set

  (2) when dest == NULL, count all dirty bits towards num_dirty

Maybe that'll look better; you can double check.
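
To sketch what I mean (rough C only, not the real code: the per-page
test-and-clear helper is a placeholder, the indexing into "dest" is
simplified, and the chunked walk of the real function is omitted):

  static uint64_t
  physical_memory_test_and_clear_dirty(ram_addr_t start, ram_addr_t length,
                                       unsigned client, unsigned long *dest)
  {
      uint64_t num_dirty = 0;
      unsigned long first = start >> TARGET_PAGE_BITS;
      unsigned long last = (start + length - 1) >> TARGET_PAGE_BITS;
      unsigned long page;

      for (page = first; page <= last; page++) {
          /* Placeholder for however the per-page dirty bit is fetched. */
          if (!dirty_memory_test_and_clear(page, client)) {
              continue;
          }
          if (dest) {
              /* (1) only account pages whose bit in "dest" wasn't set yet */
              if (!test_and_set_bit(page - first, dest)) {
                  num_dirty++;
              }
          } else {
              /* (2) no destination bitmap: every dirty bit counts */
              num_dirty++;
          }
      }

      /*
       * The remote clear (the log_clear hook, hence the KVM ioctl) can
       * then be issued once for the whole merged range instead of once
       * per page.
       */
      return num_dirty;
  }

Callers that only want the old bool semantics can pass dest == NULL and
check "num_dirty != 0".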

Looks like patch 1 will be dropped.  You can repost this patch alone
whenever you feel ready, in whatever way you see fit.

Thanks,

> 
> Regarding why the current solution was chosen: I'm not sure whether there
> were any special considerations in the initial design of clear_dirty_log
> for not handling unaligned memory paths. Therefore, I chose to keep most
> of the logic the same as the existing one, only extracting and merging
> the actual clear_dirty operations.
> 
> Thanks.
> 

-- 
Peter Xu

