On 30.01.2026 15:57, Roger Pau Monne wrote:
> Physmap population has the need to use pages as big as possible to reduce
> p2m shattering. However that triggers issues when big enough pages are not
> yet scrubbed, and so scrubbing must be done at allocation time. In some
> scenarios with added contention the watchdog can trigger:
>
> Watchdog timer detects that CPU55 is stuck!
> ----[ Xen-4.17.5-21 x86_64 debug=n Not tainted ]----
> CPU: 55
> RIP: e008:[<ffff82d040204c4a>] clear_page_sse2+0x1a/0x30
> RFLAGS: 0000000000000202 CONTEXT: hypervisor (d0v12)
> [...]
> Xen call trace:
> [<ffff82d040204c4a>] R clear_page_sse2+0x1a/0x30
> [<ffff82d04022a121>] S clear_domain_page+0x11/0x20
> [<ffff82d04022c170>] S common/page_alloc.c#alloc_heap_pages+0x400/0x5a0
> [<ffff82d04022d4a7>] S alloc_domheap_pages+0x67/0x180
> [<ffff82d040226f9f>] S common/memory.c#populate_physmap+0x22f/0x3b0
> [<ffff82d040228ec8>] S do_memory_op+0x728/0x1970
>
> Introduce a mechanism to preempt page scrubbing in populate_physmap(). It
> relies on temporarily stashing the dirty page in the domain struct so the
> hypercall can preempt back to guest context, and the scrubbing can resume
> when the domain re-enters the hypercall. The added deferral mechanism will
> only be used for domain construction, and is designed to be used with a
> single-threaded domain builder. If the toolstack makes concurrent calls to
> XENMEM_populate_physmap for the same target domain they will trash each
> other's stashed pages, resulting in slow domain physmap population.
>
> Note a similar issue is present in XENMEM_increase_reservation. However
> that hypercall is likely to only be used once the domain is already running,
> and the known implementations use 4K pages. It will be dealt with in a
> separate patch using a different approach, which will also take care of the
> allocation in populate_physmap() once the domain is running.
>
> Fixes: 74d2e11ccfd2 ("mm: Scrub pages in alloc_heap_pages() if needed")
> Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Jan Beulich <[email protected]>
with one minor remark:
> @@ -286,6 +365,30 @@ static void populate_physmap(struct memop_args *a)
> goto out;
> }
>
> + if ( memflags & MEMF_no_scrub )
> + {
> + unsigned int dirty_cnt = 0;
> +
> + /* Check if there's anything to scrub. */
> + for ( j = scrub_start; j < (1U << a->extent_order); j++ )
> + {
> + if ( !test_and_clear_bit(_PGC_need_scrub,
> + &page[j].count_info) )
> + continue;
> +
> + scrub_one_page(&page[j], true);
> +
> + if ( (j + 1) != (1U << a->extent_order) &&
> + !(++dirty_cnt & 0xff) &&
> + hypercall_preempt_check() )
> + {
> + a->preempted = 1;
> + stash_allocation(d, page, a->extent_order, ++j);
> + goto out;
As j isn't used anymore afterwards, j + 1 may be more natural to use here,
if only to avoid raised eyebrows.
Jan