On Thu Sep 11, 2025 at 9:55 AM CEST, Jan Beulich wrote:
> On 10.09.2025 23:57, Andrew Cooper wrote:
>> On 10/09/2025 7:58 pm, Jason Andryuk wrote:
>>> Hi,
>>>
>>> We're running Android as a guest and it's running the Compatibility
>>> Test Suite.  During the CTS, the Android domU is rebooted multiple times.
>>>
>>> In the middle of the CTS, we've seen a reboot fail.  xl -vvv shows:
>>> domainbuilder: detail: Could not allocate memory for HVM guest as we
>>> cannot claim memory!
>>> xc: error: panic: xg_dom_boot.c:119: xc_dom_boot_mem_init: can't
>>> allocate low memory for domain: Out of memory
>>> libxl: error: libxl_dom.c:581:libxl__build_dom: xc_dom_boot_mem_init
>>> failed: Cannot allocate memory
>>> domainbuilder: detail: xc_dom_release: called
>>>
>>> So the claim failed.  The system has enough memory since we're just
>>> rebooting the same VM.  As a workaround, I added sleep(1) + retry,
>>> which works.
>>>
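>>> A minimal sketch of the retry, assuming the claim is made through
>>> xc_domain_claim_pages() (the helper name and retry count below are
>>> made up; the real change sits in the domain builder path):
>>>
>>>   #include <unistd.h>
>>>   #include <xenctrl.h>
>>>
>>>   /* Retry the claim, giving the idle-loop scrubber time to return
>>>    * the old domain's memory to the free heap. */
>>>   static int claim_with_retry(xc_interface *xch, uint32_t domid,
>>>                               unsigned long nr_pages)
>>>   {
>>>       int rc = -1;
>>>       unsigned int attempt;
>>>
>>>       for ( attempt = 0; attempt < 5; attempt++ )
>>>       {
>>>           rc = xc_domain_claim_pages(xch, domid, nr_pages);
>>>           if ( rc == 0 )
>>>               return 0;
>>>           sleep(1);
>>>       }
>>>
>>>       return rc;
>>>   }
>>>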
>>> The curious part is the memory allocation.  For d2 to d5, we have:
>>> domainbuilder: detail: range: start=0x0 end=0xf0000000
>>> domainbuilder: detail: range: start=0x100000000 end=0x1af000000
>>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>>> xc: detail:   4KB PAGES: 0x0000000000000000
>>> xc: detail:   2MB PAGES: 0x00000000000006f8
>>> xc: detail:   1GB PAGES: 0x0000000000000003
>>>
>>> But when we have to retry the claim for d6, there are no 1GB pages used:
>>> domainbuilder: detail: range: start=0x0 end=0xf0000000
>>> domainbuilder: detail: range: start=0x100000000 end=0x1af000000
>>> domainbuilder: detail: HVM claim failed! attempt 0
>>> xc: detail: PHYSICAL MEMORY ALLOCATION:
>>> xc: detail:   4KB PAGES: 0x0000000000002800
>>> xc: detail:   2MB PAGES: 0x0000000000000ce4
>>> xc: detail:   1GB PAGES: 0x0000000000000000
>>>
>>> But subsequent reboots for d7 and d8 go back to using 1GB pages.
>>>
>>> Does the change in memory allocation stick out to anyone?
>>>
>>> Unfortunately, I don't have insight into what the failing test is doing.
>>>
>>> Xen doesn't seem set up to track the claim across reboot.  Retrying
>>> the claim works in our scenario since we have a controlled configuration.
>> 
>> This looks to me like a known phenomenon.  Ages back, a change was made
>> in how Xen scrubs memory, from being synchronous in domain_kill() to
>> being asynchronous in the idle loop.
>> 
>> The consequence is that, on an idle system, you can shut down and
>> reboot the domain faster, but on a busy system you end up trying to
>> allocate the new domain while memory from the old domain is still dirty.
>> 
>> It is a classic example of a false optimisation, which looks great on an
>> idle system only because the idle CPUs are swallowing the work.
>
> I wouldn't call this a "false optimization", but rather one ...
>
>> This impacts the ability to find a 1G-aligned block of free memory to
>> allocate a superpage with, and by the sounds of it, claims (which
>> predate this behaviour change) aren't aware of the "to be scrubbed"
>> queue and fail instead.
>
> ... which isn't sufficiently integrated with the rest of the allocator.
>
> Jan

That'd depend on the threat model. At the very least there ought to be a
Kconfig knob to control it. You can't really tell a customer "your data is
gone from our systems" unless it really is gone. I'm guessing part of the
rationale was speeding up the obnoxiously slow destroydomain, since it hogs
a dom0 vCPU until it's done, which can take many minutes for large domains.

IOW: It's a nice optimisation, but there are multiple use cases for
specifically not wanting something like that.
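
As a rough illustration, such a knob could look something like this (the
option name SCRUB_DOMHEAP_SYNC is made up, not an existing Xen Kconfig
symbol):

  config SCRUB_DOMHEAP_SYNC
          bool "Scrub freed domain memory synchronously"
          help
            Scrub a domain's memory before it is returned to the free heap
            during domain destruction, instead of deferring the scrubbing
            to the idle loop.  Domain destruction becomes slower, but the
            memory is guaranteed to be clean by the time it shows up as
            free again.

Whether it defaults on or off is a separate policy question; the point is
that the trade-off becomes a deliberate, configurable choice.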

Cheers,
Alejandro
