On 25/03/2025 16:23, Julien Grall wrote:
>
>
> Hi Oleksandr, Michal,
>
> On 25/03/2025 15:08, Oleksandr Tyshchenko wrote:
>> On the device-tree-based Arm64 system, if Xen is built with
>> CONFIG_ACPI=y, CONFIG_STATIC_MEMORY=y, and the static memory range
>> is provided in the host device tree, the BUG is triggered in
>> common/page_alloc.c during Xen's early boot. The BUG occurs when
>> the first page from the static range is fed to the domain
>> sub-allocator and finally ends up in mark_page_free().
>> The pg->count_info & PGC_state is not in the state that
>> the code expects to see there.
>>
>> (XEN) Checking for initrd in /chosen
>> (XEN) Checking for "xen,static-mem" in domain node
>> (XEN) RAM: 0000000040000000 - 00000000bfffffff
>> (XEN)
>> (XEN) MODULE[0]: 0000000043200000 - 0000000043343fff Xen
>> (XEN) MODULE[1]: 0000000043400000 - 0000000043402fff Device Tree
>> (XEN) MODULE[2]: 0000000042e00000 - 0000000043111f82 Ramdisk
>> (XEN) MODULE[3]: 0000000040400000 - 0000000042cfffff Kernel
>> (XEN) RESVD[0]: 0000000050000000 - 000000005fffffff
>> (XEN)
>> (XEN) CMDLINE[0000000040400000]:domU0 console=ttyAMA0
>> (XEN)
>> (XEN) Command line: console=dtuart conswitch=ax
>> (XEN) pg MFN 50000 c=0x2180000000000000 o=0 v=0 t=0
>> (XEN) Xen BUG at common/page_alloc.c:1474
>> [snip]
>>
>> The problem is that the static range gets mistakenly unreserved
>> in populate_boot_allocator() and reaches init_boot_pages().
>> This happens since by the time the populate_boot_allocator()
>> is executed, the evaluated in fw_unreserved_regions()
>> an acpi_disabled variable is still false and as the result
>> the dt_unreserved_regions() which should simply skip that static range
>> does not get called. The acpi_disabled will be set to the actual value
>> (in our case it is true) later on in acpi_boot_table_init().
> > > The important question is why acpi_disabled is false by the time
Simply because it's a static bool variable with no assigned value i.e. gets
defaulted to false. And it stays at default value until acpi_boot_table_init()
call that cannot really be moved before setup_mm().
>> setup_mm() is executed. With CONFIG_ACPI=n it is a macro that is always
>> true, but with CONFIG_ACPI=y it is a boolean that is false from the very
>> beggining, even though the entire acpi_boot_table_init() (which is called
>> after setup_mm()) is written with the assumption that ACPI is off by default
>> at the start. So, initialize acpi_disabled to true during declaration
>> if CONFIG_ACPI=y to avoid an issue and match to acpi_boot_table_init().
>
> While I agree that acpi_disabled should be false. It feels like a bit of
You meant true (?) i.e. ACPI default off not to make any assumptions whether
it's really on/off which can only be determined in acpi_boot_table_init(). I
think we still need this patch to match the code expectation.
> a workaround for the issue you are trying to solve here. If
> fw_unreserved_regions() doesn't work with ACPI enabled, then it is still
> a problem after your patch.
I don't understand. It does work with ACPI enabled provided that it's indeed
enabled. When booting with ACPI, reserved memory regions are not used - we even
have a comment in struct bootinfo. The issue here is that acpi_disabled is set
to false i.e. incorrectly there is assumption that ACPI is enabled by default
and calling fw_unreserved_regions() prior to acpi_boot_table_init() works as
long as we respect the expected default value.
>
> Furthermore, what happen if we decide to use ACPI afterwards? Wouldn't
> this mean that the static regions would be reserved even if ACPI doesn't
> use static memory (all the memory is expected to be given to the allocator)?
I don't think such hybrid configuration is valid (booting with ACPI yet
declaring reserved regions in DT). See commit:
9c2bc0f24b2ba7082df408b3c33ec9a86bf20cf0
~Michal