Hi Laszlo, On 2/27/19 9:14 PM, Laszlo Ersek wrote: > On 02/27/19 13:55, Shameerali Kolothum Thodi wrote: >> Hi Laszlo, >> >>> -----Original Message----- >>> From: Shameerali Kolothum Thodi >>> Sent: 25 February 2019 09:54 >>> To: 'Laszlo Ersek' <ler...@redhat.com>; Auger Eric <eric.au...@redhat.com>; >>> shannon.zha...@gmail.com; peter.mayd...@linaro.org; >>> imamm...@redhat.com; qemu-devel@nongnu.org; qemu-...@nongnu.org >>> Cc: xuwei (O) <xuw...@huawei.com>; Linuxarm <linux...@huawei.com>; Ard >>> Biesheuvel <ard.biesheu...@linaro.org>; Leif Lindholm (Linaro address) >>> <leif.lindh...@linaro.org> >>> Subject: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support >> >> [...] >> >>>>>> The root cause seems to be EDK2 ACPI table checksum failure >>>>>> as NFIT table is getting updated on hot-add. This needs further >>>>>> investigation. >>>>> + Ard, Leif, Laszlo if they have any idea of what is missing/wrong. >>>> >>>> Huh, very interesting; I usually don't expect my sanity checks to fire >>>> in practice. :) >>>> >>>> The message >>>> >>>> ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" >>>> >>>> is logged by OVMF's and ArmVirtQemu's ACPI Platform DXE Driver when it >>>> finds an invalid COMMAND_ADD_CHECKSUM command in QEMU's ACPI >>>> linker/loader script. >>>> >>>> Please see the command definition in QEMU's >>>> "hw/acpi/bios-linker-loader.c". In particular, please refer to the >>>> function bios_linker_loader_add_checksum(), which builds the command >>>> structure, and documents the fields. >>>> >>>> (You may also refer to QEMU_LOADER_ADD_CHECKSUM in file >>>> "OvmfPkg/AcpiPlatformDxe/QemuLoader.h" in the edk2 source tree, for the >>>> same information.) >>>> >>>> The error message is logged if: >>>> - the offset at which the checksum should be stored falls outside of the >>>> size of the fw_cfg blob, or >>>> - the range over which the checksum should be calculated falls outside >>>> (at least in part) of the fw_cfg blob. >>>> >>>> To me this suggests that QEMU generates an invalid >>>> COMMAND_ADD_CHECKSUM >>>> command for the firmware. >>>> >>>> ... I've tried to skim the patches briefly. I think there must be an >>>> error in the DSDT building logic that is only active on reboot if an >>>> nvdimm module was hot-added before the reboot. >>> >>> Thanks for taking a look and the pointers. I will debug this further >>> and get back. >> >> The root cause of the issue seems to be UEFI not seeing the updated acpi >> table blob size on reboot once a new NFIT table is added(nvdimm hot added). >> >> Please see the debug logs below, >> >> Initial Guest boot >> --------------------------- >> >> Debug logs from Qemu: >> >> build_header: acpi sig DSDT len 0x5127 >> build_header: acpi sig FACP len 0x10c >> build_header: acpi sig APIC len 0xa8 >> build_header: acpi sig GTDT len 0x60 >> build_header: acpi sig MCFG len 0x3c >> build_header: acpi sig SPCR len 0x50 >> build_header: acpi sig SRAT len 0x92 >> build_header: acpi sig SSDT len 0x38f >> build_header: acpi sig XSDT len 0x5c >> virt_acpi_build: acpi table_blob len 0x5844 >> >> Debug logs from UEFI: >> >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x9 Start=0x0 >> Length=0x5127 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5130 >> Start=0x5127 Length=0x10C Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x523C >> Start=0x5233 Length=0xA8 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x52E4 >> Start=0x52DB Length=0x60 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5344 >> Start=0x533B Length=0x3C Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5380 >> Start=0x5377 Length=0x50 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x53D0 >> Start=0x53C7 Length=0x92 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5462 >> Start=0x5459 Length=0x38F Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x57F1 >> Start=0x57E8 Length=0x5C Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/rsdp" ResultOffset=0x8 Start=0x0 >> Length=0x14 Blob->Size=0x24 >> ProcessCmdAddChecksum: File="etc/acpi/rsdp" ResultOffset=0x20 Start=0x0 >> Length=0x24 Blob->Size=0x24 >> InstallQemuFwCfgTables: installed 8 tables >> >> Guest Reboot after ndimm hot added >> ------------------------------------ >> >> Debug logs from Qemu: >> >> build_header: acpi sig DSDT len 0x5127 >> build_header: acpi sig FACP len 0x10c >> build_header: acpi sig APIC len 0xa8 >> build_header: acpi sig GTDT len 0x60 >> build_header: acpi sig MCFG len 0x3c >> build_header: acpi sig SPCR len 0x50 >> build_header: acpi sig SRAT len 0x92 >> build_header: acpi sig SSDT len 0x38f >> build_header: acpi sig NFIT len 0xe0 -->New >> build_header: acpi sig XSDT len 0x64 >> virt_acpi_build: acpi table_blob len 0x592c -->blob len updated >> >> Debug logs from UEFI: >> >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x9 Start=0x0 >> Length=0x5127 Blob->Size=0x5844 -->Wrong blob size. >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5130 >> Start=0x5127 Length=0x10C Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x523C >> Start=0x5233 Length=0xA8 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x52E4 >> Start=0x52DB Length=0x60 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5344 >> Start=0x533B Length=0x3C Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5380 >> Start=0x5377 Length=0x50 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x53D0 >> Start=0x53C7 Length=0x92 Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x5462 >> Start=0x5459 Length=0x38F Blob->Size=0x5844 >> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x57F1 >> Start=0x57E8 Length=0xE0 Blob->Size=0x5844 >> ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error >> >> >> To me it seems on ARM vit acpi path, the blob len is calculated based >> on actual tables and is updated only in virt_acpi_setup() --> >> acpi_add_rom_blob() >> path. I had a look at the x86 code and it looks like, there, the blob len >> gets updated >> with an additional buffer to take care of table resizing[1]. >> >> As a hack i added the same to ARM virt and it seems to resolve the issue. >> I am not sure this is the best approach to fix this though. >> >> Please let me know your thoughts. >> >> Thanks, >> Shameer >> >> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c >> index 132414c..4291553 100644 >> --- a/hw/arm/virt-acpi-build.c >> +++ b/hw/arm/virt-acpi-build.c >> @@ -50,6 +50,8 @@ >> #define ARM_SPI_BASE 32 >> #define ACPI_POWER_BUTTON_DEVICE "PWRB" >> >> +#define ACPI_BUILD_TABLE_SIZE 0x20000 >> + >> static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) >> { >> uint16_t i; >> @@ -886,6 +888,10 @@ void virt_acpi_build(VirtMachineState *vms, >> AcpiBuildTables *tables) >> build_rsdp(tables->rsdp, tables->linker, &rsdp_data); >> } >> >> + /* Make sure we have a buffer in case we need to resize the tables. */ >> + g_array_set_size(tables_blob, ROUND_UP(acpi_data_len(tables_blob), >> + ACPI_BUILD_TABLE_SIZE)); >> + >> /* Cleanup memory that's no longer used. */ >> g_array_free(table_offsets, true); >> } >> >> [1] https://github.com/qemu/qemu/blob/master/hw/i386/acpi-build.c#L2792 > > Nice analysis, thanks. > > I think the line that you reference, i.e. > > acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE); > > in acpi_build() [hw/i386/acpi-build.c] masks this issue for x86 only as > a side effect. To my understanding, the alignment / padding exists there > for migration compatibility. It doesn't exist for updating the size of > the ACPI blobs in fw_cfg across reboots. The issue is masked because the > alignment is large enough (un-changed) to contain the regenerated blobs > as well.> > Given that the "virt" machine type is versioned, I think migration > compat is a valid concern there too. This in itself would justify a > similar padding. I don't understand the migration compat issue. Please could you elaborate? > > I don't know if we want to specifically care about size-changing > ACPI-regen across reboot. I believe measures for that specific use case > don't exist in x86 machine types either. The NFIT redimensioning should exit on x86 too? > > Another trick that is occasionally used (but might not apply here, I'm > uncertain) is to always generate the relevant ACPI objects, but, in case > they are not justified for the virtual hardware config, invalidate them > by overwriting particular parts of them (for example, one or two bytes > of their names). Hopefully this shouldn't introduce ACPI or AML errors, > just make the ACPI interpreter ignore the affected objects.
Thanks! Eric > > Thanks, > Laszlo >