On 04/09/15 15:59, Peter Maydell wrote: > On 9 April 2015 at 14:51, Igor Mammedov <imamm...@redhat.com> wrote: >> On Thu, 9 Apr 2015 14:27:58 +0100 >> Peter Maydell <peter.mayd...@linaro.org> wrote: >> >>> On 9 April 2015 at 14:17, Igor Mammedov <imamm...@redhat.com> wrote: >>>> On Thu, 09 Apr 2015 13:50:52 +0100 >>>> Alex Bennée <alex.ben...@linaro.org> wrote: >>>> >>>>> >>>>> Shannon Zhao <zhaoshengl...@huawei.com> writes: >>>>>> + for (i = 0; i < table_offsets->len; ++i) { >>>>>> + /* rsdt->table_offset_entry to be filled by Guest linker */ >>>>>> + bios_linker_loader_add_pointer(linker, >>>>>> + ACPI_BUILD_TABLE_FILE, >>>>>> + ACPI_BUILD_TABLE_FILE, >>>>>> + table_data, >>>>>> &rsdt->table_offset_entry[i], >>>>>> + sizeof(uint32_t)); >>>>> >>>>> Why are these pointers always 32 bit? Can they ever be 64 bit? >>>> Laszlo, can you confirm that UEFI puts APCI tables below 4G address >>>> space?
I confirmed that before, in the v2 discussion: http://thread.gmane.org/gmane.comp.emulators.qemu/316670/focus=317560 But in fact the RSDT / XSDT that QEMU exports for UEFI doesn't matter. See below. >>> In the general case you can't guarantee that there will >>> be any RAM at all below the 4G point. (The virt board >>> isn't like that, obviously, but I believe there's real >>> hardware out there that's designed that way.) I don't >>> think we should have any 32 bit assumptions in the >>> code at all -- pointer values should always be 64 bits >>> everywhere. >> >> then that forces us to use xsdt instead of 32-bit rsdt > > Does that matter much? I can mention two points here. (1) See this kernel patch: http://thread.gmane.org/gmane.linux.acpi.devel/74369/focus=1915858 > +The ACPI core will ignore any provided RSDT (Root System Description Table). > +RSDTs have been deprecated and are ignored on arm64 since they only allow > +for 32-bit addresses. So you could argue that providing an XSDT instead of an RSDT is justified by this alone. (2) the ACPI linker/loader client in edk2 (used for both OVMF and AAVMF) *does* restrict initial allocations to under 4GB. This is a super hairy subject, but I'll try to summarize it quickly. The ACPI linker/loader interface was originally designed for SeaBIOS. The central structure of the interface is a command table, which contains three kinds of commands: (a) "allocate blob" (which includes "download blob" as well), (b) "relocate pointer" (also known as "update pointer"), and (c) "checksum blob segment". Importantly, the "allocate" command comes with allocation hints; it can instruct the firmware to allocate the blob in some specific zone. So, the general process (as designed) is something like this: - The firmware allocates and downloads the blobs -- the allocate commands always come first in the table. Each blob can contain several ACPI tables, in any kind of arrangement. - Then the "relocate pointer" commands are processed (simply because they come later in the command table, that's guaranteed), which update pointers in some tables (residing in some blobs) to some other tables (residing in some other, or the same, blob(s)). In more detail, the pointers in the tables in the blobs are pre-initialized with relative offsets (into other blobs), and the relocation means that these relative offsets are made absolute -- they are incremented with the actual allocation base addresses that are the results of the allocation command processing (see previous step). - Finally (in fact, intermixed with "relocate pointer" commands), checksums are updated. The idea is that after the initial allocations, everything is processed *in place*. (This is what SeaBIOS does.) Because pointer fields, updated by the "relocate pointer" commands (which basically mean increments by actual blob base addresses) can come in various sizes (1, 2, 4, 8 bytes), the allocation commands must take care to instruct the firmware to allocate the "target" blobs "low enough" so that the referring pointers can accommodate these actual base addresses. All fine; again, SeaBIOS does exactly this; the important thing to note is that everything is processed, and then left for the runtime OS, *in place*. And then UEFI / edk2 came along. :) The problem with UEFI is that you are not supposed to just throw a bunch of binary stuff into RAM. Instead, the RSD PTR table needs to be linked into the UEFI system config table, plus each table needs to be installed *individually*, by passing it to EFI_ACPI_TABLE_PROTOCOL.InstallTable(). The first requirement is actually a relaxation -- the RSD PTR can be anywhere in memory, it doesn't need to be low. However, the second requirement is a huge pain, because it doesn't match the design of the ACPI linker/loader interface. EFI_ACPI_TABLE_PROTOCOL is "smart" about the specification, and knows what to allocate where -- it copies tables, links the copies together, checksums them, handles the RSD PTR internally, and so on; but it does need to *receive* tables individually. So, what the OVMF (and AAVMF) code does is: we first implement the above algorithm, but only as a first pass. (*) Once we're done with that, we process the "relocate pointer" commands in a second pass, the idea being that wherever these pointers point *to*, after the first pass, those things must be actual ACPI tables, *or* operation regions. Therefore, we check the targets of these pointers, and if the target looks like an ACPI table (has the appropriate header, checksum is okay etc -- it's heuristical, yes), then we pass it to EFI_ACPI_TABLE_PROTOCOL.InstallTable(). Otherwise, the target area is considered an operation region referenced by some other ACPI table, and then the hosting *blob* is marked as "this blob hosts something else than *just* ACPI tables". Finally, after the second pass, we check our blobs -- each blob that is left marked as "hosts *only* ACPI tables" is released (because those tables have been installed already, individually), and the rest of the blobs are preserved in-place. These could be considered "leaked" to some extent, because any ACPI tables in those blobs *have* been installed (deeply copied and linked). Nevertheless, we can't do any better than keep the full blob even if it hosts only one non-ACPI-table "thing" (ie. an operation region). (*) I'll note here that the first pass in edk2 is extremely careful; it checks *everything* about pointer arithmetic. And, indeed, that has caught errors in Shannon's v1 submission. Okay, with the above wall of text out of the way, why is it relevant? It is relevant for several points: - whether you provide an RSDT or an XSDT, it won't matter, because OVMF & AAVMF skip both of those tables, intentionally. They are handled automatically by EFI_ACPI_TABLE_PROTOCOL. - The allocation (and alignment) hints in the "allocate & download blob" command do not map to UEFI cleanly: - BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG is both nonsensical and unnecessary for UEFI: Allocating in the F segment would be important for RSD PTR in the BIOS case only, but under UEFI, RSD PTR is located differently (see above). So this hint is simply ignored. - BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH just means "allocate up to 4GB" (for BIOS). - the alignment hint can request any power of two. Now, the UEFI memory allocation services can only accommodate a subset of all the possible *combinations* of the last two points. Staying under 4GB is important in the first pass, because SeaBIOS simply wouldn't allocate blobs higher than that (even with BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH), and we need to be prepared for permanently preserving a blob that hosts an operation region. If I allowed edk2 to allocate a blob anywhere at all in the first pass, given BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH, then I could run into a situation where I can't relocate a 4-byte wide pointer *into* that blob. And, I couldn't even do the relocation temporarily in a UINT64 (just stashing it for pass 2), because the target area might be an operation region (not an ACPI table), which would have to remain in place after pass 2, *and* be referenced by the relocated, 4-byte wide pointer. So, allocating under 4GB is a requirement, and the memory allocation services in UEFI do allow me to request such a top address. But that service (gBS->AllocatePages(), to be exact) *also* implies that the allocations will be aligned to 4KB exactly -- no lower and no higher granularities are guaranteed. Therefore in edk2 we check the alignment hint as well, and as long as it does not *exceed* 4KB, we silently succeed. If the alignment hint wants something bigger than 4KB, we abort the first pass. (This has never happened yet.) Summary: - it doesn't matter if you give UEFI an RSDT or an XSDT, it'll ignore either anyway (beyond processing the opaque "relocate pointer" commands that happen to reference these tables, of course) -- EFI_ACPI_TABLE_PROTOCOL will create and populate these tables automatically. - the described edk2 code, as-is, will fail on platforms where there is no system RAM under 4GB -- gracefully, but certainly - The 4GB limit will permanently affect operation regions *only*. For ACPI tables hosted in the blobs, the 4GB limit is just a temporary limit, until the second pass completes; EFI_ACPI_TABLE_PROTOCOL will put their copies wherever they belong, even above 4GB if possible. Basically, the current ACPI linker/loader interface comes with the silent assumption: any 4-byte pointer can be successfully relocated to a BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH blob This assumption is automatically satisfied by SeaBIOS, but edk2 needs to ensure it specifically, and it does. Therefore it doesn't matter if you push down an RSDT (with 4-byte pointer entries) or an XSDT (with 8-byte pointer entries); the "relocate pointer" commands that happen to modify either will *always* succeed in the first pass (as long as the relative offsets are valid) -- the target base addresses will never exceed 4GB. If, however, even the above "mostly temporary" 4GB limit should be lifted, then two things are necessary: - an XSDT must be pushed down (so that the first pass relocations succeed, regardless of actual blob base addresses), - a new allocation hint is necessary (in the "allocate blob" command), so that edk2 knows it is safe to allocate the blob anywhere at all (ie. only 8-byte pointers will be referencing it). ... I hope you guys enjoyed this. :) Laszlo