On 02/15/17 07:15, b...@skyportsystems.com wrote:
> From: Ben Warren <b...@skyportsystems.com>
>
> This patch set adds support for passing a GUID to Windows guests. It
> is a re-implementation of previous patch sets written by Igor Mammedov
> et al, but this time passing the GUID data as a fw_cfg blob.
>
> This patch set has dependencies on new guest functionality, in
> particular the support for a new linker-loader command and the ability
> to write back data to QEMU over a DMA link. Work is in flight in both
> SeaBIOS and OVMF to support this.
>
> v5->v6:
> - Rebased to top of tree.
> - Changed device from sysbus to a simple device. This removed the need for
>   adding dynamic sysbus support to pc_piix boards.
> - Removed patch that introduced QWORD patching of AML.
> - Removed ability to set GUID via QMP/HMP.
> - Improved comments/documentation in code.

So here's my testing with a RHEL-7 guest:

(1) The command line option passed to QEMU is

  -device vmgenid,guid=00112233-4455-6677-8899-AABBCCDDEEFF

This is the example GUID provided in the SMBIOS spec v3.0.0 (DSP0134),
section 7.2.1 "System -- UUID". (SMBIOS is only relevant here because it
codifies the fact that Microsoft consumes UUID in little-endian order.)

The expected representation, according to the SMBIOS spec, is

  33 22 11 00 55 44 77 66 88 99 AA BB CC DD EE FF

(2) Here's an excerpt from the OVMF log:

> ProcessCmdAllocate: File="etc/vmgenid_guid" Alignment=0x1000 Zone=1
> Size=0x1000 Address=0x7FE5C000

This is where "etc/vmgenid_guid" is allocated and downloaded; the
allocation address is 0x7FE5C000.

> Select Item: 0x19
> Select Item: 0x22
> ProcessCmdAllocate: File="etc/acpi/tables" Alignment=0x40 Zone=1 Size=0x20000
> Address=0x7E7AB000
> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x49 Start=0x40
> Length=0x1403
> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/acpi/tables" PointerOffset=0x1467 PointerSize=4
> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/acpi/tables" PointerOffset=0x146B PointerSize=4
> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x144C
> Start=0x1443 Length=0x74
> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x14C0
> Start=0x14B7 Length=0x80
> Select Item: 0x19
> SaveCondensedWritePointerToS3Context: 0x002B/[0x00000000+8] := 0x7FE5C000 (0)

This is where OVMF stashes the WRITE_POINTER command in "condensed"
form, for S3. The fw_cfg selector value is 0x2B (for the fw_cfg file to
be rewritten), the pointer is located at offset 0, has size 8, and the
value to assign is 0x7FE5C000. And, this is #0 of the saved / condensed
WRITE_POINTER commands.

> Select Item: 0x2B
> ProcessCmdWritePointer: PointerFile="etc/vmgenid_addr"
> PointeeFile="etc/vmgenid_guid" PointerOffset=0x0 PointerSize=8

This is where the WRITE_POINTER command is actually executed, during
normal boot.

> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/vmgenid_guid" PointerOffset=0x1561 PointerSize=4

This is where we link "etc/vmgenid_guid" into VGIA.

> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x1540
> Start=0x1537 Length=0xCA
> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/acpi/tables" PointerOffset=0x1625 PointerSize=4
> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/acpi/tables" PointerOffset=0x1629 PointerSize=4
> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
> PointeeFile="etc/acpi/tables" PointerOffset=0x162D PointerSize=4
> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x160A
> Start=0x1601 Length=0x30
> ProcessCmdAddPointer: PointerFile="etc/acpi/rsdp"
> PointeeFile="etc/acpi/tables" PointerOffset=0x10 PointerSize=4
> ProcessCmdAddChecksum: File="etc/acpi/rsdp" ResultOffset=0x8 Start=0x0
> Length=0x24
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> InstallQemuFwCfgTables: unknown loader command: 0x0
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AB000 (remaining: 0x20000): found "FACS" size 0x40
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AB040 (remaining: 0x1FFC0): found "DSDT" size 0x1403
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/vmgenid_guid"
> at 0x7FE5C000 (remaining: 0x1000): not found; marking fw_cfg blob as opaque

This is where the OVMF SDT Header Probe Suppressor does its job. (NB,
the "opaque marking" has happened already in ProcessCmdWritePointer()
too, above.)

> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AC443 (remaining: 0x1EBBD): found "FACP" size 0x74
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AC4B7 (remaining: 0x1EB49): found "APIC" size 0x80
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AC537 (remaining: 0x1EAC9): found "SSDT" size 0xCA
> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables" at
> 0x7E7AC601 (remaining: 0x1E9FF): found "RSDT" size 0x30
> TransferS3ContextToBootScript: boot script fragment saved,
> ScratchBuffer=7FE4F018

This is where the WRITE_POINTER commands, stashed earlier in condensed
form, are translated to S3 Boot Script opcodes.

> InstallQemuFwCfgTables: installed 5 tables

Such as: FACS, DSDT, FACP, APIC, SSDT. OVMF recognizes RSDT and ignores
it (it's handled by edk2 automatically).

> InstallQemuFwCfgTables: freeing "etc/acpi/rsdp"
> InstallQemuFwCfgTables: freeing "etc/acpi/tables"

OVMF sees that the above two blobs have not been marked as "opaque" --
they only contained ACPI tables, judged from the ADD_POINTER commands
that pointed into them. So these two blobs are freed.

Note that "etc/vmgenid_guid" is not freed.

So, from the firmware log, everything looks OK.

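As a sanity check on the log in (2), the addresses and "remaining"
values printed by Process2ndPassCmdAddPointer can be recomputed from the
table sizes alone. The following is only a sketch in Python: the base
address, blob size and table sizes are copied from the log, it assumes
the tables sit back to back in "etc/acpi/tables", and the helper name
walk_tables is made up for illustration.

```python
# Values copied from the OVMF log excerpt; walk_tables is a hypothetical helper.
BLOB_BASE = 0x7E7AB000   # Address= of "etc/acpi/tables"
BLOB_SIZE = 0x20000      # Size= of "etc/acpi/tables"
TABLES = [("FACS", 0x40), ("DSDT", 0x1403), ("FACP", 0x74),
          ("APIC", 0x80), ("SSDT", 0xCA), ("RSDT", 0x30)]


def walk_tables(base, size, tables):
    """Yield (signature, address, remaining), assuming back-to-back tables."""
    offset = 0
    for sig, length in tables:
        yield sig, base + offset, size - offset
        offset += length


for sig, addr, remaining in walk_tables(BLOB_BASE, BLOB_SIZE, TABLES):
    print(f"{sig}: 0x{addr:08X} (remaining: 0x{remaining:X})")
```

The printed addresses can be compared line by line with the "checking
for ACPI header" messages; the same offsets also show up as the Start=
values of the ADD_CHECKSUM commands.
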
(3) I dumped the SSDT in the RHEL-7 guest:

> /*
>  * Intel ACPI Component Architecture
>  * AML/ASL+ Disassembler version 20160527-64
>  * Copyright (c) 2000 - 2016 Intel Corporation
>  *
>  * Disassembling to symbolic ASL+ operators
>  *
>  * Disassembly of ssdt.dat, Wed Feb 15 19:21:11 2017
>  *
>  * Original Table Header:
>  *     Signature        "SSDT"
>  *     Length           0x000000CA (202)
>  *     Revision         0x01
>  *     Checksum         0x1D
>  *     OEM ID           "BOCHS "
>  *     OEM Table ID     "VMGENID"
>  *     OEM Revision     0x00000001 (1)
>  *     Compiler ID      "BXPC"
>  *     Compiler Version 0x00000001 (1)
>  */
> DefinitionBlock ("", "SSDT", 1, "BOCHS ", "VMGENID", 0x00000001)
> {
>     Name (VGIA, 0x7FE5C000)

Note that the value matches the value logged by the firmware in (2).

>     Scope (\_SB)
>     {
>         Device (VGEN)
>         {
>             Name (_HID, "QEMUVGID")  // _HID: Hardware ID
>             Name (_CID, "VM_Gen_Counter")  // _CID: Compatible ID
>             Name (_DDN, "VM_Gen_Counter")  // _DDN: DOS Device Name
>             Method (_STA, 0, NotSerialized)  // _STA: Status
>             {
>                 Local0 = 0x0F
>                 If (VGIA == Zero)
>                 {
>                     Local0 = Zero
>                 }
>
>                 Return (Local0)
>             }
>
>             Method (ADDR, 0, NotSerialized)
>             {
>                 Local0 = Package (0x02) {}
>                 Local0 [Zero] = (VGIA + 0x28)
>                 Local0 [One] = Zero
>                 Return (Local0)
>             }
>         }
>     }
>
>     Method (\_GPE._E05, 0, NotSerialized)  // _Exx: Edge-Triggered GPE
>     {
>         Notify (\_SB.VGEN, 0x80) // Status Change
>     }
> }

Looks good and matches the documentation.

(4) To be sure, I checked the address against the guest dmesg, which
contains a dump of the UEFI memory map:

> [ 0.000000] efi: mem52: type=10, attr=0xf,
> range=[0x000000007fe5a000-0x000000007fe5e000) (0MB)

The page (4096 bytes) at 0x7FE5C000 falls into this range. Type=10 means
EfiACPIMemoryNVS.

(5) At this point I dumped the guest RAM with the dump-guest-memory
monitor command, opened it with "crash", and listed it:

> crash> rd -p -8 0x7FE5C000 0x40
> 7fe5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ................
> 7fe5c010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ................
> 7fe5c020: 00 00 00 00 00 00 00 00 33 22 11 00 55 44 77 66
> ........3"..UDwf
> 7fe5c030: 88 99 aa bb cc dd ee ff 00 00 00 00 00 00 00 00
> ................

We can see that the GUID starts at 0x7FE5C000 + 0x28, and also that the
byte-level representation matches the little endian one given in (1).

This proves that the initial blob download worked fine.

(6) Here I attached "gdb" to QEMU, set a breakpoint on
vmgenid_handle_reset(), and allowed the inferior process to continue
execution. Then I suspended and resumed the guest (ACPI S3). The
breakpoint was hit during resume:

> Breakpoint 1, vmgenid_handle_reset (opaque=0x7f2bd03c36e0) at
> .../hw/acpi/vmgenid.c:205
> 205 VmGenIdState *vms = VMGENID(opaque);

First of all, before allowing QEMU to zero out the address blob, I
listed the address and the contents of the address blob (here exploiting
that my host is also little endian):

> (gdb) print (void*)vms->vmgenid_addr_le
> $2 = (void *) 0x7f2bd03c37b0
> (gdb) print /x *(uint64_t*)vms->vmgenid_addr_le
> $4 = 0x7fe5c000

This proves that QEMU has the right address, matching the firmware log
from (2), and the ACPI dump from (3).

(7) At this point I allowed the inferior to proceed a bit:

> (gdb) n
> 207 memset(vms->vmgenid_addr_le, 0, ARRAY_SIZE(vms->vmgenid_addr_le));
> (gdb) n
> 208 }

I verified that the blob was zeroed:

> (gdb) print /x *(uint64_t*)vms->vmgenid_addr_le
> $5 = 0x0

then allowed the inferior to run free.

> (gdb) cont
> Continuing.

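Going back to the dump in (5): the byte order can be cross-checked with
Python's uuid module, whose bytes_le form uses the same layout (first
three fields little-endian, the rest as-is). This is just a sketch; the
sixteen bytes are typed in by hand from the "crash" output, and the 0x28
offset is the one the ADDR method adds to VGIA in (3).

```python
import uuid

VGIA = 0x7FE5C000     # blob address from the firmware log and the SSDT
GUID_OFFSET = 0x28    # offset added by the ADDR method in the SSDT

# Bytes at 0x7FE5C028..0x7FE5C037, typed in from the "crash" output.
dumped = bytes.fromhex("33 22 11 00 55 44 77 66 88 99 aa bb cc dd ee ff")

# Interpret the dumped bytes as a little-endian (SMBIOS-style) GUID.
guid = uuid.UUID(bytes_le=dumped)
print(f"GUID at 0x{VGIA + GUID_OFFSET:X}: {guid}")

# The reverse direction: how the command line GUID should look in RAM.
expected = uuid.UUID("00112233-4455-6677-8899-AABBCCDDEEFF").bytes_le
print(expected.hex(" "))
```

Both directions agree with the representation quoted from the SMBIOS
spec in (1).
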
(8) New messages appeared in the firmware log:

> S3ResumeExecuteBootScript()
> PeiS3ResumeState - 7FF92B18
> transfer control to Standalone Boot Script Executor
> S3BootScriptExecute:
> TableHeader - 0x7E7A7000
> TableHeader.Version - 0x0001
> TableHeader.TableLength - 0x000000ED
> ExecuteBootScript - 7E7A700D
> EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE
> BootScriptExecuteMemoryWrite - 0x7FE4F018, 0x00000010, 0x00000000

Here the ACPI S3 Boot Script, prepared in
TransferS3ContextToBootScript() -- see (2) -- creates a DMA access
command for fw_cfg. The DMA access command is written to pre-reserved
memory (see "ScratchBuffer" above).

> S3BootScriptWidthUint8 - 0x7FE4F018 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F019 (0x2B)

The fw_cfg selector is 0x2B. (See under (2).)

> S3BootScriptWidthUint8 - 0x7FE4F01A (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01B (0x0C)

This is a combined select+skip operation.

> S3BootScriptWidthUint8 - 0x7FE4F01C (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01D (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01E (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01F (0x00)

The skip size is 0 bytes.

> S3BootScriptWidthUint8 - 0x7FE4F020 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F021 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F022 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F023 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F024 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F025 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F026 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F027 (0x00)

The address is irrelevant for skip, so it's just nulled.

> ExecuteBootScript - 7E7A7030
> EFI_BOOT_SCRIPT_IO_WRITE_OPCODE
> BootScriptExecuteIoWrite - 0x00000514, 0x00000002, 0x00000002
> S3BootScriptWidthUint32 - 0x00000514 (0x00000000)
> S3BootScriptWidthUint32 - 0x00000518 (0x18F0E47F)

The Boot Script passes the DMA command to QEMU, by writing the address
of the command buffer to IO ports 0x514 and 0x518, in BE byte order.

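The bytes above follow the fw_cfg DMA access structure: a 32-bit control
field, a 32-bit length and a 64-bit address, all big-endian, with the
selector in the upper 16 bits of the control field when the select bit
is set. A rough Python sketch that rebuilds the logged bytes and the two
port values; the constant and function names are mine, not QEMU's or
edk2's.

```python
import struct

# Control bits of the fw_cfg DMA interface; the names are illustrative.
CTL_SKIP = 0x04
CTL_SELECT = 0x08


def dma_command(control, length, address):
    """Pack control/length/address as big-endian u32, u32, u64."""
    return struct.pack(">IIQ", control, length, address)


# Select item 0x2B and skip 0 bytes, as in the first MEM_WRITE opcode.
cmd = dma_command((0x2B << 16) | CTL_SELECT | CTL_SKIP, 0, 0)
print(cmd.hex(" "))

# The command buffer address is written to ports 0x514/0x518 in big-endian
# byte order, so a little-endian CPU stores these two 32-bit values.
at_514, at_518 = struct.unpack("<II", struct.pack(">Q", 0x7FE4F018))
print(f"0x514 <- 0x{at_514:08X}, 0x518 <- 0x{at_518:08X}")
```

The second print is why the log shows 0x18F0E47F for port 0x518: it is
the low half of 0x7FE4F018 with its bytes swapped.
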
> ExecuteBootScript - 7E7A704B
> EFI_BOOT_SCRIPT_MEM_POLL_OPCODE
> BootScriptExecuteMemPoll - 0x7FE4F018, 0x00000000FFFFFFFF, 0x0000000000000000
> S3BootScriptWidthUint32 - 0x7FE4F018
> ExecuteBootScript - 7E7A7072

This waits until the DMA command succeeds (reading back the Control
field continuously until it reads as zero).

> EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE
> BootScriptExecuteMemoryWrite - 0x7FE4F018, 0x00000018, 0x00000000

This is another DMA access command for fw_cfg, prepared in the same
pre-reserved buffer. This time

> S3BootScriptWidthUint8 - 0x7FE4F018 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F019 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01A (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01B (0x10)

we request a write operation,

> S3BootScriptWidthUint8 - 0x7FE4F01C (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01D (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01E (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F01F (0x08)

with a length of 8 bytes (big endian), matching the pointer size,

> S3BootScriptWidthUint8 - 0x7FE4F020 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F021 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F022 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F023 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F024 (0x7F)
> S3BootScriptWidthUint8 - 0x7FE4F025 (0xE4)
> S3BootScriptWidthUint8 - 0x7FE4F026 (0xF0)
> S3BootScriptWidthUint8 - 0x7FE4F027 (0x28)

the data to transfer is located at 0x7FE4F028 (just below, tacked to the
command buffer itself),

> S3BootScriptWidthUint8 - 0x7FE4F028 (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F029 (0xC0)
> S3BootScriptWidthUint8 - 0x7FE4F02A (0xE5)
> S3BootScriptWidthUint8 - 0x7FE4F02B (0x7F)
> S3BootScriptWidthUint8 - 0x7FE4F02C (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F02D (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F02E (0x00)
> S3BootScriptWidthUint8 - 0x7FE4F02F (0x00)

and the data to write is the original allocation address of the blob
(0x7fe5c000).

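To double-check the reading of this second command, the 0x18 logged
bytes (16 bytes of command plus 8 bytes of payload) can be decoded
mechanically. A sketch in Python, assuming the big-endian
control/length/address layout of the fw_cfg DMA interface and a
little-endian payload (compare vmgenid_addr_le in (6)); the bytes are
typed in by hand from the log.

```python
import struct

# 0x18 bytes written at 0x7FE4F018, typed in from the log.
buf = bytes.fromhex(
    "00 00 00 10 "               # control
    "00 00 00 08 "               # length
    "00 00 00 00 7f e4 f0 28 "   # address of the data to transfer
    "00 c0 e5 7f 00 00 00 00 "   # the data itself
)

# The command header is big-endian: u32 control, u32 length, u64 address.
control, length, address = struct.unpack_from(">IIQ", buf, 0)

# The payload follows the 16-byte header and is a little-endian u64.
(pointer,) = struct.unpack_from("<Q", buf, 16)

print(f"control=0x{control:X} length={length} address=0x{address:X}")
print(f"pointer value=0x{pointer:X}")
```

The decoded address is the command buffer address plus 0x10, and the
decoded pointer value is the one logged by
SaveCondensedWritePointerToS3Context in (2).
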
> ExecuteBootScript - 7E7A709D
> EFI_BOOT_SCRIPT_IO_WRITE_OPCODE
> BootScriptExecuteIoWrite - 0x00000514, 0x00000002, 0x00000002
> S3BootScriptWidthUint32 - 0x00000514 (0x00000000)
> S3BootScriptWidthUint32 - 0x00000518 (0x18F0E47F)
> ExecuteBootScript - 7E7A70B8
> EFI_BOOT_SCRIPT_MEM_POLL_OPCODE
> BootScriptExecuteMemPoll - 0x7FE4F018, 0x00000000FFFFFFFF, 0x0000000000000000
> S3BootScriptWidthUint32 - 0x7FE4F018
> ExecuteBootScript - 7E7A70DF

Same story as above: fire off the transfer and wait until it completes.

> EFI_BOOT_SCRIPT_INFORMATION_OPCODE
> BootScriptExecuteInformation - 0x7E7A70E6
> BootScriptInformation: DE AD BE EF
> ExecuteBootScript - 7E7A70EA
> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
> S3BootScriptDone - Success
> [...]

The DEADBEEF informational (no-op) opcode is something that OVMF appends
to the very end for hysterical raisins.

(9) Okay, so the guest is now resumed and running. Let's interrupt it in
gdb again, and check the contents of the address blob again (we know the
address of the address blob from step (6)):

> ^C
> Program received signal SIGINT, Interrupt.
> 0x00007f2bbf1d1ebf in ppoll () from /lib64/libc.so.6
> (gdb) print /x *(uint64_t*)0x7f2bd03c37b0
> $6 = 0x7fe5c000

Et voila.

(10) I detached gdb from QEMU, and issued the following monitor command:

> $ virsh qemu-monitor-command ovmf.rhel7 --hmp 'info vm-generation-id'
> 00112233-4455-6677-8899-aabbccddeeff

(11) I also booted a Windows Server 2012 R2 guest (Q35, broadcast SMI
enabled) with a similar vmgenid device/parameter. According to Device
Manager | System devices, "Microsoft Hyper-V Generation Counter" is
working properly.

I also tested S3 briefly; it worked okay. (I mentioned the SMI broadcast
above because for that, OVMF generates an independent S3 Boot Script
fragment.)

I'll let someone else test live migration.

For patches #1, #3, #4 and #5:

Tested-by: Laszlo Ersek <ler...@redhat.com>

I'll soon post the OVMF patches.

Thanks!
Laszlo