> On Feb 15, 2017, at 12:09 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
>
> On Wed, Feb 15, 2017 at 08:47:48PM +0100, Laszlo Ersek wrote:
>> On 02/15/17 07:15, b...@skyportsystems.com wrote:
>>> From: Ben Warren <b...@skyportsystems.com>
>>>
>>> This patch set adds support for passing a GUID to Windows guests. It
>>> is a re-implementation of previous patch sets written by Igor Mammedov
>>> et al, but this time passing the GUID data as a fw_cfg blob.
>>>
>>> This patch set has dependencies on new guest functionality, in
>>> particular the support for a new linker-loader command and the ability
>>> to write back data to QEMU over a DMA link. Work is in flight in both
>>> SeaBIOS and OVMF to support this.
>>>
>>> v5->v6:
>>> - Rebased to top of tree.
>>> - Changed device from sysbus to a simple device. This removed the need
>>>   for adding dynamic sysbus support to pc_piix boards.
>>> - Removed patch that introduced QWORD patching of AML.
>>> - Removed ability to set GUID via QMP/HMP.
>>> - Improved comments/documentation in code.
>>
>> So here's my testing with a RHEL-7 guest:
>>
>> (1) The command line option passed to QEMU is
>>
>> -device vmgenid,guid=00112233-4455-6677-8899-AABBCCDDEEFF
>>
>> This is the example GUID provided in the SMBIOS spec v3.0.0 (DSP0134),
>> section 7.2.1 "System -- UUID". (SMBIOS is only relevant here because it
>> codifies the fact that Microsoft consumes UUID in little-endian order.)
>> The expected representation, according to the SMBIOS spec, is
>>
>> 33 22 11 00 55 44 77 66 88 99 AA BB CC DD EE FF
>>
>> (2) Here's an excerpt from the OVMF log:
>>
>>> ProcessCmdAllocate: File="etc/vmgenid_guid" Alignment=0x1000 Zone=1
>>> Size=0x1000 Address=0x7FE5C000
>>
>> This is where "etc/vmgenid_guid" is allocated and downloaded, the
>> allocation address is 0x7FE5C000.
>>
>>> Select Item: 0x19
>>> Select Item: 0x22
>>> ProcessCmdAllocate: File="etc/acpi/tables" Alignment=0x40 Zone=1
>>> Size=0x20000 Address=0x7E7AB000
>>> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x49 Start=0x40
>>> Length=0x1403
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x1467 PointerSize=4
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x146B PointerSize=4
>>> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x144C
>>> Start=0x1443 Length=0x74
>>> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x14C0
>>> Start=0x14B7 Length=0x80
>>> Select Item: 0x19
>>> SaveCondensedWritePointerToS3Context: 0x002B/[0x00000000+8] := 0x7FE5C000
>>> (0)
>>
>> This is where OVMF stashes the WRITE_POINTER command in "condensed"
>> form, for S3. The fw_cfg selector value is 0x2B (for the fw_cfg file to
>> be rewritten), the pointer is located at offset 0, has size 8, and the
>> value to assign is 0x7FE5C000. And, this is #0 of the saved / condensed
>> WRITE_POINTER commands.
>>
>>> Select Item: 0x2B
>>> ProcessCmdWritePointer: PointerFile="etc/vmgenid_addr"
>>> PointeeFile="etc/vmgenid_guid" PointerOffset=0x0 PointerSize=8
>>
>> This is where the WRITE_POINTER command is actually executed, during
>> normal boot.
>>
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/vmgenid_guid" PointerOffset=0x1561 PointerSize=4
>>
>> This is where we link "etc/vmgenid_guid" into VGIA.
>>
>>> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x1540
>>> Start=0x1537 Length=0xCA
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x1625 PointerSize=4
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x1629 PointerSize=4
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/tables"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x162D PointerSize=4
>>> ProcessCmdAddChecksum: File="etc/acpi/tables" ResultOffset=0x160A
>>> Start=0x1601 Length=0x30
>>> ProcessCmdAddPointer: PointerFile="etc/acpi/rsdp"
>>> PointeeFile="etc/acpi/tables" PointerOffset=0x10 PointerSize=4
>>> ProcessCmdAddChecksum: File="etc/acpi/rsdp" ResultOffset=0x8 Start=0x0
>>> Length=0x24
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> InstallQemuFwCfgTables: unknown loader command: 0x0
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AB000 (remaining: 0x20000): found "FACS" size 0x40
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AB040 (remaining: 0x1FFC0): found "DSDT" size 0x1403
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/vmgenid_guid"
>>> at 0x7FE5C000 (remaining: 0x1000): not found; marking fw_cfg blob as opaque
>>
>> This is where the OVMF SDT Header Probe Suppressor does its job. (NB,
>> the "opaque marking" has happened already in ProcessCmdWritePointer()
>> too, above.)
>>
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AC443 (remaining: 0x1EBBD): found "FACP" size 0x74
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AC4B7 (remaining: 0x1EB49): found "APIC" size 0x80
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AC537 (remaining: 0x1EAC9): found "SSDT" size 0xCA
>>> Process2ndPassCmdAddPointer: checking for ACPI header in "etc/acpi/tables"
>>> at 0x7E7AC601 (remaining: 0x1E9FF): found "RSDT" size 0x30
>>> TransferS3ContextToBootScript: boot script fragment saved,
>>> ScratchBuffer=7FE4F018
>>
>> This is where the WRITE_POINTER commands, stashed earlier in condensed
>> form, are translated to S3 Boot Script opcodes.
>>
>>> InstallQemuFwCfgTables: installed 5 tables
>>
>> Such as: FACS, DSDT, FACP, APIC, SSDT. OVMF recognizes RSDT and ignores
>> it (it's handled by edk2 automatically).
>>
>>> InstallQemuFwCfgTables: freeing "etc/acpi/rsdp"
>>> InstallQemuFwCfgTables: freeing "etc/acpi/tables"
>>
>> OVMF sees that the above two blobs have not been marked as "opaque" --
>> they only contained ACPI tables, judged from the ADD_POINTER commands
>> that pointed into them. So these two blobs are freed.
>>
>> Note that "etc/vmgenid_guid" is not freed.
>>
>> So, from the firmware log, everything looks OK.
>>
>> (3) I dumped the SSDT in the RHEL-7 guest:
>>
>>> /*
>>>  * Intel ACPI Component Architecture
>>>  * AML/ASL+ Disassembler version 20160527-64
>>>  * Copyright (c) 2000 - 2016 Intel Corporation
>>>  *
>>>  * Disassembling to symbolic ASL+ operators
>>>  *
>>>  * Disassembly of ssdt.dat, Wed Feb 15 19:21:11 2017
>>>  *
>>>  * Original Table Header:
>>>  *     Signature        "SSDT"
>>>  *     Length           0x000000CA (202)
>>>  *     Revision         0x01
>>>  *     Checksum         0x1D
>>>  *     OEM ID           "BOCHS "
>>>  *     OEM Table ID     "VMGENID"
>>>  *     OEM Revision     0x00000001 (1)
>>>  *     Compiler ID      "BXPC"
>>>  *     Compiler Version 0x00000001 (1)
>>>  */
>>> DefinitionBlock ("", "SSDT", 1, "BOCHS ", "VMGENID", 0x00000001)
>>> {
>>>     Name (VGIA, 0x7FE5C000)
>>
>> Note that the value matches the value logged by the firmware in (2).
>>
>>>     Scope (\_SB)
>>>     {
>>>         Device (VGEN)
>>>         {
>>>             Name (_HID, "QEMUVGID") // _HID: Hardware ID
>>>             Name (_CID, "VM_Gen_Counter") // _CID: Compatible ID
>>>             Name (_DDN, "VM_Gen_Counter") // _DDN: DOS Device Name
>>>             Method (_STA, 0, NotSerialized) // _STA: Status
>>>             {
>>>                 Local0 = 0x0F
>>>                 If (VGIA == Zero)
>>>                 {
>>>                     Local0 = Zero
>>>                 }
>>>
>>>                 Return (Local0)
>>>             }
>>>
>>>             Method (ADDR, 0, NotSerialized)
>>>             {
>>>                 Local0 = Package (0x02) {}
>>>                 Local0 [Zero] = (VGIA + 0x28)
>>>                 Local0 [One] = Zero
>>>                 Return (Local0)
>>>             }
>>>         }
>>>     }
>>>
>>>     Method (\_GPE._E05, 0, NotSerialized) // _Exx: Edge-Triggered GPE
>>>     {
>>>         Notify (\_SB.VGEN, 0x80) // Status Change
>>>     }
>>> }
>>
>> Looks good and matches the documentation.
>>
>> (4) To be sure, I checked the address against the guest dmesg, which
>> contains a dump of the UEFI memory map:
>>
>>> [ 0.000000] efi: mem52: type=10, attr=0xf,
>>> range=[0x000000007fe5a000-0x000000007fe5e000) (0MB)
>>
>> The page (4096 bytes) at 0x7FE5C000 falls into this range. Type=10 means
>> EfiACPIMemoryNVS.
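
The range check in step (4) is simple enough to redo by hand, but a short sketch makes the arithmetic explicit. The variable names are illustrative; the numbers are copied from the dmesg line and the firmware log quoted above.

```python
# Half-open range [start, end) from the "efi: mem52" line in step (4).
range_start = 0x7FE5A000
range_end = 0x7FE5E000

# Allocation address of "etc/vmgenid_guid" from step (2), one 4 KiB page.
blob_addr = 0x7FE5C000
blob_size = 0x1000

# The whole page must lie inside the reported range.
inside = range_start <= blob_addr and blob_addr + blob_size <= range_end

# The range is only a few pages long, which is why dmesg rounds it to 0MB.
pages = (range_end - range_start) // 0x1000

print(inside, pages)
```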
>>
>> (5) At this point I dumped the guest RAM with the dump-guest-memory
>> monitor command, opened it with "crash", and listed it:
>>
>>> crash> rd -p -8 0x7FE5C000 0x40
>>> 7fe5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> ................
>>> 7fe5c010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> ................
>>> 7fe5c020: 00 00 00 00 00 00 00 00 33 22 11 00 55 44 77 66
>>> ........3"..UDwf
>>> 7fe5c030: 88 99 aa bb cc dd ee ff 00 00 00 00 00 00 00 00
>>> ................
>>
>> We can see that the GUID starts at 0x7FE5C000 + 0x28, and also that the
>> byte-level representation matches the little endian one given in (1).
>>
>> This proves that the initial blob download worked fine.
>>
>> (6) Here I attached "gdb" to QEMU, set a breakpoint on
>> vmgenid_handle_reset(), allowed the inferior process to continue
>> execution.
>>
>> Then I suspended and resumed the guest (ACPI S3). The breakpoint was hit
>> during resume:
>>
>>> Breakpoint 1, vmgenid_handle_reset (opaque=0x7f2bd03c36e0) at
>>> .../hw/acpi/vmgenid.c:205
>>> 205 VmGenIdState *vms = VMGENID(opaque);
>>
>> First of all, before allowing QEMU to zero out the address blob, I
>> listed the address and the contents of the address blob (here exploiting
>> that my host is also little endian):
>>
>>> (gdb) print (void*)vms->vmgenid_addr_le
>>> $2 = (void *) 0x7f2bd03c37b0
>>
>>> (gdb) print /x *(uint64_t*)vms->vmgenid_addr_le
>>> $4 = 0x7fe5c000
>>
>> This proves that QEMU has the right address, matching the firmware log
>> from (2), and the ACPI dump from (3).
>>
>> (7) At this point I allowed the inferior to proceed a bit:
>>
>>> (gdb) n
>>> 207 memset(vms->vmgenid_addr_le, 0,
>>> ARRAY_SIZE(vms->vmgenid_addr_le));
>>> (gdb) n
>>> 208 }
>>
>> I verified that the blob was zeroed:
>>
>>> (gdb) print /x *(uint64_t*)vms->vmgenid_addr_le
>>> $5 = 0x0
>>
>> then allowed the inferior to run free.
>>
>>> (gdb) cont
>>> Continuing.
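
For anyone who wants to repeat the byte-order comparison from steps (1) and (5) without doing it by eye, here is a minimal sketch. It assumes Python's standard `uuid` module, whose `bytes_le` attribute uses the same mixed-endian layout that the SMBIOS spec describes; the variable names are illustrative, and the two hex rows are copied from the "crash" listing above.

```python
import uuid

# GUID from the QEMU command line in step (1).
guid = uuid.UUID("00112233-4455-6677-8899-AABBCCDDEEFF")

# bytes_le stores the first three fields little-endian, the rest unchanged.
expected = guid.bytes_le

# Rows 7fe5c020 and 7fe5c030 from the "crash" listing in step (5).
row_20 = bytes.fromhex("00 00 00 00 00 00 00 00 33 22 11 00 55 44 77 66")
row_30 = bytes.fromhex("88 99 aa bb cc dd ee ff 00 00 00 00 00 00 00 00")
tail = row_20 + row_30  # covers blob offsets 0x20..0x3F

GUID_OFFSET = 0x28  # same offset the ADDR method adds to VGIA
start = GUID_OFFSET - 0x20
found = tail[start:start + 16]

print(" ".join(f"{b:02X}" for b in expected))
print(found == expected)
```

The first line printed is the representation quoted from the SMBIOS spec in (1), and the comparison confirms that the 16 bytes at offset 0x28 of the dumped page are that same sequence.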
>>
>> (8) New messages appeared in the firmware log:
>>
>>> S3ResumeExecuteBootScript()
>>> PeiS3ResumeState - 7FF92B18
>>> transfer control to Standalone Boot Script Executor
>>> S3BootScriptExecute:
>>> TableHeader - 0x7E7A7000
>>> TableHeader.Version - 0x0001
>>> TableHeader.TableLength - 0x000000ED
>>> ExecuteBootScript - 7E7A700D
>>> EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE
>>> BootScriptExecuteMemoryWrite - 0x7FE4F018, 0x00000010, 0x00000000
>>
>> Here the ACPI S3 Boot Script, prepared in
>> TransferS3ContextToBootScript() -- see (2) -- creates a DMA access
>> command for fw_cfg. The DMA access command is written to pre-reserved
>> memory (see "ScratchBuffer" above).
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F018 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F019 (0x2B)
>>
>> The fw_cfg selector is 0x2B. (See under (2).)
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F01A (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01B (0x0C)
>>
>> This is a combined select+skip operation.
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F01C (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01D (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01E (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01F (0x00)
>>
>> The skip size is 0 bytes.
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F020 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F021 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F022 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F023 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F024 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F025 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F026 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F027 (0x00)
>>
>> The address is irrelevant for skip, so it's just nulled.
>>
>>> ExecuteBootScript - 7E7A7030
>>> EFI_BOOT_SCRIPT_IO_WRITE_OPCODE
>>> BootScriptExecuteIoWrite - 0x00000514, 0x00000002, 0x00000002
>>> S3BootScriptWidthUint32 - 0x00000514 (0x00000000)
>>> S3BootScriptWidthUint32 - 0x00000518 (0x18F0E47F)
>>
>> The Boot Script passes the DMA command to QEMU, by writing the address
>> of the command buffer to IO ports 0x514 and 0x518, in BE byte order.
>>
>>> ExecuteBootScript - 7E7A704B
>>> EFI_BOOT_SCRIPT_MEM_POLL_OPCODE
>>> BootScriptExecuteMemPoll - 0x7FE4F018, 0x00000000FFFFFFFF,
>>> 0x0000000000000000
>>> S3BootScriptWidthUint32 - 0x7FE4F018
>>> ExecuteBootScript - 7E7A7072
>>
>> This waits until the DMA command succeeds (reading back the Control
>> field continuously until it reads as zero).
>>
>>> EFI_BOOT_SCRIPT_MEM_WRITE_OPCODE
>>> BootScriptExecuteMemoryWrite - 0x7FE4F018, 0x00000018, 0x00000000
>>
>> This is another DMA access command for fw_cfg, prepared in the same
>> pre-reserved buffer. This time
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F018 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F019 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01A (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01B (0x10)
>>
>> we request a write operation,
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F01C (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01D (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01E (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F01F (0x08)
>>
>> with a length of 8 bytes (big endian), matching the pointer size,
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F020 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F021 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F022 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F023 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F024 (0x7F)
>>> S3BootScriptWidthUint8 - 0x7FE4F025 (0xE4)
>>> S3BootScriptWidthUint8 - 0x7FE4F026 (0xF0)
>>> S3BootScriptWidthUint8 - 0x7FE4F027 (0x28)
>>
>> the data to transfer is located at 0x7FE4F028 (just below, tacked to the
>> command buffer itself),
>>
>>> S3BootScriptWidthUint8 - 0x7FE4F028 (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F029 (0xC0)
>>> S3BootScriptWidthUint8 - 0x7FE4F02A (0xE5)
>>> S3BootScriptWidthUint8 - 0x7FE4F02B (0x7F)
>>> S3BootScriptWidthUint8 - 0x7FE4F02C (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F02D (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F02E (0x00)
>>> S3BootScriptWidthUint8 - 0x7FE4F02F (0x00)
>>
>> and the data to write is the original allocation address of the blob
>> (0x7fe5c000).
>>
>>> ExecuteBootScript - 7E7A709D
>>> EFI_BOOT_SCRIPT_IO_WRITE_OPCODE
>>> BootScriptExecuteIoWrite - 0x00000514, 0x00000002, 0x00000002
>>> S3BootScriptWidthUint32 - 0x00000514 (0x00000000)
>>> S3BootScriptWidthUint32 - 0x00000518 (0x18F0E47F)
>>> ExecuteBootScript - 7E7A70B8
>>> EFI_BOOT_SCRIPT_MEM_POLL_OPCODE
>>> BootScriptExecuteMemPoll - 0x7FE4F018, 0x00000000FFFFFFFF,
>>> 0x0000000000000000
>>> S3BootScriptWidthUint32 - 0x7FE4F018
>>> ExecuteBootScript - 7E7A70DF
>>
>> Same story as above: fire off the transfer and wait until it completes.
>>
>>> EFI_BOOT_SCRIPT_INFORMATION_OPCODE
>>> BootScriptExecuteInformation - 0x7E7A70E6
>>> BootScriptInformation: DE AD BE EF
>>> ExecuteBootScript - 7E7A70EA
>>> S3_BOOT_SCRIPT_LIB_TERMINATE_OPCODE
>>> S3BootScriptDone - Success
>>> [...]
>>
>> The DEADBEEF informational (no-op) opcode is something that OVMF appends
>> to the very end for hysterical raisins.
>>
>> (9) Okay, so the guest is now resumed and running, let's interrupt it in
>> gdb again, and check the contents of address blob again (we know the
>> address of the address blob from step (6)):
>>
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> 0x00007f2bbf1d1ebf in ppoll () from /lib64/libc.so.6
>>> (gdb) print /x *(uint64_t*)0x7f2bd03c37b0
>>> $6 = 0x7fe5c000
>>
>> Et voila.
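
The byte-by-byte boot script trace in step (8) can be cross-checked by rebuilding the two fw_cfg DMA access commands from their fields. The sketch below is only an illustration in Python, not the OVMF code: the helper `dma_command` and the variable names are made up for this example, and the control bit values (skip 0x04, select 0x08, write 0x10) are the ones I understand QEMU's fw_cfg DMA interface to define. All addresses and the selector come from the log above.

```python
import struct

# Control bits of the fw_cfg DMA interface (assumed from QEMU's fw_cfg spec).
FW_CFG_DMA_CTL_SKIP = 0x04
FW_CFG_DMA_CTL_SELECT = 0x08
FW_CFG_DMA_CTL_WRITE = 0x10


def dma_command(control, length, address):
    """Pack control (32-bit), length (32-bit), address (64-bit), big-endian."""
    return struct.pack(">IIQ", control, length, address)


SCRATCH = 0x7FE4F018    # "ScratchBuffer" from the firmware log
SELECTOR = 0x2B         # fw_cfg key of "etc/vmgenid_addr" in this boot
BLOB_ADDR = 0x7FE5C000  # allocation address of "etc/vmgenid_guid"

# First command: select the file (key in the upper 16 bits) and skip 0 bytes.
select_cmd = dma_command(
    (SELECTOR << 16) | FW_CFG_DMA_CTL_SELECT | FW_CFG_DMA_CTL_SKIP, 0, 0)

# Second command: write 8 bytes stored right after the 16-byte command.
write_cmd = dma_command(FW_CFG_DMA_CTL_WRITE, 8, SCRATCH + 16)
payload = struct.pack("<Q", BLOB_ADDR)  # the pointer itself is little-endian

# The command address is handed over big-endian, so a little-endian CPU
# writes the byte-swapped low half to port 0x518.
port_518 = struct.unpack("<I", struct.pack(">I", SCRATCH))[0]

print(select_cmd.hex())
print((write_cmd + payload).hex())
print(hex(port_518))
```

The first hex string matches the 0x10 bytes written at 0x7FE4F018 for the select+skip command, the second matches the 0x18 bytes of the write command plus its data, and the last value is the 0x18F0E47F seen on port 0x518.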
>>
>> (10) I detached gdb from QEMU, and issued the following monitor command:
>>
>>> $ virsh qemu-monitor-command ovmf.rhel7 --hmp 'info vm-generation-id'
>>> 00112233-4455-6677-8899-aabbccddeeff
>>
>> (11) I also booted a Windows Server 2012 R2 guest (Q35, broadcast SMI
>> enabled) with a similar vmgenid device/parameter. According to Device
>> Manager | System devices, "Microsoft Hyper-V Generation Counter" is
>> working properly.
>>
>> I also tested S3 briefly, it worked okay. (I mentioned the SMI broadcast
>> above because for that, OVMF generates an independent S3 Boot Script
>> fragment.)
>>
>>
>> I'll let someone else test live migration.
>>
>> For patches #1, #3, #4 and #5:
>>
>> Tested-by: Laszlo Ersek <ler...@redhat.com>
>>
>> I'll soon post the OVMF patches.
>>
>> Thanks!
>> Laszlo
>
>
> How do you feel about Igor's request to change WRITE_POINTER to add
> offset in there, so guest can pass in the address of GUID and
> not start of table? Would that be a lot of work to add?
>
I know you’re asking Laszlo, but hopefully the answer is “OK”. I’m finished making those changes to QEMU and am coding up SeaBIOS right now.

> --
> MST