[RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Even though the code is still PoC quality, I'm sending this as an RFC now since there are a number of different ways the specific implementation details can be handled. I chose a shared code path for Xen and KVM guests but could just as easily create a separate code path that is advertised by a different ELF note for KVM. There also seems to be some flexibility in how the e820 table data is passed and how (or if) it should be identified as e820 data. As a starting point, I've chosen the options that seem to result in the smallest patch with minimal to no changes required of the x86/HVM direct boot ABI. --- arch/x86/xen/enlighten_pvh.c | 74 1 file changed, 55 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..d93f711 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); + if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else if (pvh_start_info.nr_modules > 1) { + /* The second module should be the e820 data for KVM guests */ + struct hvm_modlist_entry *modaddr; + char e820_sig[] = "e820 data"; + struct boot_e820_entry *ep; + struct e820_table *tp; + char *cmdline_str; + int idx; + + modaddr = __va(pvh_start_info.modlist_paddr + + sizeof(struct hvm_modlist_entry)); + cmdline_str = __va(modaddr->cmdline_paddr); + + if ((modaddr->cmdline_paddr) && + (!strncmp(e820_sig, cmdline_str, sizeof(e820_sig { + tp = __va(modaddr->paddr); + ep = (struct boot_e820_entry *)tp->entries; + + pvh_bootparams.e820_entries = tp->nr_entries; + + for (idx = 0; idx < tp->nr_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -55,8 +80,9 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.e820_table[pvh_bootparams.e820_entries].type = E820_TYPE_RESERVED; pvh_bootparams.e820_entries++; - } else + } else if (xen_guest) { xen_raw_printk("Warning: Can fit ISA range into e820\n"); + } pvh_bootparams.hdr.cmd_line_ptr = pvh_start_info.cmdline_paddr; @@ -76,7 +102,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -85,22 +111,32 @@ static void __init init_pvh_bootparams(void) */ void __init xen_prepare_pvh(void) { - u32 msr; + + u32 msr = xen_cpuid_base(); u64 pfn; + bool xen_guest = msr ? true : false; if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { - xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", -
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 12:21 AM, Juergen Gross wrote: On 28/11/17 20:34, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Even though the code is still PoC quality, I'm sending this as an RFC now since there are a number of different ways the specific implementation details can be handled. I chose a shared code path for Xen and KVM guests but could just as easily create a separate code path that is advertised by a different ELF note for KVM. There also seems to be some flexibility in how the e820 table data is passed and how (or if) it should be identified as e820 data. As a starting point, I've chosen the options that seem to result in the smallest patch with minimal to no changes required of the x86/HVM direct boot ABI. I like the idea. I'd rather split up the different hypervisor types early and use a common set of service functions instead of special casing xen_guest everywhere. This would make it much easier to support the KVM PVH boot without the need to configure the kernel with CONFIG_XEN. Thanks for the feedback. I'll try doing something like that as this patch moves from proof of concept to a real proposal. Another option would be to use the same boot path as with grub: set the boot params in zeropage and start at startup_32. I think others have already responded about that. The main thing I was trying to avoid, was adding any Linux OS specific initialization (like zeropage) to QEMU. Especially since this PVH entry point already exists in Linux. Thanks, -Maran Juergen --- arch/x86/xen/enlighten_pvh.c | 74 1 file changed, 55 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..d93f711 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); + if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else if (pvh_start_info.nr_modules > 1) { + /* The second module should be the e820 data for KVM guests */ + struct hvm_modlist_entry *modaddr; + char e820_sig[] = "e820 data"; + struct boot_e820_entry *ep; + struct e820_table *tp; + char *cmdline_str; + int idx; + + modaddr = __va(pvh_start_info.modlist_paddr + + sizeof(struct hvm_modlist_entry)); + cmdline_str = __va(modaddr->cmdline_paddr); + + if ((modaddr->cmdline_paddr) && + (!strncmp(e820_sig, cmdline_str, sizeof(e820_sig { + tp = __va(modaddr->paddr); + ep = (struct boot_e820_entry *)tp->entries; + + pvh_bootparams.e820_entries = tp->nr_entries; + + for (idx = 0; idx < tp->nr_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -55,8 +80,9 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.e820_table[pvh_bootparams.e820_entries].type = E820_TYPE_RESERVED; pvh_bootp
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 12:59 AM, Paolo Bonzini wrote: On 28/11/2017 20:34, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Nice! So QEMU would parse the ELF file just like for multiboot, find the ELF note, and then prepare an hvmlite boot info struct instead of the multiboot one? Yes, exactly. There would then be a new option ROM, very similar to multiboot.S. That is one option. I guess this gets into a discussion about the QEMU side of the upcoming patches that would follow ... I'm currently just initializing the CPU state in QEMU for testing since there is such minimal (non Linux specific) setup that is required by the ABI. And (borrowing from the Intel clear container patches) that VM setup is only performed when user selects the "nofw" option with the q35 model. But yeah, if folks think it important to move all such machine state initialization out of QEMU and into an option ROM, I can look into coding it up that way for the QEMU patches. Thanks, -Maran Thanks, Paolo
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 6:44 AM, Paolo Bonzini wrote: I actually like this patch, except that I'd get the e820 memory map from fw_cfg (see the first part of https://github.com/bonzini/qboot/blob/master/fw_cfg.c, and extract_e820 inhttps://github.com/bonzini/qboot/blob/master/main.c) instead of the second module. Hi Paolo, I want to make sure I understand exactly what you are suggesting... Are you saying the Linux PVH entry code (such as init_pvh_bootparams()) should use the fw_cfg interface to read the e820 memory map data and put it into the zeropage? Basically, keeping the patch very much like it already is, just extracting the e820 data via the fw_cfg interface instead of from the second module of start_info struct? If that is the case, I guess I'm a bit hesitant to throw the QEMU specific fw_cfg interface into the mix on the Linux PVH side when the existing PVH ABI already seems to contain an interface for passing modules/blobs to the guest. But if you feel there is a compelling reason to use the fw_cfg interface here, I'm happy to explore that approach further. Thanks, -Maran
[RFC PATCH v3 0/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (2): xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point arch/x86/xen/enlighten_pvh.c | 51 --- include/xen/interface/hvm/start_info.h | 50 +- 2 files changed, 85 insertions(+), 16 deletions(-)
[RFC PATCH v3 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. --- include/xen/interface/hvm/start_info.h | 50 +- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 6484159..80cfbd3 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,10 +71,34 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 @@ -86,6 +119,14 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +uint64_t memmap_paddr; /* Physical address of an array of */ + /* hvm_memmap_table_entry. Only present in */ + /* version 1 and newer of the structure */ +uint32_t memmap_entries; /* Number of entries in the memmap table.*/ + /* Only present in version 1 and newer of*/ + /* the structure. Value will be zero if */ + /* there is no memory map being provided.*/ +uint32_t reserved; }; struct hvm_modlist_entry { @@ -95,4 +136,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region */ +uint64_t size; /* Size of the memory region in bytes*/ +uint32_t type; /* Mapping type */ +uint32_t reserved; +}; + #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */ -- 1.8.3.1
[RFC PATCH v3 2/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. --- arch/x86/xen/enlighten_pvh.c | 51 +++- 1 file changed, 36 insertions(+), 15 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..12f3716 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,38 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else { + xen_raw_printk("Error: Could not find memory map\n"); BUG(); } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -76,7 +93,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -85,8 +102,10 @@ static void __init init_pvh_bootparams(void) */ void __init xen_prepare_pvh(void) { - u32 msr; + + u32 msr = xen_cpuid_base(); u64 pfn; + bool xen_guest = !!msr; if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", @@ -94,13 +113,15 @@ void __init xen_prepare_pvh(void) BUG(); } - xen_pvh = 1; + if (xen_guest) { + xen_pvh = 1; - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + msr = cpuid_ebx(msr + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - init_pvh_bootparams(); + x86_init.oem.arch_setup = xen_pvh_arch_setup; + } - x86_init.oem.arch_setup = xen_pvh_arch_setup; + init_pvh_bootparams(xen_guest); } -- 1.8.3.1
[RFC PATCH v2 0/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Juergen also had a suggestion to split the different hypervisor types early and use a common set of service functions instead of special casing xen_guest everywhere. There are certainly less special cases in this version of the patch, but if we still think it's important to split things up between common, Xen, and KVM components, then I would appreciate a suggestion on how best that can be done. Are we talking about just re-factoring functions in the existing file? Or do we need to go all the way and pull all the PVH entry code out of xen directories and find a home for it somewhere else so that we can use kernels built without CONFIG_XEN to start KVM guests via the PVH entry point. If the latter, any suggestions for which common files or directories I can move this stuff to? Maran Wilson (2): xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point arch/x86/xen/enlighten_pvh.c | 48 +--- include/xen/interface/hvm/start_info.h | 34 +++--- 2 files changed, 64 insertions(+), 18 deletions(-)
[RFC PATCH v2 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to efficiently pass information about the memory map to the guest. That way Xen PVH guests would not be forced to use a hypercall to get the information and would make it easier for KVM guests to share the PVH entry point. --- include/xen/interface/hvm/start_info.h | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 6484159..60206bb 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,12 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the memory map. Only present in + *|| version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + * 52 ++ * * The layout of each entry in the module structure is the following: * @@ -62,6 +68,17 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows and no + * padding is used between entries in the array: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping + * 16 ++ + *| type | E820_TYPE_xxx + * 20 +| + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB @@ -86,13 +103,24 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ -}; +uint64_t memmap_paddr; /* Physical address of an array of */ + /* hvm_memmap_table_entry. Only present in */ + /* Ver 1 or later. For e820 mem map table. */ +uint32_t memmap_entries; /* Only present in Ver 1 or later. Number of */ + /* entries in the memmap table. */ +} __attribute__((packed)); struct hvm_modlist_entry { uint64_t paddr; /* Physical address of the module. */ uint64_t size; /* Size of the module in bytes. */ uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t reserved; -}; +} __attribute__((packed)); + +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region */ +uint64_t size; /* Size of the memory region */ +uint32_t type; /* E820_TYPE_xxx of the memory region*/ +} __attribute__((packed)); #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */ -- 1.8.3.1
[RFC PATCH v2 2/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. --- arch/x86/xen/enlighten_pvh.c | 48 ++-- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..f11fbfc 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,35 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct boot_e820_entry *ep; + int idx; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (idx = 0; idx < pvh_bootparams.e820_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } else if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else { + xen_raw_printk("Error: Could not find memory map\n"); BUG(); } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -76,7 +90,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -85,8 +99,10 @@ static void __init init_pvh_bootparams(void) */ void __init xen_prepare_pvh(void) { - u32 msr; + + u32 msr = xen_cpuid_base(); u64 pfn; + bool xen_guest = !!msr; if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", @@ -94,13 +110,15 @@ void __init xen_prepare_pvh(void) BUG(); } - xen_pvh = 1; + if (xen_guest) { + xen_pvh = 1; - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + msr = cpuid_ebx(msr + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - init_pvh_bootparams(); + x86_init.oem.arch_setup = xen_pvh_arch_setup; + } - x86_init.oem.arch_setup = xen_pvh_arch_setup; + init_pvh_bootparams(xen_guest); } -- 1.8.3.1
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
Just FYI: I sent out a v2 of this patch but in doing so I moved a few people from the "to" line to the "cc" line. For anyone who previously did not comment but still wanted to follow the discussion, here's the link to the v2 email: https://lkml.org/lkml/2017/12/7/1624 Thanks, -Maran On 12/1/2017 12:08 AM, Paolo Bonzini wrote: On 30/11/2017 19:23, Maran Wilson wrote: Are you saying the Linux PVH entry code (such as init_pvh_bootparams()) should use the fw_cfg interface to read the e820 memory map data and put it into the zeropage? Basically, keeping the patch very much like it already is, just extracting the e820 data via the fw_cfg interface instead of from the second module of start_info struct? Yes. If that is the case, I guess I'm a bit hesitant to throw the QEMU specific fw_cfg interface into the mix on the Linux PVH side when the existing PVH ABI already seems to contain an interface for passing modules/blobs to the guest. But if you feel there is a compelling reason to use the fw_cfg interface here, I'm happy to explore that approach further. I think the same holds true for Xen, but it is still using a hypercall to get the memory map. In the end, using fw_cfg seems closest to what the Xen code does. There are other possibilities: 1) defining a v2 PVH ABI that includes the e820 map would also be a possibility. 2) modify enlighten_pvh.c to get the start info in multiboot format, something like: diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab17673454..656e41449db0 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -88,19 +88,22 @@ void __init xen_prepare_pvh(void) u32 msr; u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { + if (pvh_start_info.magic == XEN_HVM_START_MAGIC_VALUE) { + xen_pvh = 1; + + init_pvh_bootparams_xen(); + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + + x86_init.oem.arch_setup = xen_pvh_arch_setup; + } else if (pvh_start_info.magic == MULTIBOOT_INFO_MAGIC_VALUE) { + init_pvh_bootparams_multiboot(); + + } else { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - - init_pvh_bootparams(); - - x86_init.oem.arch_setup = xen_pvh_arch_setup; } Note that this would *not* be a multiboot-format kernel, as it would still have the Xen PVH ELF note. It would just reuse the format of the start info struct. However, I think it is simpler to just use the e820 memory map from fw_cfg. Paolo
Re: [RFC PATCH v2 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct
Thanks for taking a look Jan. More below... On 12/8/2017 12:49 AM, Jan Beulich wrote: On 07.12.17 at 23:45,wrote: The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to efficiently pass information about the memory map to the guest. That way Xen PVH guests would not be forced to use a hypercall to get the information and would make it easier for KVM guests to share the PVH entry point. --- include/xen/interface/hvm/start_info.h | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) First of all such a change should be submitted against the canonical copy of the header, which lives in the Xen tree. Understood. Will do that when this converts from RFC to actual patch. The argument of avoiding a hypercall doesn't really count imo - this isn't in any way performance critical code. The argument of making re-use easier is fine, though. Okay, I will reword the commit message. --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,12 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the memory map. Only present in + *|| version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + * 52 ++ Please let's make this optional even in v1 (and later), i.e. spell out that it may be zero. That way Xen code could continue to use the hypercall approach even. Yes, my intention was to make this optional. I will spell it out. Also please spell out a 4-byte reserved entry at the end, to make the specified structure a multiple of 8 in size again regardless of bitness of the producer/consumer. Sure, I can add that. @@ -62,6 +68,17 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows and no + * padding is used between entries in the array: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping + * 16 ++ + *| type | E820_TYPE_xxx + * 20 +| I'm not convinced of re-using E820 types here. I can see that this might ease the consumption in Linux, but I don't think there should be any connection to x86 aspects here - the data being supplied is x86-agnostic, and Linux'es placement of the header is also making no connection to x86 (oddly enough, the current placement in the Xen tree does, for a reason which escapes me). I could also imagine reasons to add new types without them being sanctioned by whoever maintains E820 type assignments. So there are three aspects to discuss here. 1) The addition of the "E820_TYPE_xxx" comment. I am fine with just changing that to "mapping type" and leaving it as something to be coordinated between the hypervisor and the guest OS being started by that hypervisor. 2) x86 vs x86-agnostic. While I'm trying to keep this interface generic in terms of guest OS (like Linux, FreeBSD, possible other guests in the future) and hypervisor type (Xen, QEMU/KVM, etc), I was actually under the impression that we are dealing with an ABI that is very much x86 specific. The canonical document describing the ABI (https://xenbits.xen.org/docs/unstable/misc/pvh.html) is titled "x86/HVM direct boot ABI" and goes on to describe an interface in very x86-specific terms. i.e. The ebx register must contain a pointer, cs, ds, es must be set a certain way, etc. That is probably why Xen's placement of the header file is in a x86 section of the tree. And also why there already exist a number of "x86" references in the existing header file. A quick grep of the existing header file will show lines like: "C representation of the x86/HVM start info layout" "Start of day structure passed to PVH guests and to HVM guests in %ebx" "Xen on x86 will always try to place all the data below the 4GiB" If at some point in the future someone decides to implement a similar ABI for a different CPU architecture while re-using this same hvm_start_info struct, then this
Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
Friendly ping. I am hopeful one of the x86 and/or KVM maintainers has a few cycles to spare to look this over. And thanks to everyone who has helped thus far by providing valuable feedback and reviewing. https://lkml.org/lkml/2018/4/16/1002 Thanks, -Maran On 4/16/2018 4:09 PM, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v6: * Addressed issues caught by the kbuild test robot: - Restored an #include line that had been dropped by mistake (patch 4) - Removed a pair of #include lines that were no longer needed in a common code file and causing problems for certain 32-bit configs (patchs 4 and 7) Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS
Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 5/18/2018 4:31 AM, Paolo Bonzini wrote: On 16/05/2018 22:27, Maran Wilson wrote: Friendly ping. I am hopeful one of the x86 and/or KVM maintainers has a few cycles to spare to look this over. And thanks to everyone who has helped thus far by providing valuable feedback and reviewing. https://lkml.org/lkml/2018/4/16/1002 KVM bits look fine. This would be the right time to post the QEMU patches... Thanks Paolo. Yes, we will have the Qemu patches out soon. It is being actively worked. We have had one implementation in place, but decided to re-implement things slightly based on some preliminary feedback before sending it out to the wider community. Thanks, -Maran Paolo
Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 4/18/2018 1:11 AM, Linus Walleij wrote: I wonder why I am starting to get CCed on Xen patches all of a sudden. I happened to run into Jürgen at a conference only last weekend, but I still don't know anything whatsoever about Xen or how it works. If get_maintainer.pl has started to return my name on this stuff I really want to know why :/ It has nothing to do with Xen actually. But for some reason, the get_maintainer.pl script is returning your name for any patch that modifies the MAINTAINERS file. Although why that is the case wasn't clear to me based on a quick look at both those files. -Maran Yours, Linus Walleij
[RFC PATCH v4 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- include/xen/interface/hvm/start_info.h | 50 +- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..80cfbd35c1af 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,10 +71,34 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 @@ -86,6 +119,14 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +uint64_t memmap_paddr; /* Physical address of an array of */ + /* hvm_memmap_table_entry. Only present in */ + /* version 1 and newer of the structure */ +uint32_t memmap_entries; /* Number of entries in the memmap table.*/ + /* Only present in version 1 and newer of*/ + /* the structure. Value will be zero if */ + /* there is no memory map being provided.*/ +uint32_t reserved; }; struct hvm_modlist_entry { @@ -95,4 +136,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region */ +uint64_t size; /* Size of the memory region in bytes*/ +uint32_t type; /* Mapping type */ +uint32_t reserved; +}; + #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */ -- 2.16.1
[RFC PATCH v4 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
Sorry for the delay between this version and the last -- it was mostly due to holidays and everyone being focused on security bug mitigation issues. Here are the links to the previous email threads in case it is helpful: V3: https://lkml.org/lkml/2017/12/12/1230 V2: https://lkml.org/lkml/2017/12/7/1624 V1: https://lkml.org/lkml/2017/11/28/1280 Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common code xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS| 1 + arch/x86/Kbuild| 3 + arch/x86/Kconfig | 8 ++ arch/x86/kernel/head_64.S | 4 +- arch/x86/pvh-head.S| 161 +++ arch/x86/pvh.c | 130 ++ arch/x86/xen/Kconfig | 3 +- arch/x86/xen/Makefile | 1 - arch/x86/xen/enlighten_pvh.c | 87 +++- arch/x86/xen/xen-pvh.S | 161 --- include/xen/interface/hvm/start_info.h | 50 ++- 11 files changed, 374 insertions(+), 235 deletions(-)
[RFC PATCH v4 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kbuild | 4 ++-- arch/x86/pvh.c | 43 --- 2 files changed, 34 insertions(+), 13 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index a4e5e3d348dc..e9dc0f1c9d32 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,8 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += pvh.o -obj-$(CONFIG_XEN_PVH) += pvh-head.o +obj-$(CONFIG_PVH) += pvh.o +obj-$(CONFIG_PVH) += pvh-head.o # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 6e9f6a6e97b3..97042d11342f 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -7,6 +7,9 @@ #include #include +#include +#include + #include #include @@ -34,11 +37,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -69,7 +89,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -82,13 +102,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -97,13 +114,17 @@ static void hypervisor_specific_init(void) */ void __init xen_prepare_pvh(void) { + + u32 msr = xen_cpuid_base(); + bool xen_guest = !!msr; + if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - hypervisor_specific_init(); + hypervisor_specific_init(xen_guest); - init_pvh_bootparams(); + init_pvh_bootparams(xen_guest); } -- 2.16.1
[RFC PATCH v4 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other then Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kconfig | 8 arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..fa7cd0305125 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,14 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + depends on KVM_GUEST || XEN + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 0f545b3cf926..fc9f678c6413 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -41,7 +41,7 @@ #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_START_KERNEL = pgd_index(__START_KERNEL_map) #endif @@ -387,7 +387,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + PGD_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n -- 2.16.1
[RFC PATCH v4 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- MAINTAINERS | 1 + arch/x86/Kbuild | 3 +++ arch/x86/{xen/xen-pvh.S => pvh-head.S} | 0 arch/x86/{xen/enlighten_pvh.c => pvh.c} | 0 arch/x86/xen/Makefile | 2 -- 5 files changed, 4 insertions(+), 2 deletions(-) rename arch/x86/{xen/xen-pvh.S => pvh-head.S} (100%) rename arch/x86/{xen/enlighten_pvh.c => pvh.c} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 93a12af4f180..dc89f3a279bd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15210,6 +15210,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/*pvh* F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..a4e5e3d348dc 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,9 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += pvh.o +obj-$(CONFIG_XEN_PVH) += pvh-head.o + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/pvh-head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/pvh-head.S diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/pvh.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/pvh.c diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..7e8145b33997 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -21,7 +21,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +32,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[RFC PATCH v4 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/pvh.c | 1 - arch/x86/xen/Makefile| 1 + arch/x86/xen/enlighten_pvh.c | 11 +++ 3 files changed, 12 insertions(+), 1 deletion(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 436c4f003e17..b56cb5e7d6ac 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -19,7 +19,6 @@ * xen_pvh and pvh_bootparams need to live in data segment since they * are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index 7e8145b33997..ef6481a83768 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -21,6 +21,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..4b4e9cc78b8a --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,11 @@ +#include + +/* + * PVH variables. + * + * The variables xen_pvh and pvh_bootparams need to live in the data segment + * since they are used after startup_{32|64} is invoked, which will clear the + * .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[RFC PATCH v4 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/pvh.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index b56cb5e7d6ac..2d7a7f4958cb 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 4b4e9cc78b8a..833c441a20df 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -9,3 +14,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[RFC PATCH v4 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/pvh.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 2d7a7f4958cb..6e9f6a6e97b3 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -7,9 +7,6 @@ #include #include -#include -#include - #include #include @@ -24,21 +21,24 @@ struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; unsigned int pvh_start_info_sz = sizeof(pvh_start_info); +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 833c441a20df..3a830caef8ee 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -25,3 +30,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v6 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 14 +++ arch/x86/kernel/head_64.S | 2 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 138 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} | 0 arch/x86/xen/Kconfig| 3 +- arch/x86/xen/Makefile | 2 - arch/x86/xen/enlighten_pvh.c| 94 +++- include/xen/interface/hvm/start_info.h | 63 ++- 11 files changed, 242
[PATCH v6 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 10 ++ 3 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index aa1c6a6831a9..74ff1c3d2789 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -17,10 +17,9 @@ /* * PVH variables. * - * xen_pvh pvh_bootparams and pvh_start_info need to live in data segment - * since they are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams and pvh_start_info need to live in the data segment since + * they are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info __attribute__((section(".data"))); diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..c5409c1f259f --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,10 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[PATCH v6 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74ff1c3d2789..edcff7de0529 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -80,26 +80,38 @@ static void __init init_pvh_bootparams(void) x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c5409c1f259f..08fc63d14ae5 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -8,3 +13,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v6 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- include/xen/interface/hvm/start_info.h | 63 +- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..50af9ea2ff1e 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,51 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest. See XEN_HVM_MEMMAP_TYPE_* values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +133,13 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +/* All following fields only present in version 1 and newer */ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Value will be zero if there is no memory */ +/* map being provided. */ +uint32_t reserved; /* Must be zero. */ }; struct hvm_modlist_entry { @@ -95,4 +149,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr;
[PATCH v6 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Suggested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 43 +-- 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e3b836d7ad09..1e6d83e181b5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -787,6 +787,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index efbceba8db4f..815a09ad625c 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,6 +8,9 @@ #include #include +#include +#include + #include #include @@ -40,11 +43,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -75,7 +95,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } @@ -90,13 +110,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) +
[PATCH v6 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index edcff7de0529..efbceba8db4f 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,9 +8,6 @@ #include #include -#include -#include - #include #include @@ -30,21 +27,24 @@ static u64 pvh_get_root_pointer(void) return pvh_start_info.rsdp_paddr; } +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 08fc63d14ae5..00658d4bc4f4 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -24,3 +29,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v6 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other than Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kconfig | 6 ++ arch/x86/kernel/head_64.S | 2 +- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 27fede438959..e3b836d7ad09 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -781,6 +781,12 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 48385c1074a5..d83f2b110b47 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -385,7 +385,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + L4_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index c1f98f32c45f..5fccee76f44d 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -74,6 +74,7 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI + select PVH def_bool n -- 2.16.1
[PATCH v6 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 65ab509e4a42..52afae73beab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15189,6 +15189,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v6: * Addressed issues caught by the kbuild test robot: - Restored an #include line that had been dropped by mistake (patch 4) - Removed a pair of #include lines that were no longer needed in a common code file and causing problems for certain 32-bit configs (patchs 4 and 7) Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 14 +++ arch/x86/kernel/head_64.S | 2 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 136 arch/x86/{xen/xen-pvh.S
[PATCH v7 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other than Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kconfig | 6 ++ arch/x86/kernel/head_64.S | 2 +- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d234cca296db..8511d419e39f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -781,6 +781,12 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 48385c1074a5..d83f2b110b47 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -385,7 +385,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + L4_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index c1f98f32c45f..5fccee76f44d 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -74,6 +74,7 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI + select PVH def_bool n -- 2.16.1
[PATCH v7 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 7bb2e9595f14..0b816f588fe1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15385,6 +15385,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v7 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- include/xen/interface/hvm/start_info.h | 63 +- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..50af9ea2ff1e 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,51 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest. See XEN_HVM_MEMMAP_TYPE_* values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +133,13 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +/* All following fields only present in version 1 and newer */ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Value will be zero if there is no memory */ +/* map being provided. */ +uint32_t reserved; /* Must be zero. */ }; struct hvm_modlist_entry { @@ -95,4 +149,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr;
[PATCH v7 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Suggested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 42 +-- 4 files changed, 42 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8511d419e39f..26fef538d3ef 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -787,6 +787,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index c42a9f36ee9c..0c7f570d3c16 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,6 +8,8 @@ #include #include +#include + #include /* @@ -39,11 +41,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -74,7 +93,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } @@ -89,13 +108,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init();
[PATCH v7 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 29 ++--- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 15 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index edcff7de0529..c42a9f36ee9c 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,10 +8,6 @@ #include #include -#include -#include - -#include #include /* @@ -30,21 +26,24 @@ static u64 pvh_get_root_pointer(void) return pvh_start_info.rsdp_paddr; } +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index bb5784f354b8..0141dd1d21e2 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,11 +1,16 @@ #include +#include + #include #include +#include #include #include +#include + /* * PVH variables. * @@ -25,3 +30,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v7 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 20 +++- 2 files changed, 39 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74ff1c3d2789..edcff7de0529 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -80,26 +80,38 @@ static void __init init_pvh_bootparams(void) x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 313fe499065e..bb5784f354b8 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,10 @@ -#include +#include + +#include +#include + +#include +#include /* * PVH variables. @@ -7,3 +13,15 @@ * after startup_{32|64} is invoked, which will clear the .bss segment. */ bool xen_pvh __attribute__((section(".data"))) = 0; + +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v7 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 9 + 3 files changed, 12 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index aa1c6a6831a9..74ff1c3d2789 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -17,10 +17,9 @@ /* * PVH variables. * - * xen_pvh pvh_bootparams and pvh_start_info need to live in data segment - * since they are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams and pvh_start_info need to live in the data segment since + * they are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info __attribute__((section(".data"))); diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..313fe499065e --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,9 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; -- 2.16.1
Re: [PATCH v5 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
On 3/20/2018 12:23 PM, Randy Dunlap wrote: Hi, On 03/20/2018 12:18 PM, Maran Wilson wrote: In order to pave the way for hypervisors other then Xen to use the PVH than entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kconfig | 7 +++ arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..58831320b5d2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,13 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + def_bool n You don't need two (2) "bool"s here. And 'n' is already the default, so just drop the second line. + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS Hi Randy, Will make both changes. Thanks, -Maran
[PATCH v5 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. Changes from v4: * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg02333.html Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 15 +++ arch/x86/kernel/head_64.S | 4 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 130 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} | 0 arch/x86/xen/Kconfig| 3 +- arch/x86/xen/Makefile | 2 - arch/x86/xen/enlighten_pvh.c| 86 include/xen/interface/hvm/start_info.h | 65 +++- 11 files changed, 238 insertions(+), 75 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile create mode 100644 arch/x86/platform/pvh/enlighten.c rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) -- 2.16.1
[PATCH v5 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other then Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/Kconfig | 7 +++ arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..58831320b5d2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,13 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 0f545b3cf926..fc9f678c6413 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -41,7 +41,7 @@ #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_START_KERNEL = pgd_index(__START_KERNEL_map) #endif @@ -387,7 +387,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + PGD_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n -- 2.16.1
[PATCH v5 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 10 ++ 3 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 436c4f003e17..74c0a711ebe7 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -16,10 +16,9 @@ /* * PVH variables. * - * xen_pvh and pvh_bootparams need to live in data segment since they - * are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams needs to live in the data segment since it is + * used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..c5409c1f259f --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,10 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[PATCH v5 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- include/xen/interface/hvm/start_info.h | 65 +- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..d491f2d89393 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,52 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. See XEN_HVM_MEMMAP_TYPE_* + *|| values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +134,14 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. Only present in */ +/* version 1 and newer of the structure */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Only present in version 1 and newer of*/ +/* the structure. Value will be zero if */ +/* there is no memory map being provided.*/ +uint32_t reserved; /* Must be zero for Version 1. */ }; struct hvm_modlist_entry { @@ -95,4 +151,11 @@ stru
[PATCH v5 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 93a12af4f180..58a836f39ad4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15210,6 +15210,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v5 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74c0a711ebe7..b463ee30517a 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c5409c1f259f..08fc63d14ae5 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -8,3 +13,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v5 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Reviewed-by: Juergen Gross <jgr...@suse.com> --- arch/x86/platform/pvh/enlighten.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index b463ee30517a..347ecb1860d5 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -7,9 +7,6 @@ #include #include -#include -#include - #include #include @@ -24,21 +21,24 @@ struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; unsigned int pvh_start_info_sz = sizeof(pvh_start_info); +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 08fc63d14ae5..00658d4bc4f4 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -24,3 +29,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v5 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> Suggested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrov...@oracle.com> --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 43 +-- 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 58831320b5d2..74ad956ee0f6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -798,6 +798,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 347ecb1860d5..433f586d8302 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -7,6 +7,9 @@ #include #include +#include +#include + #include #include @@ -34,11 +37,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -69,7 +89,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -82,13 +102,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -97,13 +114,17 @@ static void h
Re: [RFC PATCH v4 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
On 3/1/2018 7:17 AM, Paolo Bonzini wrote: On 01/03/2018 16:02, Boris Ostrovsky wrote: On 02/28/2018 01:27 PM, Maran Wilson wrote: diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..fa7cd0305125 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,14 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + depends on KVM_GUEST || XEN Not sure about XEN part. PVH is selected by XEN_PVH for Xen. What about introducing KVM_GUEST_PVH that will select PVH and then drop dependency here? That is, "config KVM_GUEST_PVH" "depends on KVM_GUEST" "select PVH". Sounds good to me. OK, will do. Thanks, -Maran Paolo -boris + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n
Re: [RFC PATCH v4 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common code
On 3/1/2018 8:05 AM, Boris Ostrovsky wrote: On 02/28/2018 01:28 PM, Maran Wilson wrote: We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> --- arch/x86/pvh.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index b56cb5e7d6ac..2d7a7f4958cb 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); I think this should be printk (or, more precisely, this should not be xen_raw_printk()): we are here because we are *not* a Xen guest and so Xen-specific printk will not work. (and the same is true for the next patch where weak mem_map_via_hcall() is added). Actually I left that xen_raw_printk() statement in on purpose. It's also possible that some future developer accidentally drops or hides the strong version of the routine even when CONFIG_XEN (and CONFIG_HVC_XEN) is enabled. In that situation, this error message might prove helpful in quickly identifying the problem when he or she attempts to boot a Xen guest. And in situations where CONFIG_XEN is disabled or someone is booting a non xen guest, the statement simply becomes a nop, so no harm is done. And also, I believe this code is far too early for normal printk() statements to work so switching to that won't buy us anything. Thanks, -Maran -boris + BUG(); +}
Re: [RFC PATCH v4 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
On 2/28/2018 11:41 PM, Jan Beulich wrote: Juergen Gross <jgr...@suse.com> 03/01/18 8:29 AM >>> On 28/02/18 19:28, Maran Wilson wrote: The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson <maran.wil...@oracle.com> I'm fine with this, but we need this change being accepted by the Xen community first. So an Ack from Jan or Andrew is required as the same change should be done on Xen side. And for an ack to be given I continue to demand that a patch be sent against the Xen tree. That said, the change looks fine to me now (as indicated before). Yes, I plan to send that out against the Xen tree shortly. Thanks, -Maran Jan
Re: [Xen-devel] [RFC PATCH v4 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
On 2/28/2018 1:35 PM, Paolo Bonzini wrote: On 28/02/2018 22:08, Konrad Rzeszutek Wilk wrote: +obj-$(CONFIG_XEN_PVH) += pvh.o +obj-$(CONFIG_XEN_PVH) += pvh-head.o + Probably a better place for these would be arch/x86/platform/pvh/{enlighten.c,head.S}. (Just because there are no .c or .S files in arch/x86). Sounds good. Will make that change. Thanks, -Maran Maybe Xen ought to be moved under arch/x86/platform too. Paolo
[RFC PATCH v4 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson --- arch/x86/pvh.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 2d7a7f4958cb..6e9f6a6e97b3 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -7,9 +7,6 @@ #include #include -#include -#include - #include #include @@ -24,21 +21,24 @@ struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; unsigned int pvh_start_info_sz = sizeof(pvh_start_info); +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 833c441a20df..3a830caef8ee 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -25,3 +30,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[RFC PATCH v4 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson --- arch/x86/pvh.c | 1 - arch/x86/xen/Makefile| 1 + arch/x86/xen/enlighten_pvh.c | 11 +++ 3 files changed, 12 insertions(+), 1 deletion(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 436c4f003e17..b56cb5e7d6ac 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -19,7 +19,6 @@ * xen_pvh and pvh_bootparams need to live in data segment since they * are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index 7e8145b33997..ef6481a83768 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -21,6 +21,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..4b4e9cc78b8a --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,11 @@ +#include + +/* + * PVH variables. + * + * The variables xen_pvh and pvh_bootparams need to live in the data segment + * since they are used after startup_{32|64} is invoked, which will clear the + * .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[RFC PATCH v4 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other then Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson --- arch/x86/Kconfig | 8 arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..fa7cd0305125 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,14 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + depends on KVM_GUEST || XEN + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 0f545b3cf926..fc9f678c6413 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -41,7 +41,7 @@ #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_START_KERNEL = pgd_index(__START_KERNEL_map) #endif @@ -387,7 +387,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + PGD_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n -- 2.16.1
[RFC PATCH v4 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson --- MAINTAINERS | 1 + arch/x86/Kbuild | 3 +++ arch/x86/{xen/xen-pvh.S => pvh-head.S} | 0 arch/x86/{xen/enlighten_pvh.c => pvh.c} | 0 arch/x86/xen/Makefile | 2 -- 5 files changed, 4 insertions(+), 2 deletions(-) rename arch/x86/{xen/xen-pvh.S => pvh-head.S} (100%) rename arch/x86/{xen/enlighten_pvh.c => pvh.c} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 93a12af4f180..dc89f3a279bd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15210,6 +15210,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/*pvh* F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..a4e5e3d348dc 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,9 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += pvh.o +obj-$(CONFIG_XEN_PVH) += pvh-head.o + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/pvh-head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/pvh-head.S diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/pvh.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/pvh.c diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..7e8145b33997 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -21,7 +21,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +32,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[RFC PATCH v4 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson --- arch/x86/pvh.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index b56cb5e7d6ac..2d7a7f4958cb 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 4b4e9cc78b8a..833c441a20df 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -9,3 +14,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[RFC PATCH v4 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
Sorry for the delay between this version and the last -- it was mostly due to holidays and everyone being focused on security bug mitigation issues. Here are the links to the previous email threads in case it is helpful: V3: https://lkml.org/lkml/2017/12/12/1230 V2: https://lkml.org/lkml/2017/12/7/1624 V1: https://lkml.org/lkml/2017/11/28/1280 Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common code xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS| 1 + arch/x86/Kbuild| 3 + arch/x86/Kconfig | 8 ++ arch/x86/kernel/head_64.S | 4 +- arch/x86/pvh-head.S| 161 +++ arch/x86/pvh.c | 130 ++ arch/x86/xen/Kconfig | 3 +- arch/x86/xen/Makefile | 1 - arch/x86/xen/enlighten_pvh.c | 87 +++- arch/x86/xen/xen-pvh.S | 161 --- include/xen/interface/hvm/start_info.h | 50 ++- 11 files changed, 374 insertions(+), 235 deletions(-)
[RFC PATCH v4 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson --- arch/x86/Kbuild | 4 ++-- arch/x86/pvh.c | 43 --- 2 files changed, 34 insertions(+), 13 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index a4e5e3d348dc..e9dc0f1c9d32 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,8 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += pvh.o -obj-$(CONFIG_XEN_PVH) += pvh-head.o +obj-$(CONFIG_PVH) += pvh.o +obj-$(CONFIG_PVH) += pvh-head.o # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index 6e9f6a6e97b3..97042d11342f 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -7,6 +7,9 @@ #include #include +#include +#include + #include #include @@ -34,11 +37,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -69,7 +89,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -82,13 +102,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -97,13 +114,17 @@ static void hypervisor_specific_init(void) */ void __init xen_prepare_pvh(void) { + + u32 msr = xen_cpuid_base(); + bool xen_guest = !!msr; + if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - hypervisor_specific_init(); + hypervisor_specific_init(xen_guest); - init_pvh_bootparams(); + init_pvh_bootparams(xen_guest); } -- 2.16.1
[RFC PATCH v4 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson --- include/xen/interface/hvm/start_info.h | 50 +- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..80cfbd35c1af 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,10 +71,34 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 @@ -86,6 +119,14 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +uint64_t memmap_paddr; /* Physical address of an array of */ + /* hvm_memmap_table_entry. Only present in */ + /* version 1 and newer of the structure */ +uint32_t memmap_entries; /* Number of entries in the memmap table.*/ + /* Only present in version 1 and newer of*/ + /* the structure. Value will be zero if */ + /* there is no memory map being provided.*/ +uint32_t reserved; }; struct hvm_modlist_entry { @@ -95,4 +136,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region */ +uint64_t size; /* Size of the memory region in bytes*/ +uint32_t type; /* Mapping type */ +uint32_t reserved; +}; + #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */ -- 2.16.1
Re: [RFC PATCH v4 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
On 2/28/2018 11:41 PM, Jan Beulich wrote: Juergen Gross 03/01/18 8:29 AM >>> On 28/02/18 19:28, Maran Wilson wrote: The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson I'm fine with this, but we need this change being accepted by the Xen community first. So an Ack from Jan or Andrew is required as the same change should be done on Xen side. And for an ack to be given I continue to demand that a patch be sent against the Xen tree. That said, the change looks fine to me now (as indicated before). Yes, I plan to send that out against the Xen tree shortly. Thanks, -Maran Jan
Re: [RFC PATCH v4 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
On 3/1/2018 7:17 AM, Paolo Bonzini wrote: On 01/03/2018 16:02, Boris Ostrovsky wrote: On 02/28/2018 01:27 PM, Maran Wilson wrote: diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..fa7cd0305125 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,14 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + depends on KVM_GUEST || XEN Not sure about XEN part. PVH is selected by XEN_PVH for Xen. What about introducing KVM_GUEST_PVH that will select PVH and then drop dependency here? That is, "config KVM_GUEST_PVH" "depends on KVM_GUEST" "select PVH". Sounds good to me. OK, will do. Thanks, -Maran Paolo -boris + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n
Re: [Xen-devel] [RFC PATCH v4 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
On 2/28/2018 1:35 PM, Paolo Bonzini wrote: On 28/02/2018 22:08, Konrad Rzeszutek Wilk wrote: +obj-$(CONFIG_XEN_PVH) += pvh.o +obj-$(CONFIG_XEN_PVH) += pvh-head.o + Probably a better place for these would be arch/x86/platform/pvh/{enlighten.c,head.S}. (Just because there are no .c or .S files in arch/x86). Sounds good. Will make that change. Thanks, -Maran Maybe Xen ought to be moved under arch/x86/platform too. Paolo
Re: [RFC PATCH v4 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common code
On 3/1/2018 8:05 AM, Boris Ostrovsky wrote: On 02/28/2018 01:28 PM, Maran Wilson wrote: We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson --- arch/x86/pvh.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/pvh.c b/arch/x86/pvh.c index b56cb5e7d6ac..2d7a7f4958cb 100644 --- a/arch/x86/pvh.c +++ b/arch/x86/pvh.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); I think this should be printk (or, more precisely, this should not be xen_raw_printk()): we are here because we are *not* a Xen guest and so Xen-specific printk will not work. (and the same is true for the next patch where weak mem_map_via_hcall() is added). Actually I left that xen_raw_printk() statement in on purpose. It's also possible that some future developer accidentally drops or hides the strong version of the routine even when CONFIG_XEN (and CONFIG_HVC_XEN) is enabled. In that situation, this error message might prove helpful in quickly identifying the problem when he or she attempts to boot a Xen guest. And in situations where CONFIG_XEN is disabled or someone is booting a non xen guest, the statement simply becomes a nop, so no harm is done. And also, I believe this code is far too early for normal printk() statements to work so switching to that won't buy us anything. Thanks, -Maran -boris + BUG(); +}
[PATCH v5 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. Changes from v4: * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg02333.html Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 15 +++ arch/x86/kernel/head_64.S | 4 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 130 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} | 0 arch/x86/xen/Kconfig| 3 +- arch/x86/xen/Makefile | 2 - arch/x86/xen/enlighten_pvh.c| 86 include/xen/interface/hvm/start_info.h | 65 +++- 11 files changed, 238 insertions(+), 75 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile create mode 100644 arch/x86/platform/pvh/enlighten.c rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) -- 2.16.1
[PATCH v5 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other then Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson --- arch/x86/Kconfig | 7 +++ arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..58831320b5d2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,13 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + def_bool n + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 0f545b3cf926..fc9f678c6413 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -41,7 +41,7 @@ #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_START_KERNEL = pgd_index(__START_KERNEL_map) #endif @@ -387,7 +387,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + PGD_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index f605825a04ab..021c8591c3c0 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -77,8 +77,9 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI # Pre-built page tables are not ready to handle 5-level paging. depends on !X86_5LEVEL + select PVH def_bool n -- 2.16.1
[PATCH v5 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 10 ++ 3 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 436c4f003e17..74c0a711ebe7 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -16,10 +16,9 @@ /* * PVH variables. * - * xen_pvh and pvh_bootparams need to live in data segment since they - * are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams needs to live in the data segment since it is + * used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..c5409c1f259f --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,10 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[PATCH v5 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index b463ee30517a..347ecb1860d5 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -7,9 +7,6 @@ #include #include -#include -#include - #include #include @@ -24,21 +21,24 @@ struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info; unsigned int pvh_start_info_sz = sizeof(pvh_start_info); +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 08fc63d14ae5..00658d4bc4f4 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -24,3 +29,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v5 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74c0a711ebe7..b463ee30517a 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -72,26 +72,38 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c5409c1f259f..08fc63d14ae5 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -8,3 +13,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v5 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson --- include/xen/interface/hvm/start_info.h | 65 +- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..d491f2d89393 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,52 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. See XEN_HVM_MEMMAP_TYPE_* + *|| values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +134,14 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. Only present in */ +/* version 1 and newer of the structure */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Only present in version 1 and newer of*/ +/* the structure. Value will be zero if */ +/* there is no memory map being provided.*/ +uint32_t reserved; /* Must be zero for Version 1. */ }; struct hvm_modlist_entry { @@ -95,4 +151,11 @@ struct hvm_modlist_entry { ui
[PATCH v5 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 93a12af4f180..58a836f39ad4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15210,6 +15210,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v5 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson Suggested-by: Konrad Rzeszutek Wilk Suggested-by: Boris Ostrovsky Tested-by: Boris Ostrovsky --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 43 +-- 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 58831320b5d2..74ad956ee0f6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -798,6 +798,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 347ecb1860d5..433f586d8302 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -7,6 +7,9 @@ #include #include +#include +#include + #include #include @@ -34,11 +37,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -69,7 +89,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -82,13 +102,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -97,13 +114,17 @@ static void hypervisor_specific_init(void) */ void __init xen_prepare_pvh(void) { + + u32 msr = xen_cpuid_base();
Re: [PATCH v5 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
On 3/20/2018 12:23 PM, Randy Dunlap wrote: Hi, On 03/20/2018 12:18 PM, Maran Wilson wrote: In order to pave the way for hypervisors other then Xen to use the PVH than entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson --- arch/x86/Kconfig | 7 +++ arch/x86/kernel/head_64.S | 4 ++-- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index eb7f43f23521..58831320b5d2 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -791,6 +791,13 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + def_bool n You don't need two (2) "bool"s here. And 'n' is already the default, so just drop the second line. + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS Hi Randy, Will make both changes. Thanks, -Maran
Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 4/18/2018 1:11 AM, Linus Walleij wrote: I wonder why I am starting to get CCed on Xen patches all of a sudden. I happened to run into Jürgen at a conference only last weekend, but I still don't know anything whatsoever about Xen or how it works. If get_maintainer.pl has started to return my name on this stuff I really want to know why :/ It has nothing to do with Xen actually. But for some reason, the get_maintainer.pl script is returning your name for any patch that modifies the MAINTAINERS file. Although why that is the case wasn't clear to me based on a quick look at both those files. -Maran Yours, Linus Walleij
Re: [PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
Friendly ping. I am hopeful one of the x86 and/or KVM maintainers has a few cycles to spare to look this over. And thanks to everyone who has helped thus far by providing valuable feedback and reviewing. https://lkml.org/lkml/2018/4/16/1002 Thanks, -Maran On 4/16/2018 4:09 PM, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v6: * Addressed issues caught by the kbuild test robot: - Restored an #include line that had been dropped by mistake (patch 4) - Removed a pair of #include lines that were no longer needed in a common code file and causing problems for certain 32-bit configs (patchs 4 and 7) Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS
[RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Even though the code is still PoC quality, I'm sending this as an RFC now since there are a number of different ways the specific implementation details can be handled. I chose a shared code path for Xen and KVM guests but could just as easily create a separate code path that is advertised by a different ELF note for KVM. There also seems to be some flexibility in how the e820 table data is passed and how (or if) it should be identified as e820 data. As a starting point, I've chosen the options that seem to result in the smallest patch with minimal to no changes required of the x86/HVM direct boot ABI. --- arch/x86/xen/enlighten_pvh.c | 74 1 file changed, 55 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..d93f711 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); + if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else if (pvh_start_info.nr_modules > 1) { + /* The second module should be the e820 data for KVM guests */ + struct hvm_modlist_entry *modaddr; + char e820_sig[] = "e820 data"; + struct boot_e820_entry *ep; + struct e820_table *tp; + char *cmdline_str; + int idx; + + modaddr = __va(pvh_start_info.modlist_paddr + + sizeof(struct hvm_modlist_entry)); + cmdline_str = __va(modaddr->cmdline_paddr); + + if ((modaddr->cmdline_paddr) && + (!strncmp(e820_sig, cmdline_str, sizeof(e820_sig { + tp = __va(modaddr->paddr); + ep = (struct boot_e820_entry *)tp->entries; + + pvh_bootparams.e820_entries = tp->nr_entries; + + for (idx = 0; idx < tp->nr_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -55,8 +80,9 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.e820_table[pvh_bootparams.e820_entries].type = E820_TYPE_RESERVED; pvh_bootparams.e820_entries++; - } else + } else if (xen_guest) { xen_raw_printk("Warning: Can fit ISA range into e820\n"); + } pvh_bootparams.hdr.cmd_line_ptr = pvh_start_info.cmdline_paddr; @@ -76,7 +102,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -85,22 +111,32 @@ static void __init init_pvh_bootparams(void) */ void __init xen_prepare_pvh(void) { - u32 msr; + + u32 msr = xen_cpuid_base(); u64 pfn; + bool xen_guest = msr ? true : false; if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { - xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", -
[PATCH v6 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 14 +++ arch/x86/kernel/head_64.S | 2 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 138 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} | 0 arch/x86/xen/Kconfig| 3 +- arch/x86/xen/Makefile | 2 - arch/x86/xen/enlighten_pvh.c| 94 +++- include/xen/interface/hvm/start_info.h | 63 ++- 11 files changed, 242
[PATCH v6 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other than Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson --- arch/x86/Kconfig | 6 ++ arch/x86/kernel/head_64.S | 2 +- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 27fede438959..e3b836d7ad09 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -781,6 +781,12 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 48385c1074a5..d83f2b110b47 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -385,7 +385,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + L4_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index c1f98f32c45f..5fccee76f44d 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -74,6 +74,7 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI + select PVH def_bool n -- 2.16.1
[PATCH v6 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 65ab509e4a42..52afae73beab 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15189,6 +15189,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v6 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 28 ++-- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 14 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index edcff7de0529..efbceba8db4f 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,9 +8,6 @@ #include #include -#include -#include - #include #include @@ -30,21 +27,24 @@ static u64 pvh_get_root_pointer(void) return pvh_start_info.rsdp_paddr; } +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 08fc63d14ae5..00658d4bc4f4 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,10 +1,15 @@ #include +#include + #include +#include #include #include +#include + /* * PVH variables. * @@ -24,3 +29,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[PATCH v6 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 18 +- 2 files changed, 37 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74ff1c3d2789..edcff7de0529 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -80,26 +80,38 @@ static void __init init_pvh_bootparams(void) x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index c5409c1f259f..08fc63d14ae5 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,9 @@ -#include +#include + +#include + +#include +#include /* * PVH variables. @@ -8,3 +13,14 @@ */ bool xen_pvh __attribute__((section(".data"))) = 0; +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v6 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 10 ++ 3 files changed, 13 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index aa1c6a6831a9..74ff1c3d2789 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -17,10 +17,9 @@ /* * PVH variables. * - * xen_pvh pvh_bootparams and pvh_start_info need to live in data segment - * since they are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams and pvh_start_info need to live in the data segment since + * they are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info __attribute__((section(".data"))); diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..c5409c1f259f --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,10 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; + -- 2.16.1
[PATCH v6 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson --- include/xen/interface/hvm/start_info.h | 63 +- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..50af9ea2ff1e 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,51 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest. See XEN_HVM_MEMMAP_TYPE_* values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +133,13 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +/* All following fields only present in version 1 and newer */ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Value will be zero if there is no memory */ +/* map being provided. */ +uint32_t reserved; /* Must be zero. */ }; struct hvm_modlist_entry { @@ -95,4 +149,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region
[PATCH v6 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson Suggested-by: Konrad Rzeszutek Wilk Suggested-by: Boris Ostrovsky Tested-by: Boris Ostrovsky --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 43 +-- 4 files changed, 43 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e3b836d7ad09..1e6d83e181b5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -787,6 +787,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index efbceba8db4f..815a09ad625c 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,6 +8,9 @@ #include #include +#include +#include + #include #include @@ -40,11 +43,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -75,7 +95,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } @@ -90,13 +110,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -105,13 +122,17 @@ static void hypervisor_specific_init(void) */ void __init xen_prepare_pvh(void) { +
[PATCH v7 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch series would enable Qemu to use that same entry point for booting KVM guests. Changes from v6: * Addressed issues caught by the kbuild test robot: - Restored an #include line that had been dropped by mistake (patch 4) - Removed a pair of #include lines that were no longer needed in a common code file and causing problems for certain 32-bit configs (patchs 4 and 7) Changes from v5: * The interface changes to the x86/HVM start info layout have now been accepted into the Xen tree. * Rebase and merge upstream PVH file changes. * (Patch 6) Synced up to the final version of the header file that was acked and pulled into the Xen tree. * (Patch 1) Fixed typo and removed redundant "def_bool n" line. Changes from v4: Note: I've withheld Juergen's earlier "Reviewed-by" tags from patches 1 and 7 since there were minor changes (mostly just addition of CONFIG_KVM_GUEST_PVH as requested) that came afterwards. * Changed subject prefix from RFC to PATCH * Added CONFIG_KVM_GUEST_PVH as suggested * Relocated the PVH common files to arch/x86/platform/pvh/{enlighten.c,head.S} * Realized I also needed to move the objtool override for those files * Updated a few code comments per reviewer feedback * Sent out a patch of the hvm_start_info struct changes against the Xen tree since that is the canonical copy of the header. Discussions on that thread have resulted in some (non-functional) updates to start_info.h (patch 6/7) and those changes are reflected here as well in order to keep the files in sync. The header file has since been ack'ed for the Xen tree by Jan Beulich. Changes from v3: * Implemented Juergen's suggestion for refactoring and moving the PVH code so that CONFIG_XEN is no longer required for booting KVM guests via the PVH entry point. Functionally, nothing has changed from V3 really, but the patches look completely different now because of all the code movement and refactoring. Some of these patches can be combined, but I've left them very small in some cases to make the refactoring and code movement easier to review. My approach for refactoring has been to create a PVH entry layer that still has understanding and knowledge about Xen vs non-Xen guest types so that it can make run time decisions to handle either case, as opposed to going all the way and re-writing it to be a completely hypervisor agnostic and architecturally pure layer that is separate from guest type details. The latter seemed a bit overkill in this situation. And I've handled the complexity of having to support Qemu/KVM boot of kernels compiled with or without CONFIG_XEN via a pair of xen specific __weak routines that can be overridden in kernels that support Xen guests. Importantly, the __weak routines are for xen specific code only (not generic "guest type" specific code) so there is no clashing between xen version of the strong routine and, say, a KVM version of the same routine. But I'm sure there are many ways to skin this cat, so I'm open to alternate suggestions if there is a compelling reason for not using __weak in this situation. Changes from v2: * All structures (including memory map table entries) are padded and aligned to an 8 byte boundary. * Removed the "packed" attributes and made changes to comments as suggested by Jan. Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Maran Wilson (7): xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH xen/pvh: Move PVH entry code out of Xen specific tree xen/pvh: Create a new file for Xen specific PVH code xen/pvh: Move Xen specific PVH VM initialization out of common file xen/pvh: Move Xen code for getting mem map via hcall out of common file xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point MAINTAINERS | 1 + arch/x86/Kbuild | 2 + arch/x86/Kconfig| 14 +++ arch/x86/kernel/head_64.S | 2 +- arch/x86/platform/pvh/Makefile | 5 + arch/x86/platform/pvh/enlighten.c | 136 arch/x86/{xen/xen-pvh.S
[PATCH v7 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH
In order to pave the way for hypervisors other than Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson --- arch/x86/Kconfig | 6 ++ arch/x86/kernel/head_64.S | 2 +- arch/x86/xen/Kconfig | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d234cca296db..8511d419e39f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -781,6 +781,12 @@ config KVM_GUEST underlying device model, the host provides the guest with timing infrastructure such as time of day, and system time +config PVH + bool "Support for running PVH guests" + ---help--- + This option enables the PVH entry point for guest virtual machines + as specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 48385c1074a5..d83f2b110b47 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -385,7 +385,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH) NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .orginit_top_pgt + L4_PAGE_OFFSET*8, 0 diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index c1f98f32c45f..5fccee76f44d 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -74,6 +74,7 @@ config XEN_DEBUG_FS Enabling this option may incur a significant performance overhead. config XEN_PVH - bool "Support for running as a PVH guest" + bool "Support for running as a Xen PVH guest" depends on XEN && XEN_PVHVM && ACPI + select PVH def_bool n -- 2.16.1
[PATCH v7 2/7] xen/pvh: Move PVH entry code out of Xen specific tree
Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk --- MAINTAINERS| 1 + arch/x86/Kbuild| 2 ++ arch/x86/platform/pvh/Makefile | 5 + arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} | 0 arch/x86/{xen/xen-pvh.S => platform/pvh/head.S}| 0 arch/x86/xen/Makefile | 3 --- 6 files changed, 8 insertions(+), 3 deletions(-) create mode 100644 arch/x86/platform/pvh/Makefile rename arch/x86/{xen/enlighten_pvh.c => platform/pvh/enlighten.c} (100%) rename arch/x86/{xen/xen-pvh.S => platform/pvh/head.S} (100%) diff --git a/MAINTAINERS b/MAINTAINERS index 7bb2e9595f14..0b816f588fe1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15385,6 +15385,7 @@ L: xen-de...@lists.xenproject.org (moderated for non-subscribers) T: git git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git S: Supported F: arch/x86/xen/ +F: arch/x86/platform/pvh/ F: drivers/*/xen-*front.c F: drivers/xen/ F: arch/x86/include/asm/xen/ diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 0038a2d10a7a..2089e4414300 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,6 +7,8 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ +obj-$(CONFIG_XEN_PVH) += platform/pvh/ + # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile new file mode 100644 index ..9fd25efcd2a3 --- /dev/null +++ b/arch/x86/platform/pvh/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 +OBJECT_FILES_NON_STANDARD_head.o := y + +obj-$(CONFIG_XEN_PVH) += enlighten.o +obj-$(CONFIG_XEN_PVH) += head.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/platform/pvh/enlighten.c similarity index 100% rename from arch/x86/xen/enlighten_pvh.c rename to arch/x86/platform/pvh/enlighten.c diff --git a/arch/x86/xen/xen-pvh.S b/arch/x86/platform/pvh/head.S similarity index 100% rename from arch/x86/xen/xen-pvh.S rename to arch/x86/platform/pvh/head.S diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index d83cb5478f54..f1b850607212 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y -OBJECT_FILES_NON_STANDARD_xen-pvh.o := y ifdef CONFIG_FUNCTION_TRACER # Do not profile debug and lowlevel utilities @@ -21,7 +20,6 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o -obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o @@ -33,4 +31,3 @@ obj-$(CONFIG_XEN_DEBUG_FS)+= debugfs.o obj-$(CONFIG_XEN_DOM0) += vga.o obj-$(CONFIG_SWIOTLB_XEN) += pci-swiotlb-xen.o obj-$(CONFIG_XEN_EFI) += efi.o -obj-$(CONFIG_XEN_PVH) += xen-pvh.o -- 2.16.1
[PATCH v7 4/7] xen/pvh: Move Xen specific PVH VM initialization out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 28 arch/x86/xen/enlighten_pvh.c | 20 +++- 2 files changed, 39 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index 74ff1c3d2789..edcff7de0529 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -80,26 +80,38 @@ static void __init init_pvh_bootparams(void) x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } +/* + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide the required override for this routine. + */ +void __init __weak xen_pvh_init(void) +{ + xen_raw_printk("Error: Missing xen PVH initialization\n"); + BUG(); +} + +/* + * When we add support for other hypervisors like Qemu/KVM, this routine can + * selectively invoke the appropriate initialization based on guest type. + */ +static void hypervisor_specific_init(void) +{ + xen_pvh_init(); +} + /* * This routine (and those that it might call) should not use * anything that lives in .bss since that segment will be cleared later. */ void __init xen_prepare_pvh(void) { - u32 msr; - u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + hypervisor_specific_init(); init_pvh_bootparams(); } diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 313fe499065e..bb5784f354b8 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,4 +1,10 @@ -#include +#include + +#include +#include + +#include +#include /* * PVH variables. @@ -7,3 +13,15 @@ * after startup_{32|64} is invoked, which will clear the .bss segment. */ bool xen_pvh __attribute__((section(".data"))) = 0; + +void __init xen_pvh_init(void) +{ + u32 msr; + u64 pfn; + + xen_pvh = 1; + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); +} -- 2.16.1
[PATCH v7 3/7] xen/pvh: Create a new file for Xen specific PVH code
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson --- arch/x86/platform/pvh/enlighten.c | 5 ++--- arch/x86/xen/Makefile | 1 + arch/x86/xen/enlighten_pvh.c | 9 + 3 files changed, 12 insertions(+), 3 deletions(-) create mode 100644 arch/x86/xen/enlighten_pvh.c diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index aa1c6a6831a9..74ff1c3d2789 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -17,10 +17,9 @@ /* * PVH variables. * - * xen_pvh pvh_bootparams and pvh_start_info need to live in data segment - * since they are used after startup_{32|64}, which clear .bss, are invoked. + * pvh_bootparams and pvh_start_info need to live in the data segment since + * they are used after startup_{32|64}, which clear .bss, are invoked. */ -bool xen_pvh __attribute__((section(".data"))) = 0; struct boot_params pvh_bootparams __attribute__((section(".data"))); struct hvm_start_info pvh_start_info __attribute__((section(".data"))); diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f1b850607212..ae5c6f1f0fe0 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ -20,6 +20,7 @@ obj-y := enlighten.o multicalls.o mmu.o irq.o \ obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o suspend_hvm.o obj-$(CONFIG_XEN_PV) += setup.o apic.o pmu.o suspend_pv.o \ p2m.o enlighten_pv.o mmu_pv.o +obj-$(CONFIG_XEN_PVH) += enlighten_pvh.o obj-$(CONFIG_EVENT_TRACING) += trace.o diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c new file mode 100644 index ..313fe499065e --- /dev/null +++ b/arch/x86/xen/enlighten_pvh.c @@ -0,0 +1,9 @@ +#include + +/* + * PVH variables. + * + * The variable xen_pvh needs to live in the data segment since it is used + * after startup_{32|64} is invoked, which will clear the .bss segment. + */ +bool xen_pvh __attribute__((section(".data"))) = 0; -- 2.16.1
[PATCH v7 6/7] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to pass information about the memory map to the guest. This would allow KVM guests to share the same entry point. Signed-off-by: Maran Wilson --- include/xen/interface/hvm/start_info.h | 63 +- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 648415976ead..50af9ea2ff1e 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,15 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Zero + *|| if there is no memory map being provided. Only + *|| present in version 1 and newer of the structure. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ * * The layout of each entry in the module structure is the following: * @@ -62,13 +71,51 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest. See XEN_HVM_MEMMAP_TYPE_* values below. + * 20 +| + *| reserved | + * 24 ++ + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB * boundary. + * + * Version numbers of the hvm_start_info structure have evolved like this: + * + * Version 0: Initial implementation. + * + * Version 1: Added the memmap_paddr/memmap_entries fields (plus 4 bytes of + * padding) to the end of the hvm_start_info struct. These new + * fields can be used to pass a memory map to the guest. The + * memory map is optional and so guests that understand version 1 + * of the structure must check that memmap_entries is non-zero + * before trying to read the memory map. */ #define XEN_HVM_START_MAGIC_VALUE 0x336ec578 +/* + * The values used in the type field of the memory map table entries are + * defined below and match the Address Range Types as defined in the "System + * Address Map Interfaces" section of the ACPI Specification. Please refer to + * section 15 in version 6.2 of the ACPI spec: http://uefi.org/specifications + */ +#define XEN_HVM_MEMMAP_TYPE_RAM 1 +#define XEN_HVM_MEMMAP_TYPE_RESERVED 2 +#define XEN_HVM_MEMMAP_TYPE_ACPI 3 +#define XEN_HVM_MEMMAP_TYPE_NVS 4 +#define XEN_HVM_MEMMAP_TYPE_UNUSABLE 5 +#define XEN_HVM_MEMMAP_TYPE_DISABLED 6 +#define XEN_HVM_MEMMAP_TYPE_PMEM 7 + /* * C representation of the x86/HVM start info layout. * @@ -86,6 +133,13 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ +/* All following fields only present in version 1 and newer */ +uint64_t memmap_paddr; /* Physical address of an array of */ +/* hvm_memmap_table_entry. */ +uint32_t memmap_entries;/* Number of entries in the memmap table.*/ +/* Value will be zero if there is no memory */ +/* map being provided. */ +uint32_t reserved; /* Must be zero. */ }; struct hvm_modlist_entry { @@ -95,4 +149,11 @@ struct hvm_modlist_entry { uint64_t reserved; }; +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region
[PATCH v7 7/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson Suggested-by: Konrad Rzeszutek Wilk Suggested-by: Boris Ostrovsky Tested-by: Boris Ostrovsky --- arch/x86/Kbuild | 2 +- arch/x86/Kconfig | 8 arch/x86/platform/pvh/Makefile| 4 ++-- arch/x86/platform/pvh/enlighten.c | 42 +-- 4 files changed, 42 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild index 2089e4414300..c625f57472f7 100644 --- a/arch/x86/Kbuild +++ b/arch/x86/Kbuild @@ -7,7 +7,7 @@ obj-$(CONFIG_KVM) += kvm/ # Xen paravirtualization support obj-$(CONFIG_XEN) += xen/ -obj-$(CONFIG_XEN_PVH) += platform/pvh/ +obj-$(CONFIG_PVH) += platform/pvh/ # Hyper-V paravirtualization support obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/ diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8511d419e39f..26fef538d3ef 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -787,6 +787,14 @@ config PVH This option enables the PVH entry point for guest virtual machines as specified in the x86/HVM direct boot ABI. +config KVM_GUEST_PVH + bool "Support for running as a KVM PVH guest" + depends on KVM_GUEST + select PVH + ---help--- + This option enables starting KVM guests via the PVH entry point as + specified in the x86/HVM direct boot ABI. + config KVM_DEBUG_FS bool "Enable debug information for KVM Guests in debugfs" depends on KVM_GUEST && DEBUG_FS diff --git a/arch/x86/platform/pvh/Makefile b/arch/x86/platform/pvh/Makefile index 9fd25efcd2a3..5dec5067c9fb 100644 --- a/arch/x86/platform/pvh/Makefile +++ b/arch/x86/platform/pvh/Makefile @@ -1,5 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 OBJECT_FILES_NON_STANDARD_head.o := y -obj-$(CONFIG_XEN_PVH) += enlighten.o -obj-$(CONFIG_XEN_PVH) += head.o +obj-$(CONFIG_PVH) += enlighten.o +obj-$(CONFIG_PVH) += head.o diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index c42a9f36ee9c..0c7f570d3c16 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,6 +8,8 @@ #include #include +#include + #include /* @@ -39,11 +41,28 @@ void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) BUG(); } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { memset(_bootparams, 0, sizeof(pvh_bootparams)); - mem_map_via_hcall(_bootparams); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct hvm_memmap_table_entry *ep; + int i; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (i = 0; i < pvh_bootparams.e820_entries ; i++, ep++) { + pvh_bootparams.e820_table[i].addr = ep->addr; + pvh_bootparams.e820_table[i].size = ep->size; + pvh_bootparams.e820_table[i].type = ep->type; + } + } else if (xen_guest) { + mem_map_via_hcall(_bootparams); + } else { + /* Non-xen guests are not supported by version 0 */ + BUG(); + } if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -74,7 +93,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; x86_init.acpi.get_root_pointer = pvh_get_root_pointer; } @@ -89,13 +108,10 @@ void __init __weak xen_pvh_init(void) BUG(); } -/* - * When we add support for other hypervisors like Qemu/KVM, this routine can - * selectively invoke the appropriate initialization based on guest type. - */ -static void hypervisor_specific_init(void) +static void hypervisor_specific_init(bool xen_guest) { - xen_pvh_init(); + if (xen_guest) + xen_pvh_init(); } /* @@ -104,13 +120,17 @@ static void hypervisor_specific_init(void) */ void __init xen_prepare_pvh(void) { +
[PATCH v7 5/7] xen/pvh: Move Xen code for getting mem map via hcall out of common file
We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson Reviewed-by: Juergen Gross --- arch/x86/platform/pvh/enlighten.c | 29 ++--- arch/x86/xen/enlighten_pvh.c | 20 2 files changed, 34 insertions(+), 15 deletions(-) diff --git a/arch/x86/platform/pvh/enlighten.c b/arch/x86/platform/pvh/enlighten.c index edcff7de0529..c42a9f36ee9c 100644 --- a/arch/x86/platform/pvh/enlighten.c +++ b/arch/x86/platform/pvh/enlighten.c @@ -8,10 +8,6 @@ #include #include -#include -#include - -#include #include /* @@ -30,21 +26,24 @@ static u64 pvh_get_root_pointer(void) return pvh_start_info.rsdp_paddr; } +/* + * Xen guests are able to obtain the memory map from the hypervisor via the + * HYPERVISOR_memory_op hypercall. + * If we are trying to boot a Xen PVH guest, it is expected that the kernel + * will have been configured to provide an override for this routine to do + * just that. + */ +void __init __weak mem_map_via_hcall(struct boot_params *ptr __maybe_unused) +{ + xen_raw_printk("Error: Could not find memory map\n"); + BUG(); +} + static void __init init_pvh_bootparams(void) { - struct xen_memory_map memmap; - int rc; - memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); - } - pvh_bootparams.e820_entries = memmap.nr_entries; + mem_map_via_hcall(_bootparams); if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index bb5784f354b8..0141dd1d21e2 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -1,11 +1,16 @@ #include +#include + #include #include +#include #include #include +#include + /* * PVH variables. * @@ -25,3 +30,18 @@ void __init xen_pvh_init(void) pfn = __pa(hypercall_page); wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); } + +void __init mem_map_via_hcall(struct boot_params *boot_params_p) +{ + struct xen_memory_map memmap; + int rc; + + memmap.nr_entries = ARRAY_SIZE(boot_params_p->e820_table); + set_xen_guest_handle(memmap.buffer, boot_params_p->e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + boot_params_p->e820_entries = memmap.nr_entries; +} -- 2.16.1
[RFC PATCH v2 0/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
Changes from v1: * Adopted Paolo's suggestion for defining a v2 PVH ABI that includes the e820 map instead of using the second module entry to pass the table. * Cleaned things up a bit to reduce the number of xen vs non-xen special cases. Juergen also had a suggestion to split the different hypervisor types early and use a common set of service functions instead of special casing xen_guest everywhere. There are certainly less special cases in this version of the patch, but if we still think it's important to split things up between common, Xen, and KVM components, then I would appreciate a suggestion on how best that can be done. Are we talking about just re-factoring functions in the existing file? Or do we need to go all the way and pull all the PVH entry code out of xen directories and find a home for it somewhere else so that we can use kernels built without CONFIG_XEN to start KVM guests via the PVH entry point. If the latter, any suggestions for which common files or directories I can move this stuff to? Maran Wilson (2): xen/pvh: Add memory map pointer to hvm_start_info struct KVM: x86: Allow Qemu/KVM to use PVH entry point arch/x86/xen/enlighten_pvh.c | 48 +--- include/xen/interface/hvm/start_info.h | 34 +++--- 2 files changed, 64 insertions(+), 18 deletions(-)
[RFC PATCH v2 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct
The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to efficiently pass information about the memory map to the guest. That way Xen PVH guests would not be forced to use a hypercall to get the information and would make it easier for KVM guests to share the PVH entry point. --- include/xen/interface/hvm/start_info.h | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) diff --git a/include/xen/interface/hvm/start_info.h b/include/xen/interface/hvm/start_info.h index 6484159..60206bb 100644 --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,12 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the memory map. Only present in + *|| version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + * 52 ++ * * The layout of each entry in the module structure is the following: * @@ -62,6 +68,17 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows and no + * padding is used between entries in the array: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping + * 16 ++ + *| type | E820_TYPE_xxx + * 20 +| + * * The address and sizes are always a 64bit little endian unsigned integer. * * NB: Xen on x86 will always try to place all the data below the 4GiB @@ -86,13 +103,24 @@ struct hvm_start_info { uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t rsdp_paddr;/* Physical address of the RSDP ACPI data*/ /* structure.*/ -}; +uint64_t memmap_paddr; /* Physical address of an array of */ + /* hvm_memmap_table_entry. Only present in */ + /* Ver 1 or later. For e820 mem map table. */ +uint32_t memmap_entries; /* Only present in Ver 1 or later. Number of */ + /* entries in the memmap table. */ +} __attribute__((packed)); struct hvm_modlist_entry { uint64_t paddr; /* Physical address of the module. */ uint64_t size; /* Size of the module in bytes. */ uint64_t cmdline_paddr; /* Physical address of the command line. */ uint64_t reserved; -}; +} __attribute__((packed)); + +struct hvm_memmap_table_entry { +uint64_t addr; /* Base address of the memory region */ +uint64_t size; /* Size of the memory region */ +uint32_t type; /* E820_TYPE_xxx of the memory region*/ +} __attribute__((packed)); #endif /* __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ */ -- 1.8.3.1
[RFC PATCH v2 2/2] KVM: x86: Allow Qemu/KVM to use PVH entry point
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. --- arch/x86/xen/enlighten_pvh.c | 48 ++-- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..f11fbfc 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,35 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + if ((pvh_start_info.version > 0) && (pvh_start_info.memmap_entries)) { + struct boot_e820_entry *ep; + int idx; + + ep = __va(pvh_start_info.memmap_paddr); + pvh_bootparams.e820_entries = pvh_start_info.memmap_entries; + + for (idx = 0; idx < pvh_bootparams.e820_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } else if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else { + xen_raw_printk("Error: Could not find memory map\n"); BUG(); } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -76,7 +90,7 @@ static void __init init_pvh_bootparams(void) * environment (i.e. hardware_subarch 0). */ pvh_bootparams.hdr.version = 0x212; - pvh_bootparams.hdr.type_of_loader = (9 << 4) | 0; /* Xen loader */ + pvh_bootparams.hdr.type_of_loader = ((xen_guest ? 0x9 : 0xb) << 4) | 0; } /* @@ -85,8 +99,10 @@ static void __init init_pvh_bootparams(void) */ void __init xen_prepare_pvh(void) { - u32 msr; + + u32 msr = xen_cpuid_base(); u64 pfn; + bool xen_guest = !!msr; if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", @@ -94,13 +110,15 @@ void __init xen_prepare_pvh(void) BUG(); } - xen_pvh = 1; + if (xen_guest) { + xen_pvh = 1; - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + msr = cpuid_ebx(msr + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - init_pvh_bootparams(); + x86_init.oem.arch_setup = xen_pvh_arch_setup; + } - x86_init.oem.arch_setup = xen_pvh_arch_setup; + init_pvh_bootparams(xen_guest); } -- 1.8.3.1
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
Just FYI: I sent out a v2 of this patch but in doing so I moved a few people from the "to" line to the "cc" line. For anyone who previously did not comment but still wanted to follow the discussion, here's the link to the v2 email: https://lkml.org/lkml/2017/12/7/1624 Thanks, -Maran On 12/1/2017 12:08 AM, Paolo Bonzini wrote: On 30/11/2017 19:23, Maran Wilson wrote: Are you saying the Linux PVH entry code (such as init_pvh_bootparams()) should use the fw_cfg interface to read the e820 memory map data and put it into the zeropage? Basically, keeping the patch very much like it already is, just extracting the e820 data via the fw_cfg interface instead of from the second module of start_info struct? Yes. If that is the case, I guess I'm a bit hesitant to throw the QEMU specific fw_cfg interface into the mix on the Linux PVH side when the existing PVH ABI already seems to contain an interface for passing modules/blobs to the guest. But if you feel there is a compelling reason to use the fw_cfg interface here, I'm happy to explore that approach further. I think the same holds true for Xen, but it is still using a hypercall to get the memory map. In the end, using fw_cfg seems closest to what the Xen code does. There are other possibilities: 1) defining a v2 PVH ABI that includes the e820 map would also be a possibility. 2) modify enlighten_pvh.c to get the start info in multiboot format, something like: diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab17673454..656e41449db0 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -88,19 +88,22 @@ void __init xen_prepare_pvh(void) u32 msr; u64 pfn; - if (pvh_start_info.magic != XEN_HVM_START_MAGIC_VALUE) { + if (pvh_start_info.magic == XEN_HVM_START_MAGIC_VALUE) { + xen_pvh = 1; + + init_pvh_bootparams_xen(); + + msr = cpuid_ebx(xen_cpuid_base() + 2); + pfn = __pa(hypercall_page); + wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); + + x86_init.oem.arch_setup = xen_pvh_arch_setup; + } else if (pvh_start_info.magic == MULTIBOOT_INFO_MAGIC_VALUE) { + init_pvh_bootparams_multiboot(); + + } else { xen_raw_printk("Error: Unexpected magic value (0x%08x)\n", pvh_start_info.magic); BUG(); } - - xen_pvh = 1; - - msr = cpuid_ebx(xen_cpuid_base() + 2); - pfn = __pa(hypercall_page); - wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32)); - - init_pvh_bootparams(); - - x86_init.oem.arch_setup = xen_pvh_arch_setup; } Note that this would *not* be a multiboot-format kernel, as it would still have the Xen PVH ELF note. It would just reuse the format of the start info struct. However, I think it is simpler to just use the e820 memory map from fw_cfg. Paolo
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 12:59 AM, Paolo Bonzini wrote: On 28/11/2017 20:34, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Nice! So QEMU would parse the ELF file just like for multiboot, find the ELF note, and then prepare an hvmlite boot info struct instead of the multiboot one? Yes, exactly. There would then be a new option ROM, very similar to multiboot.S. That is one option. I guess this gets into a discussion about the QEMU side of the upcoming patches that would follow ... I'm currently just initializing the CPU state in QEMU for testing since there is such minimal (non Linux specific) setup that is required by the ABI. And (borrowing from the Intel clear container patches) that VM setup is only performed when user selects the "nofw" option with the q35 model. But yeah, if folks think it important to move all such machine state initialization out of QEMU and into an option ROM, I can look into coding it up that way for the QEMU patches. Thanks, -Maran Thanks, Paolo
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 12:21 AM, Juergen Gross wrote: On 28/11/17 20:34, Maran Wilson wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/hvmlite.html This PoC patch enables Qemu to use that same entry point for booting KVM guests. Even though the code is still PoC quality, I'm sending this as an RFC now since there are a number of different ways the specific implementation details can be handled. I chose a shared code path for Xen and KVM guests but could just as easily create a separate code path that is advertised by a different ELF note for KVM. There also seems to be some flexibility in how the e820 table data is passed and how (or if) it should be identified as e820 data. As a starting point, I've chosen the options that seem to result in the smallest patch with minimal to no changes required of the x86/HVM direct boot ABI. I like the idea. I'd rather split up the different hypervisor types early and use a common set of service functions instead of special casing xen_guest everywhere. This would make it much easier to support the KVM PVH boot without the need to configure the kernel with CONFIG_XEN. Thanks for the feedback. I'll try doing something like that as this patch moves from proof of concept to a real proposal. Another option would be to use the same boot path as with grub: set the boot params in zeropage and start at startup_32. I think others have already responded about that. The main thing I was trying to avoid, was adding any Linux OS specific initialization (like zeropage) to QEMU. Especially since this PVH entry point already exists in Linux. Thanks, -Maran Juergen --- arch/x86/xen/enlighten_pvh.c | 74 1 file changed, 55 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c index 98ab176..d93f711 100644 --- a/arch/x86/xen/enlighten_pvh.c +++ b/arch/x86/xen/enlighten_pvh.c @@ -31,21 +31,46 @@ static void xen_pvh_arch_setup(void) acpi_irq_model = ACPI_IRQ_MODEL_PLATFORM; } -static void __init init_pvh_bootparams(void) +static void __init init_pvh_bootparams(bool xen_guest) { struct xen_memory_map memmap; int rc; memset(_bootparams, 0, sizeof(pvh_bootparams)); - memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); - set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); - rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); - if (rc) { - xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); - BUG(); + if (xen_guest) { + memmap.nr_entries = ARRAY_SIZE(pvh_bootparams.e820_table); + set_xen_guest_handle(memmap.buffer, pvh_bootparams.e820_table); + rc = HYPERVISOR_memory_op(XENMEM_memory_map, ); + if (rc) { + xen_raw_printk("XENMEM_memory_map failed (%d)\n", rc); + BUG(); + } + pvh_bootparams.e820_entries = memmap.nr_entries; + } else if (pvh_start_info.nr_modules > 1) { + /* The second module should be the e820 data for KVM guests */ + struct hvm_modlist_entry *modaddr; + char e820_sig[] = "e820 data"; + struct boot_e820_entry *ep; + struct e820_table *tp; + char *cmdline_str; + int idx; + + modaddr = __va(pvh_start_info.modlist_paddr + + sizeof(struct hvm_modlist_entry)); + cmdline_str = __va(modaddr->cmdline_paddr); + + if ((modaddr->cmdline_paddr) && + (!strncmp(e820_sig, cmdline_str, sizeof(e820_sig { + tp = __va(modaddr->paddr); + ep = (struct boot_e820_entry *)tp->entries; + + pvh_bootparams.e820_entries = tp->nr_entries; + + for (idx = 0; idx < tp->nr_entries ; idx++, ep++) + pvh_bootparams.e820_table[idx] = *ep; + } } - pvh_bootparams.e820_entries = memmap.nr_entries; if (pvh_bootparams.e820_entries < E820_MAX_ENTRIES_ZEROPAGE - 1) { pvh_bootparams.e820_table[pvh_bootparams.e820_entries].addr = @@ -55,8 +80,9 @@ static void __init init_pvh_bootparams(void) pvh_bootparams.e820_table[pvh_bootparams.e820_entries].type = E820_TYPE_RESERVED; pvh_bootp
Re: [RFC PATCH] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 11/29/2017 6:44 AM, Paolo Bonzini wrote: I actually like this patch, except that I'd get the e820 memory map from fw_cfg (see the first part of https://github.com/bonzini/qboot/blob/master/fw_cfg.c, and extract_e820 inhttps://github.com/bonzini/qboot/blob/master/main.c) instead of the second module. Hi Paolo, I want to make sure I understand exactly what you are suggesting... Are you saying the Linux PVH entry code (such as init_pvh_bootparams()) should use the fw_cfg interface to read the e820 memory map data and put it into the zeropage? Basically, keeping the patch very much like it already is, just extracting the e820 data via the fw_cfg interface instead of from the second module of start_info struct? If that is the case, I guess I'm a bit hesitant to throw the QEMU specific fw_cfg interface into the mix on the Linux PVH side when the existing PVH ABI already seems to contain an interface for passing modules/blobs to the guest. But if you feel there is a compelling reason to use the fw_cfg interface here, I'm happy to explore that approach further. Thanks, -Maran
Re: [RFC PATCH v2 1/2] xen/pvh: Add memory map pointer to hvm_start_info struct
Thanks for taking a look Jan. More below... On 12/8/2017 12:49 AM, Jan Beulich wrote: On 07.12.17 at 23:45, wrote: The start info structure that is defined as part of the x86/HVM direct boot ABI and used for starting Xen PVH guests would be more versatile if it also included a way to efficiently pass information about the memory map to the guest. That way Xen PVH guests would not be forced to use a hypercall to get the information and would make it easier for KVM guests to share the PVH entry point. --- include/xen/interface/hvm/start_info.h | 34 +++--- 1 file changed, 31 insertions(+), 3 deletions(-) First of all such a change should be submitted against the canonical copy of the header, which lives in the Xen tree. Understood. Will do that when this converts from RFC to actual patch. The argument of avoiding a hypercall doesn't really count imo - this isn't in any way performance critical code. The argument of making re-use easier is fine, though. Okay, I will reword the commit message. --- a/include/xen/interface/hvm/start_info.h +++ b/include/xen/interface/hvm/start_info.h @@ -33,7 +33,7 @@ *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE *|| ("xEn3" with the 0x80 bit of the "E" set). * 4 ++ - *| version| Version of this structure. Current version is 0. New + *| version| Version of this structure. Current version is 1. New *|| versions are guaranteed to be backwards-compatible. * 8 ++ *| flags | SIF_xxx flags. @@ -48,6 +48,12 @@ * 32 ++ *| rsdp_paddr | Physical address of the RSDP ACPI data structure. * 40 ++ + *| memmap_paddr | Physical address of the memory map. Only present in + *|| version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + * 52 ++ Please let's make this optional even in v1 (and later), i.e. spell out that it may be zero. That way Xen code could continue to use the hypercall approach even. Yes, my intention was to make this optional. I will spell it out. Also please spell out a 4-byte reserved entry at the end, to make the specified structure a multiple of 8 in size again regardless of bitness of the producer/consumer. Sure, I can add that. @@ -62,6 +68,17 @@ *| reserved | * 32 ++ * + * The layout of each entry in the memory map table is as follows and no + * padding is used between entries in the array: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping + * 16 ++ + *| type | E820_TYPE_xxx + * 20 +| I'm not convinced of re-using E820 types here. I can see that this might ease the consumption in Linux, but I don't think there should be any connection to x86 aspects here - the data being supplied is x86-agnostic, and Linux'es placement of the header is also making no connection to x86 (oddly enough, the current placement in the Xen tree does, for a reason which escapes me). I could also imagine reasons to add new types without them being sanctioned by whoever maintains E820 type assignments. So there are three aspects to discuss here. 1) The addition of the "E820_TYPE_xxx" comment. I am fine with just changing that to "mapping type" and leaving it as something to be coordinated between the hypervisor and the guest OS being started by that hypervisor. 2) x86 vs x86-agnostic. While I'm trying to keep this interface generic in terms of guest OS (like Linux, FreeBSD, possible other guests in the future) and hypervisor type (Xen, QEMU/KVM, etc), I was actually under the impression that we are dealing with an ABI that is very much x86 specific. The canonical document describing the ABI (https://xenbits.xen.org/docs/unstable/misc/pvh.html) is titled "x86/HVM direct boot ABI" and goes on to describe an interface in very x86-specific terms. i.e. The ebx register must contain a pointer, cs, ds, es must be set a certain way, etc. That is probably why Xen's placement of the header file is in a x86 section of the tree. And also why there already exist a number of "x86" references in the existing header file. A quick grep of the existing header file will show lines like: "C representation of the x86/HVM start info layout" "Start of day structure passed to PVH guests and to HVM guests in %ebx" "Xen on x86 will always try to place all the data below the 4GiB" If at some point in the future someone decides to implement a similar ABI for a different CPU architecture while re-using this same hvm_start_info struct, then this header will have to be