Re: Why does memblock only refer to E820 table and not EFI Memory Map?
> > On x86 platforms, there are two sources through which kernel learns about
> > physical memory in the system namely E820 table and EFI Memory Map. Each
> > table describes which regions of system memory are usable by kernel and
> > which regions should be preserved (i.e. reserved regions that typically
> > have BIOS code/data) so that no other component in the system could
> > read/write to these regions. I think they are duplicating the information
> > and hence I have couple of questions regarding these
>
> But isn't it true that in x86 systems the E820 table is populated from the
> EFI memory map?

I didn't know that this happens.. :(

> At least in systems with EFI firmware and a Linux which understands
> EFI. If booting from the EFI stub, the stub will take the EFI memory map and
> assemble the E820 table passed as part of the boot params [4]. It also
> considers the case when there are more than 128 entries in the table [5].
> Thus, if booting as an EFI application it will definitely use the EFI memory
> map. If Linux' EFI entry point is not used the bootloader should do the
> same. For instance, grub also reads the EFI memory map to assemble the E820
> memory map [6], [7], [8].

Thanks a lot for the pointers, Ricardo :)
I haven't looked at the EFI stub and GRUB code and hence didn't know this was
happening. It does make me feel better that the EFI Memory Map is indeed being
used to generate e820 in the EFI stub case, so at least it's getting consumed
indirectly.

> > 1. I see that only E820 table is being consumed by kernel [1] (i.e.
> > memblock subsystem in kernel) to distinguish between "usable" vs
> > "reserved" regions. Assume someone has called memblock_alloc(), the
> > memblock subsystem would service the caller by allocating memory from
> > "usable" regions and it knows this *only* from E820 table [2] (it does
> > not check if EFI Memory Map also says that this region is usable as
> > well). So, why isn't the kernel taking EFI Memory Map into
> > consideration? (I see that it does happen only when "add_efi_memmap"
> > kernel command line arg is passed i.e. passing this argument updates
> > E820 table based on EFI Memory Map) [3]. The problem I see with memblock
> > not taking EFI Memory Map into consideration is that, we are ignoring
> > the main purpose for which EFI Memory Map exists.
> >
> > 2. Why doesn't the kernel have "add_efi_memmap" by default? From the
> > commit "21eb140e: x86 boot: only pick up additional EFI memmap if
> > add_efi_memmap flag", I didn't understand why the decision was made so.
> > Shouldn't we give more preference to EFI Memory map rather than E820
> > table as it's the latest and E820 is legacy?
>
> I did a quick experiment with and without add_efi_memmap. the e820
> table looked exactly the same. I guess this shows that what I wrote
> above makes sense ;) . Have you observed difference?

When I did a quick test, I didn't notice any difference (with and without
add_efi_memmap) because both e820 and EFI Memory Map were reporting regions in
sync. So, "add_efi_memmap" didn't have to add any new regions into e820.
Hence my last question: what if both the tables (EFI Memory Map and e820) are
out of sync? It shouldn't happen with GRUB and the EFI stub because they
generate e820 from the EFI Memory Map, as you pointed out.

Regards,
Sai
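For readers following the thread, the conversion Ricardo describes (EFI memory map in, E820 table out) can be sketched in user space. This is only an illustrative model, not the EFI stub's code: the struct names, the `build_e820()` helper and the 4 KiB page shift are assumptions made for the example, and the type mapping is a simplified rendering of the usual convention (loader, boot services and conventional memory become RAM; most other types become reserved).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* EFI memory types, numbered as in the UEFI specification. */
enum efi_type {
	EFI_RESERVED_TYPE, EFI_LOADER_CODE, EFI_LOADER_DATA,
	EFI_BOOT_SERVICES_CODE, EFI_BOOT_SERVICES_DATA,
	EFI_RUNTIME_SERVICES_CODE, EFI_RUNTIME_SERVICES_DATA,
	EFI_CONVENTIONAL_MEMORY, EFI_UNUSABLE_MEMORY,
	EFI_ACPI_RECLAIM_MEMORY, EFI_ACPI_MEMORY_NVS,
	EFI_MEMORY_MAPPED_IO, EFI_MEMORY_MAPPED_IO_PORT_SPACE,
	EFI_PAL_CODE, EFI_PERSISTENT_MEMORY
};

/* E820 types, simplified for this sketch. */
enum e820_type {
	E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3,
	E820_NVS = 4, E820_UNUSABLE = 5, E820_PMEM = 7
};

/* Hypothetical, simplified descriptors (not the kernel's structures). */
struct efi_desc { enum efi_type type; uint64_t phys_addr; uint64_t num_pages; };
struct e820_entry { uint64_t addr; uint64_t size; enum e820_type type; };

/* Map one EFI memory type to an E820 type; unknown types stay reserved. */
static enum e820_type efi_to_e820(enum efi_type t)
{
	switch (t) {
	case EFI_LOADER_CODE:
	case EFI_LOADER_DATA:
	case EFI_BOOT_SERVICES_CODE:
	case EFI_BOOT_SERVICES_DATA:
	case EFI_CONVENTIONAL_MEMORY:
		return E820_RAM;
	case EFI_UNUSABLE_MEMORY:
		return E820_UNUSABLE;
	case EFI_ACPI_RECLAIM_MEMORY:
		return E820_ACPI;
	case EFI_ACPI_MEMORY_NVS:
		return E820_NVS;
	case EFI_PERSISTENT_MEMORY:
		return E820_PMEM;
	default:
		return E820_RESERVED;
	}
}

/*
 * Build an E820-style table from an EFI-style map, merging an entry into
 * the previous one when both have the same type and are physically
 * adjacent. Returns the number of entries written (at most cap).
 */
static size_t build_e820(const struct efi_desc *map, size_t n,
			 struct e820_entry *out, size_t cap)
{
	size_t i, nr = 0;

	for (i = 0; i < n; i++) {
		enum e820_type type = efi_to_e820(map[i].type);
		uint64_t addr = map[i].phys_addr;
		uint64_t size = map[i].num_pages << 12; /* assumes 4 KiB pages */

		if (nr > 0 && out[nr - 1].type == type &&
		    out[nr - 1].addr + out[nr - 1].size == addr) {
			out[nr - 1].size += size;
			continue;
		}
		if (nr == cap)
			break;
		out[nr].addr = addr;
		out[nr].size = size;
		out[nr].type = type;
		nr++;
	}
	return nr;
}
```

Merging adjacent same-type ranges is one reason an E820 table derived this way can have fewer entries than the EFI memory map it came from while still describing the same usable/reserved split.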
Why does memblock only refer to E820 table and not EFI Memory Map?
Hi All,

Disclaimer:
1. Please note that this discussion is x86 specific.
2. The things stated below are my understanding of the kernel and I could
   have missed some things, so please let me know if I understood something
   wrong.
3. I have focused only on memblock here because, if I understand correctly,
   memblock is the base that feeds other memory management subsystems in the
   kernel (like the buddy allocator).

On x86 platforms, there are two sources through which the kernel learns about
physical memory in the system, namely the E820 table and the EFI Memory Map.
Each table describes which regions of system memory are usable by the kernel
and which regions should be preserved (i.e. reserved regions that typically
have BIOS code/data) so that no other component in the system could
read/write to these regions. I think they are duplicating the information and
hence I have a couple of questions regarding these.

1. I see that only the E820 table is being consumed by the kernel [1] (i.e.
   the memblock subsystem in the kernel) to distinguish between "usable" vs
   "reserved" regions. Assume someone has called memblock_alloc(); the
   memblock subsystem would service the caller by allocating memory from
   "usable" regions and it knows this *only* from the E820 table [2] (it does
   not check whether the EFI Memory Map also says that this region is
   usable). So, why isn't the kernel taking the EFI Memory Map into
   consideration? (I see that it does happen only when the "add_efi_memmap"
   kernel command line arg is passed, i.e. passing this argument updates the
   E820 table based on the EFI Memory Map) [3]. The problem I see with
   memblock not taking the EFI Memory Map into consideration is that we are
   ignoring the main purpose for which the EFI Memory Map exists.

2. Why doesn't the kernel have "add_efi_memmap" by default? From the commit
   "21eb140e: x86 boot: only pick up additional EFI memmap if add_efi_memmap
   flag", I didn't understand why the decision was made so. Shouldn't we give
   more preference to the EFI Memory Map rather than the E820 table, as it's
   the latest and E820 is legacy?

3. Why isn't the kernel checking that both tables (E820 table and EFI Memory
   Map) are in sync, i.e. is there any *possibility* that a buggy BIOS could
   report a region as usable in the E820 table and as reserved in the EFI
   Memory Map?

[1] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/setup.c#L1106
[2] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/e820.c#L1265
[3] https://elixir.bootlin.com/linux/latest/source/arch/x86/platform/efi/efi.c#L129

Regards,
Sai
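To make question 3 concrete, here is a minimal user-space sketch of the kind of consistency check being asked about. It is not kernel code: `struct region`, `overlaps()` and `count_conflicts()` are hypothetical names invented for this illustration, and both tables are reduced to a flat list of ranges tagged usable or reserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor; not a kernel structure. */
struct region {
	uint64_t start;	/* first byte of the region */
	uint64_t size;	/* length in bytes */
	bool usable;	/* true: usable RAM, false: reserved */
};

/* True if two non-empty ranges share at least one byte. */
static bool overlaps(uint64_t a, uint64_t asz, uint64_t b, uint64_t bsz)
{
	return asz && bsz && a < b + bsz && b < a + asz;
}

/*
 * Count (E820 entry, EFI entry) pairs where E820 reports a range as usable
 * while the EFI map marks an overlapping range as reserved. A result of 0
 * means the two tables agree in the sense asked about in question 3.
 */
static size_t count_conflicts(const struct region *e820, size_t n_e820,
			      const struct region *efi, size_t n_efi)
{
	size_t i, j, conflicts = 0;

	assert(e820 != NULL && efi != NULL);

	for (i = 0; i < n_e820; i++) {
		if (!e820[i].usable)
			continue;
		for (j = 0; j < n_efi; j++) {
			if (!efi[j].usable &&
			    overlaps(e820[i].start, e820[i].size,
				     efi[j].start, efi[j].size))
				conflicts++;
		}
	}
	return conflicts;
}
```

The sketch ignores address overflow and uses a quadratic scan for clarity; both tables are small in practice, but a real check would have to account for the many memory types each table distinguishes rather than a single usable/reserved bit.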
[PATCH] x86/efi: Mark can_free_region() as an __init function
can_free_region() is called only once during _boot_ by
efi_reserve_boot_services(). Hence, mark it as __init function.

Signed-off-by: Sai Praneeth Prakhya
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 17456a1d3f04..9ce85e605052 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -304,7 +304,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
  * - Not within any part of the kernel
  * - Not the BIOS reserved area (E820_TYPE_RESERVED, E820_TYPE_NVS, etc)
  */
-static bool can_free_region(u64 start, u64 size)
+static __init bool can_free_region(u64 start, u64 size)
 {
 	if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
 		return false;
-- 
2.19.1
[PATCH] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
Commit d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions
from efi_pgd") forgets to take two EFI modes into consideration, namely
EFI_OLD_MEMMAP and EFI_MIXED_MODE.

EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
using ioremap() and init_memory_mapping(). This feature can be enabled by
passing "efi=old_map" as kernel command line argument. But,
efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
efi_pgd and hence cannot be used for unmapping EFI boot services code/data
regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
all RAM (i.e. EFI regions like EFI_CONVENTIONAL_MEMORY,
EFI_LOADER_CODE/DATA, EFI_BOOT_SERVICES_CODE/DATA and
EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time so that
EFI runtime calls can access their arguments in 1:1 mode.

Hence, don't unmap EFI boot services code/data regions when booted in mixed
mode.

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/quirks.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..9c34230aaeae 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -380,6 +380,22 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 	u64 pa = md->phys_addr;
 	u64 va = md->virt_addr;
 
+	/*
+	 * To Do: Remove this check after adding functionality to unmap EFI boot
+	 * services code/data regions from direct mapping area because
+	 * "efi=old_map" maps EFI regions in swapper_pg_dir.
+	 */
+	if (efi_enabled(EFI_OLD_MEMMAP))
+		return;
+
+	/*
+	 * EFI mixed mode has all RAM mapped to access arguments while making
+	 * EFI runtime calls, hence don't unmap EFI boot services code/data
+	 * regions.
+	 */
+	if (!efi_is_native() && IS_ENABLED(CONFIG_EFI_MIXED))
+		return;
+
 	if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
 		pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
 
-- 
2.19.1
[PATCH V2 3/3] x86/efi: Use efi_memmap_<alloc/install>() to create runtime EFI memory map
efi_map_regions() uses realloc_pages() to allocate memory for runtime EFI memory map (EFI memory map which contains only memory descriptors of type Runtime Code/Data and Boot Code/Data). Since efi_memmap_alloc() also does the same, use it instead of realloc_pages() and install the new EFI memory map using efi_memmap_install() instead of efi_memmap_init_late(). This also fixes the leaking of existing EFI memory map. Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Ingo Molnar Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 2 +- arch/x86/platform/efi/efi.c| 93 +- arch/x86/platform/efi/efi_32.c | 2 +- arch/x86/platform/efi/efi_64.c | 7 ++- 4 files changed, 33 insertions(+), 71 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 744f945a00e7..524fda68b03f 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -131,7 +131,7 @@ extern void __init efi_map_region(efi_memory_desc_t *md); extern void __init efi_map_region_fixed(efi_memory_desc_t *md); extern void efi_sync_low_kernel_mappings(void); extern int __init efi_alloc_page_tables(void); -extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages); +extern int __init efi_setup_page_tables(void); extern void __init old_map_region(efi_memory_desc_t *md); extern void __init runtime_code_page_mkexec(void); extern void __init efi_runtime_update_mappings(void); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 63885cc8e34e..1b0a9449096b 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -656,27 +656,6 @@ static void __init get_systab_virt_addr(efi_memory_desc_t *md) } } -static void *realloc_pages(void *old_memmap, int old_shift) -{ - void *ret; - - ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1); - if (!ret) - goto out; - - /* -* A first-time allocation doesn't have anything to copy. 
-*/ - if (!old_memmap) - return ret; - - memcpy(ret, old_memmap, PAGE_SIZE << old_shift); - -out: - free_pages((unsigned long)old_memmap, old_shift); - return ret; -} - /* * Iterate the EFI memory map in reverse order because the regions * will be mapped top-down. The end result is the same as if we had @@ -782,18 +761,15 @@ static bool should_map_region(efi_memory_desc_t *md) } /* - * Map the efi memory ranges of the runtime services and update new_mmap with - * virtual addresses. + * Map the efi memory ranges of the runtime services and update memory map with + * virtual addresses. Returns number of memory map entries mapped. */ -static void * __init efi_map_regions(int *count, int *pg_shift) +static int __init efi_map_regions(void) { - void *p, *new_memmap = NULL; - unsigned long left = 0; - unsigned long desc_size; + void *p; + int count = 0; efi_memory_desc_t *md; - desc_size = efi.memmap.desc_size; - p = NULL; while ((p = efi_map_next_entry(p))) { md = p; @@ -803,30 +779,15 @@ static void * __init efi_map_regions(int *count, int *pg_shift) efi_map_region(md); get_systab_virt_addr(md); - - if (left < desc_size) { - new_memmap = realloc_pages(new_memmap, *pg_shift); - if (!new_memmap) - return NULL; - - left += PAGE_SIZE << *pg_shift; - (*pg_shift)++; - } - - memcpy(new_memmap + (*count * desc_size), md, desc_size); - - left -= desc_size; - (*count)++; + count++; } - - return new_memmap; + return count; } static void __init kexec_enter_virtual_mode(void) { #ifdef CONFIG_KEXEC_CORE efi_memory_desc_t *md; - unsigned int num_pages; efi.systab = NULL; @@ -872,10 +833,7 @@ static void __init kexec_enter_virtual_mode(void) BUG_ON(!efi.systab); - num_pages = ALIGN(efi.memmap.nr_map * efi.memmap.desc_size, PAGE_SIZE); - num_pages >>= PAGE_SHIFT; - - if (efi_setup_page_tables(efi.memmap.phys_map, num_pages)) { + if (efi_setup_page_tables()) { clear_bit(EFI_RUNTIME_SERVICES, ); return; } @@ -926,10 +884,12 @@ static void __init kexec_enter_virtual_mode(void) */ static 
void __init __efi_enter_virtual_mode(void) { - int count = 0, pg_shift = 0; - void *new_memmap = NULL; + struct efi_memory_map new_memmap; + efi_memory_desc_t *md; + int count = 0; efi_status_t status; unsigned long pa; + void *out; efi.systab = NULL; @@ -940,28 +900,25 @@ static void __init __efi_en
[PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()
Presently, in EFI subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by old EFI
memory map. Hence, introduce efi_memmap_free() that frees up the memory
occupied by an EFI memory map.

Introduce __efi_memmap_unmap(), so that it could be used to unmap an EFI
memory map and have wrappers around it (namely efi_memmap_unmap() and
efi_memmap_unmap_and_free()) to specifically deal with efi.memmap. There
are two variants of wrappers (unmap and free) because there are use cases
where the kernel just needs to unmap the memory map (see efi_init() in arm
and kexec_enter_virtual_mode()) but not free it.

Apart from introducing the above functions, improve the cases where the
kernel decides to turn off EFI runtime services during boot by unmapping
and freeing the EFI memory map rather than just unmapping the EFI memory
map.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Ard Biesheuvel
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/efi.c     |  4 +-
 arch/x86/platform/efi/quirks.c  |  2 +-
 drivers/firmware/efi/arm-init.c |  2 +-
 drivers/firmware/efi/memmap.c   | 72 +
 include/linux/efi.h             |  1 +
 5 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e1cb01a22fa8..715601d1c581 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -532,7 +532,7 @@ void __init efi_init(void)
 		pr_info("No EFI runtime due to 32/64-bit mismatch with kernel\n");
 	else {
 		if (efi_runtime_disabled() || efi_runtime_init()) {
-			efi_memmap_unmap();
+			efi_memmap_unmap_and_free();
 			return;
 		}
 	}
@@ -833,7 +833,7 @@ static void __init kexec_enter_virtual_mode(void)
 	 * have been mapped at these virtual addresses.
 	 */
 	if (!efi_is_native() || efi_enabled(EFI_OLD_MEMMAP)) {
-		efi_memmap_unmap();
+		efi_memmap_unmap_and_free();
 		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
 		return;
 	}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..ce6dcd40dd6c 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -556,7 +556,7 @@ void __init efi_apply_memmap_quirks(void)
 	 */
 	if (!efi_runtime_supported()) {
 		pr_info("Setup done, disabling due to 32/64-bit mismatch\n");
-		efi_memmap_unmap();
+		efi_memmap_unmap_and_free();
 	}
 
 	/* UV2+ BIOS has a fix for this issue. UV1 still needs the quirk. */
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 1a6a77df8a5e..f32ff5c580f6 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -253,7 +253,7 @@ void __init efi_init(void)
 		  efi.memmap.desc_version);
 
 	if (uefi_init() < 0) {
-		efi_memmap_unmap();
+		efi_memmap_unmap_and_free();
 		return;
 	}
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..4318a69bdbbf 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -49,6 +49,29 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 	return __efi_memmap_alloc_early(size);
 }
 
+/**
+ * efi_memmap_free - Free memory pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Memory is freed depending on the type of allocation performed.
+ */
+static void __init efi_memmap_free(struct efi_memory_map new_memmap)
+{
+	phys_addr_t start, end;
+	unsigned long size = new_memmap.nr_map * new_memmap.desc_size;
+	unsigned int order = get_order(size);
+
+	start = new_memmap.phys_map;
+	end = start + size;
+	if (new_memmap.late) {
+		__free_pages(pfn_to_page(PHYS_PFN(start)), order);
+		return;
+	}
+
+	if (memblock_free(start, size))
+		pr_err("Failed to free mem from %pa to %pa\n", &start, &end);
+}
+
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
@@ -116,21 +139,56 @@ int __init efi_memmap_init_early(struct efi_memory_map_data *data)
 	return __efi_memmap_init(data, false);
 }
 
+/**
+ * __efi_memmap_unmap - Unmap the region pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Use to unmap *newly* created EFI memmap and should *not* be used directly to
+ * unmap efi.memmap because "EFI_MEMMAP" flag is not cleared here. Instead, use
+ * efi_memmap_unmap*() variants accordingly. Also, the check for "EFI_MEMMAP"
+ * flag is done in efi_memmap_unma
[PATCH V2 2/3] x86/efi: Fix EFI memory map leaks
Presently, in the EFI subsystem of the kernel, every time the kernel
allocates memory for a new EFI memory map, it forgets to free the memory
occupied by the existing EFI memory map. This could be fixed by unmapping
and freeing the existing EFI memory map every time before installing a new
EFI memory map. Hence, modify efi_memmap_install() accordingly since it's
the only place which installs a new EFI memory map.

Presently, efi_memmap_alloc() allocates only physical memory and every
caller of efi_memmap_alloc() should remap the newly allocated memory in
order to use it. This extra step could sometimes lead to buggy error
handling conditions wherein the allocated memory isn't freed should remap
fail. So, push the remap logic into efi_memmap_alloc() so that the error
handling could be improved and it also makes the caller look simpler.

With the modified efi_memmap_alloc() and efi_memmap_install() APIs, a
typical flow to install a new EFI memory map would look something like
below.

1. Get the number of entries the new EFI memory map should have (typically
   through efi_memmap_split_count()).
2. Allocate memory for the new EFI memory map (efi_memmap_alloc()).
3. Populate memory descriptor entries in the new EFI memory map.
4. Install the new EFI memory map (efi_memmap_install() which also unmaps
   and frees existing memory map).

Existing functions like efi_clean_memmap(), efi_arch_mem_reserve(),
efi_free_boot_services() and efi_fake_memmap() are modified to fix the
above mentioned bugs and also to follow the above recommended usage of
APIs.

Note that efi_clean_memmap() could be implemented without allocating any
new memory, but since this is not a fast path and hence is not a concern
for performance, readability and maintainability win. So, change it to use
efi_memmap_alloc() and efi_memmap_install().
Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Ingo Molnar Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 1 + arch/x86/kernel/setup.c | 6 ++ arch/x86/platform/efi/efi.c | 44 ++-- arch/x86/platform/efi/quirks.c | 43 +++- drivers/firmware/efi/fake_mem.c | 21 ++ drivers/firmware/efi/memmap.c | 118 +++- include/linux/efi.h | 7 +- 7 files changed, 132 insertions(+), 108 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index d1e64ac80b9c..744f945a00e7 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -143,6 +143,7 @@ extern void efi_switch_mm(struct mm_struct *mm); extern void efi_recover_from_page_fault(unsigned long phys_addr); extern void efi_free_boot_services(void); extern void efi_reserve_boot_services(void); +extern void __init efi_clean_memmap(void); struct efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index b74e7bfed6ab..bed79b238b0d 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1102,6 +1102,12 @@ void __init setup_arch(char **cmdline_p) reserve_bios_regions(); if (efi_enabled(EFI_MEMMAP)) { + /* +* efi_clean_memmap() uses memblock_phys_alloc() to allocate +* memory for new EFI memmap and hence will work only after +* e820__memblock_setup() +*/ + efi_clean_memmap(); efi_fake_memmap(); efi_find_mirror(); efi_esrt_init(); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 715601d1c581..63885cc8e34e 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -249,30 +249,36 @@ static bool __init efi_memmap_entry_valid(const efi_memory_desc_t *md, int i) return false; } -static void __init efi_clean_memmap(void) +void __init efi_clean_memmap(void) { - efi_memory_desc_t *out = efi.memmap.map; - const efi_memory_desc_t *in = out; - const efi_memory_desc_t *end = efi.memmap.map_end; - int i, n_removal; - - for (i = n_removal = 0; 
in < end; i++) { - if (efi_memmap_entry_valid(in, i)) { - if (out != in) - memcpy(out, in, efi.memmap.desc_size); - out = (void *)out + efi.memmap.desc_size; - } else { + void *out; + efi_memory_desc_t *md; + unsigned int i = 0, n_removal = 0; + struct efi_memory_map new_memmap; + + for_each_efi_memory_desc(md) { + if (!efi_memmap_entry_valid(md, i)) n_removal++; - } - in = (void *)in + efi.memmap.desc_size; } - if (n_removal > 0) { - u64 size = efi.memmap.nr_map - n_removal; + if (n_removal == 0) + return; - pr_warn("Removing %d invalid memory map entries.\n&
[PATCH V2 0/3] Fix EFI memory map leaks
Presently, in the EFI subsystem of the kernel, every time the kernel
allocates memory for a new EFI memory map, it forgets to free the memory
occupied by the old EFI memory map. It does clear the mappings though
(using efi_memmap_unmap()), but forgets to free up the memory. Also, there
is another minor issue, wherein the newly allocated memory isn't freed,
should remap fail.

The first issue is addressed by adding efi_memmap_free() to
efi_memmap_install() and the second issue is addressed by pushing the remap
code into efi_memmap_alloc() and thereby handling the failure condition.

Memory allocated to EFI memory map is leaked in the functions below and
hence they are modified to fix the issue. Functions that modify EFI memmap
are:
1. efi_clean_memmap(),
2. efi_fake_memmap(),
3. efi_arch_mem_reserve(),
4. efi_free_boot_services(),
5. and __efi_enter_virtual_mode()

More detailed explanation:
--------------------------
A typical boot flow on EFI supported x86_64 machines might look something
like below:
1. EFI memory map is passed by firmware to kernel.
2. Kernel does a memblock_reserve() on this memory
   (see efi_memblock_x86_reserve_range()).
3. This memory map is checked for invalid entries in efi_clean_memmap(). If
   any invalid entries are found, they are omitted from EFI memory map but
   the memory occupied by these invalid EFI memory descriptors isn't freed.
4. To further process this memory map (see efi_fake_memmap(),
   efi_bgrt_init() and efi_esrt_init()), kernel allocates memory using
   efi_memmap_alloc() and copies the processed memory map to newly
   allocated memory but it forgets to free memory occupied by old EFI
   memory map.
5. Further, in efi_map_regions() the EFI memory map is processed again to
   include only EFI memory descriptors of type Runtime Code/Data and Boot
   Code/Data. Again, memory is allocated for this new memory map through
   realloc_pages() and the old EFI memory map is not freed.
6. After SetVirtualAddressMap() is done, the EFI memory map is processed
   again to have only EFI memory descriptors of type Runtime Code/Data.
   Again, memory is allocated for this new memory map through
   efi_memmap_alloc() and the old EFI memory map is not freed.

Testing:
--------
Tested with LUV on qemu-x86_64 and on my dev machine. Checked for unchanged
boot behavior i.e. shouldn't break any existing stuff. Built for arm, arm64
and ia64 and found no new warnings/errors. Would appreciate the effort if
someone could test on arm machines.

Although majority of the changes are made to drivers/firmware/efi/memmap.c
file (which is common across architectures), this bug is only limited to
x86_64 machines and hence this patch set shouldn't affect any other
architectures.

Notes:
------
1. This patch set is based on EFI tree's "next" branch [1].
2. This patch set is an outcome of the discussion at [2].

Changes from V1:
1. Drop passing around allocation type from efi_memmap_alloc(), instead
   change efi_memmap_alloc() such that it now returns a populated struct
   efi_memory_map
2. Drop fixing issues in efi_fake_memmap(), will be addressed in a separate
   patch.
3. Optimize efi_map_regions().

[1] git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
[2] https://lkml.org/lkml/2018/7/2/1095

Sai Praneeth Prakhya (3):
  efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()
  x86/efi: Fix EFI memory map leaks
  x86/efi: Use efi_memmap_<alloc/install>() to create runtime EFI memory map

 arch/x86/include/asm/efi.h      |   3 +-
 arch/x86/kernel/setup.c         |   6 +
 arch/x86/platform/efi/efi.c     | 141 +---
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |   7 +-
 arch/x86/platform/efi/quirks.c  |  45 ++--
 drivers/firmware/efi/arm-init.c |   2 +-
 drivers/firmware/efi/fake_mem.c |  21 +---
 drivers/firmware/efi/memmap.c   | 190 +---
 include/linux/efi.h             |   8 +-
 10 files changed, 235 insertions(+), 190 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Ard Biesheuvel
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
-- 
2.19.1
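The ownership rule this series introduces (installing a new memory map unmaps and frees the old one) can be modelled in user space. The sketch below is only an illustration under loud assumptions: `struct memmap`, `memmap_alloc()`, `memmap_install()`, `memmap_clean()` and the `live_allocs` counter are hypothetical stand-ins, with malloc/free in place of the kernel allocators and plain integers in place of EFI memory descriptors (a negative value plays the role of an invalid entry).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an EFI memory map; entries are plain ints. */
struct memmap {
	int *entries;
	size_t nr;
};

static struct memmap current_map;	/* the installed map */
static size_t live_allocs;		/* outstanding allocations, to spot leaks */

static int memmap_alloc(struct memmap *m, size_t nr)
{
	/* Allocate at least one slot so that nr == 0 is not seen as failure. */
	m->entries = calloc(nr ? nr : 1, sizeof(*m->entries));
	if (!m->entries)
		return -1;
	m->nr = nr;
	live_allocs++;
	return 0;
}

static void memmap_free(struct memmap *m)
{
	if (!m->entries)
		return;
	free(m->entries);
	m->entries = NULL;
	m->nr = 0;
	live_allocs--;
}

/* Install a new map: release the old one first, then take ownership. */
static void memmap_install(struct memmap *new_map)
{
	memmap_free(&current_map);
	current_map = *new_map;
	new_map->entries = NULL;
	new_map->nr = 0;
}

/* Drop invalid (negative) entries using count, alloc, populate, install. */
static int memmap_clean(void)
{
	struct memmap new_map;
	size_t i, valid = 0, out = 0;

	for (i = 0; i < current_map.nr; i++)
		if (current_map.entries[i] >= 0)
			valid++;

	if (valid == current_map.nr)
		return 0;	/* nothing to remove */

	if (memmap_alloc(&new_map, valid))
		return -1;

	for (i = 0; i < current_map.nr; i++)
		if (current_map.entries[i] >= 0)
			new_map.entries[out++] = current_map.entries[i];

	memmap_install(&new_map);
	return 0;
}
```

Because the free happens inside the install step, callers cannot forget it, which is the point of the design: `live_allocs` stays at one no matter how many times the map is rebuilt.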
[PATCH V3 2/3] x86/efi: Unmap EFI boot services code/data regions from efi_pgd
efi_free_boot_services(), as the name suggests, frees EFI boot services
code/data regions but forgets to unmap these regions from efi_pgd. This
means that any code that's running in efi_pgd address space (e.g: any EFI
runtime service) would still be able to access these regions but the
contents of these regions would have long been overwritten by someone else.
So, it's important to unmap these regions. Hence, introduce
efi_unmap_pages() to unmap these regions from efi_pgd.

After unmapping EFI boot services code/data regions, any illegal access by
buggy firmware to these regions would result in page fault which will be
handled by EFI specific fault handler.

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/quirks.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..fb1c44b11235 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
 	}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+	pgd_t *pgd = efi_mm.pgd;
+	u64 pa = md->phys_addr;
+	u64 va = md->virt_addr;
+
+	if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+		pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+	if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+		pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
 	phys_addr_t new_phys, new_size;
@@ -395,6 +413,13 @@ void __init efi_free_boot_services(void)
 		}
 
 		/*
+		 * Before calling set_virtual_address_map(), EFI boot services
+		 * code/data regions were mapped as a quirk for buggy firmware.
+		 * Unmap them from efi_pgd before freeing them up.
+		 */
+		efi_unmap_pages(md);
+
+		/*
 		 * Nasty quirk: if all sub-1MB memory is used for boot
 		 * services, we can get here without having allocated the
 		 * real mode trampoline. It's too late to hand boot services
-- 
2.7.4
[PATCH V3 1/3] x86/mm/pageattr: Introduce helper function to unmap EFI boot services
Ideally, after kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. But, it's noticed that this is
not so true in many x86 platforms. Hence, during boot, kernel reserves EFI
boot services code/data regions [1] and maps [2] them to efi_pgd so that
call to set_virtual_address_map() doesn't fail. After returning from
set_virtual_address_map(), kernel frees the reserved regions [3] but they
still remain mapped. Hence, introduce kernel_unmap_pages_in_pgd() which
will later be used to unmap EFI boot services code/data regions.

While at it modify kernel_map_pages_in_pgd() by
1. Adding __init modifier because it's always used *only* during boot.
2. Adding a warning if it's used after SMP is initialized because it uses
   __flush_tlb_all() which flushes mappings only on current CPU.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.

[1] efi_reserve_boot_services()
[2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/pgtable_types.h |  8 ++--
 arch/x86/mm/pageattr.c               | 40 ++--
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..79aa79bb2cfa 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -564,8 +564,12 @@ extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
 				    unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-				   unsigned numpages, unsigned long page_flags);
+extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
+					  unsigned long address,
+					  unsigned numpages,
+					  unsigned long page_flags);
+extern int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+					    unsigned long numpages);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..1b1d5a68c4b2 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2111,8 +2111,8 @@ bool kernel_page_present(struct page *page)
 
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-			    unsigned numpages, unsigned long page_flags)
+int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+				   unsigned numpages, unsigned long page_flags)
 {
 	int retval = -EINVAL;
 
@@ -2126,6 +2126,8 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 		.flags = 0,
 	};
 
+	WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
 	if (!(__supported_pte_mask & _PAGE_NX))
 		goto out;
 
@@ -2148,6 +2150,40 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 }
 
 /*
+ * __flush_tlb_all() flushes mappings only on current CPU and hence this
+ * function shouldn't be used in an SMP environment. Presently, it's used only
+ * during boot (way before smp_init()) by EFI subsystem and hence is ok.
+ */
+int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+				     unsigned long numpages)
+{
+	int retval;
+
+	/*
+	 * The typical sequence for unmapping is to find a pte through
+	 * lookup_address_in_pgd() (ideally, it should never return NULL because
+	 * the address is already mapped) and change it's protections. As pfn is
+	 * the *target* of a mapping, it's not useful while unmapping.
+	 */
+	struct cpa_data cpa = {
+		.vaddr		= &address,
+		.pfn		= 0,
+		.pgd		= pgd,
+		.numpages	= numpages,
+		.mask_set	= __pgprot(0),
+		.mask_clr	= __pgprot(_PAGE_PRESENT | _PAGE_RW),
+		.flags		= 0,
+	};
+
+	WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
+	retval = __change_page_attr_set_clr(&cpa, 0);
+	__flush_tlb_all();
+
+	return retval;
+}
+
+/*
  *
[PATCH V3 3/3] x86/efi: Move efi_{reserve,free}_boot_services() to arch/x86
efi_reserve_boot_services() and efi_free_boot_services() are x86 specific quirks and as such should be in asm/efi.h, so move them from linux/efi.h. Also, call efi_free_boot_services() from __efi_enter_virtual_mode() as it is an x86 specific call and ideally shouldn't be part of init/main.c.

Signed-off-by: Sai Praneeth Prakhya
Acked-by: Thomas Gleixner
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h         | 3 ---
 init/main.c                 | 4
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
 		panic("EFI call to SetVirtualAddressMap() failed!");
 	}
 
+	efi_free_boot_services();
+
 	/*
 	 * Now that EFI is in virtual mode, update the function
 	 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void);	/* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
 						    unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
 		struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
 
-	if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-		efi_free_boot_services();
-	}
-
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }
--
2.7.4
[PATCH V3 0/3] Unmap EFI boot services code/data regions after boot.
CC'ing x86 folks because this patch set touches x86/mm, in which I am no expert.

Ideally, after the kernel assumes control of the platform, firmware shouldn't access EFI boot services code/data regions. But this turns out not to be true on many x86 platforms. Hence, during boot, the kernel reserves EFI boot services code/data regions [1] and maps [2] them to efi_pgd so that the call to set_virtual_address_map() doesn't fail. After returning from set_virtual_address_map(), the kernel frees the reserved regions [3], but they still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any EFI runtime service) would still be able to access EFI boot services code/data regions, but the contents of these regions would have long been overwritten by someone else, as they are freed by efi_free_boot_services(). So, it's important to unmap these regions. After unmapping EFI boot services code/data regions, any illegal access by buggy firmware to these regions results in a page fault, which is handled by the EFI specific fault handler.

Unmapping EFI boot services code/data regions clears the _PAGE_PRESENT bit. This shouldn't be a problem for L1TF because that case is already handled by protnone_mask() in arch/x86/include/asm/pgtable-invert.h.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] Please see efi_free_boot_services()

Testing the patch set:
----------------------
1. Download buggy firmware (which accesses boot regions even after the kernel has booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error messages (i.e. the kernel allows accesses to EFI boot services code/data regions even after the call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about buggy firmware, efi_rts_wq being frozen, and runtime services being disabled forever.

Please note that this patch set changes the kernel's existing behavior for some EFI runtime services, but I think that's OK because the kernel should never have allowed those accesses in the first place. Also please note that this patch set needs a lot of real-world testing, as I have only tested it with OVMF.

Note:
-----
Patch set based on "next" branch in efi tree.

Changes from V2 -> V3:
----------------------
1. Explicitly set pfn to 0 in kernel_unmap_pages_in_pgd().
2. Add __init modifier to kernel_{map,unmap}_pages_in_pgd().
3. Warn if kernel_{map,unmap}_pages_in_pgd() are called after smp_init().
4. Split efi_unmap_pages() into a separate patch.

Changes from V1 -> V2:
----------------------
1. Rewrite the cpa initializer in a more readable fashion.
2. Don't use cpa->pfn while unmapping, as it's not useful.
3. Unmap regions before freeing them up.
4. Fix spelling nits.

Sai Praneeth (3):
  x86/mm/pageattr: Introduce helper function to unmap EFI boot services
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/efi: Move efi_{reserve,free}_boot_services() to arch/x86

 arch/x86/include/asm/efi.h           |  2 ++
 arch/x86/include/asm/pgtable_types.h |  8 ++--
 arch/x86/mm/pageattr.c               | 40 ++--
 arch/x86/platform/efi/efi.c          |  2 ++
 arch/x86/platform/efi/quirks.c       | 25 ++
 include/linux/efi.h                  |  3 ---
 init/main.c                          |  4
 7 files changed, 73 insertions(+), 11 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
--
2.7.4
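For readers who want to see what "unmapping" amounts to at the page table entry level, here is a minimal user-space sketch, not kernel code. It assumes a flat toy table of 64-bit entries instead of real multi-level page tables, and struct attr_change, apply_attr_change(), toy_unmap_pages() and toy_access_faults() are hypothetical names invented for illustration. Only the idea is taken from the series: the unmap helper passes an empty mask_set and a mask_clr of _PAGE_PRESENT | _PAGE_RW, so a later access through that entry faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 (present) and bit 1 (writable) of an x86 page table entry. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_RW      (UINT64_C(1) << 1)

/* Hypothetical stand-in for the two mask fields of struct cpa_data. */
struct attr_change {
	uint64_t mask_set;
	uint64_t mask_clr;
};

/* In this model, bits in mask_clr are cleared first, then mask_set is ORed in. */
static uint64_t apply_attr_change(uint64_t pte, const struct attr_change *c)
{
	return (pte & ~c->mask_clr) | c->mask_set;
}

/* Clear PRESENT and RW on 'numpages' consecutive entries of a flat toy table. */
static void toy_unmap_pages(uint64_t *ptes, size_t first, size_t numpages)
{
	const struct attr_change unmap = {
		.mask_set = 0,
		.mask_clr = PTE_PRESENT | PTE_RW,
	};

	assert(ptes != NULL);
	for (size_t i = first; i < first + numpages; i++)
		ptes[i] = apply_attr_change(ptes[i], &unmap);
}

/* Toy access check: an access faults when the entry is not present. */
static bool toy_access_faults(const uint64_t *ptes, size_t idx)
{
	return !(ptes[idx] & PTE_PRESENT);
}
```

The sketch leaves out the TLB flush entirely; in the series that step is __flush_tlb_all(), which is why the helper is restricted to early boot on a single CPU.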
[PATCH V2 0/2] Unmap EFI boot services code/data regions after boot.
CC'ing x86 folks because this patch touches x86/mm, in which I am no expert.

[Copied from Patch 1]

Ideally, after the kernel assumes control of the platform, firmware shouldn't access EFI boot services code/data regions. But this turns out not to be true on many x86 platforms. Hence, during boot, the kernel reserves EFI boot services code/data regions [1] and maps [2] them to efi_pgd so that the call to set_virtual_address_map() doesn't fail. After returning from set_virtual_address_map(), the kernel frees the reserved regions [3], but they still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any EFI runtime service) would still be able to access EFI boot services code/data regions, but the contents of these regions would have long been overwritten by someone else, as they are freed by efi_free_boot_services(). So, it's important to unmap these regions. After unmapping EFI boot services code/data regions, any illegal access by buggy firmware to these regions results in a page fault, which is handled by the EFI specific fault handler.

Unmapping EFI boot services code/data regions clears the _PAGE_PRESENT bit. This shouldn't be a problem for L1TF because that case is already handled by protnone_mask() in arch/x86/include/asm/pgtable-invert.h.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Testing the patch set:
----------------------
1. Download buggy firmware (which accesses boot regions even after the kernel has booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error messages (i.e. the kernel allows accesses to EFI boot services code/data regions even after the call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about buggy firmware, efi_rts_wq being frozen, and runtime services being disabled forever.

Please note that this patch changes the kernel's existing behavior for some EFI runtime services, but I think that's OK because the kernel should never have allowed those accesses in the first place. Also please note that this patch set needs a lot of real-world testing, as I have only tested it with OVMF.

Note:
-----
Patch set based on "next" branch in efi tree.

Changes from V1 -> V2:
----------------------
1. Rewrite the cpa initializer in a more readable fashion.
2. Don't use cpa->pfn while unmapping, as it's not useful.
3. Unmap regions before freeing them up.
4. Fix spelling nits.

Sai Praneeth (2):
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/efi: Move efi_{reserve,free}_boot_services() to arch/x86

 arch/x86/include/asm/efi.h           |  2 ++
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c               | 26 ++
 arch/x86/platform/efi/efi.c          |  2 ++
 arch/x86/platform/efi/quirks.c       | 25 +
 include/linux/efi.h                  |  3 ---
 init/main.c                          |  4
 7 files changed, 57 insertions(+), 7 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
--
2.19.1
[PATCH V2 2/2] x86/efi: Move efi_{reserve,free}_boot_services() to arch/x86
efi_reserve_boot_services() and efi_free_boot_services() are x86 specific quirks and as such should be in asm/efi.h, so move them from linux/efi.h. Also, call efi_free_boot_services() from __efi_enter_virtual_mode() as it is an x86 specific call and ideally shouldn't be part of init/main.c.

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h         | 3 ---
 init/main.c                 | 4
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
 		panic("EFI call to SetVirtualAddressMap() failed!");
 	}
 
+	efi_free_boot_services();
+
 	/*
 	 * Now that EFI is in virtual mode, update the function
 	 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void);	/* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
 						    unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
 		struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
 
-	if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-		efi_free_boot_services();
-	}
-
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }
--
2.19.1
[PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd
Ideally, after the kernel assumes control of the platform, firmware shouldn't access EFI boot services code/data regions. But this turns out not to be true on many x86 platforms. Hence, during boot, the kernel reserves EFI boot services code/data regions [1] and maps [2] them to efi_pgd so that the call to set_virtual_address_map() doesn't fail. After returning from set_virtual_address_map(), the kernel frees the reserved regions [3], but they still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any EFI runtime service) would still be able to access EFI boot services code/data regions, but the contents of these regions would have long been overwritten by someone else, as they are freed by efi_free_boot_services(). So, it's important to unmap these regions. After unmapping EFI boot services code/data regions, any illegal access by buggy firmware to these regions results in a page fault, which is handled by the EFI specific fault handler.

Unmapping EFI boot services code/data regions clears the _PAGE_PRESENT bit. This shouldn't be a problem for L1TF because that case is already handled by protnone_mask() in arch/x86/include/asm/pgtable-invert.h.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c               | 26 ++
 arch/x86/platform/efi/quirks.c       | 25 +
 3 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..cda04ecf5432 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 				   unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+				     unsigned long numpages);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..248f16181bed 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,32 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 	return retval;
 }
 
+int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+			      unsigned long numpages)
+{
+	int retval;
+
+	/*
+	 * The typical sequence for unmapping is to find a pte through
+	 * lookup_address_in_pgd() (ideally, it should never return NULL because
+	 * the address is already mapped) and change it's protections.
+	 * As pfn is the *target* of a mapping, it's not useful while unmapping.
+	 */
+	struct cpa_data cpa = {
+		.vaddr		= &address,
+		.pgd		= pgd,
+		.numpages	= numpages,
+		.mask_set	= __pgprot(0),
+		.mask_clr	= __pgprot(_PAGE_PRESENT | _PAGE_RW),
+		.flags		= 0,
+	};
+
+	retval = __change_page_attr_set_clr(&cpa, 0);
+	__flush_tlb_all();
+
+	return retval;
+}
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..fb1c44b11235 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
 	}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+	pgd_t *pgd = efi_mm.pgd;
+	u64 pa = md->phys_addr;
+	u64 va = md->virt_addr;
+
+	if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+		pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+	if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+		pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
 	phys_addr_t new_phys, new_size;
@@ -394,6 +412,13 @@ void __init efi_free_boot_services(void)
 			continue;
 		}
 
+		/*
+		 * Before calling
[PATCH 2/2] x86/efi: Move efi_{reserve,free}_boot_services() to arch/x86
efi_reserve_boot_services() and efi_free_boot_services() are x86 specific quirks and as such should be in asm/efi.h, so move them from linux/efi.h. Also, call efi_free_boot_services() from __efi_enter_virtual_mode() as it is an x86 specific call and ideally shouldn't be part of init/main.c.

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h         | 3 ---
 init/main.c                 | 4
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
 		panic("EFI call to SetVirtualAddressMap() failed!");
 	}
 
+	efi_free_boot_services();
+
 	/*
 	 * Now that EFI is in virtual mode, update the function
 	 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void);	/* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
 						    unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
 		struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
 
-	if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-		efi_free_boot_services();
-	}
-
 	/* Do the rest non-__init'ed, we're now alive */
 	rest_init();
 }
--
2.7.4
[PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd
Ideally, after the kernel assumes control of the platform, firmware shouldn't access EFI Boot Services Code/Data regions. But this turns out not to be true on many x86 platforms. Hence, during boot, the kernel reserves efi boot services code/data regions [1] and maps [2] them to efi_pgd so that the call to set_virtual_address_map() doesn't fail. After returning from set_virtual_address_map(), the kernel frees the reserved regions [3], but they still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any efi runtime service) would still be able to access efi boot services code/data regions, but the contents of these regions would have long been overwritten by someone else, as they are freed by efi_free_boot_services(). So, it's important to unmap these regions. After unmapping boot services code/data regions, any illegal access by buggy firmware to these regions results in a page fault, which is handled by the efi specific fault handler.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c               | 21 +
 arch/x86/platform/efi/quirks.c       | 26 ++
 3 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..796476f11151 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 				   unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+				     unsigned long numpages);
 #endif	/* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..b88ed8e91790 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,27 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 	return retval;
 }
 
+int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+			      unsigned long numpages)
+{
+	int retval;
+
+	struct cpa_data cpa = {
+		.vaddr = &address,
+		.pfn = pfn,
+		.pgd = pgd,
+		.numpages = numpages,
+		.mask_set = __pgprot(0),
+		.mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+		.flags = 0,
+	};
+
+	retval = __change_page_attr_set_clr(&cpa, 0);
+	__flush_tlb_all();
+
+	return retval;
+}
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..5a1ee9392fcf 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,25 @@ void __init efi_reserve_boot_services(void)
 	}
 }
 
+/*
+ * Apart from having VA mappings for efi boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a catch for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+	pgd_t *pgd = efi_mm.pgd;
+	u64 pfn = md->phys_addr >> PAGE_SHIFT;
+
+	if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages))
+		pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA 0x%llx!\n",
+		       md->phys_addr, md->virt_addr);
+
+	if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md->num_pages))
+		pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA 0x%llx!\n",
+		       md->phys_addr, md->virt_addr);
+}
+
 void __init efi_free_boot_services(void)
 {
 	phys_addr_t new_phys, new_size;
@@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
 		}
 
 		free_bootmem_late(start, size);
+
+		/*
+		 * Before calling set_virtual_address_map(), boot services
+		 * code/data regions were mapped as a catch for buggy firmware.
+		 * Unmap them from efi_pgd as they have already been freed.
+		 */
+		efi_unmap_pages(md);
 	}
 
 	if (!num_entries)
--
2.7.4
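The reason efi_unmap_pages() issues two unmap calls per descriptor may be easier to see in a small model. The following is only a user-space sketch under stated assumptions: the "page table" is a flat array of present flags indexed by virtual page number, and struct toy_pgd, struct toy_desc, toy_unmap() and toy_unmap_desc() are hypothetical names, not kernel interfaces. It shows a region reachable through both a 1:1 alias and a VA alias, and that both aliases must be torn down for the region to become inaccessible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 64

/* Toy address space: present[vpn] says whether a virtual page is mapped. */
struct toy_pgd {
	bool present[TOY_PAGES];
};

/* Toy descriptor: one physical region with a 1:1 alias and a VA alias. */
struct toy_desc {
	size_t phys_page;	/* the 1:1 mapping lives at vpn == phys_page */
	size_t virt_page;	/* the VA mapping lives at this vpn */
	size_t num_pages;
};

/* Unmap 'n' pages starting at 'vpn'; returns 0 on success, -1 if out of range. */
static int toy_unmap(struct toy_pgd *pgd, size_t vpn, size_t n)
{
	assert(pgd != NULL);
	if (vpn >= TOY_PAGES || n > TOY_PAGES - vpn)
		return -1;

	for (size_t i = 0; i < n; i++)
		pgd->present[vpn + i] = false;
	return 0;
}

/* Tear down both aliases; returns how many of the two unmaps failed. */
static int toy_unmap_desc(struct toy_pgd *pgd, const struct toy_desc *md)
{
	int failures = 0;

	if (toy_unmap(pgd, md->phys_page, md->num_pages))
		failures++;	/* 1:1 mapping could not be removed */
	if (toy_unmap(pgd, md->virt_page, md->num_pages))
		failures++;	/* VA mapping could not be removed */
	return failures;
}
```

As in the patch, a failure to remove one alias does not stop the attempt on the other; the model simply counts failures where the kernel code logs them with pr_err().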
[PATCH V6 0/2] Add efi page fault handler to recover from page
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi memory regions other than EFI_RUNTIME_SERVICES_{CODE,DATA} even after the kernel has assumed control of the platform. This violates the UEFI specification. Hence, provide an efi specific page fault handler which recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled, hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence that this is not a kernel bug.

The efi page fault handler will check if the access is by efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs the machine. Upon debugging, I found that it's efi_reset_system() that's touching memory regions which it shouldn't. To reproduce the same behavior, I have hacked OVMF and made efi_reset_system() buggy. Along with efi_reset_system(), I have also modified get_next_high_mono_count() and set_virtual_address_map(). They illegally access both boot time and other efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot a mainline kernel. Add reboot=efi to the kernel command line arguments and, after the kernel is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot the kernel after applying the patches and the reboot should work fine. Also please notice the warning/error messages printed by the kernel.

Changes from RFC to V1:
-----------------------
1. Drop the "long jump" technique of dealing with illegal access and instead schedule away from efi_rts_wq.

Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this was part of the init/main.c file. As that is architecture agnostic code, moved the change to the arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
----------------------
1. Drop treating illegal access to EFI_BOOT_SERVICES_{CODE,DATA} regions separately from illegal accesses to other regions like EFI_CONVENTIONAL_MEMORY or EFI_LOADER_{CODE,DATA}. In previous versions, illegal accesses to EFI_BOOT_SERVICES_{CODE,DATA} regions were handled by mapping the requested region to efi_pgd, but from V3 they are handled like illegal accesses to other regions, i.e. by freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
----------------------
1. Drop saving the original memory map passed by the kernel. It also means fewer checks in the efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its functionality more appropriately.

Changes from V4 to V5:
----------------------
1. Drop the config option that enables the efi page fault handler; instead make it default.
2. Call schedule() in an infinite loop to account for spurious wake ups.
3. Introduce "NONE" as an efi runtime service function identifier so that it can be used in efi_recover_from_page_fault() to check if the page fault was indeed triggered by an efi runtime service.

Changes from V5 to V6:
----------------------
1. Thanks to 0-day for reporting a build error when CONFIG_EFI is not enabled. Fixed it by calling the efi page fault handler only when CONFIG_EFI is enabled.
2. Change the return type of the efi page fault handler from int to void. A void return type is sufficient because the efi page fault handler returns only upon a failure to handle the page fault.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

 arch/x86/include/asm/efi.h              |  1 +
 arch/x86/mm/fault.c                     |  9
 arch/x86/platform/efi/quirks.c          | 78 +
 drivers/firmware/efi/runtime-wrappers.c | 61 +++---
 include/linux/efi.h                     | 42 ++
 5 files changed, 147 insertions(+), 44 deletions(-)

Tested-by: Bhupesh Sharma
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thom
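The decision logic described in this cover letter can be summarized as a small state model. This is a hedged user-space sketch rather than the handler itself: enum rts_id, enum fault_action, struct rts_state and toy_recover_from_page_fault() are hypothetical names, and RTS_NONE only plays the role of the "NONE" identifier mentioned in the V5 change notes. Rebooting through BIOS and freezing efi_rts_wq are represented as return values, since neither can be done from user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime service identifiers; RTS_NONE means "no call in flight". */
enum rts_id { RTS_NONE, RTS_GET_VARIABLE, RTS_RESET_SYSTEM };

/* What the toy handler decides to do with a fault. */
enum fault_action {
	FAULT_NOT_HANDLED,	/* not caused by a runtime service: fall through */
	FAULT_REBOOT_VIA_BIOS,	/* fault inside the reset service */
	FAULT_FREEZE_RTS_WQ,	/* any other service: disable and park the worker */
};

struct rts_state {
	enum rts_id current;		/* service being executed when the fault hit */
	bool runtime_services_enabled;
};

static enum fault_action toy_recover_from_page_fault(struct rts_state *s)
{
	assert(s != NULL);

	/* Only faults raised while a runtime service is running are ours. */
	if (s->current == RTS_NONE)
		return FAULT_NOT_HANDLED;

	/* A reset request should still reset the machine, just not through EFI. */
	if (s->current == RTS_RESET_SYSTEM)
		return FAULT_REBOOT_VIA_BIOS;

	/* Otherwise the firmware is not trusted again for the rest of this boot. */
	s->runtime_services_enabled = false;
	return FAULT_FREEZE_RTS_WQ;
}
```

Returning FAULT_NOT_HANDLED corresponds to the handler simply returning, so that the regular kernel page fault path continues as before.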
[PATCH V6 2/2] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware
From: Sai Praneeth

As per the UEFI specification, after the call to ExitBootServices(), accesses by the firmware to any memory regions except EFI_RUNTIME_SERVICES_{CODE,DATA} regions are considered illegal. Buggy firmware could trigger these illegal accesses when an efi runtime service is invoked, and if this happens when the kernel is up and running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't mapped in efi_pgd, which causes a page fault in ring 0 that the kernel fails to handle, leading to die(). To save the kernel from hanging, add an efi specific page fault handler which recovers from such faults as follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. It recovers from potential hangs that could be caused by buggy firmware.
2. It shouts loud that the firmware is buggy and hence that this is not a kernel bug.

Tested-by: Bhupesh Sharma
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h              |  1 +
 arch/x86/mm/fault.c                     |  9
 arch/x86/platform/efi/quirks.c          | 78 +
 drivers/firmware/efi/runtime-wrappers.c |  8
 include/linux/efi.h                     |  8 +++-
 5 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..eea40d52ca78 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -140,6 +140,7 @@ extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
+extern void efi_recover_from_page_fault(unsigned long phys_addr);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..fd636c82d3c1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ... */
 #include /* faulthandler_disabled() */
+#include /* efi_recover_from_page_fault()*/
 
 #include /* boot_cpu_has, ...*/
 #include /* dotraplinkage, ... */
@@ -24,6 +25,7 @@
 #include /* emulate_vsyscall */
 #include /* struct vm86 */
 #include /* vma_pkey() */
+#include /* efi_recover_from_page_fault()*/
 
 #define CREATE_TRACE_POINTS
 #include
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		return;
 
 	/*
+	 * Buggy firmware could access regions which might page fault, try to
+	 * recover from such faults.
+	 */
+	if (IS_ENABLED(CONFIG_EFI))
+		efi_recover_from_page_fault(address);
+
+	/*
 	 * Oops. The kernel tried to access some bad page. We'll have to
 	 * terminate things with extreme prejudice:
 	 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..669babcaf245 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 #define EFI_MIN_RESERVE 5120
 
@@ -654,3 +655,80 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
 }
 
 #endif
+
+/*
+ * If any access by any efi runtime service causes a page fault, then,
+ * 1. If it's efi_reset_system(), reboot through BIOS.
+ * 2. If any other efi runtime service, then
+ *    a. Return error status to the efi caller process.
+ *    b. Disable EFI Runtime Services forever and
+ *    c. Freeze efi_rts_wq and schedule new process.
+ *
+ * @return: Returns, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+void efi_recover_from_page_fault(unsigned long phys_addr)
+{
+	if (!IS_ENABLED(CONFIG_X86_64))
+		return;
+
+	/*
+	 * Make sure that an efi runtime service caused the page fault.
+	 * "efi_mm" cannot be used to check if the page fault had occurred
+	 * in the firmware context because efi=old_map doesn't use efi_pgd.
+	 */
+	if (efi_rts_work.efi_rts_id == NONE)
+		return;
+
+	/*
+	 * Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+	 * page faulting on these addresses isn't expected.
+	 */
+	if
[PATCH V6 1/2] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth

After the kernel has booted, if any access by firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules a new
process. To do this, the efi page fault handler needs efi_rts_work. Hence,
make it accessible.

There will be no race conditions in accessing this structure, because all
the calls to efi runtime services are already serialized.

Tested-by: Bhupesh Sharma
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-	GET_TIME,
-	SET_TIME,
-	GET_WAKEUP_TIME,
-	SET_WAKEUP_TIME,
-	GET_VARIABLE,
-	GET_NEXT_VARIABLE,
-	SET_VARIABLE,
-	QUERY_VARIABLE_INFO,
-	GET_NEXT_HIGH_MONO_COUNT,
-	UPDATE_CAPSULE,
-	QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:	Details of EFI Runtime Service work
- * @arg<1-5>:		EFI Runtime Service function arguments
- * @status:		Status of executing EFI Runtime Service
- * @efi_rts_id:		EFI Runtime Service function identifier
- * @efi_rts_comp:	Struct used for handling completions
- */
-struct efi_runtime_work {
-	void *arg1;
-	void *arg2;
-	void *arg3;
-	void *arg4;
-	void *arg5;
-	efi_status_t status;
-	struct work_struct work;
-	enum efi_rts_ids efi_rts_id;
-	struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work:	Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
 ({ \
-	struct efi_runtime_work efi_rts_work; \
 	efi_rts_work.status = EFI_ABORTED; \
 	\
 	init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-	struct efi_runtime_work *efi_rts_work;
 	void *arg1, *arg2, *arg3, *arg4, *arg5;
 	efi_status_t status = EFI_NOT_FOUND;
 
-	efi_rts_work = container_of(work, struct efi_runtime_work, work);
-	arg1 = efi_rts_work->arg1;
-	arg2 = efi_rts_work->arg2;
-	arg3 = efi_rts_work->arg3;
-	arg4 = efi_rts_work->arg4;
-	arg5 = efi_rts_work->arg5;
+	arg1 = efi_rts_work.arg1;
+	arg2 = efi_rts_work.arg2;
+	arg3 = efi_rts_work.arg3;
+	arg4 = efi_rts_work.arg4;
+	arg5 = efi_rts_work.arg5;
 
-	switch (efi_rts_work->efi_rts_id) {
+	switch (efi_rts_work.efi_rts_id) {
 	case GET_TIME:
 		status = efi_call_virt(get_time, (efi_time_t *)arg1,
 				       (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 		 */
 		pr_err("Requested executing invalid EFI Runtime Service.\n");
 	}
-	efi_rts_work->status = status;
-	complete(&efi_rts_work->efi_rts_comp);
+	efi_rts_work.status = status;
+	complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	QUERY_VARIABLE_INFO,
+	GET_NEXT_HIGH_MONO_COUNT,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+
[PATCH V5 2/2] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware
From: Sai Praneeth

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_ regions are considered illegal. A buggy firmware
could trigger these illegal accesses when an efi runtime service is
invoked and if this happens when the kernel is up and running, the kernel
hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 and the kernel
fails to handle it, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shouts loud that the firmware is buggy and hence it is not a kernel bug.
Tested-by: Bhupesh Sharma Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 1 + arch/x86/mm/fault.c | 9 arch/x86/platform/efi/quirks.c | 78 + drivers/firmware/efi/runtime-wrappers.c | 8 include/linux/efi.h | 8 +++- 5 files changed, 103 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cec5fae23eb3..c1a655f099ef 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -140,6 +140,7 @@ extern void __init efi_apply_memmap_quirks(void); extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); +extern int efi_recover_from_page_fault(unsigned long phys_addr); struct efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2aafa6ab6103..cc2a2e3a4095 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -16,6 +16,7 @@ #include /* prefetchw*/ #include /* exception_enter(), ... */ #include /* faulthandler_disabled() */ +#include /* efi_recover_from_page_fault()*/ #include /* boot_cpu_has, ...*/ #include /* dotraplinkage, ... */ @@ -24,6 +25,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include/* vma_pkey() */ +#include/* efi_recover_from_page_fault()*/ #define CREATE_TRACE_POINTS #include @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code, return; /* +* Buggy firmware could access regions which might page fault, try to +* recover from such faults. +*/ + if (efi_recover_from_page_fault(address)) + return; + + /* * Oops. The kernel tried to access some bad page. 
We'll have to * terminate things with extreme prejudice: */ diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 844d31cb8a0c..3920ae8cab2a 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -16,6 +16,7 @@ #include #include #include +#include #define EFI_MIN_RESERVE 5120 @@ -654,3 +655,80 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, } #endif + +/* + * If any access by any efi runtime service causes a page fault, then, + * 1. If it's efi_reset_system(), reboot through BIOS. + * 2. If any other efi runtime service, then + *a. Return error status to the efi caller process. + *b. Disable EFI Runtime Services forever and + *c. Freeze efi_rts_wq and schedule new process. + * + * @return: Returns 0, if the page fault is not handled. This function + * will never return if the page fault is handled successfully. + */ +int efi_recover_from_page_fault(unsigned long phys_addr) +{ + if (!IS_ENABLED(CONFIG_X86_64)) + return 0; + + /* +* Make sure that an efi runtime service caused the page fault. +* "efi_mm" cannot be used to check if the page fault had occurred +* in the firmware context because efi=old_map doesn't use efi_pgd. +*/ + if (efi_rts_work.efi_rts_id == NONE) + return 0; + + /* +* Address range 0x - 0x0fff is always mapped in the efi_pgd, so +* page faulting on these addresses isn't expected. +*/ + if (phys_addr
[PATCH V5 0/2] Add efi page fault handler to recover from page
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_ even after the kernel has
assumed control of the platform. This violates the UEFI specification.
Hence, provide an efi specific page fault handler which recovers from page
faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled, hang
the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence it is not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine through
   BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along with
efi_reset_system(), I have also modified get_next_high_mono_count() and
set_virtual_address_map(). They illegally access both boot time and other
efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel. Add
   reboot=efi to the kernel command line arguments and after the kernel is
   up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the reboot
   should work fine. Also please notice warning/error messages printed by
   kernel.

Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this was
   part of the init/main.c file. As that is architecture agnostic code,
   moved the change to the arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
----------------------
1. Drop treating illegal accesses to EFI_BOOT_SERVICES_ regions separately
   from illegal accesses to other regions like EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_. In previous versions, illegal accesses to
   EFI_BOOT_SERVICES_ regions were handled by mapping the requested region
   to efi_pgd, but from V3 they are handled like illegal accesses to other
   regions, i.e. by freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
----------------------
1. Drop saving original memory map passed by kernel. It also means fewer
   checks in the efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Changes from V4 to V5:
----------------------
1. Drop the config option that enables the efi page fault handler; instead
   make it default.
2. Call schedule() in an infinite loop to account for spurious wake ups.
3. Introduce "NONE" as an efi runtime service function identifier so that
   it could be used in efi_recover_from_page_fault() to check if the page
   fault was indeed triggered by an efi runtime service.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
    by the firmware

 arch/x86/include/asm/efi.h | 1 +
 arch/x86/mm/fault.c | 9
 arch/x86/platform/efi/quirks.c | 78 +
 drivers/firmware/efi/runtime-wrappers.c | 61 +++---
 include/linux/efi.h | 42 ++
 5 files changed, 147 insertions(+), 44 deletions(-)

Tested-by: Bhupesh Sharma
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
-- 
2.7.4
[PATCH V5 1/2] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth

After the kernel has booted, if any access by firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules a new
process. To do this, the efi page fault handler needs efi_rts_work. Hence,
make it accessible.

There will be no race conditions in accessing this structure, because all
the calls to efi runtime services are already serialized.

Tested-by: Bhupesh Sharma
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-	GET_TIME,
-	SET_TIME,
-	GET_WAKEUP_TIME,
-	SET_WAKEUP_TIME,
-	GET_VARIABLE,
-	GET_NEXT_VARIABLE,
-	SET_VARIABLE,
-	QUERY_VARIABLE_INFO,
-	GET_NEXT_HIGH_MONO_COUNT,
-	UPDATE_CAPSULE,
-	QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:	Details of EFI Runtime Service work
- * @arg<1-5>:		EFI Runtime Service function arguments
- * @status:		Status of executing EFI Runtime Service
- * @efi_rts_id:		EFI Runtime Service function identifier
- * @efi_rts_comp:	Struct used for handling completions
- */
-struct efi_runtime_work {
-	void *arg1;
-	void *arg2;
-	void *arg3;
-	void *arg4;
-	void *arg5;
-	efi_status_t status;
-	struct work_struct work;
-	enum efi_rts_ids efi_rts_id;
-	struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work:	Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
 ({ \
-	struct efi_runtime_work efi_rts_work; \
 	efi_rts_work.status = EFI_ABORTED; \
 	\
 	init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-	struct efi_runtime_work *efi_rts_work;
 	void *arg1, *arg2, *arg3, *arg4, *arg5;
 	efi_status_t status = EFI_NOT_FOUND;
 
-	efi_rts_work = container_of(work, struct efi_runtime_work, work);
-	arg1 = efi_rts_work->arg1;
-	arg2 = efi_rts_work->arg2;
-	arg3 = efi_rts_work->arg3;
-	arg4 = efi_rts_work->arg4;
-	arg5 = efi_rts_work->arg5;
+	arg1 = efi_rts_work.arg1;
+	arg2 = efi_rts_work.arg2;
+	arg3 = efi_rts_work.arg3;
+	arg4 = efi_rts_work.arg4;
+	arg5 = efi_rts_work.arg5;
 
-	switch (efi_rts_work->efi_rts_id) {
+	switch (efi_rts_work.efi_rts_id) {
 	case GET_TIME:
 		status = efi_call_virt(get_time, (efi_time_t *)arg1,
 				       (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 		 */
 		pr_err("Requested executing invalid EFI Runtime Service.\n");
 	}
-	efi_rts_work->status = status;
-	complete(&efi_rts_work->efi_rts_comp);
+	efi_rts_work.status = status;
+	complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	QUERY_VARIABLE_INFO,
+	GET_NEXT_HIGH_MONO_COUNT,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+
[PATCH V4 2/3] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware
From: Sai Praneeth

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_ regions are considered illegal. A buggy firmware
could trigger these illegal accesses when an efi runtime service is
invoked and if this happens when the kernel is up and running, the kernel
hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 and the kernel
fails to handle it, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shouts loud that the firmware is buggy and hence it is not a kernel bug.
Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 9 + arch/x86/mm/fault.c | 9 + arch/x86/platform/efi/quirks.c | 70 + drivers/firmware/efi/runtime-wrappers.c | 7 include/linux/efi.h | 1 + 5 files changed, 96 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cec5fae23eb3..afb1c80182f2 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -141,6 +141,15 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); +#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER +extern int efi_recover_from_page_fault(unsigned long phys_addr); +#else +static inline int efi_recover_from_page_fault(unsigned long phys_addr) +{ + return 0; +} +#endif /* CONFIG_EFI_PAGE_FAULT_HANDLER */ + struct efi_setup_data { u64 fw_vendor; u64 runtime; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2aafa6ab6103..cc2a2e3a4095 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -16,6 +16,7 @@ #include /* prefetchw*/ #include /* exception_enter(), ... */ #include /* faulthandler_disabled() */ +#include /* efi_recover_from_page_fault()*/ #include /* boot_cpu_has, ...*/ #include /* dotraplinkage, ... */ @@ -24,6 +25,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include/* vma_pkey() */ +#include/* efi_recover_from_page_fault()*/ #define CREATE_TRACE_POINTS #include @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code, return; /* +* Buggy firmware could access regions which might page fault, try to +* recover from such faults. +*/ + if (efi_recover_from_page_fault(address)) + return; + + /* * Oops. The kernel tried to access some bad page. 
We'll have to * terminate things with extreme prejudice: */ diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 844d31cb8a0c..853742aba209 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -16,6 +16,7 @@ #include #include #include +#include #define EFI_MIN_RESERVE 5120 @@ -654,3 +655,72 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, } #endif + +#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER + +/* + * If any access by any efi runtime service causes a page fault, then, + * 1. If it's efi_reset_system(), reboot through BIOS. + * 2. If any other efi runtime service, then + *a. Freeze efi_rts_wq. + *b. Return error status to the efi caller process. + *c. Disable EFI Runtime Services forever and + *d. Schedule another process by explicitly calling scheduler. + * + * @return: Returns 0, if the page fault is not handled. This function + * will never return if the page fault is handled successfully. + */ +int efi_recover_from_page_fault(unsigned long phys_addr) +{ + /* Recover from page faults caused *only* by the firmware */ + if (current->active_mm != _mm) + return 0; + + /* +* Address range 0x - 0x0fff is always mapped in the efi_pgd, so +* page faulting on these addresses isn't expected. +*/ + if (phys_addr >= 0x && phys_addr <= 0x0fff) +
[PATCH V4 0/3] Add efi page fault handler to recover from page
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_ even after the kernel has
assumed control of the platform. This violates the UEFI specification.
Hence, provide a debug config option which, when enabled, recovers from
page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled, hang
the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence it is not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine through
   BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along with
efi_reset_system(), I have also modified get_next_high_mono_count() and
set_virtual_address_map(). They illegally access both boot time and other
efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel. Add
   reboot=efi to the kernel command line arguments and after the kernel is
   up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the reboot
   should work fine. Also please notice warning/error messages printed by
   kernel.

Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this was
   part of the init/main.c file. As that is architecture agnostic code,
   moved the change to the arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
----------------------
1. Drop treating illegal accesses to EFI_BOOT_SERVICES_ regions separately
   from illegal accesses to other regions like EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_. In previous versions, illegal accesses to
   EFI_BOOT_SERVICES_ regions were handled by mapping the requested region
   to efi_pgd, but from V3 they are handled like illegal accesses to other
   regions, i.e. by freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
----------------------
1. Drop saving original memory map passed by kernel. It also means fewer
   checks in the efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (3):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
    by the firmware
  x86/efi: Introduce EFI_PAGE_FAULT_HANDLER

 arch/x86/Kconfig | 18 +
 arch/x86/include/asm/efi.h | 9 +
 arch/x86/mm/fault.c | 9 +
 arch/x86/platform/efi/quirks.c | 70 +
 drivers/firmware/efi/runtime-wrappers.c | 60
 include/linux/efi.h | 37 +
 6 files changed, 159 insertions(+), 44 deletions(-)

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
-- 
2.7.4
[PATCH V4 1/3] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth

After the kernel has booted, if any access by firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules a new
process. To do this, the efi page fault handler needs efi_rts_work. Hence,
make it accessible.

There will be no race conditions in accessing this structure, because all
the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-	GET_TIME,
-	SET_TIME,
-	GET_WAKEUP_TIME,
-	SET_WAKEUP_TIME,
-	GET_VARIABLE,
-	GET_NEXT_VARIABLE,
-	SET_VARIABLE,
-	QUERY_VARIABLE_INFO,
-	GET_NEXT_HIGH_MONO_COUNT,
-	UPDATE_CAPSULE,
-	QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:	Details of EFI Runtime Service work
- * @arg<1-5>:		EFI Runtime Service function arguments
- * @status:		Status of executing EFI Runtime Service
- * @efi_rts_id:		EFI Runtime Service function identifier
- * @efi_rts_comp:	Struct used for handling completions
- */
-struct efi_runtime_work {
-	void *arg1;
-	void *arg2;
-	void *arg3;
-	void *arg4;
-	void *arg5;
-	efi_status_t status;
-	struct work_struct work;
-	enum efi_rts_ids efi_rts_id;
-	struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work:	Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
 ({ \
-	struct efi_runtime_work efi_rts_work; \
 	efi_rts_work.status = EFI_ABORTED; \
 	\
 	init_completion(&efi_rts_work.efi_rts_comp); \
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-	struct efi_runtime_work *efi_rts_work;
 	void *arg1, *arg2, *arg3, *arg4, *arg5;
 	efi_status_t status = EFI_NOT_FOUND;
 
-	efi_rts_work = container_of(work, struct efi_runtime_work, work);
-	arg1 = efi_rts_work->arg1;
-	arg2 = efi_rts_work->arg2;
-	arg3 = efi_rts_work->arg3;
-	arg4 = efi_rts_work->arg4;
-	arg5 = efi_rts_work->arg5;
+	arg1 = efi_rts_work.arg1;
+	arg2 = efi_rts_work.arg2;
+	arg3 = efi_rts_work.arg3;
+	arg4 = efi_rts_work.arg4;
+	arg5 = efi_rts_work.arg5;
 
-	switch (efi_rts_work->efi_rts_id) {
+	switch (efi_rts_work.efi_rts_id) {
 	case GET_TIME:
 		status = efi_call_virt(get_time, (efi_time_t *)arg1,
 				       (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 		 */
 		pr_err("Requested executing invalid EFI Runtime Service.\n");
 	}
-	efi_rts_work->status = status;
-	complete(&efi_rts_work->efi_rts_comp);
+	efi_rts_work.status = status;
+	complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	QUERY_VARIABLE_INFO,
+	GET_NEXT_HIGH_MONO_COUNT,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+ * @efi_rts_id:
[PATCH V4 3/3] x86/efi: Introduce EFI_PAGE_FAULT_HANDLER
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that might access
efi regions other than EFI_RUNTIME_SERVICES_ even after the kernel has
assumed control of the platform. This violates UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along with
the warning, the efi page fault handler will also try to recover from the
page fault triggered by the firmware so that the machine doesn't hang.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..cc840710ae3e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,24 @@ config EFI_MIXED
 	  If unsure, say N.
 
+config EFI_PAGE_FAULT_HANDLER
+	bool "EFI page fault handler support" if EXPERT
+	depends on EFI
+	help
+	  Enable this debug feature so that the kernel can recover from page
+	  faults caused by buggy firmware. Also,
+	  1. If the page fault is caused by efi_reset_system(), then the
+	     platform is rebooted through BIOS.
+	  2. If the page fault is caused by any other efi runtime service,
+	     then the kernel freezes efi_rts_wq (work queue that runs efi
+	     runtime services) and schedules a new process. Also, it disables
+	     EFI Runtime Services, so that it will never again call buggy
+	     firmware.
+	  Please see the UEFI specification for details on the expectations
+	  of memory usage.
+
+	  If unsure, say N.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"
--
2.7.4
[PATCH V3 2/5] efi: Introduce __efi_init attribute
From: Sai Praneeth

Buggy firmware could illegally access some efi regions even after the
kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault handler
will detect and recover from these illegal accesses.

efi_md_typeattr_format() and memory_type_name are used by the efi page
fault handler to print information about memory descriptor that was
illegally accessed. As the page fault handler is present during/after
kernel boot it doesn't have an __init attribute, but
efi_md_typeattr_format() has it and thus during kernel build, "WARNING:
modpost: Found * section mismatch(es)" build warning is observed. To fix
it, remove __init attribute for efi_md_typeattr_format().

In order to not keep efi_md_typeattr_format() and memory_type_name
needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add a
new __efi_init attribute whose value changes based on whether the config
option is selected or not.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 drivers/firmware/efi/efi.c |  4 ++--
 include/linux/efi.h        | 14 +-
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index d8a33a781a57..16571429b19c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -768,7 +768,7 @@ int __init efi_get_fdt_params(struct efi_fdt_params *params)
 }
 #endif /* CONFIG_EFI_PARAMS_FROM_FDT */
 
-static __initdata char memory_type_name[][20] = {
+static __efi_initdata char memory_type_name[][20] = {
 	"Reserved",
 	"Loader Code",
 	"Loader Data",
@@ -786,7 +786,7 @@ static __initdata char memory_type_name[][20] = {
 	"Persistent Memory",
 };
 
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
 				       const efi_memory_desc_t *md)
 {
 	char *pos;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 855992b15269..6a07e3166fd1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1107,10 +1107,22 @@ extern int efi_memattr_apply_permissions(struct mm_struct *mm,
 	for_each_efi_memory_desc_in_map(&efi.memmap, md)
 
 /*
+ * __efi_init - if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is enabled, remove __init
+ * modifier.
+ */
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+#define __efi_init
+#define __efi_initdata
+#else
+#define __efi_init __init
+#define __efi_initdata __initdata
+#endif
+
+/*
  * Format an EFI memory descriptor's type and attributes to a user-provided
  * character buffer, as per snprintf(), and return the buffer.
  */
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
 				       const efi_memory_desc_t *md);
 
 /**
--
2.7.4
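The conditional attribute introduced by this patch can be tried outside the kernel. The following is only a user-space sketch: DEMO_INIT, DEMO_INITDATA, DEMO_EFI_INIT, DEMO_EFI_INITDATA, DEMO_WARN_ON_ILLEGAL_ACCESS, demo_type_name and demo_type_format() are hypothetical names, and the section attribute merely stands in for the kernel's __init/__initdata (nothing is discarded after startup here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's __init/__initdata annotations. */
#if defined(__ELF__)
#define DEMO_INIT     __attribute__((section(".text.demo_init")))
#define DEMO_INITDATA __attribute__((section(".data.demo_init")))
#else
#define DEMO_INIT
#define DEMO_INITDATA
#endif

/*
 * Same shape as __efi_init in the patch: when the (hypothetical) debug
 * option is defined, the annotation expands to nothing, so the symbol
 * stays in the regular sections and remains usable after init.
 */
#ifdef DEMO_WARN_ON_ILLEGAL_ACCESS
#define DEMO_EFI_INIT
#define DEMO_EFI_INITDATA
#else
#define DEMO_EFI_INIT     DEMO_INIT
#define DEMO_EFI_INITDATA DEMO_INITDATA
#endif

static DEMO_EFI_INITDATA char demo_type_name[][20] = {
	"Reserved",
	"Loader Code",
	"Loader Data",
};

/* Format a memory type into buf, as per snprintf(), and return buf. */
DEMO_EFI_INIT char *demo_type_format(char *buf, size_t size, unsigned int type)
{
	size_t n = sizeof(demo_type_name) / sizeof(demo_type_name[0]);

	assert(buf != NULL && size > 0);
	snprintf(buf, size, "[%s]", type < n ? demo_type_name[type] : "?");
	return buf;
}
```

Building once with -DDEMO_WARN_ON_ILLEGAL_ACCESS and once without, then comparing the section of demo_type_format in the symbol table, shows the macro switching the placement while the callers stay unchanged.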
[PATCH V3 3/5] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
From: Sai Praneeth

The efi page fault handler that recovers from page faults caused by the
firmware needs the original memory map passed by the firmware. It looks up
this memory map to find the type of the memory region at which the page
fault occurred. Presently, EFI subsystem discards the original memory map
passed by the firmware and replaces it with a new memory map that has only
EFI_RUNTIME_SERVICES_ regions. But illegal accesses by firmware can occur
at any region. Hence, _only_ if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is
defined, create a backup of the original memory map passed by the
firmware, so that efi page fault handler could detect/recover from illegal
accesses to *any* efi region.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h     |  6 ++
 arch/x86/platform/efi/efi.c    |  2 ++
 arch/x86/platform/efi/quirks.c | 48 ++
 3 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..788ed4cbce22 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
+
 struct efi_setup_data {
 	u64 fw_vendor;
 	u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..7a3ea4cd5939 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 
 	pa = __pa(new_memmap);
 
+	efi_save_original_memmap();
+
 	/*
 	 * Unregister the early EFI memmap from efi_init() and install
 	 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..36b0b042ba56 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,51 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The efi page fault handler that recovers from page faults caused by
+ * buggy firmware needs original memory map passed by firmware. Hence,
+ * build a new EFI memmap that has all entries and save it for later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+	efi_memory_desc_t *md;
+	void *remapped_phys, *new_md;
+	phys_addr_t new_phys, new_size;
+
+	new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+	new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+	if (!new_phys) {
+		pr_err("Failed to allocate new EFI memmap\n");
+		return;
+	}
+
+	remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+	if (!remapped_phys) {
+		pr_err("Failed to remap new EFI memmap\n");
+		__free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
+		return;
+	}
+
+	new_md = remapped_phys;
+	for_each_efi_memory_desc(md) {
+		memcpy(new_md, md, efi.memmap.desc_size);
+		new_md += efi.memmap.desc_size;
+	}
+
+	original_memory_map.late = 1;
+	original_memory_map.phys_map = new_phys;
+	original_memory_map.map = remapped_phys;
+	original_memory_map.nr_map = efi.memmap.nr_map;
+	original_memory_map.desc_size = efi.memmap.desc_size;
+	original_memory_map.map_end = remapped_phys + new_size;
+	original_memory_map.desc_version = efi.memmap.desc_version;
+
+	original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
--
2.7.4
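The copy loop in efi_save_original_memmap() steps by desc_size rather than by sizeof(efi_memory_desc_t), because the firmware may report a larger descriptor stride than the structure the kernel knows about. A rough user-space model of that snapshot, assuming a hypothetical struct demo_memmap and demo_save_memmap() and using malloc() where the patch uses efi_memmap_alloc() and memremap(), could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down model of struct efi_memory_map. */
struct demo_memmap {
	void *map;        /* first descriptor */
	void *map_end;    /* one past the last descriptor */
	size_t nr_map;    /* number of descriptors */
	size_t desc_size; /* stride between descriptors, set by firmware */
};

/* Copy every descriptor of src into a fresh buffer; 0 on success, -1 on failure. */
int demo_save_memmap(const struct demo_memmap *src, struct demo_memmap *dst)
{
	size_t size, i;
	const char *in;
	char *copy, *out;

	assert(src != NULL && dst != NULL);
	size = src->nr_map * src->desc_size;
	if (size == 0)
		return -1;

	copy = malloc(size);
	if (!copy)
		return -1;

	/* Walk both maps with the firmware-provided stride. */
	in = src->map;
	out = copy;
	for (i = 0; i < src->nr_map; i++) {
		memcpy(out, in, src->desc_size);
		in += src->desc_size;
		out += src->desc_size;
	}

	dst->map = copy;
	dst->map_end = copy + size;
	dst->nr_map = src->nr_map;
	dst->desc_size = src->desc_size;
	return 0;
}
```

The snapshot owns its own buffer, so later changes to the source map (as happens when the kernel installs the runtime-only map) do not affect it.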
[PATCH V3 1/5] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth After the kernel has booted, if the firmware accesses *any* efi regions other than EFI_RUNTIME_SERVICES_, the efi page fault handler would freeze efi_rts_wq and schedules a new process. To do this, the efi page fault handler needs efi_rts_work. Hence, make it accessible. There will be no race conditions in accessing this structure, because, all the calls to efi runtime services are already serialized. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Ard Biesheuvel --- drivers/firmware/efi/runtime-wrappers.c | 53 ++--- include/linux/efi.h | 36 ++ 2 files changed, 45 insertions(+), 44 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index aa66cbf23512..b18b2d864c2c 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -45,39 +45,7 @@ #define __efi_call_virt(f, args...) 
\ __efi_call_virt_pointer(efi.systab->runtime, f, args) -/* efi_runtime_service() function identifiers */ -enum efi_rts_ids { - GET_TIME, - SET_TIME, - GET_WAKEUP_TIME, - SET_WAKEUP_TIME, - GET_VARIABLE, - GET_NEXT_VARIABLE, - SET_VARIABLE, - QUERY_VARIABLE_INFO, - GET_NEXT_HIGH_MONO_COUNT, - UPDATE_CAPSULE, - QUERY_CAPSULE_CAPS, -}; - -/* - * efi_runtime_work: Details of EFI Runtime Service work - * @arg<1-5>: EFI Runtime Service function arguments - * @status:Status of executing EFI Runtime Service - * @efi_rts_id:EFI Runtime Service function identifier - * @efi_rts_comp: Struct used for handling completions - */ -struct efi_runtime_work { - void *arg1; - void *arg2; - void *arg3; - void *arg4; - void *arg5; - efi_status_t status; - struct work_struct work; - enum efi_rts_ids efi_rts_id; - struct completion efi_rts_comp; -}; +struct efi_runtime_work efi_rts_work; /* * efi_queue_work: Queue efi_runtime_service() and wait until it's done @@ -91,7 +59,6 @@ struct efi_runtime_work { */ #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \ ({ \ - struct efi_runtime_work efi_rts_work; \ efi_rts_work.status = EFI_ABORTED; \ \ init_completion(_rts_work.efi_rts_comp);\ @@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock); */ static void efi_call_rts(struct work_struct *work) { - struct efi_runtime_work *efi_rts_work; void *arg1, *arg2, *arg3, *arg4, *arg5; efi_status_t status = EFI_NOT_FOUND; - efi_rts_work = container_of(work, struct efi_runtime_work, work); - arg1 = efi_rts_work->arg1; - arg2 = efi_rts_work->arg2; - arg3 = efi_rts_work->arg3; - arg4 = efi_rts_work->arg4; - arg5 = efi_rts_work->arg5; + arg1 = efi_rts_work.arg1; + arg2 = efi_rts_work.arg2; + arg3 = efi_rts_work.arg3; + arg4 = efi_rts_work.arg4; + arg5 = efi_rts_work.arg5; - switch (efi_rts_work->efi_rts_id) { + switch (efi_rts_work.efi_rts_id) { case GET_TIME: status = efi_call_virt(get_time, (efi_time_t *)arg1, (efi_time_cap_t *)arg2); @@ -253,8 +218,8 @@ static void 
efi_call_rts(struct work_struct *work) */ pr_err("Requested executing invalid EFI Runtime Service.\n"); } - efi_rts_work->status = status; - complete(_rts_work->efi_rts_comp); + efi_rts_work.status = status; + complete(_rts_work.efi_rts_comp); } static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) diff --git a/include/linux/efi.h b/include/linux/efi.h index 401e4b254e30..855992b15269 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog { extern int efi_tpm_eventlog_init(void); +/* efi_runtime_service() function identifiers */ +enum efi_rts_ids { + GET_TIME, + SET_TIME, + GET_WAKEUP_TIME, + SET_WAKEUP_TIME, + GET_VARIABLE, + GET_NEXT_VARIABLE, + SET_VARIABLE, + QUERY_VARIABLE_INFO, + GET_NEXT_HIGH_MONO_COUNT, + UPDATE_CAPSULE, + QUERY_CAPSULE_CAPS, +}; + +/* + * efi_runtime_work: Details of EFI Runtime Service work + * @arg<1-5>: EFI Runtime Service function arguments + * @status:Status of executing EFI Runtime Service +
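The diff above replaces the container_of() lookup in efi_call_rts() with direct access to a single shared efi_rts_work. For readers unfamiliar with the idiom being removed, here is a small user-space sketch of both styles; demo_container_of, struct demo_work, struct demo_runtime_work and the demo_call_rts_*() functions are hypothetical names, and the "+ 100" status is a placeholder for the real service call.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space equivalent of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work {
	int queued;
};

struct demo_runtime_work {
	void *arg1;
	int status;
	struct demo_work work; /* embedded, like struct work_struct */
	int rts_id;
};

/* Old style: recover the enclosing request from the embedded work item. */
void demo_call_rts_local(struct demo_work *work)
{
	struct demo_runtime_work *rw =
		demo_container_of(work, struct demo_runtime_work, work);

	rw->status = rw->rts_id + 100;
}

/* New style: one shared request that other code can also inspect. */
struct demo_runtime_work demo_rts_work;

void demo_call_rts_global(struct demo_work *work)
{
	/* Only valid because callers are serialized on the single request. */
	assert(work == &demo_rts_work.work);
	demo_rts_work.status = demo_rts_work.rts_id + 100;
}
```

The global variant trades per-call isolation for visibility: it is only safe under the serialization the commit message relies on.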
[PATCH V3 4/5] x86/efi: Add efi page fault handler to recover from the page faults caused by firmware
From: Sai Praneeth

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_ regions are considered illegal. A buggy firmware
could trigger these illegal accesses when an efi runtime service is
invoked, and if this happens when the kernel is up and running, the kernel
hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0, and the kernel
fails to handle it, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which detects illegal accesses by the
firmware, and if the access is to any region other than
EFI_RUNTIME_SERVICES_, then
1. The efi page fault handler freezes efi_rts_wq and schedules a new
   process.
2. If the efi runtime service is efi_reset_system(), then the efi page
   fault handler will reboot the machine through BIOS and not through
   efi_reset_system().

The efi specific page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.
Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 5 ++ arch/x86/mm/fault.c | 9 ++ arch/x86/platform/efi/quirks.c | 140 drivers/firmware/efi/runtime-wrappers.c | 7 ++ include/linux/efi.h | 1 + 5 files changed, 162 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 788ed4cbce22..f3d9c3c2359e 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -143,8 +143,13 @@ extern void efi_switch_mm(struct mm_struct *mm); #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS extern void __init efi_save_original_memmap(void); +extern int efi_illegal_accesses_fixup(unsigned long phys_addr); #else static inline void __init efi_save_original_memmap(void) { } +static inline int efi_illegal_accesses_fixup(unsigned long phys_addr) +{ + return 0; +} #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */ struct efi_setup_data { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2aafa6ab6103..4f6939d8e13f 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -16,6 +16,7 @@ #include /* prefetchw*/ #include /* exception_enter(), ... */ #include /* faulthandler_disabled() */ +#include /* fixup for buggy UEFI firmware*/ #include /* boot_cpu_has, ...*/ #include /* dotraplinkage, ... */ @@ -24,6 +25,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include/* vma_pkey() */ +#include/* fixup for buggy UEFI firmware*/ #define CREATE_TRACE_POINTS #include @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code, return; /* +* Buggy firmware could trigger illegal accesses to some EFI regions +* which might page fault, try to recover from such faults. +*/ + if (efi_illegal_accesses_fixup(address)) + return; + + /* * Oops. The kernel tried to access some bad page. 
We'll have to * terminate things with extreme prejudice: */ diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 36b0b042ba56..2aba28a90800 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -16,6 +16,7 @@ #include #include #include +#include #define EFI_MIN_RESERVE 5120 @@ -701,4 +702,143 @@ void __init efi_save_original_memmap(void) original_memory_map_present = true; } + +/* + * From the original EFI memory map passed by the firmware, return a + * pointer to the memory descriptor that describes the given physical + * address. If not found, return NULL. + */ +static efi_memory_desc_t *efi_get_md(unsigned long phys_addr) +{ + efi_memory_desc_t *md; + + for_each_efi_memory_desc_in_map(_memory_map, md) { + if (md->phys_addr <= phys_addr && + (phys_addr < (md->phys_addr + + (md->num_pages << EFI_PAGE_SHIFT { + return md; + } + } + return NULL; +} + +/* + * Detect illegal access by the firmware and if the illegally accessed + * region is any region described by efi memory map and other than + * EFI_RUNTIME_SERVICES_, then + * 1. If the efi runtime service is efi_reset_system(), then reboot + *th
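The efi_get_md() helper in the diff above is a linear range lookup over the saved memory map. As a self-contained illustration (not the patch's code), the sketch below uses a hypothetical struct demo_md and demo_get_md() over a plain array; it also writes the range check as a subtraction so that start + size cannot wrap around, which is a choice made here and not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* 4 KiB pages, matching EFI_PAGE_SHIFT */

/* Hypothetical, trimmed-down memory descriptor. */
struct demo_md {
	uint32_t type;
	uint64_t phys_addr; /* start of the region */
	uint64_t num_pages; /* length in 4 KiB pages */
};

/* Return the descriptor covering phys_addr, or NULL if none does. */
const struct demo_md *demo_get_md(const struct demo_md *map, size_t n,
				  uint64_t phys_addr)
{
	size_t i;

	assert(map != NULL || n == 0);
	for (i = 0; i < n; i++) {
		uint64_t start = map[i].phys_addr;
		uint64_t size = map[i].num_pages << DEMO_PAGE_SHIFT;

		/* Half-open interval [start, start + size). */
		if (phys_addr >= start && phys_addr - start < size)
			return &map[i];
	}
	return NULL;
}
```

A NULL result corresponds to an address that the firmware's map does not describe at all, which the handler has to treat separately from an address in a known but forbidden region.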
[PATCH V3 0/5] Add efi page fault handler to detect and recover
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_ even after the kernel has
assumed control of the platform. This violates the UEFI specification.
Hence, provide a debug config option which, when enabled, detects and
recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger a page fault in ring 0 because
firmware executes at ring 0, and if unhandled, it hangs the kernel. Provide
an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

Upon detecting that the illegally accessed region is any region other than
EFI_RUNTIME_SERVICES_, the efi page fault handler will check if the access
is by efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine through
   BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along with
efi_reset_system(), I have also modified get_next_high_mono_count() and
set_virtual_address_map(). They illegally access both boot time and other
efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel. Add
   reboot=efi to the kernel command line arguments and after the kernel is
   up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the reboot
   should work fine. Also please notice warning/error messages printed by
   kernel.

Changes from RFC to V1:
-----------------------
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
----------------------
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this was
   part of init/main.c file. As it is an architecture agnostic code, moved
   the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
----------------------
1. Drop treating illegal access to EFI_BOOT_SERVICES_ regions separately
   from illegal accesses to other regions like EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_. In previous versions, illegal accesses to
   EFI_BOOT_SERVICES_ regions were handled by mapping the requested region
   to efi_pgd, but from V3 they are handled similarly to illegal accesses
   to other regions, i.e. by freezing efi_rts_wq and scheduling a new
   process.
2. Change __efi_init_fixup attribute to __efi_init.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (5):
  efi: Make efi_rts_work accessible to efi page fault handler
  efi: Introduce __efi_init attribute
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
  x86/efi: Add efi page fault handler to recover from the page faults
    caused by firmware
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS

 arch/x86/Kconfig                        |  17 +++
 arch/x86/include/asm/efi.h              |  11 ++
 arch/x86/mm/fault.c                     |   9 ++
 arch/x86/platform/efi/efi.c             |   2 +
 arch/x86/platform/efi/quirks.c          | 188
 drivers/firmware/efi/efi.c              |   4 +-
 drivers/firmware/efi/runtime-wrappers.c |  60 +++---
 include/linux/efi.h                     |  51 -
 8 files changed, 295 insertions(+), 47 deletions(-)

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
--
2.7.4
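The recovery policy described in this cover letter can be summarized as a small decision function. This is only a model of the stated V3 behavior, not the handler from the series: enum demo_mem_type, enum demo_action and demo_pick_action() are hypothetical, and the real handler additionally has to establish that the fault happened while a runtime service was executing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of EFI memory types, enough for the illustration. */
enum demo_mem_type {
	DEMO_CONVENTIONAL_MEMORY,
	DEMO_BOOT_SERVICES_CODE,
	DEMO_BOOT_SERVICES_DATA,
	DEMO_RUNTIME_SERVICES_CODE,
	DEMO_RUNTIME_SERVICES_DATA,
	DEMO_MEM_TYPE_COUNT
};

enum demo_action {
	DEMO_NO_FIXUP,        /* access is legal, not this handler's business */
	DEMO_REBOOT_VIA_BIOS, /* fault came from efi_reset_system() */
	DEMO_FREEZE_RTS_WQ    /* freeze efi_rts_wq and schedule away */
};

/* Decide what to do with a firmware fault in a region of the given type. */
enum demo_action demo_pick_action(enum demo_mem_type type, bool in_reset_system)
{
	assert(type < DEMO_MEM_TYPE_COUNT);

	/* Runtime services regions are the only ones firmware may touch. */
	if (type == DEMO_RUNTIME_SERVICES_CODE ||
	    type == DEMO_RUNTIME_SERVICES_DATA)
		return DEMO_NO_FIXUP;

	return in_reset_system ? DEMO_REBOOT_VIA_BIOS : DEMO_FREEZE_RTS_WQ;
}
```

Note that, per the V2 to V3 changelog, boot services regions get no special treatment here and fall through to the same two outcomes as every other forbidden region.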
[PATCH V3 5/5] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that might access
efi regions other than EFI_RUNTIME_SERVICES_ even after the kernel has
assumed control of the platform. This violates UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along with
the warning, the efi page fault handler will also try to recover from the
page fault triggered by the firmware so that the machine doesn't hang.

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Thomas Gleixner
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
---
 arch/x86/Kconfig | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..7dc270c17d0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,23 @@ config EFI_MIXED
 	  If unsure, say N.
 
+config EFI_WARN_ON_ILLEGAL_ACCESS
+	bool "Warn about illegal memory accesses by firmware" if EXPERT
+	depends on EFI
+	help
+	  Enable this debug feature so that the kernel can detect illegal
+	  memory accesses by firmware and issue a warning. Also,
+	  1. If the illegally accessed region is any region other than
+	     EFI_RUNTIME_SERVICES_, then the kernel freezes
+	     efi_rts_wq and schedules a new process. Also, it disables EFI
+	     Runtime Services, so that it will never again call buggy firmware.
+	  2. If the illegal access is by efi_reset_system(), then the
+	     platform is rebooted through BIOS.
+	  Please see the UEFI specification for details on the expectations
+	  of memory usage.
+
+	  If unsure, say N.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"
--
2.7.4
[PATCH V2 5/6] x86/mm: If in_atomic(), allocate pages without sleeping
From: Sai Praneeth A page fault occurs when any EFI Runtime Service tries to reference a memory region which it shouldn't. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi specific page fault handler fixes it up by dynamically creating VA->PA mappings using efi_map_region(). Originally, efi_map_region() and hence the functionality of creating mappings for efi regions was intended to be used *only* during boot time (please note __init modifier) and hence when called during runtime (i.e. from efi page fault handler), the page allocators complain. Calling efi_map_region() during runtime complains because "gfp_allowed_mask" value changes from boot time to runtime (GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though efi_map_region() calls alloc__page with GFP_KERNEL, the page allocator doesn't complain because "__GFP_RECLAIM" flag is cleared by "gfp_allowed_mask", but during runtime it isn't cleared and hence prints below stack trace. BUG: sleeping function called from invalid context at mm/page_alloc.c:4320 in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts 1 lock held by fwts/2022: irq event stamp: 45714 hardirqs last enabled at (45713): [] restore_regs_and_return_to_kernel+0x0/0x2c hardirqs last disabled at (45714): [] error_entry+0x7c/0x100 softirqs last enabled at (44732): [] __do_softirq+0x387/0x49a softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0 CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x5e/0x8b ___might_sleep+0x20c/0x240 __alloc_pages_nodemask+0xc2/0x330 get_zeroed_page+0x12/0x40 alloc_pmd_page+0x13/0x50 populate_pmd+0xc0/0x2e0 ? __lock_acquire+0x439/0x740 __cpa_process_fault+0x2e1/0x5d0 __change_page_attr_set_clr+0x7c3/0xcd0 ? console_unlock+0x34d/0x660 ? kernel_map_pages_in_pgd+0x8c/0x160 kernel_map_pages_in_pgd+0x8c/0x160 ? printk+0x43/0x4b ? 
__map_region+0x3c/0x60 __map_region+0x3c/0x60 efi_map_region+0x83/0xd0 efi_illegal_accesses_fixup+0x1ca/0x1e0 no_context+0x112/0x390 __do_page_fault+0xc7/0x4f0 page_fault+0x1e/0x30 RIP: 0010:0xfffeffc7ccf1 RSP: 0018:c975bbf0 EFLAGS: 00010282 RAX: 0048 RBX: c975be10 RCX: c975bad0 RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf RBP: c975bdc8 R08: 0048 R09: 0048 R10: 03fd R11: 03f8 R12: 880032a92d80 R13: 0003 R14: 7ffcf1eb9d50 R15: ? efi_call+0xd1/0x160 ? __lock_acquire+0x439/0x740 ? _raw_spin_unlock+0x24/0x30 ? virt_efi_get_next_high_mono_count+0x77/0xf0 ? efi_test_ioctl+0x1ab/0xc20 ? selinux_file_ioctl+0x122/0x1c0 ? do_vfs_ioctl+0x92/0x6b0 ? do_vfs_ioctl+0x92/0x6b0 ? security_file_ioctl+0x3c/0x50 ? selinux_capable+0x20/0x20 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x50/0x170 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix the above warning by conditionally changing the allocation from GFP_KERNEL to GFP_ATOMIC, so that efi page fault handler could use efi_map_region() during runtime. This change shouldn't effect any other generic page allocations because this allocation is used only by efi functions [1]. [1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c if (cpa->pgd) { /* * Right now, we only execute this code path when mapping * the EFI virtual memory map regions, no other users * provide a ->pgd value. This may change in the future. 
*/ return populate_pgd(cpa, vaddr); } Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/mm/pageattr.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 3bded76e8d5c..1b28a333c8ce 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end) static int alloc_pte_page(pmd_t *pmd) { - pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + pte_t *pte; + + if (in_atomic()) + pte = (pte_t *)get_zeroed_page(GFP_ATOMIC); + else + pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + if (!pte) return -1; @@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd) static int alloc_pmd_page(pud_t *pud) { - pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + pmd_t *pmd; + + if (in_atomic()) + pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC); + else + pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + if (!pmd) return -1; -- 2.7.4
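The patch above repeats the same in_atomic() test in alloc_pte_page() and alloc_pmd_page(). One way to keep the two call sites identical is to put the flag choice in a single helper. The sketch below models that refactoring in user space only: enum demo_gfp, demo_cpa_gfp(), demo_get_zeroed_page() and demo_alloc_pte_page() are hypothetical, the caller passes the context explicitly, and calloc() stands in for the page allocator, so nothing here can actually sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096

/* Hypothetical stand-ins for GFP_KERNEL and GFP_ATOMIC. */
enum demo_gfp {
	DEMO_GFP_KERNEL, /* may sleep while reclaiming memory */
	DEMO_GFP_ATOMIC  /* must not sleep */
};

/* Single place that maps the calling context to allocation flags. */
enum demo_gfp demo_cpa_gfp(bool atomic_ctx)
{
	return atomic_ctx ? DEMO_GFP_ATOMIC : DEMO_GFP_KERNEL;
}

/* Model of get_zeroed_page(): returns one zero-filled page or NULL. */
void *demo_get_zeroed_page(enum demo_gfp gfp)
{
	assert(gfp == DEMO_GFP_KERNEL || gfp == DEMO_GFP_ATOMIC);
	return calloc(1, DEMO_PAGE_SIZE);
}

/* Same shape as alloc_pte_page(): 0 on success, -1 on failure. */
int demo_alloc_pte_page(void **slot, bool atomic_ctx)
{
	void *pte = demo_get_zeroed_page(demo_cpa_gfp(atomic_ctx));

	if (!pte)
		return -1;

	*slot = pte;
	return 0;
}
```

In the kernel, the context test itself is the delicate part; the helper only ensures that whatever test is chosen is applied consistently at every allocation site.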
[PATCH V2 1/6] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth If the firmware illegally accesses any efi regions other than EFI_BOOT_SERVICES_, the efi page fault handler would freeze efi_rts_wq and schedules a new process. To do this, the efi page fault handler needs efi_rts_work. Hence, make it accessible. There will be no race conditions in accessing this structure, because, all the calls to efi runtime services are already serialized. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- drivers/firmware/efi/runtime-wrappers.c | 53 ++--- include/linux/efi.h | 36 ++ 2 files changed, 45 insertions(+), 44 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index aa66cbf23512..b18b2d864c2c 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -45,39 +45,7 @@ #define __efi_call_virt(f, args...) 
\ __efi_call_virt_pointer(efi.systab->runtime, f, args) -/* efi_runtime_service() function identifiers */ -enum efi_rts_ids { - GET_TIME, - SET_TIME, - GET_WAKEUP_TIME, - SET_WAKEUP_TIME, - GET_VARIABLE, - GET_NEXT_VARIABLE, - SET_VARIABLE, - QUERY_VARIABLE_INFO, - GET_NEXT_HIGH_MONO_COUNT, - UPDATE_CAPSULE, - QUERY_CAPSULE_CAPS, -}; - -/* - * efi_runtime_work: Details of EFI Runtime Service work - * @arg<1-5>: EFI Runtime Service function arguments - * @status:Status of executing EFI Runtime Service - * @efi_rts_id:EFI Runtime Service function identifier - * @efi_rts_comp: Struct used for handling completions - */ -struct efi_runtime_work { - void *arg1; - void *arg2; - void *arg3; - void *arg4; - void *arg5; - efi_status_t status; - struct work_struct work; - enum efi_rts_ids efi_rts_id; - struct completion efi_rts_comp; -}; +struct efi_runtime_work efi_rts_work; /* * efi_queue_work: Queue efi_runtime_service() and wait until it's done @@ -91,7 +59,6 @@ struct efi_runtime_work { */ #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \ ({ \ - struct efi_runtime_work efi_rts_work; \ efi_rts_work.status = EFI_ABORTED; \ \ init_completion(_rts_work.efi_rts_comp);\ @@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock); */ static void efi_call_rts(struct work_struct *work) { - struct efi_runtime_work *efi_rts_work; void *arg1, *arg2, *arg3, *arg4, *arg5; efi_status_t status = EFI_NOT_FOUND; - efi_rts_work = container_of(work, struct efi_runtime_work, work); - arg1 = efi_rts_work->arg1; - arg2 = efi_rts_work->arg2; - arg3 = efi_rts_work->arg3; - arg4 = efi_rts_work->arg4; - arg5 = efi_rts_work->arg5; + arg1 = efi_rts_work.arg1; + arg2 = efi_rts_work.arg2; + arg3 = efi_rts_work.arg3; + arg4 = efi_rts_work.arg4; + arg5 = efi_rts_work.arg5; - switch (efi_rts_work->efi_rts_id) { + switch (efi_rts_work.efi_rts_id) { case GET_TIME: status = efi_call_virt(get_time, (efi_time_t *)arg1, (efi_time_cap_t *)arg2); @@ -253,8 +218,8 @@ static void 
efi_call_rts(struct work_struct *work) */ pr_err("Requested executing invalid EFI Runtime Service.\n"); } - efi_rts_work->status = status; - complete(_rts_work->efi_rts_comp); + efi_rts_work.status = status; + complete(_rts_work.efi_rts_comp); } static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) diff --git a/include/linux/efi.h b/include/linux/efi.h index 401e4b254e30..855992b15269 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog { extern int efi_tpm_eventlog_init(void); +/* efi_runtime_service() function identifiers */ +enum efi_rts_ids { + GET_TIME, + SET_TIME, + GET_WAKEUP_TIME, + SET_WAKEUP_TIME, + GET_VARIABLE, + GET_NEXT_VARIABLE, + SET_VARIABLE, + QUERY_VARIABLE_INFO, + GET_NEXT_HIGH_MONO_COUNT, + UPDATE_CAPSULE, + QUERY_CAPSULE_CAPS, +}; + +/* + * efi_runtime_work: Details of EFI Runtime Service work + * @arg<1-5>: EFI Runtime Service function arguments + * @status:Status of executing EFI Runtime Service + * @efi_rts_id:
[PATCH V2 3/6] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
From: Sai Praneeth The efi page fault handler that fixes up page faults caused by the firmware needs the original memory map passed by the firmware. It looks up this memory map to find the type of the memory region at which the page fault occurred. Presently, EFI subsystem discards the original memory map passed by the firmware and replaces it with a new memory map that has only EFI_RUNTIME_SERVICES_ regions. But illegal accesses by firmware can occur at any region. Hence, _only_ if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the original memory map passed by the firmware, so that efi page fault handler could detect/fix illegal accesses to *any* efi region. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 6 ++ arch/x86/platform/efi/efi.c| 2 ++ arch/x86/platform/efi/quirks.c | 49 ++ 3 files changed, 57 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 9b70743400f3..d9e5d9a6d138 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS +extern void __init efi_save_original_memmap(void); +#else +static inline void __init efi_save_original_memmap(void) { } +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */ + struct efi_setup_data { u64 fw_vendor; u64 runtime; diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 439c2c40bf03..7d18b7ed5d41 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void) pa = __pa(new_memmap); + efi_save_original_memmap(); + /* * 
Unregister the early EFI memmap from efi_init() and install * the new EFI memory map that we are about to pass to the diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 844d31cb8a0c..7fd53fa8c4dd 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, } #endif + +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS + +static bool original_memory_map_present; +static struct efi_memory_map original_memory_map; + +/* + * The page fault handler that fixes up page faults caused by buggy + * firmware needs original memory map (memory map passed by firmware). + * Hence, build a new EFI memmap that has *all* entries and save it for + * later use. + */ +void __init efi_save_original_memmap(void) +{ + efi_memory_desc_t *md; + void *remapped_phys, *new_md; + phys_addr_t new_phys, new_size; + + new_size = efi.memmap.desc_size * efi.memmap.nr_map; + new_phys = efi_memmap_alloc(efi.memmap.nr_map); + if (!new_phys) { + pr_err("Failed to allocate new EFI memmap\n"); + return; + } + + remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB); + if (!remapped_phys) { + pr_err("Failed to remap new EFI memmap\n"); + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size)); + return; + } + + new_md = remapped_phys; + for_each_efi_memory_desc(md) { + memcpy(new_md, md, efi.memmap.desc_size); + new_md += efi.memmap.desc_size; + } + + original_memory_map.late = 1; + original_memory_map.phys_map = new_phys; + original_memory_map.map = remapped_phys; + original_memory_map.nr_map = efi.memmap.nr_map; + original_memory_map.desc_size = efi.memmap.desc_size; + original_memory_map.map_end = remapped_phys + new_size; + original_memory_map.desc_version = efi.memmap.desc_version; + + original_memory_map_present = true; +} +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */ -- 2.7.4
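As a rough userspace illustration of what the patch above does, the sketch below copies every descriptor of a memory map into separate storage (stepping by the descriptor stride, as the memcpy() loop in efi_save_original_memmap() does) and later looks up the type of the region containing a faulting address. All `sketch_*` names and the simplified descriptor layout are assumptions made for this example; they are not the kernel's definitions, and malloc() merely stands in for efi_memmap_alloc()/memremap().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12 /* EFI pages are 4 KiB */

/* Hypothetical, simplified descriptor; not the kernel's efi_memory_desc_t. */
struct sketch_md {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/* Hypothetical, simplified counterpart of struct efi_memory_map. */
struct sketch_memmap {
	void *map;        /* first descriptor */
	size_t nr_map;    /* number of descriptors */
	size_t desc_size; /* stride; may exceed sizeof(struct sketch_md) */
};

/* Copy all descriptors of src into new storage. Returns 0, or -1 if allocation fails. */
int sketch_save_memmap(const struct sketch_memmap *src, struct sketch_memmap *dst)
{
	size_t size = src->nr_map * src->desc_size;
	unsigned char *copy = malloc(size);
	size_t i;

	if (!copy)
		return -1;

	/* Walk by desc_size, not sizeof(), because firmware may use a larger stride. */
	for (i = 0; i < src->nr_map; i++)
		memcpy(copy + i * src->desc_size,
		       (const unsigned char *)src->map + i * src->desc_size,
		       src->desc_size);

	dst->map = copy;
	dst->nr_map = src->nr_map;
	dst->desc_size = src->desc_size;
	return 0;
}

/* Return the type of the region containing addr, or -1 if no region matches. */
int sketch_region_type(const struct sketch_memmap *m, uint64_t addr)
{
	size_t i;

	for (i = 0; i < m->nr_map; i++) {
		struct sketch_md md;
		uint64_t end;

		/* memcpy avoids any alignment assumption about the stride. */
		memcpy(&md, (const unsigned char *)m->map + i * m->desc_size, sizeof(md));
		end = md.phys_addr + (md.num_pages << SKETCH_PAGE_SHIFT);
		if (addr >= md.phys_addr && addr < end)
			return (int)md.type;
	}
	return -1;
}
```

The point of keeping a private copy is that later changes to the live map (such as installing the runtime-only map) do not affect the saved one, which is what a fault handler would need in order to classify an address in *any* region.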
[PATCH V2 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware
From: Sai Praneeth

EFI regions can broadly be divided into 3 types:
1. EFI_BOOT_SERVICES_ regions
2. EFI_RUNTIME_SERVICES_ regions
3. Other EFI regions like EFI_LOADER_ etc.

As per the UEFI specification, after the call to ExitBootServices(), any access by the firmware to a memory region other than EFI_RUNTIME_SERVICES_ regions is considered illegal. Buggy firmware could trigger these illegal accesses during boot time or at runtime (i.e. when the kernel is up and running).

Presently, the kernel can fix up illegal accesses to EFI_BOOT_SERVICES_ regions *only* during the kernel boot phase. If the firmware triggers illegal accesses to *any* other EFI region during kernel boot, the kernel panics; if this happens during kernel runtime, the kernel hangs. The kernel panics/hangs because the memory region requested by the firmware isn't mapped, which causes a page fault in ring 0 that the kernel fails to handle, leading to die().

To save the kernel from hanging, add an efi specific page fault handler which detects illegal accesses by the firmware and
1. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi page fault handler fixes it up by mapping the requested region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the efi page fault handler freezes efi_rts_wq and schedules a new process.
3. If the access is to any other efi region as in (2) but the efi runtime service is efi_reset_system(), the efi page fault handler reboots the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_ regions and to other regions are dealt with differently in the efi page fault handler because EFI_BOOT_SERVICES_ regions are *generally* smaller than other efi regions and hence can be reserved and dynamically mapped. Other EFI regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ cannot be reserved, as they are very large and reserving them would make the kernel un-bootable.
The efi specific page fault handler offers us two advantages: 1. Avoid panics/hangs caused by buggy firmware. 2. Shout loud that the firmware is buggy and hence is not a kernel bug. Finally, this new mapping will not impact a reboot from kexec, as kexec is only concerned about runtime memory regions. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 7 ++ arch/x86/mm/fault.c | 9 ++ arch/x86/platform/efi/quirks.c | 152 drivers/firmware/efi/runtime-wrappers.c | 7 ++ include/linux/efi.h | 1 + 5 files changed, 176 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index d9e5d9a6d138..68a28606909c 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm); #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS extern void __init efi_save_original_memmap(void); +extern int efi_illegal_accesses_fixup(unsigned long phys_addr, + struct pt_regs *regs); #else static inline void __init efi_save_original_memmap(void) { } +static inline int efi_illegal_accesses_fixup(unsigned long phys_addr, +struct pt_regs *regs) +{ + return 0; +} #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */ struct efi_setup_data { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2aafa6ab6103..afd42e76058e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -16,6 +16,7 @@ #include /* prefetchw*/ #include /* exception_enter(), ... */ #include /* faulthandler_disabled() */ +#include /* fixup for buggy UEFI firmware*/ #include /* boot_cpu_has, ...*/ #include /* dotraplinkage, ... 
*/ @@ -24,6 +25,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include/* vma_pkey() */ +#include/* fixup for buggy UEFI firmware*/ #define CREATE_TRACE_POINTS #include @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code, return; /* +* Buggy firmware could trigger illegal accesses to some EFI regions +* which might page fault, try to fixup or recover from such faults. +*/ + if (efi_illegal_accesses_fixup(address, regs)) + return; + + /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice: */ diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86
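Since the quirks.c part of this patch is cut off in the archive, here is a minimal userspace sketch of the three-way decision that the commit message of 4/6 describes. It is not the patch's code: the `sketch_*` names and the function signature are hypothetical, and only the numeric memory type values follow the UEFI specification.

```c
#include <stdbool.h>

/* Memory type values as numbered in the UEFI specification. */
enum sketch_efi_type {
	SKETCH_LOADER_CODE = 1,
	SKETCH_LOADER_DATA = 2,
	SKETCH_BOOT_SERVICES_CODE = 3,
	SKETCH_BOOT_SERVICES_DATA = 4,
	SKETCH_RUNTIME_SERVICES_CODE = 5,
	SKETCH_RUNTIME_SERVICES_DATA = 6,
	SKETCH_CONVENTIONAL_MEMORY = 7,
};

/* Hypothetical outcomes, one per case in the commit message. */
enum sketch_action {
	SKETCH_NOT_HANDLED,      /* not an illegal access; leave it to generic code */
	SKETCH_MAP_REGION,       /* case 1: map the region on demand */
	SKETCH_FREEZE_WORKQUEUE, /* case 2: freeze efi_rts_wq, schedule another task */
	SKETCH_REBOOT_VIA_BIOS,  /* case 3: fault happened inside efi_reset_system() */
};

/* Decide what to do with a firmware fault in a region of the given type. */
enum sketch_action sketch_triage(enum sketch_efi_type type, bool is_reset_system)
{
	switch (type) {
	case SKETCH_RUNTIME_SERVICES_CODE:
	case SKETCH_RUNTIME_SERVICES_DATA:
		/* Firmware may legally touch these after ExitBootServices(). */
		return SKETCH_NOT_HANDLED;
	case SKETCH_BOOT_SERVICES_CODE:
	case SKETCH_BOOT_SERVICES_DATA:
		return SKETCH_MAP_REGION;
	default:
		return is_reset_system ? SKETCH_REBOOT_VIA_BIOS
				       : SKETCH_FREEZE_WORKQUEUE;
	}
}
```

In the real handler the region type would presumably come from a lookup in the saved original memory map (patch 3/6); here it is simply passed in.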
[PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions
From: Sai Praneeth Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA regions even after the kernel has assumed control of the platform. When "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault handler will detect/fixup these illegal accesses. The below modified functions are used by the page fault handler to fixup illegal accesses to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is present during/after kernel boot it doesn't have an __init attribute, but the below functions have it and thus during kernel build, "WARNING: modpost: Found * section mismatch(es)" build warning is observed. To fix it, remove __init attribute for all these functions. In order to not keep these functions needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add a new __efi_init_fixup attribute whose value changes based on whether the config option is selected or not. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 11 ++- arch/x86/platform/efi/efi.c| 4 ++-- arch/x86/platform/efi/efi_32.c | 2 +- arch/x86/platform/efi/efi_64.c | 9 + drivers/firmware/efi/efi.c | 6 +++--- include/linux/efi.h| 16 ++-- 6 files changed, 31 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cec5fae23eb3..9b70743400f3 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -103,8 +103,9 @@ struct efi_scratch { preempt_enable(); \ }) -extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size, - u32 type, u64 attribute); +extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr, + unsigned long size, u32 type, + u64 attribute); #ifdef CONFIG_KASAN /* @@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void); 
extern pgd_t * __init efi_call_phys_prolog(void); extern void __init efi_call_phys_epilog(pgd_t *save_pgd); extern void __init efi_print_memmap(void); -extern void __init efi_memory_uc(u64 addr, unsigned long size); -extern void __init efi_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size); +extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md); extern void __init efi_map_region_fixed(efi_memory_desc_t *md); extern void efi_sync_low_kernel_mappings(void); extern int __init efi_alloc_page_tables(void); extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages); -extern void __init old_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md); extern void __init runtime_code_page_mkexec(void); extern void __init efi_runtime_update_mappings(void); extern void __init efi_dump_pagetable(void); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..439c2c40bf03 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void) } } -void __init efi_memory_uc(u64 addr, unsigned long size) +void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size) { unsigned long page_shift = 1UL << EFI_PAGE_SHIFT; u64 npages; @@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size) set_memory_uc(addr, npages); } -void __init old_map_region(efi_memory_desc_t *md) +void __efi_init_fixup old_map_region(efi_memory_desc_t *md) { u64 start_pfn, end_pfn, end; unsigned long size; diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c index 324b93328b37..8f31452bd204 100644 --- a/arch/x86/platform/efi/efi_32.c +++ b/arch/x86/platform/efi/efi_32.c @@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -void __init efi_map_region(efi_memory_desc_t *md) 
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md) { old_map_region(md); } diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 448267f1c073..a04298312fdd 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -static void __init __map_region(efi_memory_desc_t *md, u64 va) +static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va) { unsigned long flags = _PAGE_RW; unsigned long pfn; @@
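The include/linux/efi.h hunk that defines __efi_init_fixup is not visible in this archive, so the following is only a guess at its shape, written as a userspace sketch. It shows an attribute macro that expands to nothing when the debug option is on (so the function survives past boot) and to an __init-like attribute otherwise. `SKETCH_WARN_ON_ILLEGAL_ACCESS`, `sketch_init` and the other `sketch_*` names are hypothetical, and `__attribute__((cold))` (GCC/Clang) is just a stand-in, since the kernel's __init also places code in a discardable section.

```c
/* Set to 0 to mimic a build without the debug option (hypothetical switch). */
#ifndef SKETCH_WARN_ON_ILLEGAL_ACCESS
#define SKETCH_WARN_ON_ILLEGAL_ACCESS 1
#endif

/* Userspace stand-in for the kernel's __init annotation (GCC/Clang only). */
#define sketch_init __attribute__((cold))

#if SKETCH_WARN_ON_ILLEGAL_ACCESS
/* Expands to nothing: the function must stay available after boot. */
#define sketch_efi_init_fixup
#else
/* Option off: treat the function as init-only again. */
#define sketch_efi_init_fixup sketch_init
#endif

/* Two-level stringification so the macro is expanded before being quoted. */
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

/* What the conditional attribute expands to in this build. */
static const char sketch_fixup_expansion[] = SKETCH_STR(sketch_efi_init_fixup);

/* A function carrying the conditional attribute, in kernel declaration style. */
unsigned long sketch_efi_init_fixup sketch_region_size(unsigned long num_pages)
{
	return num_pages << 12; /* 4 KiB EFI pages */
}
```

With the switch left at 1, `sketch_fixup_expansion` is the empty string; building with `-DSKETCH_WARN_ON_ILLEGAL_ACCESS=0` makes it the text of the stand-in attribute instead.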
[PATCH V2 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi regions other than EFI_RUNTIME_SERVICES_ even after the kernel has assumed control of the platform. This violates the UEFI specification.

If selected, this debug option will print a warning message if the UEFI firmware tries to access any memory region which it shouldn't. Along with the warning, the efi page fault handler will also try to fix up/recover from the page fault triggered by the firmware so that the machine doesn't hang.

To support this feature, two changes should be made to the existing efi subsystem:

1. Map EFI_BOOT_SERVICES_ regions only when EFI_WARN_ON_ILLEGAL_ACCESS is disabled

Presently, the kernel maps EFI_BOOT_SERVICES_ regions as a workaround for buggy firmware that accesses them even when it shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESS enabled (and hence the efi page fault handler), the kernel can now detect and handle such accesses dynamically. Hence, rather than mapping EFI_BOOT_SERVICES_ regions *all* the time to be safe, map them on demand.

2. If EFI_WARN_ON_ILLEGAL_ACCESS is enabled, don't call efi_free_boot_services()

Presently, during the early boot phase, EFI_BOOT_SERVICES_ regions are marked as reserved by the kernel (see efi_reserve_boot_services()) and are freed before entering runtime (see efi_free_boot_services()). But while dynamically fixing page faults caused by the firmware, the efi page fault handler assumes that EFI_BOOT_SERVICES_ regions are still intact. Hence, to make this assumption true, don't call efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESS is enabled.
Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/Kconfig | 21 + arch/x86/platform/efi/efi.c| 4 arch/x86/platform/efi/quirks.c | 3 +++ 3 files changed, 28 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f1dbb4ee19d7..0fb1309d510d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1957,6 +1957,27 @@ config EFI_MIXED If unsure, say N. +config EFI_WARN_ON_ILLEGAL_ACCESS + bool "Warn about illegal memory accesses by firmware" if EXPERT + depends on EFI + help + Enable this debug feature so that the kernel can detect illegal + memory accesses by firmware and issue a warning. Also, + 1. If the illegally accessed region is EFI_BOOT_SERVICES_, +the kernel fixes it up by mapping the requested region. + 2. If the illegally accessed region is any other region (Eg: +EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), then the +kernel freezes efi_rts_wq and schedules a new process. Also, it +disables EFI Runtime Services, so that it will never again call +buggy firmware. + 3. If the access is to any other efi region like above but if the +buggy efi runtime service is efi_reset_system(), then the +platform is rebooted through BIOS. + Please see the UEFI specification for details on the expectations + of memory usage. + + If unsure, say N. + config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 7d18b7ed5d41..77fbcb798f4e 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md) /* * Map boot services regions as a workaround for buggy * firmware that accesses them even when they shouldn't. 
+* (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is disabled) * * See efi_{reserve,free}_boot_services(). */ + if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS)) + return false; + if (md->type == EFI_BOOT_SERVICES_CODE || md->type == EFI_BOOT_SERVICES_DATA) return true; diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index e38e823382ba..60cb7a8d5371 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -377,6 +377,9 @@ void __init efi_free_boot_services(void) int num_entries = 0; void *new, *new_md; + if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS)) + return; + for_each_efi_memory_desc(md) { unsigned long long start = md->phys_addr; unsigned long long size = md->num_pages << EFI_PAGE_SHIFT; -- 2.7.4
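The two behavioural changes in this patch can be modelled in a few lines of userspace C. This is a sketch, not the kernel functions: the config option becomes a plain boolean parameter, the `sketch_*` names are hypothetical, and the leading runtime-attribute check is an assumption about the part of should_map_region() that the hunk does not show.

```c
#include <stdbool.h>

/* Memory type values as numbered in the UEFI specification. */
#define SKETCH_BOOT_SERVICES_CODE 3
#define SKETCH_BOOT_SERVICES_DATA 4

/* Model of should_map_region() with the new early return. */
bool sketch_should_map_region(int type, bool has_runtime_attr,
			      bool warn_on_illegal_access)
{
	/* Assumed from surrounding code: runtime regions are always mapped. */
	if (has_runtime_attr)
		return true;

	/* New in this patch: with the option on, map boot services on demand only. */
	if (warn_on_illegal_access)
		return false;

	/* Existing workaround: map boot services regions up front. */
	return type == SKETCH_BOOT_SERVICES_CODE ||
	       type == SKETCH_BOOT_SERVICES_DATA;
}

/* Model of the early return added to efi_free_boot_services(). */
bool sketch_should_free_boot_services(bool warn_on_illegal_access)
{
	/* The fault handler relies on these regions staying intact. */
	return !warn_on_illegal_access;
}
```

Read together, the two helpers show the trade-off: with the option enabled nothing is mapped eagerly, but the boot services regions are also never returned to the page allocator.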
[PATCH V1 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware
From: Sai Praneeth

EFI regions can broadly be divided into 3 types:
1. EFI_BOOT_SERVICES_ regions
2. EFI_RUNTIME_SERVICES_ regions
3. Other EFI regions like EFI_LOADER_ etc.

As per the UEFI specification, after the call to ExitBootServices(), any access by the firmware to a memory region other than EFI_RUNTIME_SERVICES_ regions is considered illegal. Buggy firmware could trigger these illegal accesses during boot time or at runtime (i.e. when the kernel is up and running).

Presently, the kernel can fix up illegal accesses to EFI_BOOT_SERVICES_ regions *only* during the kernel boot phase. If the firmware triggers illegal accesses to *any* other EFI region during kernel boot, the kernel panics; if this happens during kernel runtime, the kernel hangs. The kernel panics/hangs because the memory region requested by the firmware isn't mapped, which causes a page fault in ring 0 that the kernel fails to handle, leading to die().

To save the kernel from hanging, add an efi specific page fault handler which detects illegal accesses by the firmware and
1. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi page fault handler fixes it up by mapping the requested region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), the efi page fault handler freezes efi_rts_wq and schedules a new process.
3. If the access is to any other efi region as in (2) but the efi runtime service is efi_reset_system(), the efi page fault handler reboots the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_ regions and to other regions are dealt with differently in the efi page fault handler because EFI_BOOT_SERVICES_ regions are *generally* smaller than other efi regions and hence can be reserved and dynamically mapped. Other EFI regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ cannot be reserved, as they are very large and reserving them would make the kernel un-bootable.
The efi specific page fault handler offers us two advantages: 1. Avoid panics/hangs caused by buggy firmware. 2. Shout loud that the firmware is buggy and hence is not a kernel bug. Finally, this new mapping will not impact a reboot from kexec, as kexec is only concerned about runtime memory regions. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 7 ++ arch/x86/mm/fault.c | 9 ++ arch/x86/platform/efi/quirks.c | 152 drivers/firmware/efi/runtime-wrappers.c | 7 ++ include/linux/efi.h | 1 + 5 files changed, 176 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index c97f2e955cab..4942fa04d74b 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm); #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES extern void __init efi_save_original_memmap(void); +extern int efi_illegal_accesses_fixup(unsigned long phys_addr, + struct pt_regs *regs); #else static inline void __init efi_save_original_memmap(void) { } +static inline int efi_illegal_accesses_fixup(unsigned long phys_addr, +struct pt_regs *regs) +{ + return 0; +} #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */ struct efi_setup_data { diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 2aafa6ab6103..afd42e76058e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -16,6 +16,7 @@ #include /* prefetchw*/ #include /* exception_enter(), ... */ #include /* faulthandler_disabled() */ +#include /* fixup for buggy UEFI firmware*/ #include /* boot_cpu_has, ...*/ #include /* dotraplinkage, ... 
*/ @@ -24,6 +25,7 @@ #include /* emulate_vsyscall */ #include /* struct vm86 */ #include/* vma_pkey() */ +#include/* fixup for buggy UEFI firmware*/ #define CREATE_TRACE_POINTS #include @@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code, return; /* +* Buggy firmware could trigger illegal accesses to some EFI regions +* which might page fault, try to fixup or recover from such faults. +*/ + if (efi_illegal_accesses_fixup(address, regs)) + return; + + /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice: */ diff --git a/arch/x86/platform/efi/quirks.c b/arch
[PATCH V1 3/6] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
From: Sai Praneeth The efi page fault handler that fixes up page faults caused by the firmware needs the original memory map passed by the firmware. It looks up this memory map to find the type of the memory region at which the page fault occurred. Presently, EFI subsystem discards the original memory map passed by the firmware and replaces it with a new memory map that has only EFI_RUNTIME_SERVICES_ regions. But illegal accesses by firmware can occur at any region. Hence, _only_ if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is defined, create a backup of the original memory map passed by the firmware, so that efi page fault handler could detect/fix illegal accesses to *any* efi region. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 6 ++ arch/x86/platform/efi/efi.c| 2 ++ arch/x86/platform/efi/quirks.c | 49 ++ 3 files changed, 57 insertions(+) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 9b70743400f3..c97f2e955cab 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); extern void efi_switch_mm(struct mm_struct *mm); +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES +extern void __init efi_save_original_memmap(void); +#else +static inline void __init efi_save_original_memmap(void) { } +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */ + struct efi_setup_data { u64 fw_vendor; u64 runtime; diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 439c2c40bf03..7d18b7ed5d41 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void) pa = __pa(new_memmap); + efi_save_original_memmap(); + 
/* * Unregister the early EFI memmap from efi_init() and install * the new EFI memory map that we are about to pass to the diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 844d31cb8a0c..84b213a1460a 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, } #endif + +#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES + +static bool original_memory_map_present; +static struct efi_memory_map original_memory_map; + +/* + * The page fault handler that fixes up page faults caused by buggy + * firmware needs original memory map (memory map passed by firmware). + * Hence, build a new EFI memmap that has *all* entries and save it for + * later use. + */ +void __init efi_save_original_memmap(void) +{ + efi_memory_desc_t *md; + void *remapped_phys, *new_md; + phys_addr_t new_phys, new_size; + + new_size = efi.memmap.desc_size * efi.memmap.nr_map; + new_phys = efi_memmap_alloc(efi.memmap.nr_map); + if (!new_phys) { + pr_err("Failed to allocate new EFI memmap\n"); + return; + } + + remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB); + if (!remapped_phys) { + pr_err("Failed to remap new EFI memmap\n"); + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size)); + return; + } + + new_md = remapped_phys; + for_each_efi_memory_desc(md) { + memcpy(new_md, md, efi.memmap.desc_size); + new_md += efi.memmap.desc_size; + } + + original_memory_map.late = 1; + original_memory_map.phys_map = new_phys; + original_memory_map.map = remapped_phys; + original_memory_map.nr_map = efi.memmap.nr_map; + original_memory_map.desc_size = efi.memmap.desc_size; + original_memory_map.map_end = remapped_phys + new_size; + original_memory_map.desc_version = efi.memmap.desc_version; + + original_memory_map_present = true; +} +#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */ -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe 
linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi regions other than EFI_RUNTIME_SERVICES_ even after the kernel has assumed control of the platform. This violates the UEFI specification.

If selected, this debug option will print a warning message if the UEFI firmware tries to access any memory region which it shouldn't. Along with the warning, the efi page fault handler will also try to fix up/recover from the page fault triggered by the firmware so that the machine doesn't hang.

To support this feature, two changes should be made to the existing efi subsystem:

1. Map EFI_BOOT_SERVICES_ regions only when EFI_WARN_ON_ILLEGAL_ACCESSES is disabled

Presently, the kernel maps EFI_BOOT_SERVICES_ regions as a workaround for buggy firmware that accesses them even when it shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled (and hence the efi page fault handler), the kernel can now detect and handle such accesses dynamically. Hence, rather than mapping EFI_BOOT_SERVICES_ regions *all* the time to be safe, map them on demand.

2. If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled, don't call efi_free_boot_services()

Presently, during the early boot phase, EFI_BOOT_SERVICES_ regions are marked as reserved by the kernel (see efi_reserve_boot_services()) and are freed before entering runtime (see efi_free_boot_services()). But while dynamically fixing page faults caused by the firmware, the efi page fault handler assumes that EFI_BOOT_SERVICES_ regions are still intact. Hence, to make this assumption true, don't call efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.
Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/Kconfig| 21 + arch/x86/platform/efi/efi.c | 4 init/main.c | 3 ++- 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f1dbb4ee19d7..278e5820e8dd 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1957,6 +1957,27 @@ config EFI_MIXED If unsure, say N. +config EFI_WARN_ON_ILLEGAL_ACCESSES + bool "Warn about illegal memory accesses by firmware" + depends on EFI + help + Enable this debug feature so that the kernel can detect illegal + memory accesses by firmware and issue a warning. Also, + 1. If the illegally accessed region is EFI_BOOT_SERVICES_, +the kernel fixes it up by mapping the requested region. + 2. If the illegally accessed region is any other region (Eg: +EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), then the +kernel freezes efi_rts_wq and schedules a new process. Also, it +disables EFI Runtime Services, so that it will never again call +buggy firmware. + 3. If the access is to any other efi region like above but if the +buggy efi runtime service is efi_reset_system(), then the +platform is rebooted through BIOS. + Please see the UEFI specification for details on the expectations + of memory usage. + + If unsure, say N. + config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 7d18b7ed5d41..0ddb22a03d88 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md) /* * Map boot services regions as a workaround for buggy * firmware that accesses them even when they shouldn't. 
+* (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled) * * See efi_{reserve,free}_boot_services(). */ + if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) + return false; + if (md->type == EFI_BOOT_SERVICES_CODE || md->type == EFI_BOOT_SERVICES_DATA) return true; diff --git a/init/main.c b/init/main.c index 3b4ada11ed52..dce0520861a1 100644 --- a/init/main.c +++ b/init/main.c @@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void) arch_post_acpi_subsys_init(); sfi_init_late(); - if (efi_enabled(EFI_RUNTIME_SERVICES)) { + if (efi_enabled(EFI_RUNTIME_SERVICES) && + !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) { efi_free_boot_services(); } -- 2.7.4
[PATCH V1 5/6] x86/mm: If in_atomic(), allocate pages without sleeping
From: Sai Praneeth

A page fault occurs when any EFI Runtime Service tries to reference a memory region which it shouldn't. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi specific page fault handler fixes it up by dynamically creating VA->PA mappings using efi_map_region().

Originally, efi_map_region(), and hence the functionality of creating mappings for efi regions, was intended to be used *only* during boot time (please note the __init modifier). Hence, when it is called during runtime (i.e. from the efi page fault handler), the page allocator complains.

The complaint arises because the "gfp_allowed_mask" value changes from boot time to runtime (GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though efi_map_region() calls alloc_{pmd,pte}_page() with GFP_KERNEL, the page allocator doesn't complain because the "__GFP_RECLAIM" flag is cleared by "gfp_allowed_mask". During runtime it isn't cleared, and hence the below stack trace is printed.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last enabled at (45713): [] restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [] error_entry+0x7c/0x100
softirqs last enabled at (44732): [] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x5e/0x8b
 ___might_sleep+0x20c/0x240
 __alloc_pages_nodemask+0xc2/0x330
 get_zeroed_page+0x12/0x40
 alloc_pmd_page+0x13/0x50
 populate_pmd+0xc0/0x2e0
 ? __lock_acquire+0x439/0x740
 __cpa_process_fault+0x2e1/0x5d0
 __change_page_attr_set_clr+0x7c3/0xcd0
 ? console_unlock+0x34d/0x660
 ? kernel_map_pages_in_pgd+0x8c/0x160
 kernel_map_pages_in_pgd+0x8c/0x160
 ? printk+0x43/0x4b
 ? __map_region+0x3c/0x60
 __map_region+0x3c/0x60
 efi_map_region+0x83/0xd0
 efi_illegal_accesses_fixup+0x1ca/0x1e0
 no_context+0x112/0x390
 __do_page_fault+0xc7/0x4f0
 page_fault+0x1e/0x30
RIP: 0010:0xfffeffc7ccf1
RSP: 0018:c975bbf0 EFLAGS: 00010282
RAX: 0048 RBX: c975be10 RCX: c975bad0
RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf
RBP: c975bdc8 R08: 0048 R09: 0048
R10: 03fd R11: 03f8 R12: 880032a92d80
R13: 0003 R14: 7ffcf1eb9d50 R15:
 ? efi_call+0xd1/0x160
 ? __lock_acquire+0x439/0x740
 ? _raw_spin_unlock+0x24/0x30
 ? virt_efi_get_next_high_mono_count+0x77/0xf0
 ? efi_test_ioctl+0x1ab/0xc20
 ? selinux_file_ioctl+0x122/0x1c0
 ? do_vfs_ioctl+0x92/0x6b0
 ? do_vfs_ioctl+0x92/0x6b0
 ? security_file_ioctl+0x3c/0x50
 ? selinux_capable+0x20/0x20
 ? ksys_ioctl+0x66/0x70
 ? __x64_sys_ioctl+0x16/0x20
 ? do_syscall_64+0x50/0x170
 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the above warning by conditionally changing the allocation from GFP_KERNEL to GFP_ATOMIC, so that the efi page fault handler can use efi_map_region() during runtime. This change shouldn't affect any other generic page allocations because this allocation is used only by efi functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

 if (cpa->pgd) {
 /*
 * Right now, we only execute this code path when mapping
 * the EFI virtual memory map regions, no other users
 * provide a ->pgd value. This may change in the future.
*/ return populate_pgd(cpa, vaddr); } Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/mm/pageattr.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 3bded76e8d5c..1b28a333c8ce 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end) static int alloc_pte_page(pmd_t *pmd) { - pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + pte_t *pte; + + if (in_atomic()) + pte = (pte_t *)get_zeroed_page(GFP_ATOMIC); + else + pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + if (!pte) return -1; @@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd) static int alloc_pmd_page(pud_t *pud) { - pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + pmd_t *pmd; + + if (in_atomic()) + pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC); + else + pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + if (!pmd) return -1; -- 2.7.4 -- To unsubscribe from this
[PATCH V1 2/6] x86/efi: Remove __init attribute from memory mapping functions
From: Sai Praneeth Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA regions even after the kernel has assumed control of the platform. When "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is enabled, the efi page fault handler will detect/fixup these illegal accesses. The below modified functions are used by the page fault handler to fixup illegal accesses to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is present during/after kernel boot it doesn't have an __init attribute, but the below functions have it and thus during kernel build, "WARNING: modpost: Found * section mismatch(es)" build warning is observed. To fix it, remove __init attribute for all these functions. In order to not keep these functions needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is not selected, add a new __efi_init_fixup attribute whose value changes based on whether the config option is selected or not. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 11 ++- arch/x86/platform/efi/efi.c| 4 ++-- arch/x86/platform/efi/efi_32.c | 2 +- arch/x86/platform/efi/efi_64.c | 9 + drivers/firmware/efi/efi.c | 6 +++--- include/linux/efi.h| 16 ++-- 6 files changed, 31 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cec5fae23eb3..9b70743400f3 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -103,8 +103,9 @@ struct efi_scratch { preempt_enable(); \ }) -extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size, - u32 type, u64 attribute); +extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr, + unsigned long size, u32 type, + u64 attribute); #ifdef CONFIG_KASAN /* @@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void); 
extern pgd_t * __init efi_call_phys_prolog(void); extern void __init efi_call_phys_epilog(pgd_t *save_pgd); extern void __init efi_print_memmap(void); -extern void __init efi_memory_uc(u64 addr, unsigned long size); -extern void __init efi_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size); +extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md); extern void __init efi_map_region_fixed(efi_memory_desc_t *md); extern void efi_sync_low_kernel_mappings(void); extern int __init efi_alloc_page_tables(void); extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages); -extern void __init old_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md); extern void __init runtime_code_page_mkexec(void); extern void __init efi_runtime_update_mappings(void); extern void __init efi_dump_pagetable(void); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..439c2c40bf03 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void) } } -void __init efi_memory_uc(u64 addr, unsigned long size) +void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size) { unsigned long page_shift = 1UL << EFI_PAGE_SHIFT; u64 npages; @@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size) set_memory_uc(addr, npages); } -void __init old_map_region(efi_memory_desc_t *md) +void __efi_init_fixup old_map_region(efi_memory_desc_t *md) { u64 start_pfn, end_pfn, end; unsigned long size; diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c index 324b93328b37..8f31452bd204 100644 --- a/arch/x86/platform/efi/efi_32.c +++ b/arch/x86/platform/efi/efi_32.c @@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -void __init efi_map_region(efi_memory_desc_t *md) 
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md) { old_map_region(md); } diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 448267f1c073..a04298312fdd 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -static void __init __map_region(efi_memory_desc_t *md, u64 va) +static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va) { unsigned long flags = _PAGE_RW; unsigned long pfn; @@
[PATCH V1 1/6] efi: Make efi_rts_work accessible to efi page fault handler
From: Sai Praneeth If the firmware illegally accesses any efi regions other than EFI_BOOT_SERVICES_, the efi page fault handler would freeze efi_rts_wq and schedules a new process. To do this, the efi page fault handler needs efi_rts_work. Hence, make it accessible. There will be no race conditions in accessing this structure, because, all the calls to efi runtime services are already serialized. Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Al Stone Cc: Borislav Petkov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Bhupesh Sharma Cc: Peter Zijlstra Cc: Ard Biesheuvel --- drivers/firmware/efi/runtime-wrappers.c | 53 ++--- include/linux/efi.h | 36 ++ 2 files changed, 45 insertions(+), 44 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index aa66cbf23512..b18b2d864c2c 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -45,39 +45,7 @@ #define __efi_call_virt(f, args...) 
\ __efi_call_virt_pointer(efi.systab->runtime, f, args) -/* efi_runtime_service() function identifiers */ -enum efi_rts_ids { - GET_TIME, - SET_TIME, - GET_WAKEUP_TIME, - SET_WAKEUP_TIME, - GET_VARIABLE, - GET_NEXT_VARIABLE, - SET_VARIABLE, - QUERY_VARIABLE_INFO, - GET_NEXT_HIGH_MONO_COUNT, - UPDATE_CAPSULE, - QUERY_CAPSULE_CAPS, -}; - -/* - * efi_runtime_work: Details of EFI Runtime Service work - * @arg<1-5>: EFI Runtime Service function arguments - * @status:Status of executing EFI Runtime Service - * @efi_rts_id:EFI Runtime Service function identifier - * @efi_rts_comp: Struct used for handling completions - */ -struct efi_runtime_work { - void *arg1; - void *arg2; - void *arg3; - void *arg4; - void *arg5; - efi_status_t status; - struct work_struct work; - enum efi_rts_ids efi_rts_id; - struct completion efi_rts_comp; -}; +struct efi_runtime_work efi_rts_work; /* * efi_queue_work: Queue efi_runtime_service() and wait until it's done @@ -91,7 +59,6 @@ struct efi_runtime_work { */ #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \ ({ \ - struct efi_runtime_work efi_rts_work; \ efi_rts_work.status = EFI_ABORTED; \ \ init_completion(_rts_work.efi_rts_comp);\ @@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock); */ static void efi_call_rts(struct work_struct *work) { - struct efi_runtime_work *efi_rts_work; void *arg1, *arg2, *arg3, *arg4, *arg5; efi_status_t status = EFI_NOT_FOUND; - efi_rts_work = container_of(work, struct efi_runtime_work, work); - arg1 = efi_rts_work->arg1; - arg2 = efi_rts_work->arg2; - arg3 = efi_rts_work->arg3; - arg4 = efi_rts_work->arg4; - arg5 = efi_rts_work->arg5; + arg1 = efi_rts_work.arg1; + arg2 = efi_rts_work.arg2; + arg3 = efi_rts_work.arg3; + arg4 = efi_rts_work.arg4; + arg5 = efi_rts_work.arg5; - switch (efi_rts_work->efi_rts_id) { + switch (efi_rts_work.efi_rts_id) { case GET_TIME: status = efi_call_virt(get_time, (efi_time_t *)arg1, (efi_time_cap_t *)arg2); @@ -253,8 +218,8 @@ static void 
efi_call_rts(struct work_struct *work) */ pr_err("Requested executing invalid EFI Runtime Service.\n"); } - efi_rts_work->status = status; - complete(_rts_work->efi_rts_comp); + efi_rts_work.status = status; + complete(_rts_work.efi_rts_comp); } static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) diff --git a/include/linux/efi.h b/include/linux/efi.h index 401e4b254e30..855992b15269 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog { extern int efi_tpm_eventlog_init(void); +/* efi_runtime_service() function identifiers */ +enum efi_rts_ids { + GET_TIME, + SET_TIME, + GET_WAKEUP_TIME, + SET_WAKEUP_TIME, + GET_VARIABLE, + GET_NEXT_VARIABLE, + SET_VARIABLE, + QUERY_VARIABLE_INFO, + GET_NEXT_HIGH_MONO_COUNT, + UPDATE_CAPSULE, + QUERY_CAPSULE_CAPS, +}; + +/* + * efi_runtime_work: Details of EFI Runtime Service work + * @arg<1-5>: EFI Runtime Service function arguments + * @status:Status of executing EFI Runtime Service + * @efi_rts_id:
[PATCH V1 0/6] Add efi page fault handler to fix/recover from
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_CODE/DATA even after the
kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option which, when enabled,
detects and fixes/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger a page fault in ring 0, because
firmware executes at ring 0, and if unhandled it hangs the kernel. We
provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence that this is not a
   kernel bug.

Depending on the illegally accessed efi region, the efi page fault handler
handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_CODE/DATA, the
   efi page fault handler fixes it up by mapping the requested region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_CODE/DATA), the efi page fault handler freezes efi_rts_wq
   and schedules a new process.
3. If the access is to any other efi region as above but the efi runtime
   service is efi_reset_system(), the efi page fault handler reboots the
   machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_CODE/DATA and to other regions are
dealt with differently in the efi page fault handler because, *generally*,
EFI_BOOT_SERVICES_CODE/DATA regions are smaller in size relative to other
efi regions and hence can be reserved and dynamically mapped. Other EFI
regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_CODE/DATA cannot be
reserved, as they are very large and reserving them would make the kernel
un-bootable.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along with
efi_reset_system(), I have also modified get_next_high_mono_count() and
set_virtual_address_map(). They illegally access both boot time and other
efi regions.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
   Add reboot=efi to the kernel command line arguments and, after the
   kernel is up and running, type "reboot". The kernel should hang while
   rebooting.
3. With the same setup, boot the kernel after applying the patches and the
   reboot should work fine. Also please notice the warning/error messages
   printed by the kernel.

Changes from RFC to V1:
-----------------------
1. Drop the "long jump" technique of dealing with illegal accesses and
   instead use scheduling away from efi_rts_wq.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (6):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Remove __init attribute from memory mapping functions
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
  x86/efi: Add efi page fault handler to fixup/recover from page faults
    caused by firmware
  x86/mm: If in_atomic(), allocate pages without sleeping
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

 arch/x86/Kconfig                        |  21
 arch/x86/include/asm/efi.h              |  24 +++-
 arch/x86/mm/fault.c                     |   9 ++
 arch/x86/mm/pageattr.c                  |  16 ++-
 arch/x86/platform/efi/efi.c             |  10 +-
 arch/x86/platform/efi/efi_32.c          |   2 +-
 arch/x86/platform/efi/efi_64.c          |   9 +-
 arch/x86/platform/efi/quirks.c          | 201
 drivers/firmware/efi/efi.c              |   6 +-
 drivers/firmware/efi/runtime-wrappers.c |  60 +++---
 include/linux/efi.h                     |  53 -
 init/main.c                             |   3 +-
 12 files changed, 350 insertions(+), 64 deletions(-)

Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Signed-off-by: Sai Praneeth Prakhya
Cc: Lee Chun-Yi
Cc: Al Stone
Cc: Borislav Petkov
Cc: Ingo Molnar
Cc: Andy Lutomirski
Cc: Bhupesh Sharma
Cc: Peter Zijlstra
Cc: Ard Biesheuvel
--
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 7/8] x86/mm: If in_atomic(), allocate pages without sleeping
From: Sai Praneeth A page fault occurs when any EFI Runtime Service tries to reference a memory region which it shouldn't. If the illegally accessed region is EFI_BOOT_SERVICES_, the efi specific page fault handler fixes it up by dynamically creating VA->PA mappings using efi_map_region(). Originally, efi_map_region() and hence the functionality of creating mappings for efi regions was intended to be used *only* during boot time (please note __init modifier) and hence when called during runtime, the page allocators complain. Calling efi_map_region() during runtime complains because "gfp_allowed_mask" value changes from boot time to runtime (GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though efi_map_region() calls alloc__page with GFP_KERNEL, the page allocator doesn't complain because "__GFP_RECLAIM" flag is cleared by "gfp_allowed_mask", but during runtime it isn't cleared and hence prints below stack trace. BUG: sleeping function called from invalid context at mm/page_alloc.c:4320 in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts 1 lock held by fwts/2022: irq event stamp: 45714 hardirqs last enabled at (45713): [] restore_regs_and_return_to_kernel+0x0/0x2c hardirqs last disabled at (45714): [] error_entry+0x7c/0x100 softirqs last enabled at (44732): [] __do_softirq+0x387/0x49a softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0 CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x5e/0x8b ___might_sleep+0x20c/0x240 __alloc_pages_nodemask+0xc2/0x330 get_zeroed_page+0x12/0x40 alloc_pmd_page+0x13/0x50 populate_pmd+0xc0/0x2e0 ? __lock_acquire+0x439/0x740 __cpa_process_fault+0x2e1/0x5d0 __change_page_attr_set_clr+0x7c3/0xcd0 ? console_unlock+0x34d/0x660 ? kernel_map_pages_in_pgd+0x8c/0x160 kernel_map_pages_in_pgd+0x8c/0x160 ? printk+0x43/0x4b ? 
__map_region+0x3c/0x60 __map_region+0x3c/0x60 efi_map_region+0x83/0xd0 efi_illegal_accesses_fixup+0x1ca/0x1e0 no_context+0x112/0x390 __do_page_fault+0xc7/0x4f0 page_fault+0x1e/0x30 RIP: 0010:0xfffeffc7ccf1 RSP: 0018:c975bbf0 EFLAGS: 00010282 RAX: 0048 RBX: c975be10 RCX: c975bad0 RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf RBP: c975bdc8 R08: 0048 R09: 0048 R10: 03fd R11: 03f8 R12: 880032a92d80 R13: 0003 R14: 7ffcf1eb9d50 R15: ? efi_call+0xd1/0x160 ? __lock_acquire+0x439/0x740 ? _raw_spin_unlock+0x24/0x30 ? virt_efi_get_next_high_mono_count+0x77/0xf0 ? efi_test_ioctl+0x1ab/0xc20 ? selinux_file_ioctl+0x122/0x1c0 ? do_vfs_ioctl+0x92/0x6b0 ? do_vfs_ioctl+0x92/0x6b0 ? security_file_ioctl+0x3c/0x50 ? selinux_capable+0x20/0x20 ? ksys_ioctl+0x66/0x70 ? __x64_sys_ioctl+0x16/0x20 ? do_syscall_64+0x50/0x170 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe I guess, we can't do much to fix the above warning except to change the allocation conditionally from GFP_KERNEL to GFP_ATOMIC, so that we could use efi_map_region() during runtime. This change shouldn't effect any other generic page allocations because this allocation is used only by efi functions [1]. [1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c if (cpa->pgd) { /* * Right now, we only execute this code path when mapping * the EFI virtual memory map regions, no other users * provide a ->pgd value. This may change in the future. 
*/ return populate_pgd(cpa, vaddr); } Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/mm/pageattr.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 3bded76e8d5c..1b28a333c8ce 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long start, unsigned long end) static int alloc_pte_page(pmd_t *pmd) { - pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + pte_t *pte; + + if (in_atomic()) + pte = (pte_t *)get_zeroed_page(GFP_ATOMIC); + else + pte = (pte_t *)get_zeroed_page(GFP_KERNEL); + if (!pte) return -1; @@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd) static int alloc_pmd_page(pud_t *pud) { - pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + pmd_t *pmd; + + if (in_atomic()) + pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC); + else + pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL); + if (!pmd) return -1; -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message
[PATCH RFC 0/8] Add efi page fault handler to fix/recover from
From: Sai Praneeth

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_CODE/DATA even after the
kernel has assumed control of the platform. This violates the UEFI
specification. Here, we provide a debug config option which, when enabled,
detects and fixes up/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger a page fault in ring 0, because
firmware executes at ring 0, and if unhandled it hangs the kernel. We
provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy, so that we are not blamed for a
   fault that is not ours.

Depending on the illegally accessed efi region, the efi page fault handler
handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_CODE/DATA, the
   page fault handler fixes it up by mapping the requested region.
2. If the illegally accessed region is any other efi region (like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_CODE/DATA), the page fault
   handler exits firmware context and disables EFI Runtime Services, so
   that we will never again call buggy firmware.

Page faults to efi regions are handled differently because, presently,
during kernel boot, EFI_BOOT_SERVICES_CODE/DATA regions are reserved by
the kernel and hence it's OK to dynamically map these regions in the page
fault handler. The same approach cannot be followed for other efi regions
like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_CODE/DATA, as they are very
large and reserving them could make the kernel un-bootable. Hence, we take
a different approach (exiting firmware context) while dealing with page
faults to these regions. This also saves us from executing buggy firmware
further.

Exiting firmware context means that on every entry to firmware we save the
kernel context before calling firmware and, if the firmware misbehaves, we
roll back to the saved kernel context in the page fault handler. Saving
kernel context means saving the stack pointer and the instruction that
gets executed when firmware returns. In the page fault handler we fix up
these two things (RIP and RSP) so that when returning from the page fault
handler it looks as if firmware has called RET.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy.

Testing the patch set:
----------------------
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
   Add reboot=efi to the kernel command line arguments and, after the
   kernel is up and running, type "reboot". The kernel should hang while
   rebooting.
3. With the same setup, boot the kernel after applying the patches and the
   reboot should work fine. Also please notice the warning/error messages
   printed by the kernel.

Note:
-----
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R

Sai Praneeth (8):
  x86/efi: Remove __init attribute from memory mapping functions
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware
  x86/efi: Save kernel context before calling EFI Runtime Services
  x86/efi: Add page fault handler to fixup/recover from page faults
    caused by firmware
  x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
    efi_free_boot_services()
  x86/efi: Map EFI_BOOT_SERVICES_CODE/DATA regions only when
    EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
  x86/mm: If in_atomic(), allocate pages without sleeping
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

 arch/x86/Kconfig                        |  17 +++
 arch/x86/include/asm/efi.h              |  42 ++-
 arch/x86/mm/fault.c                     |   9 ++
 arch/x86/mm/pageattr.c                  |  16 ++-
 arch/x86/platform/efi/efi.c             |  10 +-
 arch/x86/platform/efi/efi_32.c          |   2 +-
 arch/x86/platform/efi/efi_64.c          |  16 ++-
 arch/x86/platform/efi/efi_stub_64.S     | 101 -
 arch/x86/platform/efi/quirks.c          | 193
 drivers/firmware/efi/efi.c              |   6 +-
 drivers/firmware/efi/runtime-wrappers.c |   6 +
 include/linux/efi.h                     |  16 ++-
 init/main.c                             |   3 +-
 13 files changed, 415 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Cc: Al Stone
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
--
2.7.4
[PATCH RFC 2/8] x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware
From: Sai Praneeth

The page fault handler that fixes up page faults caused by firmware needs
the original memory map passed by firmware. It looks up this memory map to
find the type of the memory region at which the page fault occurred.
Presently, EFI subsystem discards the original memory map passed by
firmware and replaces it with a new memory map that has only
EFI_RUNTIME_SERVICES_CODE/DATA regions, but illegal accesses by firmware
can occur at any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is defined, create a backup of the
original memory map passed by firmware, so that we can detect/fix illegal
accesses to *any* efi regions.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Cc: Al Stone
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
---
 arch/x86/include/asm/efi.h     |  6 ++
 arch/x86/platform/efi/efi.c    |  2 ++
 arch/x86/platform/efi/quirks.c | 49 ++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 9b70743400f3..c97f2e955cab 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
+
 struct efi_setup_data {
 	u64 fw_vendor;
 	u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 439c2c40bf03..7d18b7ed5d41 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 	pa = __pa(new_memmap);
 
+	efi_save_original_memmap();
+
 	/*
 	 * Unregister the early EFI memmap from efi_init() and install
 	 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..84b213a1460a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The page fault handler that fixes up page faults caused by buggy
+ * firmware needs original memory map (memory map passed by firmware).
+ * Hence, build a new EFI memmap that has *all* entries and save it for
+ * later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+	efi_memory_desc_t *md;
+	void *remapped_phys, *new_md;
+	phys_addr_t new_phys, new_size;
+
+	new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+	new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+	if (!new_phys) {
+		pr_err("Failed to allocate new EFI memmap\n");
+		return;
+	}
+
+	remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+	if (!remapped_phys) {
+		pr_err("Failed to remap new EFI memmap\n");
+		__free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
+		return;
+	}
+
+	new_md = remapped_phys;
+	for_each_efi_memory_desc(md) {
+		memcpy(new_md, md, efi.memmap.desc_size);
+		new_md += efi.memmap.desc_size;
+	}
+
+	original_memory_map.late = 1;
+	original_memory_map.phys_map = new_phys;
+	original_memory_map.map = remapped_phys;
+	original_memory_map.nr_map = efi.memmap.nr_map;
+	original_memory_map.desc_size = efi.memmap.desc_size;
+	original_memory_map.map_end = remapped_phys + new_size;
+	original_memory_map.desc_version = efi.memmap.desc_version;
+
+	original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
--
2.7.4
[PATCH RFC 5/8] x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call efi_free_boot_services()
From: Sai Praneeth

During early boot phase EFI_BOOT_SERVICES_CODE/DATA regions are marked as
reserved by kernel (see efi_reserve_boot_services()) and hence are not
used by kernel for boot purposes. When EFI_WARN_ON_ILLEGAL_ACCESSES is
enabled, page faults triggered by firmware due to illegal accesses to
EFI_BOOT_SERVICES_CODE/DATA regions are dynamically fixed by kernel by
mapping these regions on demand. This resolution assumes that
EFI_BOOT_SERVICES_CODE/DATA regions are intact i.e. no one has ever used
these regions except firmware. Hence, to make this assumption true, don't
call efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Cc: Al Stone
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
---
 init/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..dce0520861a1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
 	arch_post_acpi_subsys_init();
 	sfi_init_late();
 
-	if (efi_enabled(EFI_RUNTIME_SERVICES)) {
+	if (efi_enabled(EFI_RUNTIME_SERVICES) &&
+	    !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
 		efi_free_boot_services();
 	}
 
--
2.7.4
[PATCH RFC 3/8] x86/efi: Save kernel context before calling EFI Runtime Services
From: Sai Praneeth

After the kernel is up and running, the only time firmware executes is
when an EFI Runtime Service is invoked by the kernel. When invoked, some
buggy implementations of EFI Runtime Services could access memory regions
which they shouldn't. This will cause a page fault in ring 0 and, if
unhandled, it hangs the kernel. The obvious way to avoid such hangs is to
handle the page fault.

Remember the sequence of things that lead us to the page fault:
1. A user has requested the kernel to execute some EFI Runtime Service
2. Kernel prepares and calls the requested EFI Runtime Service
3. Requested EFI Runtime Service is buggy and hence caused a page fault
4. The kernel gets back control and it's in interrupt mode

If the page fault is handled successfully, the kernel would be returning
control to the EFI Runtime Service, which in turn returns control back to
the kernel. But the kernel cannot map the requested efi region because
it's long gone. Nor can we mark EFI regions as reserved and dynamically
allow access, because that would make the kernel un-bootable.

The proposed solution here is to save the kernel context before giving
away control to firmware (i.e. in step 2) and, if the firmware misbehaves,
to roll back to the saved kernel context in the page fault handler. This
saves us from executing buggy firmware further and from hanging.

Saving kernel context means saving the stack pointer and the instruction
that gets executed when firmware returns. In the page fault handler we fix
up these two things (RIP and RSP) so that when returning from the page
fault handler it looks as if firmware has called RET.

UEFI specification v2.7, section 2.3.4 "Calling Conventions for X64
platforms" says that "The registers RBX, RBP, RDI, RSI, R12, R13, R14,
R15, and XMM6-XMM15 are considered nonvolatile and must be saved and
restored by a function that uses them". This means that any EFI Runtime
Service that uses the above mentioned registers will save/restore their
values.
Hence, to emulate the same behaviour we save/restore these registers each and every time we call EFI Runtime Service. Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 3 ++ arch/x86/platform/efi/efi_64.c | 7 +++ arch/x86/platform/efi/efi_stub_64.S | 101 +++- arch/x86/platform/efi/quirks.c | 4 ++ 4 files changed, 114 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index c97f2e955cab..47202b9e1b8e 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -121,6 +121,9 @@ extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr, #endif /* CONFIG_X86_32 */ +extern u64 xmm_regs_rsp; +extern u64 core_regs_rsp; +extern u64 exit_fw_ctx_rip; extern struct efi_scratch efi_scratch; extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable); extern int __init efi_memblock_x86_reserve_range(void); diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index a04298312fdd..7787bc2e58fb 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -627,6 +627,13 @@ void __init efi_dump_pagetable(void) */ void efi_switch_mm(struct mm_struct *mm) { + /* +* Used by efi page fault handler (efi_illegal_accesses_fixup()) to +* check if it was indeed invoked in firmware context. 
+*/ + xmm_regs_rsp = 0; + exit_fw_ctx_rip = 0; + task_lock(current); efi_scratch.prev_mm = current->active_mm; current->active_mm = mm; diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S index 74628ec78f29..c86825c01b4c 100644 --- a/arch/x86/platform/efi/efi_stub_64.S +++ b/arch/x86/platform/efi/efi_stub_64.S @@ -39,6 +39,101 @@ mov %rsi, %cr0; \ mov (%rsp), %rsp +#define SAVE_CORE_REGS_CALLEE \ + pushq %rbx; \ + pushq %rdi; \ + pushq %rsi; \ + pushq %r12; \ + pushq %r13; \ + pushq %r14; \ + pushq %r15 + +#define RESTORE_CORE_REGS_CALLEE \ + popq %r15; \ + popq %r14; \ + popq %r13; \ + popq %r12; \ + popq %rsi; \ + popq %rdi; \ + popq %rbx + +#define SAVE_XMM_REGS_CALLEE \ + subq $0xb0, %rsp; \ + and $~0xf, %rsp ; \ + movaps %xmm6, 0xa0(%rsp); \ + movaps %xmm7, 0x90(%rsp); \ + movaps %xmm8, 0x80(%rsp); \ + movaps %xmm9, 0x70(%rs
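The save-and-roll-back idea in this patch can be approximated in user space with setjmp()/longjmp(). The sketch below is only an analogy and not the code from the patch: the patch saves RSP/RIP and the nonvolatile registers in assembly, while here setjmp() records the equivalent state. All names (call_runtime_service, simulated_fault, in_fw_ctx, the status codes) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical status codes, standing in for EFI_SUCCESS / EFI_ABORTED. */
enum status { STATUS_SUCCESS = 0, STATUS_ABORTED = 1 };

/* Saved "kernel context": setjmp() records the stack pointer and resume address. */
static jmp_buf saved_ctx;

/* Non-zero only while the (simulated) firmware is running. */
static int in_fw_ctx;

/*
 * Stand-in for the page fault handler: it rolls back to the saved
 * context only if the fault happened while firmware was running.
 */
static void simulated_fault(void)
{
	if (in_fw_ctx)
		longjmp(saved_ctx, 1);
}

/* A well-behaved service simply returns. */
static void good_service(void)
{
}

/* A buggy service touches memory it shouldn't, which we model as a fault. */
static void buggy_service(void)
{
	simulated_fault();
}

/* Save context, call the service, and report whether it had to be abandoned. */
static enum status call_runtime_service(void (*service)(void))
{
	if (setjmp(saved_ctx) != 0) {
		/* We arrive here via longjmp(): the service never returned. */
		in_fw_ctx = 0;
		return STATUS_ABORTED;
	}

	in_fw_ctx = 1;
	service();
	in_fw_ctx = 0;
	return STATUS_SUCCESS;
}
```

Calling call_runtime_service(buggy_service) returns STATUS_ABORTED without ever resuming the buggy service, which mirrors the "looks as if firmware has called RET" behaviour described in the commit message.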
[PATCH RFC 4/8] x86/efi: Add page fault handler to fixup/recover from page faults caused by firmware
From: Sai Praneeth

EFI regions can broadly be divided into 3 types:
1. EFI_BOOT_SERVICES_ regions
2. EFI_RUNTIME_SERVICES_ regions
3. Other EFI regions like EFI_LOADER_ etc.

As per the UEFI specification, after the call to ExitBootServices(), any access by firmware to a memory region other than EFI_RUNTIME_SERVICES_ regions is considered illegal. A buggy firmware could trigger these illegal accesses during boot time or at runtime (i.e. when the kernel is up and running).

Presently, the kernel can fix up illegal accesses to EFI_BOOT_SERVICES_ regions *only* during kernel boot phase. If firmware illegally accesses *any* other EFI region during kernel boot, the kernel panics; if this happens at kernel runtime, the kernel hangs. The kernel panics/hangs because the memory region requested by firmware isn't mapped, which causes a page fault in ring 0 that the kernel fails to handle, leading to die().

To save the kernel from hanging, we add a page fault handler which detects illegal accesses by firmware and:
1. If the illegally accessed region is EFI_BOOT_SERVICES_, the kernel fixes it up by mapping the requested region.
2. If it is any other region (Eg: EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), then the kernel exits firmware context and disables EFI Runtime Services, so that we will never again call buggy firmware.

Illegal accesses to EFI_BOOT_SERVICES_ and to other regions are dealt with differently in the efi page fault handler because, presently, EFI_BOOT_SERVICES_ regions are reserved by the kernel during boot and hence it's OK to dynamically map these regions in the page fault handler. We cannot reserve other EFI regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_ as they are very large and reserving them would make the kernel un-bootable. Hence, we take a different approach (exiting firmware context) in dealing with page faults to these regions.

The efi specific page fault handler offers us two advantages: 1. Avoid panics/hangs caused by buggy firmware. 2.
Shout loud that the firmware is buggy and hence can save ourselves from being blamed for not a fault of ours. Finally, this new mapping will not impact a reboot from kexec, as kexec is only concerned about runtime memory regions. Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 22 - arch/x86/mm/fault.c | 9 ++ arch/x86/platform/efi/quirks.c | 140 drivers/firmware/efi/runtime-wrappers.c | 6 ++ 4 files changed, 176 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 47202b9e1b8e..1285caccdff4 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -90,8 +90,20 @@ struct efi_scratch { efi_switch_mm(_mm); \ }) +/* + * Returns "EFI_ABORTED" if illegal access by firmware caused to exit + * firmware context, otherwise returns status returned by firmware. + */ #define arch_efi_call_virt(p, f, args...) 
\ - efi_call((void *)p->f, args)\ +({ \ + efi_status_t __s; \ + \ + __s = efi_call((void *)p->f, args); \ + if (exited_fw_ctx) \ + __s = EFI_ABORTED; \ + \ + __s;\ +}) #define arch_efi_call_virt_teardown() \ ({ \ @@ -124,6 +136,7 @@ extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr, extern u64 xmm_regs_rsp; extern u64 core_regs_rsp; extern u64 exit_fw_ctx_rip; +extern bool exited_fw_ctx; extern struct efi_scratch efi_scratch; extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable); extern int __init efi_memblock_x86_reserve_range(void); @@ -147,8 +160,15 @@ extern void efi_switch_mm(struct mm_struct *mm); #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES extern void __init efi_save_original_memmap(void); +extern int efi_illegal_accesses_fixup(unsigned long phys_addr, + struct pt_regs *regs); #else static inline void __init efi_save_original_memmap(void) { } +static inline int efi_illegal_accesses_fixup(unsigned long phys_addr, +struct pt_regs *regs) +{ + return 0; +} #endif /* CONFIG_EFI_WARN_ON_
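The two-way policy described in this patch (map EFI_BOOT_SERVICES_ regions, bail out for everything else) can be sketched as a small classification routine. This is a simplified illustration, not the efi_illegal_accesses_fixup() from the patch, whose body is not shown here. The enum values, handle_fw_fault() and runtime_services_enabled are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of EFI memory types, for illustration only. */
enum region_type {
	REGION_BOOT_SERVICES_CODE,
	REGION_BOOT_SERVICES_DATA,
	REGION_RUNTIME_SERVICES_CODE,
	REGION_RUNTIME_SERVICES_DATA,
	REGION_CONVENTIONAL_MEMORY,
	REGION_LOADER_DATA,
};

enum fault_action {
	ACTION_NONE,		/* access is legal; nothing to fix up */
	ACTION_MAP_REGION,	/* boot services: map it, let firmware continue */
	ACTION_EXIT_FIRMWARE,	/* anything else: abandon the firmware call */
};

/* Models the "never call buggy firmware again" switch. */
static bool runtime_services_enabled = true;

/* Decide how to react to a firmware access that faulted in the given region. */
static enum fault_action handle_fw_fault(enum region_type type)
{
	switch (type) {
	case REGION_RUNTIME_SERVICES_CODE:
	case REGION_RUNTIME_SERVICES_DATA:
		/* Firmware is allowed to touch these after ExitBootServices(). */
		return ACTION_NONE;
	case REGION_BOOT_SERVICES_CODE:
	case REGION_BOOT_SERVICES_DATA:
		/* Still reserved by the kernel, so mapping on demand is safe. */
		return ACTION_MAP_REGION;
	default:
		/* Not reserved: do not map, stop using runtime services. */
		runtime_services_enabled = false;
		return ACTION_EXIT_FIRMWARE;
	}
}
```

Once handle_fw_fault() has returned ACTION_EXIT_FIRMWARE, a caller in this sketch would check runtime_services_enabled before attempting any further runtime service call.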
[PATCH RFC 8/8] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES
From: Sai Praneeth There may exist some buggy UEFI firmware implementations that access efi memory regions other than EFI_RUNTIME_SERVICES_ even after kernel has assumed control of the platform. This violates UEFI specification. If selected, this debug option will print a warning message if the UEFI firmware tries to access any memory regions which it shouldn't. Along with the warning, the kernel will also try to fixup/recover from the page fault triggered by firmware so that the machine doesn't hang. Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/Kconfig | 17 + 1 file changed, 17 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index f1dbb4ee19d7..9ff11ec65232 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1957,6 +1957,23 @@ config EFI_MIXED If unsure, say N. +config EFI_WARN_ON_ILLEGAL_ACCESSES + bool "Warn about illegal memory accesses by firmware" + depends on EFI + help + Enable this debug feature so that the kernel can detect illegal + memory accesses by firmware and issue a warning. Also, + 1. If the illegally accessed region is EFI_BOOT_SERVICES_, + the kernel fixes it up by mapping the requested region. + 2. If the illegally accessed region is any other region (Eg: + EFI_CONVENTIONAL_MEMORY or EFI_LOADER_), then kernel + exits firmware context and disables EFI Runtime Services, so that + it will never again call buggy firmware. + Please see the UEFI specification for details on the expectations + of memory usage. + + If unsure, say N. + config SECCOMP def_bool y prompt "Enable seccomp to safely compute untrusted bytecode" -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 6/8] x86/efi: Map EFI_BOOT_SERVICES_ regions only when EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
From: Sai Praneeth

Presently, the kernel maps EFI_BOOT_SERVICES_ regions as a workaround for buggy firmware that accesses them even when it shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled, the kernel can now detect and handle such accesses dynamically. Hence, rather than safely mapping all the EFI_BOOT_SERVICES_ regions, map only EFI_RUNTIME_SERVICES_ regions and trap all other illegal accesses.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Matt Fleming
Based-on-code-from: Ricardo Neri
Cc: Al Stone
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Bhupesh Sharma
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/efi.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7d18b7ed5d41..0ddb22a03d88 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
 	/*
 	 * Map boot services regions as a workaround for buggy
 	 * firmware that accesses them even when they shouldn't.
+	 * (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled)
 	 *
 	 * See efi_{reserve,free}_boot_services().
 	 */
+	if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES))
+		return false;
+
 	if (md->type == EFI_BOOT_SERVICES_CODE ||
 	    md->type == EFI_BOOT_SERVICES_DATA)
 		return true;
-- 
2.7.4
[PATCH RFC 1/8] x86/efi: Remove __init attribute from memory mapping functions
From: Sai Praneeth Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA regions even after kernel has assumed control of the platform. When "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is enabled we provide a page fault handler that could detect/fixup these illegal accesses. The below modified functions are used by the page fault handler to fixup illegal accesses to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is present during/after kernel boot it doesn't have an __init attribute but the below functions have it and thus during kernel build, we observe "WARNING: modpost: Found * section mismatch(es)". To fix this build warning we remove __init attribute for all these functions. In order to not keep these functions needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is not selected, we add a new __efi_init_fixup attribute whose value changes based on whether the config option is selected or not. Signed-off-by: Sai Praneeth Prakhya Suggested-by: Matt Fleming Based-on-code-from: Ricardo Neri Cc: Al Stone Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Bhupesh Sharma Cc: Ard Biesheuvel --- arch/x86/include/asm/efi.h | 11 ++- arch/x86/platform/efi/efi.c| 4 ++-- arch/x86/platform/efi/efi_32.c | 2 +- arch/x86/platform/efi/efi_64.c | 9 + drivers/firmware/efi/efi.c | 6 +++--- include/linux/efi.h| 16 ++-- 6 files changed, 31 insertions(+), 17 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index cec5fae23eb3..9b70743400f3 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -103,8 +103,9 @@ struct efi_scratch { preempt_enable(); \ }) -extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size, - u32 type, u64 attribute); +extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr, + unsigned long size, u32 type, + u64 attribute); #ifdef CONFIG_KASAN /* @@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void); extern pgd_t * __init 
efi_call_phys_prolog(void); extern void __init efi_call_phys_epilog(pgd_t *save_pgd); extern void __init efi_print_memmap(void); -extern void __init efi_memory_uc(u64 addr, unsigned long size); -extern void __init efi_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size); +extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md); extern void __init efi_map_region_fixed(efi_memory_desc_t *md); extern void efi_sync_low_kernel_mappings(void); extern int __init efi_alloc_page_tables(void); extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages); -extern void __init old_map_region(efi_memory_desc_t *md); +extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md); extern void __init runtime_code_page_mkexec(void); extern void __init efi_runtime_update_mappings(void); extern void __init efi_dump_pagetable(void); diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..439c2c40bf03 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void) } } -void __init efi_memory_uc(u64 addr, unsigned long size) +void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size) { unsigned long page_shift = 1UL << EFI_PAGE_SHIFT; u64 npages; @@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size) set_memory_uc(addr, npages); } -void __init old_map_region(efi_memory_desc_t *md) +void __efi_init_fixup old_map_region(efi_memory_desc_t *md) { u64 start_pfn, end_pfn, end; unsigned long size; diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c index 324b93328b37..8f31452bd204 100644 --- a/arch/x86/platform/efi/efi_32.c +++ b/arch/x86/platform/efi/efi_32.c @@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -void __init efi_map_region(efi_memory_desc_t *md) +void __efi_init_fixup 
efi_map_region(efi_memory_desc_t *md) { old_map_region(md); } diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 448267f1c073..a04298312fdd 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) return 0; } -static void __init __map_region(efi_memory_desc_t *md, u64 va) +static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va) { unsigned long flags = _PAGE_RW; unsigned long pfn; @@ -426,7 +426,7 @@ static void __init __map_r
[PATCH 1/6] efi: Introduce efi_memmap_free() to free memory allocated by efi_memmap_alloc()
From: Sai Praneeth

efi_memmap_alloc() allocates memory depending on whether mm_init() has already been invoked or not. Apart from memblock_alloc() memory and alloc_pages() memory, the efi memory map can also live in a third kind of memory, namely memblock_reserved memory. This happens only for the memory map passed to the kernel by firmware and thus can happen only once during the boot process.

In order to identify these three different types of allocations, and thus to call the appropriate free() variant, introduce an enum named efi_memmap_type and also introduce an efi memmap API named efi_memmap_free() to free memory allocated by efi_memmap_alloc().

Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- drivers/firmware/efi/memmap.c | 28 include/linux/efi.h | 8 2 files changed, 36 insertions(+) diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 5fc70520e04c..0686e063c644 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -12,6 +12,7 @@ #include #include #include +#include static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size) { @@ -50,6 +51,33 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries) } /** + * efi_memmap_free - Free memory allocated by efi_memmap_alloc() + * @mem: Physical address allocated by efi_memmap_alloc(). + * @num_entries: Number of entries in the allocated map. + * @alloc_type: What type of allocation did efi_memmap_alloc() perform? + * + * Use this function to free memory allocated by efi_memmap_alloc(). + * efi_memmap_alloc() allocates memory depending on whether mm_init() + * has already been invoked or not.
It uses either memblock or "normal" + * page allocation, similarly, we free it in two different ways. Also + * note that there is a third type of memory used by memmap which is + * memblock_reserved() and is passed by EFI stub to kernel. + */ +void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries, + enum efi_memmap_type alloc_type) +{ + unsigned long size = num_entries * efi.memmap.desc_size; + unsigned int order = get_order(size); + + if (alloc_type == BUDDY_ALLOCATOR) + __free_pages(pfn_to_page(PHYS_PFN(mem)), order); + else if (alloc_type == MEMBLOCK) + memblock_free(mem, size); + else + free_bootmem(mem, size); +} + +/** * __efi_memmap_init - Common code for mapping the EFI memory map * @data: EFI memory map data * @late: Use early or late mapping function? diff --git a/include/linux/efi.h b/include/linux/efi.h index 56add823f190..455875c01ed1 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -765,6 +765,12 @@ struct efi_memory_map_data { unsigned long desc_size; }; +enum efi_memmap_type { + EFI_STUB, + MEMBLOCK, + BUDDY_ALLOCATOR, +}; + struct efi_memory_map { phys_addr_t phys_map; void *map; @@ -1016,6 +1022,8 @@ extern int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range); extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf, struct efi_mem_range *mem); +extern void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries, + enum efi_memmap_type alloc_type); extern int efi_config_init(efi_config_table_type_t *arch_tables); #ifdef CONFIG_EFI_ESRT -- 2.7.4
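The point of tagging each allocation with its origin is that the free routine can then pick the matching release path. A user-space sketch of that pattern follows. It assumes malloc()/free() as a stand-in for all three kernel allocators and uses counters to show which branch was taken. The names memmap_alloc(), memmap_free() and the TYPE_* values are hypothetical and are not the kernel's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mirror of the three allocation origins named in the patch. */
enum memmap_type { TYPE_EFI_STUB, TYPE_MEMBLOCK, TYPE_BUDDY };

/* Counters stand in for the three distinct kernel free routines. */
static int freed_reserved, freed_memblock, freed_pages;

/*
 * Allocate a buffer and report which allocator served it. "late" models
 * whether the normal page allocator is already available.
 */
static void *memmap_alloc(size_t size, int late, enum memmap_type *type)
{
	*type = late ? TYPE_BUDDY : TYPE_MEMBLOCK;
	return malloc(size);	/* assumption: malloc stands in for both allocators */
}

/* Release a buffer through the path that matches how it was obtained. */
static void memmap_free(void *mem, enum memmap_type type)
{
	switch (type) {
	case TYPE_BUDDY:
		freed_pages++;		/* kernel: page allocator free */
		break;
	case TYPE_MEMBLOCK:
		freed_memblock++;	/* kernel: memblock free */
		break;
	case TYPE_EFI_STUB:
		freed_reserved++;	/* kernel: release of reserved boot memory */
		break;
	}
	free(mem);
}
```

Because the type travels with the pointer, a buffer obtained early in boot can still be released correctly later, even after the default allocator has changed.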
[PATCH 3/6] efi: Use efi.memmap.alloc_type instead of efi.memmap.late
From: Sai Praneeth

Memory used by the efi memory map can be one of three different types, namely a) memblock_reserved b) memblock_alloc'ed c) normal paged memory. Presently, we use efi.memmap.late, which is of type "bool", to record the type of memory in use by the efi memory map. As a "bool" cannot represent three values, replace it with an enum that represents one of the three available types of memory, and change all the corresponding memmap APIs to reflect the same.

Also, presently, we never free memblock_reserved memory and hence never record its usage. Change efi_memmap_init_early() so that it records the usage of memblock_reserved memory, which can then be freed when appropriate. Also, change efi_memmap_install() and __efi_memmap_init() so that at every point in time we record the type of memory in use by the efi memory map, and hence can use "efi.memmap.alloc_type" to free the existing memory before installing a new memory map.

Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- arch/x86/platform/efi/efi.c | 4 ++-- arch/x86/platform/efi/quirks.c | 4 ++-- drivers/firmware/efi/arm-init.c | 2 +- drivers/firmware/efi/fake_mem.c | 2 +- drivers/firmware/efi/memmap.c | 34 +++--- include/linux/efi.h | 8 +--- 6 files changed, 30 insertions(+), 24 deletions(-) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..cda54abf25a6 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -196,7 +196,7 @@ int __init efi_memblock_x86_reserve_range(void) data.desc_size = e->efi_memdesc_size; data.desc_version = e->efi_memdesc_version; - rv = efi_memmap_init_early(); + rv = efi_memmap_init_early(, EFI_STUB); if (rv)
return rv; @@ -272,7 +272,7 @@ static void __init efi_clean_memmap(void) u64 size = efi.memmap.nr_map - n_removal; pr_warn("Removing %d invalid memory map entries.\n", n_removal); - efi_memmap_install(efi.memmap.phys_map, size); + efi_memmap_install(efi.memmap.phys_map, size, EFI_STUB); } } diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 84e8d077adf6..11fa6ac9f0c2 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -292,7 +292,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) efi_memmap_insert(, new, ); early_memunmap(new, new_size); - efi_memmap_install(new_phys, num_entries); + efi_memmap_install(new_phys, num_entries, alloc_type); } /* @@ -452,7 +452,7 @@ void __init efi_free_boot_services(void) memunmap(new); - if (efi_memmap_install(new_phys, num_entries)) { + if (efi_memmap_install(new_phys, num_entries, alloc_type)) { pr_err("Could not install new EFI memmap\n"); return; } diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c index b5214c143fee..f0de8df6f396 100644 --- a/drivers/firmware/efi/arm-init.c +++ b/drivers/firmware/efi/arm-init.c @@ -239,7 +239,7 @@ void __init efi_init(void) data.size = params.mmap_size; data.phys_map = params.mmap; - if (efi_memmap_init_early() < 0) { + if (efi_memmap_init_early(, EFI_STUB) < 0) { /* * If we are booting via UEFI, the UEFI memory map is the only * description of memory we have, so there is little point in diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 955e690b8325..82dcfa1c340b 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -90,7 +90,7 @@ void __init efi_fake_memmap(void) /* swap into new EFI memmap */ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); - efi_memmap_install(new_memmap_phy, new_nr_map); + efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type); /* print new EFI memmap */ efi_print_memmap(); diff --git 
a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 69b81d355619..d4e3e114cf86 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -88,7 +88,7 @@ void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries, /** * __efi_memmap_init - Common code for mapping the EFI memory map * @data: EFI memory map data - * @late: Use early or late mapping function? + * @alloc_type: Use early or late mapping function? * * This functi
[PATCH 4/6] x86/efi: Free existing memory map before installing new memory map
From: Sai Praneeth

efi_memmap_install() unmaps the existing memory map and installs a new memory map, but doesn't free the memory allocated to the existing memory map. Fortunately, the details about the existing memory map (like the physical address, number of entries and type of memory) are stored in efi.memmap. Hence, use them to free the memory.

In __efi_enter_virtual_mode(), we don't use efi_memmap_install() to install a new memory map; instead we use efi_memmap_init_late(). Hence, free the existing memory map there too before installing a new memory map.

Generally, memory for a new memory map is allocated using efi_memmap_alloc(), but in __efi_enter_virtual_mode() it's done using realloc_pages() [please see efi_map_regions()]. So, it's OK to free this memory using efi_memmap_free() in efi_free_boot_services(). Also, note that the first time efi_memmap_free() is called, either from efi_fake_memmap() or efi_arch_mem_reserve() [depending on the boot sequence], we are actually freeing memblock_reserved memory which isn't allocated by efi_memmap_alloc(). So, there are two outliers where we use efi_memmap_free() to free memory allocated through sources other than efi_memmap_alloc().

Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- arch/x86/platform/efi/efi.c | 3 +++ arch/x86/platform/efi/quirks.c | 6 ++ drivers/firmware/efi/fake_mem.c | 3 +++ 3 files changed, 12 insertions(+) diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index cda54abf25a6..7756426e93b5 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -952,6 +952,9 @@ static void __init __efi_enter_virtual_mode(void) * firmware via SetVirtualAddressMap().
*/ efi_memmap_unmap(); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) { pr_err("Failed to remap late EFI memory map\n"); diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 11fa6ac9f0c2..11800f3cbb93 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -292,6 +292,9 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) efi_memmap_insert(, new, ); early_memunmap(new, new_size); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); efi_memmap_install(new_phys, num_entries, alloc_type); } @@ -452,6 +455,9 @@ void __init efi_free_boot_services(void) memunmap(new); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); if (efi_memmap_install(new_phys, num_entries, alloc_type)) { pr_err("Could not install new EFI memmap\n"); return; diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 82dcfa1c340b..a47754efb796 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -90,6 +90,9 @@ void __init efi_fake_memmap(void) /* swap into new EFI memmap */ early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map); + /* Free existing memory map before installing new memory map */ + efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, + efi.memmap.alloc_type); efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type); /* print new EFI memmap */ -- 2.7.4
[PATCH 2/6] efi: Let the user of efi_memmap_alloc() know the type of allocation performed
From: Sai Praneeth

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi memory map, and it does so depending on whether mm_init() has already been invoked or not. As we have introduced efi_memmap_free() to free the memory allocated by efi_memmap_alloc(), modify efi_memmap_alloc() to include "efi_memmap_type", so that the caller of efi_memmap_alloc() knows the type of allocation performed and can later use it to free the memory should the remap fail. Without "efi_memmap_type" there would be no way for efi_memmap_free() to know the type of allocation performed by efi_memmap_alloc().

Also, "efi_memmap_type" makes sure that efi_memmap_alloc() and efi_memmap_free() are always paired properly. For example, a user could call efi_memmap_alloc() before slab_is_available() and call efi_memmap_free() on the same memory after slab_is_available(). Without "efi_memmap_type", efi_memmap_free() would use the wrong free variant. With "efi_memmap_type", we make this relationship between efi_memmap_alloc() and efi_memmap_free() explicit to the user.
Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- arch/x86/platform/efi/quirks.c | 6 -- drivers/firmware/efi/fake_mem.c | 3 ++- drivers/firmware/efi/memmap.c | 12 ++-- include/linux/efi.h | 3 ++- 4 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 36c1f8b9f7e0..84e8d077adf6 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -248,6 +248,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) efi_memory_desc_t md; int num_entries; void *new; + enum efi_memmap_type alloc_type; if (efi_mem_desc_lookup(addr, )) { pr_err("Failed to lookup EFI memory descriptor for %pa\n", ); @@ -276,7 +277,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new_size = efi.memmap.desc_size * num_entries; - new_phys = efi_memmap_alloc(num_entries); + new_phys = efi_memmap_alloc(num_entries, _type); if (!new_phys) { pr_err("Could not allocate boot services memmap\n"); return; @@ -375,6 +376,7 @@ void __init efi_free_boot_services(void) efi_memory_desc_t *md; int num_entries = 0; void *new, *new_md; + enum efi_memmap_type alloc_type; for_each_efi_memory_desc(md) { unsigned long long start = md->phys_addr; @@ -420,7 +422,7 @@ void __init efi_free_boot_services(void) return; new_size = efi.memmap.desc_size * num_entries; - new_phys = efi_memmap_alloc(num_entries); + new_phys = efi_memmap_alloc(num_entries, _type); if (!new_phys) { pr_err("Failed to allocate new EFI memmap\n"); return; diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 6c7d60c239b5..955e690b8325 100644 --- a/drivers/firmware/efi/fake_mem.c +++ 
b/drivers/firmware/efi/fake_mem.c @@ -57,6 +57,7 @@ void __init efi_fake_memmap(void) phys_addr_t new_memmap_phy; void *new_memmap; int i; + enum efi_memmap_type alloc_type; if (!nr_fake_mem) return; @@ -71,7 +72,7 @@ void __init efi_fake_memmap(void) } /* allocate memory for new EFI memmap */ - new_memmap_phy = efi_memmap_alloc(new_nr_map); + new_memmap_phy = efi_memmap_alloc(new_nr_map, _type); if (!new_memmap_phy) return; diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 0686e063c644..69b81d355619 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -33,6 +33,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size) /** * efi_memmap_alloc - Allocate memory for the EFI memory map * @num_entries: Number of entries in the allocated map. + * @alloc_type: Type of allocation performed (memblock or normal)? * * Depending on whether mm_init() has already been invoked or not, * either memblock or "normal" page allocation is used. @@ -40,13 +41,20 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size) * Returns the physical address of the allocated memory map on * success, zero on failure. */ -phys_addr_t __init efi_memmap_alloc(unsigned int num_entries) +phys_addr_t __init efi_memmap_alloc(unsigned int num_entries, + enum efi_memmap_type *alloc_type) {
[PATCH 5/6] x86/efi: Free allocated memory if remap fails
From: Sai Praneeth

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi memory map. It's referenced from a couple of places, namely efi_arch_mem_reserve() and efi_free_boot_services(). These callers, after allocating memory, remap it for further use. As usual, a routine check is performed to confirm that the remap succeeded. If the remap fails, the allocated memory should ideally be freed, but presently we just return without freeing it. Hence, fix this bug by freeing the memory with efi_memmap_free().

efi_fake_memmap() also references efi_memmap_alloc() and already frees the memory correctly using memblock_free(). Replace that call with efi_memmap_free() anyway to maintain consistency, as in, allocate memory with efi_memmap_alloc() and free memory with efi_memmap_free().

Admittedly, memremap() and early_memremap() might never fail and this code might never get a chance to run, but freeing on the error path is good kernel programming practice, hence this patch.

Signed-off-by: Sai Praneeth Prakhya Suggested-by: Ard Biesheuvel Cc: Lee Chun-Yi Cc: Dave Young Cc: Borislav Petkov Cc: Laszlo Ersek Cc: Jan Kiszka Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Nicolai Stange Cc: Naresh Bhat Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Taku Izumi Cc: Ravi Shankar Cc: Matt Fleming Cc: Dan Williams Cc: Ard Biesheuvel --- arch/x86/platform/efi/quirks.c | 10 -- drivers/firmware/efi/fake_mem.c | 2 +- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 11800f3cbb93..8fce327387e5 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -286,6 +286,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new = early_memremap(new_phys, new_size); if (!new) { pr_err("Failed to map new boot services memmap\n"); + efi_memmap_free(new_phys, num_entries, alloc_type); return; } @@ -434,7 +435,7 @@ void __init efi_free_boot_services(void) new = memremap(new_phys, new_size, MEMREMAP_WB); if (!new)
{ pr_err("Failed to map new EFI memmap\n"); - return; + goto free_mem; } /* @@ -460,8 +461,13 @@ void __init efi_free_boot_services(void) efi.memmap.alloc_type); if (efi_memmap_install(new_phys, num_entries, alloc_type)) { pr_err("Could not install new EFI memmap\n"); - return; + goto free_mem; } + + return; + +free_mem: + efi_memmap_free(new_phys, num_entries, alloc_type); } /* diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index a47754efb796..09b0fabf07fd 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -80,7 +80,7 @@ void __init efi_fake_memmap(void) new_memmap = early_memremap(new_memmap_phy, efi.memmap.desc_size * new_nr_map); if (!new_memmap) { - memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map); + efi_memmap_free(new_memmap_phy, new_nr_map, alloc_type); return; } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
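
The error-handling shape used in efi_free_boot_services() above can be sketched in plain userspace C. This is only an illustration under stated assumptions: malloc()/free() stand in for efi_memmap_alloc()/efi_memmap_free(), and rebuild_map(), live_buffers and installed are hypothetical names that do not exist in the kernel. The point it shows is that every failure path jumps to one cleanup label, while the success path returns before that label so the freshly installed buffer is not freed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Buffers allocated by rebuild_map() that have not been freed yet. */
static int live_buffers;

/* The buffer that is currently "installed"; owned by this variable. */
static void *installed;

/*
 * Allocate a buffer, "map" it and "install" it. The two flags simulate a
 * failing remap and a failing install (hypothetical, for illustration).
 * Returns 0 on success, -1 on failure.
 */
static int rebuild_map(size_t size, bool fail_map, bool fail_install)
{
	void *buf = malloc(size);

	if (!buf)
		return -1;
	live_buffers++;

	if (fail_map)
		goto free_mem;		/* remap failed: release the buffer */

	if (fail_install)
		goto free_mem;		/* install failed: release the buffer */

	/* Drop the previous buffer so the sketch itself does not leak. */
	if (installed) {
		free(installed);
		live_buffers--;
	}
	installed = buf;

	/* Success must return here and not fall through into free_mem. */
	return 0;

free_mem:
	free(buf);
	live_buffers--;
	return -1;
}
```

With this layout, a failed call leaves live_buffers unchanged, and a successful call leaves exactly one live buffer: the installed one.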
[PATCH 6/6] efi: Fix unaligned fake memmap entries corrupting efi memory map
From: Sai Praneeth

efi_fake_memmap() inserts user-supplied fake memory map entries into the
original EFI memory map using efi_memmap_insert(). efi_memmap_insert()
checks for EFI_PAGE_SIZE alignment and can fail if an unaligned EFI memory
region is passed (e.g. efi_fake_memmap=1K@0x73ae: 0x8000). Since
EFI_PAGE_SIZE is 4K, the above request fails, but efi_fake_memmap() doesn't
check for failures in efi_memmap_insert() and installs an empty EFI memory
map from efi_memmap_alloc(). Since the EFI memory map is then corrupted, all
later EFI calls fail too. Fix this bug by changing the return type of
efi_memmap_insert() from void to int and checking it in the callers.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Ard Biesheuvel
Cc: Lee Chun-Yi
Cc: Dave Young
Cc: Borislav Petkov
Cc: Laszlo Ersek
Cc: Jan Kiszka
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Nicolai Stange
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Taku Izumi
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
---
 arch/x86/platform/efi/quirks.c  |  8 +++-
 drivers/firmware/efi/fake_mem.c | 11 +--
 drivers/firmware/efi/memmap.c   | 12
 include/linux/efi.h             |  4 ++--
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 8fce327387e5..0e607ac24a3b 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -290,7 +290,13 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 		return;
 	}
 
-	efi_memmap_insert(, new, );
+	if (efi_memmap_insert(, new, )) {
+		pr_err("Failed to reserve EFI memory region\n");
+		early_memunmap(new, new_size);
+		efi_memmap_free(new_phys, num_entries, alloc_type);
+		return;
+	}
+
 	early_memunmap(new, new_size);
 
 	/* Free existing memory map before installing new memory map */
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 09b0fabf07fd..ae373af6931b 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -84,8 +84,15 @@ void __init efi_fake_memmap(void)
 		return;
 	}
 
-	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(, new_memmap, _mems[i]);
+	for (i = 0; i < nr_fake_mem; i++) {
+		if (efi_memmap_insert(, new_memmap, _mems[i])) {
+			pr_err("efi_fake_mem: Failed to create fake memmap\n");
+			early_memunmap(new_memmap,
+				       efi.memmap.desc_size * new_nr_map);
+			efi_memmap_free(new_memmap_phy, new_nr_map, alloc_type);
+			return;
+		}
+	}
 
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index d4e3e114cf86..05a556e63ec2 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -290,9 +290,11 @@ int __init efi_memmap_split_count(efi_memory_desc_t *md, struct range *range)
  *
  * It is suggested that you call efi_memmap_split_count() first
  * to see how large @buf needs to be.
+ *
+ * Returns zero on success, a negative error code on failure.
  */
-void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
-			      struct efi_mem_range *mem)
+int __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
+			     struct efi_mem_range *mem)
 {
 	u64 m_start, m_end, m_attr;
 	efi_memory_desc_t *md;
@@ -311,8 +313,9 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 	 */
 	if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) ||
 	    !IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) {
-		WARN_ON(1);
-		return;
+		WARN(1, "Address 0x%llx - 0x%llx is not EFI_PAGE_SIZE aligned",
+		     m_start, m_end);
+		return -EINVAL;
 	}
 
 	for (old = old_memmap->map, new = buf;
@@ -379,4 +382,5 @@ void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
 			md->attribute |= m_attr;
 		}
 	}
+	return 0;
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index c9752c67d184..bca955205a3f 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1023,8 +1023,8 @@ extern int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map,
 				     enum efi_memmap_type alloc_type);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 					 struct range *range);
-extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
-
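
The alignment rule that this patch turns into an error return can be tried out in a small userspace sketch. It is an illustration only: PAGE_SIZE_4K mirrors the 4K EFI_PAGE_SIZE mentioned in the commit message, and struct mem_range, check_range() and insert_all() are hypothetical names, not the kernel's types or functions. The sketch rejects a range whose start or inclusive end + 1 is not page aligned, and the caller stops at the first failure instead of carrying on with a half-built map.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K pages, as stated for EFI_PAGE_SIZE in the message above. */
#define PAGE_SIZE_4K 4096ULL

/* Hypothetical range type: start address and size in bytes. */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

static int is_aligned(uint64_t x, uint64_t align)
{
	/* Valid for power-of-two alignments only. */
	return (x & (align - 1)) == 0;
}

/* Returns 0 if the range is page aligned at both ends, -EINVAL otherwise. */
static int check_range(uint64_t start, uint64_t size)
{
	uint64_t end = start + size - 1;	/* inclusive end address */

	if (!is_aligned(start, PAGE_SIZE_4K) ||
	    !is_aligned(end + 1, PAGE_SIZE_4K))
		return -EINVAL;
	return 0;
}

/*
 * Validate ranges in order and stop at the first failure, reporting how
 * many were accepted before it through *done.
 */
static int insert_all(const struct mem_range *r, size_t n, size_t *done)
{
	size_t i;

	*done = 0;
	for (i = 0; i < n; i++) {
		int ret = check_range(r[i].start, r[i].size);

		if (ret)
			return ret;
		(*done)++;
	}
	return 0;
}
```

A 1K range fails the check even when its start address is aligned, because its end + 1 lands in the middle of a page.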
[PATCH 0/6] Fix memory leaks in efi subsystem
x/efi.h | 23 ---
 6 files changed, 131 insertions(+), 42 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Ard Biesheuvel
Cc: Lee Chun-Yi
Cc: Dave Young
Cc: Borislav Petkov
Cc: Laszlo Ersek
Cc: Jan Kiszka
Cc: Dave Hansen
Cc: Bhupesh Sharma
Cc: Nicolai Stange
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Taku Izumi
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
--
2.7.4
[PATCH] efi: Free existing memory map before installing new memory map
From: Sai Praneeth

efi_memmap_install() unmaps the existing memory map and installs the new
memory map, but it doesn't free the memory allocated to the existing memory
map. Fortunately, the details of the existing memory map are stored in
efi.memmap. Use them to free the memory.

Signed-off-by: Sai Praneeth Prakhya
Reported-by: Ard Biesheuvel
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Dave Young
Cc: Laszlo Ersek
Cc: Bhupesh Sharma
Cc: Ricardo Neri
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Ard Biesheuvel
---
Note: Patch based on efi tree
@https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git

 drivers/firmware/efi/memmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 678e85704054..68b27b14fe94 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -229,6 +229,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 
 	efi_memmap_unmap();
 
+	/* Free the memory allocated to the existing memory map */
+	efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, efi.memmap.late);
+
 	data.phys_map = addr;
 	data.size = efi.memmap.desc_size * nr_map;
 	data.desc_version = efi.memmap.desc_version;
--
2.7.4
[PATCH] efi: Remove the declaration of efi_late_init() as the function is unused
From: Sai Praneeth

Commit 7b0a911478c74 (efi/x86: Move the EFI BGRT init code to early init
code) removed the implementation of efi_late_init() and all references to
it, but the function is still declared in include/linux/efi.h. Remove the
unnecessary declaration.

Signed-off-by: Sai Praneeth Prakhya
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Dave Young
Cc: Bhupesh Sharma
Cc: Ricardo Neri
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Ard Biesheuvel
---
 include/linux/efi.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 56add823f190..ae47be636b98 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -988,14 +988,12 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void);	/* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_late_init(void);
 extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
--
2.7.4
[PATCH V3] x86/efi: Free allocated memory if remap fails
From: Sai Praneeth

efi_memmap_alloc(), as the name suggests, allocates memory for a new EFI
memory map. It is called from a couple of places, namely
efi_arch_mem_reserve() and efi_free_boot_services(). After allocating the
memory, these callers remap it for further use and then check that the
remap succeeded. If the remap fails, the allocated memory should ideally be
freed, but at present we just return without freeing it. Fix this bug by
introducing efi_memmap_free(), which frees memory allocated by
efi_memmap_alloc().

As efi_memmap_alloc() allocates memory differently depending on whether
mm_init() has already been invoked, introduce a new argument called "late"
that reports which type of allocation efi_memmap_alloc() performed. Later,
efi_memmap_free() uses it to invoke the appropriate method to free the
allocated memory.

The other main purpose of the "late" argument is to make sure that
efi_memmap_alloc() and efi_memmap_free() are always paired properly. For
example, efi_memmap_alloc() could be used before slab_is_available() and
efi_memmap_free() could be used after slab_is_available(). Without "late",
this would break, because the allocation would have been done using
memblock_alloc() while the freeing would be done using __free_pages().
Since these APIs could easily be misused, make the pairing explicit: the
caller has to pass the "late" argument to efi_memmap_alloc() and later pass
the same value to efi_memmap_free().

efi_fake_memmap() also calls efi_memmap_alloc() and already frees the memory
correctly using memblock_free(). Replace that call with efi_memmap_free()
for consistency, so that memory allocated with efi_memmap_alloc() is always
freed with efi_memmap_free().

memremap() and early_memremap() might never fail, so this code might never
get a chance to run, but the cleanup is still needed for correct error
handling.

Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Tony Luck Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Ravi Shankar Cc: Matt Fleming Cc: Ard Biesheuvel --- Changes from V2 to V3: -- 1. Add a new argument "late" to efi_memmap_alloc(), so that efi_memmap_alloc() could communicate the type of allocation performed. 2. Re-introduce efi_memmap_free() (from V1) but with an extra argument "late", to know the type of allocation performed by efi_memmap_alloc(). Changes from V1 to V2: -- 1. Fix the bug of freeing memory map that was just installed by correctly calling free_pages(). 2. Call memblock_free() and __free_pages() directly from the appropriate places instead of efi_memmap_free(). Note: Patch based on Linus's mainline tree V4.18-rc1 arch/x86/platform/efi/quirks.c | 16 drivers/firmware/efi/fake_mem.c | 5 +++-- drivers/firmware/efi/memmap.c | 38 -- include/linux/efi.h | 3 ++- 4 files changed, 53 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 36c1f8b9f7e0..ef5698a3af7a 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -248,6 +248,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) efi_memory_desc_t md; int num_entries; void *new; + bool late; if (efi_mem_desc_lookup(addr, )) { pr_err("Failed to lookup EFI memory descriptor for %pa\n", ); @@ -276,7 +277,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new_size = efi.memmap.desc_size * num_entries; - new_phys = efi_memmap_alloc(num_entries); + new_phys = efi_memmap_alloc(num_entries, ); if (!new_phys) { pr_err("Could not allocate boot services memmap\n"); return; @@ -285,6 +286,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new = early_memremap(new_phys, new_size); if (!new) { pr_err("Failed to map new boot services memmap\n"); + efi_memmap_free(new_phys, num_entries, late); return; } @@ -375,6 +377,7 @@ void __init 
efi_free_boot_services(void) efi_memory_desc_t *md; int num_entries = 0; void *new, *new_md; + bool late; for_each_efi_memory_desc(md) { unsigned long long start = md->phys_addr; @@ -420,7 +423,7 @@ void __init efi_free_boot_services(void) return; new_size = efi.memmap.desc_size * num_entries; - new_phys = efi_memmap_alloc(num_entries); + new_phys = efi_memmap_alloc(num_entries, ); if (!new_phys) { pr_err("Failed to allocate new EFI memmap\n"); return; @@ -429,7 +4
Re: [PATCH V2] x86/efi: Free allocated memory if remap fails
> Thank you Sai.
>
> But this is not really what I meant.
>
Yeah, sorry about that. I had a hunch that you might be suggesting something
like the below, but I went ahead with this implementation because it looked
very simple (just 3 insertions and no deletions).

> How about we modify efi_memmap_alloc() like this
>
It sounds like a good idea to me. Leaving aside the pros (which are
obvious), the only con I can see is a few extra checks and some extra code,
but I don't think that's an issue at all because this code is not in a fast
path and it doesn't impact performance. So, I will post a V3 with the
suggested changes.

> @@ -39,10 +39,12 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned long size)
>   * Returns the physical address of the allocated memory map on
>   * success, zero on failure.
>   */
> -phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
> +phys_addr_t __init efi_memmap_alloc(unsigned int num_entries, bool *late)
>  {
>  	unsigned long size = num_entries * efi.memmap.desc_size;
>
> +	if (late)
> +		*late = slab_is_available();
>  	if (slab_is_available())
>  		return __efi_memmap_alloc_late(size);
>
> and introduce efi_memmap_free() as before, but pass it the 'late'
> parameter you received from efi_memmap_alloc(). That way, it is the
> caller's job to take care of this.
>
Sure! Makes sense.

> Also, it seems to me that efi_arch_mem_reserve() leaks the old memory
> map every time you create a new one, no?

I think you are right. The issue I see is (please let me know if you think
otherwise):
1. efi_arch_mem_reserve() comes up with a new memory map and then tries to
   install it via efi_memmap_install().
2. efi_memmap_install() unmaps the existing memory map and installs the new
   memory map but doesn't free the memory used by the existing memory map.
   Hence, as you said, it leaks the old memory map.

If this is what you meant, I think the issue is not limited to
efi_arch_mem_reserve() but affects all the places that call
efi_memmap_install(). I think we could solve it as below:

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 678e85704054..50ae4ffbf058 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -228,6 +228,7 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned int nr_map)
 	struct efi_memory_map_data data;
 
 	efi_memmap_unmap();
+	efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, efi.memmap.late);
 
 	data.phys_map = addr;
 	data.size = efi.memmap.desc_size * nr_map;

Please let me know your thoughts on it.

> That is a separate issue that
> you may want to look into, but it affects the design of this API as
> well.

I may have misunderstood you here, but I think the efi_memmap_free() API in
V3 should work (without changes). Don't you think so?

Regards,
Sai
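
The design agreed on in this reply (the allocator reports which backend it used, and the caller hands that value back when freeing) can be sketched in userspace C. Everything here is an assumption made for illustration: late_allocator_ready stands in for slab_is_available(), malloc()/free() stand in for both memblock and the page allocator, and map_alloc(), map_free() and the two counters are hypothetical names. The counters only exist to show that the free is charged to the backend that did the allocation, even if the global state changed in between.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for slab_is_available(): flips to true once "mm_init()" ran. */
static bool late_allocator_ready;

/* Outstanding allocations per (simulated) backend. */
static int early_live, late_live;

/*
 * Allocate @size bytes and report through @late which backend was used.
 * @late may be NULL if the caller does not care.
 */
static void *map_alloc(size_t size, bool *late)
{
	bool is_late = late_allocator_ready;
	void *p = malloc(size);

	if (!p)
		return NULL;
	if (late)
		*late = is_late;
	if (is_late)
		late_live++;
	else
		early_live++;
	return p;
}

/*
 * Free memory from map_alloc(). The backend is chosen from @late, which is
 * a property of the allocation, not from the current global state.
 */
static void map_free(void *p, bool late)
{
	if (!p)
		return;
	if (late)
		late_live--;
	else
		early_live--;
	free(p);
}
```

If map_free() consulted late_allocator_ready instead of its @late argument, an allocation made before the flag flipped would be released through the wrong backend, which is the mismatch pointed out in the review.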
[PATCH V2] x86/efi: Free allocated memory if remap fails
From: Sai Praneeth efi_memmap_alloc(), as the name suggests, allocates memory for a new efi memory map. It's referenced from couple of places, namely, efi_arch_mem_reserve() and efi_free_boot_services(). These callers, after allocating memory, remap it for further use. As usual, a routine check is performed to confirm successful remap. If the remap fails, ideally, the allocated memory should be freed but presently we just return without freeing it up. Hence, fix this bug by freeing up the memory appropriately. As efi_memmap_alloc() allocates memory depending on whether mm_init() has already been invoked or not, similarly, while freeing use memblock_free() to free memory allocated before invoking mm_init() and __free_pages() to free memory allocated after invoking mm_init(). It's a fact that memremap() and early_memremap() might never fail and this code might never get a chance to run but to maintain good kernel programming semantics, we might need this patch. Signed-off-by: Sai Praneeth Prakhya Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Tony Luck Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Ravi Shankar Cc: Matt Fleming Cc: Ard Biesheuvel --- I found this bug when working on a different patch set which uses efi_memmap_alloc() and then noticed that I never freed the allocated memory. I found it weird, in the sense that, memory is allocated but is not freed (upon returning from an error). So, wasn't sure if that should be treated as a bug or should I just leave it as is because everything works fine even without this patch. Since the effort for the patch is very minimal, I just went ahead and posted one, so that I could know your thoughts on it. Changes from V1 to V2: -- 1. Fix the bug of freeing memory map that was just installed by correctly calling free_pages(). 2. Call memblock_free() and __free_pages() directly from the appropriate places instead of efi_memmap_free(). 
Note: Patch based on Linus's mainline tree V4.18-rc1 arch/x86/platform/efi/quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 36c1f8b9f7e0..cfa93af97def 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -285,6 +285,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new = early_memremap(new_phys, new_size); if (!new) { pr_err("Failed to map new boot services memmap\n"); + memblock_free(new_phys, new_size); return; } @@ -429,6 +430,7 @@ void __init efi_free_boot_services(void) new = memremap(new_phys, new_size, MEMREMAP_WB); if (!new) { pr_err("Failed to map new EFI memmap\n"); + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size)); return; } @@ -452,6 +454,7 @@ void __init efi_free_boot_services(void) if (efi_memmap_install(new_phys, num_entries)) { pr_err("Could not install new EFI memmap\n"); + __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size)); return; } } -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] x86/efi: Free allocated memory if remap fails
> > > > +void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries)
> > > > +{
> > > > +	unsigned long size = num_entries * efi.memmap.desc_size;
> > > > +	unsigned int order = get_order(size);
> > > > +	phys_addr_t end = mem + size - 1;
> > > > +
> > > > +	if (slab_is_available()) {
> > > > +		__free_pages(pfn_to_page(PHYS_PFN(mem)), order);
> > >
> > > How do you know that the memory you are freeing was allocated when
> > > slab_is_available() was already true?
> > >
> > efi_memmap_free() should be used *only* in conjunction with
> > efi_memmap_alloc() (I didn't mention this explicitly, which may have
> > caused the confusion).
> >
> > When allocating memory, efi_memmap_alloc() does a similar check for
> > slab_is_available() and, if so, allocates memory using alloc_pages().
> > So, to free pages allocated using alloc_pages(), efi_memmap_free()
> > uses __free_pages().
> >
> I understand that. But by abstracting away the free() routine as well
> as the alloc() routine, you are hiding this fact.
>
> What is preventing me from using efi_memmap_alloc() to allocate space
> for the memmap, and using efi_memmap_free() in another place? How are
> you preventing that this does not happen in a way where mm_init() may
> be called in the mean time?
>
> Whether __free_pages() should be used or memblock_free() is a property
> of the *allocation* itself, not of whether mm_init() has already been
> called. So if (!slab_is_available()), you can use memblock_free().
> However, if (slab_is_available()), you cannot use __free_pages()
> because the allocation could have been made before mm_init() was
> called.
>
Ah, thanks a lot for making it clear. I see the bug now: efi_memmap_alloc()
could be called before mm_init(), in which case it uses memblock_alloc(),
whereas efi_memmap_free() could be called after mm_init(), in which case it
uses __free_pages(). I will fix this.

Regards,
Sai
Re: [PATCH] x86/efi: Free allocated memory if remap fails
> > It's a fact that memremap() and early_memremap() might never fail and
> > this code might never get a chance to run but to maintain good kernel
> > programming semantics, we might need this patch.
> >
> > Signed-off-by: Sai Praneeth Prakhya
> > Reviewed-by: Ricardo Neri
>
> Please don't include tags for reviews that did not happen on-list.
>
Sure! Thanks for letting me know.

> > @@ -450,10 +451,11 @@ void __init efi_free_boot_services(void)
> >
> >  	memunmap(new);
> >
> > -	if (efi_memmap_install(new_phys, num_entries)) {
> > +	if (efi_memmap_install(new_phys, num_entries))
> >  		pr_err("Could not install new EFI memmap\n");
> > -		return;
> > -	}
> > +
> > +free_mem:
> > +	efi_memmap_free(new_phys, num_entries);
>
> Doesn't this free the memory map that you just installed?
>
That's true! It's a bug. I will fix it.

> >  }
> >
> >  /**
> > + * efi_memmap_free - Free memory allocated by efi_memmap_alloc()
> > + * @mem: Physical address allocated by efi_memmap_alloc()
> > + * @num_entries: Number of entries in the allocated map.
> > + *
> > + * efi_memmap_alloc() allocates memory depending on whether mm_init()
> > + * has already been invoked or not. It uses either memblock or "normal"
> > + * page allocation. Use this function to free the memory allocated by
> > + * efi_memmap_alloc(). Since the allocation is done in two different
> > + * ways, similarly, we free it in two different ways.
> > + *
> > + */
> > +void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries)
> > +{
> > +	unsigned long size = num_entries * efi.memmap.desc_size;
> > +	unsigned int order = get_order(size);
> > +	phys_addr_t end = mem + size - 1;
> > +
> > +	if (slab_is_available()) {
> > +		__free_pages(pfn_to_page(PHYS_PFN(mem)), order);
>
> How do you know that the memory you are freeing was allocated when
> slab_is_available() was already true?
>
efi_memmap_free() should be used *only* in conjunction with
efi_memmap_alloc() (I didn't mention this explicitly, which may have caused
the confusion). When allocating memory, efi_memmap_alloc() does a similar
check for slab_is_available() and, if so, allocates memory using
alloc_pages(). So, to free pages allocated using alloc_pages(),
efi_memmap_free() uses __free_pages().

> > +		return;
> > +	}
> > +
> > +	if (memblock_free(mem, size))
> > +		pr_err("Failed to free mem from %pa to %pa\n", ,
> > +		       );
> > +}
> > +

Regards,
Sai
[PATCH] x86/efi: Free allocated memory if remap fails
From: Sai Praneeth efi_memmap_alloc(), as the name suggests, allocates memory for a new efi memory map. It's referenced from couple of places, namely, efi_arch_mem_reserve() and efi_free_boot_services(). These callers, after allocating memory, remap it for further use. As usual, a routine check is performed to confirm successful remap. If the remap fails, ideally, the allocated memory should be freed but presently we just return without freeing it up. Hence, fix this bug by introducing efi_memmap_free() which frees memory allocated by efi_memmap_alloc(). As efi_memmap_alloc() allocates memory depending on whether mm_init() has already been invoked or not, similarly efi_memmap_free() frees memory accordingly. efi_fake_memmap() also references efi_memmap_alloc() but it frees memory correctly using memblock_free(), but replace it with efi_memmap_free() to maintain consistency, as in, allocate memory with efi_memmap_alloc() and free memory with efi_memmap_free(). It's a fact that memremap() and early_memremap() might never fail and this code might never get a chance to run but to maintain good kernel programming semantics, we might need this patch. Signed-off-by: Sai Praneeth Prakhya Reviewed-by: Ricardo Neri Cc: Lee Chun-Yi Cc: Borislav Petkov Cc: Tony Luck Cc: Dave Hansen Cc: Bhupesh Sharma Cc: Ricardo Neri Cc: Peter Zijlstra Cc: Ravi Shankar Cc: Matt Fleming Cc: Ard Biesheuvel --- I found this bug when working on a different patch set which uses efi_memmap_alloc() and then noticed that I never freed the allocated memory. I found it weird, in the sense that, memory is allocated but is not freed (upon returning from an error). So, wasn't sure if that should be treated as a bug or should I just leave it as is because everything works fine even without this patch. Since the effort for the patch is very minimal, I just went ahead and posted one, so that I could know your thoughts on it. 
Note: Patch based on Linus's mainline tree V4.17 arch/x86/platform/efi/quirks.c | 10 ++ drivers/firmware/efi/fake_mem.c | 2 +- drivers/firmware/efi/memmap.c | 27 +++ include/linux/efi.h | 1 + 4 files changed, 35 insertions(+), 5 deletions(-) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 36c1f8b9f7e0..f223093f2df7 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -285,6 +285,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size) new = early_memremap(new_phys, new_size); if (!new) { pr_err("Failed to map new boot services memmap\n"); + efi_memmap_free(new_phys, num_entries); return; } @@ -429,7 +430,7 @@ void __init efi_free_boot_services(void) new = memremap(new_phys, new_size, MEMREMAP_WB); if (!new) { pr_err("Failed to map new EFI memmap\n"); - return; + goto free_mem; } /* @@ -450,10 +451,11 @@ void __init efi_free_boot_services(void) memunmap(new); - if (efi_memmap_install(new_phys, num_entries)) { + if (efi_memmap_install(new_phys, num_entries)) pr_err("Could not install new EFI memmap\n"); - return; - } + +free_mem: + efi_memmap_free(new_phys, num_entries); } /* diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c index 6c7d60c239b5..63edcedee25b 100644 --- a/drivers/firmware/efi/fake_mem.c +++ b/drivers/firmware/efi/fake_mem.c @@ -79,7 +79,7 @@ void __init efi_fake_memmap(void) new_memmap = early_memremap(new_memmap_phy, efi.memmap.desc_size * new_nr_map); if (!new_memmap) { - memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map); + efi_memmap_free(new_memmap_phy, new_nr_map); return; } diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c index 5fc70520e04c..27d28cb4652d 100644 --- a/drivers/firmware/efi/memmap.c +++ b/drivers/firmware/efi/memmap.c @@ -50,6 +50,33 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries) } /** + * efi_memmap_free - Free memory allocated by efi_memmap_alloc() + * @mem: Physical 
address allocated by efi_memmap_alloc() + * @num_entries: Number of entries in the allocated map. + * + * efi_memmap_alloc() allocates memory depending on whether mm_init() + * has already been invoked or not. It uses either memblock or "normal" + * page allocation. Use this function to free the memory allocated by + * efi_memmap_alloc(). Since the allocation is done in two different + * ways, similarly, we free it in two different ways. + * + */ +void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries) +{ + unsigned long size = num_entries * efi.memmap.desc_size; + unsigned int order = get_order(size);
[PATCH V5 1/3] x86/efi: Make efi_delete_dummy_variable() use set_variable_nonblocking() instead of set_variable()
From: Sai Praneeth

Presently, efi_delete_dummy_variable() uses set_variable(), which might
block, and hence the kernel prints a stack trace with the warning
"bad: scheduling from the idle thread!". So, make
efi_delete_dummy_variable() use set_variable_nonblocking(), which, as the
name suggests, doesn't block.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Andy Lutomirski
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Will Deacon
Cc: Dave Hansen
Cc: Mark Rutland
Cc: Bhupesh Sharma
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
Cc: Miguel Ojeda
---
 arch/x86/platform/efi/quirks.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..6af39dc40325 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -105,12 +105,11 @@ early_param("efi_no_storage_paranoia", setup_storage_paranoia);
 */
 void efi_delete_dummy_variable(void)
 {
-	efi.set_variable((efi_char16_t *)efi_dummy_name,
-			 _DUMMY_GUID,
-			 EFI_VARIABLE_NON_VOLATILE |
-			 EFI_VARIABLE_BOOTSERVICE_ACCESS |
-			 EFI_VARIABLE_RUNTIME_ACCESS,
-			 0, NULL);
+	efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
+				     _DUMMY_GUID,
+				     EFI_VARIABLE_NON_VOLATILE |
+				     EFI_VARIABLE_BOOTSERVICE_ACCESS |
+				     EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
 }
 
 /*
--
2.7.4
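
The message above does not show how set_variable_nonblocking() avoids blocking. One common shape for a non-blocking variant of a serialized call is a try-lock that gives up immediately and reports "not ready" instead of sleeping. The userspace sketch below illustrates only that general idea; it is an assumption for illustration and not a description of the kernel's implementation. fw_trylock(), fw_unlock(), fw_set_nonblocking(), the fw_status values and fw_calls are all hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status codes for the sketch. */
enum fw_status { FW_SUCCESS, FW_NOT_READY };

/* Set while some caller is inside the (simulated) firmware call. */
static atomic_flag fw_busy = ATOMIC_FLAG_INIT;

/* Number of firmware calls that actually ran. */
static int fw_calls;

/* Try to take the lock without waiting; true means we now hold it. */
static bool fw_trylock(void)
{
	return !atomic_flag_test_and_set(&fw_busy);
}

static void fw_unlock(void)
{
	atomic_flag_clear(&fw_busy);
}

/*
 * Non-blocking variant: if the lock is already held, return at once with
 * FW_NOT_READY rather than sleeping until it becomes free.
 */
static enum fw_status fw_set_nonblocking(void)
{
	if (!fw_trylock())
		return FW_NOT_READY;

	fw_calls++;		/* the firmware call would happen here */
	fw_unlock();
	return FW_SUCCESS;
}
```

A caller that cannot sleep can then treat FW_NOT_READY as a soft failure and carry on without ever being scheduled out.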
[PATCH V5 2/3] efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()
From: Sai Praneeth

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued onto
a workqueue named efi_rts_wq. The caller then waits until the work is
completed.

Introduce some infrastructure:
1. A workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
   a. Populates efi_runtime_work
   b. Queues work onto efi_rts_wq and
   c. Waits until the worker thread completes

The caller thread has to wait until the worker thread completes, because it
depends on the return status of efi_runtime_service() and, in specific
cases, on the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and fill
the buffer with the requested data, for instance efi_get_variable() and
efi_get_next_variable(). Hence, the caller process cannot just post the work
and carry on.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
   efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or a
   pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Andy Lutomirski
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Will Deacon
Cc: Dave Hansen
Cc: Mark Rutland
Cc: Bhupesh Sharma
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
Cc: Miguel Ojeda
---
 drivers/firmware/efi/efi.c              | 14 ++
 drivers/firmware/efi/runtime-wrappers.c | 83 +
 include/linux/efi.h                     | 3 ++
 3 files changed, 100 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1379a375dfa8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
 	.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,18 @@ static int __init efisubsys_init(void)
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
 
+	/*
+	 * Since we process only one efi_runtime_service() at a time, an
+	 * ordered workqueue (which creates only one execution context)
+	 * should suffice all our needs.
+	 */
+	efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+	if (!efi_rts_wq) {
+		pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+		return 0;
+	}
+
 	/* We register the efi directory at /sys/firmware/efi */
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..cf3bae42a752 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ *    enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ *    because it's dependent on the return status and execution of
+ *    efi_runtime_service().
+ *    For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd.
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
 #include
 #include
 #include
+#include
+#include
+
 #include
 
 /*
@@ -33,6 +45,77 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	QUERY_VARIABLE_INFO,
+	GET_NEXT_HIGH_MONO_COUNT,
+	RESET_SYSTEM,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+ * @efi_rts_id:		EFI Runtime Service function identifier
+ * @efi_rts_comp:	Struct used for handling completions
+ */
+struct efi_runtime_work {
+	void *arg1;
+	void *arg2;
+	void *arg3;
+	void *arg4;
+	void *arg5;
+	efi_status_t status;
+	struct work_struct work;
+	enum
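The queue-and-wait pattern described in this patch can be sketched in user space with POSIX threads. This is a simplified illustration, not the kernel implementation: it spawns one thread per job instead of using a persistent ordered workqueue, and struct rts_job, rts_worker() and rts_queue_and_wait() are hypothetical names. It does show why the caller has to wait: the job lives on the caller's stack and carries the result back.

```c
#include <pthread.h>
#include <stdbool.h>

/* Work item: input, result, and a hand-rolled completion. */
struct rts_job {
	int input;
	int status;
	bool done;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Worker: performs the "service", then signals completion. */
static void *rts_worker(void *arg)
{
	struct rts_job *job = arg;
	int result = job->input * 2;	/* stand-in for the firmware call */

	pthread_mutex_lock(&job->lock);
	job->status = result;
	job->done = true;
	pthread_cond_signal(&job->cond);
	pthread_mutex_unlock(&job->lock);
	return NULL;
}

/* Hand the job to a worker thread and block until it has completed. */
static int rts_queue_and_wait(int input)
{
	struct rts_job job = { .input = input, .status = -1, .done = false };
	pthread_t tid;

	pthread_mutex_init(&job.lock, NULL);
	pthread_cond_init(&job.cond, NULL);

	if (pthread_create(&tid, NULL, rts_worker, &job) == 0) {
		/* The job is on our stack, so we must not return early. */
		pthread_mutex_lock(&job.lock);
		while (!job.done)
			pthread_cond_wait(&job.cond, &job.lock);
		pthread_mutex_unlock(&job.lock);
		pthread_join(tid, NULL);
	}

	pthread_cond_destroy(&job.cond);
	pthread_mutex_destroy(&job.lock);
	return job.status;	/* stays -1 if the worker never ran */
}
```

Because the caller sleeps in pthread_cond_wait(), this pattern is only usable from a context that is allowed to block, which is the same constraint the series discusses for atomic callers.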
[PATCH V5 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services
From: Sai Praneeth

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think user space is still valid (i.e. the user
space mappings are still pointing to the process that requested to run
efi_runtime_service()), but in reality it is not so.

A solution for this issue is to use a kthread to run
efi_runtime_service(). When a user process requests the kernel to execute
any efi_runtime_service(), the kernel queues the work to efi_rts_wq, a
kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
   enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
   it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
   arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
   pointer type.
2. If argument was a value, recast it from void pointer to original
   pointer type and dereference it.

The non-blocking variants of set_variable() and query_variable_info()
should be used while in atomic context. Using the blocking variants,
set_variable() and query_variable_info(), while in atomic context issues
a warning ("scheduling while in atomic") and prints a stack trace.
Presently, pstore uses non-blocking variants and hence works fine.

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Andy Lutomirski
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Will Deacon
Cc: Dave Hansen
Cc: Mark Rutland
Cc: Bhupesh Sharma
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
Cc: Miguel Ojeda
---
 drivers/firmware/efi/runtime-wrappers.c | 135
 1 file changed, 119 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index cf3bae42a752..127d4de00403 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -173,13 +173,104 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ *    pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ *    pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+	struct efi_runtime_work *efi_rts_work;
+	void *arg1, *arg2, *arg3, *arg4, *arg5;
+	efi_status_t status = EFI_NOT_FOUND;
+
+	efi_rts_work = container_of(work, struct efi_runtime_work, work);
+	arg1 = efi_rts_work->arg1;
+	arg2 = efi_rts_work->arg2;
+	arg3 = efi_rts_work->arg3;
+	arg4 = efi_rts_work->arg4;
+	arg5 = efi_rts_work->arg5;
+
+	switch (efi_rts_work->efi_rts_id) {
+	case GET_TIME:
+		status = efi_call_virt(get_time, (efi_time_t *)arg1,
+				       (efi_time_cap_t *)arg2);
+		break;
+	case SET_TIME:
+		status = efi_call_virt(set_time, (efi_time_t *)arg1);
+		break;
+	case GET_WAKEUP_TIME:
+		status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+				       (efi_bool_t *)arg2, (efi_time_t *)arg3);
+		break;
+	case SET_WAKEUP_TIME:
+		status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+				       (efi_time_t *)arg2);
+		break;
+	case GET_VARIABLE:
+		status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+				       (efi_guid_t *)arg2, (u32 *)arg3,
+				       (unsigned long *)arg4, (void *)arg5);
+		break;
+	case GET_NEXT_VARIABLE:
+		status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+				       (efi_char16_t *)arg2,
+				       (efi_guid_t *)
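The packing and unpacking semantics from the commit message (a pointer is passed as is, a value is passed by address and dereferenced by the handler) can be sketched in plain C. This is a minimal illustration under stated assumptions: the demo_* services, identifiers and struct are hypothetical stand-ins, not EFI runtime services, and the handler is called directly rather than through a workqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical service identifiers, loosely mirroring enum efi_rts_ids. */
enum demo_rts_id { DEMO_SET_WAKEUP, DEMO_GET_COUNT };

/* Like efi_runtime_work: every argument travels as a void pointer. */
struct demo_work {
	void *arg1;
	void *arg2;
	int status;
	enum demo_rts_id id;
};

/* Stand-ins for firmware services (assumptions, not real EFI calls). */
static int demo_set_wakeup(bool enabled, const unsigned int *when)
{
	return (enabled && when && *when > 0) ? 0 : -1;
}

static int demo_get_count(unsigned int *count)
{
	*count = 42;	/* fills the caller's buffer */
	return 0;
}

/* Handler: recast each void pointer; dereference only by-value arguments. */
static void demo_call_rts(struct demo_work *w)
{
	switch (w->id) {
	case DEMO_SET_WAKEUP:
		/* arg1 was a value, so its address was packed: dereference. */
		w->status = demo_set_wakeup(*(bool *)w->arg1,
					    (unsigned int *)w->arg2);
		break;
	case DEMO_GET_COUNT:
		/* arg1 was already a pointer: pass it through as is. */
		w->status = demo_get_count((unsigned int *)w->arg1);
		break;
	default:
		w->status = -1;
		break;
	}
}

/* Wrapper packing one by-value argument and one pointer argument. */
static int demo_wrap_set_wakeup(bool enabled, unsigned int *when)
{
	struct demo_work w = {
		.arg1 = &enabled,	/* value: pass its address */
		.arg2 = when,		/* pointer: pass as is */
		.status = -1,
		.id = DEMO_SET_WAKEUP,
	};

	demo_call_rts(&w);
	return w.status;
}

/* Wrapper packing a single pointer argument. */
static int demo_wrap_get_count(unsigned int *count)
{
	struct demo_work w = {
		.arg1 = count,
		.arg2 = NULL,
		.status = -1,
		.id = DEMO_GET_COUNT,
	};

	demo_call_rts(&w);
	return w.status;
}
```

Taking the address of a by-value argument is safe here only because the wrapper does not return until the handler has finished, which matches the requirement in the series that the caller waits for the work to complete.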
[PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services
Patches are based on Linus's kernel v4.17-rc7 [1]

Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall in user
space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate tables,
TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map user space
or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V4 to V5:
--
1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
   Non-blocking variants are supposed to not block, and using a workqueue
   does exactly the opposite, hence refrain from using it.
2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
   blocking variants means that we have to call
   efi_delete_dummy_variable() after efi_rts_wq has been created.
3. Remove in_atomic() check in set_variable<>() and
   query_variable_info<>(). Any caller wishing to use set_variable() and
   query_variable_info() in atomic context should use their non-blocking
   variants.

Changes from V3 to V4:
--
1. As suggested by Peter, use completions instead of flush_work() as the
   former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
   wasn't able to find a better alternative to keep this change local to
   arch/x86.

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem. What we are
   fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
   ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
   create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
   runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
   requested efi_runtime_service() - Because these two situations should
   *never* happen.

Sai Praneeth (3):
  x86/efi: Make efi_delete_dummy_variable() use set_variable_nonblocking() instead of set_variable()
  efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/platform/efi/quirks.c          | 11 +-
 drivers/firmware/efi/efi.c              | 14 ++
 drivers/firmware/efi/runtime-wrappers.c | 218 +---
 include/linux/efi.h                     | 3 +
 4 files changed, 224 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya
Suggested-by: Andy Lutomirski
Cc: Lee Chun-Yi
Cc: Borislav Petkov
Cc: Tony Luck
Cc: Will Deacon
Cc: Dave Hansen
Cc: Mark Rutland
Cc: Bhupesh Sharma
Cc: Naresh Bhat
Cc: Ricardo Neri
Cc: Peter Zijlstra
Cc: Ravi Shankar
Cc: Matt Fleming
Cc: Dan Williams
Cc: Ard Biesheuvel
Cc: Miguel Ojeda
-- 
2.7.4
[PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services
comments and concerns.

Note:
- Patches are based on Linus's kernel v4.17-rc6 [1]

Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall in user
space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate tables,
TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map user space
or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V3 to V4:
--
1. As suggested by Peter, use completions instead of flush_work() as the
   former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
   wasn't able to find a better alternative to keep this change local to
   arch/x86.

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem. What we are
   fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
   ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
   create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
   runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
   requested efi_runtime_service() - Because these two situations should
   *never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization
  efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h              | 1 -
 arch/x86/platform/efi/efi.c             | 6 -
 drivers/firmware/efi/efi.c              | 20 +++
 drivers/firmware/efi/runtime-wrappers.c | 256 +---
 include/linux/efi.h                     | 6 +
 5 files changed, 262 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
-- 
2.7.4
[PATCH V4 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think user space is still valid (i.e. the user
space mappings are still pointing to the process that requested to run
efi_runtime_service()), but in reality it is not so.

A solution for this issue is to use a kthread to run
efi_runtime_service(). When a user process requests the kernel to execute
any efi_runtime_service(), the kernel queues the work to efi_rts_wq, a
kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
   enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
   it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
   arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
   pointer type.
2. If argument was a value, recast it from void pointer to original
   pointer type and dereference it.

pstore writes could potentially be invoked in atomic context and it uses
set_variable<>() and query_variable_info<>() to store logs. If we invoke
efi_runtime_services() through efi_rts_wq while in atomic context, the
kernel issues a warning ("scheduling while in atomic") and prints a stack
trace. One way to overcome this is to not make the caller process wait for
the worker thread to finish. This approach breaks pstore, i.e. the log
messages aren't written to efi variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq; in other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 171
 1 file changed, 151 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 534bd348feca..26bb6645ff59 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -175,13 +175,108 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ *    pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ *    pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+	struct efi_runtime_work *efi_rts_work;
+	void *arg1, *arg2, *arg3, *arg4, *arg5;
+	efi_status_t status = EFI_NOT_FOUND;
+
+	efi_rts_work = container_of(work, struct efi_runtime_work, work);
+	arg1 = efi_rts_work->arg1;
+	arg2 = efi_rts_work->arg2;
+	arg3 = efi_rts_work->arg3;
+	arg4 = efi_rts_work->arg4;
+	arg5 = efi_rts_work->arg5;
+
+	switch (efi_rts_work->efi_rts_id) {
+	case GET_TIME:
+		status = efi_call_virt(get_time, (efi_time_t *)arg1,
+				       (efi_time_cap_t *)arg2);
+		break;
+	case SET_TIME:
+		status = efi_call_virt(set_time, (efi_time_t *)arg1);
+		break;
+
[PATCH V3 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think user space is still valid (i.e. the user
space mappings are still pointing to the process that requested to run
efi_runtime_service()), but in reality it is not so.

A solution for this issue is to use a kthread to run
efi_runtime_service(). When a user process requests the kernel to execute
any efi_runtime_service(), the kernel queues the work to efi_rts_wq, a
kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
   enqueue work to efi_rts_wq.
2. Caller thread waits until the work is finished because it's dependent
   on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
   arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
   pointer type.
2. If argument was a value, recast it from void pointer to original
   pointer type and dereference it.

pstore writes could potentially be invoked in atomic context and it uses
set_variable<>() and query_variable_info<>() to store logs. If we invoke
efi_runtime_services() through efi_rts_wq while in atomic context, the
kernel issues a warning ("scheduling while in atomic") and prints a stack
trace. One way to overcome this is to not make the caller process wait for
the worker thread to finish. This approach breaks pstore, i.e. the log
messages aren't written to efi variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq; in other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 170
 1 file changed, 150 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index a9866045ed52..23ff128fcb2f 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -170,13 +170,107 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ *    pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ *    pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+	struct efi_runtime_work *efi_rts_work;
+	void *arg1, *arg2, *arg3, *arg4, *arg5;
+	efi_status_t status = EFI_NOT_FOUND;
+
+	efi_rts_work = container_of(work, struct efi_runtime_work, work);
+	arg1 = efi_rts_work->arg1;
+	arg2 = efi_rts_work->arg2;
+	arg3 = efi_rts_work->arg3;
+	arg4 = efi_rts_work->arg4;
+	arg5 = efi_rts_work->arg5;
+
+	switch (efi_rts_work->efi_rts_id) {
+	case GET_TIME:
+		status = efi_call_virt(get_time, (efi_time_t *)arg1,
+				       (efi_time_cap_t *)arg2);
+		break;
+	case SET_TIME:
+		status = efi_call_virt(set_time, (efi_time_t *)arg1);
+		break;
+	case GET
[PATCH V3 1/3] x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Create a workqueue named efi_rts_wq (efi runtime services workqueue), so
that all efi_runtime_services() are executed in kthread context. Invoking
efi_runtime_services() through efi_rts_wq means all accesses to
efi_runtime_services() should be done after efi_rts_wq has been created.
efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 arch/x86/platform/efi/efi.c        | 15 +--
 drivers/firmware/efi/arm-runtime.c | 3 +++
 drivers/firmware/efi/efi.c         | 25 +
 include/linux/efi.h                | 4
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..adcc55cd25ce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 	if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 	 * necessary relocation fixups for the new virtual addresses.
 	 */
 	efi_runtime_update_mappings();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
@@ -1031,6 +1025,15 @@ void __init efi_enter_virtual_mode(void)
 	__efi_enter_virtual_mode();
 
 	efi_dump_pagetable();
+
+	if (!efi_create_rts_wq())
+		return;
+
+	/*
+	 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+	 * it should be invoked only after efi_rts_wq is ready.
+	 */
+	efi_delete_dummy_variable();
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..6fb06130b53f 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -139,6 +139,9 @@ static int __init arm_enable_runtime_services(void)
 		return -ENOMEM;
 	}
 
+	if (!efi_create_rts_wq())
+		return 0;
+
 	/* Set up runtime services function pointers */
 	efi_native_runtime_setup();
 	set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..b9103caa03b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
 	.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,13 @@ static int __init efisubsys_init(void)
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
 
+	/*
+	 * If we failed to create efi_rts_wq, EFI_RUNTIME_SERVICES would
+	 * have been cleared, check for that condition.
+	 */
+	if (!efi_enabled(EFI_RUNTIME_SERVICES))
+		return 0;
+
 	/* We register the efi directory at /sys/firmware/efi */
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
@@ -971,3 +980,19 @@ static int register_update_efi_random_seed(void)
 }
 late_initcall(register_update_efi_random_seed);
 #endif
+
+bool __init efi_create_rts_wq(void)
+{
+	/*
+	 * Since we process only one efi_runtime_service() at a time, an
+	 * ordered workqueue (which creates only one execution context)
+	 * should suffice all our needs.
+	 */
+	efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+	if (!efi_rts_wq) {
+		pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+		return false;
+	}
+	return true;
+}
diff --git a/include/li
[PATCH V3 2/3] efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued onto
a workqueue named efi_rts_wq. The caller then waits until the work is
completed.

Introduce efi_queue_work() that
1. Populates efi_runtime_work
2. Queues work onto efi_rts_wq and
3. Waits until worker thread returns.

The caller thread has to wait until the worker thread returns, because it
depends on the return status of efi_runtime_service() and, in specific
cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and fill
the buffer with the requested data, for instance efi_get_variable() and
efi_get_next_variable(). Hence, the caller process cannot just post the
work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
   efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or a
   pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..a9866045ed52 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ *    enqueue work to efi_rts_wq.
+ * 2. Caller thread waits until the work is finished because it's
+ *    dependent on the return status and execution of efi_runtime_service().
+ *    For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include
 #include
 #include
+#include
+
 #include
 
 /*
@@ -33,6 +43,76 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	SET_VARIABLE_NONBLOCKING,
+	QUERY_VARIABLE_INFO,
+	QUERY_VARIABLE_INFO_NONBLOCKING,
+	GET_NEXT_HIGH_MONO_COUNT,
+	RESET_SYSTEM,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @func:		EFI Runtime Service function identifier
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+ */
+struct efi_runtime_work {
+	void *arg1;
+	void *arg2;
+	void *arg3;
+	void *arg4;
+	void *arg5;
+	efi_status_t status;
+	struct work_struct work;
+	enum efi_rts_ids efi_rts_id;
+};
+
+/*
+ * efi_queue_work:	Queue efi_runtime_service() and wait until it's done
+ * @rts:		efi_runtime_service() function identifier
+ * @rts_arg<1-5>:	efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)	\
+({									\
+	struct efi_runtime_work efi_rts_work;				\
+	efi_rts_work.status = EFI_ABORTED;				\
+
[PATCH V3 0/3] Use efi_rts_wq to invoke EFI Runtime Services
comments and concerns.

Note:
-----
Patches are based on Linus's kernel v4.17-rc6 [1]

Backup: Detailing efi_pgd:
--------------------------
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time Code/Data) regions. Due to the nature of these mappings, they fall in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user space. The two halves of the address space are managed by separate tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V2 to V3:
----------------------
1. Rewrite the cover letter to clearly state the problem: what we are fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(); instead, print an error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify requested efi_runtime_service() - Because these two situations should *never* happen.
Sai Praneeth (3): x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq efi: Use efi_rts_wq to invoke EFI Runtime Services arch/x86/platform/efi/efi.c | 15 +- drivers/firmware/efi/arm-runtime.c | 3 + drivers/firmware/efi/efi.c | 25 drivers/firmware/efi/runtime-wrappers.c | 250 +--- include/linux/efi.h | 4 + 5 files changed, 271 insertions(+), 26 deletions(-) Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Naresh Bhat <naresh.b...@linaro.org> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Dan Williams <dan.j.willi...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com> -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
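As a rough illustration of item 3 in the V2 to V3 change list (print an error and exit gracefully instead of calling BUG()), a dispatcher can start from a failure status and simply report an unknown identifier. This is a user-space sketch only; every demo_* name and both status values are hypothetical stand-ins and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical identifiers standing in for enum efi_rts_ids. */
enum demo_rts_ids {
	DEMO_GET_TIME,
	DEMO_SET_TIME,
};

/* Hypothetical status codes; the values are arbitrary. */
#define DEMO_SUCCESS	0L
#define DEMO_NOT_FOUND	14L

/*
 * Dispatch on the identifier. The status starts out as a failure code,
 * so an identifier that matches no case is reported to the caller as an
 * error instead of bringing the whole system down.
 */
static long demo_dispatch(int id)
{
	long status = DEMO_NOT_FOUND;

	switch (id) {
	case DEMO_GET_TIME:
	case DEMO_SET_TIME:
		/* A real handler would invoke the runtime service here. */
		status = DEMO_SUCCESS;
		break;
	default:
		fprintf(stderr, "Requested service id %d is not recognised\n", id);
		break;
	}

	return status;
}
```

The caller sees the failure through the returned status and can decide what to do with it, which is the point of dropping BUG().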
[PATCH] x86: Use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When the platform supports PCID and CONFIG_DEBUG_VM is enabled, build_cr3_noflush() (called via switch_mm()) does a sanity check to see if X86_FEATURE_PCID is set. Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to perform the check, but this_cpu_has() works only after SMP is initialized (i.e. the per-CPU cpu_info structures must be populated), and this happens very late in the boot process (during rest_init).

As efi_runtime_services() are called during (early) kernel boot time and run time, modify build_cr3_noflush() to use boot_cpu_has() all the time. As suggested by Dave, this should be OK because all CPUs have the same capabilities anyway (for x86).

Without this change we see the warning below during kernel boot.

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-02277-gbc16d4052f1a #1
Hardware name: System manufacturer System Product Name/Z170-K, BIOS 3301 02/08/2017
RIP: 0010:load_new_mm_cr3+0x114/0x170
RSP: :9b203e38 EFLAGS: 00010046
RAX: RBX: 9b26f5a0 RCX:
RDX: RSI: RDI: 9b20a000
RBP: 9b203e90 R08: R09: 0f63eb29
R10: 9b203ea8 R11: c3292018 R12:
R13: 9b2e1180 R14: 0001ee80 R15:
FS: () GS:968df6c0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 968df6fff000 CR3: 0004261e6002 CR4: 000606b0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Call Trace:
 switch_mm_irqs_off+0x267/0x590
 switch_mm+0xe/0x20
 efi_switch_mm+0x3e/0x50
 efi_enter_virtual_mode+0x43f/0x4da
 start_kernel+0x3bf/0x458
 secondary_startup_64+0xa5/0xb0

Dave also suggested that we put a warning in this_cpu_has() if it's used early in the boot process. This is still work in progress as it affects MCE.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Reported-by: Linus Torvalds <torva...@linux-foundation.org> Cc: Lee Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Michael S. Tsirkin <m...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Ingo Molnar <mi...@kernel.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Peter Zijlstra <a.p.zijls...@chello.nl> Cc: Andrew Morton <a...@linux-foundation.org> Cc: Dave Hansen <dave.han...@intel.com> --- arch/x86/include/asm/tlbflush.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 84137c22fdfa..42e040859067 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -131,7 +131,12 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); - VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID)); + /* +* Use boot_cpu_has() instead of this_cpu_has() as this function +* might be called during early boot. This should work even after +* boot because all cpu's have same capabilities anyways. +*/ + VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID)); return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH; } -- 2.7.4
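The ordering problem described in the build_cr3_noflush() patch above can be modelled in user space. The sketch below is a toy model under stated assumptions, not kernel code: all demo_* names are hypothetical, the boot CPU's capability word is assumed to be filled early, and the per-CPU copies are assumed to be filled only when SMP is brought up.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS		4
#define DEMO_FEATURE_PCID	(1u << 0)

/* Toy model: capability word of the boot CPU, filled during early boot. */
static unsigned int demo_boot_cpu_caps;
/* Toy model: per-CPU capability words, filled only once SMP is up. */
static unsigned int demo_percpu_caps[DEMO_NR_CPUS];

/* Early boot: only the boot CPU's capabilities are known. */
static void demo_early_boot(void)
{
	demo_boot_cpu_caps = DEMO_FEATURE_PCID;
}

/* Late boot: every CPU gets its own copy (assumed identical here). */
static void demo_smp_init(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_percpu_caps[cpu] = demo_boot_cpu_caps;
}

static bool demo_boot_cpu_has(unsigned int feature)
{
	return (demo_boot_cpu_caps & feature) != 0;
}

static bool demo_this_cpu_has(int cpu, unsigned int feature)
{
	return (demo_percpu_caps[cpu] & feature) != 0;
}
```

Between demo_early_boot() and demo_smp_init(), the per-CPU query reports the feature as absent even though the hardware has it, which is the window in which the warning in the patch description fires; the boot-CPU query is correct in both phases.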
[PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(), the requested efi_runtime_service (represented as an identifier) and its arguments are packed into a struct named efi_runtime_work and queued onto a work queue named efi_rts_wq. The caller then waits until the work is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
   a. populates efi_runtime_work
   b. queues work onto efi_rts_wq and
   c. waits until worker thread returns

The caller thread has to wait until the worker thread returns, because it's dependent on the return status of efi_runtime_service() and, in specific cases, the arguments populated by efi_runtime_service(). Some efi_runtime_services() take a pointer to a buffer as an argument and fill the buffer with the requested data, for instance efi_get_variable() and efi_get_next_variable(). Hence, the caller process cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- drivers/firmware/efi/efi.c | 15 drivers/firmware/efi/runtime-wrappers.c | 61 + include/linux/efi.h | 20 +++ 3 files changed, 96 insertions(+) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 838b8efe639c..04b46c62f3ce 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -75,6 +75,8 @@ static unsigned long *efi_tables[] = { _attr_table, }; +struct workqueue_struct *efi_rts_wq; + static bool disable_runtime; static int __init setup_noefi(char *arg) { @@ -329,6 +331,19 @@ static int __init efisubsys_init(void) return 0; /* +* Since we process only one efi_runtime_service() at a time, an +* ordered workqueue (which creates only one execution context) +* should suffice all our needs. +*/ + efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0); + if (!efi_rts_wq) { + pr_err("Failed to create efi_rts_workqueue, EFI runtime services " + "disabled.\n"); + clear_bit(EFI_RUNTIME_SERVICES, ); + return 0; + } + + /* * Clean DUMMY object calls EFI Runtime Service, set_variable(), so * it should be invoked only after efi_rts_workqueue is ready. 
*/ diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index ae54870b2788..649763171439 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -1,6 +1,14 @@ /* * runtime-wrappers.c - Runtime Services function call wrappers * + * Implementation summary: + * --- + * 1. When user/kernel thread requests to execute efi_runtime_service(), + * enqueue work to efi_rts_workqueue. + * 2. Caller thread waits until the work is finished because it's + * dependent on the return status and execution of efi_runtime_service(). + * For instance, get_variable() and get_next_variable(). + * * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org> * * Split off from arch/x86/platform/efi/efi.c @@ -22,6 +30,8 @@ #include #include #include +#include + #include /* @@ -33,6 +43,57 @@ #define __efi_call_virt(f, args...) \ __efi_call_virt_pointer(efi.systab->runtime, f, args) +/* efi_runtime_service() function identifiers */ +enum { + GET_TIME, + SET_TIME, + GET_WAKEUP_TIME, + SET_WAKEUP_TIME, + GET_VARIABLE, + GET_NEXT_VARIABLE, + SET_VARIABLE, + SET_VARIABLE_NONBLOCKING, + QUERY_VARIABLE_INFO, + QUERY_VARIABLE_INFO_NONBLOCKING, + GET_NEXT_HIGH_MONO_COUNT, + RESET_SYSTEM, + UPDATE_CAPSULE, + QUERY_CAPSULE_CAPS, +}; + +/* + * efi_queue_work:
[PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization
From: Sai Praneeth <sai.praneeth.prak...@intel.com> Invoking efi_runtime_services() through efi_workqueue means all accesses to efi_runtime_services() should be done after efi_rts_wq has been created. efi_delete_dummy_variable() calls set_variable(), hence efi_delete_dummy_variable() should be called after efi_rts_wq has been created. efi_delete_dummy_variable() is called from efi_enter_virtual_mode() which is early in the boot phase (efi_rts_wq isn't created yet), so call efi_delete_dummy_variable() later in the boot phase i.e. while initializing efi subsystem. In the next patch, this is the place where we create efi_rts_wq and all the efi_runtime_services() will be called using efi_rts_wq. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- arch/x86/include/asm/efi.h | 1 - arch/x86/platform/efi/efi.c | 6 -- drivers/firmware/efi/efi.c | 6 ++ include/linux/efi.h | 3 +++ 4 files changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index a399c1ebf6f0..43009e3f821b 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -143,7 +143,6 @@ extern void __init efi_runtime_update_mappings(void); extern void __init efi_dump_pagetable(void); extern void __init efi_apply_memmap_quirks(void); extern int __init efi_reuse_config(u64 tables, int nr_tables); -extern void efi_delete_dummy_variable(void); struct 
efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..a3169d14583f 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void) if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX)) runtime_code_page_mkexec(); - - /* clean DUMMY object */ - efi_delete_dummy_variable(); #endif } @@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void) * necessary relocation fixups for the new virtual addresses. */ efi_runtime_update_mappings(); - - /* clean DUMMY object */ - efi_delete_dummy_variable(); } void __init efi_enter_virtual_mode(void) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index cd42f66a7c85..838b8efe639c 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -328,6 +328,12 @@ static int __init efisubsys_init(void) if (!efi_enabled(EFI_BOOT)) return 0; + /* +* Clean DUMMY object calls EFI Runtime Service, set_variable(), so +* it should be invoked only after efi_rts_workqueue is ready. 
+*/ + efi_delete_dummy_variable(); + /* We register the efi directory at /sys/firmware/efi */ efi_kobj = kobject_create_and_add("efi", firmware_kobj); if (!efi_kobj) { diff --git a/include/linux/efi.h b/include/linux/efi.h index f5083aa72eae..c4efb3ef0dfa 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, bool nonblocking); extern void efi_find_mirror(void); +extern void efi_delete_dummy_variable(void); #else static inline void efi_late_init(void) {} static inline void efi_free_boot_services(void) {} @@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, { return EFI_SUCCESS; } + +static inline void efi_delete_dummy_variable(void) {} #endif extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr); -- 2.7.4
[PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com> Presently, efi_runtime_services() are executed by firmware in process context. To execute efi_runtime_service(), kernel switches the page directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any user space mappings. A potential issue could be, for instance, an NMI interrupt (like perf) trying to profile some user data while in efi_pgd. A solution for this issue could be to use kthread to run efi_runtime_service(). When a user/kernel thread requests to execute efi_runtime_service(), kernel off-loads this work to kthread which in turn uses efi_pgd. Anything that tries to touch user space addresses while in kthread is terminally broken. This patch adds support to efi subsystem to handle all calls to efi_runtime_services() using a work queue (which in turn uses kthread). Implementation summary: --- 1. When user/kernel thread requests to execute efi_runtime_service(), enqueue work to efi_rts_workqueue. 2. Caller thread waits until the work is finished because it's dependent on the return status of efi_runtime_service(). Semantics to pack arguments in efi_runtime_work (has void pointers): 1. If argument is a pointer (of any type), pass it as is. 2. If argument is a value (of any type), address of the value is passed. Introduce a handler function (called efi_call_rts()) that a. understands efi_runtime_work and b. invokes the appropriate efi_runtime_service() with the appropriate arguments Semantics followed by efi_call_rts() to understand efi_runtime_work: 1. If argument was a pointer, recast it from void pointer to original pointer type. 2. If argument was a value, recast it from void pointer to original pointer type and dereference it. pstore writes could potentially be invoked in interrupt context and it uses set_variable<>() and query_variable_info<>() to store logs. 
If we invoke efi_runtime_services() through efi_rts_wq while in atomic context, the kernel issues a warning ("scheduling while atomic") and prints a stack trace. One way to overcome this is to not make the caller process wait for the worker thread to finish. This approach breaks pstore, i.e. the log messages aren't written to efi variables. Hence, pstore calls efi_runtime_services() without using efi_rts_wq; in other words, efi_rts_wq will be used unconditionally for all the efi_runtime_services() except set_variable<>() and query_variable_info<>(). Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- drivers/firmware/efi/runtime-wrappers.c | 168 1 file changed, 148 insertions(+), 20 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 649763171439..eff443bf942c 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -151,13 +151,105 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call) */ static DEFINE_SEMAPHORE(efi_runtime_lock); +/* + * Calls the appropriate efi_runtime_service() with the appropriate + * arguments. + * + * Semantics followed by efi_call_rts() to understand efi_runtime_work: + * 1. If argument was a pointer, recast it from void pointer to original + * pointer type. + * 2.
If argument was a value, recast it from void pointer to original + * pointer type and dereference it. + */ +static void efi_call_rts(struct work_struct *work) +{ + struct efi_runtime_work *efi_rts_work; + void *arg1, *arg2, *arg3, *arg4, *arg5; + efi_status_t status = EFI_NOT_FOUND; + + efi_rts_work = container_of(work, struct efi_runtime_work, work); + arg1 = efi_rts_work->arg1; + arg2 = efi_rts_work->arg2; + arg3 = efi_rts_work->arg3; + arg4 = efi_rts_work->arg4; + arg5 = efi_rts_work->arg5; + + switch (efi_rts_work->func) { + case GET_TIME: + status = efi_call_virt(get_time, (efi_time_t *)arg1, + (efi_time_cap_t *)arg2); + break; + case SET_TIME: + status = efi_call_virt(set_time, (efi_time_t *)arg1); + break; + case GET_WAKEUP_TIME: + status = efi_call_virt(get_
[PATCH V2 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process context. To execute efi_runtime_service(), kernel switches the page directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any user space mappings. A potential issue could be, for instance, an NMI interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run efi_runtime_service(). When a user/kernel thread requests to execute efi_runtime_service(), kernel off-loads this work to kthread which in turn uses efi_pgd. Anything that tries to touch user space addresses while in kthread is terminally broken. This patch set adds support to the efi subsystem to handle all calls to efi_runtime_services() using a work queue (which in turn uses kthread).

Implementation summary:
-----------------------
1. When a user/kernel thread requests to execute efi_runtime_service(), enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's dependent on the return status of efi_runtime_service() and, in specific cases, the arguments populated by efi_runtime_service(). Some efi_runtime_services() take a pointer to a buffer as an argument and fill the buffer with the requested data, for instance efi_get_variable() and efi_get_next_variable(). Hence, the caller process cannot just post the work and get going; it has to wait for results from firmware.

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services() while in atomic context, because the caller thread might sleep. Presently, pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds fine for arm and arm64. Will appreciate the effort if someone could test the patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please feel free to pour in your comments and concerns.

Note: Patches are based on Linus's kernel v4.16-rc4

Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify requested efi_runtime_service() - Because these two situations should *never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization
  efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h | 1 -
 arch/x86/platform/efi/efi.c | 6 -
 drivers/firmware/efi/efi.c | 21 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h | 23
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
--
2.7.4
[PATCH V1 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization
From: Sai Praneeth <sai.praneeth.prak...@intel.com> Invoking efi_runtime_services() through efi_workqueue means all accesses to efi_runtime_services() should be done after efi_rts_wq has been created. efi_delete_dummy_variable() calls set_variable(), hence efi_delete_dummy_variable() should be called after efi_rts_wq has been created. efi_delete_dummy_variable() is called from efi_enter_virtual_mode() which is early in the boot phase (efi_rts_wq isn't created yet), so call efi_delete_dummy_variable() later in the boot phase i.e. while initializing efi subsystem. In the next patch, this is the place where we create efi_rts_wq and all the efi_runtime_services() will be called using efi_rts_wq. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- arch/x86/include/asm/efi.h | 1 - arch/x86/platform/efi/efi.c | 6 -- drivers/firmware/efi/efi.c | 7 +++ include/linux/efi.h | 3 +++ 4 files changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 85f6ccb80b91..34b03440a80f 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -130,7 +130,6 @@ extern void __init efi_runtime_update_mappings(void); extern void __init efi_dump_pagetable(void); extern void __init efi_apply_memmap_quirks(void); extern int __init efi_reuse_config(u64 tables, int nr_tables); -extern void efi_delete_dummy_variable(void); struct 
efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index 9061babfbc83..a3169d14583f 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void) if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX)) runtime_code_page_mkexec(); - - /* clean DUMMY object */ - efi_delete_dummy_variable(); #endif } @@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void) * necessary relocation fixups for the new virtual addresses. */ efi_runtime_update_mappings(); - - /* clean DUMMY object */ - efi_delete_dummy_variable(); } void __init efi_enter_virtual_mode(void) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index cd42f66a7c85..ac5db5f8dbbf 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -33,6 +33,7 @@ #include #include +#include struct efi __read_mostly efi = { .mps= EFI_INVALID_TABLE_ADDR, @@ -328,6 +329,12 @@ static int __init efisubsys_init(void) if (!efi_enabled(EFI_BOOT)) return 0; + /* +* Clean DUMMY object calls EFI Runtime Service, set_variable(), so +* it should be invoked only after efi_rts_workqueue is ready. 
+*/ + efi_delete_dummy_variable(); + /* We register the efi directory at /sys/firmware/efi */ efi_kobj = kobject_create_and_add("efi", firmware_kobj); if (!efi_kobj) { diff --git a/include/linux/efi.h b/include/linux/efi.h index f5083aa72eae..c4efb3ef0dfa 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long size, bool nonblocking); extern void efi_find_mirror(void); +extern void efi_delete_dummy_variable(void); #else static inline void efi_late_init(void) {} static inline void efi_free_boot_services(void) {} @@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes, { return EFI_SUCCESS; } + +static inline void efi_delete_dummy_variable(void) {} #endif extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr); -- 2.1.4
[PATCH V1 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, efi_runtime_services() are executed by firmware in process context. To execute efi_runtime_service(), kernel switches the page directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any user space mappings. A potential issue could be, for instance, an NMI interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run efi_runtime_service(). When a user/kernel thread requests to execute efi_runtime_service(), kernel off-loads this work to kthread which in turn uses efi_pgd. Anything that tries to touch user space addresses while in kthread is terminally broken. This patch adds support to efi subsystem to handle all calls to efi_runtime_services() using a work queue (which in turn uses kthread).

Implementation summary:
-----------------------
1. When user/kernel thread requests to execute efi_runtime_service(), enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent on the return status of efi_runtime_service().

pstore writes could potentially be invoked in interrupt context and it uses set_variable<>() and query_variable_info<>() to store logs. If we invoke efi_runtime_services() through efi_rts_wq while in atomic context, the kernel issues a warning ("scheduling while atomic") and prints a stack trace. One way to overcome this is to not make the caller process wait for the worker thread to finish. This approach breaks pstore, i.e. the log messages aren't written to efi variables. Hence, pstore calls efi_runtime_services() without using efi_rts_wq; in other words, efi_rts_wq will be used unconditionally for all the efi_runtime_services() except set_variable<>() and query_variable_info<>().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- drivers/firmware/efi/runtime-wrappers.c | 86 + 1 file changed, 66 insertions(+), 20 deletions(-) diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index 5cdb787da5d3..531d077aac70 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -68,6 +68,16 @@ * semaphore (efi_runtime_lock) and caller waits until the work is * finished, hence _only_ one work is queued at a time. So, queue_work() * should never fail. + * + * efi_rts_workqueue to run efi_runtime_services() shouldn't be used + * while in atomic, because caller thread might sleep. pstore writes + * could potentially be invoked in interrupt context and it uses + * set_variable<>() and query_variable_info<>(), so pstore code doesn't + * use efi_rts_workqueue. + * + * Semantics that caller function should follow while passing arguments: + * 1. If argument is a pointer (of any type), pass it as is. + * 2. If argument is a value (of any type), address of the value is passed. 
*/ #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \ ({ \ @@ -150,7 +160,7 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) if (down_interruptible(_runtime_lock)) return EFI_ABORTED; - status = efi_call_virt(get_time, tm, tc); + status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL); up(_runtime_lock); return status; } @@ -161,7 +171,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm) if (down_interruptible(_runtime_lock)) return EFI_ABORTED; - status = efi_call_virt(set_time, tm); + status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL); up(_runtime_lock); return status; } @@ -174,7 +184,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled, if (down_interruptible(_runtime_lock)) return EFI_ABORTED; - status = efi_call_virt(get_wakeup_time, enabled, pending, tm); + status = efi_queue_work(GET_
[PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(), the requested efi_runtime_service (represented as an identifier) and its arguments are packed into a struct named efi_runtime_work and queued onto a work queue named efi_rts_wq. The caller then waits until the work is completed.

Introduce the necessary infrastructure:
1. Create a workqueue named efi_rts_wq.
2. A macro (efi_queue_work()) that
   a. populates efi_runtime_work,
   b. queues work onto efi_rts_wq and
   c. waits until the worker thread returns.
3. A handler function that
   a. understands efi_runtime_work and
   b. invokes the appropriate efi_runtime_service() with the appropriate arguments.

The caller thread has to wait until the worker thread returns, because it's dependent on the return status of efi_runtime_service() and, in specific cases, the arguments populated by efi_runtime_service(). Some efi_runtime_services() take a pointer to a buffer as an argument and fill the buffer with the requested data, for instance efi_get_variable() and efi_get_next_variable(). Hence, the caller process cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If the argument was a pointer, recast it from void pointer to the original pointer type.
2. If the argument was a value, recast it from void pointer to a pointer to the original value type and dereference it.
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Suggested-by: Andy Lutomirski <l...@kernel.org> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Will Deacon <will.dea...@arm.com> Cc: Dave Hansen <dave.han...@intel.com> Cc: Mark Rutland <mark.rutl...@arm.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Peter Zijlstra <peter.zijls...@intel.com> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Dan Williams <dan.j.willi...@intel.com> --- drivers/firmware/efi/efi.c | 11 +++ drivers/firmware/efi/runtime-wrappers.c | 143 include/linux/efi.h | 23 + 3 files changed, 177 insertions(+) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index ac5db5f8dbbf..4714b305ca90 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = { _attr_table, }; +struct workqueue_struct *efi_rts_wq; + static bool disable_runtime; static int __init setup_noefi(char *arg) { @@ -329,6 +331,15 @@ static int __init efisubsys_init(void) if (!efi_enabled(EFI_BOOT)) return 0; + /* Create a work queue to run EFI Runtime Services */ + efi_rts_wq = create_workqueue("efi_rts_workqueue"); + if (!efi_rts_wq) { + pr_err("Failed to create efi_rts_workqueue, EFI runtime services " + "disabled.\n"); + clear_bit(EFI_RUNTIME_SERVICES, ); + return 0; + } + /* * Clean DUMMY object calls EFI Runtime Service, set_variable(), so * it should be invoked only after efi_rts_workqueue is ready. 
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c index ae54870b2788..5cdb787da5d3 100644 --- a/drivers/firmware/efi/runtime-wrappers.c +++ b/drivers/firmware/efi/runtime-wrappers.c @@ -1,6 +1,14 @@ /* * runtime-wrappers.c - Runtime Services function call wrappers * + * Implementation summary: + * --- + * 1. When user/kernel thread requests to execute efi_runtime_service(), + * enqueue work to efi_rts_workqueue. + * 2. Caller thread waits until the work is finished because it's + * dependent on the return status and execution of efi_runtime_service(). + * For instance, get_variable() and get_next_variable(). + * * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org> * * Split off from arch/x86/platform/efi/efi.c @@ -22,6 +30,8 @@ #include #include #include +#include + #include /* @@ -33,6 +43,50 @@ #define __efi_call_virt(f, args...) \ __efi_call_virt_pointer(efi.systab->runtime, f, args) +/* Each EFI Runtime Service is represented with a unique number */ +#define GET_TIME 0 +#define SET_
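The argument-packing semantics described in this patch can be illustrated with a small user-space sketch. This is not the kernel code: demo_runtime_work, demo_call_rts(), the DEMO_* identifiers and the two stand-in "services" are hypothetical names made up for the illustration. Only the idea of five void-pointer slots, and the rule "pointers are passed as is, values are passed by address", comes from the commit message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical service identifiers (the patch defines its own numbers). */
enum demo_rts_id { DEMO_GET_VALUE, DEMO_SET_VALUE };

/* Mirrors the idea of efi_runtime_work: five untyped argument slots. */
struct demo_runtime_work {
	void *arg1, *arg2, *arg3, *arg4, *arg5;
	uint64_t status;	/* assumed slot for the service's return status */
};

static uint32_t demo_store;	/* state owned by the stand-in "firmware" */

/* Stand-in services: one fills a caller buffer, one takes a value. */
static uint64_t demo_get_value(uint32_t *out) { *out = demo_store; return 0; }
static uint64_t demo_set_value(uint32_t val) { demo_store = val; return 0; }

/* Handler: recovers the typed arguments from the void pointers. */
static void demo_call_rts(enum demo_rts_id id, struct demo_runtime_work *w)
{
	assert(w != NULL);

	switch (id) {
	case DEMO_GET_VALUE:
		/* Argument was a pointer: cast back to the original type. */
		w->status = demo_get_value((uint32_t *)w->arg1);
		break;
	case DEMO_SET_VALUE:
		/* Argument was a value: cast to pointer-to-value, dereference. */
		w->status = demo_set_value(*(uint32_t *)w->arg1);
		break;
	default:
		w->status = 1;	/* unknown service */
		break;
	}
}

/* Caller side: a value argument is packed by passing its address. */
static uint64_t demo_set(uint32_t val)
{
	struct demo_runtime_work w = { .arg1 = &val };

	demo_call_rts(DEMO_SET_VALUE, &w);
	return w.status;
}

/* Caller side: a pointer argument is packed as is. */
static uint64_t demo_get(uint32_t *out)
{
	struct demo_runtime_work w = { .arg1 = out };

	demo_call_rts(DEMO_GET_VALUE, &w);
	return w.status;
}
```

In this sketch the handler runs synchronously in the caller. In the patch the handler runs on the workqueue, which is why the caller has to wait: both the status and the buffer behind a pointer argument are only valid once the work has finished.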
[PATCH V1 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process context. To execute efi_runtime_service(), the kernel switches the page directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any user space mappings. A potential issue could be, for instance, an NMI (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use a kthread to run efi_runtime_service(). When a user/kernel thread requests to execute efi_runtime_service(), the kernel off-loads this work to a kthread, which in turn uses efi_pgd. Anything that tries to touch user space addresses while in the kthread is terminally broken. This patch set adds support to the efi subsystem to handle all calls to efi_runtime_services() using a work queue (which in turn uses a kthread).

Implementation summary:
---
1. When a user/kernel thread requests to execute efi_runtime_service(), enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's dependent on the return status of efi_runtime_service() and, in specific cases, the arguments populated by efi_runtime_service(). Some efi_runtime_services() take a pointer to a buffer as an argument and fill the buffer with the requested data, for instance efi_get_variable() and efi_get_next_variable(). Hence, the caller process cannot just post the work and get going; it has to wait for results from firmware.

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services() while in atomic context, because the caller thread might sleep. Presently, pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds fine for arm and arm64. I would appreciate it if someone could test the patches on ARM (although I was able to boot with LUV for ARM).
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please feel free to pour in your comments and concerns.

Note: Patches are based on Linus's kernel v4.16-rc2

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization
  efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h              |   1 -
 arch/x86/platform/efi/efi.c             |   6 -
 drivers/firmware/efi/efi.c              |  18 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h                     |  26
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
--
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 1/3] efi: Use efi_mm in x86 as well as ARM
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage efi page tables and efi runtime region mappings. As this is the preferred approach, let's make this data structure common across architectures. Especially for x86, using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h         | 4
 arch/x86/platform/efi/efi_64.c     | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c         | 9 +
 include/linux/efi.h                | 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include
+#include
+
 #include
 #include
 #include
 #include
+#include
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 2dd15e967c3f..c9f8e6924df7 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -232,6 +232,9 @@ int __init efi_alloc_page_tables(void)
 		return -ENOMEM;
 	}
 
+	mm_init_cpumask(&efi_mm);
+	init_new_context(NULL, &efi_mm);
+
 	return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-	.mm_rb			= RB_ROOT,
-	.mm_users		= ATOMIC_INIT(2),
-	.mm_count		= ATOMIC_INIT(1),
-	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
 	_attr_table,
 };
 
+struct mm_struct efi_mm = {
+	.mm_rb			= RB_ROOT,
+	.mm_users		= ATOMIC_INIT(2),
+	.mm_count		= ATOMIC_INIT(1),
+	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 29fdf8029cf6..d79f1cc4c8bb 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -930,6 +930,8 @@ extern struct efi {
 	unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
--
2.1.4
[PATCH V4 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd
From: Sai Praneeth <sai.praneeth.prak...@intel.com> Since the previous patch added support for efi_mm, let's handle efi_pgd through efi_mm and remove global variable efi_pgd. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Michael S. Tsirkin <m...@redhat.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Tested-by: Bhupesh Sharma <bhsha...@redhat.com> --- arch/x86/platform/efi/efi_64.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index c9f8e6924df7..c93f59731608 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -191,8 +191,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd) early_code_mapping_set_exec(0); } -static pgd_t *efi_pgd; - /* * We need our own copy of the higher levels of the page tables * because we want to avoid inserting EFI region mappings (EFI_VA_END @@ -204,7 +202,7 @@ static pgd_t *efi_pgd; */ int __init efi_alloc_page_tables(void) { - pgd_t *pgd; + pgd_t *pgd, *efi_pgd; p4d_t *p4d; pud_t *pud; gfp_t gfp_mask; @@ -232,6 +230,7 @@ int __init efi_alloc_page_tables(void) return -ENOMEM; } + efi_mm.pgd = efi_pgd; mm_init_cpumask(_mm); init_new_context(NULL, _mm); @@ -247,6 +246,7 @@ void efi_sync_low_kernel_mappings(void) pgd_t *pgd_k, *pgd_efi; p4d_t *p4d_k, *p4d_efi; pud_t *pud_k, *pud_efi; + pgd_t *efi_pgd = efi_mm.pgd; if (efi_enabled(EFI_OLD_MEMMAP)) return; @@ -340,7 +340,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) unsigned long pfn, text, pf; struct page *page; unsigned npages; - pgd_t *pgd; + pgd_t *pgd = efi_mm.pgd; if (efi_enabled(EFI_OLD_MEMMAP)) 
return 0; @@ -350,8 +350,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) * this value is loaded into cr3 the PGD will be decrypted during * the pagetable walk. */ - efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd); - pgd = efi_pgd; + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd); /* * It can happen that the physical address of new_memmap lands in memory @@ -421,7 +420,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va) { unsigned long flags = _PAGE_RW; unsigned long pfn; - pgd_t *pgd = efi_pgd; + pgd_t *pgd = efi_mm.pgd; if (!(md->attribute & EFI_MEMORY_WB)) flags |= _PAGE_PCD; @@ -525,7 +524,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len) static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf) { unsigned long pfn; - pgd_t *pgd = efi_pgd; + pgd_t *pgd = efi_mm.pgd; int err1, err2; /* Update the 1:1 mapping */ @@ -622,7 +621,7 @@ void __init efi_dump_pagetable(void) if (efi_enabled(EFI_OLD_MEMMAP)) ptdump_walk_pgd_level(NULL, swapper_pg_dir); else - ptdump_walk_pgd_level(NULL, efi_pgd); + ptdump_walk_pgd_level(NULL, efi_mm.pgd); #endif } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use the helper function efi_switch_mm() to switch to/from efi_mm. We switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service().

Likewise, we need to switch back to the previous mm (the mm context stolen by efi_mm) after the above calls return successfully. We can use the efi_switch_mm() helper only with an x86_64 kernel and with "efi=old_map" disabled, because x86_32 and efi=old_map don't use efi_pgd; they use swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h           | 25 +-
 arch/x86/platform/efi/efi_64.c       | 40 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)
efi_call((f), args) /* - * Scratch space used for switching the pagetable in the EFI stub + * struct efi_scratch - Scratch space used while switching to/from efi_mm + * @phys_stack: stack used during EFI Mixed Mode + * @prev_mm:store/restore stolen mm_struct while switching to/from efi_mm */ struct efi_scratch { - u64 r15; - u64 prev_cr3; - pgd_t *efi_pgt; - booluse_pgd; - u64 phys_stack; + u64 phys_stack; + struct mm_struct*prev_mm; } __packed; #define arch_efi_call_virt_setup() \ @@ -78,11 +77,8 @@ struct efi_scratch { preempt_disable(); \ __kernel_fpu_begin(); \ \ - if (efi_scratch.use_pgd) { \ - efi_scratch.prev_cr3 = __read_cr3();\ - write_cr3((unsigned long)efi_scratch.efi_pgt); \ - __flush_tlb_all(); \ - } \ + if (!efi_enabled(EFI_OLD_MEMMAP)) \ + efi_switch_mm(_mm); \ }) #define arch_efi_call_virt(p, f, args...) \ @@ -90,10 +86,8 @@ struct efi_scratch { #define arch_efi_call_virt_teardown() \ ({ \ - if (efi_scratch.use_pgd) { \ - write_cr3(efi_scratch.prev_cr3);\ - __flush_tlb_all(); \ - } \ + if (!efi_enabled(EFI_OLD_MEMMAP)) \ + efi_switch_mm(efi_scratch.prev_mm); \ \ __kernel_fpu_end(); \ preempt_enable(); \ @@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void); extern void __init efi_apply_memmap_quirks(void); extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); +extern void efi_switch_mm(struct mm_struct *mm); struct efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index c93f59731608..d6892ad2a693 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void) int n_pgds, i, j; if (!efi_enabled(EFI_OLD_MEMMAP)) { - save_pgd = (pgd_t *)__read_cr3(); - write_cr3((unsigned long)efi_scratch.efi_pgt); - goto out; + efi_switch_mm(_mm); + return NULL; } early_code_mapping_set_exec(1); @@ -156,8 +155,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
[PATCH V4 0/3] Use mm_struct and switch_mm() instead of manually
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, in x86, to invoke any efi function like efi_set_virtual_address_map() or any efi_runtime_service(), the code path typically involves read_cr3() (save previous pgd), write_cr3() (write efi_pgd) and calling the efi function. Likewise, after returning from the efi function, the code path typically involves read_cr3() (save efi_pgd) and write_cr3() (write previous pgd). We do this a couple of times in the efi subsystem of the Linux kernel; instead, we can use the helper function efi_switch_mm() to do this. This improves readability and maintainability. Also, instead of maintaining a separate struct "efi_scratch" to store/restore efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I think I didn't break any existing configurations. I have tested this patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm, as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask by calling cpumask_set_cpu(). This panics the kernel as efi_mm is not initialized; therefore, initialize efi_mm in efi_alloc_page_tables().

Changes in V4:
1. Restore the local_irq_restore(flags) call that was unintentionally removed (in the 3rd patch). IRQ flags should be restored after switching to the original mm.
Note: This patch set is based on Linus's tree v4.15-rc8

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h           | 29 +-
 arch/x86/platform/efi/efi_64.c       | 58 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c           |  9 ++
 include/linux/efi.h                  |  2 ++
 6 files changed, 57 insertions(+), 52 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
--
2.1.4
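The save/switch/restore pattern that this series moves from raw %cr3 accesses to efi_switch_mm() can be sketched in user space as follows. This is only an illustration under stated assumptions: struct demo_mm, demo_active_mm, demo_prev_mm and the demo_*() functions are hypothetical stand-ins (for mm_struct, the currently active mm, efi_scratch.prev_mm and the real helpers), and no page tables are actually switched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct. */
struct demo_mm { const char *name; };

static struct demo_mm demo_user_mm = { "user" };
static struct demo_mm demo_efi_mm  = { "efi" };

/* Stand-in for the mm that is currently active on this CPU. */
static struct demo_mm *demo_active_mm = &demo_user_mm;
/* Stand-in for efi_scratch.prev_mm: remembers the mm that was displaced. */
static struct demo_mm *demo_prev_mm;

/* One helper handles both directions: remember the old mm, activate the new. */
static void demo_switch_mm(struct demo_mm *mm)
{
	assert(mm != NULL);
	demo_prev_mm = demo_active_mm;
	demo_active_mm = mm;
}

/* Records which mm the stand-in "firmware call" ran under. */
static struct demo_mm *demo_mm_seen_by_service;

static int demo_runtime_service(void)
{
	demo_mm_seen_by_service = demo_active_mm;
	return 0;
}

/* Setup, call and teardown, in the spirit of arch_efi_call_virt_*(). */
static int demo_call_virt(void)
{
	int status;

	demo_switch_mm(&demo_efi_mm);	/* setup: enter the EFI mm */
	status = demo_runtime_service();
	demo_switch_mm(demo_prev_mm);	/* teardown: return to the saved mm */
	return status;
}
```

The design point is that the caller no longer needs to know how the page tables are switched. It passes an mm to one helper in both directions, and the previous mm is tracked in a single place.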
[PATCH 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd
From: Sai Praneeth <sai.praneeth.prak...@intel.com> Since the previous patch added support for efi_mm, let's handle efi_pgd through efi_mm and remove global variable efi_pgd. Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com> Cc: Lee, Chun-Yi <j...@suse.com> Cc: Borislav Petkov <b...@alien8.de> Cc: Tony Luck <tony.l...@intel.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Michael S. Tsirkin <m...@redhat.com> Cc: Bhupesh Sharma <bhsha...@redhat.com> Cc: Ricardo Neri <ricardo.n...@intel.com> Cc: Matt Fleming <m...@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Ravi Shankar <ravi.v.shan...@intel.com> Tested-by: Bhupesh Sharma <bhsha...@redhat.com> --- arch/x86/platform/efi/efi_64.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index ccf5239923e8..6b541bdbda5f 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -189,8 +189,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd) early_code_mapping_set_exec(0); } -static pgd_t *efi_pgd; - /* * We need our own copy of the higher levels of the page tables * because we want to avoid inserting EFI region mappings (EFI_VA_END @@ -199,7 +197,7 @@ static pgd_t *efi_pgd; */ int __init efi_alloc_page_tables(void) { - pgd_t *pgd; + pgd_t *pgd, *efi_pgd; p4d_t *p4d; pud_t *pud; gfp_t gfp_mask; @@ -227,6 +225,7 @@ int __init efi_alloc_page_tables(void) return -ENOMEM; } + efi_mm.pgd = efi_pgd; mm_init_cpumask(_mm); init_new_context(NULL, _mm); @@ -242,6 +241,7 @@ void efi_sync_low_kernel_mappings(void) pgd_t *pgd_k, *pgd_efi; p4d_t *p4d_k, *p4d_efi; pud_t *pud_k, *pud_efi; + pgd_t *efi_pgd = efi_mm.pgd; if (efi_enabled(EFI_OLD_MEMMAP)) return; @@ -335,7 +335,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) unsigned long pfn, text, pf; struct page *page; unsigned npages; - pgd_t *pgd; + pgd_t *pgd = efi_mm.pgd; if (efi_enabled(EFI_OLD_MEMMAP)) 
return 0; @@ -345,8 +345,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages) * this value is loaded into cr3 the PGD will be decrypted during * the pagetable walk. */ - efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd); - pgd = efi_pgd; + efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd); /* * It can happen that the physical address of new_memmap lands in memory @@ -416,7 +415,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va) { unsigned long flags = _PAGE_RW; unsigned long pfn; - pgd_t *pgd = efi_pgd; + pgd_t *pgd = efi_mm.pgd; if (!(md->attribute & EFI_MEMORY_WB)) flags |= _PAGE_PCD; @@ -520,7 +519,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len) static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf) { unsigned long pfn; - pgd_t *pgd = efi_pgd; + pgd_t *pgd = efi_mm.pgd; int err1, err2; /* Update the 1:1 mapping */ @@ -617,7 +616,7 @@ void __init efi_dump_pagetable(void) if (efi_enabled(EFI_OLD_MEMMAP)) ptdump_walk_pgd_level(NULL, swapper_pg_dir); else - ptdump_walk_pgd_level(NULL, efi_pgd); + ptdump_walk_pgd_level(NULL, efi_mm.pgd); #endif } -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use the helper function efi_switch_mm() to switch to/from efi_mm. We switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service().

Likewise, we need to switch back to the previous mm (the mm context stolen by efi_mm) after the above calls return successfully. We can use the efi_switch_mm() helper only with an x86_64 kernel and with "efi=old_map" disabled, because x86_32 and efi=old_map don't use efi_pgd; they use swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h           | 25 +-
 arch/x86/platform/efi/efi_64.c       | 41 ++--
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)
efi_call((f), args) /* - * Scratch space used for switching the pagetable in the EFI stub + * struct efi_scratch - Scratch space used while switching to/from efi_mm + * @phys_stack: stack used during EFI Mixed Mode + * @prev_mm:store/restore stolen mm_struct while switching to/from efi_mm */ struct efi_scratch { - u64 r15; - u64 prev_cr3; - pgd_t *efi_pgt; - booluse_pgd; - u64 phys_stack; + u64 phys_stack; + struct mm_struct*prev_mm; } __packed; #define arch_efi_call_virt_setup() \ @@ -78,11 +77,8 @@ struct efi_scratch { preempt_disable(); \ __kernel_fpu_begin(); \ \ - if (efi_scratch.use_pgd) { \ - efi_scratch.prev_cr3 = __read_cr3();\ - write_cr3((unsigned long)efi_scratch.efi_pgt); \ - __flush_tlb_all(); \ - } \ + if (!efi_enabled(EFI_OLD_MEMMAP)) \ + efi_switch_mm(_mm); \ }) #define arch_efi_call_virt(p, f, args...) \ @@ -90,10 +86,8 @@ struct efi_scratch { #define arch_efi_call_virt_teardown() \ ({ \ - if (efi_scratch.use_pgd) { \ - write_cr3(efi_scratch.prev_cr3);\ - __flush_tlb_all(); \ - } \ + if (!efi_enabled(EFI_OLD_MEMMAP)) \ + efi_switch_mm(efi_scratch.prev_mm); \ \ __kernel_fpu_end(); \ preempt_enable(); \ @@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void); extern void __init efi_apply_memmap_quirks(void); extern int __init efi_reuse_config(u64 tables, int nr_tables); extern void efi_delete_dummy_variable(void); +extern void efi_switch_mm(struct mm_struct *mm); struct efi_setup_data { u64 fw_vendor; diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 6b541bdbda5f..c325b1cc4d1a 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void) int n_pgds, i, j; if (!efi_enabled(EFI_OLD_MEMMAP)) { - save_pgd = (pgd_t *)__read_cr3(); - write_cr3((unsigned long)efi_scratch.efi_pgt); - goto out; + efi_switch_mm(_mm); + return NULL; } early_code_mapping_set_exec(1); @@ -154,8 +153,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
[PATCH 1/3] efi: Use efi_mm in x86 as well as ARM
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage efi page tables and efi runtime region mappings. As this is the preferred approach, let's make this data structure common across architectures. Especially for x86, using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h         | 4
 arch/x86/platform/efi/efi_64.c     | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c         | 9 +
 include/linux/efi.h                | 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include
+#include
+
 #include
 #include
 #include
 #include
+#include
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6a151ce70e86..ccf5239923e8 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -227,6 +227,9 @@ int __init efi_alloc_page_tables(void)
 		return -ENOMEM;
 	}
 
+	mm_init_cpumask(&efi_mm);
+	init_new_context(NULL, &efi_mm);
+
 	return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-	.mm_rb			= RB_ROOT,
-	.mm_users		= ATOMIC_INIT(2),
-	.mm_count		= ATOMIC_INIT(1),
-	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
 	_attr_table,
 };
 
+struct mm_struct efi_mm = {
+	.mm_rb			= RB_ROOT,
+	.mm_users		= ATOMIC_INIT(2),
+	.mm_count		= ATOMIC_INIT(1),
+	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d813f7b04da7..6745f4dbbcc1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -928,6 +928,8 @@ extern struct efi {
 	unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
--
2.1.4
[PATCH 0/3] Use mm_struct and switch_mm() instead of manually
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, on x86, invoking any EFI function such as
efi_set_virtual_address_map() or any efi_runtime_service() typically
involves read_cr3() (save previous pgd), write_cr3() (write efi_pgd) and
then calling the EFI function. Likewise, after returning from the EFI
function, the code path typically involves read_cr3() (save efi_pgd) and
write_cr3() (write previous pgd). We do this a couple of times in the EFI
subsystem of the Linux kernel; instead, we can use the helper function
efi_switch_mm() to do this. This improves readability and
maintainability. Also, instead of maintaining a separate struct
"efi_scratch" to store/restore efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
   as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask by
   calling cpumask_set_cpu(). This panics the kernel as efi_mm is not
   initialized; therefore, initialize efi_mm in efi_alloc_page_tables().

Note: This patch set is based on Linus's tree v4.15-rc3

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h           | 29 +-
 arch/x86/platform/efi/efi_64.c       | 59 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c           |  9 ++
 include/linux/efi.h                  |  2 ++
 6 files changed, 57 insertions(+), 53 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
--
2.1.4
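
The save/switch/call/restore pattern described in the cover letter can be sketched in user space roughly as follows. This is only a model under stated assumptions: the model_* names are hypothetical, model_cr3 stands in for the %cr3 register, model_active_mm for the task's active mm, and the real efi_switch_mm() does more (locking, calling switch_mm()) than is shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct: only the page-table root matters. */
struct model_mm {
	unsigned long pgd;
};

/* Hypothetical stand-in for the reduced efi_scratch: remembers the old mm. */
struct model_scratch {
	struct model_mm *prev_mm;
};

struct model_mm *model_active_mm;	/* models the task's active mm */
unsigned long model_cr3;		/* models the %cr3 register */
struct model_scratch model_efi_scratch;

/* One helper replaces the open-coded read_cr3()/write_cr3() pairs. */
void model_efi_switch_mm(struct model_mm *mm)
{
	model_efi_scratch.prev_mm = model_active_mm;	/* save old context */
	model_active_mm = mm;
	model_cr3 = mm->pgd;				/* load new page tables */
}

/* Switch in, do the call, switch back; returns the %cr3 seen by the call. */
unsigned long model_efi_call(struct model_mm *efi_mm)
{
	unsigned long cr3_during_call;

	model_efi_switch_mm(efi_mm);
	cr3_during_call = model_cr3;	/* the firmware call would run here */
	model_efi_switch_mm(model_efi_scratch.prev_mm);

	return cr3_during_call;
}
```

The point of the design is that callers no longer handle page-table roots directly; they name the mm they want, and the previous one is remembered in one place.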
Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually
> I get a similar crash on Qemu with linus's master branch and the V2
> applied on top of it. Here are the details of my test environment:
>
> 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> edk2.git/ovmf-x64
>
> 2. I used linus's master branch (HEAD - commit:
> b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> of the same.
>
> 3. I use the following qemu command line to launch the test:
>
> # /usr/local/bin/qemu-system-x86_64 --version
> QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
>
> # /usr/local/bin/qemu-system-x86_64 -enable-kvm -net nic -net tap -m
> $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> -vga std -boot c -cpu host -kernel $KERNEL -append
> "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81" -initrd
> $INITRAMFS -bios $OVMF_FW_PATH
>
> And here is the crash log:
>
> [0.006054] general protection fault: [#1] SMP
> [0.006459] Modules linked in:
> [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> [0.007000] task: 81e0f480 task.stack: 81e0
> [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> [0.007000] RAX: 80007d084000 RBX: RCX: 77ff8000
> [0.007000] RDX: 7d084000 RSI: 8000 RDI: 00019a00
> [0.007000] RBP: 81e03dc0 R08: R09: 88007d085000
> [0.007000] R10: 81e03dd8 R11: 7d095063 R12: 81e5c6a0
> [0.007000] R13: 81ed4f40 R14: 0030 R15: 0001
> [0.007000] FS: () GS:88007d40() knlGS:
> [0.007000] CS: 0010 DS: ES: CR0: 80050033
> [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4: 000406b0
> [0.007000] Call Trace:
> [0.007000] switch_mm+0xd/0x20
> [0.007000] ? switch_mm+0xd/0x20
> [0.007000] efi_switch_mm+0x3e/0x4a
> [0.007000] efi_call_phys_prolog+0x28/0x1ac
> [0.007000] efi_enter_virtual_mode+0x35a/0x48f
> [0.007000] start_kernel+0x332/0x3b8
> [0.007000] x86_64_start_reservations+0x2a/0x2c
> [0.007000] x86_64_start_kernel+0x178/0x18b
> [0.007000] secondary_startup_64+0xa5/0xa5
> [0.007000] ? secondary_startup_64+0xa5/0xa5
> [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> 7e 89
> [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> [0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> the idle task!
>
> 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> firmware and 64-bit x86 kernel) with your patches, the primary kernel
> boots fine on Qemu:
>
> ovmf firmware used in this case - edk2.git/ovmf-ia32
>
> 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> case in point 3 above), I see the primary kernel boots fine on Qemu as
> well.
>
> Regards,
> Bhupesh

Hi Bhupesh,

Thanks a lot for the detailed explanation. The details are helpful for
reproducing the issue quickly. From my initial debugging, I think that AMD
SME + efi_mm_struct patches + -cpu host (in qemu) are required to
reproduce the issue on qemu.

I have tried the following combinations (all tests are on qemu):

On Linus's tree:
1. With SME and efi_mm and -cpu host -> panics
2. With SME and efi_mm and !-cpu host -> boots
3. With SME and !efi_mm and -cpu host -> boots
4. With SME and !efi_mm and !-cpu host -> boots
5. With !SME and efi_mm and -cpu host -> boots
6. With !SME and efi_mm and !-cpu host -> boots
7. With !SME and !efi_mm and -cpu host -> boots
8. With !SME and !efi_mm and !-cpu host -> boots

On Matt's tree (no SME):
1. With efi_mm and -cpu host -> boots
2. With efi_mm and !-cpu host -> boots
3. With !efi_mm and -cpu host -> boots
4. With !efi_mm and !-cpu host -> boots

Summary:
On Matt's tree (next branch), I am unable to reproduce the issue because
it doesn't have the SME patches. On Linus's tree, with the SME patches
(b1b6f83ac938d176742c85757960dec2cf10e468), my patches and the -cpu host
switch enabled in qemu, I was able to reproduce the issue.

Could you please confirm whether you are seeing the same behavior?
Especially on real machines (I think this is equivalent to -cpu host on
qemu), because in earlier mails you mentioned that you were able to
reproduce this on Matt's tree, but according to my theory that shouldn't
be the case because Matt's tree doesn't have the SME patches.
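
For reference, the eight results reported above for Linus's tree reduce to a single condition: the panic was seen only when all three factors were present together. A small sketch that encodes just those reported outcomes (it says nothing about the root cause; the function names are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Outcome as reported in the matrix for Linus's tree: only
 * SME + efi_mm patches + "-cpu host" panicked.
 */
bool reported_panic(bool sme, bool efi_mm, bool cpu_host)
{
	return sme && efi_mm && cpu_host;
}

/* Print the eight combinations in the same order as the list above. */
void print_matrix(void)
{
	for (int i = 0; i < 8; i++) {
		bool sme = !(i & 4);
		bool efi_mm = !(i & 2);
		bool cpu_host = !(i & 1);

		printf("%d. With %sSME and %sefi_mm and %s-cpu host -> %s\n",
		       i + 1,
		       sme ? "" : "!",
		       efi_mm ? "" : "!",
		       cpu_host ? "" : "!",
		       reported_panic(sme, efi_mm, cpu_host) ? "panics" : "boots");
	}
}
```

The four results on Matt's tree (no SME) correspond to calling reported_panic() with sme set to false, which never reports a panic.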
[PATCH V2 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use helper function (efi_switch_mm()) to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service().

Likewise, we need to switch back to the previous mm (the mm context
stolen by efi_mm) after the above calls return successfully. We can use
the efi_switch_mm() helper function only with an x86_64 kernel and
"efi=old_map" disabled, because x86_32 and efi=old_map don't use efi_pgd;
rather, they use swapper_pg_dir.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
---
 arch/x86/include/asm/efi.h           | 29 ++---
 arch/x86/platform/efi/efi_64.c       | 36 +---
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2f77bcefe6b4..23b2137a95e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -1,10 +1,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include
+#include
+
 #include
 #include
 #include
 #include
+#include
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -57,14 +61,13 @@ extern u64 asmlinkage efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)		efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm:	store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-	u64	r15;
-	u64	prev_cr3;
-	pgd_t	*efi_pgt;
-	bool	use_pgd;
-	u64	phys_stack;
+	u64			phys_stack;
+	struct mm_struct	*prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup()					\
@@ -73,11 +76,8 @@ struct efi_scratch {
 	preempt_disable();						\
 	__kernel_fpu_begin();						\
 									\
-	if (efi_scratch.use_pgd) {					\
-		efi_scratch.prev_cr3 = read_cr3();			\
-		write_cr3((unsigned long)efi_scratch.efi_pgt);		\
-		__flush_tlb_all();					\
-	}								\
+	if (!efi_enabled(EFI_OLD_MEMMAP))				\
+		efi_switch_mm(&efi_mm);					\
 })
 
 #define arch_efi_call_virt(p, f, args...)				\
@@ -85,10 +85,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()					\
 ({									\
-	if (efi_scratch.use_pgd) {					\
-		write_cr3(efi_scratch.prev_cr3);			\
-		__flush_tlb_all();					\
-	}								\
+	if (!efi_enabled(EFI_OLD_MEMMAP))				\
+		efi_switch_mm(efi_scratch.prev_mm);			\
 									\
 	__kernel_fpu_end();						\
 	preempt_enable();						\
@@ -130,6 +128,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 0bb98c35e178..e0545f56d703 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -80,9 +80,8 @@ pgd_t * __init efi_call_phys_prolog(void)
 	int n_pgds, i, j;
 
 	if (!efi_enabled(EFI_OLD_MEMMAP)) {
-		save_pgd = (pgd_t *)read_cr3();
-		write_cr3((unsigned long)efi_scratch.efi_pgt);
-		goto out;
+		efi_switch_mm(&efi_mm);
+		return NULL;