Re: Why does memblock only refer to E820 table and not EFI Memory Map?

2019-07-23 Thread Sai Praneeth Prakhya


> > On x86 platforms, there are two sources through which the kernel learns
> > about physical memory in the system, namely the E820 table and the EFI
> > Memory Map. Each table describes which regions of system memory are usable
> > by the kernel and which regions should be preserved (i.e. reserved regions
> > that typically hold BIOS code/data) so that no other component in the
> > system can read/write these regions. I think they duplicate the
> > information, and hence I have a couple of questions regarding them.
> 
> But isn't it true that in x86 systems the E820 table is populated from the
> EFI memory map?

I didn't know that this happens.. :(

> At least in systems with EFI firmware and a Linux which understands
> EFI. If booting from the EFI stub, the stub will take the EFI memory map and
> assemble the E820 table passed as part of the boot params [4]. It also
> considers the case when there are more than 128 entries in the table [5].
> Thus, if booting as an EFI application it will definitely use the EFI memory
> map. If Linux' EFI entry point is not used the bootloader should do the
> same. For instance, grub also reads the EFI memory map to assemble the E820
> memory map [6], [7], [8].

Thanks a lot for the pointers, Ricardo :)
I haven't looked at the EFI stub and Grub code, and hence didn't know this was
happening. It does make me feel better that the EFI Memory Map is indeed being
used to generate e820 in the EFI stub case, so at least it's getting consumed
indirectly.
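
To picture what "generating e820 from the EFI Memory Map" involves, here is a
minimal user-space sketch of the type translation step. It is not the EFI stub
or Grub code: the TOY_* names and toy_efi_to_e820() are made up for
illustration, the EFI type numbers follow the UEFI specification, and the real
translation also looks at memory attributes, which this sketch leaves out.

```c
/* EFI memory types, numbered as in the UEFI specification (subset). */
enum toy_efi_type {
	TOY_EFI_RESERVED_TYPE = 0,
	TOY_EFI_LOADER_CODE = 1,
	TOY_EFI_LOADER_DATA = 2,
	TOY_EFI_BOOT_SERVICES_CODE = 3,
	TOY_EFI_BOOT_SERVICES_DATA = 4,
	TOY_EFI_RUNTIME_SERVICES_CODE = 5,
	TOY_EFI_RUNTIME_SERVICES_DATA = 6,
	TOY_EFI_CONVENTIONAL_MEMORY = 7,
	TOY_EFI_UNUSABLE_MEMORY = 8,
	TOY_EFI_ACPI_RECLAIM_MEMORY = 9,
	TOY_EFI_ACPI_MEMORY_NVS = 10,
};

/* E820 entry types (subset). */
enum toy_e820_type {
	TOY_E820_RAM = 1,
	TOY_E820_RESERVED = 2,
	TOY_E820_ACPI = 3,
	TOY_E820_NVS = 4,
	TOY_E820_UNUSABLE = 5,
};

/*
 * Hypothetical helper: pick the E820 type for one EFI descriptor.
 * Anything the OS may reuse after boot becomes RAM; anything unknown
 * is conservatively treated as reserved.
 */
enum toy_e820_type toy_efi_to_e820(enum toy_efi_type type)
{
	switch (type) {
	case TOY_EFI_LOADER_CODE:
	case TOY_EFI_LOADER_DATA:
	case TOY_EFI_BOOT_SERVICES_CODE:
	case TOY_EFI_BOOT_SERVICES_DATA:
	case TOY_EFI_CONVENTIONAL_MEMORY:
		return TOY_E820_RAM;
	case TOY_EFI_ACPI_RECLAIM_MEMORY:
		return TOY_E820_ACPI;
	case TOY_EFI_ACPI_MEMORY_NVS:
		return TOY_E820_NVS;
	case TOY_EFI_UNUSABLE_MEMORY:
		return TOY_E820_UNUSABLE;
	default:
		return TOY_E820_RESERVED;
	}
}
```

Runtime services regions fall into the default branch on purpose: they must
stay untouched by the OS, so they end up as reserved in the resulting table.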

> > 1. I see that only the E820 table is consumed by the kernel [1] (i.e. the
> > memblock subsystem) to distinguish between "usable" and "reserved"
> > regions. Assume someone has called memblock_alloc(); the memblock
> > subsystem would service the caller by allocating memory from "usable"
> > regions, and it knows this *only* from the E820 table [2] (it does not
> > check whether the EFI Memory Map also says that this region is usable).
> > So, why isn't the kernel taking the EFI Memory Map into consideration? (I
> > see that it happens only when the "add_efi_memmap" kernel command line
> > argument is passed, i.e. passing this argument updates the E820 table
> > based on the EFI Memory Map) [3]. The problem I see with memblock not
> > taking the EFI Memory Map into consideration is that we are ignoring the
> > main purpose for which the EFI Memory Map exists.
> > 
> > 2. Why doesn't the kernel have "add_efi_memmap" by default? From the
> > commit "21eb140e: x86 boot: only pick up additional EFI memmap if
> > add_efi_memmap flag", I didn't understand why the decision was made so.
> > Shouldn't we give more preference to the EFI Memory Map rather than the
> > E820 table, as it's the latest and E820 is legacy?
> 
> I did a quick experiment with and without add_efi_memmap. The e820
> table looked exactly the same. I guess this shows that what I wrote
> above makes sense ;) . Have you observed any difference?

When I did a quick test, I didn't notice any difference (with and without
add_efi_memmap) because both e820 and the EFI Memory Map were reporting regions
in sync. So, "add_efi_memmap" didn't have to add any new regions into e820.
Hence my last question: what if the two tables (EFI Memory Map and e820) are
out of sync? That shouldn't happen with Grub and the EFI stub because they
generate e820 from the EFI Memory Map, as you pointed out.
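
To make the "out of sync" case concrete, here is a small user-space sketch (not
kernel code) of the kind of cross-check being asked about: it counts entries
that one table calls usable while the other table marks an overlapping range as
reserved. struct mem_region, overlaps_reserved() and count_conflicts() are
hypothetical names; as far as I can tell nothing like this exists in the kernel
today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor; not a kernel structure. */
struct mem_region {
	uint64_t start;
	uint64_t size;
	bool usable;
};

/* Return true if [start, start + size) overlaps a reserved entry of tbl. */
static bool overlaps_reserved(const struct mem_region *tbl, size_t n,
			      uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t r_end = tbl[i].start + tbl[i].size;

		if (!tbl[i].usable && start < r_end &&
		    tbl[i].start < start + size)
			return true;
	}
	return false;
}

/*
 * Count the usable entries of table a that overlap a reserved entry of
 * table b. A non-zero result means the two tables disagree.
 */
size_t count_conflicts(const struct mem_region *a, size_t na,
		       const struct mem_region *b, size_t nb)
{
	size_t conflicts = 0;

	for (size_t i = 0; i < na; i++) {
		if (a[i].usable &&
		    overlaps_reserved(b, nb, a[i].start, a[i].size))
			conflicts++;
	}
	return conflicts;
}
```

Calling count_conflicts(e820, n_e820, efi, n_efi) and then again with the
arguments swapped would cover both directions of disagreement.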

Regards,
Sai



Why does memblock only refer to E820 table and not EFI Memory Map?

2019-07-20 Thread Sai Praneeth Prakhya
Hi All,

Disclaimer:
1. Please note that this discussion is x86 specific.
2. The things stated below are my understanding of the kernel and I could have
missed something, so please let me know if I understood something wrong.
3. I have focused only on memblock here because, if I understand correctly,
memblock is the base that feeds other memory management subsystems in the
kernel (like the buddy allocator).

On x86 platforms, there are two sources through which the kernel learns about
physical memory in the system, namely the E820 table and the EFI Memory Map.
Each table describes which regions of system memory are usable by the kernel
and which regions should be preserved (i.e. reserved regions that typically
hold BIOS code/data) so that no other component in the system can read/write
these regions. I think they duplicate the information, and hence I have a
couple of questions regarding them.

1. I see that only the E820 table is consumed by the kernel [1] (i.e. the
memblock subsystem) to distinguish between "usable" and "reserved" regions.
Assume someone has called memblock_alloc(); the memblock subsystem would
service the caller by allocating memory from "usable" regions, and it knows
this *only* from the E820 table [2] (it does not check whether the EFI Memory
Map also says that this region is usable). So, why isn't the kernel taking the
EFI Memory Map into consideration? (I see that it happens only when the
"add_efi_memmap" kernel command line argument is passed, i.e. passing this
argument updates the E820 table based on the EFI Memory Map) [3]. The problem I
see with memblock not taking the EFI Memory Map into consideration is that we
are ignoring the main purpose for which the EFI Memory Map exists.

2. Why doesn't the kernel have "add_efi_memmap" by default? From the commit
"21eb140e: x86 boot: only pick up additional EFI memmap if add_efi_memmap
flag", I didn't understand why the decision was made so. Shouldn't we give
more preference to the EFI Memory Map rather than the E820 table, as it's the
latest and E820 is legacy?

3. Why isn't the kernel checking that both tables (E820 table and EFI Memory
Map) are in sync, i.e. is there any *possibility* that a buggy BIOS could
report a region as usable in the E820 table and as reserved in the EFI Memory
Map?
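
Point 1 can be pictured with a toy allocator, written as a user-space sketch
under my own assumptions. It hands out memory purely from a list of "usable"
ranges (standing in for what E820 reported) and never consults a second table.
struct toy_memblock and toy_memblock_alloc() are made-up names, and the real
memblock is more elaborate (for instance, it can allocate top-down).

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t size;
};

#define TOY_MAX_RESERVED 16

/* Toy stand-in for memblock: usable RAM plus the ranges already handed out. */
struct toy_memblock {
	const struct range *memory;	/* usable RAM as one table reported it */
	size_t nr_memory;
	struct range reserved[TOY_MAX_RESERVED];
	size_t nr_reserved;
};

static uint64_t align_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/* Return the first reserved range overlapping [start, start + size). */
static const struct range *find_overlap(const struct toy_memblock *mb,
					uint64_t start, uint64_t size)
{
	for (size_t i = 0; i < mb->nr_reserved; i++) {
		const struct range *r = &mb->reserved[i];

		if (start < r->start + r->size && r->start < start + size)
			return r;
	}
	return NULL;
}

/*
 * Bottom-up first fit. Returns the allocated address, or 0 on failure
 * (so this toy assumes usable RAM never starts at address 0).
 */
uint64_t toy_memblock_alloc(struct toy_memblock *mb, uint64_t size,
			    uint64_t align)
{
	if (size == 0 || align == 0 || mb->nr_reserved == TOY_MAX_RESERVED)
		return 0;

	for (size_t i = 0; i < mb->nr_memory; i++) {
		uint64_t end = mb->memory[i].start + mb->memory[i].size;
		uint64_t cand = align_up(mb->memory[i].start, align);

		while (cand < end && size <= end - cand) {
			const struct range *hit = find_overlap(mb, cand, size);

			if (!hit) {
				mb->reserved[mb->nr_reserved++] =
					(struct range){ cand, size };
				return cand;
			}
			/* Skip past the conflicting reservation and retry. */
			cand = align_up(hit->start + hit->size, align);
		}
	}
	return 0;
}
```

If the "memory" list was built from a table that wrongly marks a range as
usable, the allocator above will happily hand that range out, which is the
concern behind questions 1 and 3.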

[1] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/setup.c#L1106
[2] https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/e820.c#L1265
[3] https://elixir.bootlin.com/linux/latest/source/arch/x86/platform/efi/efi.c#L129

Regards,
Sai



[PATCH] x86/efi: Mark can_free_region() as an __init function

2018-12-28 Thread Sai Praneeth Prakhya
can_free_region() is called only once during _boot_ by
efi_reserve_boot_services(). Hence, mark it as an __init function.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 17456a1d3f04..9ce85e605052 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -304,7 +304,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
  * - Not within any part of the kernel
  * - Not the BIOS reserved area (E820_TYPE_RESERVED, E820_TYPE_NVS, etc)
  */
-static bool can_free_region(u64 start, u64 size)
+static __init bool can_free_region(u64 start, u64 size)
 {
 	if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
 		return false;
-- 
2.19.1



[PATCH] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

2018-12-21 Thread Sai Praneeth Prakhya
Commit d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions
from efi_pgd") forgets to take two EFI modes into consideration namely
EFI_OLD_MEMMAP and EFI_MIXED_MODE.

EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
using ioremap() and init_memory_mapping(). This feature can be enabled by
passing "efi=old_map" as kernel command line argument. But,
efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
efi_pgd and hence cannot be used for unmapping EFI boot services code/data
regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
all RAM (i.e. EFI regions like EFI_CONVENTIONAL_MEMORY,
EFI_LOADER_CODE/DATA, EFI_BOOT_SERVICES_CODE/DATA and
EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
let EFI runtime calls access their arguments in 1:1 mode. Hence,
don't unmap EFI boot services code/data regions when booted in mixed mode.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..9c34230aaeae 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -380,6 +380,22 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
 	u64 pa = md->phys_addr;
 	u64 va = md->virt_addr;
 
+	/*
+	 * To Do: Remove this check after adding functionality to unmap EFI boot
+	 * services code/data regions from direct mapping area because
+	 * "efi=old_map" maps EFI regions in swapper_pg_dir.
+	 */
+	if (efi_enabled(EFI_OLD_MEMMAP))
+		return;
+
+	/*
+	 * EFI mixed mode has all RAM mapped to access arguments while making
+	 * EFI runtime calls, hence don't unmap EFI boot services code/data
+	 * regions.
+	 */
+	if (!efi_is_native() && IS_ENABLED(CONFIG_EFI_MIXED))
+		return;
+
 	if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
 		pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
 
-- 
2.19.1



[PATCH V2 3/3] x86/efi: Use efi_memmap_alloc()/efi_memmap_install() to create runtime EFI memory map

2018-12-04 Thread Sai Praneeth Prakhya
efi_map_regions() uses realloc_pages() to allocate memory for runtime EFI
memory map (EFI memory map which contains only memory descriptors of type
Runtime Code/Data and Boot Code/Data). Since efi_memmap_alloc() also does
the same, use it instead of realloc_pages() and install the new EFI memory
map using efi_memmap_install() instead of efi_memmap_init_late(). This also
fixes the leaking of existing EFI memory map.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  2 +-
 arch/x86/platform/efi/efi.c| 93 +-
 arch/x86/platform/efi/efi_32.c |  2 +-
 arch/x86/platform/efi/efi_64.c |  7 ++-
 4 files changed, 33 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 744f945a00e7..524fda68b03f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -131,7 +131,7 @@ extern void __init efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
-extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
+extern int __init efi_setup_page_tables(void);
 extern void __init old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 63885cc8e34e..1b0a9449096b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -656,27 +656,6 @@ static void __init get_systab_virt_addr(efi_memory_desc_t *md)
}
 }
 
-static void *realloc_pages(void *old_memmap, int old_shift)
-{
-   void *ret;
-
-   ret = (void *)__get_free_pages(GFP_KERNEL, old_shift + 1);
-   if (!ret)
-   goto out;
-
-   /*
-* A first-time allocation doesn't have anything to copy.
-*/
-   if (!old_memmap)
-   return ret;
-
-   memcpy(ret, old_memmap, PAGE_SIZE << old_shift);
-
-out:
-   free_pages((unsigned long)old_memmap, old_shift);
-   return ret;
-}
-
 /*
  * Iterate the EFI memory map in reverse order because the regions
  * will be mapped top-down. The end result is the same as if we had
@@ -782,18 +761,15 @@ static bool should_map_region(efi_memory_desc_t *md)
 }
 
 /*
- * Map the efi memory ranges of the runtime services and update new_mmap with
- * virtual addresses.
+ * Map the efi memory ranges of the runtime services and update memory map with
+ * virtual addresses. Returns number of memory map entries mapped.
  */
-static void * __init efi_map_regions(int *count, int *pg_shift)
+static int __init efi_map_regions(void)
 {
-   void *p, *new_memmap = NULL;
-   unsigned long left = 0;
-   unsigned long desc_size;
+   void *p;
+   int count = 0;
efi_memory_desc_t *md;
 
-   desc_size = efi.memmap.desc_size;
-
p = NULL;
while ((p = efi_map_next_entry(p))) {
md = p;
@@ -803,30 +779,15 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
 
efi_map_region(md);
get_systab_virt_addr(md);
-
-   if (left < desc_size) {
-   new_memmap = realloc_pages(new_memmap, *pg_shift);
-   if (!new_memmap)
-   return NULL;
-
-   left += PAGE_SIZE << *pg_shift;
-   (*pg_shift)++;
-   }
-
-   memcpy(new_memmap + (*count * desc_size), md, desc_size);
-
-   left -= desc_size;
-   (*count)++;
+   count++;
}
-
-   return new_memmap;
+   return count;
 }
 
 static void __init kexec_enter_virtual_mode(void)
 {
 #ifdef CONFIG_KEXEC_CORE
efi_memory_desc_t *md;
-   unsigned int num_pages;
 
efi.systab = NULL;
 
@@ -872,10 +833,7 @@ static void __init kexec_enter_virtual_mode(void)
 
BUG_ON(!efi.systab);
 
-   num_pages = ALIGN(efi.memmap.nr_map * efi.memmap.desc_size, PAGE_SIZE);
-   num_pages >>= PAGE_SHIFT;
-
-   if (efi_setup_page_tables(efi.memmap.phys_map, num_pages)) {
+   if (efi_setup_page_tables()) {
clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
return;
}
@@ -926,10 +884,12 @@ static void __init kexec_enter_virtual_mode(void)
  */
 static void __init __efi_enter_virtual_mode(void)
 {
-   int count = 0, pg_shift = 0;
-   void *new_memmap = NULL;
+   struct efi_memory_map new_memmap;
+   efi_memory_desc_t *md;
+   int count = 0;
efi_status_t status;
unsigned long pa;
+   void *out;
 
efi.systab = NULL;
 
@@ -940,28 +900,25 @@ static void __init __efi_en

[PATCH V2 1/3] efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in EFI subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by old EFI
memory map. Hence, introduce efi_memmap_free() that frees up the memory
occupied by an EFI memory map.

Introduce __efi_memmap_unmap(), so that it could be used to unmap an EFI
memory map and have wrappers around it (namely efi_memmap_unmap() and
efi_memmap_unmap_and_free()) to specifically deal with efi.memmap. There
are two variants of wrappers (unmap and free) because there are use cases
where the kernel just needs to unmap the memory map (see efi_init() in arm
and kexec_enter_virtual_mode()) but not free it.

Apart from introducing the above functions, improve the cases where the
kernel decides to turn off EFI runtime services during boot by unmapping
and freeing the EFI memory map rather than just unmapping the EFI memory
map.
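
The reason efi_memmap_free() has to know how the map was allocated can be shown
with a small user-space sketch, written under my own assumptions. Here both
"allocators" are backed by malloc()/free() and only counters tell them apart;
the toy_* names are hypothetical and merely mirror the early (memblock) versus
late (page allocator) split that the patch handles through the "late" field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy memory map handle that remembers which allocator produced it. */
struct toy_memmap {
	void *map;
	size_t size;
	bool late;	/* true: "page allocator", false: "early allocator" */
};

/* Counters so a caller can observe which path released the memory. */
size_t toy_early_frees;
size_t toy_late_frees;

static void toy_early_free(void *p)
{
	toy_early_frees++;
	free(p);
}

static void toy_late_free(void *p)
{
	toy_late_frees++;
	free(p);
}

/* Release the map through the allocator that created it, exactly once. */
void toy_memmap_free(struct toy_memmap *m)
{
	if (!m->map)
		return;

	if (m->late)
		toy_late_free(m->map);
	else
		toy_early_free(m->map);

	m->map = NULL;
	m->size = 0;
}
```

Clearing the pointer after freeing is a choice of this sketch, made so that a
second call is harmless; it is not something the patch itself claims to do.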

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c |  4 +-
 arch/x86/platform/efi/quirks.c  |  2 +-
 drivers/firmware/efi/arm-init.c |  2 +-
 drivers/firmware/efi/memmap.c   | 72 +
 include/linux/efi.h |  1 +
 5 files changed, 70 insertions(+), 11 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e1cb01a22fa8..715601d1c581 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -532,7 +532,7 @@ void __init efi_init(void)
pr_info("No EFI runtime due to 32/64-bit mismatch with kernel\n");
else {
if (efi_runtime_disabled() || efi_runtime_init()) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
return;
}
}
@@ -833,7 +833,7 @@ static void __init kexec_enter_virtual_mode(void)
 * have been mapped at these virtual addresses.
 */
if (!efi_is_native() || efi_enabled(EFI_OLD_MEMMAP)) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
return;
}
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..ce6dcd40dd6c 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -556,7 +556,7 @@ void __init efi_apply_memmap_quirks(void)
 */
if (!efi_runtime_supported()) {
pr_info("Setup done, disabling due to 32/64-bit mismatch\n");
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
}
 
/* UV2+ BIOS has a fix for this issue.  UV1 still needs the quirk. */
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index 1a6a77df8a5e..f32ff5c580f6 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -253,7 +253,7 @@ void __init efi_init(void)
  efi.memmap.desc_version);
 
if (uefi_init() < 0) {
-   efi_memmap_unmap();
+   efi_memmap_unmap_and_free();
return;
}
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 38b686c67b17..4318a69bdbbf 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -49,6 +49,29 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
return __efi_memmap_alloc_early(size);
 }
 
+/**
+ * efi_memmap_free - Free memory pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Memory is freed depending on the type of allocation performed.
+ */
+static void __init efi_memmap_free(struct efi_memory_map new_memmap)
+{
+   phys_addr_t start, end;
+   unsigned long size = new_memmap.nr_map * new_memmap.desc_size;
+   unsigned int order = get_order(size);
+
+   start = new_memmap.phys_map;
+   end = start + size;
+   if (new_memmap.late) {
+   __free_pages(pfn_to_page(PHYS_PFN(start)), order);
+   return;
+   }
+
+   if (memblock_free(start, size))
+   pr_err("Failed to free mem from %pa to %pa\n", &start, &end);
+}
+
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
@@ -116,21 +139,56 @@ int __init efi_memmap_init_early(struct efi_memory_map_data *data)
return __efi_memmap_init(data, false);
 }
 
+/**
+ * __efi_memmap_unmap - Unmap the region pointed by new_memmap.map
+ * @new_memmap: Structure that describes EFI memory map.
+ *
+ * Use to unmap *newly* created EFI memmap and should *not* be used directly to
+ * unmap efi.memmap because "EFI_MEMMAP" flag is not cleared here. Instead, use
+ * efi_memmap_unmap*() variants accordingly. Also, the check for "EFI_MEMMAP"
+ * flag is done in efi_memmap_unma

[PATCH V2 2/3] x86/efi: Fix EFI memory map leaks

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in efi subsystem of kernel, every time kernel allocates memory
for a new EFI memory map, it forgets to free the memory occupied by the
existing EFI memory map. This could be fixed by unmapping and freeing the
existing EFI memory map every time before installing a new EFI memory map.
Hence, modify efi_memmap_install() accordingly since it's the only place
which installs a new EFI memory map.

Presently, efi_memmap_alloc() allocates only physical memory and every
caller of efi_memmap_alloc() has to remap the newly allocated memory in
order to use it. This extra step could sometimes lead to buggy error
handling, wherein the allocated memory isn't freed should the remap
fail. So, push the remap logic into efi_memmap_alloc() so that the error
handling is improved and the callers become simpler.

With the modified efi_memmap_alloc() and efi_memmap_install() APIs, a
typical flow to install a new EFI memory map would look something like
below.

1. Get the number of entries the new EFI memory map should have (typically
   through efi_memmap_split_count()).
2. Allocate memory for the new EFI memory map (efi_memmap_alloc()).
3. Populate memory descriptor entries in the new EFI memory map.
4. Install the new EFI memory map (efi_memmap_install() which also unmaps
   and frees existing memory map).
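
The four steps above can be sketched in user space as follows. This is only a
model under stated assumptions: malloc()/free() stand in for the kernel
allocators, toy_current stands in for efi.memmap, and every toy_* name is
hypothetical. The "clean" operation is used as the example caller because it is
the simplest one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_desc {
	unsigned long long phys_addr;
	unsigned long long num_pages;
	bool valid;
};

struct toy_memmap {
	struct toy_desc *map;
	size_t nr_map;
};

/* Stand-in for the currently installed memory map. */
struct toy_memmap toy_current;

/* Step 2: allocate room for nr entries; returns 0 on success. */
int toy_memmap_alloc(size_t nr, struct toy_memmap *out)
{
	out->map = nr ? calloc(nr, sizeof(*out->map)) : NULL;
	if (nr && !out->map)
		return -1;
	out->nr_map = nr;
	return 0;
}

/* Step 4: free the old map, then make the new one current. */
void toy_memmap_install(const struct toy_memmap *new_map)
{
	free(toy_current.map);
	toy_current = *new_map;
}

/* Drop invalid entries by building and installing a fresh map. */
int toy_clean_memmap(void)
{
	struct toy_memmap new_map;
	size_t valid = 0, out = 0;

	/* Step 1: work out how many entries the new map needs. */
	for (size_t i = 0; i < toy_current.nr_map; i++)
		if (toy_current.map[i].valid)
			valid++;

	if (valid == toy_current.nr_map)
		return 0;	/* nothing to remove */

	if (toy_memmap_alloc(valid, &new_map))
		return -1;	/* old map stays installed on failure */

	/* Step 3: populate the new map. */
	for (size_t i = 0; i < toy_current.nr_map; i++)
		if (toy_current.map[i].valid)
			new_map.map[out++] = toy_current.map[i];

	toy_memmap_install(&new_map);
	return 0;
}
```

Because the install step is the only place that frees the old map, a caller
that follows this flow cannot forget to release it.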

Existing functions like efi_clean_memmap(), efi_arch_mem_reserve(),
efi_free_boot_services() and efi_fake_memmap() are modified to fix the
above mentioned bugs and also to follow the above recommended usage of
the APIs.

Note that efi_clean_memmap() could be implemented without allocating any
new memory, but since this is not a fast path and hence performance is not
a concern, readability and maintainability win. So, change it to use
efi_memmap_alloc() and efi_memmap_install().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   1 +
 arch/x86/kernel/setup.c |   6 ++
 arch/x86/platform/efi/efi.c |  44 ++--
 arch/x86/platform/efi/quirks.c  |  43 +++-
 drivers/firmware/efi/fake_mem.c |  21 ++
 drivers/firmware/efi/memmap.c   | 118 +++-
 include/linux/efi.h |   7 +-
 7 files changed, 132 insertions(+), 108 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index d1e64ac80b9c..744f945a00e7 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,6 +143,7 @@ extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
 extern void efi_free_boot_services(void);
 extern void efi_reserve_boot_services(void);
+extern void __init efi_clean_memmap(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index b74e7bfed6ab..bed79b238b0d 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1102,6 +1102,12 @@ void __init setup_arch(char **cmdline_p)
reserve_bios_regions();
 
if (efi_enabled(EFI_MEMMAP)) {
+   /*
+* efi_clean_memmap() uses memblock_phys_alloc() to allocate
+* memory for new EFI memmap and hence will work only after
+* e820__memblock_setup()
+*/
+   efi_clean_memmap();
efi_fake_memmap();
efi_find_mirror();
efi_esrt_init();
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 715601d1c581..63885cc8e34e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -249,30 +249,36 @@ static bool __init efi_memmap_entry_valid(const efi_memory_desc_t *md, int i)
return false;
 }
 
-static void __init efi_clean_memmap(void)
+void __init efi_clean_memmap(void)
 {
-   efi_memory_desc_t *out = efi.memmap.map;
-   const efi_memory_desc_t *in = out;
-   const efi_memory_desc_t *end = efi.memmap.map_end;
-   int i, n_removal;
-
-   for (i = n_removal = 0; in < end; i++) {
-   if (efi_memmap_entry_valid(in, i)) {
-   if (out != in)
-   memcpy(out, in, efi.memmap.desc_size);
-   out = (void *)out + efi.memmap.desc_size;
-   } else {
+   void *out;
+   efi_memory_desc_t *md;
+   unsigned int i = 0, n_removal = 0;
+   struct efi_memory_map new_memmap;
+
+   for_each_efi_memory_desc(md) {
+   if (!efi_memmap_entry_valid(md, i))
n_removal++;
-   }
-   in = (void *)in + efi.memmap.desc_size;
}
 
-   if (n_removal > 0) {
-   u64 size = efi.memmap.nr_map - n_removal;
+   if (n_removal == 0)
+   return;
 
-   pr_warn("Removing %d invalid memory map entries.\n&

[PATCH V2 0/3] Fix EFI memory map leaks

2018-12-04 Thread Sai Praneeth Prakhya
Presently, in the EFI subsystem of the kernel, every time the kernel allocates
memory for a new EFI memory map, it forgets to free the memory occupied by the
old EFI memory map. It does clear the mappings (using efi_memmap_unmap()), but
forgets to free up the memory. There is also another minor issue, wherein the
newly allocated memory isn't freed should the remap fail.

The first issue is addressed by adding efi_memmap_free() to efi_memmap_install()
and the second issue is addressed by pushing the remap code into
efi_memmap_alloc() and thereby handling the failure condition.

Memory allocated to EFI memory map is leaked in below functions and hence they
are modified to fix the issue. Functions that modify EFI memmap are:
1. efi_clean_memmap(),
2. efi_fake_memmap(),
3. efi_arch_mem_reserve(),
4. efi_free_boot_services(),
5. and __efi_enter_virtual_mode()

More detailed explanation:
--
A typical boot flow on EFI supported x86_64 machines might look something like
below
1. EFI memory map is passed by firmware to kernel.
2. Kernel does a memblock_reserve() on this memory
   (see efi_memblock_x86_reserve_range()).
3. This memory map is checked for invalid entries in efi_clean_memmap(). If any
   invalid entries are found, they are omitted from EFI memory map but the
   memory occupied by these invalid EFI memory descriptors isn't freed.
4. To further process this memory map (see efi_fake_memmap(), efi_bgrt_init()
   and efi_esrt_init()), kernel allocates memory using efi_memmap_alloc() and
   copies the processed memory map to newly allocated memory but it forgets to
   free memory occupied by old EFI memory map.
5. Further, in efi_map_regions() the EFI memory map is processed again to
   include only EFI memory descriptors of type Runtime Code/Data and Boot
   Code/Data. Again, memory is allocated for this new memory map through
   realloc_pages() and the old EFI memory map is not freed.
6. After SetVirtualAddressMap() is done, the EFI memory map is processed again
   to have only EFI memory descriptors of type Runtime Code/Data. Again, memory
   is allocated for this new memory map through efi_memmap_alloc() and the old
   EFI memory map is not freed.

Testing:

Tested with LUV on qemu-x86_64 and on my dev machine. Checked for unchanged boot
behavior i.e. shouldn't break any existing stuff. Built for arm, arm64 and ia64
and found no new warnings/errors. Would appreciate the effort if someone could
test on arm machines.

Although the majority of the changes are made to drivers/firmware/efi/memmap.c
(which is common across architectures), this bug is limited to x86_64
machines and hence this patch set shouldn't affect any other architectures.

Notes:
--
1. This patch set is based on EFI tree's "next" branch [1].
2. This patch set is an outcome of the discussion at [2].

Changes from V1:

1. Drop passing around allocation type from efi_memmap_alloc(), instead change
   efi_memmap_alloc() such that it now returns a populated struct efi_memory_map
2. Drop fixing issues in efi_fake_memmap(), will be addressed in a separate
   patch.
3. Optimize efi_map_regions().

[1] git git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
[2] https://lkml.org/lkml/2018/7/2/1095

Sai Praneeth Prakhya (3):
  efi: Introduce efi_memmap_free() and efi_memmap_unmap_and_free()
  x86/efi: Fix EFI memory map leaks
  x86/efi: Use efi_memmap_alloc()/efi_memmap_install() to create runtime EFI memory map

 arch/x86/include/asm/efi.h  |   3 +-
 arch/x86/kernel/setup.c |   6 +
 arch/x86/platform/efi/efi.c | 141 +---
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |   7 +-
 arch/x86/platform/efi/quirks.c  |  45 ++--
 drivers/firmware/efi/arm-init.c |   2 +-
 drivers/firmware/efi/fake_mem.c |  21 +---
 drivers/firmware/efi/memmap.c   | 190 +---
 include/linux/efi.h |   8 +-
 10 files changed, 235 insertions(+), 190 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 

-- 
2.19.1



[PATCH V3 2/3] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-11-04 Thread Sai Praneeth Prakhya
efi_free_boot_services(), as the name suggests, frees EFI boot services
code/data regions but forgets to unmap these regions from efi_pgd. This
means that any code that's running in the efi_pgd address space (e.g.
any EFI runtime service) would still be able to access these regions, but
the contents of these regions would have long been overwritten by
someone else. So, it's important to unmap these regions. Hence,
introduce efi_unmap_pages() to unmap these regions from efi_pgd.

After unmapping EFI boot services code/data regions, any illegal access
by buggy firmware to these regions would result in page fault which will
be handled by EFI specific fault handler.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..fb1c44b11235 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware.
+ * So, unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+   pgd_t *pgd = efi_mm.pgd;
+   u64 pa = md->phys_addr;
+   u64 va = md->virt_addr;
+
+   if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+   pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+   if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+   pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
phys_addr_t new_phys, new_size;
@@ -395,6 +413,13 @@ void __init efi_free_boot_services(void)
}
 
/*
+* Before calling set_virtual_address_map(), EFI boot services
+* code/data regions were mapped as a quirk for buggy firmware.
+* Unmap them from efi_pgd before freeing them up.
+*/
+   efi_unmap_pages(md);
+
+   /*
 * Nasty quirk: if all sub-1MB memory is used for boot
 * services, we can get here without having allocated the
 * real mode trampoline.  It's too late to hand boot services
-- 
2.7.4



[PATCH V3 1/3] x86/mm/pageattr: Introduce helper function to unmap EFI boot services

2018-11-04 Thread Sai Praneeth Prakhya
Ideally, after the kernel assumes control of the platform, firmware
shouldn't access EFI boot services code/data regions. But it has been
noticed that this does not hold on many x86 platforms. Hence, during boot,
the kernel reserves EFI boot services code/data regions [1] and maps [2]
them to efi_pgd so that the call to set_virtual_address_map() doesn't fail.
After returning from set_virtual_address_map(), kernel frees the
reserved regions [3] but they still remain mapped. Hence, introduce
kernel_unmap_pages_in_pgd() which will later be used to unmap EFI boot
services code/data regions.

While at it, modify kernel_map_pages_in_pgd() by
1. Adding the __init modifier because it's always used *only* during boot.
2. Adding a warning if it's used after SMP is initialized because it uses
   __flush_tlb_all() which flushes mappings only on the current CPU.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.
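
What "clearing the PAGE_PRESENT bit" amounts to can be shown with a tiny
user-space sketch of applying a clear mask and a set mask to a page table
entry value. toy_apply_masks() and the TOY_* macros are hypothetical names; the
bit positions used (bit 0 for present, bit 1 for writable) are the x86 ones.

```c
#include <stdint.h>

/* x86 page table entry flag bits used in this sketch. */
#define TOY_PAGE_PRESENT	(UINT64_C(1) << 0)
#define TOY_PAGE_RW		(UINT64_C(1) << 1)

/*
 * Apply a "clear" mask and then a "set" mask to a PTE value, leaving
 * all other bits (such as the physical frame number) untouched.
 */
uint64_t toy_apply_masks(uint64_t pte, uint64_t mask_set, uint64_t mask_clr)
{
	return (pte & ~mask_clr) | mask_set;
}
```

With mask_set = 0 and mask_clr = TOY_PAGE_PRESENT | TOY_PAGE_RW, an entry such
as 0x1003 becomes 0x1000: the frame bits survive but the mapping is no longer
present, so any later access faults.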

[1] efi_reserve_boot_services()
[2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/pgtable_types.h |  8 ++--
 arch/x86/mm/pageattr.c   | 40 ++--
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..79aa79bb2cfa 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -564,8 +564,12 @@ extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-  unsigned numpages, unsigned long page_flags);
+extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
+ unsigned long address,
+ unsigned numpages,
+ unsigned long page_flags);
+extern int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+   unsigned long numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..1b1d5a68c4b2 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2111,8 +2111,8 @@ bool kernel_page_present(struct page *page)
 
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-   unsigned numpages, unsigned long page_flags)
+int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+  unsigned numpages, unsigned long page_flags)
 {
int retval = -EINVAL;
 
@@ -2126,6 +2126,8 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
.flags = 0,
};
 
+   WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
if (!(__supported_pte_mask & _PAGE_NX))
goto out;
 
@@ -2148,6 +2150,40 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
 }
 
 /*
+ * __flush_tlb_all() flushes mappings only on current CPU and hence this
+ * function shouldn't be used in an SMP environment. Presently, it's used only
+ * during boot (way before smp_init()) by EFI subsystem and hence is ok.
+ */
+int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+unsigned long numpages)
+{
+   int retval;
+
+   /*
+* The typical sequence for unmapping is to find a pte through
+* lookup_address_in_pgd() (ideally, it should never return NULL because
+* the address is already mapped) and change it's protections. As pfn is
+* the *target* of a mapping, it's not useful while unmapping.
+*/
+   struct cpa_data cpa = {
+   .vaddr  = &address,
+   .pfn= 0,
+   .pgd= pgd,
+   .numpages   = numpages,
+   .mask_set   = __pgprot(0),
+   .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+   .flags  = 0,
+   };
+
+   WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
+   retval = __change_page_attr_set_clr(&cpa, 0);
+   __flush_tlb_all();
+
+   return retval;
+}
+
+/*
  * 

[PATCH V3 3/3] x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

2018-11-04 Thread Sai Praneeth Prakhya
efi_<reserve/free>_boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya 
Acked-by: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h | 3 ---
 init/main.c | 4 
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
 
+   efi_free_boot_services();
+
/*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-   efi_free_boot_services();
-   }
-
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }
-- 
2.7.4



[PATCH V3 0/3] Unmap EFI boot services code/data regions after boot.

2018-11-04 Thread Sai Praneeth Prakhya
CC'ing x86 folks because this patch set touches x86/mm, in which I am no expert.

Ideally, after the kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. But this has been observed not to
hold on many x86 platforms. Hence, during boot, the kernel reserves EFI boot
services code/data regions [1] and maps [2] them to efi_pgd so that the call to
set_virtual_address_map() doesn't fail. After returning from
set_virtual_address_map(), the kernel frees the reserved regions [3] but they
still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any EFI
runtime service) would still be able to access EFI boot services code/data
regions, but the contents of these regions would long since have been
overwritten by someone else, as they are freed by efi_free_boot_services(). So,
it's important to unmap these regions. After unmapping EFI boot services
code/data regions, any illegal access by buggy firmware to these regions would
result in a page fault, which will be handled by the efi specific fault handler.
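
As a rough illustration of what "unmapping" amounts to here, the following
user-space sketch clears the present and writable bits in a small array of
page table entries by applying a clear mask and a set mask to each one. It is
only a model under stated assumptions: the SKETCH_* names, struct sketch_masks
and the sketch_*() helpers are made up for this example and are not kernel
interfaces. In the kernel the equivalent work is done by the change_page_attr
code in arch/x86/mm/pageattr.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions as in x86 page table entries: bit 0 present, bit 1 writable. */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 0)
#define SKETCH_PAGE_RW      (UINT64_C(1) << 1)

/* Hypothetical stand-in for the set/clear masks carried in struct cpa_data. */
struct sketch_masks {
	uint64_t mask_set;
	uint64_t mask_clr;
};

/* Apply the masks to one entry: clear the requested bits, then set bits. */
static uint64_t sketch_apply(uint64_t pte, const struct sketch_masks *m)
{
	return (pte & ~m->mask_clr) | m->mask_set;
}

/* "Unmap" numpages entries by clearing present and writable in each one. */
static void sketch_unmap(uint64_t *ptes, size_t numpages)
{
	const struct sketch_masks m = {
		.mask_set = 0,
		.mask_clr = SKETCH_PAGE_PRESENT | SKETCH_PAGE_RW,
	};

	for (size_t i = 0; i < numpages; i++)
		ptes[i] = sketch_apply(ptes[i], &m);
}

/* An access through an entry without the present bit raises a page fault. */
static int sketch_would_fault(uint64_t pte)
{
	return !(pte & SKETCH_PAGE_PRESENT);
}
```

Once sketch_unmap() has run over an entry, sketch_would_fault() returns
non-zero for it, which models the condition the efi fault handler is meant to
catch. The sketch deliberately leaves out TLB flushing and large page handling.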

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already handled
by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.
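
The idea behind that handling can be sketched in user space as follows: when a
non-zero entry is built without the present bit, its address field is stored
inverted, so a speculative read of the raw field does not name the real page
frame, and the reader undoes the inversion with the same mask. This is a
minimal model, not the kernel's code: the SKETCH_* constants, the assumed PFN
field (bits 12..51) and the sketch_*() helpers are all assumptions made for
illustration.

```c
#include <stdint.h>

#define SKETCH_PRESENT  (UINT64_C(1) << 0)
/* Assumed PFN field of a 64-bit entry: bits 12..51. */
#define SKETCH_PFN_MASK UINT64_C(0x000ffffffffff000)

/* A non-zero value without the present bit is stored with an inverted PFN. */
static int sketch_needs_invert(uint64_t val)
{
	return val != 0 && !(val & SKETCH_PRESENT);
}

/* Mask to XOR with a value to convert between stored and real PFN bits. */
static uint64_t sketch_protnone_mask(uint64_t val)
{
	return sketch_needs_invert(val) ? ~UINT64_C(0) : 0;
}

/* Build an entry from a page frame number and protection bits. */
static uint64_t sketch_make_entry(uint64_t pfn, uint64_t prot)
{
	uint64_t addr = pfn << 12;

	addr ^= sketch_protnone_mask(prot);
	addr &= SKETCH_PFN_MASK;
	return addr | prot;
}

/* Recover the real page frame number from a stored entry. */
static uint64_t sketch_entry_pfn(uint64_t entry)
{
	uint64_t addr = entry ^ sketch_protnone_mask(entry);

	return (addr & SKETCH_PFN_MASK) >> 12;
}
```

With protection bits that still contain something other than the present bit
(as is the case after clearing only PAGE_PRESENT and PAGE_RW), the raw PFN
field of the non-present entry differs from the real frame number, while
sketch_entry_pfn() still returns the original value.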

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] Please see efi_free_boot_services()

Testing the patch set:
--
1. Download buggy firmware (which accesses boot regions even after kernel has
booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error messages
(i.e. kernel allows accesses to EFI boot services code/data regions even after
call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about buggy firmware,
efi_rts_wq being frozen, and runtime services being disabled forever.

Please note that this patch will change kernel's existing behavior for some EFI
runtime services but I think it's OK because kernel should have never allowed
those accesses in the first place.

Also please note that this patch set needs a lot of real-world testing, as I
have only tested it with OVMF.

Note:
-
Patch set based on "next" branch in efi tree.

Changes from V2 -> V3:
--
1. Explicitly set pfn to 0 in kernel_unmap_pages_in_pgd().
2. Add __init modifier to kernel_<map/unmap>_pages_in_pgd().
3. Warn if kernel_<map/unmap>_pages_in_pgd() are called after smp_init().
4. Split efi_unmap_pages() into a separate patch.

Changes from V1 -> V2:
--
1. Rewrite the cpa initializer in a more readable fashion.
2. Don't use cpa->pfn while unmapping, as it's not useful.
3. Unmap regions before freeing them up.
4. Fix spelling nits.

Sai Praneeth (3):
  x86/mm/pageattr: Introduce helper function to unmap EFI boot services
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

 arch/x86/include/asm/efi.h   |  2 ++
 arch/x86/include/asm/pgtable_types.h |  8 ++--
 arch/x86/mm/pageattr.c   | 40 ++--
 arch/x86/platform/efi/efi.c  |  2 ++
 arch/x86/platform/efi/quirks.c   | 25 ++
 include/linux/efi.h  |  3 ---
 init/main.c  |  4 
 7 files changed, 73 insertions(+), 11 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH V2 0/2] Unmap EFI boot services code/data regions after boot.

2018-10-26 Thread Sai Praneeth Prakhya
CC'ing x86 folks because this patch touches x86/mm, in which I am no expert.

[Copied from Patch 1]
Ideally, after the kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. But this has been observed not to
hold on many x86 platforms. Hence, during boot, the kernel reserves EFI boot
services code/data regions [1] and maps [2] them to efi_pgd so that the call to
set_virtual_address_map() doesn't fail. After returning from
set_virtual_address_map(), the kernel frees the reserved regions [3] but they
still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any EFI
runtime service) would still be able to access EFI boot services code/data
regions, but the contents of these regions would long since have been
overwritten by someone else, as they are freed by efi_free_boot_services(). So,
it's important to unmap these regions. After unmapping EFI boot services
code/data regions, any illegal access by buggy firmware to these regions would
result in a page fault, which will be handled by the efi specific fault handler.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already handled
by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Testing the patch set:
--
1. Download buggy firmware (which accesses boot regions even after kernel has
booted) from here [1].
2. Without the patch set, you shouldn't see any kernel warning/error messages
(i.e. kernel allows accesses to EFI boot services code/data regions even after
call to set_virtual_address_map()).
3. With the patch set, you should see a kernel warning about buggy firmware,
efi_rts_wq being frozen, and runtime services being disabled forever.

Please note that this patch will change kernel's existing behavior for some EFI
runtime services but I think it's OK because kernel should have never allowed
those accesses in the first place.

Also please note that this patch set needs a lot of real-world testing, as I
have only tested it with OVMF.

Note:
-
Patch set based on "next" branch in efi tree.

Changes from V1 -> V2:
--
1. Rewrite the cpa initializer in a more readable fashion.
2. Don't use cpa->pfn while unmapping, as it's not useful.
3. Unmap regions before freeing them up.
4. Fix spelling nits.

Sai Praneeth (2):
  x86/efi: Unmap EFI boot services code/data regions from efi_pgd
  x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

 arch/x86/include/asm/efi.h   |  2 ++
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c   | 26 ++
 arch/x86/platform/efi/efi.c  |  2 ++
 arch/x86/platform/efi/quirks.c   | 25 +
 include/linux/efi.h  |  3 ---
 init/main.c  |  4 
 7 files changed, 57 insertions(+), 7 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.19.1



[PATCH V2 2/2] x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

2018-10-26 Thread Sai Praneeth Prakhya
efi_<reserve/free>_boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h | 3 ---
 init/main.c | 4 
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
 
+   efi_free_boot_services();
+
/*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-   efi_free_boot_services();
-   }
-
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }
-- 
2.19.1



[PATCH V2 1/2] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-10-26 Thread Sai Praneeth Prakhya
Ideally, after kernel assumes control of the platform, firmware shouldn't
access EFI boot services code/data regions. But, it's noticed that this is
not so true in many x86 platforms. Hence, during boot, kernel reserves EFI
boot services code/data regions [1] and maps [2] them to efi_pgd so that
call to set_virtual_address_map() doesn't fail. After returning from
set_virtual_address_map(), kernel frees the reserved regions [3] but they
still remain mapped.

This means that any code running in the efi_pgd address space (e.g. any
EFI runtime service) would still be able to access EFI boot services
code/data regions, but the contents of these regions would long since have
been overwritten by someone else, as they are freed by
efi_free_boot_services(). So, it's important to unmap these regions. After
unmapping EFI boot services code/data regions, any illegal access by buggy
firmware to these regions would result in a page fault, which will be
handled by the efi specific fault handler.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c   | 26 ++
 arch/x86/platform/efi/quirks.c   | 25 +
 3 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..cda04ecf5432 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
   unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+unsigned long numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..248f16181bed 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,32 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
return retval;
 }
 
+int kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+ unsigned long numpages)
+{
+   int retval;
+
+   /*
+* The typical sequence for unmapping is to find a pte through
+* lookup_address_in_pgd() (ideally, it should never return NULL because
+* the address is already mapped) and change it's protections.
+* As pfn is the *target* of a mapping, it's not useful while unmapping.
+*/
+   struct cpa_data cpa = {
+   .vaddr  = &address,
+   .pgd= pgd,
+   .numpages   = numpages,
+   .mask_set   = __pgprot(0),
+   .mask_clr   = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+   .flags  = 0,
+   };
+
+   retval = __change_page_attr_set_clr(&cpa, 0);
+   __flush_tlb_all();
+
+   return retval;
+}
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..fb1c44b11235 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,24 @@ void __init efi_reserve_boot_services(void)
}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+   pgd_t *pgd = efi_mm.pgd;
+   u64 pa = md->phys_addr;
+   u64 va = md->virt_addr;
+
+   if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+   pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+   if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+   pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
phys_addr_t new_phys, new_size;
@@ -394,6 +412,13 @@ void __init efi_free_boot_services(void)
continue;
}
 
+   /*
+* Before calling 

[PATCH 2/2] x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86

2018-10-21 Thread Sai Praneeth Prakhya
efi_<reserve/free>_boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h | 3 ---
 init/main.c | 4 
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..93924a353e3b 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -994,6 +994,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
 
+   efi_free_boot_services();
+
/*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 845174e113ce..ed2058073385 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index 18f8f0140fa0..174fb14196cc 100644
--- a/init/main.c
+++ b/init/main.c
@@ -731,10 +731,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-   efi_free_boot_services();
-   }
-
/* Do the rest non-__init'ed, we're now alive */
rest_init();
 }
-- 
2.7.4



[PATCH 1/2] x86/efi: Unmap efi boot services code/data regions from efi_pgd

2018-10-21 Thread Sai Praneeth Prakhya
Ideally, after kernel assumes control of the platform, firmware shouldn't
access EFI Boot Services Code/Data regions. But, it's noticed that this
is not so true in many x86 platforms. Hence, during boot, kernel
reserves efi boot services code/data regions [1] and maps [2] them to
efi_pgd so that call to set_virtual_address_map() doesn't fail. After
returning from set_virtual_address_map(), kernel frees the reserved
regions [3] but they still remain mapped.

This means that any code running in the efi_pgd address space (e.g.
any efi runtime service) would still be able to access efi boot services
code/data regions, but the contents of these regions would long since
have been overwritten by someone else, as they are freed by
efi_free_boot_services(). So, it's important to unmap these regions.
After unmapping boot services code/data regions, any illegal access by
buggy firmware to these regions would result in a page fault, which will
be handled by the efi specific fault handler.

[1] Please see efi_reserve_boot_services()
[2] Please see efi_map_region() -> __map_region()
[3] Please see efi_free_boot_services()

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/pgtable_types.h |  2 ++
 arch/x86/mm/pageattr.c   | 21 +
 arch/x86/platform/efi/quirks.c   | 26 ++
 3 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index b64acb08a62b..796476f11151 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -566,6 +566,8 @@ extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
 extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
   unsigned numpages, unsigned long page_flags);
+extern int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+unsigned long numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 51a5a69ecac9..b88ed8e91790 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2147,6 +2147,27 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
return retval;
 }
 
+int kernel_unmap_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+ unsigned long numpages)
+{
+   int retval;
+
+   struct cpa_data cpa = {
+   .vaddr = &address,
+   .pfn = pfn,
+   .pgd = pgd,
+   .numpages = numpages,
+   .mask_set = __pgprot(0),
+   .mask_clr = __pgprot(_PAGE_PRESENT | _PAGE_RW),
+   .flags = 0,
+   };
+
+   retval = __change_page_attr_set_clr(&cpa, 0);
+   __flush_tlb_all();
+
+   return retval;
+}
+
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 669babcaf245..5a1ee9392fcf 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -370,6 +370,25 @@ void __init efi_reserve_boot_services(void)
}
 }
 
+/*
+ * Apart from having VA mappings for efi boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a catch for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+   pgd_t *pgd = efi_mm.pgd;
+   u64 pfn = md->phys_addr >> PAGE_SHIFT;
+
+   if (kernel_unmap_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages))
+   pr_err("Failed to unmap 1:1 mapping: PA 0x%llx -> VA 0x%llx!\n",
+  md->phys_addr, md->virt_addr);
+
+   if (kernel_unmap_pages_in_pgd(pgd, pfn, md->virt_addr, md->num_pages))
+   pr_err("Failed to unmap VA mapping: PA 0x%llx -> VA 0x%llx!\n",
+  md->phys_addr, md->virt_addr);
+}
+
 void __init efi_free_boot_services(void)
 {
phys_addr_t new_phys, new_size;
@@ -415,6 +434,13 @@ void __init efi_free_boot_services(void)
}
 
free_bootmem_late(start, size);
+
+   /*
+* Before calling set_virtual_address_map(), boot services
+* code/data regions were mapped as a catch for buggy firmware.
+* Unmap them from efi_pgd as they have already been freed.
+*/
+   efi_unmap_pages(md);
}
 
if (!num_entries)
-- 
2.7.4



[PATCH V6 0/2] Add efi page fault handler to recover from page

2018-09-11 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide an efi specific page fault handler which
recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled,
hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy and that this is not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
   through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.
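
The decision described above can be modeled in a few lines of user-space C.
This is only a sketch under stated assumptions: the enum values and
sketch_recover() are hypothetical names chosen for illustration (the "none"
identifier mirrors the one mentioned in the V5 changes below), and the real
handler performs the reboot or the workqueue freeze instead of returning a
value.

```c
/* Hypothetical identifiers for the runtime service that was in flight. */
enum sketch_rts_id {
	SKETCH_RTS_NONE,		/* no runtime service was running */
	SKETCH_RTS_GET_VARIABLE,	/* stands in for any other service */
	SKETCH_RTS_RESET_SYSTEM,
};

/* What the handler would do for a fault taken in the efi address space. */
enum sketch_action {
	SKETCH_NOT_EFI_FAULT,		/* leave it to the normal fault path */
	SKETCH_REBOOT_VIA_BIOS,
	SKETCH_FREEZE_WORKQUEUE,
};

/* Pick a recovery action based on which runtime service faulted. */
static enum sketch_action sketch_recover(enum sketch_rts_id in_flight)
{
	switch (in_flight) {
	case SKETCH_RTS_NONE:
		return SKETCH_NOT_EFI_FAULT;
	case SKETCH_RTS_RESET_SYSTEM:
		return SKETCH_REBOOT_VIA_BIOS;
	default:
		return SKETCH_FREEZE_WORKQUEUE;
	}
}
```

Keeping efi_reset_system() as the only special case reflects that a reboot was
already requested, so falling back to another reboot method is safe, whereas
for every other service the caller has to be parked instead.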

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
--
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
   was part of init/main.c file. As it is an architecture agnostic code,
   moved the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
--
1. Drop treating illegal access to EFI_BOOT_SERVICES_<CODE/DATA> regions
   separately from illegal accesses to other regions like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
   In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
   regions were handled by mapping the requested region to efi_pgd, but from
   V3 they are handled like illegal accesses to other regions, i.e. by
   freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
--
1. Drop saving original memory map passed by kernel. It also means less
   checks in efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Changes from V4 to V5:
--
1. Drop config option that enables efi page fault handler, instead make
   it default.
2. Call schedule() in an infinite loop to account for spurious wake ups.
3. Introduce "NONE" as an efi runtime service function identifier so that
   it could be used in efi_recover_from_page_fault() to check if the page
   fault was indeed triggered by an efi runtime service.

Changes from V5 to V6:
--
1. Thanks to 0-day for reporting build error when CONFIG_EFI is not
   enabled. Fixed it by calling efi page fault handler only when
   CONFIG_EFI is enabled.
2. Change return type of efi page fault handler from int to void. void
   return type should do (and int is not needed) because the efi page
   fault handler returns only upon a failure to handle page fault.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
by the firmware

 arch/x86/include/asm/efi.h  |  1 +
 arch/x86/mm/fault.c |  9 
 arch/x86/platform/efi/quirks.c  | 78 +
 drivers/firmware/efi/runtime-wrappers.c | 61 +++---
 include/linux/efi.h | 42 ++
 5 files changed, 147 insertions(+), 44 deletions(-)

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thom

[PATCH V6 2/2] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

2018-09-11 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. A buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens when the kernel is up and
running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 that the kernel
fails to handle, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shouts loudly that the firmware is buggy, making clear that this is
   not a kernel bug.
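
The recovery policy described above can be sketched as a pure decision
function in user space. This is a simplified model, not the handler from
the patch: the enum values, struct efi_state_sketch and recover_sketch()
are hypothetical, and the real handler acts on the decision (rebooting or
freezing the workqueue) instead of returning it.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the service that was running. */
enum rts_id { RTS_NONE, RTS_GET_VARIABLE, RTS_RESET_SYSTEM };

/* What the sketch decides to do about a page fault. */
enum recovery {
	NOT_AN_EFI_FAULT,	/* leave it to the normal fault path */
	REBOOT_VIA_BIOS,	/* the fault came from efi_reset_system() */
	FREEZE_WORKQUEUE,	/* any other runtime service */
};

/* Hypothetical bookkeeping, standing in for kernel state. */
struct efi_state_sketch {
	enum rts_id in_flight;
	bool runtime_services_enabled;
	bool call_aborted;
};

static enum recovery recover_sketch(struct efi_state_sketch *s)
{
	/* No runtime service was running: not a firmware fault. */
	if (s->in_flight == RTS_NONE)
		return NOT_AN_EFI_FAULT;

	/* A reset request cannot be retried through EFI; use BIOS. */
	if (s->in_flight == RTS_RESET_SYSTEM)
		return REBOOT_VIA_BIOS;

	/* Report an error to the caller and stop using runtime services. */
	s->call_aborted = true;
	s->runtime_services_enabled = false;
	return FREEZE_WORKQUEUE;
}
```

Keeping the decision separate from the action makes the three outcomes
easy to check in isolation.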

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |  1 +
 arch/x86/mm/fault.c |  9 
 arch/x86/platform/efi/quirks.c  | 78 +
 drivers/firmware/efi/runtime-wrappers.c |  8 
 include/linux/efi.h |  8 +++-
 5 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..eea40d52ca78 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -140,6 +140,7 @@ extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
+extern void efi_recover_from_page_fault(unsigned long phys_addr);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..fd636c82d3c1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* efi_recover_from_page_fault()*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* efi_recover_from_page_fault()*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could access regions which might page fault, try to
+* recover from such faults.
+*/
+   if (IS_ENABLED(CONFIG_EFI))
+   efi_recover_from_page_fault(address);
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..669babcaf245 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define EFI_MIN_RESERVE 5120
 
@@ -654,3 +655,80 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+/*
+ * If any access by any efi runtime service causes a page fault, then,
+ * 1. If it's efi_reset_system(), reboot through BIOS.
+ * 2. If any other efi runtime service, then
+ *a. Return error status to the efi caller process.
+ *b. Disable EFI Runtime Services forever and
+ *c. Freeze efi_rts_wq and schedule new process.
+ *
+ * @return: Returns, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+void efi_recover_from_page_fault(unsigned long phys_addr)
+{
+   if (!IS_ENABLED(CONFIG_X86_64))
+   return;
+
+   /*
+* Make sure that an efi runtime service caused the page fault.
+* "efi_mm" cannot be used to check if the page fault had occurred
+* in the firmware context because efi=old_map doesn't use efi_pgd.
+*/
+   if (efi_rts_work.efi_rts_id == NONE)
+   return;
+
+   /*
+* Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+* page faulting on these addresses isn't expected.
+*/
+   if 

[PATCH V6 1/2] efi: Make efi_rts_work accessible to efi page fault handler

2018-09-11 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

After the kernel has booted, if any access by the firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules
a new process. To do this, the efi page fault handler needs
efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure because
all the calls to efi runtime services are already serialized.
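
To illustrate the trade-off, here is a minimal user-space sketch that
contrasts the two ways of reaching the work item: through container_of()
on an embedded member, as the removed lines in efi_call_rts() did, and
through a single shared object, as this patch does. The macro is a
simplified version of the kernel's, and struct work_sketch,
struct rts_work_sketch and both accessor functions are hypothetical.

```c
#include <stddef.h>

/* Simplified user-space container_of(); the kernel macro adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct work_struct. */
struct work_sketch {
	int pending;
};

/* Stand-in for struct efi_runtime_work, reduced to two fields. */
struct rts_work_sketch {
	int efi_rts_id;
	struct work_sketch work;
};

/* Before: only code holding the embedded member can find the item. */
static int id_via_container_of(struct work_sketch *work)
{
	struct rts_work_sketch *item =
		container_of(work, struct rts_work_sketch, work);

	return item->efi_rts_id;
}

/* After: one shared item that unrelated code, such as a fault handler,
 * can read without being handed a pointer. */
static struct rts_work_sketch shared_rts_work;

static int id_via_global(void)
{
	return shared_rts_work.efi_rts_id;
}
```

A single shared object is only safe while at most one call is in flight,
which is the serialization guarantee the commit message relies on.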

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ 

[PATCH V5 2/2] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

2018-09-10 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. A buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens when the kernel is up and
running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 that the kernel
fails to handle, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shouts loudly that the firmware is buggy, making clear that this is
   not a kernel bug.

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |  1 +
 arch/x86/mm/fault.c |  9 
 arch/x86/platform/efi/quirks.c  | 78 +
 drivers/firmware/efi/runtime-wrappers.c |  8 
 include/linux/efi.h |  8 +++-
 5 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..c1a655f099ef 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -140,6 +140,7 @@ extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
+extern int efi_recover_from_page_fault(unsigned long phys_addr);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..cc2a2e3a4095 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* efi_recover_from_page_fault()*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* efi_recover_from_page_fault()*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could access regions which might page fault, try to
+* recover from such faults.
+*/
+   if (efi_recover_from_page_fault(address))
+   return;
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..3920ae8cab2a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define EFI_MIN_RESERVE 5120
 
@@ -654,3 +655,80 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+/*
+ * If any access by any efi runtime service causes a page fault, then,
+ * 1. If it's efi_reset_system(), reboot through BIOS.
+ * 2. If any other efi runtime service, then
+ *a. Return error status to the efi caller process.
+ *b. Disable EFI Runtime Services forever and
+ *c. Freeze efi_rts_wq and schedule new process.
+ *
+ * @return: Returns 0, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+int efi_recover_from_page_fault(unsigned long phys_addr)
+{
+   if (!IS_ENABLED(CONFIG_X86_64))
+   return 0;
+
+   /*
+* Make sure that an efi runtime service caused the page fault.
+* "efi_mm" cannot be used to check if the page fault had occurred
+* in the firmware context because efi=old_map doesn't use efi_pgd.
+*/
+   if (efi_rts_work.efi_rts_id == NONE)
+   return 0;
+
+   /*
+* Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+* page faulting on these addresses isn't expected.
+*/
+   if (phys_addr

[PATCH V5 0/2] Add efi page fault handler to recover from page

2018-09-10 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide an efi specific page fault handler which
recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled,
hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy, making clear that this is
   not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
   through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
--
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
   was part of init/main.c file. As it is an architecture agnostic code,
   moved the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
--
1. Drop treating illegal access to EFI_BOOT_SERVICES_<CODE/DATA> regions
   separately from illegal accesses to other regions like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
   In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
   regions were handled by mapping the requested region to efi_pgd, but from
   V3 they are handled like illegal accesses to other regions, i.e. by
   freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
--
1. Drop saving original memory map passed by kernel. It also means less
   checks in efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Changes from V4 to V5:
--
1. Drop config option that enables efi page fault handler, instead make
   it default.
2. Call schedule() in an infinite loop to account for spurious wake ups.
3. Introduce "NONE" as an efi runtime service function identifier so that
   it could be used in efi_recover_from_page_fault() to check if the page
   fault was indeed triggered by an efi runtime service.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
by the firmware

 arch/x86/include/asm/efi.h  |  1 +
 arch/x86/mm/fault.c |  9 
 arch/x86/platform/efi/quirks.c  | 78 +
 drivers/firmware/efi/runtime-wrappers.c | 61 +++---
 include/linux/efi.h | 42 ++
 5 files changed, 147 insertions(+), 44 deletions(-)

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH V5 1/2] efi: Make efi_rts_work accessible to efi page fault handler

2018-09-10 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

After the kernel has booted, if any access by the firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules
a new process. To do this, the efi page fault handler needs
efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure because
all the calls to efi runtime services are already serialized.

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ 

[PATCH V4 2/3] x86/efi: Add efi page fault handler to recover from page faults caused by the firmware

2018-09-06 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. A buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens when the kernel is up and
running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0 that the kernel
fails to handle, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which recovers from such faults as
follows:
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The efi page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shouts loudly that the firmware is buggy, making clear that this is
   not a kernel bug.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |  9 +
 arch/x86/mm/fault.c |  9 +
 arch/x86/platform/efi/quirks.c  | 70 +
 drivers/firmware/efi/runtime-wrappers.c |  7 
 include/linux/efi.h |  1 +
 5 files changed, 96 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..afb1c80182f2 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,15 @@ extern int __init efi_reuse_config(u64 tables, int 
nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER
+extern int efi_recover_from_page_fault(unsigned long phys_addr);
+#else
+static inline int efi_recover_from_page_fault(unsigned long phys_addr)
+{
+   return 0;
+}
+#endif /* CONFIG_EFI_PAGE_FAULT_HANDLER */
+
 struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..cc2a2e3a4095 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* efi_recover_from_page_fault()*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* efi_recover_from_page_fault()*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could access regions which might page fault, try to
+* recover from such faults.
+*/
+   if (efi_recover_from_page_fault(address))
+   return;
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..853742aba209 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define EFI_MIN_RESERVE 5120
 
@@ -654,3 +655,72 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_PAGE_FAULT_HANDLER
+
+/*
+ * If any access by any efi runtime service causes a page fault, then,
+ * 1. If it's efi_reset_system(), reboot through BIOS.
+ * 2. If any other efi runtime service, then
+ *a. Freeze efi_rts_wq.
+ *b. Return error status to the efi caller process.
+ *c. Disable EFI Runtime Services forever and
+ *d. Schedule another process by explicitly calling scheduler.
+ *
+ * @return: Returns 0, if the page fault is not handled. This function
+ * will never return if the page fault is handled successfully.
+ */
+int efi_recover_from_page_fault(unsigned long phys_addr)
+{
+   /* Recover from page faults caused *only* by the firmware */
+   if (current->active_mm != &efi_mm)
+   return 0;
+
+   /*
+* Address range 0x0000 - 0x0fff is always mapped in the efi_pgd, so
+* page faulting on these addresses isn't expected.
+*/
+   if (phys_addr >= 0x0000 && phys_addr <= 0x0fff)
+  

[PATCH V4 0/3] Add efi page fault handler to recover from page

2018-09-06 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option which, when enabled,
recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled,
hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loudly that the firmware is buggy, making clear that this is
   not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
   through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
--
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
   was part of init/main.c file. As it is an architecture agnostic code,
   moved the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
--
1. Drop treating illegal access to EFI_BOOT_SERVICES_<CODE/DATA> regions
   separately from illegal accesses to other regions like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
   In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
   regions were handled by mapping the requested region to efi_pgd, but from
   V3 they are handled like illegal accesses to other regions, i.e. by
   freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
--
1. Drop saving original memory map passed by kernel. It also means less
   checks in efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (3):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
by the firmware
  x86/efi: Introduce EFI_PAGE_FAULT_HANDLER

 arch/x86/Kconfig| 18 +
 arch/x86/include/asm/efi.h  |  9 +
 arch/x86/mm/fault.c |  9 +
 arch/x86/platform/efi/quirks.c  | 70 +
 drivers/firmware/efi/runtime-wrappers.c | 60 
 include/linux/efi.h | 37 +
 6 files changed, 159 insertions(+), 44 deletions(-)

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH V4 1/3] efi: Make efi_rts_work accessible to efi page fault handler

2018-09-06 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

After the kernel has booted, if any access by the firmware causes a page
fault, the efi page fault handler freezes efi_rts_wq and schedules
a new process. To do this, the efi page fault handler needs
efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure because
all the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ * @efi_rts_id:  

[PATCH V4 3/3] x86/efi: Introduce EFI_PAGE_FAULT_HANDLER

2018-09-06 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that might
access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
after the kernel has assumed control of the platform. This violates the
UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to recover
from the page fault triggered by the firmware so that the machine
doesn't hang.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..cc840710ae3e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,24 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config EFI_PAGE_FAULT_HANDLER
+   bool "EFI page fault handler support" if EXPERT
+   depends on EFI
+   help
+ Enable this debug feature so that the kernel can recover from page
+ faults caused by buggy firmware. Also,
+ 1. If the page fault is caused by efi_reset_system(), then the
+platform is rebooted through BIOS.
+ 2. If the page fault is caused by any other efi runtime service,
+then the kernel freezes efi_rts_wq (work queue that runs efi
+runtime services) and schedules a new process. Also, it disables
+EFI Runtime Services, so that it will never again call buggy
+firmware.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
-- 
2.7.4



[PATCH V3 2/5] efi: Introduce __efi_init attribute

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Buggy firmware could illegally access some efi regions even after the
kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault
handler will detect and recover from these illegal accesses.
efi_md_typeattr_format() and memory_type_name are used by the efi page
fault handler to print information about the memory descriptor that was
illegally accessed. Because the page fault handler must remain available
after kernel boot, it doesn't have an __init attribute, but
efi_md_typeattr_format() does. This mismatch produces the build warning
"WARNING: modpost: Found * section mismatch(es)". To fix it, remove the
__init attribute from efi_md_typeattr_format().

In order to not keep efi_md_typeattr_format() and memory_type_name
needlessly when "CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add
a new __efi_init attribute whose value changes based on whether the
config option is selected or not.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/efi.c |  4 ++--
 include/linux/efi.h| 14 +-
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index d8a33a781a57..16571429b19c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -768,7 +768,7 @@ int __init efi_get_fdt_params(struct efi_fdt_params *params)
 }
 #endif /* CONFIG_EFI_PARAMS_FROM_FDT */
 
-static __initdata char memory_type_name[][20] = {
+static __efi_initdata char memory_type_name[][20] = {
"Reserved",
"Loader Code",
"Loader Data",
@@ -786,7 +786,7 @@ static __initdata char memory_type_name[][20] = {
"Persistent Memory",
 };
 
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
 const efi_memory_desc_t *md)
 {
char *pos;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 855992b15269..6a07e3166fd1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1107,10 +1107,22 @@ extern int efi_memattr_apply_permissions(struct 
mm_struct *mm,
for_each_efi_memory_desc_in_map(&efi.memmap, md)
 
 /*
+ * __efi_init - if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is enabled, remove __init
+ * modifier.
+ */
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+#define __efi_init
+#define __efi_initdata
+#else
+#define __efi_init __init
+#define __efi_initdata __initdata
+#endif
+
+/*
  * Format an EFI memory descriptor's type and attributes to a user-provided
  * character buffer, as per snprintf(), and return the buffer.
  */
-char * __init efi_md_typeattr_format(char *buf, size_t size,
+char * __efi_init efi_md_typeattr_format(char *buf, size_t size,
 const efi_memory_desc_t *md);
 
 /**
-- 
2.7.4



[PATCH V3 3/5] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

The efi page fault handler that recovers from page faults caused by the
firmware needs the original memory map passed by the firmware. It looks
up this memory map to find the type of the memory region at which the
page fault occurred. Presently, the EFI subsystem discards the original
memory map passed by the firmware and replaces it with a new memory map
that has only EFI_RUNTIME_SERVICES_<CODE/DATA> regions. But illegal
accesses by the firmware can occur in any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the
original memory map passed by the firmware, so that the efi page fault
handler can detect and recover from illegal accesses to *any* efi region.
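
The backup described above boils down to copying every descriptor, one
stride of desc_size at a time, into separately allocated memory. Below is
a minimal userspace sketch of that idea. The names struct demo_memmap and
demo_save_memmap() are hypothetical, and malloc() stands in for the
kernel's efi_memmap_alloc()/memremap() pair, so this is an illustration
under those assumptions rather than the patch's code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct efi_memory_map. */
struct demo_memmap {
	void *map;        /* first descriptor */
	void *map_end;    /* one byte past the last descriptor */
	size_t nr_map;    /* number of descriptors */
	size_t desc_size; /* stride between descriptors, as reported by firmware */
};

/*
 * Copy every descriptor of @src into freshly allocated memory and describe
 * the copy in @dst. Returns 0 on success, -1 if the allocation fails.
 */
int demo_save_memmap(const struct demo_memmap *src, struct demo_memmap *dst)
{
	size_t size = src->desc_size * src->nr_map;
	const char *old_md = src->map;
	char *new_md;
	size_t i;

	new_md = malloc(size);
	if (!new_md)
		return -1;

	dst->map = new_md;
	/* Walk both maps with the same stride, one descriptor per step. */
	for (i = 0; i < src->nr_map; i++) {
		memcpy(new_md, old_md, src->desc_size);
		new_md += src->desc_size;
		old_md += src->desc_size;
	}

	dst->nr_map = src->nr_map;
	dst->desc_size = src->desc_size;
	dst->map_end = (char *)dst->map + size;
	return 0;
}
```

The stride is taken from desc_size rather than from sizeof() of a
descriptor type because firmware may report descriptors larger than the
structure the kernel knows about.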

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  6 ++
 arch/x86/platform/efi/efi.c|  2 ++
 arch/x86/platform/efi/quirks.c | 48 ++
 3 files changed, 56 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..788ed4cbce22 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,12 @@ extern int __init efi_reuse_config(u64 tables, int 
nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
+
 struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..7a3ea4cd5939 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 
pa = __pa(new_memmap);
 
+   efi_save_original_memmap();
+
/*
 * Unregister the early EFI memmap from efi_init() and install
 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..36b0b042ba56 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,51 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The efi page fault handler that recovers from page faults caused by
+ * buggy firmware needs original memory map passed by firmware. Hence,
+ * build a new EFI memmap that has all entries and save it for later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+   efi_memory_desc_t *md;
+   void *remapped_phys, *new_md;
+   phys_addr_t new_phys, new_size;
+
+   new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+   new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+   if (!new_phys) {
+   pr_err("Failed to allocate new EFI memmap\n");
+   return;
+   }
+
+   remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+   if (!remapped_phys) {
+   pr_err("Failed to remap new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), 
get_order(new_size));
+   return;
+   }
+
+   new_md = remapped_phys;
+   for_each_efi_memory_desc(md) {
+   memcpy(new_md, md, efi.memmap.desc_size);
+   new_md += efi.memmap.desc_size;
+   }
+
+   original_memory_map.late = 1;
+   original_memory_map.phys_map = new_phys;
+   original_memory_map.map = remapped_phys;
+   original_memory_map.nr_map = efi.memmap.nr_map;
+   original_memory_map.desc_size = efi.memmap.desc_size;
+   original_memory_map.map_end = remapped_phys + new_size;
+   original_memory_map.desc_version = efi.memmap.desc_version;
+
+   original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
-- 
2.7.4



[PATCH V3 1/5] efi: Make efi_rts_work accessible to efi page fault handler

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

After the kernel has booted, if the firmware accesses *any* efi regions
other than EFI_RUNTIME_SERVICES_<CODE/DATA>, the efi page fault handler
freezes efi_rts_wq and schedules a new process. To do this, the efi
page fault handler needs efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure, because
all the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ 

[PATCH V3 4/5] x86/efi: Add efi page fault handler to recover from the page faults caused by firmware

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory regions except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. A buggy
firmware could trigger these illegal accesses when an efi runtime
service is invoked, and if this happens when the kernel is up and
running, the kernel hangs.

The kernel hangs because the memory region requested by the firmware isn't
mapped in efi_pgd, which causes a page fault in ring 0, and the kernel
fails to handle it, leading to die(). To save the kernel from hanging, add
an efi specific page fault handler which detects illegal accesses by the
firmware. If the access is to any region other than
EFI_RUNTIME_SERVICES_<CODE/DATA>, then
1. The efi page fault handler freezes efi_rts_wq and schedules a new
   process.
2. If the efi runtime service is efi_reset_system(), then the efi page
   fault handler will reboot the machine through BIOS and not through
   efi_reset_system().

The efi specific page fault handler offers us two advantages:
1. Recovers from potential hangs that could be caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.
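
To classify a fault, the handler first has to find which memory
descriptor covers the faulting physical address. The following is a
minimal userspace sketch of that range lookup. struct demo_md and
demo_get_md() are hypothetical, simplified stand-ins (a plain array
replaces the saved EFI memory map), so treat it as an illustration of the
technique and not as the kernel implementation:

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* EFI pages are 4 KiB */

/* Hypothetical, simplified stand-in for efi_memory_desc_t. */
struct demo_md {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

/*
 * Return the descriptor whose half-open range
 * [phys_addr, phys_addr + (num_pages << DEMO_PAGE_SHIFT)) contains @addr,
 * or NULL if no descriptor covers it.
 */
const struct demo_md *demo_get_md(const struct demo_md *map, size_t nr,
				  uint64_t addr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t start = map[i].phys_addr;
		uint64_t end = start + (map[i].num_pages << DEMO_PAGE_SHIFT);

		if (start <= addr && addr < end)
			return &map[i];
	}
	return NULL;
}
```

A NULL result means the address is not described by the memory map at
all, so such a fault would not be attributed to an EFI region.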

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   5 ++
 arch/x86/mm/fault.c |   9 ++
 arch/x86/platform/efi/quirks.c  | 140 
 drivers/firmware/efi/runtime-wrappers.c |   7 ++
 include/linux/efi.h |   1 +
 5 files changed, 162 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 788ed4cbce22..f3d9c3c2359e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,8 +143,13 @@ extern void efi_switch_mm(struct mm_struct *mm);
 
 #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
 extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr);
 #else
 static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr)
+{
+   return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
 
 struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..4f6939d8e13f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* fixup for buggy UEFI firmware*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* fixup for buggy UEFI firmware*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could trigger illegal accesses to some EFI regions
+* which might page fault, try to recover from such faults.
+*/
+   if (efi_illegal_accesses_fixup(address))
+   return;
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36b0b042ba56..2aba28a90800 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define EFI_MIN_RESERVE 5120
 
@@ -701,4 +702,143 @@ void __init efi_save_original_memmap(void)
 
original_memory_map_present = true;
 }
+
+/*
+ * From the original EFI memory map passed by the firmware, return a
+ * pointer to the memory descriptor that describes the given physical
+ * address. If not found, return NULL.
+ */
+static efi_memory_desc_t *efi_get_md(unsigned long phys_addr)
+{
+   efi_memory_desc_t *md;
+
+   for_each_efi_memory_desc_in_map(&original_memory_map, md) {
+   if (md->phys_addr <= phys_addr &&
+   (phys_addr < (md->phys_addr +
+   (md->num_pages << EFI_PAGE_SHIFT)))) {
+   return md;
+   }
+   }
+   return NULL;
+}
+
+/*
+ * Detect illegal access by the firmware and if the illegally accessed
+ * region is any region described by efi memory map and other than
+ * EFI_RUNTIME_SERVICES_, then
+ * 1. If the efi runtime service is efi_reset_system(), then reboot
+ *th

[PATCH V3 0/5] Add efi page fault handler to detect and recover

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option which, when enabled,
detects and recovers from page faults caused by buggy firmware.

These illegal accesses trigger a page fault in ring 0 because
firmware executes at ring 0, and if the fault is unhandled it hangs the
kernel. Provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

Upon detecting that the illegally accessed region is any region other
than EFI_RUNTIME_SERVICES_<CODE/DATA>, the efi page fault handler will
check if the access is by efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
   through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.
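
The decision described above can be sketched as a small pure function.
This is only an illustration: enum demo_action and demo_fixup() are
hypothetical names, and returning "not handled" for accesses to runtime
services regions is an assumption about how legal accesses are left to
the normal fault path. The two memory type values are the ones the UEFI
specification assigns to runtime services code and data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory type values as numbered in the UEFI specification. */
enum demo_mem_type {
	DEMO_RUNTIME_SERVICES_CODE = 5,
	DEMO_RUNTIME_SERVICES_DATA = 6,
};

/* Hypothetical outcomes of the efi page fault handler. */
enum demo_action {
	DEMO_NOT_HANDLED,	/* assumed: leave to the normal fault path */
	DEMO_REBOOT_VIA_BIOS,	/* fault happened inside efi_reset_system() */
	DEMO_FREEZE_RTS_WQ,	/* freeze efi_rts_wq, schedule a new process */
};

/*
 * @mem_type:        type of the region that was accessed
 * @in_reset_system: true if the faulting runtime service is
 *                   efi_reset_system()
 */
enum demo_action demo_fixup(uint32_t mem_type, bool in_reset_system)
{
	/* Accesses to runtime services regions are legal after boot. */
	if (mem_type == DEMO_RUNTIME_SERVICES_CODE ||
	    mem_type == DEMO_RUNTIME_SERVICES_DATA)
		return DEMO_NOT_HANDLED;

	if (in_reset_system)
		return DEMO_REBOOT_VIA_BIOS;

	return DEMO_FREEZE_RTS_WQ;
}
```

The reset case gets its own outcome because a reboot request that faulted
cannot simply be abandoned: the machine still has to restart, just not
through the buggy firmware path.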

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
--
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
   was part of init/main.c file. As it is an architecture agnostic code,
   moved the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
--
1. Drop treating illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> regions
   separately from illegal accesses to other regions like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>.
   In previous versions, illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
   regions were handled by mapping the requested region to efi_pgd, but from
   V3 they are handled like illegal accesses to other regions, i.e. by
   freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (5):
  efi: Make efi_rts_work accessible to efi page fault handler
  efi: Introduce __efi_init attribute
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
  x86/efi: Add efi page fault handler to recover from the page faults   
 caused by firmware
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS

 arch/x86/Kconfig|  17 +++
 arch/x86/include/asm/efi.h  |  11 ++
 arch/x86/mm/fault.c |   9 ++
 arch/x86/platform/efi/efi.c |   2 +
 arch/x86/platform/efi/quirks.c  | 188 
 drivers/firmware/efi/efi.c  |   4 +-
 drivers/firmware/efi/runtime-wrappers.c |  60 +++---
 include/linux/efi.h |  51 -
 8 files changed, 295 insertions(+), 47 deletions(-)

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH V3 5/5] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS

2018-09-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that might
access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
after the kernel has assumed control of the platform. This violates the
UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to recover
from the page fault triggered by the firmware so that the machine
doesn't hang.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/Kconfig | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..7dc270c17d0b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,23 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config EFI_WARN_ON_ILLEGAL_ACCESS
+   bool "Warn about illegal memory accesses by firmware" if EXPERT
+   depends on EFI
+   help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is any region other than
+EFI_RUNTIME_SERVICES_, then the kernel freezes
+efi_rts_wq and schedules a new process. Also, it disables EFI
+Runtime Services, so that it will never again call buggy firmware.
+ 2. If the illegal access is by efi_reset_system(), then the
+platform is rebooted through BIOS.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
-- 
2.7.4



[PATCH V2 5/6] x86/mm: If in_atomic(), allocate pages without sleeping

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

A page fault occurs when any EFI Runtime Service tries to reference a
memory region which it shouldn't. If the illegally accessed region
is EFI_BOOT_SERVICES_<CODE/DATA>, the efi specific page fault handler
fixes it up by dynamically creating VA->PA mappings using
efi_map_region().

Originally, efi_map_region(), and hence the functionality of creating
mappings for efi regions, was intended to be used *only* during boot time
(please note the __init modifier). When it is called during runtime (i.e.
from the efi page fault handler), the page allocator complains. It
complains because the "gfp_allowed_mask" value changes from boot time to
runtime (GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though
efi_map_region() ends up in alloc_pte_page()/alloc_pmd_page() with
GFP_KERNEL, the page allocator doesn't complain because the
"__GFP_RECLAIM" flag is cleared by "gfp_allowed_mask". During runtime it
isn't cleared, and hence the kernel prints the stack trace below.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last  enabled at (45713): [] 
restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [] error_entry+0x7c/0x100
softirqs last  enabled at (44732): [] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack+0x5e/0x8b
___might_sleep+0x20c/0x240
__alloc_pages_nodemask+0xc2/0x330
get_zeroed_page+0x12/0x40
alloc_pmd_page+0x13/0x50
populate_pmd+0xc0/0x2e0
? __lock_acquire+0x439/0x740
__cpa_process_fault+0x2e1/0x5d0
__change_page_attr_set_clr+0x7c3/0xcd0
? console_unlock+0x34d/0x660
? kernel_map_pages_in_pgd+0x8c/0x160
kernel_map_pages_in_pgd+0x8c/0x160
? printk+0x43/0x4b
? __map_region+0x3c/0x60
__map_region+0x3c/0x60
efi_map_region+0x83/0xd0
efi_illegal_accesses_fixup+0x1ca/0x1e0
no_context+0x112/0x390
__do_page_fault+0xc7/0x4f0
page_fault+0x1e/0x30
RIP: 0010:0xfffeffc7ccf1
RSP: 0018:c975bbf0 EFLAGS: 00010282
RAX: 0048 RBX: c975be10 RCX: c975bad0
RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf
RBP: c975bdc8 R08: 0048 R09: 0048
R10: 03fd R11: 03f8 R12: 880032a92d80
R13: 0003 R14: 7ffcf1eb9d50 R15: 
? efi_call+0xd1/0x160
? __lock_acquire+0x439/0x740
? _raw_spin_unlock+0x24/0x30
? virt_efi_get_next_high_mono_count+0x77/0xf0
? efi_test_ioctl+0x1ab/0xc20
? selinux_file_ioctl+0x122/0x1c0
? do_vfs_ioctl+0x92/0x6b0
? do_vfs_ioctl+0x92/0x6b0
? security_file_ioctl+0x3c/0x50
? selinux_capable+0x20/0x20
? ksys_ioctl+0x66/0x70
? __x64_sys_ioctl+0x16/0x20
? do_syscall_64+0x50/0x170
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the above warning by conditionally changing the allocation from
GFP_KERNEL to GFP_ATOMIC, so that the efi page fault handler can use
efi_map_region() during runtime. This change shouldn't affect any other
generic page allocations because this allocation is used only by efi
functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

if (cpa->pgd) {
/*
 * Right now, we only execute this code path when mapping
 * the EFI virtual memory map regions, no other users
 * provide a ->pgd value. This may change in the future.
 */
return populate_pgd(cpa, vaddr);
}
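
The interaction between the requested flags and "gfp_allowed_mask" can be
modelled in a few lines of userspace C. This is a simplified sketch under
stated assumptions: the DEMO_* bit values are illustrative and are NOT
the kernel's definitions, and demo_may_sleep()/demo_pick_gfp() are
hypothetical helpers that only mirror the reasoning in this commit
message.

```c
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's definitions. */
#define DEMO_GFP_DIRECT_RECLAIM	0x01u
#define DEMO_GFP_KSWAPD_RECLAIM	0x02u
#define DEMO_GFP_IO		0x04u
#define DEMO_GFP_FS		0x08u
#define DEMO_GFP_HIGH		0x10u

#define DEMO_GFP_RECLAIM \
	(DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL	(DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_ATOMIC	(DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)

/* Runtime mask allows every bit; boot mask strips reclaim, IO and FS. */
#define DEMO_GFP_BITS_MASK	0x1fu
#define DEMO_GFP_BOOT_MASK \
	(DEMO_GFP_BITS_MASK & ~(DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS))

/* An allocation may sleep only if direct reclaim survives the mask. */
bool demo_may_sleep(unsigned int gfp, unsigned int allowed_mask)
{
	return ((gfp & allowed_mask) & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* Same choice as the patch: no sleeping allocation in atomic context. */
unsigned int demo_pick_gfp(bool in_atomic)
{
	return in_atomic ? DEMO_GFP_ATOMIC : DEMO_GFP_KERNEL;
}
```

In this model a GFP_KERNEL-style request cannot sleep while the boot mask
is in effect but can once the runtime mask is installed, which is why the
same call only triggers the warning when it is made from the page fault
handler.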

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/mm/pageattr.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..1b28a333c8ce 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long 
start, unsigned long end)
 
 static int alloc_pte_page(pmd_t *pmd)
 {
-   pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   pte_t *pte;
+
+   if (in_atomic())
+   pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pte)
return -1;
 
@@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd)
 
 static int alloc_pmd_page(pud_t *pud)
 {
-   pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   pmd_t *pmd;
+
+   if (in_atomic())
+   pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pmd)
return -1;
 
-- 
2.7.4



[PATCH V2 1/6] efi: Make efi_rts_work accessible to efi page fault handler

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

If the firmware illegally accesses any efi region other than
EFI_BOOT_SERVICES_CODE/DATA, the efi page fault handler freezes
efi_rts_wq and schedules a new process. To do this, the efi page fault
handler needs efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure, because
all the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ * @efi_rts_id:  

[PATCH V2 3/6] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

The efi page fault handler that fixes up page faults caused by the
firmware needs the original memory map passed by the firmware. It looks
up this memory map to find the type of the memory region at which the
page fault occurred. Presently, the EFI subsystem discards the original
memory map passed by the firmware and replaces it with a new memory map
that has only EFI_RUNTIME_SERVICES_CODE/DATA regions. But illegal
accesses by firmware can occur in any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is defined, create a backup of the
original memory map passed by the firmware, so that the efi page fault
handler can detect/fix illegal accesses to *any* efi region.
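
As a rough userspace model of the backup and the later lookup: copy every
descriptor using the firmware-provided stride, then find the descriptor
that covers a faulting physical address. struct mem_desc, struct mem_map,
save_memmap() and lookup_desc() are simplified, hypothetical names and are
not the kernel's efi_memory_desc_t or efi_memory_map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SHIFT_4K 12

/* Simplified descriptor; the firmware's real entries may be larger. */
struct mem_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
};

struct mem_map {
	void *map;		/* first descriptor */
	size_t nr_map;		/* number of descriptors */
	size_t desc_size;	/* stride between descriptors, in bytes */
};

/* Copy every descriptor of src into a freshly allocated backup. */
static int save_memmap(const struct mem_map *src, struct mem_map *dst)
{
	size_t size = src->nr_map * src->desc_size;
	void *copy = malloc(size);

	if (!copy)
		return -1;
	memcpy(copy, src->map, size);
	dst->map = copy;
	dst->nr_map = src->nr_map;
	dst->desc_size = src->desc_size;
	return 0;
}

/* Return the descriptor covering phys, or NULL if none does. */
static const struct mem_desc *lookup_desc(const struct mem_map *m,
					  uint64_t phys)
{
	const char *p = m->map;

	/* Step by desc_size, not sizeof(struct mem_desc). */
	for (size_t i = 0; i < m->nr_map; i++, p += m->desc_size) {
		const struct mem_desc *md = (const struct mem_desc *)p;
		uint64_t end = md->phys_addr +
			       (md->num_pages << PAGE_SHIFT_4K);

		if (phys >= md->phys_addr && phys < end)
			return md;
	}
	return NULL;
}
```

Walking by desc_size rather than by the structure size matters because the
firmware is allowed to report descriptors larger than the structure the OS
knows about.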

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  6 ++
 arch/x86/platform/efi/efi.c|  2 ++
 arch/x86/platform/efi/quirks.c | 49 ++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 9b70743400f3..d9e5d9a6d138 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int 
nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
+
 struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 439c2c40bf03..7d18b7ed5d41 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 
pa = __pa(new_memmap);
 
+   efi_save_original_memmap();
+
/*
 * Unregister the early EFI memmap from efi_init() and install
 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..7fd53fa8c4dd 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The page fault handler that fixes up page faults caused by buggy
+ * firmware needs original memory map (memory map passed by firmware).
+ * Hence, build a new EFI memmap that has *all* entries and save it for
+ * later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+   efi_memory_desc_t *md;
+   void *remapped_phys, *new_md;
+   phys_addr_t new_phys, new_size;
+
+   new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+   new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+   if (!new_phys) {
+   pr_err("Failed to allocate new EFI memmap\n");
+   return;
+   }
+
+   remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+   if (!remapped_phys) {
+   pr_err("Failed to remap new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), 
get_order(new_size));
+   return;
+   }
+
+   new_md = remapped_phys;
+   for_each_efi_memory_desc(md) {
+   memcpy(new_md, md, efi.memmap.desc_size);
+   new_md += efi.memmap.desc_size;
+   }
+
+   original_memory_map.late = 1;
+   original_memory_map.phys_map = new_phys;
+   original_memory_map.map = remapped_phys;
+   original_memory_map.nr_map = efi.memmap.nr_map;
+   original_memory_map.desc_size = efi.memmap.desc_size;
+   original_memory_map.map_end = remapped_phys + new_size;
+   original_memory_map.desc_version = efi.memmap.desc_version;
+
+   original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
-- 
2.7.4



[PATCH V2 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

EFI regions can broadly be divided into three types:
1. EFI_BOOT_SERVICES_CODE/DATA regions
2. EFI_RUNTIME_SERVICES_CODE/DATA regions
3. Other EFI regions like EFI_LOADER_CODE/DATA etc.

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory region except
EFI_RUNTIME_SERVICES_CODE/DATA regions are considered illegal. A buggy
firmware could trigger these illegal accesses during boot time or at
runtime (i.e. when the kernel is up and running). Presently, the kernel
can fix up illegal accesses to EFI_BOOT_SERVICES_CODE/DATA regions
*only* during the kernel boot phase. If the firmware triggers illegal
accesses to *any* other EFI region during kernel boot, the kernel
panics; if this happens during kernel runtime, the kernel hangs.

The kernel panics/hangs because the memory region requested by the
firmware isn't mapped, which causes a page fault in ring 0 that the
kernel fails to handle, leading to die(). To save the kernel from
hanging, add an efi specific page fault handler which detects illegal
accesses by the firmware and acts as follows:
1. If the illegally accessed region is EFI_BOOT_SERVICES_CODE/DATA,
   the efi page fault handler fixes it up by mapping the requested
   region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_CODE/DATA), the efi page fault handler freezes
   efi_rts_wq and schedules a new process.
3. If the access is to any other efi region as in case 2, but the efi
   runtime service is efi_reset_system(), the efi page fault
   handler reboots the machine through BIOS.
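
The three cases can be sketched as a small classifier. This is a userspace
model under stated assumptions: classify_fault(), enum fixup_action and
RTS_RESET_SYSTEM are hypothetical names, and returning NOT_HANDLED for
runtime services regions is an assumption (those regions are legal for the
firmware to touch), not something taken from the patch.

```c
#include <assert.h>

/* Values follow the UEFI memory type numbering; only a subset is listed. */
enum mem_type {
	LOADER_CODE = 1,
	LOADER_DATA = 2,
	BOOT_SERVICES_CODE = 3,
	BOOT_SERVICES_DATA = 4,
	RUNTIME_SERVICES_CODE = 5,
	RUNTIME_SERVICES_DATA = 6,
	CONVENTIONAL_MEMORY = 7,
};

/* Hypothetical identifiers for the runtime service that was executing. */
enum rts_id { RTS_GET_VARIABLE, RTS_RESET_SYSTEM };

enum fixup_action {
	NOT_HANDLED,		/* leave it to the generic fault path */
	MAP_REGION,		/* case 1: map the region on demand */
	FREEZE_RTS_WQ,		/* case 2: freeze efi_rts_wq */
	REBOOT_VIA_BIOS,	/* case 3: reset was requested */
};

static enum fixup_action classify_fault(enum mem_type type, enum rts_id id)
{
	switch (type) {
	case BOOT_SERVICES_CODE:
	case BOOT_SERVICES_DATA:
		return MAP_REGION;
	case RUNTIME_SERVICES_CODE:
	case RUNTIME_SERVICES_DATA:
		/* Assumption: a legal access, so nothing to fix up here. */
		return NOT_HANDLED;
	default:
		return id == RTS_RESET_SYSTEM ? REBOOT_VIA_BIOS
					      : FREEZE_RTS_WQ;
	}
}
```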

Illegal accesses to EFI_BOOT_SERVICES_CODE/DATA and to other regions
are dealt with differently in the efi page fault handler because,
*generally*, EFI_BOOT_SERVICES_CODE/DATA regions are smaller in size
relative to other efi regions and hence can be reserved and dynamically
mapped. But other EFI regions like EFI_CONVENTIONAL_MEMORY and
EFI_LOADER_CODE/DATA cannot be reserved, as they are very large and
reserving them would make the kernel un-bootable.

The efi specific page fault handler offers us two advantages:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy, i.e. that this is not a kernel bug.

Finally, this new mapping will not impact a reboot from kexec, as kexec
is only concerned about runtime memory regions.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   7 ++
 arch/x86/mm/fault.c |   9 ++
 arch/x86/platform/efi/quirks.c  | 152 
 drivers/firmware/efi/runtime-wrappers.c |   7 ++
 include/linux/efi.h |   1 +
 5 files changed, 176 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index d9e5d9a6d138..68a28606909c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm);
 
 #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
 extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr,
+ struct pt_regs *regs);
 #else
 static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr,
+struct pt_regs *regs)
+{
+   return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS */
 
 struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..afd42e76058e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* fixup for buggy UEFI firmware*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* fixup for buggy UEFI firmware*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could trigger illegal accesses to some EFI regions
+* which might page fault, try to fixup or recover from such faults.
+*/
+   if (efi_illegal_accesses_fixup(address, regs))
+   return;
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86

[PATCH V2 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA
regions even after the kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is enabled, the efi page fault
handler will detect/fixup these illegal accesses. The functions modified
below are used by the page fault handler to fixup illegal accesses
to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is
present during/after kernel boot, it doesn't have an __init attribute,
but the functions below do, and thus a "WARNING: modpost: Found *
section mismatch(es)" warning is observed during kernel build. To fix
it, remove the __init attribute from all these functions.

In order to not keep these functions needlessly when
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS" is not selected, add a new
__efi_init_fixup attribute whose value changes based on whether the
config option is selected or not.
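
A minimal sketch of how such a config-dependent attribute could be
defined, modeled in userspace. MODEL_INIT and MODEL_EFI_INIT_FIXUP are
hypothetical stand-ins for __init and __efi_init_fixup, and the exact
definition used by the patch is not visible in this thread, so treat the
shape below as an assumption.

```c
#include <assert.h>

/*
 * Userspace stand-in for __init. The kernel's __init additionally places
 * the function in a section that is discarded after boot.
 */
#define MODEL_INIT __attribute__((cold))

/* Uncomment to model a build with the option enabled. */
/* #define CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS 1 */

#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS
/* Kept after boot, because the page fault handler may call it. */
#define MODEL_EFI_INIT_FIXUP
#else
/* Behaves like plain init code and can be discarded after boot. */
#define MODEL_EFI_INIT_FIXUP MODEL_INIT
#endif

/* Any function tagged this way compiles in both configurations. */
static int MODEL_EFI_INIT_FIXUP map_region_model(int npages)
{
	return npages > 0 ? 0 : -1;
}
```

With this shape, callers and prototypes stay identical in both
configurations and only the placement of the function changes.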

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h | 11 ++-
 arch/x86/platform/efi/efi.c|  4 ++--
 arch/x86/platform/efi/efi_32.c |  2 +-
 arch/x86/platform/efi/efi_64.c |  9 +
 drivers/firmware/efi/efi.c |  6 +++---
 include/linux/efi.h| 16 ++--
 6 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..9b70743400f3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -103,8 +103,9 @@ struct efi_scratch {
preempt_enable();   \
 })
 
-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
-   u32 type, u64 attribute);
+extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
+ unsigned long size, u32 type,
+ u64 attribute);
 
 #ifdef CONFIG_KASAN
 /*
@@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void);
 extern pgd_t * __init efi_call_phys_prolog(void);
 extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
 extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
-extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size);
+extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..439c2c40bf03 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void)
}
 }
 
-void __init efi_memory_uc(u64 addr, unsigned long size)
+void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size)
 {
unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
u64 npages;
@@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size)
set_memory_uc(addr, npages);
 }
 
-void __init old_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup old_map_region(efi_memory_desc_t *md)
 {
u64 start_pfn, end_pfn, end;
unsigned long size;
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 324b93328b37..8f31452bd204 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
return 0;
 }
 
-void __init efi_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md)
 {
old_map_region(md);
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 448267f1c073..a04298312fdd 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
return 0;
 }
 
-static void __init __map_region(efi_memory_desc_t *md, u64 va)
+static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
@@

[PATCH V2 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESS

2018-09-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that might
access efi regions other than EFI_RUNTIME_SERVICES_CODE/DATA even
after the kernel has assumed control of the platform. This violates the
UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to
fixup/recover from the page fault triggered by the firmware so that the
machine doesn't hang.

To support this feature, two changes are made to the existing efi
subsystem:
1. Map EFI_BOOT_SERVICES_CODE/DATA regions only when
   EFI_WARN_ON_ILLEGAL_ACCESS is disabled
   Presently, the kernel maps EFI_BOOT_SERVICES_CODE/DATA regions as
   a workaround for buggy firmware that accesses them even when it
   shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESS enabled (and hence the
   efi page fault handler), the kernel can now detect and handle such
   accesses dynamically. Hence, rather than safely mapping
   EFI_BOOT_SERVICES_CODE/DATA regions *all* the time, map them on
   demand.

2. If EFI_WARN_ON_ILLEGAL_ACCESS is enabled, don't call
   efi_free_boot_services()
   Presently, during the early boot phase, EFI_BOOT_SERVICES_CODE/DATA
   regions are marked as reserved by the kernel
   (see efi_reserve_boot_services()) and are freed before entering
   runtime (see efi_free_boot_services()). But, while dynamically
   fixing page faults caused by the firmware, the efi page fault handler
   assumes that EFI_BOOT_SERVICES_CODE/DATA regions are still intact.
   Hence, to make this assumption true, don't call
   efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESS is enabled.
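
The first change can be modeled as below. This is a userspace sketch:
should_map() and struct region are hypothetical names, the boolean
parameter stands in for IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS), and
checking the runtime attribute first is an assumption about the part of
should_map_region() that the diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63, the same value as the UEFI EFI_MEMORY_RUNTIME attribute. */
#define MEM_ATTR_RUNTIME (1ULL << 63)

enum { BOOT_SERVICES_CODE = 3, BOOT_SERVICES_DATA = 4 };

struct region {
	uint32_t type;
	uint64_t attribute;
};

/* Decide whether a region is mapped up front, before runtime. */
static bool should_map(const struct region *r, bool warn_on_illegal_access)
{
	/* Assumption: runtime regions are always mapped. */
	if (r->attribute & MEM_ATTR_RUNTIME)
		return true;

	/* With the option on, boot services regions are mapped on demand. */
	if (warn_on_illegal_access)
		return false;

	return r->type == BOOT_SERVICES_CODE ||
	       r->type == BOOT_SERVICES_DATA;
}
```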

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/Kconfig   | 21 +
 arch/x86/platform/efi/efi.c|  4 
 arch/x86/platform/efi/quirks.c |  3 +++
 3 files changed, 28 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..0fb1309d510d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,27 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config EFI_WARN_ON_ILLEGAL_ACCESS
+   bool "Warn about illegal memory accesses by firmware" if EXPERT
+   depends on EFI
+   help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is EFI_BOOT_SERVICES_CODE/DATA,
+the kernel fixes it up by mapping the requested region.
+ 2. If the illegally accessed region is any other region (Eg:
+EFI_CONVENTIONAL_MEMORY or EFI_LOADER_CODE/DATA), then the
+kernel freezes efi_rts_wq and schedules a new process. Also, it
+disables EFI Runtime Services, so that it will never again call
+buggy firmware.
+ 3. If the access is to any other efi region like above but if the
+buggy efi runtime service is efi_reset_system(), then the
+platform is rebooted through BIOS.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7d18b7ed5d41..77fbcb798f4e 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
/*
 * Map boot services regions as a workaround for buggy
 * firmware that accesses them even when they shouldn't.
+* (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is disabled)
 *
 * See efi_{reserve,free}_boot_services().
 */
+   if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS))
+   return false;
+
if (md->type == EFI_BOOT_SERVICES_CODE ||
md->type == EFI_BOOT_SERVICES_DATA)
return true;
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index e38e823382ba..60cb7a8d5371 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -377,6 +377,9 @@ void __init efi_free_boot_services(void)
int num_entries = 0;
void *new, *new_md;
 
+   if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS))
+   return;
+
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
-- 
2.7.4



[PATCH V1 4/6] x86/efi: Add efi page fault handler to fixup/recover from page faults caused by firmware

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

EFI regions can broadly be divided into three types:
1. EFI_BOOT_SERVICES_CODE/DATA regions
2. EFI_RUNTIME_SERVICES_CODE/DATA regions
3. Other EFI regions like EFI_LOADER_CODE/DATA etc.

As per the UEFI specification, after the call to ExitBootServices(),
accesses by the firmware to any memory region except
EFI_RUNTIME_SERVICES_CODE/DATA regions are considered illegal. A buggy
firmware could trigger these illegal accesses during boot time or at
runtime (i.e. when the kernel is up and running). Presently, the kernel
can fix up illegal accesses to EFI_BOOT_SERVICES_CODE/DATA regions
*only* during the kernel boot phase. If the firmware triggers illegal
accesses to *any* other EFI region during kernel boot, the kernel
panics; if this happens during kernel runtime, the kernel hangs.

The kernel panics/hangs because the memory region requested by the
firmware isn't mapped, which causes a page fault in ring 0 that the
kernel fails to handle, leading to die(). To save the kernel from
hanging, add an efi specific page fault handler which detects illegal
accesses by the firmware and acts as follows:
1. If the illegally accessed region is EFI_BOOT_SERVICES_CODE/DATA,
   the efi page fault handler fixes it up by mapping the requested
   region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_CODE/DATA), the efi page fault handler freezes
   efi_rts_wq and schedules a new process.
3. If the access is to any other efi region as in case 2, but the efi
   runtime service is efi_reset_system(), the efi page fault
   handler reboots the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_CODE/DATA and to other regions
are dealt with differently in the efi page fault handler because,
*generally*, EFI_BOOT_SERVICES_CODE/DATA regions are smaller in size
relative to other efi regions and hence can be reserved and dynamically
mapped. But other EFI regions like EFI_CONVENTIONAL_MEMORY and
EFI_LOADER_CODE/DATA cannot be reserved, as they are very large and
reserving them would make the kernel un-bootable.

The efi specific page fault handler offers us two advantages:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy, i.e. that this is not a kernel bug.

Finally, this new mapping will not impact a reboot from kexec, as kexec
is only concerned about runtime memory regions.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   7 ++
 arch/x86/mm/fault.c |   9 ++
 arch/x86/platform/efi/quirks.c  | 152 
 drivers/firmware/efi/runtime-wrappers.c |   7 ++
 include/linux/efi.h |   1 +
 5 files changed, 176 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index c97f2e955cab..4942fa04d74b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -144,8 +144,15 @@ extern void efi_switch_mm(struct mm_struct *mm);
 
 #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
 extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr,
+ struct pt_regs *regs);
 #else
 static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr,
+struct pt_regs *regs)
+{
+   return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
 
 struct efi_setup_data {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2aafa6ab6103..afd42e76058e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -16,6 +16,7 @@
 #include /* prefetchw*/
 #include /* exception_enter(), ...   */
 #include  /* faulthandler_disabled()  */
+#include  /* fixup for buggy UEFI firmware*/
 
 #include /* boot_cpu_has, ...*/
 #include  /* dotraplinkage, ...   */
@@ -24,6 +25,7 @@
 #include   /* emulate_vsyscall */
 #include   /* struct vm86  */
 #include/* vma_pkey()   */
+#include/* fixup for buggy UEFI firmware*/
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -790,6 +792,13 @@ no_context(struct pt_regs *regs, unsigned long error_code,
return;
 
/*
+* Buggy firmware could trigger illegal accesses to some EFI regions
+* which might page fault, try to fixup or recover from such faults.
+*/
+   if (efi_illegal_accesses_fixup(address, regs))
+   return;
+
+   /*
 * Oops. The kernel tried to access some bad page. We'll have to
 * terminate things with extreme prejudice:
 */
diff --git a/arch/x86/platform/efi/quirks.c b/arch

[PATCH V1 3/6] x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

The efi page fault handler that fixes up page faults caused by the
firmware needs the original memory map passed by the firmware. It looks
up this memory map to find the type of the memory region at which the
page fault occurred. Presently, the EFI subsystem discards the original
memory map passed by the firmware and replaces it with a new memory map
that has only EFI_RUNTIME_SERVICES_CODE/DATA regions. But illegal
accesses by firmware can occur in any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is defined, create a backup of the
original memory map passed by the firmware, so that the efi page fault
handler can detect/fix illegal accesses to *any* efi region.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  6 ++
 arch/x86/platform/efi/efi.c|  2 ++
 arch/x86/platform/efi/quirks.c | 49 ++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 9b70743400f3..c97f2e955cab 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int 
nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
+
 struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 439c2c40bf03..7d18b7ed5d41 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 
pa = __pa(new_memmap);
 
+   efi_save_original_memmap();
+
/*
 * Unregister the early EFI memmap from efi_init() and install
 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..84b213a1460a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, 
void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The page fault handler that fixes up page faults caused by buggy
+ * firmware needs original memory map (memory map passed by firmware).
+ * Hence, build a new EFI memmap that has *all* entries and save it for
+ * later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+   efi_memory_desc_t *md;
+   void *remapped_phys, *new_md;
+   phys_addr_t new_phys, new_size;
+
+   new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+   new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+   if (!new_phys) {
+   pr_err("Failed to allocate new EFI memmap\n");
+   return;
+   }
+
+   remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+   if (!remapped_phys) {
+   pr_err("Failed to remap new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), 
get_order(new_size));
+   return;
+   }
+
+   new_md = remapped_phys;
+   for_each_efi_memory_desc(md) {
+   memcpy(new_md, md, efi.memmap.desc_size);
+   new_md += efi.memmap.desc_size;
+   }
+
+   original_memory_map.late = 1;
+   original_memory_map.phys_map = new_phys;
+   original_memory_map.map = remapped_phys;
+   original_memory_map.nr_map = efi.memmap.nr_map;
+   original_memory_map.desc_size = efi.memmap.desc_size;
+   original_memory_map.map_end = remapped_phys + new_size;
+   original_memory_map.desc_version = efi.memmap.desc_version;
+
+   original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V1 6/6] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that might
access efi regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even
after the kernel has assumed control of the platform. This violates the
UEFI specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the efi page fault handler will also try to
fixup/recover from the page fault triggered by the firmware so that the
machine doesn't hang.

To support this feature, two changes should be made to the existing efi
subsystem:
1. Map EFI_BOOT_SERVICES_<CODE/DATA> regions only when
   EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
Presently, the kernel maps EFI_BOOT_SERVICES_<CODE/DATA> regions as
a workaround for buggy firmware that accesses them even when it
shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled (and hence the efi
page fault handler), the kernel can now detect and handle such accesses
dynamically. Hence, rather than safely mapping
EFI_BOOT_SERVICES_<CODE/DATA> regions *all* the time, map them on
demand.

2. If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
   efi_free_boot_services()
Presently, during early boot phase EFI_BOOT_SERVICES_<CODE/DATA>
regions are marked as reserved by the kernel
(see efi_reserve_boot_services()) and are freed before entering
runtime (see efi_free_boot_services()). But, while dynamically
fixing page faults caused by the firmware, the efi page fault handler
assumes that EFI_BOOT_SERVICES_<CODE/DATA> regions are still intact.
Hence, to make this assumption true, don't call
efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/Kconfig| 21 +
 arch/x86/platform/efi/efi.c |  4 
 init/main.c |  3 ++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..278e5820e8dd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,27 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config EFI_WARN_ON_ILLEGAL_ACCESSES
+   bool "Warn about illegal memory accesses by firmware"
+   depends on EFI
+   help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
+the kernel fixes it up by mapping the requested region.
+ 2. If the illegally accessed region is any other region (Eg:
+EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>), then the
+kernel freezes efi_rts_wq and schedules a new process. Also, it
+disables EFI Runtime Services, so that it will never again call
+buggy firmware.
+ 3. If the access is to any other efi region like above but if the
+buggy efi runtime service is efi_reset_system(), then the
+platform is rebooted through BIOS.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7d18b7ed5d41..0ddb22a03d88 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
/*
 * Map boot services regions as a workaround for buggy
 * firmware that accesses them even when they shouldn't.
+* (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled)
 *
 * See efi_{reserve,free}_boot_services().
 */
+   if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES))
+   return false;
+
if (md->type == EFI_BOOT_SERVICES_CODE ||
md->type == EFI_BOOT_SERVICES_DATA)
return true;
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..dce0520861a1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
+   if (efi_enabled(EFI_RUNTIME_SERVICES) &&
+   !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
efi_free_boot_services();
}
 
-- 
2.7.4



[PATCH V1 5/6] x86/mm: If in_atomic(), allocate pages without sleeping

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

A page fault occurs when any EFI Runtime Service tries to reference a
memory region which it shouldn't. If the illegally accessed region
is EFI_BOOT_SERVICES_<CODE/DATA>, the efi specific page fault handler
fixes it up by dynamically creating VA->PA mappings using
efi_map_region().

Originally, efi_map_region() and hence the functionality of creating
mappings for efi regions was intended to be used *only* during boot time
(please note the __init modifier) and hence when called during runtime
(i.e. from the efi page fault handler), the page allocator complains. It
complains because the "gfp_allowed_mask" value changes from boot time to
runtime (GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though
efi_map_region() calls alloc_<pmd/pte>_page with GFP_KERNEL, the page
allocator doesn't complain because the "__GFP_RECLAIM" flag is cleared
by "gfp_allowed_mask", but during runtime it isn't cleared and hence the
below stack trace is printed.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last  enabled at (45713): [] 
restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [] error_entry+0x7c/0x100
softirqs last  enabled at (44732): [] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack+0x5e/0x8b
___might_sleep+0x20c/0x240
__alloc_pages_nodemask+0xc2/0x330
get_zeroed_page+0x12/0x40
alloc_pmd_page+0x13/0x50
populate_pmd+0xc0/0x2e0
? __lock_acquire+0x439/0x740
__cpa_process_fault+0x2e1/0x5d0
__change_page_attr_set_clr+0x7c3/0xcd0
? console_unlock+0x34d/0x660
? kernel_map_pages_in_pgd+0x8c/0x160
kernel_map_pages_in_pgd+0x8c/0x160
? printk+0x43/0x4b
? __map_region+0x3c/0x60
__map_region+0x3c/0x60
efi_map_region+0x83/0xd0
efi_illegal_accesses_fixup+0x1ca/0x1e0
no_context+0x112/0x390
__do_page_fault+0xc7/0x4f0
page_fault+0x1e/0x30
RIP: 0010:0xfffeffc7ccf1
RSP: 0018:c975bbf0 EFLAGS: 00010282
RAX: 0048 RBX: c975be10 RCX: c975bad0
RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf
RBP: c975bdc8 R08: 0048 R09: 0048
R10: 03fd R11: 03f8 R12: 880032a92d80
R13: 0003 R14: 7ffcf1eb9d50 R15: 
? efi_call+0xd1/0x160
? __lock_acquire+0x439/0x740
? _raw_spin_unlock+0x24/0x30
? virt_efi_get_next_high_mono_count+0x77/0xf0
? efi_test_ioctl+0x1ab/0xc20
? selinux_file_ioctl+0x122/0x1c0
? do_vfs_ioctl+0x92/0x6b0
? do_vfs_ioctl+0x92/0x6b0
? security_file_ioctl+0x3c/0x50
? selinux_capable+0x20/0x20
? ksys_ioctl+0x66/0x70
? __x64_sys_ioctl+0x16/0x20
? do_syscall_64+0x50/0x170
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the above warning by conditionally changing the allocation from
GFP_KERNEL to GFP_ATOMIC, so that the efi page fault handler can use
efi_map_region() during runtime. This change shouldn't affect any other
generic page allocations because this allocation is used only by efi
functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

if (cpa->pgd) {
/*
 * Right now, we only execute this code path when mapping
 * the EFI virtual memory map regions, no other users
 * provide a ->pgd value. This may change in the future.
 */
return populate_pgd(cpa, vaddr);
}
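
To make the masking argument above concrete, here is a small userspace
sketch (not kernel code). All DEMO_* names and bit values are made up
for illustration; only the relationships between the masks follow the
description above. It additionally assumes that the allocator's sleep
check keys off the direct-reclaim half of __GFP_RECLAIM and that
GFP_ATOMIC does not carry that bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define DEMO_GFP_DIRECT_RECLAIM 0x01u
#define DEMO_GFP_KSWAPD_RECLAIM 0x02u
#define DEMO_GFP_IO             0x04u
#define DEMO_GFP_FS             0x08u
#define DEMO_GFP_HIGH           0x10u

#define DEMO_GFP_RECLAIM (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL  (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_ATOMIC  (DEMO_GFP_HIGH | DEMO_GFP_KSWAPD_RECLAIM)

/* Runtime mask allows every bit; boot mask strips the "may sleep" bits. */
#define DEMO_GFP_BITS_MASK 0x1fu
#define DEMO_GFP_BOOT_MASK \
	(DEMO_GFP_BITS_MASK & ~(DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS))

/* True when an allocation with these flags may still sleep after masking. */
bool demo_may_sleep(unsigned int gfp, unsigned int allowed_mask)
{
	assert((gfp & ~DEMO_GFP_BITS_MASK) == 0); /* only modelled bits */
	return ((gfp & allowed_mask) & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* Mirrors the flag choice made in alloc_pte_page()/alloc_pmd_page(). */
unsigned int demo_pick_gfp(bool in_atomic)
{
	return in_atomic ? DEMO_GFP_ATOMIC : DEMO_GFP_KERNEL;
}
```

With the boot mask, a GFP_KERNEL-style request loses its reclaim bits
and cannot sleep; with the runtime mask it can, which is what triggers
the warning when the caller is atomic.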

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/mm/pageattr.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..1b28a333c8ce 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long 
start, unsigned long end)
 
 static int alloc_pte_page(pmd_t *pmd)
 {
-   pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   pte_t *pte;
+
+   if (in_atomic())
+   pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pte)
return -1;
 
@@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd)
 
 static int alloc_pmd_page(pud_t *pud)
 {
-   pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   pmd_t *pmd;
+
+   if (in_atomic())
+   pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pmd)
return -1;
 
-- 
2.7.4


[PATCH V1 2/6] x86/efi: Remove __init attribute from memory mapping functions

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA
regions even after the kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is enabled, the efi page fault
handler will detect/fixup these illegal accesses. The functions modified
below are used by the page fault handler to fixup illegal accesses
to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault handler is
present during/after kernel boot, it doesn't have an __init attribute,
but the functions below do, and thus during kernel build a "WARNING:
modpost: Found * section mismatch(es)" build warning is observed. To fix
it, remove the __init attribute from all these functions.

In order to not keep these functions needlessly when
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is not selected, add a new
__efi_init_fixup attribute whose value changes based on whether the
config option is selected or not.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h | 11 ++-
 arch/x86/platform/efi/efi.c|  4 ++--
 arch/x86/platform/efi/efi_32.c |  2 +-
 arch/x86/platform/efi/efi_64.c |  9 +
 drivers/firmware/efi/efi.c |  6 +++---
 include/linux/efi.h| 16 ++--
 6 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..9b70743400f3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -103,8 +103,9 @@ struct efi_scratch {
preempt_enable();   \
 })
 
-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
-   u32 type, u64 attribute);
+extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
+ unsigned long size, u32 type,
+ u64 attribute);
 
 #ifdef CONFIG_KASAN
 /*
@@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void);
 extern pgd_t * __init efi_call_phys_prolog(void);
 extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
 extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
-extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size);
+extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned 
num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..439c2c40bf03 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void)
}
 }
 
-void __init efi_memory_uc(u64 addr, unsigned long size)
+void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size)
 {
unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
u64 npages;
@@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size)
set_memory_uc(addr, npages);
 }
 
-void __init old_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup old_map_region(efi_memory_desc_t *md)
 {
u64 start_pfn, end_pfn, end;
unsigned long size;
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 324b93328b37..8f31452bd204 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
return 0;
 }
 
-void __init efi_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md)
 {
old_map_region(md);
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 448267f1c073..a04298312fdd 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
return 0;
 }
 
-static void __init __map_region(efi_memory_desc_t *md, u64 va)
+static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
@@

[PATCH V1 1/6] efi: Make efi_rts_work accessible to efi page fault handler

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

If the firmware illegally accesses any efi regions other than
EFI_BOOT_SERVICES_<CODE/DATA>, the efi page fault handler freezes
efi_rts_wq and schedules a new process. To do this, the efi page fault
handler needs efi_rts_work. Hence, make it accessible.

There will be no race conditions in accessing this structure, because
all the calls to efi runtime services are already serialized.

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/runtime-wrappers.c | 53 ++---
 include/linux/efi.h | 36 ++
 2 files changed, 45 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c 
b/drivers/firmware/efi/runtime-wrappers.c
index aa66cbf23512..b18b2d864c2c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -45,39 +45,7 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
-/* efi_runtime_service() function identifiers */
-enum efi_rts_ids {
-   GET_TIME,
-   SET_TIME,
-   GET_WAKEUP_TIME,
-   SET_WAKEUP_TIME,
-   GET_VARIABLE,
-   GET_NEXT_VARIABLE,
-   SET_VARIABLE,
-   QUERY_VARIABLE_INFO,
-   GET_NEXT_HIGH_MONO_COUNT,
-   UPDATE_CAPSULE,
-   QUERY_CAPSULE_CAPS,
-};
-
-/*
- * efi_runtime_work:   Details of EFI Runtime Service work
- * @arg<1-5>:  EFI Runtime Service function arguments
- * @status:Status of executing EFI Runtime Service
- * @efi_rts_id:EFI Runtime Service function identifier
- * @efi_rts_comp:  Struct used for handling completions
- */
-struct efi_runtime_work {
-   void *arg1;
-   void *arg2;
-   void *arg3;
-   void *arg4;
-   void *arg5;
-   efi_status_t status;
-   struct work_struct work;
-   enum efi_rts_ids efi_rts_id;
-   struct completion efi_rts_comp;
-};
+struct efi_runtime_work efi_rts_work;
 
 /*
  * efi_queue_work: Queue efi_runtime_service() and wait until it's done
@@ -91,7 +59,6 @@ struct efi_runtime_work {
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)
\
 ({ \
-   struct efi_runtime_work efi_rts_work;   \
efi_rts_work.status = EFI_ABORTED;  \
\
init_completion(&efi_rts_work.efi_rts_comp);\
@@ -184,18 +151,16 @@ static DEFINE_SEMAPHORE(efi_runtime_lock);
  */
 static void efi_call_rts(struct work_struct *work)
 {
-   struct efi_runtime_work *efi_rts_work;
void *arg1, *arg2, *arg3, *arg4, *arg5;
efi_status_t status = EFI_NOT_FOUND;
 
-   efi_rts_work = container_of(work, struct efi_runtime_work, work);
-   arg1 = efi_rts_work->arg1;
-   arg2 = efi_rts_work->arg2;
-   arg3 = efi_rts_work->arg3;
-   arg4 = efi_rts_work->arg4;
-   arg5 = efi_rts_work->arg5;
+   arg1 = efi_rts_work.arg1;
+   arg2 = efi_rts_work.arg2;
+   arg3 = efi_rts_work.arg3;
+   arg4 = efi_rts_work.arg4;
+   arg5 = efi_rts_work.arg5;
 
-   switch (efi_rts_work->efi_rts_id) {
+   switch (efi_rts_work.efi_rts_id) {
case GET_TIME:
status = efi_call_virt(get_time, (efi_time_t *)arg1,
   (efi_time_cap_t *)arg2);
@@ -253,8 +218,8 @@ static void efi_call_rts(struct work_struct *work)
 */
pr_err("Requested executing invalid EFI Runtime Service.\n");
}
-   efi_rts_work->status = status;
-   complete(&efi_rts_work->efi_rts_comp);
+   efi_rts_work.status = status;
+   complete(&efi_rts_work.efi_rts_comp);
 }
 
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 401e4b254e30..855992b15269 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1659,7 +1659,43 @@ struct linux_efi_tpm_eventlog {
 
 extern int efi_tpm_eventlog_init(void);
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ * @efi_rts_id:  

[PATCH V1 0/6] Add efi page fault handler to fix/recover from

2018-08-08 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide a debug config option which when enabled
detects and fixes/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger a page fault in ring 0 because
firmware executes at ring 0, and if unhandled the fault hangs the kernel.
We provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy, i.e. that this is not a kernel
   bug.

Depending on the illegally accessed efi region, the efi page fault
handler handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
   the efi page fault handler fixes it up by mapping the requested
   region.
2. If any other region (Eg: EFI_CONVENTIONAL_MEMORY or
   EFI_LOADER_<CODE/DATA>), then the efi page fault handler freezes
   efi_rts_wq and schedules a new process.
3. If the access is to any other efi region like above but the efi
   runtime service is efi_reset_system(), then the efi page fault
   handler will reboot the machine through BIOS.

Illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> and to other regions
are dealt with differently in the efi page fault handler because,
*generally*, EFI_BOOT_SERVICES_<CODE/DATA> regions are smaller in size
relative to other efi regions and hence can be reserved and dynamically
mapped. But other EFI regions like EFI_CONVENTIONAL_MEMORY and
EFI_LOADER_<CODE/DATA> cannot be reserved as they are very large and
reserving them would make the kernel un-bootable.
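
The three cases above boil down to a small decision function. The
following is only a userspace sketch of that policy, not the handler in
this series: the demo_* names are invented here, and the numeric memory
type values are the ones the UEFI specification assigns. It assumes the
caller has already established that the access is illegal, i.e. that it
did not hit an EFI_RUNTIME_SERVICES_<CODE/DATA> region.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory type numbers as assigned by the UEFI specification. */
enum demo_efi_type {
	DEMO_EFI_LOADER_CODE           = 1,
	DEMO_EFI_LOADER_DATA           = 2,
	DEMO_EFI_BOOT_SERVICES_CODE    = 3,
	DEMO_EFI_BOOT_SERVICES_DATA    = 4,
	DEMO_EFI_RUNTIME_SERVICES_CODE = 5,
	DEMO_EFI_RUNTIME_SERVICES_DATA = 6,
	DEMO_EFI_CONVENTIONAL_MEMORY   = 7,
};

/* Hypothetical names for the three outcomes listed above. */
enum demo_fixup_action {
	DEMO_MAP_REGION,      /* case 1: map the region on demand */
	DEMO_FREEZE_RTS_WQ,   /* case 2: freeze efi_rts_wq, schedule away */
	DEMO_REBOOT_VIA_BIOS, /* case 3: fault came from efi_reset_system() */
};

enum demo_fixup_action demo_classify(enum demo_efi_type type,
				     bool in_reset_system)
{
	/* Runtime services regions are legal targets, so never passed in. */
	assert(type != DEMO_EFI_RUNTIME_SERVICES_CODE &&
	       type != DEMO_EFI_RUNTIME_SERVICES_DATA);

	if (type == DEMO_EFI_BOOT_SERVICES_CODE ||
	    type == DEMO_EFI_BOOT_SERVICES_DATA)
		return DEMO_MAP_REGION;

	if (in_reset_system)
		return DEMO_REBOOT_VIA_BIOS;

	return DEMO_FREEZE_RTS_WQ;
}
```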

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (6):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Remove __init attribute from memory mapping functions
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by the firmware
  x86/efi: Add efi page fault handler to fixup/recover from page faults 
   caused by firmware
  x86/mm: If in_atomic(), allocate pages without sleeping
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

 arch/x86/Kconfig|  21 
 arch/x86/include/asm/efi.h  |  24 +++-
 arch/x86/mm/fault.c |   9 ++
 arch/x86/mm/pageattr.c  |  16 ++-
 arch/x86/platform/efi/efi.c |  10 +-
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |   9 +-
 arch/x86/platform/efi/quirks.c  | 201 
 drivers/firmware/efi/efi.c  |   6 +-
 drivers/firmware/efi/runtime-wrappers.c |  60 +++---
 include/linux/efi.h |  53 -
 init/main.c |   3 +-
 12 files changed, 350 insertions(+), 64 deletions(-)

Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH RFC 7/8] x86/mm: If in_atomic(), allocate pages without sleeping

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

A page fault occurs when any EFI Runtime Service tries to reference a
memory region which it shouldn't. If the illegally accessed region
is EFI_BOOT_SERVICES_<CODE/DATA>, the efi specific page fault handler
fixes it up by dynamically creating VA->PA mappings using
efi_map_region().

Originally, efi_map_region() and hence the functionality of creating
mappings for efi regions was intended to be used *only* during boot time
(please note the __init modifier) and hence when called during runtime,
the page allocator complains. It complains because the
"gfp_allowed_mask" value changes from boot time to runtime
(GFP_BOOT_MASK to __GFP_BITS_MASK). During boot, even though
efi_map_region() calls alloc_<pmd/pte>_page with GFP_KERNEL, the page
allocator doesn't complain because the "__GFP_RECLAIM" flag is cleared
by "gfp_allowed_mask", but during runtime it isn't cleared and hence the
below stack trace is printed.

BUG: sleeping function called from invalid context at mm/page_alloc.c:4320
in_atomic(): 1, irqs_disabled(): 1, pid: 2022, name: fwts
1 lock held by fwts/2022:
irq event stamp: 45714
hardirqs last  enabled at (45713): [] 
restore_regs_and_return_to_kernel+0x0/0x2c
hardirqs last disabled at (45714): [] error_entry+0x7c/0x100
softirqs last  enabled at (44732): [] __do_softirq+0x387/0x49a
softirqs last disabled at (44707): [] irq_exit+0xbb/0xc0
CPU: 0 PID: 2022 Comm: fwts Not tainted 4.17.0-rc4-efitest+ #405
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Call Trace:
dump_stack+0x5e/0x8b
___might_sleep+0x20c/0x240
__alloc_pages_nodemask+0xc2/0x330
get_zeroed_page+0x12/0x40
alloc_pmd_page+0x13/0x50
populate_pmd+0xc0/0x2e0
? __lock_acquire+0x439/0x740
__cpa_process_fault+0x2e1/0x5d0
__change_page_attr_set_clr+0x7c3/0xcd0
? console_unlock+0x34d/0x660
? kernel_map_pages_in_pgd+0x8c/0x160
kernel_map_pages_in_pgd+0x8c/0x160
? printk+0x43/0x4b
? __map_region+0x3c/0x60
__map_region+0x3c/0x60
efi_map_region+0x83/0xd0
efi_illegal_accesses_fixup+0x1ca/0x1e0
no_context+0x112/0x390
__do_page_fault+0xc7/0x4f0
page_fault+0x1e/0x30
RIP: 0010:0xfffeffc7ccf1
RSP: 0018:c975bbf0 EFLAGS: 00010282
RAX: 0048 RBX: c975be10 RCX: c975bad0
RDX: 03f8 RSI: c975be10 RDI: fffeffc7cccf
RBP: c975bdc8 R08: 0048 R09: 0048
R10: 03fd R11: 03f8 R12: 880032a92d80
R13: 0003 R14: 7ffcf1eb9d50 R15: 
? efi_call+0xd1/0x160
? __lock_acquire+0x439/0x740
? _raw_spin_unlock+0x24/0x30
? virt_efi_get_next_high_mono_count+0x77/0xf0
? efi_test_ioctl+0x1ab/0xc20
? selinux_file_ioctl+0x122/0x1c0
? do_vfs_ioctl+0x92/0x6b0
? do_vfs_ioctl+0x92/0x6b0
? security_file_ioctl+0x3c/0x50
? selinux_capable+0x20/0x20
? ksys_ioctl+0x66/0x70
? __x64_sys_ioctl+0x16/0x20
? do_syscall_64+0x50/0x170
? entry_SYSCALL_64_after_hwframe+0x49/0xbe

I guess we can't do much to fix the above warning except to change
the allocation conditionally from GFP_KERNEL to GFP_ATOMIC, so that
we can use efi_map_region() during runtime. This change shouldn't
affect any other generic page allocations because this allocation is
used only by efi functions [1].

[1] Comment in __cpa_process_fault() at arch/x86/mm/pageattr.c

if (cpa->pgd) {
/*
 * Right now, we only execute this code path when mapping
 * the EFI virtual memory map regions, no other users
 * provide a ->pgd value. This may change in the future.
 */
return populate_pgd(cpa, vaddr);
}

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/mm/pageattr.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 3bded76e8d5c..1b28a333c8ce 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -926,7 +926,13 @@ static void unmap_pud_range(p4d_t *p4d, unsigned long 
start, unsigned long end)
 
 static int alloc_pte_page(pmd_t *pmd)
 {
-   pte_t *pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   pte_t *pte;
+
+   if (in_atomic())
+   pte = (pte_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pte)
return -1;
 
@@ -936,7 +942,13 @@ static int alloc_pte_page(pmd_t *pmd)
 
 static int alloc_pmd_page(pud_t *pud)
 {
-   pmd_t *pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   pmd_t *pmd;
+
+   if (in_atomic())
+   pmd = (pmd_t *)get_zeroed_page(GFP_ATOMIC);
+   else
+   pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+
if (!pmd)
return -1;
 
-- 
2.7.4


[PATCH RFC 0/8] Add efi page fault handler to fix/recover from

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Here, we provide a debug config option which when enabled
detects and fixes up/recovers from page faults caused by buggy firmware.

The above said illegal accesses trigger a page fault in ring 0 because
firmware executes at ring 0, and if unhandled the fault hangs the kernel.
We provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy, so that the kernel is not
blamed for a fault that is not its own.

Depending on the illegally accessed efi region, the efi page fault
handler handles illegal accesses differently.
1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
the page fault handler fixes it up by mapping the requested region.
2. If the illegally accessed region is any other efi region (like
EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>), the page fault
handler exits firmware context and disables EFI Runtime Services, so
that we will never again call buggy firmware.

Page faults to efi regions are handled differently because, presently
during kernel boot, EFI_BOOT_SERVICES_<CODE/DATA> regions are reserved
by the kernel and hence it's OK to dynamically map these regions in the
page fault handler. The same approach cannot be followed for other efi
regions like EFI_CONVENTIONAL_MEMORY and EFI_LOADER_<CODE/DATA> as they
are very large and reserving them could make the kernel
un-bootable. Hence, we take a different approach (exiting firmware
context) while dealing with page faults to these regions. This also
saves us from executing buggy firmware further.

Exiting firmware context means that on every entry to firmware we save
the kernel context before calling firmware and if the firmware
misbehaves, in the page fault handler, we roll back to the saved kernel
context. Saving kernel context means saving the stack pointer and the
instruction that gets executed when firmware returns. In the page fault
handler we fix up these two things (RIP and RSP) so that when returning
from page fault handler it looks as if firmware has called RET.
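
As a rough userspace analogy only (the series does this in assembly by
saving and restoring RSP and RIP, which is not reproduced here), the
save-then-roll-back idea resembles setjmp()/longjmp(). Every demo_*
name below is hypothetical, and the "buggy" flag merely simulates a
firmware service that performs an illegal access.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf demo_saved_context;  /* stands in for saved RSP/RIP */
static bool demo_in_firmware;       /* a firmware call is in flight */
static bool demo_runtime_disabled;  /* never call firmware again */

/* Stand-in for the fault handler: abandon the firmware call. */
static void demo_fault_rollback(void)
{
	/* Rolling back is only valid while a firmware call is in flight. */
	assert(demo_in_firmware);
	longjmp(demo_saved_context, 1);
}

/* Stand-in for a runtime service; "buggy" simulates an illegal access. */
static int demo_runtime_service(bool buggy)
{
	if (buggy)
		demo_fault_rollback(); /* does not return */
	return 0;
}

/* Returns 0 on success, -1 if firmware misbehaved or is disabled. */
int demo_call_firmware(bool buggy)
{
	int ret;

	if (demo_runtime_disabled)
		return -1;

	/* Save the caller's context before entering "firmware". */
	if (setjmp(demo_saved_context) != 0) {
		/* Reached via longjmp(): looks as if firmware returned. */
		demo_in_firmware = false;
		demo_runtime_disabled = true;
		return -1;
	}

	demo_in_firmware = true;
	ret = demo_runtime_service(buggy);
	demo_in_firmware = false;
	return ret;
}
```

After the first misbehaving call, every later call fails early, which
matches the "never again call buggy firmware" behaviour described above.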

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/open?id=1tkvT7GaVX2zSlzy1HK1T4Tv8cT36GP6R

Sai Praneeth (8):
  x86/efi: Remove __init attribute from memory mapping functions
  x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware
  x86/efi: Save kernel context before calling EFI Runtime Services
  x86/efi: Add page fault handler to fixup/recover from page faults
caused by firmware
  x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call
efi_free_boot_services()
  x86/efi: Map EFI_BOOT_SERVICES_<CODE/DATA> regions only when
EFI_WARN_ON_ILLEGAL_ACCESSES is disabled
  x86/mm: If in_atomic(), allocate pages without sleeping
  x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

 arch/x86/Kconfig|  17 +++
 arch/x86/include/asm/efi.h  |  42 ++-
 arch/x86/mm/fault.c |   9 ++
 arch/x86/mm/pageattr.c  |  16 ++-
 arch/x86/platform/efi/efi.c |  10 +-
 arch/x86/platform/efi/efi_32.c  |   2 +-
 arch/x86/platform/efi/efi_64.c  |  16 ++-
 arch/x86/platform/efi/efi_stub_64.S | 101 -
 arch/x86/platform/efi/quirks.c  | 193 
 drivers/firmware/efi/efi.c  |   6 +-
 drivers/firmware/efi/runtime-wrappers.c |   6 +
 include/linux/efi.h |  16 ++-
 init/main.c |   3 +-
 13 files changed, 415 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 2/8] x86/efi: Permanently save the EFI_MEMORY_MAP passed by firmware

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

The page fault handler that fixes up page faults caused by firmware
needs the original memory map passed by firmware. It looks up this
memory map to find the type of the memory region in which the page fault
occurred. Presently, the EFI subsystem discards the original memory map
passed by firmware and replaces it with a new memory map that has only
EFI_RUNTIME_SERVICES_<CODE/DATA> regions, but illegal accesses by
firmware can occur in any region. Hence, _only_ if
CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is defined, create a backup of the
original memory map passed by firmware, so that we can detect/fix
illegal accesses to *any* EFI region.
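
As a rough user-space illustration of the idea above (not the kernel
code in this patch), the sketch below copies every descriptor of a
"live" map into a backup and later looks up the type of the region that
contains a faulting address. The names struct mem_desc, struct mem_map,
save_original_memmap() and lookup_region_type() are hypothetical
stand-ins; malloc() replaces efi_memmap_alloc()/memremap(), and the
4 KiB page size and the type values follow the UEFI memory type
numbering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Subset of UEFI memory types (values as in the UEFI specification). */
enum region_type {
    LOADER_CODE = 1,
    BOOT_SERVICES_CODE = 3,
    RUNTIME_SERVICES_CODE = 5,
    CONVENTIONAL_MEMORY = 7,
};

/* Hypothetical, simplified memory descriptor. */
struct mem_desc {
    uint32_t type;
    uint64_t phys_addr;
    uint64_t num_pages;     /* number of 4 KiB pages */
};

/* Hypothetical, simplified memory map: nr_map entries, desc_size apart. */
struct mem_map {
    void *map;
    size_t nr_map;
    size_t desc_size;       /* stride; may exceed sizeof(struct mem_desc) */
};

/* Copy all descriptors of 'src' into new memory; returns 0 on success. */
int save_original_memmap(const struct mem_map *src, struct mem_map *backup)
{
    size_t size;
    void *copy;

    assert(src && backup);
    size = src->nr_map * src->desc_size;
    copy = malloc(size);
    if (!copy)
        return -1;

    memcpy(copy, src->map, size);
    backup->map = copy;
    backup->nr_map = src->nr_map;
    backup->desc_size = src->desc_size;
    return 0;
}

/* Return the type of the region containing 'addr', or -1 if none does. */
int lookup_region_type(const struct mem_map *m, uint64_t addr)
{
    const unsigned char *p = m->map;
    size_t i;

    for (i = 0; i < m->nr_map; i++, p += m->desc_size) {
        struct mem_desc md;
        uint64_t end;

        memcpy(&md, p, sizeof(md));
        end = md.phys_addr + (md.num_pages << 12);
        if (addr >= md.phys_addr && addr < end)
            return (int)md.type;
    }
    return -1;
}
```

Because the backup is a separate copy, later changes to the live map
(for example, dropping everything except runtime regions) do not affect
what the lookup reports.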

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h |  6 ++
 arch/x86/platform/efi/efi.c|  2 ++
 arch/x86/platform/efi/quirks.c | 49 ++
 3 files changed, 57 insertions(+)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 9b70743400f3..c97f2e955cab 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -142,6 +142,12 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+extern void __init efi_save_original_memmap(void);
+#else
+static inline void __init efi_save_original_memmap(void) { }
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
+
 struct efi_setup_data {
u64 fw_vendor;
u64 runtime;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 439c2c40bf03..7d18b7ed5d41 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -946,6 +946,8 @@ static void __init __efi_enter_virtual_mode(void)
 
pa = __pa(new_memmap);
 
+   efi_save_original_memmap();
+
/*
 * Unregister the early EFI memmap from efi_init() and install
 * the new EFI memory map that we are about to pass to the
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 844d31cb8a0c..84b213a1460a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -654,3 +654,52 @@ int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff,
 }
 
 #endif
+
+#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
+
+static bool original_memory_map_present;
+static struct efi_memory_map original_memory_map;
+
+/*
+ * The page fault handler that fixes up page faults caused by buggy
+ * firmware needs original memory map (memory map passed by firmware).
+ * Hence, build a new EFI memmap that has *all* entries and save it for
+ * later use.
+ */
+void __init efi_save_original_memmap(void)
+{
+   efi_memory_desc_t *md;
+   void *remapped_phys, *new_md;
+   phys_addr_t new_phys, new_size;
+
+   new_size = efi.memmap.desc_size * efi.memmap.nr_map;
+   new_phys = efi_memmap_alloc(efi.memmap.nr_map);
+   if (!new_phys) {
+   pr_err("Failed to allocate new EFI memmap\n");
+   return;
+   }
+
+   remapped_phys = memremap(new_phys, new_size, MEMREMAP_WB);
+   if (!remapped_phys) {
+   pr_err("Failed to remap new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
+   return;
+   }
+
+   new_md = remapped_phys;
+   for_each_efi_memory_desc(md) {
+   memcpy(new_md, md, efi.memmap.desc_size);
+   new_md += efi.memmap.desc_size;
+   }
+
+   original_memory_map.late = 1;
+   original_memory_map.phys_map = new_phys;
+   original_memory_map.map = remapped_phys;
+   original_memory_map.nr_map = efi.memmap.nr_map;
+   original_memory_map.desc_size = efi.memmap.desc_size;
+   original_memory_map.map_end = remapped_phys + new_size;
+   original_memory_map.desc_version = efi.memmap.desc_version;
+
+   original_memory_map_present = true;
+}
+#endif /* CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES */
-- 
2.7.4


[PATCH RFC 5/8] x86/efi: If EFI_WARN_ON_ILLEGAL_ACCESSES is enabled don't call efi_free_boot_services()

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

During the early boot phase, EFI_BOOT_SERVICES_<CODE/DATA> regions are
marked as reserved by the kernel (see efi_reserve_boot_services()) and
hence are not used by the kernel for boot purposes. When
EFI_WARN_ON_ILLEGAL_ACCESSES is enabled, page faults triggered by
firmware due to illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA>
regions are fixed up dynamically by the kernel, which maps these regions
on demand. This resolution assumes that EFI_BOOT_SERVICES_<CODE/DATA>
regions are intact, i.e. no one except firmware has ever used them.
Hence, to make this assumption true, don't call
efi_free_boot_services() if EFI_WARN_ON_ILLEGAL_ACCESSES is enabled.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 init/main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..dce0520861a1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -730,7 +730,8 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
+   if (efi_enabled(EFI_RUNTIME_SERVICES) &&
+   !IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES)) {
efi_free_boot_services();
}
 
-- 
2.7.4


[PATCH RFC 3/8] x86/efi: Save kernel context before calling EFI Runtime Services

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

After the kernel is up and running, the only time firmware executes is
when an EFI Runtime Service is invoked by the kernel. When invoked, a
buggy implementation of an EFI Runtime Service could access memory
regions which it shouldn't. This causes a page fault in ring 0 and, if
unhandled, it hangs the kernel.

The obvious way to avoid such hangs is to handle the page fault.
Recall the sequence of events that leads to the page fault:
1. A user requests the kernel to execute some EFI Runtime Service.
2. The kernel prepares and calls the requested EFI Runtime Service.
3. The requested EFI Runtime Service is buggy and causes a page fault.
4. The kernel gets back control and it's in interrupt mode.
If the page fault were handled successfully, the kernel would return
control to the EFI Runtime Service, which in turn would return control
to the kernel. But the kernel cannot map the requested EFI region
because it's long gone. Nor can we mark EFI regions as reserved and
dynamically allow access, because that would make the kernel
un-bootable.

The proposed solution is to save the kernel context before handing
control to firmware (i.e. in step 2) and, if the firmware misbehaves,
to roll back to the saved kernel context in the page fault handler.
This stops us from executing buggy firmware any further and saves us
from hanging.

Saving kernel context means saving the stack pointer and the instruction
that gets executed when firmware returns. In the page fault handler we
fix up these two things (RIP and RSP) so that, when returning from the
page fault handler, it looks as if firmware has called RET.

UEFI specification v2.7, section 2.3.4 "Calling Conventions for X64
platforms" says that "The registers RBX, RBP, RDI, RSI, R12, R13, R14,
R15, and XMM6-XMM15 are considered nonvolatile and must be saved and
restored by a function that uses them". This means that any EFI Runtime
Service that uses the above mentioned registers will save/restore their
values. Hence, to emulate the same behaviour, we save/restore these
registers each and every time we call an EFI Runtime Service.
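
The save-and-roll-back idea can be tried out in user space with
setjmp()/longjmp(), which is only an analogy: the patch itself saves
RSP/RIP in assembly and fixes up the register state in the page fault
handler. In the sketch below every name (call_runtime_service(),
simulated_fault(), the two service functions and the status values) is
hypothetical, and the "fault" is just a function call made by the buggy
service.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

enum status { STATUS_SUCCESS = 0, STATUS_ABORTED = 1 };

static jmp_buf saved_ctx;   /* stand-in for the saved RSP and RIP */
static bool in_firmware;    /* true only while a service is running */

/* Stand-in for the fault handler: roll back if firmware caused the fault. */
void simulated_fault(void)
{
    if (in_firmware)
        longjmp(saved_ctx, 1);
}

/* A service that behaves and one that touches memory it should not. */
enum status well_behaved_service(void)
{
    return STATUS_SUCCESS;
}

enum status buggy_service(void)
{
    simulated_fault();      /* never returns when called via the wrapper */
    return STATUS_SUCCESS;
}

/* Save context, call the service, and report an abort after a roll back. */
enum status call_runtime_service(enum status (*service)(void))
{
    enum status ret;

    assert(service);
    if (setjmp(saved_ctx) != 0) {
        /* Reached via longjmp(): the service never returned normally. */
        in_firmware = false;
        return STATUS_ABORTED;
    }

    in_firmware = true;
    ret = service();
    in_firmware = false;
    return ret;
}
```

From the caller's point of view a rolled-back call looks like a call
that returned with an error status, which is the effect the commit
message describes as "looks as if firmware has called RET".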

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |   3 ++
 arch/x86/platform/efi/efi_64.c  |   7 +++
 arch/x86/platform/efi/efi_stub_64.S | 101 +++-
 arch/x86/platform/efi/quirks.c  |   4 ++
 4 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index c97f2e955cab..47202b9e1b8e 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -121,6 +121,9 @@ extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
 
 #endif /* CONFIG_X86_32 */
 
+extern u64 xmm_regs_rsp;
+extern u64 core_regs_rsp;
+extern u64 exit_fw_ctx_rip;
 extern struct efi_scratch efi_scratch;
 extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable);
 extern int __init efi_memblock_x86_reserve_range(void);
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index a04298312fdd..7787bc2e58fb 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -627,6 +627,13 @@ void __init efi_dump_pagetable(void)
  */
 void efi_switch_mm(struct mm_struct *mm)
 {
+   /*
+* Used by efi page fault handler (efi_illegal_accesses_fixup()) to
+* check if it was indeed invoked in firmware context.
+*/
+   xmm_regs_rsp = 0;
+   exit_fw_ctx_rip = 0;
+
task_lock(current);
efi_scratch.prev_mm = current->active_mm;
current->active_mm = mm;
diff --git a/arch/x86/platform/efi/efi_stub_64.S b/arch/x86/platform/efi/efi_stub_64.S
index 74628ec78f29..c86825c01b4c 100644
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -39,6 +39,101 @@
mov %rsi, %cr0; \
mov (%rsp), %rsp
 
+#define SAVE_CORE_REGS_CALLEE  \
+   pushq %rbx; \
+   pushq %rdi; \
+   pushq %rsi; \
+   pushq %r12; \
+   pushq %r13; \
+   pushq %r14; \
+   pushq %r15
+
+#define RESTORE_CORE_REGS_CALLEE   \
+   popq %r15;  \
+   popq %r14;  \
+   popq %r13;  \
+   popq %r12;  \
+   popq %rsi;  \
+   popq %rdi;  \
+   popq %rbx
+
+#define SAVE_XMM_REGS_CALLEE   \
+   subq $0xb0, %rsp;   \
+   and $~0xf, %rsp ;   \
+   movaps %xmm6, 0xa0(%rsp);   \
+   movaps %xmm7, 0x90(%rsp);   \
+   movaps %xmm8, 0x80(%rsp);   \
+   movaps %xmm9, 0x70(%rs

[PATCH RFC 4/8] x86/efi: Add page fault handler to fixup/recover from page faults caused by firmware

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

EFI regions can broadly be divided into 3 types.
1. EFI_BOOT_SERVICES_<CODE/DATA> regions
2. EFI_RUNTIME_SERVICES_<CODE/DATA> regions
3. Other EFI regions like EFI_LOADER_<CODE/DATA> etc.

As per the UEFI specification, after the call to ExitBootServices(),
accesses by firmware to any memory region except
EFI_RUNTIME_SERVICES_<CODE/DATA> regions are considered illegal. Buggy
firmware could trigger these illegal accesses during boot time or at
runtime (i.e. when the kernel is up and running). Presently, the kernel
can fix up illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> regions
*only* during the kernel boot phase. If firmware triggers illegal
accesses to *any* other EFI regions during kernel boot, the kernel
panics; if this happens during kernel runtime, the kernel hangs.

The kernel panics/hangs because the memory region requested by firmware
isn't mapped, which causes a page fault in ring 0, and the kernel fails
to handle it, leading to die(). To save the kernel from hanging, we add
a page fault handler which detects illegal accesses by firmware and
1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
the kernel fixes it up by mapping the requested region.
2. If it is any other region (e.g. EFI_CONVENTIONAL_MEMORY or
EFI_LOADER_<CODE/DATA>), the kernel exits firmware context and
disables EFI Runtime Services, so that we will never again call buggy
firmware.
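
The two-way policy above can be written down as a small decision
function. This is only a user-space sketch of the policy as described in
this commit message; struct fw_state, enum fixup_action and
illegal_access_fixup() are hypothetical names and do not reproduce the
signature of efi_illegal_accesses_fixup(). The region type values follow
the UEFI memory type numbering.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of UEFI memory types (values as in the UEFI specification). */
enum region_type {
    LOADER_CODE = 1,
    LOADER_DATA = 2,
    BOOT_SERVICES_CODE = 3,
    BOOT_SERVICES_DATA = 4,
    RUNTIME_SERVICES_CODE = 5,
    RUNTIME_SERVICES_DATA = 6,
    CONVENTIONAL_MEMORY = 7,
};

enum fixup_action {
    FIXUP_NONE,             /* not an illegal firmware access */
    FIXUP_MAP_REGION,       /* map the boot services region on demand */
    FIXUP_EXIT_FIRMWARE,    /* leave firmware context, stop calling it */
};

/* Hypothetical bookkeeping for the sketch. */
struct fw_state {
    bool in_firmware_call;
    bool runtime_services_enabled;
};

enum fixup_action illegal_access_fixup(struct fw_state *st,
                                       enum region_type type)
{
    assert(st);

    /* Faults outside a firmware call are none of our business. */
    if (!st->in_firmware_call)
        return FIXUP_NONE;

    /* Runtime regions are the only ones firmware may legally touch. */
    if (type == RUNTIME_SERVICES_CODE || type == RUNTIME_SERVICES_DATA)
        return FIXUP_NONE;

    /* Boot services regions were reserved at boot, so mapping is safe. */
    if (type == BOOT_SERVICES_CODE || type == BOOT_SERVICES_DATA)
        return FIXUP_MAP_REGION;

    /* Anything else: exit firmware context and never call it again. */
    st->in_firmware_call = false;
    st->runtime_services_enabled = false;
    return FIXUP_EXIT_FIRMWARE;
}
```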

Illegal accesses to EFI_BOOT_SERVICES_<CODE/DATA> and to other regions
are dealt with differently in the EFI page fault handler because,
presently, EFI_BOOT_SERVICES_<CODE/DATA> regions are reserved by the
kernel during boot and hence it's OK to dynamically map these regions in
the page fault handler. We cannot reserve other EFI regions like
EFI_CONVENTIONAL_MEMORY and EFI_LOADER_<CODE/DATA> as they are very
large and reserving them would make the kernel un-bootable. Hence, we
take a different approach (exiting firmware context) in dealing with
page faults to these regions.

The EFI specific page fault handler offers us two advantages:
1. It avoids panics/hangs caused by buggy firmware.
2. It shouts loudly that the firmware is buggy, which saves us from
being blamed for a fault that is not ours.

Finally, this new mapping will not impact a reboot from kexec, as kexec
is only concerned about runtime memory regions.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h  |  22 -
 arch/x86/mm/fault.c |   9 ++
 arch/x86/platform/efi/quirks.c  | 140 
 drivers/firmware/efi/runtime-wrappers.c |   6 ++
 4 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 47202b9e1b8e..1285caccdff4 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -90,8 +90,20 @@ struct efi_scratch {
efi_switch_mm(&efi_mm); \
 })
 
+/*
+ * Returns "EFI_ABORTED" if illegal access by firmware caused to exit
+ * firmware context, otherwise returns status returned by firmware.
+ */
 #define arch_efi_call_virt(p, f, args...)  \
-   efi_call((void *)p->f, args)\
+({ \
+   efi_status_t __s;   \
+   \
+   __s = efi_call((void *)p->f, args); \
+   if (exited_fw_ctx)  \
+   __s = EFI_ABORTED;  \
+   \
+   __s;\
+})
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
@@ -124,6 +136,7 @@ extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
 extern u64 xmm_regs_rsp;
 extern u64 core_regs_rsp;
 extern u64 exit_fw_ctx_rip;
+extern bool exited_fw_ctx;
 extern struct efi_scratch efi_scratch;
 extern void __init efi_set_executable(efi_memory_desc_t *md, bool executable);
 extern int __init efi_memblock_x86_reserve_range(void);
@@ -147,8 +160,15 @@ extern void efi_switch_mm(struct mm_struct *mm);
 
 #ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
 extern void __init efi_save_original_memmap(void);
+extern int efi_illegal_accesses_fixup(unsigned long phys_addr,
+ struct pt_regs *regs);
 #else
 static inline void __init efi_save_original_memmap(void) { }
+static inline int efi_illegal_accesses_fixup(unsigned long phys_addr,
+struct pt_regs *regs)
+{
+   return 0;
+}
 #endif /* CONFIG_EFI_WARN_ON_

[PATCH RFC 8/8] x86/efi: Introduce EFI_WARN_ON_ILLEGAL_ACCESSES

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access EFI
memory regions other than EFI_RUNTIME_SERVICES_<CODE/DATA> even after
the kernel has assumed control of the platform. This violates the UEFI
specification.

If selected, this debug option will print a warning message if the UEFI
firmware tries to access any memory region which it shouldn't. Along
with the warning, the kernel will also try to fix up/recover from the
page fault triggered by firmware so that the machine doesn't hang.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/Kconfig | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..9ff11ec65232 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1957,6 +1957,23 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config EFI_WARN_ON_ILLEGAL_ACCESSES
+   bool "Warn about illegal memory accesses by firmware"
+   depends on EFI
+   help
+ Enable this debug feature so that the kernel can detect illegal
+ memory accesses by firmware and issue a warning. Also,
+ 1. If the illegally accessed region is EFI_BOOT_SERVICES_<CODE/DATA>,
+ the kernel fixes it up by mapping the requested region.
+ 2. If the illegally accessed region is any other region (Eg:
+ EFI_CONVENTIONAL_MEMORY or EFI_LOADER_<CODE/DATA>), then kernel
+ exits firmware context and disables EFI Runtime Services, so that
+ it will never again call buggy firmware.
+ Please see the UEFI specification for details on the expectations
+ of memory usage.
+
+ If unsure, say N.
+
 config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
-- 
2.7.4


[PATCH RFC 6/8] x86/efi: Map EFI_BOOT_SERVICES_<CODE/DATA> regions only when EFI_WARN_ON_ILLEGAL_ACCESSES is disabled

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, the kernel maps EFI_BOOT_SERVICES_<CODE/DATA> regions as a
workaround for buggy firmware that accesses them even when it
shouldn't. With EFI_WARN_ON_ILLEGAL_ACCESSES enabled, the kernel can now
detect and handle such accesses dynamically. Hence, rather than mapping
all the EFI_BOOT_SERVICES_<CODE/DATA> regions to be on the safe side,
map only EFI_RUNTIME_SERVICES_<CODE/DATA> regions and trap all other
illegal accesses.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7d18b7ed5d41..0ddb22a03d88 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -768,9 +768,13 @@ static bool should_map_region(efi_memory_desc_t *md)
/*
 * Map boot services regions as a workaround for buggy
 * firmware that accesses them even when they shouldn't.
+* (only if CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES is disabled)
 *
 * See efi_{reserve,free}_boot_services().
 */
+   if (IS_ENABLED(CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES))
+   return false;
+
if (md->type == EFI_BOOT_SERVICES_CODE ||
md->type == EFI_BOOT_SERVICES_DATA)
return true;
-- 
2.7.4


[PATCH RFC 1/8] x86/efi: Remove __init attribute from memory mapping functions

2018-07-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Buggy firmware could illegally access EFI_BOOT_SERVICES_CODE/DATA
regions even after the kernel has assumed control of the platform. When
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is enabled, we provide a page fault
handler that can detect/fix up these illegal accesses. The functions
modified below are used by the page fault handler to fix up illegal
accesses to EFI_BOOT_SERVICES_CODE/DATA regions. As the page fault
handler is present during and after kernel boot, it doesn't have an
__init attribute, but the functions below do, and thus during kernel
build we observe "WARNING: modpost: Found * section mismatch(es)". To
fix this build warning, remove the __init attribute from all these
functions.

In order not to keep these functions needlessly when
"CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES" is not selected, we add a new
__efi_init_fixup attribute whose value changes based on whether the
config option is selected or not.
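
One way such a config-dependent attribute could be defined is sketched
below. The real definition lives in the include/linux/efi.h hunk of this
patch, which is not shown here, so the #ifdef layout is an assumption.
The sketch is user-space C: __init is replaced by a plain "cold"
attribute as a stand-in for the kernel's init-section annotation, and
region_bytes() is a hypothetical helper that exists only to carry the
attribute.

```c
#include <assert.h>

/* Stand-in for the kernel's __init; the real one also selects a section. */
#define __init __attribute__((cold))

/*
 * Assumed shape of the new attribute: empty when the page fault handler
 * needs the function after boot, __init (discardable) otherwise.
 * Build with -DCONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES to get the first case.
 */
#ifdef CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES
#define __efi_init_fixup
#else
#define __efi_init_fixup __init
#endif

/* Hypothetical helper: size in bytes of a region of 4 KiB EFI pages. */
unsigned long __efi_init_fixup region_bytes(unsigned long num_pages)
{
    assert(num_pages <= (~0UL >> 12));  /* guard against overflow */
    return num_pages << 12;
}
```

Either way the function is declared and called identically; only its
lifetime after boot differs, which is what avoids the section mismatch
warning without growing kernels that leave the option off.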

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Cc: Al Stone 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Bhupesh Sharma 
Cc: Ard Biesheuvel 
---
 arch/x86/include/asm/efi.h | 11 ++-
 arch/x86/platform/efi/efi.c|  4 ++--
 arch/x86/platform/efi/efi_32.c |  2 +-
 arch/x86/platform/efi/efi_64.c |  9 +
 drivers/firmware/efi/efi.c |  6 +++---
 include/linux/efi.h| 16 ++--
 6 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..9b70743400f3 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -103,8 +103,9 @@ struct efi_scratch {
preempt_enable();   \
 })
 
-extern void __iomem *__init efi_ioremap(unsigned long addr, unsigned long size,
-   u32 type, u64 attribute);
+extern void __iomem *__efi_init_fixup efi_ioremap(unsigned long addr,
+ unsigned long size, u32 type,
+ u64 attribute);
 
 #ifdef CONFIG_KASAN
 /*
@@ -126,13 +127,13 @@ extern int __init efi_memblock_x86_reserve_range(void);
 extern pgd_t * __init efi_call_phys_prolog(void);
 extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
 extern void __init efi_print_memmap(void);
-extern void __init efi_memory_uc(u64 addr, unsigned long size);
-extern void __init efi_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size);
+extern void __efi_init_fixup efi_map_region(efi_memory_desc_t *md);
 extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
 extern void efi_sync_low_kernel_mappings(void);
 extern int __init efi_alloc_page_tables(void);
 extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
-extern void __init old_map_region(efi_memory_desc_t *md);
+extern void __efi_init_fixup old_map_region(efi_memory_desc_t *md);
 extern void __init runtime_code_page_mkexec(void);
 extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..439c2c40bf03 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -572,7 +572,7 @@ void __init runtime_code_page_mkexec(void)
}
 }
 
-void __init efi_memory_uc(u64 addr, unsigned long size)
+void __efi_init_fixup efi_memory_uc(u64 addr, unsigned long size)
 {
unsigned long page_shift = 1UL << EFI_PAGE_SHIFT;
u64 npages;
@@ -582,7 +582,7 @@ void __init efi_memory_uc(u64 addr, unsigned long size)
set_memory_uc(addr, npages);
 }
 
-void __init old_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup old_map_region(efi_memory_desc_t *md)
 {
u64 start_pfn, end_pfn, end;
unsigned long size;
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 324b93328b37..8f31452bd204 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -58,7 +58,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
return 0;
 }
 
-void __init efi_map_region(efi_memory_desc_t *md)
+void __efi_init_fixup efi_map_region(efi_memory_desc_t *md)
 {
old_map_region(md);
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 448267f1c073..a04298312fdd 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -408,7 +408,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
return 0;
 }
 
-static void __init __map_region(efi_memory_desc_t *md, u64 va)
+static void __efi_init_fixup __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
@@ -426,7 +426,7 @@ static void __init __map_r

[PATCH 1/6] efi: Introduce efi_memmap_free() to free memory allocated by efi_memmap_alloc()

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc() allocates memory depending on whether mm_init() has
already been invoked or not. Apart from memblock_alloc()'ed memory and
alloc_pages() memory, the EFI memory map can also live in a third kind
of memory, namely memblock_reserved memory. This happens only for the
memory map passed to the kernel by firmware and thus can happen only
once during the boot process.

In order to identify these three different types of allocations and thus
to call the appropriate free() variant, introduce an enum named
efi_memmap_type and an EFI memmap API named efi_memmap_free() to free
memory allocated by efi_memmap_alloc().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 drivers/firmware/efi/memmap.c | 28 
 include/linux/efi.h   |  8 
 2 files changed, 36 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 5fc70520e04c..0686e063c644 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static phys_addr_t __init __efi_memmap_alloc_early(unsigned long size)
 {
@@ -50,6 +51,33 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 }
 
 /**
+ * efi_memmap_free - Free memory allocated by efi_memmap_alloc()
+ * @mem: Physical address allocated by efi_memmap_alloc().
+ * @num_entries: Number of entries in the allocated map.
+ * @alloc_type: What type of allocation did efi_memmap_alloc() perform?
+ *
+ * Use this function to free memory allocated by efi_memmap_alloc().
+ * efi_memmap_alloc() allocates memory depending on whether mm_init()
+ * has already been invoked or not. It uses either memblock or "normal"
+ * page allocation, similarly, we free it in two different ways. Also
+ * note that there is a third type of memory used by memmap which is
+ * memblock_reserved() and is passed by EFI stub to kernel.
+ */
+void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries,
+   enum efi_memmap_type alloc_type)
+{
+   unsigned long size = num_entries * efi.memmap.desc_size;
+   unsigned int order = get_order(size);
+
+   if (alloc_type == BUDDY_ALLOCATOR)
+   __free_pages(pfn_to_page(PHYS_PFN(mem)), order);
+   else if (alloc_type == MEMBLOCK)
+   memblock_free(mem, size);
+   else
+   free_bootmem(mem, size);
+}
+
+/**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
  * @late: Use early or late mapping function?
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 56add823f190..455875c01ed1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -765,6 +765,12 @@ struct efi_memory_map_data {
unsigned long desc_size;
 };
 
+enum efi_memmap_type {
+   EFI_STUB,
+   MEMBLOCK,
+   BUDDY_ALLOCATOR,
+};
+
 struct efi_memory_map {
phys_addr_t phys_map;
void *map;
@@ -1016,6 +1022,8 @@ extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 struct range *range);
 extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
 void *buf, struct efi_mem_range *mem);
+extern void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries,
+  enum efi_memmap_type alloc_type);
 
 extern int efi_config_init(efi_config_table_type_t *arch_tables);
 #ifdef CONFIG_EFI_ESRT
-- 
2.7.4


[PATCH 3/6] efi: Use efi.memmap.alloc_type instead of efi.memmap.late

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

The memory used by the EFI memory map can be one of three different
types, namely a) memblock_reserved b) memblock_alloc'ed c) normal paged
memory. Presently, we use efi.memmap.late, which is of type "bool", to
record the type of memory in use by the EFI memory map. As a "bool"
cannot represent three states, replace it with an enum that represents
one of the three available types of memory, and change all the
corresponding memmap APIs to reflect the same.

Also, presently, we never free memblock_reserved memory and hence never
record its usage. Change efi_memmap_init_early() so that it records
the usage of memblock_reserved memory, which can then be freed when
appropriate. Also, change efi_memmap_install() and __efi_memmap_init()
so that at every point in time we record the type of memory in use
by the EFI memory map and can hence use "efi.memmap.alloc_type" to free
the existing memory before installing a new memory map.
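
A toy user-space model of this bookkeeping is sketched below. The enum
values EFI_STUB, MEMBLOCK and BUDDY_ALLOCATOR come from patch 1/6;
everything else (struct memmap, memmap_free(), memmap_install() and the
freed[] counters) is hypothetical. No memory is really freed: the
counters only record which free path would have been taken. Skipping
the free when the same buffer is re-installed is an assumption of the
sketch, made because efi_clean_memmap() re-installs efi.memmap.phys_map
with fewer entries.

```c
#include <assert.h>
#include <stddef.h>

/* Allocation types as introduced by patch 1/6. */
enum efi_memmap_type {
    EFI_STUB,
    MEMBLOCK,
    BUDDY_ALLOCATOR,
};

/* Hypothetical, simplified view of the installed memory map. */
struct memmap {
    unsigned long phys_map;
    unsigned int nr_map;
    enum efi_memmap_type alloc_type;
};

/* Counts how often each free path would have run. */
static unsigned int freed[3];

static void memmap_free(unsigned long mem, unsigned int nr_map,
                        enum efi_memmap_type alloc_type)
{
    (void)mem;
    (void)nr_map;
    freed[alloc_type]++;
}

/* Free the old map (by its recorded type), then record the new one. */
void memmap_install(struct memmap *cur, unsigned long phys,
                    unsigned int nr_map, enum efi_memmap_type alloc_type)
{
    assert(cur);

    /* Re-installing the same buffer (e.g. trimmed) must not free it. */
    if (cur->phys_map && cur->phys_map != phys)
        memmap_free(cur->phys_map, cur->nr_map, cur->alloc_type);

    cur->phys_map = phys;
    cur->nr_map = nr_map;
    cur->alloc_type = alloc_type;
}
```

A single "late" flag could not tell the first two cases apart, which is
why the recorded type has to be an enum.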

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c |  4 ++--
 arch/x86/platform/efi/quirks.c  |  4 ++--
 drivers/firmware/efi/arm-init.c |  2 +-
 drivers/firmware/efi/fake_mem.c |  2 +-
 drivers/firmware/efi/memmap.c   | 34 +++---
 include/linux/efi.h |  8 +---
 6 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..cda54abf25a6 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -196,7 +196,7 @@ int __init efi_memblock_x86_reserve_range(void)
data.desc_size  = e->efi_memdesc_size;
data.desc_version   = e->efi_memdesc_version;
 
-   rv = efi_memmap_init_early(&data);
+   rv = efi_memmap_init_early(&data, EFI_STUB);
if (rv)
return rv;
 
@@ -272,7 +272,7 @@ static void __init efi_clean_memmap(void)
u64 size = efi.memmap.nr_map - n_removal;
 
pr_warn("Removing %d invalid memory map entries.\n", n_removal);
-   efi_memmap_install(efi.memmap.phys_map, size);
+   efi_memmap_install(efi.memmap.phys_map, size, EFI_STUB);
}
 }
 
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 84e8d077adf6..11fa6ac9f0c2 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -292,7 +292,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memmap_insert(&efi.memmap, new, &mr);
early_memunmap(new, new_size);
 
-   efi_memmap_install(new_phys, num_entries);
+   efi_memmap_install(new_phys, num_entries, alloc_type);
 }
 
 /*
@@ -452,7 +452,7 @@ void __init efi_free_boot_services(void)
 
memunmap(new);
 
-   if (efi_memmap_install(new_phys, num_entries)) {
+   if (efi_memmap_install(new_phys, num_entries, alloc_type)) {
pr_err("Could not install new EFI memmap\n");
return;
}
diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c
index b5214c143fee..f0de8df6f396 100644
--- a/drivers/firmware/efi/arm-init.c
+++ b/drivers/firmware/efi/arm-init.c
@@ -239,7 +239,7 @@ void __init efi_init(void)
data.size = params.mmap_size;
data.phys_map = params.mmap;
 
-   if (efi_memmap_init_early(&data) < 0) {
+   if (efi_memmap_init_early(&data, EFI_STUB) < 0) {
/*
* If we are booting via UEFI, the UEFI memory map is the only
* description of memory we have, so there is little point in
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 955e690b8325..82dcfa1c340b 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -90,7 +90,7 @@ void __init efi_fake_memmap(void)
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
-   efi_memmap_install(new_memmap_phy, new_nr_map);
+   efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type);
 
/* print new EFI memmap */
efi_print_memmap();
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 69b81d355619..d4e3e114cf86 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -88,7 +88,7 @@ void __init efi_memmap_free(phys_addr_t mem, unsigned int 
num_entries,
 /**
  * __efi_memmap_init - Common code for mapping the EFI memory map
  * @data: EFI memory map data
- * @late: Use early or late mapping function?
+ * @alloc_type: Use early or late mapping function?
  *
  * This functi

[PATCH 4/6] x86/efi: Free existing memory map before installing new memory map

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_install() unmaps the existing memory map and installs a new
memory map but doesn't free the memory allocated to the existing
memory map. Fortunately, the details of the existing memory map (the
physical address, number of entries and type of allocation) are
stored in efi.memmap. Hence, use them to free the memory.

In __efi_enter_virtual_mode(), we don't use efi_memmap_install() to
install a new memory map, instead we use efi_memmap_init_late(). Hence,
free existing memory map there too before installing a new memory map.

Generally, memory for new memory map is allocated using
efi_memmap_alloc() but in __efi_enter_virtual_mode() it's done using
realloc_pages() [please see efi_map_regions()]. So, it's OK to free this
memory using efi_memmap_free() in efi_free_boot_services().

Also, note that the first time efi_memmap_free() is called, either from
efi_fake_memmap() or efi_arch_mem_reserve() [depending on the boot
sequence], we are actually freeing memblock_reserved memory which isn't
allocated by efi_memmap_alloc(). So, there are two outliers where we use
efi_memmap_free() to free memory allocated through other sources
rather than efi_memmap_alloc().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c | 3 +++
 arch/x86/platform/efi/quirks.c  | 6 ++++++
 drivers/firmware/efi/fake_mem.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index cda54abf25a6..7756426e93b5 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -952,6 +952,9 @@ static void __init __efi_enter_virtual_mode(void)
 * firmware via SetVirtualAddressMap().
 */
efi_memmap_unmap();
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
 
if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) {
pr_err("Failed to remap late EFI memory map\n");
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 11fa6ac9f0c2..11800f3cbb93 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -292,6 +292,9 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memmap_insert(&efi.memmap, new, &mr);
early_memunmap(new, new_size);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_phys, num_entries, alloc_type);
 }
 
@@ -452,6 +455,9 @@ void __init efi_free_boot_services(void)
 
memunmap(new);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
if (efi_memmap_install(new_phys, num_entries, alloc_type)) {
pr_err("Could not install new EFI memmap\n");
return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 82dcfa1c340b..a47754efb796 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -90,6 +90,9 @@ void __init efi_fake_memmap(void)
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type);
 
/* print new EFI memmap */
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
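
The leak that patch 4/6 addresses can be modeled in userspace. The
sketch below is only an illustration under stated assumptions: the names
map_alloc(), map_free(), map_install() and struct memmap_state are
hypothetical, malloc/free stand in for the kernel allocators, and the
free is done inside the install helper for brevity, whereas the patch
frees at each call site before calling efi_memmap_install().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the "currently installed map" state. */
struct memmap_state {
	void *map;      /* currently installed map, NULL if none */
	size_t nr_map;  /* number of entries in the installed map */
};

static struct memmap_state current_map;
static int live_maps;   /* maps allocated but not yet freed */

static void *map_alloc(size_t nr_entries)
{
	void *p = calloc(nr_entries, 48);   /* 48: illustrative entry size */

	if (p)
		live_maps++;
	return p;
}

static void map_free(void *p)
{
	if (p) {
		live_maps--;
		free(p);
	}
}

/* Install a new map, releasing the previous one first so it cannot leak. */
static void map_install(void *new_map, size_t nr_entries)
{
	map_free(current_map.map);
	current_map.map = new_map;
	current_map.nr_map = nr_entries;
}

/* Install two maps in a row and report how many maps are still live. */
static int demo_install_twice(void)
{
	void *a = map_alloc(4);
	void *b = map_alloc(8);

	if (!a || !b) {
		map_free(a);
		map_free(b);
		return -1;
	}
	map_install(a, 4);
	map_install(b, 8);
	return live_maps;
}
```

Without the map_free() call in map_install(), the first map would stay
allocated with nothing pointing at it, which is the situation described
in the commit message.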


[PATCH 2/6] efi: Let the user of efi_memmap_alloc() know the type of allocation performed

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi
memory map and it does so depending on whether mm_init() has already
been invoked or not. As we have introduced efi_memmap_free() to free the
memory allocated by efi_memmap_alloc(), modify efi_memmap_alloc() to
include "efi_memmap_type", so that the caller of efi_memmap_alloc() will
know the type of allocation performed and later use the same to free the
memory should remap fail. Without "efi_memmap_type" there would be no
way for efi_memmap_free() to know the type of allocation performed by
efi_memmap_alloc().

Also, "efi_memmap_type" makes sure that efi_memmap_alloc() and
efi_memmap_free() are always paired properly. For example, a user could
call efi_memmap_alloc() before slab_is_available() and call
efi_memmap_free() on the same memory after slab_is_available(). Without
"efi_memmap_type", efi_memmap_free() would use the wrong free variant.
With "efi_memmap_type", the relationship between efi_memmap_alloc() and
efi_memmap_free() is explicit to the user.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c  |  6 --
 drivers/firmware/efi/fake_mem.c |  3 ++-
 drivers/firmware/efi/memmap.c   | 12 ++--
 include/linux/efi.h |  3 ++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..84e8d077adf6 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -248,6 +248,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memory_desc_t md;
int num_entries;
void *new;
+   enum efi_memmap_type alloc_type;
 
if (efi_mem_desc_lookup(addr, &md)) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
@@ -276,7 +277,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 
new_size = efi.memmap.desc_size * num_entries;
 
-   new_phys = efi_memmap_alloc(num_entries);
+   new_phys = efi_memmap_alloc(num_entries, &alloc_type);
if (!new_phys) {
pr_err("Could not allocate boot services memmap\n");
return;
@@ -375,6 +376,7 @@ void __init efi_free_boot_services(void)
efi_memory_desc_t *md;
int num_entries = 0;
void *new, *new_md;
+   enum efi_memmap_type alloc_type;
 
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
@@ -420,7 +422,7 @@ void __init efi_free_boot_services(void)
return;
 
new_size = efi.memmap.desc_size * num_entries;
-   new_phys = efi_memmap_alloc(num_entries);
+   new_phys = efi_memmap_alloc(num_entries, &alloc_type);
if (!new_phys) {
pr_err("Failed to allocate new EFI memmap\n");
return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 6c7d60c239b5..955e690b8325 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -57,6 +57,7 @@ void __init efi_fake_memmap(void)
phys_addr_t new_memmap_phy;
void *new_memmap;
int i;
+   enum efi_memmap_type alloc_type;
 
if (!nr_fake_mem)
return;
@@ -71,7 +72,7 @@ void __init efi_fake_memmap(void)
}
 
/* allocate memory for new EFI memmap */
-   new_memmap_phy = efi_memmap_alloc(new_nr_map);
+   new_memmap_phy = efi_memmap_alloc(new_nr_map, &alloc_type);
if (!new_memmap_phy)
return;
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 0686e063c644..69b81d355619 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -33,6 +33,7 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned 
long size)
 /**
  * efi_memmap_alloc - Allocate memory for the EFI memory map
  * @num_entries: Number of entries in the allocated map.
+ * @alloc_type: Type of allocation performed (memblock or normal)?
  *
  * Depending on whether mm_init() has already been invoked or not,
  * either memblock or "normal" page allocation is used.
@@ -40,13 +41,20 @@ static phys_addr_t __init __efi_memmap_alloc_late(unsigned 
long size)
  * Returns the physical address of the allocated memory map on
  * success, zero on failure.
  */
-phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
+phys_addr_t __init efi_memmap_alloc(unsigned int num_entries,
+   enum efi_memmap_type *alloc_type)
 {
   

[PATCH 5/6] x86/efi: Free allocated memory if remap fails

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi
memory map. It's referenced from a couple of places, namely,
efi_arch_mem_reserve() and efi_free_boot_services(). These callers,
after allocating memory, remap it for further use and check whether the
remap succeeded. If the remap fails, the allocated memory should be
freed, but presently we just return without freeing it. Hence, fix this
bug by freeing the memory with efi_memmap_free().

efi_fake_memmap() also calls efi_memmap_alloc() and already frees the
memory correctly using memblock_free(). Replace that call with
efi_memmap_free() anyway to maintain consistency, as in, allocate memory
with efi_memmap_alloc() and free memory with efi_memmap_free().

memremap() and early_memremap() might never fail in practice, so this
code might never get a chance to run, but the error paths should still
free what was allocated.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c  | 10 --
 drivers/firmware/efi/fake_mem.c |  2 +-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 11800f3cbb93..8fce327387e5 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -286,6 +286,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
new = early_memremap(new_phys, new_size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
+   efi_memmap_free(new_phys, num_entries, alloc_type);
return;
}
 
@@ -434,7 +435,7 @@ void __init efi_free_boot_services(void)
new = memremap(new_phys, new_size, MEMREMAP_WB);
if (!new) {
pr_err("Failed to map new EFI memmap\n");
-   return;
+   goto free_mem;
}
 
/*
@@ -460,8 +461,13 @@ void __init efi_free_boot_services(void)
efi.memmap.alloc_type);
if (efi_memmap_install(new_phys, num_entries, alloc_type)) {
pr_err("Could not install new EFI memmap\n");
-   return;
+   goto free_mem;
}
+
+   return;
+
+free_mem:
+   efi_memmap_free(new_phys, num_entries, alloc_type);
 }
 
 /*
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index a47754efb796..09b0fabf07fd 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -80,7 +80,7 @@ void __init efi_fake_memmap(void)
new_memmap = early_memremap(new_memmap_phy,
efi.memmap.desc_size * new_nr_map);
if (!new_memmap) {
-   memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+   efi_memmap_free(new_memmap_phy, new_nr_map, alloc_type);
return;
}
 
-- 
2.7.4
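
The error handling that patch 5/6 adds to efi_free_boot_services()
follows the usual single-exit cleanup pattern. Below is a minimal
userspace sketch of that pattern, assuming hypothetical helpers
(buf_alloc(), buf_free(), rebuild_map()) and boolean flags that force a
failure; it is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_buffers;      /* buffers allocated but not yet freed */
static void *installed_buf;   /* buffer handed over on success */

static void *buf_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p) {
		live_buffers--;
		free(p);
	}
}

/*
 * Allocate a buffer, then run two steps that may fail. Every failure
 * after the allocation jumps to free_mem, so the buffer is released on
 * all error paths. Returns 0 on success, -1 on failure.
 */
static int rebuild_map(bool fail_remap, bool fail_install)
{
	void *buf = buf_alloc(128);

	if (!buf)
		return -1;
	if (fail_remap)         /* models a failed memremap() */
		goto free_mem;
	if (fail_install)       /* models a failed install step */
		goto free_mem;

	buf_free(installed_buf);   /* drop the previously installed buffer */
	installed_buf = buf;       /* ownership moves to the installed map */
	return 0;

free_mem:
	buf_free(buf);
	return -1;
}
```

Funnelling both failures through one label keeps the free in a single
place, which is why the patch replaces the bare "return" statements with
"goto free_mem".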



[PATCH 6/6] efi: Fix unaligned fake memmap entries corrupting efi memory map

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_fake_memmap() inserts user given fake memory map entries into the
original efi memory map using efi_memmap_insert(). efi_memmap_insert()
checks for EFI_PAGE_SIZE alignment and fails if an unaligned efi
memory region is passed (Eg: efi_fake_memmap=1K@0x73ae:
0x8000). Since EFI_PAGE_SIZE is 4K the above request fails,
but efi_fake_memmap() doesn't check for failures in efi_memmap_insert()
and installs an empty efi memory map from efi_memmap_alloc(). Since the
efi memory map is then corrupted, all later efi calls fail too. Hence,
fix this bug by changing the return type of efi_memmap_insert() from
void to int and checking it in the callers.
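
The alignment check and the error propagation described above can be
sketched in userspace as follows. This is an illustration only:
PAGE_SIZE_4K, is_aligned(), check_range() and insert_range() are
hypothetical names, and the addresses used with them are made up rather
than taken from the report.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL   /* assumed to mirror the 4K EFI_PAGE_SIZE */

/* True if value is a multiple of align; align must be a power of two. */
static int is_aligned(uint64_t value, uint64_t align)
{
	return (value & (align - 1)) == 0;
}

/* Validate an inclusive [start, end] range: 0 if page aligned, else -EINVAL. */
static int check_range(uint64_t start, uint64_t end)
{
	if (!is_aligned(start, PAGE_SIZE_4K) ||
	    !is_aligned(end + 1, PAGE_SIZE_4K))
		return -EINVAL;
	return 0;
}

/* Caller that refuses to go on when validation fails. */
static int insert_range(uint64_t start, uint64_t size)
{
	int ret = check_range(start, start + size - 1);

	if (ret)
		return ret;   /* report the failure instead of ignoring it */
	/* The real code would split and copy memory descriptors here. */
	return 0;
}
```

A 1K region can never satisfy the check because its end cannot fall on a
4K boundary when its start does, so a caller that ignores the return
value would carry on with a map that was never filled in.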

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c  |  8 +++-
 drivers/firmware/efi/fake_mem.c | 11 +--
 drivers/firmware/efi/memmap.c   | 12 
 include/linux/efi.h |  4 ++--
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 8fce327387e5..0e607ac24a3b 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -290,7 +290,13 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
size)
return;
}
 
-   efi_memmap_insert(&efi.memmap, new, &mr);
+   if (efi_memmap_insert(&efi.memmap, new, &mr)) {
+   pr_err("Failed to reserve EFI memory region\n");
+   early_memunmap(new, new_size);
+   efi_memmap_free(new_phys, num_entries, alloc_type);
+   return;
+   }
+
early_memunmap(new, new_size);
 
/* Free existing memory map before installing new memory map */
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 09b0fabf07fd..ae373af6931b 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -84,8 +84,15 @@ void __init efi_fake_memmap(void)
return;
}
 
-   for (i = 0; i < nr_fake_mem; i++)
-   efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]);
+   for (i = 0; i < nr_fake_mem; i++) {
+   if (efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i])) {
+   pr_err("efi_fake_mem: Failed to create fake memmap\n");
+   early_memunmap(new_memmap,
+  efi.memmap.desc_size * new_nr_map);
+   efi_memmap_free(new_memmap_phy, new_nr_map, alloc_type);
+   return;
+   }
+   }
 
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index d4e3e114cf86..05a556e63ec2 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -290,9 +290,11 @@ int __init efi_memmap_split_count(efi_memory_desc_t *md, 
struct range *range)
  *
  * It is suggested that you call efi_memmap_split_count() first
  * to see how large @buf needs to be.
+ *
+ * Returns zero on success, a negative error code on failure.
  */
-void __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
- struct efi_mem_range *mem)
+int __init efi_memmap_insert(struct efi_memory_map *old_memmap, void *buf,
+struct efi_mem_range *mem)
 {
u64 m_start, m_end, m_attr;
efi_memory_desc_t *md;
@@ -311,8 +313,9 @@ void __init efi_memmap_insert(struct efi_memory_map 
*old_memmap, void *buf,
 */
if (!IS_ALIGNED(m_start, EFI_PAGE_SIZE) ||
!IS_ALIGNED(m_end + 1, EFI_PAGE_SIZE)) {
-   WARN_ON(1);
-   return;
+   WARN(1, "Address 0x%llx - 0x%llx is not EFI_PAGE_SIZE aligned",
+m_start, m_end);
+   return -EINVAL;
}
 
for (old = old_memmap->map, new = buf;
@@ -379,4 +382,5 @@ void __init efi_memmap_insert(struct efi_memory_map 
*old_memmap, void *buf,
md->attribute |= m_attr;
}
}
+   return 0;
 }
diff --git a/include/linux/efi.h b/include/linux/efi.h
index c9752c67d184..bca955205a3f 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1023,8 +1023,8 @@ extern int __init efi_memmap_install(phys_addr_t addr, 
unsigned int nr_map,
 enum efi_memmap_type alloc_type);
 extern int __init efi_memmap_split_count(efi_memory_desc_t *md,
 struct range *range);
-extern void __init efi_memmap_insert(struct efi_memory_map *old_memmap,
-  

[PATCH 0/6] Fix memory leaks in efi subsystem

2018-07-02 Thread Sai Praneeth Prakhya
x/efi.h | 23 ---
 6 files changed, 131 insertions(+), 42 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH] efi: Free existing memory map before installing new memory map

2018-06-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_install() unmaps the existing memory map and installs the
new memory map but doesn't free the memory allocated to the existing
memory map. Fortunately, the details of the existing memory map are
stored in efi.memmap. Hence, use them to free the memory.

Signed-off-by: Sai Praneeth Prakhya 
Reported-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Dave Young 
Cc: Laszlo Ersek 
Cc: Bhupesh Sharma 
Cc: Ricardo Neri 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
---

Note: Patch based on efi tree 
@https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git

 drivers/firmware/efi/memmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 678e85704054..68b27b14fe94 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -229,6 +229,9 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned 
int nr_map)
 
efi_memmap_unmap();
 
+   /* Free the memory allocated to the existing memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, efi.memmap.late);
+
data.phys_map = addr;
data.size = efi.memmap.desc_size * nr_map;
data.desc_version = efi.memmap.desc_version;
-- 
2.7.4



[PATCH] efi: Remove the declaration of efi_late_init() as the function is unused

2018-06-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Commit 7b0a911478c74 ("efi/x86: Move the EFI BGRT init code to early
init code") removed the implementation of and all the references to
efi_late_init(), but the function is still declared in
include/linux/efi.h. Hence, remove the unnecessary declaration.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Dave Young 
Cc: Bhupesh Sharma 
Cc: Ricardo Neri 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
---
 include/linux/efi.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/efi.h b/include/linux/efi.h
index 56add823f190..ae47be636b98 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -988,14 +988,12 @@ extern void efi_memmap_walk (efi_freemem_callback_t 
callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, 
if possible */
 #ifdef CONFIG_X86
-extern void efi_late_init(void);
 extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
-- 
2.7.4



[PATCH V3] x86/efi: Free allocated memory if remap fails

2018-06-19 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi
memory map. It's referenced from a couple of places, namely,
efi_arch_mem_reserve() and efi_free_boot_services(). These callers,
after allocating memory, remap it for further use and check whether the
remap succeeded. If the remap fails, the allocated memory should be
freed, but presently we just return without freeing it. Hence, fix this
bug by introducing efi_memmap_free(), which frees memory allocated by
efi_memmap_alloc().

As efi_memmap_alloc() allocates memory depending on whether mm_init()
has already been invoked or not, introduce a new argument called "late"
that lets us know which type of allocation was performed by
efi_memmap_alloc(). Later, this is used by efi_memmap_free() to invoke
the appropriate method to free the allocated memory. The other main
purpose of the "late" argument is to make sure that efi_memmap_alloc()
and efi_memmap_free() are always paired properly, i.e. there could be a
scenario in which efi_memmap_alloc() is used before slab_is_available()
and efi_memmap_free() is used after slab_is_available(). Without
"late", this could break because the allocation would have been done
using memblock_alloc() while the freeing would be done using
__free_pages().

Since these APIs could easily be misused, make the pairing explicit:
the caller has to pass the "late" argument to efi_memmap_alloc() and
later pass the same value to efi_memmap_free().

efi_fake_memmap() also calls efi_memmap_alloc() and already frees the
memory correctly using memblock_free(). Replace that call with
efi_memmap_free() anyway to maintain consistency, as in, allocate memory
with efi_memmap_alloc() and free memory with efi_memmap_free().

memremap() and early_memremap() might never fail in practice, so this
code might never get a chance to run, but the error paths should still
free what was allocated.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
---

Changes from V2 to V3:
--
1. Add a new argument "late" to efi_memmap_alloc(), so that
efi_memmap_alloc() could communicate the type of allocation performed.
2. Re-introduce efi_memmap_free() (from V1) but with an extra argument
"late", to know the type of allocation performed by efi_memmap_alloc().

Changes from V1 to V2:
--
1. Fix the bug of freeing memory map that was just installed by correctly
calling free_pages().
2. Call memblock_free() and __free_pages() directly from the appropriate
places instead of efi_memmap_free().

Note: Patch based on Linus's mainline tree V4.18-rc1

 arch/x86/platform/efi/quirks.c  | 16 
 drivers/firmware/efi/fake_mem.c |  5 +++--
 drivers/firmware/efi/memmap.c   | 38 --
 include/linux/efi.h |  3 ++-
 4 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..ef5698a3af7a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -248,6 +248,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memory_desc_t md;
int num_entries;
void *new;
+   bool late;
 
if (efi_mem_desc_lookup(addr, &md)) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
@@ -276,7 +277,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 
new_size = efi.memmap.desc_size * num_entries;
 
-   new_phys = efi_memmap_alloc(num_entries);
+   new_phys = efi_memmap_alloc(num_entries, &late);
if (!new_phys) {
pr_err("Could not allocate boot services memmap\n");
return;
@@ -285,6 +286,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
new = early_memremap(new_phys, new_size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
+   efi_memmap_free(new_phys, num_entries, late);
return;
}
 
@@ -375,6 +377,7 @@ void __init efi_free_boot_services(void)
efi_memory_desc_t *md;
int num_entries = 0;
void *new, *new_md;
+   bool late;
 
for_each_efi_memory_desc(md) {
unsigned long long start = md->phys_addr;
@@ -420,7 +423,7 @@ void __init efi_free_boot_services(void)
return;
 
new_size = efi.memmap.desc_size * num_entries;
-   new_phys = efi_memmap_alloc(num_entries);
+   new_phys = efi_memmap_alloc(num_entries, &late);
if (!new_phys) {
pr_err("Failed to allocate new EFI memmap\n");
return;
@@ -429,7 +4

Re: [PATCH V2] x86/efi: Free allocated memory if remap fails

2018-06-19 Thread Sai Praneeth Prakhya


> > 
> Thank you Sai.
> 
> But this is not really what I meant.
> 

Sorry about that. I had a hunch that you might be suggesting something
like below, but I went ahead with this implementation as it looked very
simple (just 3 insertions and no deletions).

> How about we modify efi_memmap_alloc() like this
> 

It sounds like a good idea to me. Leaving aside the pros (which are obvious),
the only con I could see is a few extra checks and some code, but I don't
think it's an issue at all because this code is not in the fast path and it
doesn't impact performance. So, I will post a V3 with the suggested changes.

> @@ -39,10 +39,12 @@ static phys_addr_t __init
> __efi_memmap_alloc_late(unsigned long size)
>   * Returns the physical address of the allocated memory map on
>   * success, zero on failure.
>   */
> -phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
> +phys_addr_t __init efi_memmap_alloc(unsigned int num_entries, bool *late)
>  {
> unsigned long size = num_entries * efi.memmap.desc_size;
> 
> +   if (late)
> +   *late = slab_is_available();
> if (slab_is_available())
> return __efi_memmap_alloc_late(size);
> 
> and introduce efi_memmap_free() as before, but pass it the 'late'
> parameter you received from efi_memmap_alloc(). That way, it is the
> caller's job to take care of this.
> 

Sure! makes sense.

> Also, it seems to me that efi_arch_mem_reserve() leaks the old memory
> map every time you create a new one, no?

I think you are right. The issue I see is (please let me know if you think
otherwise):
1. efi_arch_mem_reserve() comes up with a new memory map and then tries to
install it via efi_memmap_install().
2. efi_memmap_install() unmaps the existing memory map and installs the new
memory map but doesn't free the memory used by the existing memory map.
Hence, as you said, it leaks the old memory map.

If this is what you meant, I think the issue is not limited to
efi_arch_mem_reserve() but applies to all the places that call
efi_memmap_install(). I think we could solve it as below

diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 678e85704054..50ae4ffbf058 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -228,6 +228,7 @@ int __init efi_memmap_install(phys_addr_t addr, unsigned
int nr_map)
struct efi_memory_map_data data;
 
efi_memmap_unmap();
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map, efi.memmap.late);
 
data.phys_map = addr;
data.size = efi.memmap.desc_size * nr_map;

Please let me know your thoughts on it.

> That is a separate issue that
> you may want to look into, but it affects the design of this API as
> well.

I may have misunderstood you here, but I think the efi_memmap_free() API
in V3 should work (without changes). Don't you think so?

Regards,
Sai


[PATCH V2] x86/efi: Free allocated memory if remap fails

2018-06-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi
memory map. It's referenced from a couple of places, namely,
efi_arch_mem_reserve() and efi_free_boot_services(). These callers,
after allocating memory, remap it for further use and check whether the
remap succeeded. If the remap fails, the allocated memory should be
freed, but presently we just return without freeing it. Hence, fix this
bug by freeing the memory appropriately.

As efi_memmap_alloc() allocates memory depending on whether mm_init()
has already been invoked or not, similarly, while freeing use
memblock_free() to free memory allocated before invoking mm_init() and
__free_pages() to free memory allocated after invoking mm_init().

memremap() and early_memremap() might never fail in practice, so this
code might never get a chance to run, but the error paths should still
free what was allocated.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
---

I found this bug when working on a different patch set which uses
efi_memmap_alloc() and then noticed that I never freed the allocated
memory. I found it weird, in the sense that, memory is allocated but is
not freed (upon returning from an error). So, wasn't sure if that should
be treated as a bug or should I just leave it as is because everything
works fine even without this patch. Since the effort for the patch is
very minimal, I just went ahead and posted one, so that I could know
your thoughts on it.

Changes from V1 to V2:
--
1. Fix the bug of freeing memory map that was just installed by correctly
calling free_pages().
2. Call memblock_free() and __free_pages() directly from the appropriate
places instead of efi_memmap_free().

Note: Patch based on Linus's mainline tree V4.18-rc1

 arch/x86/platform/efi/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..cfa93af97def 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -285,6 +285,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
new = early_memremap(new_phys, new_size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
+   memblock_free(new_phys, new_size);
return;
}
 
@@ -429,6 +430,7 @@ void __init efi_free_boot_services(void)
new = memremap(new_phys, new_size, MEMREMAP_WB);
if (!new) {
pr_err("Failed to map new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
return;
}
 
@@ -452,6 +454,7 @@ void __init efi_free_boot_services(void)
 
if (efi_memmap_install(new_phys, num_entries)) {
pr_err("Could not install new EFI memmap\n");
+   __free_pages(pfn_to_page(PHYS_PFN(new_phys)), get_order(new_size));
return;
}
 }
-- 
2.7.4



Re: [PATCH] x86/efi: Free allocated memory if remap fails

2018-06-18 Thread Sai Praneeth Prakhya



> > > > +void __init efi_memmap_free(phys_addr_t mem, unsigned int
> > > > num_entries)
> > > > +{
> > > > +   unsigned long size = num_entries * efi.memmap.desc_size;
> > > > +   unsigned int order = get_order(size);
> > > > +   phys_addr_t end = mem + size - 1;
> > > > +
> > > > +   if (slab_is_available()) {
> > > > +   __free_pages(pfn_to_page(PHYS_PFN(mem)), order);
> > > How do you know that the memory you are freeing was allocated when
> > > slab_is_available() was already true?
> > > 
> > efi_memmap_free() should be used *only* in conjunction
> > with efi_memmap_alloc()(As I explicitly didn't mention this, maybe it
> > might
> > have confused you).
> > 
> > When allocating memory efi_memmap_alloc() does similar check
> > for slab_is_available() and if so, it allocates memory using
> > alloc_pages().
> > So, to free pages allocated using alloc_pages(), efi_memmap_free()
> > uses __free_pages().
> > 
> I understand that. But by abstracting away the free() routine as well
> as the alloc() routine, you are hiding this fact.
> 
> What is preventing me from using efi_memmap_alloc() to allocate space
> for the memmap, and using efi_memmap_free() in another place? How are
> you preventing that this does not happen in a way where mm_init() may
> be called in the mean time?
> 
> Whether __free_pages() should be used or memblock_free() is a property
> of the *allocation* itself, not of whether mm_init() has already been
> called. So if (!slab_is_available()), you can use memblock_free().
> However, if (slab_is_available()), you cannot use __free_pages()
> because the allocation could have been made before mm_init() was
> called.
> 

Aahh.. Thanks a lot for making it clear! I see the bug now:
efi_memmap_alloc() could be called before mm_init(), in which case it uses
memblock_alloc(), whereas efi_memmap_free() could be called after mm_init(),
in which case it uses __free_pages().

I will fix this.
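
To make the reviewer's point concrete, here is a minimal user-space sketch,
not kernel code. It assumes a hypothetical descriptor (struct memmap_buf)
that records which allocator served the request, so the free path follows
the allocation rather than the current boot phase. The names late_boot,
memmap_alloc() and memmap_free() are illustrative stand-ins, and malloc()
stands in for both memblock and the page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two kernel allocators. */
enum alloc_source { SRC_MEMBLOCK, SRC_PAGE_ALLOC };

static bool late_boot;       /* stand-in for slab_is_available() */
static int memblock_live;    /* outstanding "early" allocations */
static int page_alloc_live;  /* outstanding "late" allocations */

struct memmap_buf {
	void *mem;
	enum alloc_source src;   /* recorded when the buffer is allocated */
};

static struct memmap_buf memmap_alloc(size_t size)
{
	struct memmap_buf buf;

	/* The allocator is chosen by the boot phase at allocation time. */
	buf.src = late_boot ? SRC_PAGE_ALLOC : SRC_MEMBLOCK;
	buf.mem = malloc(size);
	if (buf.mem) {
		if (buf.src == SRC_PAGE_ALLOC)
			page_alloc_live++;
		else
			memblock_live++;
	}
	return buf;
}

/* Frees according to how the buffer was allocated, not the current phase. */
static void memmap_free(struct memmap_buf *buf)
{
	assert(buf != NULL);
	if (!buf->mem)
		return;
	if (buf->src == SRC_PAGE_ALLOC)
		page_alloc_live--;
	else
		memblock_live--;
	free(buf->mem);
	buf->mem = NULL;
}
```

If memmap_free() consulted late_boot instead of buf->src, a buffer
allocated early and freed late would be handed to the wrong allocator,
which is the mismatch described above.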

Regards,
Sai



Re: [PATCH] x86/efi: Free allocated memory if remap fails

2018-06-18 Thread Sai Praneeth Prakhya


> > It's a fact that memremap() and early_memremap() might never fail and
> > this code might never get a chance to run but to maintain good kernel
> > programming semantics, we might need this patch.
> > 
> > Signed-off-by: Sai Praneeth Prakhya 
> > Reviewed-by: Ricardo Neri 
> Please don't include tags for reviews that did not happen on-list.
> 

Sure! Thanks for letting me know.

> > @@ -450,10 +451,11 @@ void __init efi_free_boot_services(void)
> > 
> > memunmap(new);
> > 
> > -   if (efi_memmap_install(new_phys, num_entries)) {
> > +   if (efi_memmap_install(new_phys, num_entries))
> > pr_err("Could not install new EFI memmap\n");
> > -   return;
> > -   }
> > +
> > +free_mem:
> > +   efi_memmap_free(new_phys, num_entries);
> Doesn't this free the memory map that you just installed?
> 

That's true! It's a bug. I will fix it.

> > 
> >  }
> > 
> >  /**
> > + * efi_memmap_free - Free memory allocated by efi_memmap_alloc()
> > + * @mem: Physical address allocated by efi_memmap_alloc()
> > + * @num_entries: Number of entries in the allocated map.
> > + *
> > + * efi_memmap_alloc() allocates memory depending on whether mm_init()
> > + * has already been invoked or not. It uses either memblock or "normal"
> > + * page allocation. Use this function to free the memory allocated by
> > + * efi_memmap_alloc(). Since the allocation is done in two different
> > + * ways, similarly, we free it in two different ways.
> > + *
> > + */
> > +void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries)
> > +{
> > +   unsigned long size = num_entries * efi.memmap.desc_size;
> > +   unsigned int order = get_order(size);
> > +   phys_addr_t end = mem + size - 1;
> > +
> > +   if (slab_is_available()) {
> > +   __free_pages(pfn_to_page(PHYS_PFN(mem)), order);
> How do you know that the memory you are freeing was allocated when
> slab_is_available() was already true?
> 

efi_memmap_free() should be used *only* in conjunction
with efi_memmap_alloc() (I didn't mention this explicitly, which may
have caused the confusion).

When allocating memory, efi_memmap_alloc() performs a similar check
for slab_is_available() and, if it is true, allocates memory using
alloc_pages(). So, to free pages allocated using alloc_pages(),
efi_memmap_free() uses __free_pages().

> > 
> > +   return;
> > +   }
> > +
> > +   if (memblock_free(mem, size))
> > +   pr_err("Failed to free mem from %pa to %pa\n", ,
> > );
> > +}
> > +

Regards,
Sai


[PATCH] x86/efi: Free allocated memory if remap fails

2018-06-15 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_alloc(), as the name suggests, allocates memory for a new efi
memory map. It's referenced from a couple of places, namely,
efi_arch_mem_reserve() and efi_free_boot_services(). These callers,
after allocating memory, remap it for further use. As usual, a routine
check is performed to confirm successful remap. If the remap fails,
ideally, the allocated memory should be freed but presently we just
return without freeing it up. Hence, fix this bug by introducing
efi_memmap_free() which frees memory allocated by efi_memmap_alloc().
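
The leak and its fix can be illustrated with a small user-space sketch.
This is not the kernel code: buf_alloc(), buf_map(), buf_free() and
build_map() are hypothetical helpers, malloc() stands in for the real
allocator, and fail_map is a knob that forces the mapping step to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;   /* counts buffers that have not been freed */
static bool fail_map;     /* test knob: make the mapping step fail */

static void *buf_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void buf_free(void *p)
{
	assert(p != NULL);
	free(p);
	live_allocs--;
}

/* Hypothetical mapping step; returns NULL to model a failed remap. */
static void *buf_map(void *p)
{
	return fail_map ? NULL : p;
}

/* Returns 0 and stores the mapping in *out on success, -1 on failure. */
static int build_map(size_t size, void **out)
{
	void *phys = buf_alloc(size);
	void *virt;

	if (!phys)
		return -1;

	virt = buf_map(phys);
	if (!virt) {
		buf_free(phys);   /* release the allocation before bailing out */
		return -1;
	}

	*out = virt;
	return 0;
}
```

Without the buf_free() call in the failure branch, live_allocs would stay
at 1 after a failed mapping, which mirrors the early return being fixed.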

As efi_memmap_alloc() allocates memory depending on whether mm_init()
has already been invoked or not, similarly efi_memmap_free() frees
memory accordingly.

efi_fake_memmap() also calls efi_memmap_alloc() and already frees the
memory correctly using memblock_free(). Replace that call with
efi_memmap_free() anyway to maintain consistency, i.e. allocate memory
with efi_memmap_alloc() and free it with efi_memmap_free().

It's a fact that memremap() and early_memremap() might never fail and
this code might never get a chance to run but to maintain good kernel
programming semantics, we might need this patch.

Signed-off-by: Sai Praneeth Prakhya 
Reviewed-by: Ricardo Neri 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
---

I found this bug when working on a different patch set which uses
efi_memmap_alloc() and then noticed that I never freed the allocated
memory. I found it weird, in the sense that, memory is allocated but is
not freed (upon returning from an error). So, wasn't sure if that should
be treated as a bug or should I just leave it as is because everything
works fine even without this patch. Since the effort for the patch is
very minimal, I just went ahead and posted one, so that I could know
your thoughts on it.

Note: Patch based on Linus's mainline tree V4.17

 arch/x86/platform/efi/quirks.c  | 10 ++
 drivers/firmware/efi/fake_mem.c |  2 +-
 drivers/firmware/efi/memmap.c   | 27 +++
 include/linux/efi.h |  1 +
 4 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..f223093f2df7 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -285,6 +285,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
new = early_memremap(new_phys, new_size);
if (!new) {
pr_err("Failed to map new boot services memmap\n");
+   efi_memmap_free(new_phys, num_entries);
return;
}
 
@@ -429,7 +430,7 @@ void __init efi_free_boot_services(void)
new = memremap(new_phys, new_size, MEMREMAP_WB);
if (!new) {
pr_err("Failed to map new EFI memmap\n");
-   return;
+   goto free_mem;
}
 
/*
@@ -450,10 +451,11 @@ void __init efi_free_boot_services(void)
 
memunmap(new);
 
-   if (efi_memmap_install(new_phys, num_entries)) {
+   if (efi_memmap_install(new_phys, num_entries))
pr_err("Could not install new EFI memmap\n");
-   return;
-   }
+
+free_mem:
+   efi_memmap_free(new_phys, num_entries);
 }
 
 /*
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 6c7d60c239b5..63edcedee25b 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -79,7 +79,7 @@ void __init efi_fake_memmap(void)
new_memmap = early_memremap(new_memmap_phy,
efi.memmap.desc_size * new_nr_map);
if (!new_memmap) {
-   memblock_free(new_memmap_phy, efi.memmap.desc_size * new_nr_map);
+   efi_memmap_free(new_memmap_phy, new_nr_map);
return;
}
 
diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
index 5fc70520e04c..27d28cb4652d 100644
--- a/drivers/firmware/efi/memmap.c
+++ b/drivers/firmware/efi/memmap.c
@@ -50,6 +50,33 @@ phys_addr_t __init efi_memmap_alloc(unsigned int num_entries)
 }
 
 /**
+ * efi_memmap_free - Free memory allocated by efi_memmap_alloc()
+ * @mem: Physical address allocated by efi_memmap_alloc()
+ * @num_entries: Number of entries in the allocated map.
+ *
+ * efi_memmap_alloc() allocates memory depending on whether mm_init()
+ * has already been invoked or not. It uses either memblock or "normal"
+ * page allocation. Use this function to free the memory allocated by
+ * efi_memmap_alloc(). Since the allocation is done in two different
+ * ways, similarly, we free it in two different ways.
+ *
+ */
+void __init efi_memmap_free(phys_addr_t mem, unsigned int num_entries)
+{
+   unsigned long size = num_entries * efi.memmap.desc_size;
+   unsigned int order = get_order(size);

[PATCH V5 1/3] x86/efi: Make efi_delete_dummy_variable() use set_variable_nonblocking() instead of set_variable()

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, efi_delete_dummy_variable() uses set_variable(), which might
block, and hence the kernel prints a stack trace with the warning "bad:
scheduling from the idle thread!". So, make efi_delete_dummy_variable()
use set_variable_nonblocking(), which, as the name suggests, doesn't
block.
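
As a rough illustration of what "doesn't block" means here, the sketch
below uses a try-lock: if the lock is busy, the call returns a "not ready"
status instead of sleeping. This is a user-space approximation under
stated assumptions. The names rts_trylock(), rts_unlock(), RTS_NOT_READY
and set_variable_nonblocking_sketch() are hypothetical, and a C11 atomic
flag stands in for the kernel's lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum rts_status { RTS_SUCCESS, RTS_NOT_READY };

static atomic_bool rts_locked;   /* zero-initialized, i.e. unlocked */
static int calls_made;           /* how many calls reached the "firmware" */

/* Take the lock only if it is free; never wait for it. */
static bool rts_trylock(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&rts_locked, &expected, true);
}

static void rts_unlock(void)
{
	bool was_locked = atomic_exchange(&rts_locked, false);

	assert(was_locked);
	(void)was_locked;
}

static enum rts_status set_variable_nonblocking_sketch(int value)
{
	if (!rts_trylock())
		return RTS_NOT_READY;   /* lock busy: give up instead of sleeping */

	(void)value;                    /* a real call would hand this to firmware */
	calls_made++;
	rts_unlock();
	return RTS_SUCCESS;
}
```

The trade-off is that the caller has to tolerate a "not ready" result,
which is acceptable for a best-effort deletion of a dummy variable.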

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 arch/x86/platform/efi/quirks.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..6af39dc40325 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -105,12 +105,11 @@ early_param("efi_no_storage_paranoia", setup_storage_paranoia);
 */
 void efi_delete_dummy_variable(void)
 {
-   efi.set_variable((efi_char16_t *)efi_dummy_name,
-&EFI_DUMMY_GUID,
-EFI_VARIABLE_NON_VOLATILE |
-EFI_VARIABLE_BOOTSERVICE_ACCESS |
-EFI_VARIABLE_RUNTIME_ACCESS,
-0, NULL);
+   efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
+&EFI_DUMMY_GUID,
+EFI_VARIABLE_NON_VOLATILE |
+EFI_VARIABLE_BOOTSERVICE_ACCESS |
+EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
 }
 
 /*
-- 
2.7.4



[PATCH V5 2/3] efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. Populates efi_runtime_work
b. Queues work onto efi_rts_wq and
c. Waits until worker thread completes

The caller thread has to wait until the worker thread completes, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and move on.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
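
The queue-and-wait flow described above can be sketched in user space as
follows. This is a single-threaded approximation, not the patch itself:
struct rts_work, rts_queue(), rts_worker_run() and rts_call() are
hypothetical names, a plain linked list stands in for the ordered
workqueue, a bool stands in for the completion, and the "worker" is run
inline where the kernel would wait for a separate thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rts_id { RTS_GET_COUNT, RTS_SET_COUNT };

struct rts_work {
	void *arg1, *arg2, *arg3, *arg4, *arg5;  /* up to five arguments */
	int status;                              /* result of the service */
	enum rts_id id;                          /* which service to run */
	bool done;                               /* stand-in for a completion */
	struct rts_work *next;
};

static struct rts_work *queue_head, *queue_tail;
static int counter = 7;   /* hypothetical state owned by the "firmware" */

/* Append to the tail so that work is handled strictly in FIFO order. */
static void rts_queue(struct rts_work *w)
{
	w->done = false;
	w->next = NULL;
	if (queue_tail)
		queue_tail->next = w;
	else
		queue_head = w;
	queue_tail = w;
}

static void rts_handle(struct rts_work *w)
{
	switch (w->id) {
	case RTS_GET_COUNT:
		*(int *)w->arg1 = counter;   /* fill the caller's buffer */
		w->status = 0;
		break;
	case RTS_SET_COUNT:
		counter = *(int *)w->arg1;
		w->status = 0;
		break;
	default:
		w->status = -1;
		break;
	}
}

/* One execution context: drain the queue one item at a time. */
static void rts_worker_run(void)
{
	while (queue_head) {
		struct rts_work *w = queue_head;

		queue_head = w->next;
		if (!queue_head)
			queue_tail = NULL;
		rts_handle(w);
		w->done = true;
	}
}

static int rts_call(enum rts_id id, void *arg1)
{
	struct rts_work w = { .arg1 = arg1, .id = id, .status = -1 };

	rts_queue(&w);
	rts_worker_run();   /* in the kernel: wait for the worker thread */
	assert(w.done);
	return w.status;
}
```

Because the work item lives on the caller's stack and the service writes
through the caller's pointer, returning before the work is done would
leave both dangling, which is why the caller has to wait.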

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/efi.c  | 14 ++
 drivers/firmware/efi/runtime-wrappers.c | 83 +
 include/linux/efi.h |  3 ++
 3 files changed, 100 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1379a375dfa8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,18 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+   if (!efi_rts_wq) {
+   pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return 0;
+   }
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..cf3bae42a752 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. 
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +45,77 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   QUERY_VARIABLE_INFO,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ * @efi_rts_id:EFI Runtime Service function identifier
+ * @efi_rts_comp:  Struct used for handling completions
+ */
+struct efi_runtime_work {
+   void *arg1;
+   void *arg2;
+   void *arg3;
+   void *arg4;
+   void *arg5;
+   efi_status_t status;
+   struct work_struct work;
+   enum

[PATCH V5 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think, user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.

A solution for this issue is to use kthread to run
efi_runtime_service(). When a user process requests the kernel to
execute any efi_runtime_service(), kernel queues the work to efi_rts_wq,
a kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
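
The packing and unpacking rules above can be sketched as follows. This is
a user-space illustration with hypothetical names (struct svc_work,
svc_dispatch(), svc_set_flag(), svc_read_flag()); it only shows the
convention of passing pointers as is and passing values by address, and
it calls the handler directly instead of going through a workqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum svc_id { SVC_SET_FLAG, SVC_READ_FLAG };

struct svc_work {
	void *arg1;        /* generic slot: holds a pointer in both cases */
	int status;
	enum svc_id id;
};

static uint8_t stored_flag;

/* Hypothetical services with their natural signatures. */
static int svc_set_flag(uint8_t enabled)
{
	stored_flag = enabled;
	return 0;
}

static int svc_read_flag(uint8_t *enabled)
{
	*enabled = stored_flag;
	return 0;
}

static void svc_dispatch(struct svc_work *w)
{
	switch (w->id) {
	case SVC_SET_FLAG:
		/* Value argument: cast back to its pointer type, then dereference. */
		w->status = svc_set_flag(*(uint8_t *)w->arg1);
		break;
	case SVC_READ_FLAG:
		/* Pointer argument: cast back to its original type and pass it on. */
		w->status = svc_read_flag((uint8_t *)w->arg1);
		break;
	default:
		w->status = -1;
		break;
	}
}

static int call_set_flag(uint8_t enabled)
{
	/* Value argument: pack the address of the local copy. */
	struct svc_work w = { .arg1 = &enabled, .id = SVC_SET_FLAG };

	svc_dispatch(&w);
	return w.status;
}

static int call_read_flag(uint8_t *enabled)
{
	/* Pointer argument: pack it unchanged. */
	struct svc_work w = { .arg1 = enabled, .id = SVC_READ_FLAG };

	assert(enabled != NULL);
	svc_dispatch(&w);
	return w.status;
}
```

Passing values by address keeps every slot a plain void pointer, at the
cost that the packed address is only valid while the caller's frame is
alive.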

The non-blocking variants of set_variable() and query_variable_info()
should be used while in atomic context. Use of the blocking variants,
set_variable() and query_variable_info(), while in atomic context will
issue a warning ("scheduling while in atomic") and print a stack trace.
Presently, pstore uses the non-blocking variants and hence works fine.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/runtime-wrappers.c | 135 
 1 file changed, 119 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index cf3bae42a752..127d4de00403 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -173,13 +173,104 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET_WAKEUP_TIME:
+   status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+  (efi_bool_t *)arg2, (efi_time_t *)arg3);
+   break;
+   case SET_WAKEUP_TIME:
+   status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+  (efi_time_t *)arg2);
+   break;
+   case GET_VARIABLE:
+   status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+  (efi_guid_t *)arg2, (u32 *)arg3,
+  (unsigned long *)arg4, (void *)arg5);
+   break;
+   case GET_NEXT_VARIABLE:
+   status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+  (efi_char16_t *)arg2,
+  (efi_guid_t *)

[PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-28 Thread Sai Praneeth Prakhya
Patches are based on Linus's kernel v4.17-rc7

[1] Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V4 to V5:
--
1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
  Non-blocking variants are supposed to not block and using workqueue
  exactly does the opposite, hence refrain from using it.
2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
  blocking variants means that we have to call efi_delete_dummy_variable()
  after efi_rts_wq has been created.
3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
  Any caller wishing to use set_variable() and query_variable_info() in
  atomic context should use their non-blocking variants.

Changes from V3 to V4:
--
1. As suggested by Peter, use completions instead of flush_work() as the
  former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
  wasn't able to find a better alternative to keep this change local to
  arch/x86.

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem. What we are
  fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
  ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
  create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
  runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
  requested efi_runtime_service() - Because these two situations should
  *never* happen.

Sai Praneeth (3):
  x86/efi: Make efi_delete_dummy_variable() use
set_variable_nonblocking() instead of set_variable()
  efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/platform/efi/quirks.c  |  11 +-
 drivers/firmware/efi/efi.c  |  14 ++
 drivers/firmware/efi/runtime-wrappers.c | 218 +---
 include/linux/efi.h |   3 +
 4 files changed, 224 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 

-- 
2.7.4



[PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-25 Thread Sai Praneeth Prakhya
comments and concerns.

Note:
-
Patches are based on Linus's kernel v4.17-rc6

[1] Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V3 to V4:
--
1. As suggested by Peter, use completions instead of flush_work() as the
  former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry! Ard,
  wasn't able to find a better alternative to keep this change local to
  arch/x86.

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem. What we are
  fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
  ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
  create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
  runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
  requested efi_runtime_service() - Because these two situations should
  *never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  20 +++
 drivers/firmware/efi/runtime-wrappers.c | 256 +---
 include/linux/efi.h |   6 +
 5 files changed, 262 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>

-- 
2.7.4



[PATCH V4 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think, user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.

A solution for this issue is to use kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), kernel queues the work to efi_rts_wq, a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.

pstore writes could potentially be invoked in atomic context, and pstore
uses set_variable<>() and query_variable_info<>() to store logs. If we
invoke efi_runtime_services() through efi_rts_wq while in atomic context,
the kernel issues a warning ("scheduling while in atomic") and prints a
stack trace. One way to overcome this is to not make the caller process
wait for the worker thread to finish, but that approach breaks pstore,
i.e. the log messages aren't written to EFI variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq. In other words,
efi_rts_wq is used unconditionally for all the efi_runtime_services()
except set_variable<>() and query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 171 
 1 file changed, 151 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 534bd348feca..26bb6645ff59 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -175,13 +175,108 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+  

[PATCH V3 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and may assume that user space is still valid (i.e. that the
user space mappings still point to the process that requested the
efi_runtime_service()), when in reality it is not.

A solution for this issue is to use a kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), the kernel queues the work to efi_rts_wq; a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
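
The packing and unpacking rules above can be illustrated with a small
user-space sketch. Everything in it is hypothetical (rts_work, call_rts(),
svc_set_flag(), svc_get_count() and the status values are stand-ins made
up for illustration, not the kernel's or the firmware's definitions); it
only shows a by-value argument travelling as the address of the value and
a pointer argument travelling unchanged.

```c
/* Hypothetical stand-ins; none of these names exist in the kernel. */
typedef unsigned long status_t;
#define ST_SUCCESS   0UL
#define ST_NOT_FOUND 14UL

enum rts_id { RTS_SET_FLAG, RTS_GET_COUNT };

struct rts_work {
	void *arg1;
	void *arg2;		/* unused by the two toy services below */
	status_t status;
	enum rts_id id;
};

static unsigned int stored_flag;

/* Toy service that takes its argument by value. */
static status_t svc_set_flag(unsigned int flag)
{
	stored_flag = flag;
	return ST_SUCCESS;
}

/* Toy service that takes a pointer and fills in the pointee. */
static status_t svc_get_count(unsigned long *count)
{
	*count = 42;
	return ST_SUCCESS;
}

/* Handler: recast every void pointer, dereference only by-value arguments. */
static void call_rts(struct rts_work *w)
{
	status_t status = ST_NOT_FOUND;

	switch (w->id) {
	case RTS_SET_FLAG:
		/* value argument: recast, then dereference */
		status = svc_set_flag(*(unsigned int *)w->arg1);
		break;
	case RTS_GET_COUNT:
		/* pointer argument: recast only */
		status = svc_get_count((unsigned long *)w->arg1);
		break;
	}
	w->status = status;
}

/* Caller side: a value is packed as its address. */
static status_t set_flag(unsigned int flag)
{
	struct rts_work w = { .arg1 = &flag, .id = RTS_SET_FLAG };

	call_rts(&w);
	return w.status;
}

/* Caller side: a pointer is packed as is. */
static status_t get_count(unsigned long *count)
{
	struct rts_work w = { .arg1 = count, .id = RTS_GET_COUNT };

	call_rts(&w);
	return w.status;
}
```

Passing the address of a by-value argument is only safe here because the
caller does not return before the handler has run, which is the same
reason the caller waits in the real series.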

pstore writes could potentially be invoked in atomic context, and pstore
uses set_variable<>() and query_variable_info<>() to store logs. If we
invoke efi_runtime_services() through efi_rts_wq while in atomic context,
the kernel issues a warning ("scheduling while atomic") and prints a stack
trace. One way to overcome this is to not make the caller process wait for
the worker thread to finish, but that approach breaks pstore, i.e. the log
messages aren't written to EFI variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq; in other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 170 
 1 file changed, 150 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index a9866045ed52..23ff128fcb2f 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -170,13 +170,107 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET

[PATCH V3 1/3] x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Create a workqueue named efi_rts_wq (efi runtime services workqueue), so
that all efi_runtime_services() are executed in kthread context.

Invoking efi_runtime_services() through efi_rts_wq means all accesses to
efi_runtime_services() should be done after efi_rts_wq has been created.
efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.
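
The ordering constraint can be sketched in plain C as follows. This is a
simplified user-space model, not the patch itself: the names, the flag
bit and the simulate_failure parameter are assumptions made up for
illustration. It shows the queue being created first, runtime services
being disabled (rather than the system crashing) if creation fails, and
the dummy-variable cleanup running only once the queue exists.

```c
#include <stdbool.h>
#include <stddef.h>

#define FLAG_RUNTIME_SERVICES (1UL << 0)	/* hypothetical flag bit */

static unsigned long flags = FLAG_RUNTIME_SERVICES;
static int wq_storage;		/* stand-in object for the workqueue */
static void *rts_wq;		/* NULL until the queue has been created */
static bool dummy_deleted;

/* Create the queue; on failure, clear the flag and report it. */
static bool create_rts_wq(bool simulate_failure)
{
	rts_wq = simulate_failure ? NULL : &wq_storage;
	if (!rts_wq) {
		flags &= ~FLAG_RUNTIME_SERVICES;
		return false;
	}
	return true;
}

/* Depends on the queue, so it may only run after create_rts_wq(). */
static void delete_dummy_variable(void)
{
	dummy_deleted = true;
}

static void enter_virtual_mode(bool simulate_failure)
{
	if (!create_rts_wq(simulate_failure))
		return;		/* runtime services are now disabled */

	delete_dummy_variable();
}
```

Later initialization code can then test the flag to find out whether the
queue was created, instead of checking the queue pointer directly.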

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 arch/x86/platform/efi/efi.c| 15 +--
 drivers/firmware/efi/arm-runtime.c |  3 +++
 drivers/firmware/efi/efi.c | 25 +
 include/linux/efi.h|  4 
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..adcc55cd25ce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
@@ -1031,6 +1025,15 @@ void __init efi_enter_virtual_mode(void)
__efi_enter_virtual_mode();
 
efi_dump_pagetable();
+
+   if (!efi_create_rts_wq())
+   return;
+
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_wq is ready.
+*/
+   efi_delete_dummy_variable();
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..6fb06130b53f 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -139,6 +139,9 @@ static int __init arm_enable_runtime_services(void)
return -ENOMEM;
}
 
+   if (!efi_create_rts_wq())
+   return 0;
+
/* Set up runtime services function pointers */
efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..b9103caa03b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,13 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* If we failed to create efi_rts_wq, EFI_RUNTIME_SERVICES would
+* have been cleared, check for that condition.
+*/
+   if (!efi_enabled(EFI_RUNTIME_SERVICES))
+   return 0;
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
@@ -971,3 +980,19 @@ static int register_update_efi_random_seed(void)
 }
 late_initcall(register_update_efi_random_seed);
 #endif
+
+bool __init efi_create_rts_wq(void)
+{
+   /*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+   if (!efi_rts_wq) {
+   pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return false;
+   }
+   return true;
+}
diff --git a/include/li

[PATCH V3 2/3] efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce efi_queue_work() that 1. Populates efi_runtime_work 2. Queues
work onto efi_rts_wq and 3. Waits until worker thread returns.

The caller thread has to wait until the worker thread returns, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or
a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
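
Why the caller must wait can be shown with a user-space analogy. The
sketch below is not the kernel mechanism: it starts one POSIX thread per
request instead of using a persistent workqueue, and struct work,
worker(), queue_work_and_wait() and the status values are hypothetical
names chosen for illustration. The point it makes is the same, though:
the work item and the output buffer belong to the caller, so the caller
may not continue until the worker is done with them.

```c
#include <pthread.h>
#include <string.h>

#define ST_OK			0
#define ST_BUFFER_TOO_SMALL	1
#define ST_ABORTED		(-1)

struct work {
	void *arg1;	/* output buffer (a pointer, passed as is) */
	void *arg2;	/* buffer size (a value, passed by address) */
	int status;	/* written by the worker, read by the caller */
};

/* Worker: fills the caller's buffer and records the status. */
static void *worker(void *data)
{
	static const char value[] = "BootOrder";
	struct work *w = data;
	char *buf = w->arg1;
	size_t len = *(size_t *)w->arg2;

	if (len < sizeof(value)) {
		w->status = ST_BUFFER_TOO_SMALL;
		return NULL;
	}
	memcpy(buf, value, sizeof(value));
	w->status = ST_OK;
	return NULL;
}

/* Post the work, then block until the worker has finished with it. */
static int queue_work_and_wait(char *buf, size_t len)
{
	struct work w = { .arg1 = buf, .arg2 = &len, .status = ST_ABORTED };
	pthread_t tid;

	if (pthread_create(&tid, NULL, worker, &w) != 0)
		return w.status;	/* worker never ran: ST_ABORTED */

	/* w and len live on this stack frame, so returning early is unsafe */
	pthread_join(tid, NULL);
	return w.status;
}
```

Initializing the status to an "aborted" value before posting means the
caller sees a failure if the work never ran at all.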

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..a9866045ed52 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,76 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   SET_VARIABLE_NONBLOCKING,
+   QUERY_VARIABLE_INFO,
+   QUERY_VARIABLE_INFO_NONBLOCKING,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @efi_rts_id:  EFI Runtime Service function identifier
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ */
+struct efi_runtime_work {
+   void *arg1;
+   void *arg2;
+   void *arg3;
+   void *arg4;
+   void *arg5;
+   efi_status_t status;
+   struct work_struct work;
+   enum efi_rts_ids efi_rts_id;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts:   efi_runtime_service() function identifier
+ * @rts_arg<1-5>:  efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)\
+({ \
+   struct efi_runtime_work efi_rts_work;   \
+   efi_rts_work.status = EFI_ABORTED;  \
+   

[PATCH V3 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-21 Thread Sai Praneeth Prakhya
comments and concerns.

Note:
-
Patches are based on Linus's kernel v4.17-rc6

[1] Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem. What we are
fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq
  efi: Introduce efi_queue_work() to queue any efi_runtime_service() on 
   efi_rts_wq
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/platform/efi/efi.c |  15 +-
 drivers/firmware/efi/arm-runtime.c  |   3 +
 drivers/firmware/efi/efi.c  |  25 
 drivers/firmware/efi/runtime-wrappers.c | 250 +---
 include/linux/efi.h |   4 +
 5 files changed, 271 insertions(+), 26 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] x86: Use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()

2018-04-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When the platform supports PCID and CONFIG_DEBUG_VM is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set. Presently, build_cr3_noflush() uses
"this_cpu_has(X86_FEATURE_PCID)" to perform the check, but this_cpu_has()
works only after SMP is initialized (i.e. the per-CPU cpu_info structures
must be populated), and this happens very late in the boot process (during
rest_init).

As efi_runtime_services() are called during (early) kernel boot time
and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
time. As suggested by Dave, this should be OK because all CPUs have the
same capabilities anyway (for x86).
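
The timing problem can be modelled with a toy sketch. It is a deliberate
simplification and does not reproduce the kernel's cpufeature code:
model_boot_cpu_has(), model_this_cpu_has(), NR_CPUS and FEATURE_PCID are
hypothetical stand-ins, and the model assumes, as the paragraph above
does, that all CPUs have the same capabilities.

```c
#include <stdbool.h>

#define NR_CPUS		4
#define FEATURE_PCID	(1u << 0)

static unsigned int boot_cpu_caps;		/* filled in during early boot */
static unsigned int per_cpu_caps[NR_CPUS];	/* filled in only at SMP init */

/* Early boot: only the boot CPU's capabilities are known. */
static void early_cpu_init(unsigned int caps)
{
	boot_cpu_caps = caps;
}

/* Much later: per-CPU copies are populated (symmetric CPUs assumed). */
static void smp_init_caps(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		per_cpu_caps[cpu] = boot_cpu_caps;
}

/* Valid as soon as early_cpu_init() has run. */
static bool model_boot_cpu_has(unsigned int feature)
{
	return (boot_cpu_caps & feature) != 0;
}

/* Valid only after smp_init_caps(); reports "absent" before that. */
static bool model_this_cpu_has(int cpu, unsigned int feature)
{
	return (per_cpu_caps[cpu] & feature) != 0;
}
```

Between early_cpu_init() and smp_init_caps(), the per-CPU query reports
the feature as missing even though the hardware has it, which is the
window in which the warning shown in this patch fires.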

Without this change, we see the warning below during kernel boot.

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134
load_new_mm_cr3+0x114/0x170
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-02277-gbc16d4052f1a #1
Hardware name: System manufacturer System Product Name/Z170-K, BIOS 3301
02/08/2017
RIP: 0010:load_new_mm_cr3+0x114/0x170
RSP: :9b203e38 EFLAGS: 00010046
RAX:  RBX: 9b26f5a0 RCX: 
RDX:  RSI:  RDI: 9b20a000
RBP: 9b203e90 R08:  R09: 0f63eb29
R10: 9b203ea8 R11: c3292018 R12: 
R13: 9b2e1180 R14: 0001ee80 R15: 
FS:  () GS:968df6c0()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 968df6fff000 CR3: 0004261e6002 CR4: 000606b0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0

Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still work in progress as it affects
MCE.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Reported-by: Linus Torvalds <torva...@linux-foundation.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Peter Zijlstra <a.p.zijls...@chello.nl>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Dave Hansen <dave.han...@intel.com>
---
 arch/x86/include/asm/tlbflush.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 84137c22fdfa..42e040859067 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -131,7 +131,12 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
 static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 {
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-   VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+   /*
+* Use boot_cpu_has() instead of this_cpu_has() as this function
+* might be called during early boot. This should work even after
+* boot because all cpu's have same capabilities anyways.
+*/
+   VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
-- 
2.7.4



[PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. populates efi_runtime_work
b. queues work onto efi_rts_wq and
c. waits until worker thread returns

The caller thread has to wait until the worker thread returns, because
it's dependent on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/efi.c  | 15 
 drivers/firmware/efi/runtime-wrappers.c | 61 +
 include/linux/efi.h | 20 +++
 3 files changed, 96 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 838b8efe639c..04b46c62f3ce 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -75,6 +75,8 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
return 0;
 
/*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
+   if (!efi_rts_wq) {
+   pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+  "disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return 0;
+   }
+
+   /*
 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
 * it should be invoked only after efi_rts_workqueue is ready.
 */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..649763171439 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_workqueue.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,57 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   SET_VARIABLE_NONBLOCKING,
+   QUERY_VARIABLE_INFO,
+   QUERY_VARIABLE_INFO_NONBLOCKING,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_queue_work: 

[PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Invoking efi_runtime_services() through efi_rts_wq means all accesses
to efi_runtime_services() should be done after efi_rts_wq has been
created. efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.

efi_delete_dummy_variable() is called from efi_enter_virtual_mode()
which is early in the boot phase (efi_rts_wq isn't created yet), so call
efi_delete_dummy_variable() later in the boot phase i.e. while
initializing efi subsystem. In the next patch, this is the place where
we create efi_rts_wq and all the efi_runtime_services() will be called
using efi_rts_wq.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 --
 drivers/firmware/efi/efi.c  | 6 ++
 include/linux/efi.h | 3 +++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index a399c1ebf6f0..43009e3f821b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,7 +143,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..838b8efe639c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_workqueue is ready.
+*/
+   efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f5083aa72eae..c4efb3ef0dfa 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.7.4



[PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), the kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
(like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use a kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), the kernel off-loads this work to a kthread, which
in turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch adds support to the EFI
subsystem to handle all calls to efi_runtime_services() using a work
queue (which in turn uses a kthread).

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is
passed.

Introduce a handler function (called efi_call_rts()) that
a. understands efi_runtime_work and
b. invokes the appropriate efi_runtime_service() with the
appropriate arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.

pstore writes could potentially be invoked in interrupt context, and
pstore uses set_variable<>() and query_variable_info<>() to store logs.
If we invoke efi_runtime_services() through efi_rts_wq while in atomic
context, the kernel issues a warning ("scheduling while atomic") and
prints a stack trace. One way to overcome this is to not make the caller
process wait for the worker thread to finish, but this approach breaks
pstore, i.e. the log messages aren't written to efi variables. Hence,
pstore calls efi_runtime_services() without using efi_rts_wq. In other
words, efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().
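
The split between the queued path and the direct path could be modelled
roughly as below. This is a userspace sketch under stated assumptions:
in_atomic_ctx, atomic_sleep_warnings and the set_variable_*() helpers are
hypothetical names, and the "sleep" is only counted, not performed.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_atomic_ctx;        /* hypothetical flag: caller must not sleep */
static int atomic_sleep_warnings; /* counts would-be "scheduling while atomic" */
static int stored;                /* stands in for the firmware variable store */

/* Stand-in for the firmware service itself. */
static int fw_set_variable(int value)
{
	stored = value;
	return 0;
}

/* Queued path: the caller waits for a worker, i.e. it may sleep. */
static int set_variable_queued(int value)
{
	if (in_atomic_ctx)
		atomic_sleep_warnings++; /* sleeping here would be a bug */
	return fw_set_variable(value);
}

/* Direct path: no waiting, so it is usable from atomic context. */
static int set_variable_nonblocking(int value)
{
	return fw_set_variable(value);
}
```

A pstore-like caller running with in_atomic_ctx set would use
set_variable_nonblocking() and never trip the counter.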

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 168 
 1 file changed, 148 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 649763171439..eff443bf942c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -151,13 +151,105 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->func) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET_WAKEUP_TIME:
+   status = efi_call_virt(get_

[PATCH V2 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at
https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch set adds support to
the efi subsystem to handle all calls to efi_runtime_services() using a
work queue (which in turn uses kthread).

Implementation summary:
---
1. When a user/kernel thread requests to execute efi_runtime_service(),
enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's
dependent on the return status of efi_runtime_service() and, in specific
cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance efi_get_variable()
and efi_get_next_variable(). Hence, the caller process cannot just post
the work and get going; it has to wait for results from firmware.
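
The "post work, then wait for the result" pattern in step 2 can be
sketched in userspace with POSIX threads. This is only an analogy, not
the kernel code: a pthread stands in for the work queue's kthread, and
work_item, worker() and run_and_wait() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One unit of work; result and status are filled in by the worker. */
struct work_item {
	int input;
	int result;   /* plays the role of a buffer populated by firmware */
	int status;
	bool done;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Worker: performs the "service" and signals completion. */
static void *worker(void *arg)
{
	struct work_item *w = arg;
	int result = w->input * 2;

	pthread_mutex_lock(&w->lock);
	w->result = result;
	w->status = 0;
	w->done = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Caller: hands off the work and blocks until the worker has finished. */
static int run_and_wait(int input, int *result)
{
	struct work_item w = { .input = input, .status = -1, .done = false };
	pthread_t tid;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);

	if (pthread_create(&tid, NULL, worker, &w) != 0) {
		pthread_cond_destroy(&w.cond);
		pthread_mutex_destroy(&w.lock);
		return -1;
	}

	pthread_mutex_lock(&w.lock);
	while (!w.done)
		pthread_cond_wait(&w.cond, &w.lock);
	pthread_mutex_unlock(&w.lock);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);

	*result = w.result;
	return w.status;
}
```

Because the caller blocks in the wait loop, this pattern cannot be used
from a context that is not allowed to sleep.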

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services()
while in atomic context, because the caller thread might sleep.
Presently, pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds
fine for arm and arm64. I would appreciate it if someone could test
the patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please
feel free to pour in your comments and concerns.
Note: Patches are based on Linus's kernel v4.16-rc4

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Introduce efi_rts_workqueue and some infrastructure to invoke all
efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  21 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h |  23 
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V1 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Invoking efi_runtime_services() through a work queue means that every
call to efi_runtime_services() must happen after efi_rts_wq has been
created. efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.

efi_delete_dummy_variable() is called from efi_enter_virtual_mode(),
which is early in the boot phase (efi_rts_wq isn't created yet), so call
efi_delete_dummy_variable() later in the boot phase, i.e. while
initializing the efi subsystem. In the next patch, this is the place
where we create efi_rts_wq, and all the efi_runtime_services() will be
called using efi_rts_wq.
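
The ordering constraint can be shown with a small userspace model. It is
a sketch only: wq_ready, queue_runtime_call(), delete_dummy() and
subsys_init() are hypothetical stand-ins for efi_rts_wq, the queueing
path, efi_delete_dummy_variable() and efisubsys_init().

```c
#include <assert.h>
#include <stdbool.h>

static bool wq_ready;     /* stand-in for "efi_rts_wq has been created" */
static int dummy_deleted; /* how many times the dummy cleanup has run */

/* Runtime calls can only be serviced once the queue exists. */
static int queue_runtime_call(void (*fn)(void))
{
	if (!wq_ready)
		return -1;
	fn();
	return 0;
}

/* Stand-in for the dummy variable cleanup, which needs a runtime call. */
static void delete_dummy(void)
{
	dummy_deleted++;
}

/* Subsystem init: create the queue first, then do the cleanup. */
static int subsys_init(void)
{
	wq_ready = true;
	return queue_runtime_call(delete_dummy);
}
```

Calling the cleanup before subsys_init() fails in this model, which is
the situation the patch avoids by moving the call.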

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 --
 drivers/firmware/efi/efi.c  | 7 +++
 include/linux/efi.h | 3 +++
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..34b03440a80f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -130,7 +130,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..ac5db5f8dbbf 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -33,6 +33,7 @@
 #include 
 
 #include 
+#include 
 
 struct efi __read_mostly efi = {
.mps= EFI_INVALID_TABLE_ADDR,
@@ -328,6 +329,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_workqueue is ready.
+*/
+   efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f5083aa72eae..c4efb3ef0dfa 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.1.4



[PATCH V1 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch adds support to efi
subsystem to handle all calls to efi_runtime_services() using a work
queue (which in turn uses kthread).

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

pstore writes could potentially be invoked in interrupt context, and
pstore uses set_variable<>() and query_variable_info<>() to store logs.
If we invoke efi_runtime_services() through efi_rts_wq while in atomic
context, the kernel issues a warning ("scheduling while atomic") and
prints a stack trace. One way to overcome this is to not make the caller
process wait for the worker thread to finish, but this approach breaks
pstore, i.e. the log messages aren't written to efi variables. Hence,
pstore calls efi_runtime_services() without using efi_rts_wq. In other
words, efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is
passed.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 86 +
 1 file changed, 66 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 5cdb787da5d3..531d077aac70 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -68,6 +68,16 @@
  * semaphore (efi_runtime_lock) and caller waits until the work is
  * finished, hence _only_ one work is queued at a time. So, queue_work()
  * should never fail.
+ *
+ * efi_rts_workqueue to run efi_runtime_services() shouldn't be used
+ * while in atomic, because caller thread might sleep. pstore writes
+ * could potentially be invoked in interrupt context and it uses
+ * set_variable<>() and query_variable_info<>(), so pstore code doesn't
+ * use efi_rts_workqueue.
+ *
+ * Semantics that caller function should follow while passing arguments:
+ * 1. If argument is a pointer (of any type), pass it as is.
+ * 2. If argument is a value (of any type), address of the value is passed.
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
 ({ \
@@ -150,7 +160,7 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(get_time, tm, tc);
+   status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
 }
@@ -161,7 +171,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(set_time, tm);
+   status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
 }
@@ -174,7 +184,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+   status = efi_queue_work(GET_

[PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce necessary infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. populates efi_runtime_work
b. queues work onto efi_rts_wq and
c. waits until worker thread returns
3. A handler function that
a. understands efi_runtime_work and
b. invokes the appropriate efi_runtime_service() with the
appropriate arguments

The caller thread has to wait until the worker thread returns, because
it's dependent on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller
process cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/efi.c  |  11 +++
 drivers/firmware/efi/runtime-wrappers.c | 143 
 include/linux/efi.h |  23 +
 3 files changed, 177 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index ac5db5f8dbbf..4714b305ca90 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -329,6 +331,15 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /* Create a work queue to run EFI Runtime Services */
+   efi_rts_wq = create_workqueue("efi_rts_workqueue");
+   if (!efi_rts_wq) {
+   pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+  "disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return 0;
+   }
+
/*
 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
 * it should be invoked only after efi_rts_workqueue is ready.
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..5cdb787da5d3 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_workqueue.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,50 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* Each EFI Runtime Service is represented with a unique number */
+#define GET_TIME   0
+#define SET_

[PATCH V1 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at
https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch set adds support to
the efi subsystem to handle all calls to efi_runtime_services() using a
work queue (which in turn uses kthread).

Implementation summary:
---
1. When a user/kernel thread requests to execute efi_runtime_service(),
enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's
dependent on the return status of efi_runtime_service() and, in specific
cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance efi_get_variable()
and efi_get_next_variable(). Hence, the caller process cannot just post
the work and get going; it has to wait for results from firmware.

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services()
while in atomic context, because the caller thread might sleep.
Presently, pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds
fine for arm and arm64. I would appreciate it if someone could test
the patches on ARM (although I was able to boot with LUV for ARM).
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please
feel free to pour in your comments and concerns.
Note: Patches are based on Linus's kernel v4.16-rc2

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Introduce efi_rts_workqueue and necessary infrastructure to
invoke all efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  18 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h |  26 
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>

-- 
2.1.4



[PATCH V4 1/3] efi: Use efi_mm in x86 as well as ARM

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage efi page tables and efi
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Especially for x86,
using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h | 4 
 arch/x86/platform/efi/efi_64.c | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c | 9 +
 include/linux/efi.h| 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 2dd15e967c3f..c9f8e6924df7 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -232,6 +232,9 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   mm_init_cpumask(&efi_mm);
+   init_new_context(NULL, &efi_mm);
+
return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-   .mm_rb  = RB_ROOT,
-   .mm_users   = ATOMIC_INIT(2),
-   .mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-   .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include 
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct mm_struct efi_mm = {
+   .mm_rb  = RB_ROOT,
+   .mm_users   = ATOMIC_INIT(2),
+   .mm_count   = ATOMIC_INIT(1),
+   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 29fdf8029cf6..d79f1cc4c8bb 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -930,6 +930,8 @@ extern struct efi {
unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
-- 
2.1.4



[PATCH V4 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove global variable efi_pgd.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/platform/efi/efi_64.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c9f8e6924df7..c93f59731608 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -191,8 +191,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
 }
 
-static pgd_t *efi_pgd;
-
 /*
  * We need our own copy of the higher levels of the page tables
  * because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -204,7 +202,7 @@ static pgd_t *efi_pgd;
  */
 int __init efi_alloc_page_tables(void)
 {
-   pgd_t *pgd;
+   pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -232,6 +230,7 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   efi_mm.pgd = efi_pgd;
mm_init_cpumask(&efi_mm);
init_new_context(NULL, &efi_mm);
 
@@ -247,6 +246,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+   pgd_t *efi_pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -340,7 +340,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
-   pgd_t *pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
@@ -350,8 +350,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * this value is loaded into cr3 the PGD will be decrypted during
 * the pagetable walk.
 */
-   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
-   pgd = efi_pgd;
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd);
 
/*
 * It can happen that the physical address of new_memmap lands in memory
@@ -421,7 +420,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -525,7 +524,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
 static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
 {
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
int err1, err2;
 
/* Update the 1:1 mapping */
@@ -622,7 +621,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
-   ptdump_walk_pgd_level(NULL, efi_pgd);
+   ptdump_walk_pgd_level(NULL, efi_mm.pgd);
 #endif
 }
 
-- 
2.1.4



[PATCH V4 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use a helper function (efi_switch_mm()) to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service().

Likewise, we need to switch back to the previous mm (the mm context
stolen by efi_mm) after the above calls return successfully. We can use
the efi_switch_mm() helper function only with an x86_64 kernel and
"efi=old_map" disabled, because x86_32 and efi=old_map don't use
efi_pgd; rather, they use swapper_pg_dir.
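
The save/switch/restore sequence can be modelled in userspace as below.
This is a sketch of the idea only, not the kernel implementation:
struct mm, active_mm, scratch, switch_mm_model() and
call_runtime_service() are hypothetical names, and no page tables are
actually switched.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for an address space descriptor. */
struct mm {
	const char *name;
};

static struct mm user_mm = { "user" };
static struct mm efi_mm_model = { "efi" };
static struct mm *active_mm = &user_mm;

/* Models the prev_mm slot of efi_scratch: remembers the stolen mm. */
static struct {
	struct mm *prev_mm;
} scratch;

/* Remember the current mm, then make 'next' the active one. */
static void switch_mm_model(struct mm *next)
{
	scratch.prev_mm = active_mm;
	active_mm = next;
}

/* Run a service with the EFI mm active, then restore the previous mm. */
static int call_runtime_service(int (*svc)(void))
{
	int status;

	switch_mm_model(&efi_mm_model);  /* setup: steal the current mm */
	status = svc();
	switch_mm_model(scratch.prev_mm); /* teardown: give it back */
	return status;
}

/* Example service: succeeds only if it runs under the EFI mm. */
static int svc_check_mm(void)
{
	return active_mm == &efi_mm_model ? 0 : -1;
}
```

The setup and teardown halves correspond to the two call sites that the
patch changes in arch_efi_call_virt_setup() and
arch_efi_call_virt_teardown().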

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h   | 25 +-
 arch/x86/platform/efi/efi_64.c   | 40 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm: store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -78,11 +77,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = __read_cr3();\
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm); \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -90,10 +86,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c93f59731608..d6892ad2a693 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)__read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
}
 
early_code_mapping_set_exec(1);
@@ -156,8 +155,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
   

[PATCH V4 0/3] Use mm_struct and switch_mm() instead of manually

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, on x86, invoking any EFI function such as
efi_set_virtual_address_map() or any efi_runtime_service() typically
involves read_cr3() (save the previous pgd), write_cr3() (write efi_pgd)
and then calling the EFI function. Likewise, after returning from the EFI
function, the code path typically involves read_cr3() (save efi_pgd) and
write_cr3() (write the previous pgd). We do this a couple of times in the
EFI subsystem of the Linux kernel; instead, we can use the helper function
efi_switch_mm() to do this, which improves readability and maintainability.
Also, instead of maintaining a separate struct "efi_scratch" to store/restore
efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
initialized, therefore initialize efi_mm in efi_alloc_page_tables().

Changes in V4:
1. Remove the unintended removal of local_irq_restore(flags) (in 3rd patch).
IRQ flags should be restored after switching to the original mm.

Note:
This patch set is based on Linus's tree v4.15-rc8

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h   | 29 +-
 arch/x86/platform/efi/efi_64.c   | 58 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c   |  9 ++
 include/linux/efi.h  |  2 ++
 6 files changed, 57 insertions(+), 52 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>

-- 
2.1.4



[PATCH 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove global variable efi_pgd.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/platform/efi/efi_64.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ccf5239923e8..6b541bdbda5f 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -189,8 +189,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
 }
 
-static pgd_t *efi_pgd;
-
 /*
  * We need our own copy of the higher levels of the page tables
  * because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -199,7 +197,7 @@ static pgd_t *efi_pgd;
  */
 int __init efi_alloc_page_tables(void)
 {
-   pgd_t *pgd;
+   pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -227,6 +225,7 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   efi_mm.pgd = efi_pgd;
mm_init_cpumask(&efi_mm);
init_new_context(NULL, &efi_mm);
 
@@ -242,6 +241,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+   pgd_t *efi_pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -335,7 +335,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
-   pgd_t *pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
@@ -345,8 +345,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * this value is loaded into cr3 the PGD will be decrypted during
 * the pagetable walk.
 */
-   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
-   pgd = efi_pgd;
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd);
 
/*
 * It can happen that the physical address of new_memmap lands in memory
@@ -416,7 +415,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -520,7 +519,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
 static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
 {
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
int err1, err2;
 
/* Update the 1:1 mapping */
@@ -617,7 +616,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
-   ptdump_walk_pgd_level(NULL, efi_pgd);
+   ptdump_walk_pgd_level(NULL, efi_mm.pgd);
 #endif
 }
 
-- 
2.1.4



[PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use the helper function efi_switch_mm() to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service()

Likewise, we need to switch back to the previous mm (the mm context stolen
by efi_mm) after the above calls return successfully. We can use the
efi_switch_mm() helper function only with an x86_64 kernel and with
"efi=old_map" disabled, because x86_32 and efi=old_map don't use
efi_pgd; they use swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h   | 25 +-
 arch/x86/platform/efi/efi_64.c   | 41 ++--
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm: store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -78,11 +77,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = __read_cr3();\
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm); \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -90,10 +86,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6b541bdbda5f..c325b1cc4d1a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)__read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
}
 
early_code_mapping_set_exec(1);
@@ -154,8 +153,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
   

[PATCH 1/3] efi: Use efi_mm in x86 as well as ARM

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage EFI page tables and EFI
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Especially for x86,
using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h | 4 
 arch/x86/platform/efi/efi_64.c | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c | 9 +
 include/linux/efi.h| 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6a151ce70e86..ccf5239923e8 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -227,6 +227,9 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   mm_init_cpumask(&efi_mm);
+   init_new_context(NULL, &efi_mm);
+
return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-   .mm_rb  = RB_ROOT,
-   .mm_users   = ATOMIC_INIT(2),
-   .mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include 
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct mm_struct efi_mm = {
+   .mm_rb  = RB_ROOT,
+   .mm_users   = ATOMIC_INIT(2),
+   .mm_count   = ATOMIC_INIT(1),
+   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d813f7b04da7..6745f4dbbcc1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -928,6 +928,8 @@ extern struct efi {
unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
-- 
2.1.4



[PATCH 0/3] Use mm_struct and switch_mm() instead of manually

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, on x86, invoking any EFI function such as
efi_set_virtual_address_map() or any efi_runtime_service() typically
involves read_cr3() (save the previous pgd), write_cr3() (write efi_pgd)
and then calling the EFI function. Likewise, after returning from the EFI
function, the code path typically involves read_cr3() (save efi_pgd) and
write_cr3() (write the previous pgd). We do this a couple of times in the
EFI subsystem of the Linux kernel; instead, we can use the helper function
efi_switch_mm() to do this, which improves readability and maintainability.
Also, instead of maintaining a separate struct "efi_scratch" to store/restore
efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
initialized, therefore initialize efi_mm in efi_alloc_page_tables().
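
The V3 fix above can be pictured with a small userspace model. This is
only a sketch under an assumption not spelled out in this mail: that with
CPUMASK_OFFSTACK the mm's cpumask is reached through a pointer which stays
NULL until mm_init_cpumask() runs. All names ending in _model are
hypothetical, and the model returns -1 where the kernel would fault.

```c
#include <stddef.h>
#include <string.h>

#define NR_CPUS_MODEL 64 /* hypothetical CPU count for this model */
#define BITS_PER_LONG_MODEL (8 * sizeof(unsigned long))
#define MASK_LONGS_MODEL \
    ((NR_CPUS_MODEL + BITS_PER_LONG_MODEL - 1) / BITS_PER_LONG_MODEL)

/* Model of an mm whose cpumask is reached through a pointer. */
struct mm_model {
    unsigned long *cpu_vm_mask_var; /* NULL until initialised */
    unsigned long cpumask_allocation[MASK_LONGS_MODEL];
};

/* Models mm_init_cpumask(): point the mask at its storage and clear it. */
static void mm_init_cpumask_model(struct mm_model *mm)
{
    mm->cpu_vm_mask_var = mm->cpumask_allocation;
    memset(mm->cpumask_allocation, 0, sizeof(mm->cpumask_allocation));
}

/*
 * Models cpumask_set_cpu() on the mm's mask. Returns -1 for an
 * uninitialised mask (where the kernel would dereference a bad pointer)
 * or an out-of-range CPU, and 0 on success.
 */
static int cpumask_set_cpu_model(struct mm_model *mm, unsigned int cpu)
{
    if (!mm->cpu_vm_mask_var || cpu >= NR_CPUS_MODEL)
        return -1;
    mm->cpu_vm_mask_var[cpu / BITS_PER_LONG_MODEL] |=
        1UL << (cpu % BITS_PER_LONG_MODEL);
    return 0;
}
```

In this model, setting a CPU bit before mm_init_cpumask_model() fails,
while the same call succeeds afterwards, which mirrors why efi_mm is now
initialized in efi_alloc_page_tables() before the first switch.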

Note:
This patch set is based on Linus's tree v4.15-rc3

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h   | 29 +-
 arch/x86/platform/efi/efi_64.c   | 59 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c   |  9 ++
 include/linux/efi.h  |  2 ++
 6 files changed, 57 insertions(+), 53 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>

-- 
2.1.4



Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-05 Thread Sai Praneeth Prakhya
On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > I get a similar crash on Qemu with linus's master branch and the V2
> > applied on top of it. Here are the details of my test environment:
> > 
> > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > edk2.git/ovmf-x64
> > 
> > 2. I used linus's master branch (HEAD - commit:
> > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > of the same.
> > 
> > 3. I use the following qemu command line to launch the test:
> > 
> > # /usr/local/bin/qemu-system-x86_64 --version
> > QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> > Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> > 
> > # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap  -m
> > $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> > -vga std -boot c -cpu host -kernel $KERNEL -append
> > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> > $INITRAMFS -bios $OVMF_FW_PATH
> > 
> > And here is the crash log:
> > 
> > [0.006054] general protection fault:  [#1] SMP
> > [0.006459] Modules linked in:
> > [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 0.0.0 02/06/2015
> > [0.007000] task: 81e0f480 task.stack: 81e0
> > [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> > [0.007000] RAX: 80007d084000 RBX:  RCX: 
> > 77ff8000
> > [0.007000] RDX: 7d084000 RSI: 8000 RDI: 
> > 00019a00
> > [0.007000] RBP: 81e03dc0 R08:  R09: 
> > 88007d085000
> > [0.007000] R10: 81e03dd8 R11: 7d095063 R12: 
> > 81e5c6a0
> > [0.007000] R13: 81ed4f40 R14: 0030 R15: 
> > 0001
> > [0.007000] FS:  () GS:88007d40()
> > knlGS:
> > [0.007000] CS:  0010 DS:  ES:  CR0: 80050033
> > [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4: 
> > 000406b0
> > [0.007000] Call Trace:
> > [0.007000]  switch_mm+0xd/0x20
> > [0.007000]  ? switch_mm+0xd/0x20
> > [0.007000]  efi_switch_mm+0x3e/0x4a
> > [0.007000]  efi_call_phys_prolog+0x28/0x1ac
> > [0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> > [0.007000]  start_kernel+0x332/0x3b8
> > [0.007000]  x86_64_start_reservations+0x2a/0x2c
> > [0.007000]  x86_64_start_kernel+0x178/0x18b
> > [0.007000]  secondary_startup_64+0xa5/0xa5
> > [0.007000]  ? secondary_startup_64+0xa5/0xa5
> > [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> > 7e 89
> > [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> > [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > [0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > the idle task!
> > 
> > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > firmware and 64-bit x86 kernel) with your patches, the primary kernel
> > boots fine on Qemu:
> > 
> > ovmf firmware used in this case - edk2.git/ovmf-ia32
> > 
> > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > case in point 3 above), I see the primary kernel boots fine on Qemu as
> > well.
> > 
> > Regards,
> > Bhupesh
> 
> Hi Bhupesh,
> 
> Thanks a lot for the detailed explanation. They are helpful to reproduce
> the issue quickly. From my initial debug, I think that AMD SME +
> efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
> the issue on qemu.
> 
> I have tried the following combinations (all tests are on qemu):
> On Linus's tree:
> 1. With  SME and  efi_mm and  -cpu host -> panics
> 2. With  SME and  efi_mm and !-cpu host -> boots
> 3. With  SME and !efi_mm and  -cpu host -> boots
> 4. With  SME and !efi_mm and !-cpu host -> boots
> 5. With !SME and  efi_mm and  -cpu host -> boots
> 6. With !SME and  efi_mm and !-cpu host -> boots
> 7. With !SME and !efi_mm and  -cpu host -> boots
> 8. With !SME and 

Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-05 Thread Sai Praneeth Prakhya

> I get a similar crash on Qemu with linus's master branch and the V2
> applied on top of it. Here are the details of my test environment:
> 
> 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> edk2.git/ovmf-x64
> 
> 2. I used linus's master branch (HEAD - commit:
> b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> of the same.
> 
> 3. I use the following qemu command line to launch the test:
> 
> # /usr/local/bin/qemu-system-x86_64 --version
> QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> 
> # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap  -m
> $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> -vga std -boot c -cpu host -kernel $KERNEL -append
> "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> $INITRAMFS -bios $OVMF_FW_PATH
> 
> And here is the crash log:
> 
> [0.006054] general protection fault:  [#1] SMP
> [0.006459] Modules linked in:
> [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [0.007000] task: 81e0f480 task.stack: 81e0
> [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> [0.007000] RAX: 80007d084000 RBX:  RCX: 
> 77ff8000
> [0.007000] RDX: 7d084000 RSI: 8000 RDI: 
> 00019a00
> [0.007000] RBP: 81e03dc0 R08:  R09: 
> 88007d085000
> [0.007000] R10: 81e03dd8 R11: 7d095063 R12: 
> 81e5c6a0
> [0.007000] R13: 81ed4f40 R14: 0030 R15: 
> 0001
> [0.007000] FS:  () GS:88007d40()
> knlGS:
> [0.007000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4: 
> 000406b0
> [0.007000] Call Trace:
> [0.007000]  switch_mm+0xd/0x20
> [0.007000]  ? switch_mm+0xd/0x20
> [0.007000]  efi_switch_mm+0x3e/0x4a
> [0.007000]  efi_call_phys_prolog+0x28/0x1ac
> [0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> [0.007000]  start_kernel+0x332/0x3b8
> [0.007000]  x86_64_start_reservations+0x2a/0x2c
> [0.007000]  x86_64_start_kernel+0x178/0x18b
> [0.007000]  secondary_startup_64+0xa5/0xa5
> [0.007000]  ? secondary_startup_64+0xa5/0xa5
> [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> 7e 89
> [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> [0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> the idle task!
> 
> 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> firmware and 64-bit x86 kernel) with your patches, the primary kernel
> boots fine on Qemu:
> 
> ovmf firmware used in this case - edk2.git/ovmf-ia32
> 
> 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> case in point 3 above), I see the primary kernel boots fine on Qemu as
> well.
> 
> Regards,
> Bhupesh

Hi Bhupesh,

Thanks a lot for the detailed explanation; it helped me reproduce
the issue quickly. From my initial debugging, I think that AMD SME +
efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
the issue on qemu.

I have tried the following combinations (all tests are on qemu):
On Linus's tree:
1. With  SME and  efi_mm and  -cpu host -> panics
2. With  SME and  efi_mm and !-cpu host -> boots
3. With  SME and !efi_mm and  -cpu host -> boots
4. With  SME and !efi_mm and !-cpu host -> boots
5. With !SME and  efi_mm and  -cpu host -> boots
6. With !SME and  efi_mm and !-cpu host -> boots
7. With !SME and !efi_mm and  -cpu host -> boots
8. With !SME and !efi_mm and !-cpu host -> boots

On Matt's tree (no SME):
1. With  efi_mm and  -cpu host -> boots
2. With  efi_mm and !-cpu host -> boots
3. With !efi_mm and  -cpu host -> boots
4. With !efi_mm and !-cpu host -> boots

Summary:
On Matt's tree (next branch), I am unable to reproduce the issue because
it doesn't have the SME patches.

On Linus's tree, with the SME patches
(b1b6f83ac938d176742c85757960dec2cf10e468), my patches, and the -cpu host
switch enabled in qemu, I was able to reproduce the issue.

Could you please confirm whether you are seeing the same behavior,
especially on real machines (which I think is equivalent to -cpu host on
qemu)? In earlier mails you mentioned that you were able to reproduce
this on Matt's tree, but according to my theory that shouldn't be the
case because Matt's tree doesn't have the SME patches.
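
For reference, the eight combinations tried on Linus's tree can be
written down as a small truth table. This is just a userspace sketch that
restates the results reported above; the function names panics() and
count_panics() are made up for illustration and nothing here touches the
kernel.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Reported outcome on Linus's tree: only the combination of SME,
 * the efi_mm patches and -cpu host panicked; everything else booted.
 */
static bool panics(bool sme, bool efi_mm_patches, bool cpu_host)
{
    return sme && efi_mm_patches && cpu_host;
}

/* Walk all eight combinations, print each one, and count the panics. */
static int count_panics(void)
{
    int n = 0;

    for (int mask = 7; mask >= 0; mask--) {
        bool sme = mask & 4;
        bool efi = mask & 2;
        bool host = mask & 1;
        bool bad = panics(sme, efi, host);

        printf("%sSME %sefi_mm %s-cpu host -> %s\n",
               sme ? " " : "!", efi ? " " : "!", host ? " " : "!",
               bad ? "panics" : "boots");
        if (bad)
            n++;
    }
    return n;
}
```

Exactly one of the eight rows is a panic, which is what points at the
interaction of the three ingredients rather than at any one of them alone.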

[PATCH V2 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use the helper function efi_switch_mm() to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service()

Likewise, we need to switch back to the previous mm (the mm context stolen
by efi_mm) after the above calls return successfully. We can use the
efi_switch_mm() helper function only with an x86_64 kernel and with
"efi=old_map" disabled, because x86_32 and efi=old_map don't use
efi_pgd; they use swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
---
 arch/x86/include/asm/efi.h   | 29 ++---
 arch/x86/platform/efi/efi_64.c   | 36 +---
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2f77bcefe6b4..23b2137a95e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -1,10 +1,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -57,14 +61,13 @@ extern u64 asmlinkage efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm: store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -73,11 +76,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = read_cr3();  \
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm); \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -85,10 +85,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -130,6 +128,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 0bb98c35e178..e0545f56d703 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -80,9 +80,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
