Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Sai Praneeth Prakhya
On Tue, 2019-08-06 at 10:36 +0200, Michal Hocko wrote:
> On Mon 05-08-19 20:05:27, Sai Praneeth Prakhya wrote:
> > When a user process exits, the kernel cleans up the mm_struct of the user
> > process and during cleanup, check_mm() checks the page tables of the user
> > process for corruption (E.g: unexpected page flags set/cleared). For
> > corrupted page tables, the error message printed by check_mm() isn't very
> > clear as it prints the loop index instead of page table type (E.g: Resident
> > file mapping pages vs Resident shared memory pages). The loop index in
> > check_mm() is used to index rss_stat[] which represents individual memory
> > type stats. Hence, instead of printing index, print memory type, thereby
> > improving error message.
> > 
> > Without patch:
> > --
> > [  204.836425] mm/pgtable-generic.c:29: bad p4d 89eb4e92(80025f941467)
> > [  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
> > [  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
> > [  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480
> > 
> > With patch:
> > ---
> > [   69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467)
> > [   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2
> > [   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5
> > [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> I like this. On any occasion I am investigating an issue with an rss
> imbalance I have to go back to kernel sources to see which pte type that
> is.
> 

Hopefully, this patch will be useful to you the next time you run into any rss
imbalance issues.

> > Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
> > that it matches the other print statement.
> 
> good change as well. Maybe we should also lower the loglevel (in a
> separate patch) as well. While this is not nice because we are
> apparently leaking memory behind it shouldn't be really critical enough
> to jump on normal consoles.

Ya.. I think, probably could be lowered to pr_err() or pr_warn().

Regards,
Sai



[PATCH V3] fork: Improve error message for corrupted page tables

2019-08-06 Thread Sai Praneeth Prakhya
When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). The loop index in
check_mm() is used to index rss_stat[] which represents individual memory
type stats. Hence, instead of printing index, print memory type, thereby
improving error message.

Without patch:
--
[  204.836425] mm/pgtable-generic.c:29: bad p4d 89eb4e92(80025f941467)
[  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
[  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
[  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
---
[   69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467)
[   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2
[   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5
[   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Acked-by: Dave Hansen 
Suggested-by: Dave Hansen 
Reviewed-by: Anshuman Khandual 
Signed-off-by: Sai Praneeth Prakhya 
---

Changes from V2 to V3:
--
1. Add a comment suggesting that resident_page_types[] be updated if there are
   any changes to existing page types in 
2. Add a build check to enforce resident_page_types[] is always in sync
3. Use a macro to populate elements of resident_page_types[]

Changes from V1 to V2:
--
1. Move struct definition from header file to fork.c file, so that it won't be
   included in every compilation unit. As this struct is used *only* in fork.c,
   include the definition in fork.c itself.
2. Index the struct to match respective macros.
3. Mention the print function change in the commit message.

 include/linux/mm_types_task.h |  4 
 kernel/fork.c | 16 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index d7016dcb245e..c1bc6731125c 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -36,6 +36,10 @@ struct vmacache {
struct vm_area_struct *vmas[VMACACHE_SIZE];
 };
 
+/*
+ * When updating this, please also update struct resident_page_types[] in
+ * kernel/fork.c
+ */
 enum {
MM_FILEPAGES,   /* Resident file mapping pages */
MM_ANONPAGES,   /* Resident anonymous pages */
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..7583e0fde0ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -125,6 +125,15 @@ int nr_threads; /* The idle threads do not count.. */
 
 static int max_threads;/* tunable limit on nr_threads */
 
+#define NAMED_ARRAY_INDEX(x)   [x] = __stringify(x)
+
+static const char * const resident_page_types[] = {
+   NAMED_ARRAY_INDEX(MM_FILEPAGES),
+   NAMED_ARRAY_INDEX(MM_ANONPAGES),
+   NAMED_ARRAY_INDEX(MM_SWAPENTS),
+   NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
+};
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -645,12 +654,15 @@ static void check_mm(struct mm_struct *mm)
 {
int i;
 
+   BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) != NR_MM_COUNTERS,
+                    "Please make sure 'struct resident_page_types[]' is updated as well");
+
for (i = 0; i < NR_MM_COUNTERS; i++) {
long x = atomic_long_read(&mm->rss_stat.count[i]);
 
if (unlikely(x))
-   printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+   pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
+mm, resident_page_types[i], x);
}
 
if (mm_pgtables_bytes(mm))
-- 
2.7.4
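
For anyone who wants to try the V3 approach outside the kernel tree, here is a
minimal userspace sketch. It assumes a C11 compiler. STRINGIFY(), ARRAY_SIZE()
and rss_type_name() are stand-ins defined locally for illustration; the
kernel's __stringify() and BUILD_BUG_ON_MSG() are replaced by a plain macro
and static_assert, so this is not the kernel code itself.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

/* Mirrors the enum in include/linux/mm_types_task.h. */
enum {
	MM_FILEPAGES,	/* Resident file mapping pages */
	MM_ANONPAGES,	/* Resident anonymous pages */
	MM_SWAPENTS,	/* Anonymous swap entries */
	MM_SHMEMPAGES,	/* Resident shared memory pages */
	NR_MM_COUNTERS
};

/* Local stand-ins for the kernel's __stringify() and ARRAY_SIZE(). */
#define STRINGIFY(x)		#x
#define ARRAY_SIZE(a)		(sizeof(a) / sizeof((a)[0]))
#define NAMED_ARRAY_INDEX(x)	[x] = STRINGIFY(x)

/* No explicit size, so the size check below is meaningful. */
static const char * const resident_page_types[] = {
	NAMED_ARRAY_INDEX(MM_FILEPAGES),
	NAMED_ARRAY_INDEX(MM_ANONPAGES),
	NAMED_ARRAY_INDEX(MM_SWAPENTS),
	NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
};

/* Userspace analogue of the BUILD_BUG_ON_MSG() in check_mm(). */
static_assert(ARRAY_SIZE(resident_page_types) == NR_MM_COUNTERS,
	      "resident_page_types[] is out of sync with the enum");

/* Hypothetical helper: map a counter index to its name. */
static const char *rss_type_name(int idx)
{
	if (idx < 0 || idx >= NR_MM_COUNTERS)
		return "unknown";
	return resident_page_types[idx];
}
```

Renaming an enum entry without touching the array fails to compile because
the old identifier no longer exists, and appending a new entry before
NR_MM_COUNTERS trips the static_assert.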



Re: [PATCH V2] fork: Improve error message for corrupted page tables

2019-08-06 Thread Sai Praneeth Prakhya
On Tue, 2019-08-06 at 09:30 -0700, Dave Hansen wrote:
> On 8/5/19 8:05 PM, Sai Praneeth Prakhya wrote:
> > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > +   [MM_FILEPAGES]  = "MM_FILEPAGES",
> > +   [MM_ANONPAGES]  = "MM_ANONPAGES",
> > +   [MM_SWAPENTS]   = "MM_SWAPENTS",
> > +   [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
> > +};
> 
> One trick to ensure that this gets updated if the names are ever
> updated.  You can do:
> 
> #define NAMED_ARRAY_INDEX(x)  [x] = __stringify(x)
> 
> and
> 
> static const char * const resident_page_types[NR_MM_COUNTERS] = {
>   NAMED_ARRAY_INDEX(MM_FILEPAGES),
>   NAMED_ARRAY_INDEX(MM_SHMEMPAGES),
>   ...
> };

Thanks for the suggestion Dave. I will add this in V3.
Even with this, anyone who renames a page type or adds a new entry would still
need to update resident_page_types[]. So, I will add the comment as suggested
by Vlastimil.

> 
> That makes sure that any name changes make it into the strings.  Then
> stick a:
> 
>   BUILD_BUG_ON(NR_MM_COUNTERS != ARRAY_SIZE(resident_page_types));
> 
> somewhere.  That makes sure that any new array indexes get a string
> added in the array.  Otherwise you get nice, early, compile-time errors.

Sure, this sounds good. One small nit :)
For the BUILD_BUG_ON() to work, the definition of the array should be changed
as below

static const char * const resident_page_types[] = {
...
};

i.e. we should not specify the size of the array.

Regards,
Sai
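
To illustrate why the array size has to be left out, here is a small
standalone sketch. The enum T_A/T_B/T_C and the helper functions are made up
for this example. With an explicit size, the array always has NR_T slots and
a forgotten entry is silently NULL; without it, the length follows the
highest initialized index, so a size comparison can notice a missing trailing
entry.

```c
#include <stddef.h>

/* Hypothetical counters, standing in for the MM_* enum. */
enum { T_A, T_B, T_C, NR_T };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Explicit size: T_C is missing, yet the array still has NR_T slots. */
static const char * const sized[NR_T] = {
	[T_A] = "T_A",
	[T_B] = "T_B",
};

/* No explicit size: the length follows the highest initialized index. */
static const char * const unsized[] = {
	[T_A] = "T_A",
	[T_B] = "T_B",
};

static size_t sized_len(void)      { return ARRAY_SIZE(sized); }
static size_t unsized_len(void)    { return ARRAY_SIZE(unsized); }
static int    sized_has_hole(void) { return sized[T_C] == NULL; }
```

Note that the size check only catches entries missing at the end. A hole in
the middle of an unsized, designated-initializer array leaves the length
unchanged, which is one more reason to keep the reminder comment next to the
enum.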



Re: [PATCH] fork: Improve error message for corrupted page tables

2019-08-05 Thread Sai Praneeth Prakhya
On Mon, 2019-08-05 at 15:28 +0200, Vlastimil Babka wrote:
> On 8/2/19 8:46 AM, Prakhya, Sai Praneeth wrote:
> > > > > > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > > > > > +   "MM_FILEPAGES",
> > > > > > +   "MM_ANONPAGES",
> > > > > > +   "MM_SWAPENTS",
> > > > > > +   "MM_SHMEMPAGES",
> > > > > > +};
> > > > > 
> > > > > But please let's not put this in a header file.  We're asking the
> > > > > compiler to put a copy of all of this into every compilation unit
> > > > > which includes the header.  Presumably the compiler is smart enough
> > > > > not to do that, but it's not good practice.
> > > > 
> > > > Thanks for the explanation. Makes sense to me.
> > > > 
> > > > Just wanted to check before sending V2, Is it OK if I add this to
> > > > kernel/fork.c? or do you have something else in mind?
> > > 
> > > I was thinking somewhere like mm/util.c so the array could be used by
> > > other code.  But it seems there is no such code.  Perhaps it's best to
> > > just leave fork.c as it is now.
> > 
> > Ok, so does that mean have the struct in header file itself?
> 
> If the struct definition (including the string values) was in mm/util.c,
> there would have to be a declaration in a header. If it's in fork.c with
> the only users, there doesn't need to be separate declaration in a header.

Makes sense.

> 
> > Sorry! for too many questions. I wanted to check with you before changing 
> > because it's *the* fork.c file (I presume random changes will not be
> > encouraged here)
> > 
> > I am not yet clear on what's the right thing to do here :(
> > So, could you please help me in deciding.
> 
> fork.c should be fine, IMHO

I was leaning towards adding the struct definition to fork.c as well, but just
wanted to check with Andrew before posting V2.

Thanks for the reply though :)

Regards,
Sai
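
For reference, the alternative Vlastimil describes (definition in one .c file
plus a declaration in a header) would look roughly like the sketch below. It
is squeezed into a single file so it compiles on its own; the header name
page_types.h and the helper page_type_name() are hypothetical, and this
layout is not what the patch ended up using.

```c
#include <stddef.h>

/* ---- would live in a header, e.g. a hypothetical page_types.h ---- */
enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
	MM_SHMEMPAGES,
	NR_MM_COUNTERS
};

/* Declaration only: including files get no copy of the strings. */
extern const char * const resident_page_types[NR_MM_COUNTERS];

/* ---- would live in exactly one .c file ---- */
const char * const resident_page_types[NR_MM_COUNTERS] = {
	[MM_FILEPAGES]  = "MM_FILEPAGES",
	[MM_ANONPAGES]  = "MM_ANONPAGES",
	[MM_SWAPENTS]   = "MM_SWAPENTS",
	[MM_SHMEMPAGES] = "MM_SHMEMPAGES",
};

/* Hypothetical user in some other file; it only needs the declaration. */
static const char *page_type_name(int idx)
{
	return (idx >= 0 && idx < NR_MM_COUNTERS) ?
		resident_page_types[idx] : NULL;
}
```

Since fork.c is the only user, a static definition in fork.c avoids the extra
declaration altogether.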



[PATCH V2] fork: Improve error message for corrupted page tables

2019-08-05 Thread Sai Praneeth Prakhya
When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). The loop index in
check_mm() is used to index rss_stat[] which represents individual memory
type stats. Hence, instead of printing index, print memory type, thereby
improving error message.

Without patch:
--
[  204.836425] mm/pgtable-generic.c:29: bad p4d 89eb4e92(80025f941467)
[  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
[  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
[  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
---
[   69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467)
[   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2
[   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5
[   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Also, change print function (from printk(KERN_ALERT, ..) to pr_alert()) so
that it matches the other print statement.

Cc: Ingo Molnar 
Cc: Vlastimil Babka 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Anshuman Khandual 
Acked-by: Dave Hansen 
Suggested-by: Dave Hansen 
Signed-off-by: Sai Praneeth Prakhya 
---

Changes from V1 to V2:
--
1. Move struct definition from header file to fork.c file, so that it won't be
   included in every compilation unit. As this struct is used *only* in fork.c,
   include the definition in fork.c itself.
2. Index the struct to match respective macros.
3. Mention the print function change in the commit message.

 kernel/fork.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b4148..f34f441c50c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -125,6 +125,13 @@ int nr_threads; /* The idle threads do not count.. */
 
 static int max_threads;/* tunable limit on nr_threads */
 
+static const char * const resident_page_types[NR_MM_COUNTERS] = {
+   [MM_FILEPAGES]  = "MM_FILEPAGES",
+   [MM_ANONPAGES]  = "MM_ANONPAGES",
+   [MM_SWAPENTS]   = "MM_SWAPENTS",
+   [MM_SHMEMPAGES] = "MM_SHMEMPAGES",
+};
+
 DEFINE_PER_CPU(unsigned long, process_counts) = 0;
 
 __cacheline_aligned DEFINE_RWLOCK(tasklist_lock);  /* outer */
@@ -649,8 +656,8 @@ static void check_mm(struct mm_struct *mm)
long x = atomic_long_read(&mm->rss_stat.count[i]);
 
if (unlikely(x))
-   printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+   pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
+mm, resident_page_types[i], x);
}
 
if (mm_pgtables_bytes(mm))
-- 
2.7.4
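
As a side note on change 2 above ("Index the struct to match respective
macros"), the following standalone sketch contrasts the two initializer
styles. The table names by_position and by_index and the helper
tables_agree() are made up for the example. Positional initializers are only
correct while their order matches the enum, whereas designated initializers
tie each string to its enum value regardless of the order they are written
in.

```c
#include <string.h>

enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
	MM_SHMEMPAGES,
	NR_MM_COUNTERS
};

/* Positional (V1 style): correctness depends on matching the enum order. */
static const char * const by_position[NR_MM_COUNTERS] = {
	"MM_FILEPAGES",
	"MM_ANONPAGES",
	"MM_SWAPENTS",
	"MM_SHMEMPAGES",
};

/* Designated (V2 style): deliberately shuffled, still lands correctly. */
static const char * const by_index[NR_MM_COUNTERS] = {
	[MM_SHMEMPAGES] = "MM_SHMEMPAGES",
	[MM_FILEPAGES]  = "MM_FILEPAGES",
	[MM_SWAPENTS]   = "MM_SWAPENTS",
	[MM_ANONPAGES]  = "MM_ANONPAGES",
};

/* Returns 1 when both tables name every counter identically. */
static int tables_agree(void)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++)
		if (strcmp(by_position[i], by_index[i]) != 0)
			return 0;
	return 1;
}
```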



Re: [PATCH] fork: Improve error message for corrupted page tables

2019-07-31 Thread Sai Praneeth Prakhya


> > With patch:
> > ---
> > [   69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467)
> > [   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2
> > [   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5
> > [   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480
> 
> Seems useful.
> 
> > --- a/include/linux/mm_types_task.h
> > +++ b/include/linux/mm_types_task.h
> > @@ -44,6 +44,13 @@ enum {
> > NR_MM_COUNTERS
> >  };
> >  
> > +static const char * const resident_page_types[NR_MM_COUNTERS] = {
> > +   "MM_FILEPAGES",
> > +   "MM_ANONPAGES",
> > +   "MM_SWAPENTS",
> > +   "MM_SHMEMPAGES",
> > +};
> 
> But please let's not put this in a header file.  We're asking the
> compiler to put a copy of all of this into every compilation unit which
> includes the header.  Presumably the compiler is smart enough not to
> do that, but it's not good practice.

Thanks for the explanation. Makes sense to me.

Just wanted to check before sending V2,
Is it OK if I add this to kernel/fork.c? or do you have something else in
mind?

Regards,
Sai



[PATCH] fork: Improve error message for corrupted page tables

2019-07-30 Thread Sai Praneeth Prakhya
When a user process exits, the kernel cleans up the mm_struct of the user
process and during cleanup, check_mm() checks the page tables of the user
process for corruption (E.g: unexpected page flags set/cleared). For
corrupted page tables, the error message printed by check_mm() isn't very
clear as it prints the loop index instead of page table type (E.g: Resident
file mapping pages vs Resident shared memory pages). Hence, improve the
error message so that it's more informative.

Without patch:
--
[  204.836425] mm/pgtable-generic.c:29: bad p4d 89eb4e92(80025f941467)
[  204.836544] BUG: Bad rss-counter state mm:f75895ea idx:0 val:2
[  204.836615] BUG: Bad rss-counter state mm:f75895ea idx:1 val:5
[  204.836685] BUG: non-zero pgtables_bytes on freeing mm: 20480

With patch:
---
[   69.815453] mm/pgtable-generic.c:29: bad p4d 84653642(80025ca37467)
[   69.815872] BUG: Bad rss-counter state mm:014a6c03 type:MM_FILEPAGES val:2
[   69.815962] BUG: Bad rss-counter state mm:014a6c03 type:MM_ANONPAGES val:5
[   69.816050] BUG: non-zero pgtables_bytes on freeing mm: 20480

Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Suggested-by/Acked-by: Dave Hansen 
Signed-off-by: Sai Praneeth Prakhya 
---
 include/linux/mm_types_task.h | 7 +++
 kernel/fork.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
index d7016dcb245e..881f4ea3a1b5 100644
--- a/include/linux/mm_types_task.h
+++ b/include/linux/mm_types_task.h
@@ -44,6 +44,13 @@ enum {
NR_MM_COUNTERS
 };
 
+static const char * const resident_page_types[NR_MM_COUNTERS] = {
+   "MM_FILEPAGES",
+   "MM_ANONPAGES",
+   "MM_SWAPENTS",
+   "MM_SHMEMPAGES",
+};
+
 #if USE_SPLIT_PTE_PTLOCKS && defined(CONFIG_MMU)
 #define SPLIT_RSS_COUNTING
 /* per-thread cached information, */
diff --git a/kernel/fork.c b/kernel/fork.c
index 2852d0e76ea3..6aef5842d4e0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -649,8 +649,8 @@ static void check_mm(struct mm_struct *mm)
long x = atomic_long_read(&mm->rss_stat.count[i]);
 
if (unlikely(x))
-   printk(KERN_ALERT "BUG: Bad rss-counter state "
- "mm:%p idx:%d val:%ld\n", mm, i, x);
+   pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
+mm, resident_page_types[i], x);
}
 
if (mm_pgtables_bytes(mm))
-- 
2.19.1
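
The effect of the change can be tried in userspace with a rough analogue of
the check_mm() loop. This is only a sketch: report_bad_counters() is a
made-up helper, the counters are a plain array instead of the atomics in
mm->rss_stat, fprintf() stands in for pr_alert(), and the mm pointer is left
out of the message.

```c
#include <stdio.h>

enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
	MM_SHMEMPAGES,
	NR_MM_COUNTERS
};

static const char * const resident_page_types[NR_MM_COUNTERS] = {
	"MM_FILEPAGES",
	"MM_ANONPAGES",
	"MM_SWAPENTS",
	"MM_SHMEMPAGES",
};

/*
 * Hypothetical analogue of the loop in check_mm(): report every non-zero
 * counter by name instead of by index, and return how many were found.
 */
static int report_bad_counters(const long counts[NR_MM_COUNTERS], FILE *out)
{
	int i, bad = 0;

	for (i = 0; i < NR_MM_COUNTERS; i++) {
		if (counts[i]) {
			fprintf(out,
				"BUG: Bad rss-counter state type:%s val:%ld\n",
				resident_page_types[i], counts[i]);
			bad++;
		}
	}
	return bad;
}
```

Calling it with counters {2, 5, 0, 0} reports MM_FILEPAGES and MM_ANONPAGES,
which corresponds to the idx:0 and idx:1 lines in the "Without patch" log.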



[tip:efi/core] x86/efi: Mark can_free_region() as an __init function

2019-02-03 Thread tip-bot for Sai Praneeth Prakhya
Commit-ID:  8fe55212aacfce9b7718de7964b3a3096ec30919
Gitweb: https://git.kernel.org/tip/8fe55212aacfce9b7718de7964b3a3096ec30919
Author: Sai Praneeth Prakhya 
AuthorDate: Sat, 2 Feb 2019 10:41:10 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 4 Feb 2019 08:19:22 +0100

x86/efi: Mark can_free_region() as an __init function

can_free_region() is called only once during boot, by
efi_reserve_boot_services().

Hence, mark it as an __init function.

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Ard Biesheuvel 
Cc: AKASHI Takahiro 
Cc: Alexander Graf 
Cc: Bjorn Andersson 
Cc: Borislav Petkov 
Cc: Heinrich Schuchardt 
Cc: Jeffrey Hugo 
Cc: Lee Jones 
Cc: Leif Lindholm 
Cc: Linus Torvalds 
Cc: Matt Fleming 
Cc: Peter Jones 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-2-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 17456a1d3f04..9ce85e605052 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -304,7 +304,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
  * - Not within any part of the kernel
  * - Not the BIOS reserved area (E820_TYPE_RESERVED, E820_TYPE_NVS, etc)
  */
-static bool can_free_region(u64 start, u64 size)
+static __init bool can_free_region(u64 start, u64 size)
 {
if (start + size > __pa_symbol(_text) && start <= __pa_symbol(_end))
return false;


[tip:efi/core] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

2018-12-22 Thread tip-bot for Sai Praneeth Prakhya
Commit-ID:  1debf0958fa27b7c469dbf22754929ec59a7c0e7
Gitweb: https://git.kernel.org/tip/1debf0958fa27b7c469dbf22754929ec59a7c0e7
Author: Sai Praneeth Prakhya 
AuthorDate: Fri, 21 Dec 2018 18:22:34 -0800
Committer:  Ingo Molnar 
CommitDate: Sat, 22 Dec 2018 20:58:30 +0100

x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

The following commit:

  d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions from efi_pgd")

forgets to take two EFI modes into consideration, namely EFI_OLD_MEMMAP and
EFI_MIXED_MODE:

- EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
  using ioremap() and init_memory_mapping(). This feature can be enabled by
  passing "efi=old_map" as kernel command line argument. But,
  efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
  efi_pgd and hence cannot be used for unmapping EFI boot services code/data
  regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

- EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
  64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
  all RAM (namely EFI regions like EFI_CONVENTIONAL_MEMORY,
  EFI_LOADER_, EFI_BOOT_SERVICES_ and
  EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
  let EFI runtime calls access their arguments in 1:1 mode.

Hence, don't unmap EFI boot services code/data regions when booted in mixed mode.

Signed-off-by: Sai Praneeth Prakhya 
Acked-by: Ard Biesheuvel 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20181222022234.7573-1-sai.praneeth.prak...@intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/quirks.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..17456a1d3f04 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -380,6 +380,22 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
u64 pa = md->phys_addr;
u64 va = md->virt_addr;
 
+   /*
+* To Do: Remove this check after adding functionality to unmap EFI boot
+* services code/data regions from direct mapping area because
+* "efi=old_map" maps EFI regions in swapper_pg_dir.
+*/
+   if (efi_enabled(EFI_OLD_MEMMAP))
+   return;
+
+   /*
+* EFI mixed mode has all RAM mapped to access arguments while making
+* EFI runtime calls, hence don't unmap EFI boot services code/data
+* regions.
+*/
+   if (!efi_is_native())
+   return;
+
if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
 


[PATCH] x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE

2018-12-21 Thread Sai Praneeth Prakhya
Commit d5052a7130a6 ("x86/efi: Unmap EFI boot services code/data regions
from efi_pgd") forgets to take two EFI modes into consideration namely
EFI_OLD_MEMMAP and EFI_MIXED_MODE.

EFI_OLD_MEMMAP is a legacy way of mapping EFI regions into swapper_pg_dir
using ioremap() and init_memory_mapping(). This feature can be enabled by
passing "efi=old_map" as kernel command line argument. But,
efi_unmap_pages() unmaps EFI boot services code/data regions *only* from
efi_pgd and hence cannot be used for unmapping EFI boot services code/data
regions from swapper_pg_dir.

Introduce a temporary fix to not unmap EFI boot services code/data regions
when EFI_OLD_MEMMAP is enabled while working on a real fix.

EFI_MIXED_MODE is another feature where a 64-bit kernel runs on a
64-bit platform crippled by a 32-bit firmware. To support EFI_MIXED_MODE,
all RAM (namely EFI regions like EFI_CONVENTIONAL_MEMORY,
EFI_LOADER_, EFI_BOOT_SERVICES_ and
EFI_RUNTIME_CODE/DATA regions) is mapped into efi_pgd all the time to
let EFI runtime calls access their arguments in 1:1 mode. Hence,
don't unmap EFI boot services code/data regions when booted in mixed mode.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/quirks.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 09e811b9da26..9c34230aaeae 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -380,6 +380,22 @@ static void __init efi_unmap_pages(efi_memory_desc_t *md)
u64 pa = md->phys_addr;
u64 va = md->virt_addr;
 
+   /*
+* To Do: Remove this check after adding functionality to unmap EFI boot
+* services code/data regions from direct mapping area because
+* "efi=old_map" maps EFI regions in swapper_pg_dir.
+*/
+   if (efi_enabled(EFI_OLD_MEMMAP))
+   return;
+
+   /*
+* EFI mixed mode has all RAM mapped to access arguments while making
+* EFI runtime calls, hence don't unmap EFI boot services code/data
+* regions.
+*/
+   if (!efi_is_native() && IS_ENABLED(CONFIG_EFI_MIXED))
+   return;
+
if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
 
-- 
2.19.1
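
The two early returns added above amount to a simple decision, sketched below
as standalone C. The struct efi_mode_flags and should_unmap_boot_services()
are hypothetical names for this illustration only: old_memmap stands in for
efi_enabled(EFI_OLD_MEMMAP) and mixed_mode for the
"!efi_is_native() && IS_ENABLED(CONFIG_EFI_MIXED)" test.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two conditions checked in efi_unmap_pages(). */
struct efi_mode_flags {
	bool old_memmap;	/* booted with "efi=old_map" */
	bool mixed_mode;	/* 64-bit kernel on 32-bit firmware */
};

/* Returns true when boot services regions should be unmapped from efi_pgd. */
static bool should_unmap_boot_services(const struct efi_mode_flags *f)
{
	/* Old memmap maps EFI regions in swapper_pg_dir, not efi_pgd. */
	if (f->old_memmap)
		return false;

	/* Mixed mode keeps all RAM mapped so runtime calls reach their args. */
	if (f->mixed_mode)
		return false;

	return true;
}
```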



[tip:efi/core] x86/efi: Move efi__boot_services() to arch/x86

2018-11-30 Thread tip-bot for Sai Praneeth Prakhya
Commit-ID:  47c33a095e1fae376d74b4160a0d73c1a4e73969
Gitweb: https://git.kernel.org/tip/47c33a095e1fae376d74b4160a0d73c1a4e73969
Author: Sai Praneeth Prakhya 
AuthorDate: Thu, 29 Nov 2018 18:12:25 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 30 Nov 2018 09:10:31 +0100

x86/efi: Move efi__boot_services() to arch/x86

efi__boot_services() are x86 specific quirks and as such
should be in asm/efi.h, so move them from linux/efi.h. Also, call
efi_free_boot_services() from __efi_enter_virtual_mode() as it is an x86
specific call and ideally shouldn't be part of init/main.c

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Ard Biesheuvel 
Acked-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Arend van Spriel 
Cc: Bhupesh Sharma 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Eric Snowberg 
Cc: Hans de Goede 
Cc: Joe Perches 
Cc: Jon Hunter 
Cc: Julien Thierry 
Cc: Linus Torvalds 
Cc: Marc Zyngier 
Cc: Matt Fleming 
Cc: Nathan Chancellor 
Cc: Peter Zijlstra 
Cc: Sedat Dilek 
Cc: YiFei Zhu 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-7-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/efi.h  | 2 ++
 arch/x86/platform/efi/efi.c | 2 ++
 include/linux/efi.h | 3 ---
 init/main.c | 4 
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index eea40d52ca78..d1e64ac80b9c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -141,6 +141,8 @@ extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
+extern void efi_free_boot_services(void);
+extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 7ae939e353cd..e1cb01a22fa8 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -993,6 +993,8 @@ static void __init __efi_enter_virtual_mode(void)
panic("EFI call to SetVirtualAddressMap() failed!");
}
 
+   efi_free_boot_services();
+
/*
 * Now that EFI is in virtual mode, update the function
 * pointers in the runtime service table to the new virtual addresses.
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 100ce4a4aff6..2b3b33c83b05 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1000,13 +1000,11 @@ extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
 #ifdef CONFIG_X86
-extern void efi_free_boot_services(void);
 extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
 #else
-static inline void efi_free_boot_services(void) {}
 
 static inline efi_status_t efi_query_variable_store(u32 attributes,
unsigned long size,
@@ -1046,7 +1044,6 @@ extern void efi_mem_reserve(phys_addr_t addr, u64 size);
 extern int efi_mem_reserve_persistent(phys_addr_t addr, u64 size);
 extern void efi_initialize_iomem_resources(struct resource *code_resource,
struct resource *data_resource, struct resource *bss_resource);
-extern void efi_reserve_boot_services(void);
 extern int efi_get_fdt_params(struct efi_fdt_params *params);
 extern struct kobject *efi_kobj;
 
diff --git a/init/main.c b/init/main.c
index ee147103ba1b..ccefcd8e855f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -737,10 +737,6 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
-   efi_free_boot_services();
-   }
-
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();
 }


[tip:efi/core] x86/efi: Unmap EFI boot services code/data regions from efi_pgd

2018-11-30 Thread tip-bot for Sai Praneeth Prakhya
Commit-ID:  08cfb38f3ef49cfd1bba11a00401451606477d80
Gitweb: https://git.kernel.org/tip/08cfb38f3ef49cfd1bba11a00401451606477d80
Author: Sai Praneeth Prakhya 
AuthorDate: Thu, 29 Nov 2018 18:12:24 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 30 Nov 2018 09:10:30 +0100

x86/efi: Unmap EFI boot services code/data regions from efi_pgd

efi_free_boot_services(), as the name suggests, frees EFI boot services
code/data regions but forgets to unmap these regions from efi_pgd. This
means that any code running in the efi_pgd address space (e.g. any EFI
runtime service) would still be able to access these regions, even
though their contents would long since have been overwritten by someone
else. So, it's important to unmap these regions. Hence, introduce
efi_unmap_pages() to unmap these regions from efi_pgd.

After unmapping EFI boot services code/data regions, any illegal access
by buggy firmware to these regions would result in a page fault, which
will be handled by the EFI-specific fault handler.

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Ard Biesheuvel 
Acked-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Arend van Spriel 
Cc: Bhupesh Sharma 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Eric Snowberg 
Cc: Hans de Goede 
Cc: Joe Perches 
Cc: Jon Hunter 
Cc: Julien Thierry 
Cc: Linus Torvalds 
Cc: Marc Zyngier 
Cc: Matt Fleming 
Cc: Nathan Chancellor 
Cc: Peter Zijlstra 
Cc: Sedat Dilek 
Cc: YiFei Zhu 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-6-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/platform/efi/quirks.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 95e77a667ba5..09e811b9da26 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -369,6 +369,24 @@ void __init efi_reserve_boot_services(void)
}
 }
 
+/*
+ * Apart from having VA mappings for EFI boot services code/data regions,
+ * (duplicate) 1:1 mappings were also created as a quirk for buggy firmware. So,
+ * unmap both 1:1 and VA mappings.
+ */
+static void __init efi_unmap_pages(efi_memory_desc_t *md)
+{
+   pgd_t *pgd = efi_mm.pgd;
+   u64 pa = md->phys_addr;
+   u64 va = md->virt_addr;
+
+   if (kernel_unmap_pages_in_pgd(pgd, pa, md->num_pages))
+   pr_err("Failed to unmap 1:1 mapping for 0x%llx\n", pa);
+
+   if (kernel_unmap_pages_in_pgd(pgd, va, md->num_pages))
+   pr_err("Failed to unmap VA mapping for 0x%llx\n", va);
+}
+
 void __init efi_free_boot_services(void)
 {
phys_addr_t new_phys, new_size;
@@ -393,6 +411,13 @@ void __init efi_free_boot_services(void)
continue;
}
 
+   /*
+* Before calling set_virtual_address_map(), EFI boot services
+* code/data regions were mapped as a quirk for buggy firmware.
+* Unmap them from efi_pgd before freeing them up.
+*/
+   efi_unmap_pages(md);
+
/*
 * Nasty quirk: if all sub-1MB memory is used for boot
 * services, we can get here without having allocated the


[tip:efi/core] x86/mm/pageattr: Introduce helper function to unmap EFI boot services

2018-11-30 Thread tip-bot for Sai Praneeth Prakhya
Commit-ID:  7e0dabd3010d6041ee0a952c1146b2150a11f1be
Gitweb: https://git.kernel.org/tip/7e0dabd3010d6041ee0a952c1146b2150a11f1be
Author: Sai Praneeth Prakhya 
AuthorDate: Thu, 29 Nov 2018 18:12:23 +0100
Committer:  Ingo Molnar 
CommitDate: Fri, 30 Nov 2018 09:10:30 +0100

x86/mm/pageattr: Introduce helper function to unmap EFI boot services

Ideally, after the kernel assumes control of the platform, firmware
shouldn't access EFI boot services code/data regions. In practice, many
x86 platforms do not honor this. Hence, during boot, the kernel reserves
EFI boot services code/data regions [1] and maps [2] them to efi_pgd so
that the call to set_virtual_address_map() doesn't fail. After returning
from set_virtual_address_map(), the kernel frees the reserved regions [3]
but they still remain mapped. Hence, introduce
kernel_unmap_pages_in_pgd() which will later be used to unmap EFI boot
services code/data regions.

While at it, modify kernel_map_pages_in_pgd() by:

1. Adding the __init modifier because it's always used *only* during boot.
2. Adding a warning if it's used after SMP is initialized because it uses
   __flush_tlb_all() which flushes mappings only on the current CPU.

Unmapping EFI boot services code/data regions will result in clearing
PAGE_PRESENT bit and it shouldn't bother L1TF cases because it's already
handled by protnone_mask() at arch/x86/include/asm/pgtable-invert.h.

[1] efi_reserve_boot_services()
[2] efi_map_region() -> __map_region() -> kernel_map_pages_in_pgd()
[3] efi_free_boot_services()
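
As a rough illustration of what "unmapping" means here (clearing the
present bit on a run of page table entries), the following is a user-space
toy model and not the kernel implementation. The flat table, the TOY_*
constants and toy_unmap_pages() are all hypothetical names invented for
this sketch; the real helper walks efi_pgd and flushes the TLB, neither of
which is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12          /* 4 KiB pages */
#define TOY_PRESENT    (1ULL << 0) /* "present" bit of a toy PTE */
#define TOY_NR_PTES    16          /* size of the toy flat table */

/*
 * Clear the present bit on numpages entries starting at address.
 * Returns 0 on success, -1 if the range does not fit in the toy table.
 * No TLB flush is modelled here.
 */
static int toy_unmap_pages(uint64_t *ptes, uint64_t address, size_t numpages)
{
	uint64_t first = address >> TOY_PAGE_SHIFT;

	assert(ptes != NULL);

	/* Reject ranges that would run past the end of the table. */
	if (first >= TOY_NR_PTES || numpages > TOY_NR_PTES - first)
		return -1;

	for (size_t i = 0; i < numpages; i++)
		ptes[first + i] &= ~TOY_PRESENT;

	return 0;
}
```

Any later access through an entry cleared this way would fault, which is
the property the series relies on to catch firmware touching freed boot
services regions.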

Signed-off-by: Sai Praneeth Prakhya 
Signed-off-by: Ard Biesheuvel 
Reviewed-by: Thomas Gleixner 
Cc: Andy Lutomirski 
Cc: Arend van Spriel 
Cc: Bhupesh Sharma 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: Eric Snowberg 
Cc: Hans de Goede 
Cc: Joe Perches 
Cc: Jon Hunter 
Cc: Julien Thierry 
Cc: Linus Torvalds 
Cc: Marc Zyngier 
Cc: Matt Fleming 
Cc: Nathan Chancellor 
Cc: Peter Zijlstra 
Cc: Sedat Dilek 
Cc: YiFei Zhu 
Cc: linux-...@vger.kernel.org
Link: http://lkml.kernel.org/r/20181129171230.18699-5-ard.biesheu...@linaro.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgtable_types.h |  8 ++--
 arch/x86/mm/pageattr.c   | 40 ++--
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 106b7d0e2dae..d6ff0bbdb394 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -564,8 +564,12 @@ extern pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
unsigned int *level);
 extern pmd_t *lookup_pmd_address(unsigned long address);
 extern phys_addr_t slow_virt_to_phys(void *__address);
-extern int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-  unsigned numpages, unsigned long page_flags);
+extern int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn,
+ unsigned long address,
+ unsigned numpages,
+ unsigned long page_flags);
+extern int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+   unsigned long numpages);
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_X86_PGTABLE_DEFS_H */
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index db7a10082238..bac35001d896 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2338,8 +2338,8 @@ bool kernel_page_present(struct page *page)
 
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
-int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
-   unsigned numpages, unsigned long page_flags)
+int __init kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
+  unsigned numpages, unsigned long page_flags)
 {
int retval = -EINVAL;
 
@@ -2353,6 +2353,8 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
.flags = 0,
};
 
+   WARN_ONCE(num_online_cpus() > 1, "Don't call after initializing SMP");
+
if (!(__supported_pte_mask & _PAGE_NX))
goto out;
 
@@ -2374,6 +2376,40 @@ out:
return retval;
 }
 
+/*
+ * __flush_tlb_all() flushes mappings only on current CPU and hence this
+ * function shouldn't be used in an SMP environment. Presently, it's used only
+ * during boot (way before smp_init()) by EFI subsystem and hence is ok.
+ */
+int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
+unsigned long numpages)
+{
+   int retval;
+
+   /*
+* The typical sequence for unmapping is to find a pte through
+* lookup_address_in_pgd() (ideally, it should never return NULL because
+* the address is already mapped) and change i

[PATCH V5 0/2] Add efi page fault handler to recover from page

2018-09-10 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

There may exist some buggy UEFI firmware implementations that access efi
memory regions other than EFI_RUNTIME_SERVICES_ even after
the kernel has assumed control of the platform. This violates the UEFI
specification. Hence, provide an efi specific page fault handler which
recovers from page faults caused by buggy firmware.

Page faults triggered by firmware happen at ring 0 and, if unhandled,
hang the kernel. So, provide an efi specific page fault handler to:
1. Avoid panics/hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

The efi page fault handler will check if the access is by
efi_reset_system().
1. If so, then the efi page fault handler will reboot the machine
   through BIOS and not through efi_reset_system().
2. If not, then the efi page fault handler will freeze efi_rts_wq and
   schedule a new process.
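
The decision above can be sketched in plain C as follows. This is only a
user-space illustration of the control flow, not the code in the patches:
the enum names and toy_choose_recovery() are hypothetical, and the only
identifier taken from this cover letter is the "NONE" runtime service id
described in the V4 to V5 change notes.

```c
#include <assert.h>

/* Hypothetical ids; only "NONE" is named in the cover letter. */
enum toy_rts_id {
	TOY_RTS_NONE = 0,
	TOY_RTS_GET_NEXT_HIGH_MONO_COUNT,
	TOY_RTS_RESET_SYSTEM,
};

enum toy_recovery {
	TOY_NOT_AN_EFI_FAULT, /* fall through to the normal fault path */
	TOY_REBOOT_VIA_BIOS,  /* do not re-enter efi_reset_system() */
	TOY_FREEZE_RTS_WQ,    /* park the workqueue and schedule away */
};

/* A zeroed work item must read as "no runtime service in flight". */
static_assert(TOY_RTS_NONE == 0, "NONE must be the zero value");

/* Pick a recovery action from the runtime service that was in flight. */
static enum toy_recovery toy_choose_recovery(enum toy_rts_id in_flight)
{
	if (in_flight == TOY_RTS_NONE)
		return TOY_NOT_AN_EFI_FAULT;

	if (in_flight == TOY_RTS_RESET_SYSTEM)
		return TOY_REBOOT_VIA_BIOS;

	return TOY_FREEZE_RTS_WQ;
}
```

Checking for "NONE" first keeps ordinary kernel page faults out of the
recovery path, so only faults raised while a runtime service is executing
are blamed on firmware.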

This issue was reported by Al Stone when he saw that reboot via EFI hangs
the machine. Upon debugging, I found that it's efi_reset_system() that's
touching memory regions which it shouldn't. To reproduce the same
behavior, I have hacked OVMF and made efi_reset_system() buggy. Along
with efi_reset_system(), I have also modified get_next_high_mono_count()
and set_virtual_address_map(). They illegally access both boot time and
other efi regions.

Testing the patch set:
--
1. Download buggy firmware from here [1].
2. Run a qemu instance with this buggy BIOS and boot mainline kernel.
Add reboot=efi to the kernel command line arguments and after the kernel
is up and running, type "reboot". The kernel should hang while rebooting.
3. With the same setup, boot kernel after applying patches and the
reboot should work fine. Also please notice warning/error messages
printed by kernel.

Changes from RFC to V1:
---
1. Drop "long jump" technique of dealing with illegal access and instead
   use scheduling away from efi_rts_wq.

Changes from V1 to V2:
--
1. Shortened config name to CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS from
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESSES.
2. Made the config option available only to expert users.
3. efi_free_boot_services() should be called only when
   CONFIG_EFI_WARN_ON_ILLEGAL_ACCESS is not enabled. Previously, this
   was part of init/main.c file. As that is architecture agnostic code,
   moved the change to arch/x86/platform/efi/quirks.c file.

Changes from V2 to V3:
--
1. Drop treating illegal access to EFI_BOOT_SERVICES_ regions
   separately from illegal accesses to other regions like
   EFI_CONVENTIONAL_MEMORY or EFI_LOADER_.
   In previous versions, illegal accesses to EFI_BOOT_SERVICES_
   regions were handled by mapping the requested region to efi_pgd but
   from V3 they are handled similarly to illegal accesses to other
   regions, i.e. by freezing efi_rts_wq and scheduling a new process.
2. Change __efi_init_fixup attribute to __efi_init.

Changes from V3 to V4:
--
1. Drop saving original memory map passed by kernel. It also means fewer
   checks in efi page fault handler.
2. Change the config name to EFI_PAGE_FAULT_HANDLER to reflect its
   functionality more appropriately.

Changes from V4 to V5:
--
1. Drop config option that enables efi page fault handler, instead make
   it default.
2. Call schedule() in an infinite loop to account for spurious wake ups.
3. Introduce "NONE" as an efi runtime service function identifier so that
   it could be used in efi_recover_from_page_fault() to check if the page
   fault was indeed triggered by an efi runtime service.

Note:
-
Patch set based on "next" branch in efi tree.

[1] https://drive.google.com/drive/folders/1VozKTms92ifyVHAT0ZDQe55ZYL1UE5wt

Sai Praneeth (2):
  efi: Make efi_rts_work accessible to efi page fault handler
  x86/efi: Add efi page fault handler to recover from page faults caused
by the firmware

 arch/x86/include/asm/efi.h  |  1 +
 arch/x86/mm/fault.c |  9 
 arch/x86/platform/efi/quirks.c  | 78 +
 drivers/firmware/efi/runtime-wrappers.c | 61 +++---
 include/linux/efi.h | 42 ++
 5 files changed, 147 insertions(+), 44 deletions(-)

Tested-by: Bhupesh Sharma 
Suggested-by: Matt Fleming 
Based-on-code-from: Ricardo Neri 
Signed-off-by: Sai Praneeth Prakhya 
Cc: Al Stone 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Bhupesh Sharma 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Ard Biesheuvel 

-- 
2.7.4



[PATCH V3] x86/speculation: Support Enhanced IBRS on future CPUs

2018-08-01 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.

From the specification [1]:

 "With enhanced IBRS, the predicted targets of indirect branches
  executed cannot be controlled by software that was executed in a less
  privileged predictor mode or on another logical processor. As a
  result, software operating on a processor with enhanced IBRS need not
  use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
  privileged predictor mode. Software can isolate predictor modes
  effectively simply by setting the bit once. Software need not disable
  enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."

If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:

 "Retpoline is known to be an effective branch target injection (Spectre
  variant 2) mitigation on Intel processors belonging to family 6
  (enumerated by the CPUID instruction) that do not have support for
  enhanced IBRS. On processors that support enhanced IBRS, it should be
  used for mitigation instead of retpoline."

The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.

If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.

Enhanced IBRS still requires IBPB for full mitigation.
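
The selection described above can be sketched in user-space C as below.
This is an illustration under stated assumptions, not the code in this
patch: all TOY_* names and both functions are hypothetical, and the bit
positions (Enhanced IBRS enumerated by bit 1 of IA32_ARCH_CAPABILITIES,
IBRS being bit 0 of IA32_SPEC_CTRL) reflect my reading of Intel's
documentation rather than anything stated in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ARCH_CAP_IBRS_ALL (1ULL << 1) /* assumed: Enhanced IBRS supported */
#define TOY_SPEC_CTRL_IBRS    (1ULL << 0) /* assumed: IBRS enable bit */

enum toy_v2_mitigation { TOY_V2_RETPOLINE, TOY_V2_IBRS_ENHANCED };

struct toy_v2_choice {
	enum toy_v2_mitigation mode;
	uint64_t spec_ctrl_base; /* value the host keeps in SPEC_CTRL */
};

/* Prefer Enhanced IBRS when enumerated, otherwise stay on retpoline. */
static struct toy_v2_choice toy_select_v2(uint64_t arch_caps,
					  uint64_t spec_ctrl_base)
{
	struct toy_v2_choice c = { TOY_V2_RETPOLINE, spec_ctrl_base };

	if (arch_caps & TOY_ARCH_CAP_IBRS_ALL) {
		c.mode = TOY_V2_IBRS_ENHANCED;
		c.spec_ctrl_base |= TOY_SPEC_CTRL_IBRS; /* set once at boot */
		assert(c.spec_ctrl_base & TOY_SPEC_CTRL_IBRS);
	}

	return c;
}

/* After VMEXIT the host value is written back if the guest changed it. */
static bool toy_needs_host_restore(uint64_t host_val, uint64_t guest_val)
{
	return host_val != guest_val;
}
```

The second helper mirrors the point about VMEXIT: because the bit is
never cleared on the host, the only time it needs rewriting is when a
guest ran with a different SPEC_CTRL value.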

[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both the documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511

Signed-off-by: Sai Praneeth Prakhya 
Originally-by: David Woodhouse 
Cc: Ingo Molnar 
Cc: Tim C Chen 
Cc: Dave Hansen 
Cc: Thomas Gleixner 
Cc: Ravi Shankar 
---
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/nospec-branch.h |  2 +-
 arch/x86/kernel/cpu/bugs.c   | 23 +--
 arch/x86/kernel/cpu/common.c |  3 +++
 4 files changed, 26 insertions(+), 3 deletions(-)

 Changes from V2 to V3:
 1. Improve commit message as suggested by Thomas i.e.
a. Use indentation when quoting from specification.
b. Refrain from using "this patch" and "we".
c. Restructuring and enhancing information on the real reason for
   using Enhanced IBRS as the default spectre V2 mitigation technique.
 2. Remove "ibrs_enhanced" feature string as it's not needed.
 3. Remove unnecessary WARN_ON_ONCE().
 4. Add explicit wrmsrl() after setting IBRS bit in x86_spec_ctrl_base.

 Changes from V1 to V2:
 1. Explicitly spell out in the change log, the reason for using Enhanced
 IBRS as the default spectre V2 mitigation technique instead of Retpoline.

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..568fa20254f7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB   ( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP  ( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN    ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_IBRS_ENHANCED  ( 7*32+29) /* Enhanced IBRS */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f6f6c63da62f..fd2a8c1b88bc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,7 +214,7 @@ enum spectre_v2_mitigation {
SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
SPECTRE_V2_RETPOLINE_GENERIC,
SPECTRE_V2_RETPOLINE_AMD,
-   SPECTRE_V2_IBRS,
+   SPECTRE_V2_IBRS_ENHANCED,
 };
 
 /* The Speculative Store Bypass disable variants */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5c0ea39311fe..4e4be8512a77 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -130,6 +130,7 @@ static const char *spectre_v2_strings[] = {
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD]  = "Vulnerable: Minimal AMD ASM retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC]  = "Mitigation: Full gener

[PATCH V3] x86/speculation: Support Enhanced IBRS on future CPUs

2018-08-01 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.

>From the specification [1]:

 "With enhanced IBRS, the predicted targets of indirect branches
  executed cannot be controlled by software that was executed in a less
  privileged predictor mode or on another logical processor. As a
  result, software operating on a processor with enhanced IBRS need not
  use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
  privileged predictor mode. Software can isolate predictor modes
  effectively simply by setting the bit once. Software need not disable
  enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."

If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:

 "Retpoline is known to be an effective branch target injection (Spectre
  variant 2) mitigation on Intel processors belonging to family 6
  (enumerated by the CPUID instruction) that do not have support for
  enhanced IBRS. On processors that support enhanced IBRS, it should be
  used for mitigation instead of retpoline."

The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.

If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.

Enhanced IBRS still requires IBPB for full mitigation.

[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both the documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511

Signed-off-by: Sai Praneeth Prakhya 
Originally-by: David Woodhouse 
Cc: Ingo Molnar 
Cc: Tim C Chen 
Cc: Dave Hansen 
Cc: Thomas Gleixner 
Cc: Ravi Shankar 
---
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/nospec-branch.h |  2 +-
 arch/x86/kernel/cpu/bugs.c   | 23 +--
 arch/x86/kernel/cpu/common.c |  3 +++
 4 files changed, 26 insertions(+), 3 deletions(-)

 Changes from V2 to V3:
 1. Improve commit message as suggested by Thomas i.e.
    a. Use indentation when quoting from specification.
    b. Refrain from using "this patch" and "we".
    c. Restructure and enhance the explanation of the real reason for
       using Enhanced IBRS as the default spectre V2 mitigation technique.
 2. Remove "ibrs_enhanced" feature string as it's not needed.
 3. Remove unnecessary WARN_ON_ONCE().
 4. Add explicit wrmsrl() after setting IBRS bit in x86_spec_ctrl_base.

 Changes from V1 to V2:
 1. Explicitly spell out in the change log, the reason for using Enhanced
 IBRS as the default spectre V2 mitigation technique instead of Retpoline.

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..568fa20254f7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB           ( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP          ( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN            ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_IBRS_ENHANCED  ( 7*32+29) /* Enhanced IBRS */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW     ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f6f6c63da62f..fd2a8c1b88bc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,7 +214,7 @@ enum spectre_v2_mitigation {
    SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
    SPECTRE_V2_RETPOLINE_GENERIC,
    SPECTRE_V2_RETPOLINE_AMD,
-   SPECTRE_V2_IBRS,
+   SPECTRE_V2_IBRS_ENHANCED,
 };
 
 /* The Speculative Store Bypass disable variants */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5c0ea39311fe..4e4be8512a77 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -130,6 +130,7 @@ static const char *spectre_v2_strings[] = {
[SPECTRE_V2_RETPOLINE_MINIMAL_AMD]  = "Vulnerable: Minimal AMD ASM 
retpoline",
[SPECTRE_V2_RETPOLINE_GENERIC]  = "Mitigation: Full gener

[PATCH V2] x86/speculation: Support Enhanced IBRS on future CPUs

2018-07-30 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Some future Intel processors may support "Enhanced IBRS" which is an
"always on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and
never disabled.

[With enhanced IBRS, the predicted targets of indirect branches executed
cannot be controlled by software that was executed in a less privileged
predictor mode or on another logical processor. As a result, software
operating on a processor with enhanced IBRS need not use WRMSR to set
IA32_SPEC_CTRL.IBRS after every transition to a more privileged
predictor mode. Software can isolate predictor modes effectively simply
by setting the bit once. Software need not disable enhanced IBRS prior
to entering a sleep state such as MWAIT or HLT.] - Specification [1]

Even with enhanced IBRS, we still need to make sure that the IBRS bit in
the SPEC_CTRL MSR is always set. That is, if we detect support for
Enhanced IBRS while booting, we enable the IBRS bit in the SPEC_CTRL MSR
and must also make sure that it stays set. In other words, even if the
guest has cleared the IBRS bit, the bit must be set again upon VMEXIT.

Fortunately, the kernel already has the infrastructure ready. kvm/vmx.c
calls x86_spec_ctrl_set_guest() before entering the guest and
x86_spec_ctrl_restore_host() after leaving it. So, the guest view of the
SPEC_CTRL MSR is restored before entering the guest and the host view of
the SPEC_CTRL MSR is restored on return to the host; hence IBRS will be
set after VMEXIT.

Intel's white paper on Retpoline [2] says that "Retpoline is known to be
an effective branch target injection (Spectre variant 2) mitigation on
Intel processors belonging to family 6 (enumerated by the CPUID
instruction) that do not have support for enhanced IBRS. On processors
that support enhanced IBRS, it should be used for mitigation instead of
retpoline."

This means that Intel recommends using Enhanced IBRS over Retpoline where
available, and it also means that retpoline provides less mitigation on
processors with enhanced IBRS compared to those without. Hence, on
processors that support Enhanced IBRS, this patch makes Enhanced IBRS
the default Spectre V2 mitigation technique instead of retpoline. Also,
we still need IBPB even with enhanced IBRS.

[1] 
https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf
[2] 
https://software.intel.com/sites/default/files/managed/1d/46/Retpoline-A-Branch-Target-Injection-Mitigation.pdf

Signed-off-by: Sai Praneeth Prakhya 
Originally-by: David Woodhouse 
Cc: Tim C Chen 
Cc: Dave Hansen 
Cc: Thomas Gleixner 
Cc: Ravi Shankar 
Cc: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/nospec-branch.h |  2 +-
 arch/x86/kernel/cpu/bugs.c   | 29 +++--
 arch/x86/kernel/cpu/common.c |  3 +++
 4 files changed, 32 insertions(+), 3 deletions(-)
 
 Changes from V1 to V2:
 1. Explicitly spell out in the change log, the reason for using Enhanced
 IBRS as the default Spectre V2 mitigation technique instead of Retpoline.

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..f75815b1dbee 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB           ( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP          ( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN            ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_IBRS_ENHANCED  ( 7*32+29) /* "ibrs_enhanced" Use Enhanced IBRS in kernel */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW     ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f6f6c63da62f..fd2a8c1b88bc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,7 +214,7 @@ enum spectre_v2_mitigation {
    SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
    SPECTRE_V2_RETPOLINE_GENERIC,
    SPECTRE_V2_RETPOLINE_AMD,
-   SPECTRE_V2_IBRS,
+   SPECTRE_V2_IBRS_ENHANCED,
 };
 
 /* The Speculative Store Bypass disable variants */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5c0ea39311fe..a66517de1301 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -130,6 +130,7 @@ static const char *spectre_v2_strings[] = {
    [SPECTRE_V2_RETPOLINE_MINIMAL_AMD]  = "Vulnerable: Minimal AMD ASM retpoline",
    [SPECTRE_V2_RETPOLINE_GENERIC]      = "Mitigation: Full generic retpoline",
    [SPECTRE_V2_RETPOLINE_AMD]          = "Mitigation: Full AMD retpoline",
+   [SPECTRE_V2_IBRS_ENHANCED]          = "Mitigation: Enhanced IBRS",
 };
 
 #undef pr_fm

[PATCH] x86/speculation: Support Enhanced IBRS on future CPUs

2018-07-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Some future Intel processors may support "Enhanced IBRS" which is an
"always on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and
never disabled. According to specification[1], this should simplify
software enabling and improve performance.

[With enhanced IBRS, the predicted targets of indirect branches executed
cannot be controlled by software that was executed in a less privileged
predictor mode or on another logical processor. As a result, software
operating on a processor with enhanced IBRS need not use WRMSR to set
IA32_SPEC_CTRL.IBRS after every transition to a more privileged
predictor mode. Software can isolate predictor modes effectively simply
by setting the bit once. Software need not disable enhanced IBRS prior
to entering a sleep state such as MWAIT or HLT.] - Specification

Even with enhanced IBRS, we still need to make sure that the IBRS bit in
the SPEC_CTRL MSR is always set. That is, if we detect support for
Enhanced IBRS while booting, we enable the IBRS bit in the SPEC_CTRL MSR
and must also make sure that it stays set. In other words, even if the
guest has cleared the IBRS bit, the bit must be set again upon VMEXIT.

Fortunately, the kernel already has the infrastructure ready. kvm/vmx.c
calls x86_spec_ctrl_set_guest() before entering the guest and
x86_spec_ctrl_restore_host() after leaving it. So, the guest view of the
SPEC_CTRL MSR is restored before entering the guest and the host view of
the SPEC_CTRL MSR is restored on return to the host; hence IBRS will be
set after VMEXIT.

For Intel CPUs that support Enhanced IBRS, this patch also makes
Enhanced IBRS the default Spectre V2 mitigation technique instead of
retpoline. Also, we still need IBPB even with enhanced IBRS.
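
The resulting default choice can be summarised with a small stand-alone
sketch. It is only a simplified model of the "auto" path in
spectre_v2_select_mitigation(): the enum, the function name and the two
boolean parameters are hypothetical, and the real function also handles
command-line overrides and the individual retpoline variants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of outcomes used only by this sketch. */
enum toy_spectre_v2_mode {
	TOY_V2_NONE,
	TOY_V2_RETPOLINE,
	TOY_V2_IBRS_ENHANCED,
};

/*
 * Model of the default selection: Enhanced IBRS wins whenever the CPU
 * advertises it; otherwise fall back to retpoline if it was compiled in.
 */
static enum toy_spectre_v2_mode
toy_select_auto(bool cpu_has_enhanced_ibrs, bool retpoline_compiled_in)
{
	if (cpu_has_enhanced_ibrs)
		return TOY_V2_IBRS_ENHANCED;
	if (retpoline_compiled_in)
		return TOY_V2_RETPOLINE;
	return TOY_V2_NONE;
}
```

Checking the feature bit before the retpoline test is what makes Enhanced
IBRS the default even on kernels built with CONFIG_RETPOLINE.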

[1] 
https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

Signed-off-by: Sai Praneeth Prakhya 
Originally-by: David Woodhouse 
Cc: Tim C Chen 
Cc: Dave Hansen 
Cc: Thomas Gleixner 
Cc: Ravi Shankar 
Cc: Ingo Molnar 
---
 arch/x86/include/asm/cpufeatures.h   |  1 +
 arch/x86/include/asm/nospec-branch.h |  2 +-
 arch/x86/kernel/cpu/bugs.c   | 29 +++--
 arch/x86/kernel/cpu/common.c |  3 +++
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 5701f5cecd31..f75815b1dbee 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -219,6 +219,7 @@
 #define X86_FEATURE_IBPB           ( 7*32+26) /* Indirect Branch Prediction Barrier */
 #define X86_FEATURE_STIBP          ( 7*32+27) /* Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_ZEN            ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
+#define X86_FEATURE_IBRS_ENHANCED  ( 7*32+29) /* "ibrs_enhanced" Use Enhanced IBRS in kernel */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW     ( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index f6f6c63da62f..fd2a8c1b88bc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,7 +214,7 @@ enum spectre_v2_mitigation {
    SPECTRE_V2_RETPOLINE_MINIMAL_AMD,
    SPECTRE_V2_RETPOLINE_GENERIC,
    SPECTRE_V2_RETPOLINE_AMD,
-   SPECTRE_V2_IBRS,
+   SPECTRE_V2_IBRS_ENHANCED,
 };
 
 /* The Speculative Store Bypass disable variants */
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5c0ea39311fe..a66517de1301 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -130,6 +130,7 @@ static const char *spectre_v2_strings[] = {
    [SPECTRE_V2_RETPOLINE_MINIMAL_AMD]  = "Vulnerable: Minimal AMD ASM retpoline",
    [SPECTRE_V2_RETPOLINE_GENERIC]      = "Mitigation: Full generic retpoline",
    [SPECTRE_V2_RETPOLINE_AMD]          = "Mitigation: Full AMD retpoline",
+   [SPECTRE_V2_IBRS_ENHANCED]          = "Mitigation: Enhanced IBRS",
 };
 
 #undef pr_fmt
@@ -349,6 +350,8 @@ static void __init spectre_v2_select_mitigation(void)
 
    case SPECTRE_V2_CMD_FORCE:
    case SPECTRE_V2_CMD_AUTO:
+       if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED))
+           goto skip_retpoline_enable_ibrs;
        if (IS_ENABLED(CONFIG_RETPOLINE))
            goto retpoline_auto;
        break;
@@ -385,7 +388,22 @@ static void __init spectre_v2_select_mitigation(void)
 SPECTRE_V2_RETPOLINE_MINIMAL;
setup_force_cpu_cap(X86_FEATURE_RETPOLINE);
}
+   goto enable_other_mitigations;
 
+skip_retpoline_enable_ibrs:
+   mode = SPECTRE_V2_IBRS_ENHANCED;
+
+   /*
+* As we don't use IBRS in kernel, nobody should

[PATCH 4/6] x86/efi: Free existing memory map before installing new memory map

2018-07-02 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

efi_memmap_install() unmaps the existing memory map and installs a new
memory map but doesn't free the memory allocated to the existing
memory map. Fortunately, the details about the existing memory map (like
the physical address, number of entries and type of memory) are
stored in efi.memmap. Hence, use them to free the memory.

In __efi_enter_virtual_mode(), we don't use efi_memmap_install() to
install a new memory map, instead we use efi_memmap_init_late(). Hence,
free existing memory map there too before installing a new memory map.

Generally, memory for new memory map is allocated using
efi_memmap_alloc() but in __efi_enter_virtual_mode() it's done using
realloc_pages() [please see efi_map_regions()]. So, it's OK to free this
memory using efi_memmap_free() in efi_free_boot_services().

Also, note that the first time efi_memmap_free() is called, either from
efi_fake_memmap() or efi_arch_mem_reserve() [depending on the boot
sequence], we are actually freeing memblock_reserved memory which isn't
allocated by efi_memmap_alloc(). So, there are two outliers where we use
efi_memmap_free() to free memory allocated through other sources
rather than efi_memmap_alloc().
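
The ordering this patch establishes (release the old map, then install the
new one) can be shown with a small user-space analogy. This is only a
sketch under stated assumptions: the toy_* names are hypothetical, the
buffer is managed with calloc()/free() rather than the kernel's
memblock/page allocators, and the bookkeeping struct merely mimics the role
that efi.memmap plays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of the installed map, in the spirit of efi.memmap. */
struct toy_memmap {
	void *buf;	/* backing storage of the current map */
	size_t nr_map;	/* number of descriptors in it */
};

static struct toy_memmap toy_current;
static int toy_free_count;	/* how many old maps were released */

/* Release the storage of a previously installed map, if there was one. */
static void toy_memmap_free(void *buf)
{
	if (buf) {
		free(buf);
		toy_free_count++;
	}
}

/* Install only records the new map; like the original, it frees nothing. */
static int toy_memmap_install(void *new_buf, size_t nr_map)
{
	toy_current.buf = new_buf;
	toy_current.nr_map = nr_map;
	return 0;
}

/* Caller-side pattern: free the existing map before installing the new one. */
static int toy_replace_memmap(size_t nr_map, size_t desc_size)
{
	void *new_buf = calloc(nr_map, desc_size);

	if (!new_buf)
		return -1;

	toy_memmap_free(toy_current.buf);
	return toy_memmap_install(new_buf, nr_map);
}
```

Without the toy_memmap_free() call, every replacement would leak the
previous buffer, which is the leak the commit message describes.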

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Ard Biesheuvel 
Cc: Lee Chun-Yi 
Cc: Dave Young 
Cc: Borislav Petkov 
Cc: Laszlo Ersek 
Cc: Jan Kiszka 
Cc: Dave Hansen 
Cc: Bhupesh Sharma 
Cc: Nicolai Stange 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Taku Izumi 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
---
 arch/x86/platform/efi/efi.c | 3 +++
 arch/x86/platform/efi/quirks.c  | 6 ++
 drivers/firmware/efi/fake_mem.c | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index cda54abf25a6..7756426e93b5 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -952,6 +952,9 @@ static void __init __efi_enter_virtual_mode(void)
 * firmware via SetVirtualAddressMap().
 */
efi_memmap_unmap();
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
 
if (efi_memmap_init_late(pa, efi.memmap.desc_size * count)) {
pr_err("Failed to remap late EFI memory map\n");
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 11fa6ac9f0c2..11800f3cbb93 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -292,6 +292,9 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
efi_memmap_insert(&efi.memmap, new, &mr);
early_memunmap(new, new_size);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_phys, num_entries, alloc_type);
 }
 
@@ -452,6 +455,9 @@ void __init efi_free_boot_services(void)
 
memunmap(new);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
if (efi_memmap_install(new_phys, num_entries, alloc_type)) {
pr_err("Could not install new EFI memmap\n");
return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 82dcfa1c340b..a47754efb796 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -90,6 +90,9 @@ void __init efi_fake_memmap(void)
/* swap into new EFI memmap */
early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
 
+   /* Free existing memory map before installing new memory map */
+   efi_memmap_free(efi.memmap.phys_map, efi.memmap.nr_map,
+   efi.memmap.alloc_type);
efi_memmap_install(new_memmap_phy, new_nr_map, alloc_type);
 
/* print new EFI memmap */
-- 
2.7.4



[PATCH V5 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think that user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.

A solution for this issue is to use a kthread to run
efi_runtime_service(). When a user process requests the kernel to
execute any efi_runtime_service(), the kernel queues the work to
efi_rts_wq; a kthread comes along, switches to efi_pgd and executes
efi_runtime_service() in kthread context. Anything that tries to touch
user space addresses while in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
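
The packing and unpacking rules above can be tried out in a stand-alone
sketch. It is not the patch's code: all toy_* names are hypothetical, the
"firmware" functions are local stubs, and the handler is called directly
instead of being queued to a workqueue.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rts_id { TOY_GET_TIME, TOY_SET_WAKEUP_TIME };

struct toy_time { int year; };

/* Simplified work item: untyped argument slots plus the service id. */
struct toy_runtime_work {
	void *arg1;
	void *arg2;
	int status;
	enum toy_rts_id id;
};

/* Stub "firmware" state and services (assumptions of this sketch). */
static bool toy_wakeup_enabled;
static int toy_wakeup_year;

static int toy_fw_get_time(struct toy_time *tm)
{
	tm->year = 2018;
	return 0;
}

static int toy_fw_set_wakeup_time(bool enabled, const struct toy_time *tm)
{
	toy_wakeup_enabled = enabled;
	toy_wakeup_year = tm->year;
	return 0;
}

/* Handler: recast each void pointer; dereference those that carried values. */
static void toy_call_rts(struct toy_runtime_work *work)
{
	switch (work->id) {
	case TOY_GET_TIME:
		/* arg1 was a pointer: cast back and pass through. */
		work->status = toy_fw_get_time((struct toy_time *)work->arg1);
		break;
	case TOY_SET_WAKEUP_TIME:
		/* arg1 was a value: cast back to bool * and dereference. */
		work->status = toy_fw_set_wakeup_time(*(bool *)work->arg1,
						      (struct toy_time *)work->arg2);
		break;
	default:
		work->status = -1;
	}
}

/* Caller side: a value is passed by address, a pointer is passed as is. */
static int toy_set_wakeup_time(bool enabled, struct toy_time *tm)
{
	struct toy_runtime_work work = {
		.arg1 = &enabled,
		.arg2 = tm,
		.id = TOY_SET_WAKEUP_TIME,
	};

	/* Runs synchronously here, so &enabled stays valid throughout. */
	toy_call_rts(&work);
	return work.status;
}
```

Passing the address of a stack variable is only safe because the caller
waits until the work has finished, as noted in the implementation summary.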

The non-blocking variants of set_variable() and query_variable_info()
should be used while in atomic context. Use of the blocking variants,
set_variable() and query_variable_info(), while in atomic context will
issue a warning ("scheduling while atomic") and print a stack trace.
Presently, pstore uses the non-blocking variants and hence works fine.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/runtime-wrappers.c | 135 
 1 file changed, 119 insertions(+), 16 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index cf3bae42a752..127d4de00403 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -173,13 +173,104 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+	struct efi_runtime_work *efi_rts_work;
+	void *arg1, *arg2, *arg3, *arg4, *arg5;
+	efi_status_t status = EFI_NOT_FOUND;
+
+	efi_rts_work = container_of(work, struct efi_runtime_work, work);
+	arg1 = efi_rts_work->arg1;
+	arg2 = efi_rts_work->arg2;
+	arg3 = efi_rts_work->arg3;
+	arg4 = efi_rts_work->arg4;
+	arg5 = efi_rts_work->arg5;
+
+	switch (efi_rts_work->efi_rts_id) {
+	case GET_TIME:
+		status = efi_call_virt(get_time, (efi_time_t *)arg1,
+				       (efi_time_cap_t *)arg2);
+		break;
+	case SET_TIME:
+		status = efi_call_virt(set_time, (efi_time_t *)arg1);
+		break;
+	case GET_WAKEUP_TIME:
+		status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+				       (efi_bool_t *)arg2, (efi_time_t *)arg3);
+		break;
+	case SET_WAKEUP_TIME:
+		status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+				       (efi_time_t *)arg2);
+		break;
+	case GET_VARIABLE:
+		status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+				       (efi_guid_t *)arg2, (u32 *)arg3,
+				       (unsigned long *)arg4, (void *)arg5);
+		break;
+	case GET_NEXT_VARIABLE:
+		status = efi_call_virt(get_next_variable, (unsigned long *)arg1,
+				       (efi_char16_t *)arg2,
+				       (efi_guid_t *)

[PATCH V5 1/3] x86/efi: Make efi_delete_dummy_variable() use set_variable_nonblocking() instead of set_variable()

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, efi_delete_dummy_variable() uses set_variable(), which might
block, in which case the kernel prints a stack trace with the warning
"bad: scheduling from the idle thread!". So, make
efi_delete_dummy_variable() use set_variable_nonblocking(), which, as the
name suggests, doesn't block.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 arch/x86/platform/efi/quirks.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 36c1f8b9f7e0..6af39dc40325 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -105,12 +105,11 @@ early_param("efi_no_storage_paranoia", setup_storage_paranoia);
  */
 void efi_delete_dummy_variable(void)
 {
-	efi.set_variable((efi_char16_t *)efi_dummy_name,
-			 &EFI_DUMMY_GUID,
-			 EFI_VARIABLE_NON_VOLATILE |
-			 EFI_VARIABLE_BOOTSERVICE_ACCESS |
-			 EFI_VARIABLE_RUNTIME_ACCESS,
-			 0, NULL);
+	efi.set_variable_nonblocking((efi_char16_t *)efi_dummy_name,
+				     &EFI_DUMMY_GUID,
+				     EFI_VARIABLE_NON_VOLATILE |
+				     EFI_VARIABLE_BOOTSERVICE_ACCESS |
+				     EFI_VARIABLE_RUNTIME_ACCESS, 0, NULL);
 }
 
 /*
-- 
2.7.4



[PATCH V5 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-28 Thread Sai Praneeth Prakhya
Patches are based on Linus's kernel v4.17-rc7

[1] Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V4 to V5:
--
1. As suggested by Ard, don't use efi_rts_wq for non-blocking variants.
  Non-blocking variants are supposed to not block, and using a workqueue
  does exactly the opposite, hence refrain from using it.
2. Use non-blocking variants in efi_delete_dummy_variable(). Use of
  blocking variants means that we have to call efi_delete_dummy_variable()
  after efi_rts_wq has been created.
3. Remove in_atomic() check in set_variable<>() and query_variable_info<>().
  Any caller wishing to use set_variable() and query_variable_info() in
  atomic context should use their non-blocking variants.

Changes from V3 to V4:
--
1. As suggested by Peter, use completions instead of flush_work() as the
  former is cheaper
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry, Ard, I
  wasn't able to find a better alternative to keep this change local to
  arch/x86.

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem: what we are
  fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
  ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
  create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
  runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
  requested efi_runtime_service() - Because these two situations should
  *never* happen.

Sai Praneeth (3):
  x86/efi: Make efi_delete_dummy_variable() use
set_variable_nonblocking() instead of set_variable()
  efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/platform/efi/quirks.c  |  11 +-
 drivers/firmware/efi/efi.c  |  14 ++
 drivers/firmware/efi/runtime-wrappers.c | 218 +---
 include/linux/efi.h |   3 +
 4 files changed, 224 insertions(+), 22 deletions(-)

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 

-- 
2.7.4



[PATCH V5 2/3] efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()

2018-05-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service() (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto a workqueue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. A workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
   a. Populates efi_runtime_work,
   b. Queues work onto efi_rts_wq and
   c. Waits until the worker thread completes

The caller thread has to wait until the worker thread completes, because
it depends on the return status of efi_runtime_service() and, in
specific cases, on the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and get going.
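
The queue-then-wait flow described above can be sketched in user space with POSIX threads. This is an assumption-laden illustration rather than the patch's implementation: struct wq_work, wq_worker() and wq_queue_and_wait() are hypothetical names, a mutex plus condition variable stands in for a kernel completion, and one thread is spawned per request instead of reusing a persistent workqueue thread.

```c
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical work item: an output buffer, a status, and a "completion". */
struct wq_work {
	char *buf;		/* buffer the "service" fills in */
	size_t len;		/* size of buf, must be at least 1 */
	int status;
	bool done;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Worker: runs the "service", then signals completion to the caller. */
static void *wq_worker(void *arg)
{
	struct wq_work *w = arg;

	/* Pretend the service returns a variable name in the caller's buffer. */
	strncpy(w->buf, "BootOrder", w->len - 1);
	w->buf[w->len - 1] = '\0';

	pthread_mutex_lock(&w->lock);
	w->status = 0;
	w->done = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Caller: hand the work to another thread, then block until it completes. */
static int wq_queue_and_wait(char *buf, size_t len)
{
	struct wq_work w = { .buf = buf, .len = len, .status = -1 };
	pthread_t tid;

	pthread_mutex_init(&w.lock, NULL);
	pthread_cond_init(&w.cond, NULL);

	if (pthread_create(&tid, NULL, wq_worker, &w) == 0) {
		pthread_mutex_lock(&w.lock);
		while (!w.done)
			pthread_cond_wait(&w.cond, &w.lock);
		pthread_mutex_unlock(&w.lock);
		pthread_join(tid, NULL);
	}

	pthread_cond_destroy(&w.cond);
	pthread_mutex_destroy(&w.lock);
	return w.status;
}
```

Because the work item lives on the caller's stack and the worker writes into the caller's buffer, returning before completion would leave the worker touching dead memory. That is the user-space analogue of why the caller cannot just post the work and move on.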

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that every
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
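
As a rough caller-side illustration of storing values and pointers in five void pointers, consider the sketch below. struct pack_work, PACK_WORK() and pack_roundtrip() are hypothetical and are not the efi_queue_work() macro from this patch.

```c
#include <stddef.h>

/* Hypothetical container with five untyped argument slots. */
struct pack_work {
	void *arg1;
	void *arg2;
	void *arg3;
	void *arg4;
	void *arg5;
};

/* Fill all five slots; pass NULL for slots a given service does not use. */
#define PACK_WORK(w, a1, a2, a3, a4, a5)	\
	do {					\
		(w)->arg1 = (a1);		\
		(w)->arg2 = (a2);		\
		(w)->arg3 = (a3);		\
		(w)->arg4 = (a4);		\
		(w)->arg5 = (a5);		\
	} while (0)

/*
 * Pack one value and one pointer, then read both back.
 * 'enabled' is a value, so its address is stored; 'when' is already a
 * pointer, so it is stored unchanged.
 */
static unsigned char pack_roundtrip(unsigned char enabled, long *when,
				    long **when_out)
{
	struct pack_work w;

	PACK_WORK(&w, &enabled, when, NULL, NULL, NULL);

	*when_out = (long *)w.arg2;
	return *(unsigned char *)w.arg1;
}
```

Storing the address of a local value is only safe because the caller waits for the work to finish before that local goes out of scope.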

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/efi.c  | 14 ++
 drivers/firmware/efi/runtime-wrappers.c | 83 +
 include/linux/efi.h |  3 ++
 3 files changed, 100 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1379a375dfa8 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
 	.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,18 @@ static int __init efisubsys_init(void)
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
 
+	/*
+	 * Since we process only one efi_runtime_service() at a time, an
+	 * ordered workqueue (which creates only one execution context)
+	 * should suffice all our needs.
+	 */
+	efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+	if (!efi_rts_wq) {
+		pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+		clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+		return 0;
+	}
+
 	/* We register the efi directory at /sys/firmware/efi */
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..cf3bae42a752 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. 
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +45,77 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	QUERY_VARIABLE_INFO,
+	GET_NEXT_HIGH_MONO_COUNT,
+	RESET_SYSTEM,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:	Details of EFI Runtime Service work
+ * @arg<1-5>:		EFI Runtime Service function arguments
+ * @status:		Status of executing EFI Runtime Service
+ * @efi_rts_id:		EFI Runtime Service function identifier
+ * @efi_rts_comp:	Struct used for handling completions
+ */
+struct efi_runtime_work {
+	void *arg1;
+	void *arg2;
+	void *arg3;
+	void *arg4;
+	void *arg5;
+	efi_status_t status;
+	struct work_struct work;
+	enum

[PATCH V4 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-05-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Invoking efi_runtime_services() through efi_rts_wq (efi runtime
services workqueue) means all accesses to efi_runtime_services() should
be done after efi_rts_wq has been created. efi_delete_dummy_variable()
calls set_variable(), hence efi_delete_dummy_variable() should be called
after efi_rts_wq has been created.

Presently, efi_delete_dummy_variable() is called from
efi_enter_virtual_mode(), which is early in the boot phase (efi_rts_wq
isn't created yet), so call efi_delete_dummy_variable() later in the
boot phase. Another, and the most important, reason for calling
efi_delete_dummy_variable() late in the boot process is that, if it is
called before rest_init(), the kernel prints a stack trace with the
warning "bad: scheduling from the idle thread!". Hence, call it from
efisubsys_init(), which is called during rest_init().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 --
 drivers/firmware/efi/efi.c  | 6 ++
 include/linux/efi.h | 3 +++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..0e61b771b93d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -138,7 +138,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
 	if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 	 * necessary relocation fixups for the new virtual addresses.
 	 */
 	efi_runtime_update_mappings();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1176af664013 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -337,6 +337,12 @@ static int __init efisubsys_init(void)
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
 
+	/*
+	 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+	 * it should be invoked only after efi_rts_wq is ready.
+	 */
+	efi_delete_dummy_variable();
+
 	/* We register the efi directory at /sys/firmware/efi */
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3016d8c456bc..1b79939d0b1e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -994,6 +994,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1004,6 +1005,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
 	return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.7.4



[PATCH V4 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-05-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Invoking efi_runtime_services() through efi_rts_wq (efi runtime
services workqueue) means all accesses to efi_runtime_services() should
be done after efi_rts_wq has been created. efi_delete_dummy_variable()
calls set_variable(), hence efi_delete_dummy_variable() should be called
after efi_rts_wq has been created.

Presently, efi_delete_dummy_variable() is called from
efi_enter_virtual_mode() which is early in the boot phase (efi_rts_wq
isn't created yet), so call efi_delete_dummy_variable() later in the
boot phase. Another and the most important reason for calling
efi_delete_dummy_variable() late in the boot process is, if called
before rest_init(), kernel prints stack trace with a warning "bad:
scheduling from the idle thread!". Hence, call from efisubsys_init()
which is called during rest_init().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 --
 drivers/firmware/efi/efi.c  | 6 ++
 include/linux/efi.h | 3 +++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index cec5fae23eb3..0e61b771b93d 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -138,7 +138,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..1176af664013 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -337,6 +337,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_wq is ready.
+*/
+   efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3016d8c456bc..1b79939d0b1e 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -994,6 +994,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1004,6 +1005,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.7.4



[PATCH V4 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-25 Thread Sai Praneeth Prakhya
comments and concerns.

Note:
-----
Patches are based on Linus's kernel v4.17-rc6

[1] Backup: Detailing efi_pgd:
------------------------------
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V3 to V4:
----------------------
1. As suggested by Peter, use completions instead of flush_work() as the
  former is cheaper.
2. Call efi_delete_dummy_variable() from efisubsys_init(). Sorry, Ard, I
  wasn't able to find a better alternative to keep this change local to
  arch/x86.

Changes from V2 to V3:
----------------------
1. Rewrite the cover letter to clearly state the problem: what we are
  fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(); instead, print an error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
----------------------
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
  ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
  create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
  runtime-wrappers.c
5. Use BUG() when we are unable to queue work or unable to identify
  requested efi_runtime_service() - Because these two situations should
  *never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Create efi_rts_wq and efi_queue_work() to invoke all
efi_runtime_services()
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  20 +++
 drivers/firmware/efi/runtime-wrappers.c | 256 +---
 include/linux/efi.h |   6 +
 5 files changed, 262 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>

-- 
2.7.4



[PATCH V4 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), the kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and might assume that user space is still valid (i.e. that
the user space mappings still point to the process that requested
efi_runtime_service()), when in reality it is not.

A solution for this issue is to use a kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), the kernel queues the work to efi_rts_wq; a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in a kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits for completion until the work is finished because
it's dependent on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
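
The packing and unpacking rules above can be illustrated with a small
user-space sketch. This is not the kernel code: demo_runtime_work,
demo_call_rts(), demo_set_wakeup_time() and the demo_* types are
hypothetical stand-ins, and only two argument slots are shown instead of
five.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for efi_bool_t and efi_time_t. */
typedef unsigned char demo_bool_t;
typedef struct { int year; int month; int day; } demo_time_t;

enum demo_rts_id { DEMO_SET_WAKEUP_TIME };

/* Reduced analogue of efi_runtime_work: untyped argument slots. */
struct demo_runtime_work {
	void *arg1;
	void *arg2;
	int status;
	enum demo_rts_id id;
};

/* Stand-in for a runtime service taking one value and one pointer. */
static int demo_set_wakeup_time(demo_bool_t enabled, demo_time_t *tm)
{
	if (!tm)
		return -1;
	return enabled ? tm->year : 0;
}

/* Handler side: recast each slot, dereferencing the ones packed by value. */
static void demo_call_rts(struct demo_runtime_work *work)
{
	switch (work->id) {
	case DEMO_SET_WAKEUP_TIME:
		work->status = demo_set_wakeup_time(
				*(demo_bool_t *)work->arg1, /* was a value */
				(demo_time_t *)work->arg2); /* was a pointer */
		break;
	default:
		work->status = -1;
		break;
	}
}

/* Caller side: values are packed by address, pointers are passed as is. */
static int demo_pack_and_call(demo_bool_t enabled, demo_time_t *tm)
{
	struct demo_runtime_work work = {
		.arg1 = &enabled,
		.arg2 = tm,
		.status = -1,
		.id = DEMO_SET_WAKEUP_TIME,
	};

	/* Runs synchronously here, so &enabled stays valid for the call. */
	demo_call_rts(&work);
	return work.status;
}
```

Taking the address of a by-value argument is only safe because the
caller does not return before the handler has run.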

pstore writes could potentially be invoked in atomic context, and pstore
uses set_variable<>() and query_variable_info<>() to store logs. If we
invoke efi_runtime_services() through efi_rts_wq while in atomic context,
the kernel issues a warning ("scheduling while in atomic") and prints a
stack trace. One way to overcome this is to not make the caller process
wait for the worker thread to finish, but that approach breaks pstore,
i.e. the log messages aren't written to efi variables. Hence, pstore
calls efi_runtime_services() without using efi_rts_wq. In other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().
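
The routing rule described above can be sketched as follows. This is an
illustration only: it assumes that the exempted calls are the
*_NONBLOCKING identifiers, and demo_may_sleep(), demo_invoke() and the
two counters are hypothetical names that merely record which path a
request would take.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of service identifiers, named after enum efi_rts_ids. */
enum demo_rts_id {
	DEMO_GET_VARIABLE,
	DEMO_SET_VARIABLE,
	DEMO_SET_VARIABLE_NONBLOCKING,
	DEMO_QUERY_VARIABLE_INFO,
	DEMO_QUERY_VARIABLE_INFO_NONBLOCKING,
};

static int demo_queued_calls;	/* requests routed through the workqueue */
static int demo_direct_calls;	/* requests invoked directly */

/* Assumption: only the nonblocking variants may run in atomic context. */
static bool demo_may_sleep(enum demo_rts_id id)
{
	switch (id) {
	case DEMO_SET_VARIABLE_NONBLOCKING:
	case DEMO_QUERY_VARIABLE_INFO_NONBLOCKING:
		return false;
	default:
		return true;
	}
}

/* Queue and wait when sleeping is allowed, otherwise call directly. */
static void demo_invoke(enum demo_rts_id id)
{
	if (demo_may_sleep(id))
		demo_queued_calls++;
	else
		demo_direct_calls++;
}
```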

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 171 
 1 file changed, 151 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 534bd348feca..26bb6645ff59 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -175,13 +175,108 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET_WAKEUP_TIME:
+   status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+  (efi_bool_t *)arg2, (efi_time_t *)arg3);
+   break;
+   case SET_WAKEUP_TIME:
+   status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+  (efi_time_t *)arg2);
+   break;
+   case GET_VARIABLE:
+   status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ 

[PATCH V4 2/3] efi: Create efi_rts_wq and efi_queue_work() to invoke all efi_runtime_services()

2018-05-25 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. Populates efi_runtime_work
b. Queues work onto efi_rts_wq and
c. Waits until worker thread completes

The caller thread has to wait until the worker thread completes, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
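
The populate, queue and wait sequence described above can be sketched in
user space with POSIX threads. This is only an analogy under stated
assumptions: demo_completion is a hypothetical stand-in for the kernel's
completion, demo_queue_work() spawns one thread per request instead of
using a persistent ordered workqueue, and the "service" simply doubles
its input.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Work item; it lives on the caller's stack while the worker runs. */
struct demo_work {
	int input;
	int *out_buf;		/* buffer the service fills in */
	int status;		/* return status the caller depends on */
	struct demo_completion comp;
};

static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void demo_wait_for_completion(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Worker: run the service, record the status, then signal the caller. */
static void *demo_worker(void *arg)
{
	struct demo_work *work = arg;

	*work->out_buf = work->input * 2;
	work->status = 0;
	demo_complete(&work->comp);
	return NULL;
}

/* Caller: populate the work item, hand it off, and block until done. */
static int demo_queue_work(int input, int *out_buf)
{
	struct demo_work work = { .input = input, .out_buf = out_buf,
				  .status = -1 };
	pthread_t tid;

	pthread_mutex_init(&work.comp.lock, NULL);
	pthread_cond_init(&work.comp.cond, NULL);
	work.comp.done = false;

	if (pthread_create(&tid, NULL, demo_worker, &work) == 0) {
		demo_wait_for_completion(&work.comp);
		pthread_join(tid, NULL);
	}

	pthread_cond_destroy(&work.comp.cond);
	pthread_mutex_destroy(&work.comp.lock);
	return work.status;
}
```

Because both the status and the output buffer are only valid after the
worker signals, the caller cannot return early without losing them.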

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/efi.c  | 14 ++
 drivers/firmware/efi/runtime-wrappers.c | 85 +
 include/linux/efi.h |  3 ++
 3 files changed, 102 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 1176af664013..2632294eb33f 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -338,6 +340,18 @@ static int __init efisubsys_init(void)
return 0;
 
/*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+   if (!efi_rts_wq) {
+   pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return 0;
+   }
+
+   /*
 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
 * it should be invoked only after efi_rts_wq is ready.
 */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..534bd348feca 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,15 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits for completion until the work is finished
+ * because it's dependent on the return status and execution of
+ * efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +31,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +45,79 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   SET_VARIABLE_NONBLOCKING,
+   QUERY_VARIABLE_INFO,
+   QUERY_VARIABLE_INFO_NONBLOCKING,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:Status of executing EFI Runtime Service
+ * @efi_rts_id:EFI Runtime Service function identifier
+ * @efi_rts_comp:  Struct used for handling completions
+ */
+struct efi_runtime_work {
+   void *arg1;
+   void *arg2;
+   void *arg3;
+   void *arg4;
+   void *arg5;
+   efi_status_t status;
+ 

[PATCH V3 1/3] x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Create a workqueue named efi_rts_wq (efi runtime services workqueue), so
that all efi_runtime_services() are executed in kthread context.

Invoking efi_runtime_services() through efi_rts_wq means all accesses to
efi_runtime_services() should be done after efi_rts_wq has been created.
efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.
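
The ordering constraint described above can be sketched in user space as
follows. All names are hypothetical stand-ins (demo_flags for the EFI
flags word, demo_alloc_wq() for the workqueue allocation), and the
simulate_failure parameter exists only so the failure path can be
exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RUNTIME_SERVICES (1UL << 0)

static unsigned long demo_flags = DEMO_RUNTIME_SERVICES;
static void *demo_rts_wq;		/* NULL until the queue exists */
static int demo_dummy_deleted;

/* Stand-in for the workqueue allocation; it can be made to fail. */
static void *demo_alloc_wq(bool simulate_failure)
{
	static int token;

	return simulate_failure ? NULL : &token;
}

/* On failure, mark runtime services as unavailable and report it. */
static bool demo_create_rts_wq(bool simulate_failure)
{
	demo_rts_wq = demo_alloc_wq(simulate_failure);
	if (!demo_rts_wq) {
		demo_flags &= ~DEMO_RUNTIME_SERVICES;
		return false;
	}
	return true;
}

/* Goes through the queue, so the queue must already exist. */
static void demo_delete_dummy_variable(void)
{
	assert(demo_rts_wq != NULL);
	demo_dummy_deleted++;
}

/* Create the queue first; only then issue the first runtime call. */
static void demo_enter_virtual_mode(bool simulate_failure)
{
	if (!demo_create_rts_wq(simulate_failure))
		return;

	demo_delete_dummy_variable();
}
```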

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 arch/x86/platform/efi/efi.c| 15 +--
 drivers/firmware/efi/arm-runtime.c |  3 +++
 drivers/firmware/efi/efi.c | 25 +
 include/linux/efi.h|  4 
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..adcc55cd25ce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
@@ -1031,6 +1025,15 @@ void __init efi_enter_virtual_mode(void)
__efi_enter_virtual_mode();
 
efi_dump_pagetable();
+
+   if (!efi_create_rts_wq())
+   return;
+
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_wq is ready.
+*/
+   efi_delete_dummy_variable();
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..6fb06130b53f 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -139,6 +139,9 @@ static int __init arm_enable_runtime_services(void)
return -ENOMEM;
}
 
+   if (!efi_create_rts_wq())
+   return 0;
+
/* Set up runtime services function pointers */
efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..b9103caa03b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,13 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* If we failed to create efi_rts_wq, EFI_RUNTIME_SERVICES would
+* have been cleared, check for that condition.
+*/
+   if (!efi_enabled(EFI_RUNTIME_SERVICES))
+   return 0;
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
@@ -971,3 +980,19 @@ static int register_update_efi_random_seed(void)
 }
 late_initcall(register_update_efi_random_seed);
 #endif
+
+bool __init efi_create_rts_wq(void)
+{
+   /*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+   if (!efi_rts_wq) {
+   pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return false;
+   }
+   return true;
+}
diff --git a/include/li

[PATCH V3 1/3] x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Create a workqueue named efi_rts_wq (efi runtime services workqueue), so
that all efi_runtime_services() are executed in kthread context.

Invoking efi_runtime_services() through efi_rts_wq means all accesses to
efi_runtime_services() should be done after efi_rts_wq has been created.
efi_delete_dummy_variable() calls set_variable(), hence
efi_delete_dummy_variable() should be called after efi_rts_wq has been
created.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 arch/x86/platform/efi/efi.c| 15 +--
 drivers/firmware/efi/arm-runtime.c |  3 +++
 drivers/firmware/efi/efi.c | 25 +
 include/linux/efi.h|  4 
 4 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..adcc55cd25ce 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
@@ -1031,6 +1025,15 @@ void __init efi_enter_virtual_mode(void)
__efi_enter_virtual_mode();
 
efi_dump_pagetable();
+
+   if (!efi_create_rts_wq())
+   return;
+
+   /*
+* Cleaning the DUMMY object calls the EFI Runtime Service set_variable(), so
+* it should be invoked only after efi_rts_wq is ready.
+*/
+   efi_delete_dummy_variable();
 }
 
 static int __init arch_parse_efi_cmdline(char *str)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 5889cbea60b8..6fb06130b53f 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -139,6 +139,9 @@ static int __init arm_enable_runtime_services(void)
return -ENOMEM;
}
 
+   if (!efi_create_rts_wq())
+   return 0;
+
/* Set up runtime services function pointers */
efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, &efi.flags);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 232f4915223b..b9103caa03b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -84,6 +84,8 @@ struct mm_struct efi_mm = {
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -337,6 +339,13 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* If we failed to create efi_rts_wq, EFI_RUNTIME_SERVICES would
+* have been cleared; check for that condition.
+*/
+   if (!efi_enabled(EFI_RUNTIME_SERVICES))
+   return 0;
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
@@ -971,3 +980,19 @@ static int register_update_efi_random_seed(void)
 }
 late_initcall(register_update_efi_random_seed);
 #endif
+
+bool __init efi_create_rts_wq(void)
+{
+   /*
+* Since we process only one efi_runtime_service() at a time, an
+* ordered workqueue (which creates only one execution context)
+* should suffice for all our needs.
+*/
+   efi_rts_wq = alloc_ordered_workqueue("efi_rts_wq", 0);
+   if (!efi_rts_wq) {
+   pr_err("Creating efi_rts_wq failed, EFI runtime services disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return false;
+   }
+   return true;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3016d8c456bc..565955010b18 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -987,6 +987,7 @@ extern void efi_map_pal_code (void);
 extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);
 extern void efi_gettimeofday (struct timespec64 *ts);
 extern void efi_enter_virtual_mode (void); /* switch EFI to virtual mode, if possible */
+extern bool __init efi_create_rts_wq(void);
 #ifdef CONFIG_X86
 extern void efi_late_init(void);
 extern void efi_free_boot_se

[PATCH V3 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think, user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.

A solution for this issue is to use a kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), kernel queues the work to efi_rts_wq, a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
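
The packing and unpacking rules above can be sketched in user space. This is only an illustration under stated assumptions: every demo_* name, the two stand-in services, and the fake clock state are hypothetical and are not part of the patch. One service takes only a pointer, the other takes a value followed by a pointer, so both rules are exercised.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical service identifiers; not the kernel's enum efi_rts_ids. */
enum demo_rts_id {
	DEMO_GET_TIME,
	DEMO_SET_WAKEUP_TIME,
};

struct demo_time {
	int hour;
	int minute;
};

/* Like efi_runtime_work, every argument travels as a void pointer. */
struct demo_work {
	enum demo_rts_id id;
	void *arg1;
	void *arg2;
	int status;
};

/* Stand-in state for the "firmware" (assumption for this sketch). */
static struct demo_time demo_clock = { 12, 30 };
static bool demo_alarm_enabled;
static struct demo_time demo_alarm;

/* Takes only a pointer argument. */
static int demo_get_time(struct demo_time *out)
{
	*out = demo_clock;
	return 0;
}

/* Takes a value (enabled) followed by a pointer (when). */
static int demo_set_wakeup_time(bool enabled, struct demo_time *when)
{
	demo_alarm_enabled = enabled;
	demo_alarm = *when;
	return 0;
}

/* Handler: undo the packing and call the matching service. */
void demo_call_rts(struct demo_work *work)
{
	switch (work->id) {
	case DEMO_GET_TIME:
		/* Rule 1: pointer argument, recast to its original type. */
		work->status = demo_get_time((struct demo_time *)work->arg1);
		break;
	case DEMO_SET_WAKEUP_TIME:
		/* Rule 2: value argument, recast and dereference. */
		work->status = demo_set_wakeup_time(*(bool *)work->arg1,
						    (struct demo_time *)work->arg2);
		break;
	default:
		work->status = -1;
		break;
	}
}
```

A caller that wants to pass a value stores it in a local variable and passes the address of that variable; the variable must stay alive until the handler has run, which is one more reason the caller waits for the work to finish.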

pstore writes could potentially be invoked in atomic context and it uses
set_variable<>() and query_variable_info<>() to store logs. If we invoke
efi_runtime_services() through efi_rts_wq while in atomic context, the kernel
issues a warning ("scheduling while atomic") and prints a stack trace.
One way to overcome this is to not make the caller process wait for the
worker thread to finish. This approach breaks pstore i.e. the log
messages aren't written to efi variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq or in other words
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 170 
 1 file changed, 150 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index a9866045ed52..23ff128fcb2f 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -170,13 +170,107 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET

[PATCH V3 2/3] efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce efi_queue_work() that 1. Populates efi_runtime_work 2. Queues
work onto efi_rts_wq and 3. Waits until worker thread returns.

The caller thread has to wait until the worker thread returns, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and move on.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or
a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
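
The populate, queue, and wait flow described above can be sketched in user space. This is an illustration only, not the patch's macro: all demo_* names and status codes are hypothetical, and running the handler inline stands in for queueing onto efi_rts_wq and waiting for the worker thread. The status starts out as "aborted", so the caller sees a failure if the work never runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this sketch. */
#define DEMO_SUCCESS		0
#define DEMO_ABORTED		1
#define DEMO_BUFFER_TOO_SMALL	2

/* Mirrors the shape described above: five void pointers plus a status. */
struct demo_runtime_work {
	void *arg1;
	void *arg2;
	void *arg3;
	void *arg4;
	void *arg5;
	int status;
	int rts_id;
};

/*
 * Hypothetical single-slot queue. Running the handler inline stands in
 * for "queue the work, then wait for the worker thread to finish".
 */
struct demo_queue {
	bool accepting;
	void (*handler)(struct demo_runtime_work *work);
};

static bool demo_queue_and_wait(struct demo_queue *q,
				struct demo_runtime_work *work)
{
	if (!q->accepting)
		return false;
	q->handler(work);
	return true;
}

/* Example handler: arg1 is a buffer, arg2 points to the buffer size. */
void demo_fill_buffer(struct demo_runtime_work *work)
{
	static const char data[] = "efi";
	char *buf = work->arg1;
	size_t len = *(size_t *)work->arg2;

	if (len < sizeof(data)) {
		work->status = DEMO_BUFFER_TOO_SMALL;
		return;
	}
	memcpy(buf, data, sizeof(data));
	work->status = DEMO_SUCCESS;
}

/* Populate the work item, queue it, wait, and hand back the status. */
int demo_queue_work(struct demo_queue *q, int rts_id, void *a1, void *a2,
		    void *a3, void *a4, void *a5)
{
	struct demo_runtime_work work = {
		.arg1 = a1, .arg2 = a2, .arg3 = a3, .arg4 = a4, .arg5 = a5,
		.status = DEMO_ABORTED,	/* reported if the work never runs */
		.rts_id = rts_id,
	};

	demo_queue_and_wait(q, &work);
	return work.status;
}
```

Because the work item lives on the caller's stack and the handler writes into the caller's buffer, returning before the handler has finished would leave both the status and the buffer contents undefined.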

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..a9866045ed52 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,76 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   SET_VARIABLE_NONBLOCKING,
+   QUERY_VARIABLE_INFO,
+   QUERY_VARIABLE_INFO_NONBLOCKING,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @efi_rts_id:    EFI Runtime Service function identifier
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:    Status of executing EFI Runtime Service
+ */
+struct efi_runtime_work {
+   void *arg1;
+   void *arg2;
+   void *arg3;
+   void *arg4;
+   void *arg5;
+   efi_status_t status;
+   struct work_struct work;
+   enum efi_rts_ids efi_rts_id;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts:   efi_runtime_service() function identifier
+ * @rts_arg<1-5>:  efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)\
+({ \
+   struct efi_runtime_work efi_rts_work;   \
+   efi_rts_work.status = EFI_ABORTED;  \
+   

[PATCH V3 2/3] efi: Introduce efi_queue_work() to queue any efi_runtime_service() on efi_rts_wq

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce efi_queue_work() that 1. Populates efi_runtime_work 2. Queues
work onto efi_rts_wq and 3. Waits until worker thread returns.

The caller thread has to wait until the worker thread returns, because
it depends on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services() take a pointer to a buffer as an argument and
fill the buffer with the requested data, for instance
efi_get_variable() and efi_get_next_variable(). Hence, the caller process
cannot just post the work and move on.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type) or
a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/runtime-wrappers.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..a9866045ed52 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_wq.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. 
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,76 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum efi_rts_ids {
+   GET_TIME,
+   SET_TIME,
+   GET_WAKEUP_TIME,
+   SET_WAKEUP_TIME,
+   GET_VARIABLE,
+   GET_NEXT_VARIABLE,
+   SET_VARIABLE,
+   SET_VARIABLE_NONBLOCKING,
+   QUERY_VARIABLE_INFO,
+   QUERY_VARIABLE_INFO_NONBLOCKING,
+   GET_NEXT_HIGH_MONO_COUNT,
+   RESET_SYSTEM,
+   UPDATE_CAPSULE,
+   QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_runtime_work:   Details of EFI Runtime Service work
+ * @efi_rts_id:    EFI Runtime Service function identifier
+ * @arg<1-5>:  EFI Runtime Service function arguments
+ * @status:    Status of executing EFI Runtime Service
+ */
+struct efi_runtime_work {
+   void *arg1;
+   void *arg2;
+   void *arg3;
+   void *arg4;
+   void *arg5;
+   efi_status_t status;
+   struct work_struct work;
+   enum efi_rts_ids efi_rts_id;
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts:   efi_runtime_service() function identifier
+ * @rts_arg<1-5>:  efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5)\
+({ \
+   struct efi_runtime_work efi_rts_work;   \
+   efi_rts_work.status = EFI_ABORTED;  \
+   \
+   INIT_WORK_ONSTACK(&efi_rts_work.work, efi_call_rts);\
+   efi_rts_work.arg1 = _arg1;  \
+   efi_rts_work.arg2 = _arg2;  \
+   efi_rts_work.arg3 = _arg3;  \
+   efi_rts_work.arg4 = _arg4;  \
+   efi_rts_work.arg5 = _arg5;  \
+   efi_rts_work.ef

[PATCH V3 3/3] efi: Use efi_rts_wq to invoke EFI Runtime Services

2018-05-21 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, when a user process requests the kernel to execute any
efi_runtime_service(), kernel switches the page directory (%cr3) from
swapper_pgd to efi_pgd. Other subsystems in the kernel aren't aware of
this switch and they might think, user space is still valid (i.e. the
user space mappings are still pointing to the process that requested to
run efi_runtime_service()) but in reality it is not so.

A solution for this issue is to use a kthread to run efi_runtime_service().
When a user process requests the kernel to execute any
efi_runtime_service(), kernel queues the work to efi_rts_wq, a kthread
comes along, switches to efi_pgd and executes efi_runtime_service() in
kthread context. Anything that tries to touch user space addresses while
in kthread is terminally broken.

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_wq.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is passed.

Introduce a handler function (called efi_call_rts()) that
1. Understands efi_runtime_work and
2. Invokes the appropriate efi_runtime_service() with the appropriate
arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.

pstore writes could potentially be invoked in atomic context and it uses
set_variable<>() and query_variable_info<>() to store logs. If we invoke
efi_runtime_services() through efi_rts_wq while in atomic context, the kernel
issues a warning ("scheduling while atomic") and prints a stack trace.
One way to overcome this is to not make the caller process wait for the
worker thread to finish. This approach breaks pstore i.e. the log
messages aren't written to efi variables. Hence, pstore calls
efi_runtime_services() without using efi_rts_wq or in other words
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Signed-off-by: Sai Praneeth Prakhya 
Suggested-by: Andy Lutomirski 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Will Deacon 
Cc: Dave Hansen 
Cc: Mark Rutland 
Cc: Bhupesh Sharma 
Cc: Naresh Bhat 
Cc: Ricardo Neri 
Cc: Peter Zijlstra 
Cc: Ravi Shankar 
Cc: Matt Fleming 
Cc: Dan Williams 
Cc: Ard Biesheuvel 
Cc: Miguel Ojeda 
---
 drivers/firmware/efi/runtime-wrappers.c | 170 
 1 file changed, 150 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index a9866045ed52..23ff128fcb2f 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -170,13 +170,107 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->efi_rts_id) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET_WAKEUP_TIME:
+   status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+  (efi_bool_t *)arg2, (efi_time_t *)arg3);
+   break;
+   case SET_WAKEUP_TIME:
+   status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+  (efi_time_t *)arg2);
+   break;
+   case GET_VARIABLE:
+   status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ 

[PATCH V3 0/3] Use efi_rts_wq to invoke EFI Runtime Services

2018-05-21 Thread Sai Praneeth Prakhya
comments and concerns.

Note:
-
Patches are based on Linus's kernel v4.17-rc6

[1] Backup: Detailing efi_pgd:
--
efi_pgd has mappings for EFI Runtime Code/Data (on x86, plus EFI Boot time
Code/Data) regions. Due to the nature of these mappings, they fall
in user space address ranges and they are not the same as swapper.

[On arm64, the EFI mappings are in the VA range usually used for user
space. The two halves of the address space are managed by separate
tables, TTBR0 and TTBR1. We always map the kernel in TTBR1, and we map
user space or EFI runtime mappings in TTBR0.] - Mark Rutland

Changes from V2 to V3:
--
1. Rewrite the cover letter to clearly state the problem: what we are
fixing and what we are not fixing.
2. Make efi_delete_dummy_variable() change local to x86.
3. Avoid using BUG(), instead, print error message and exit gracefully.
4. Move struct efi_runtime_work to runtime-wrappers.c file.
5. Give enum a name (efi_rts_ids) and use it in efi_runtime_work.
6. Add Naresh (maintainer of LUV for ARM) and Miguel to the CC list.

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() after creating efi_rts_wq
  efi: Introduce efi_queue_work() to queue any efi_runtime_service() on 
   efi_rts_wq
  efi: Use efi_rts_wq to invoke EFI Runtime Services

 arch/x86/platform/efi/efi.c |  15 +-
 drivers/firmware/efi/arm-runtime.c  |   3 +
 drivers/firmware/efi/efi.c  |  25 
 drivers/firmware/efi/runtime-wrappers.c | 250 +---
 include/linux/efi.h |   4 +
 5 files changed, 271 insertions(+), 26 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Naresh Bhat <naresh.b...@linaro.org>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Miguel Ojeda <miguel.ojeda.sando...@gmail.com>

-- 
2.7.4



[PATCH] x86: Use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()

2018-04-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When the platform supports PCID and if CONFIG_DEBUG_VM is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set. Presently, build_cr3_noflush() uses
"this_cpu_has(X86_FEATURE_PCID)" to perform the check but this_cpu_has()
works only after SMP is initialized (i.e. the per-CPU cpu_info structures
must be populated), which happens very late in the boot process (during
rest_init).

Since efi_runtime_services() are called both during (early) kernel boot
and at run time, modify build_cr3_noflush() to always use boot_cpu_has().
As suggested by Dave, this should be OK because all CPUs have the
same capabilities anyway (on x86).
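
The ordering problem can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: the demo_* names, the feature bit position, and the two init functions are hypothetical stand-ins for "boot CPU data is filled in early" and "per-CPU data is filled in only once SMP bring-up has run".

```c
#include <stdbool.h>

#define DEMO_NR_CPUS		4
#define DEMO_FEATURE_PCID	(1UL << 0)	/* hypothetical bit position */

struct demo_cpuinfo {
	unsigned long capabilities;
};

/* Filled in early, while only the boot CPU is running. */
static struct demo_cpuinfo demo_boot_cpu_data;
/* Filled in per CPU, only once SMP bring-up has run. */
static struct demo_cpuinfo demo_cpu_info[DEMO_NR_CPUS];

/* Early boot: record the boot CPU's capabilities. */
void demo_early_cpu_init(void)
{
	demo_boot_cpu_data.capabilities |= DEMO_FEATURE_PCID;
}

/* Late boot: copy the capabilities into the per-CPU slot. */
void demo_smp_store_cpu_info(int cpu)
{
	demo_cpu_info[cpu] = demo_boot_cpu_data;
}

/* Valid as soon as demo_early_cpu_init() has run. */
bool demo_boot_cpu_has(unsigned long feature)
{
	return (demo_boot_cpu_data.capabilities & feature) != 0;
}

/* Valid only after demo_smp_store_cpu_info() has run for this CPU. */
bool demo_this_cpu_has(int cpu, unsigned long feature)
{
	return (demo_cpu_info[cpu].capabilities & feature) != 0;
}
```

In this model, a check made between the two init steps sees the feature through demo_boot_cpu_has() but not through demo_this_cpu_has(), which mirrors the spurious warning described in this patch.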

Without this change, we see the warning below during kernel boot.

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134
load_new_mm_cr3+0x114/0x170
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-02277-gbc16d4052f1a #1
Hardware name: System manufacturer System Product Name/Z170-K, BIOS 3301
02/08/2017
RIP: 0010:load_new_mm_cr3+0x114/0x170
RSP: :9b203e38 EFLAGS: 00010046
RAX:  RBX: 9b26f5a0 RCX: 
RDX:  RSI:  RDI: 9b20a000
RBP: 9b203e90 R08:  R09: 0f63eb29
R10: 9b203ea8 R11: c3292018 R12: 
R13: 9b2e1180 R14: 0001ee80 R15: 
FS:  () GS:968df6c0()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 968df6fff000 CR3: 0004261e6002 CR4: 000606b0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0

Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still a work in progress as it affects
MCE.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Reported-by: Linus Torvalds <torva...@linux-foundation.org>
Cc: Lee Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Peter Zijlstra <a.p.zijls...@chello.nl>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Dave Hansen <dave.han...@intel.com>
---
 arch/x86/include/asm/tlbflush.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 84137c22fdfa..42e040859067 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -131,7 +131,12 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
 static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 {
VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-   VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+   /*
+* Use boot_cpu_has() instead of this_cpu_has() as this function
+* might be called during early boot. This should work even after
+* boot because all CPUs have the same capabilities anyway.
+*/
+   VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
-- 
2.7.4



[PATCH] x86: Use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()

2018-04-04 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

When the platform supports PCID and if CONFIG_DEBUG_VM is enabled,
build_cr3_noflush() (called via switch_mm()) does a sanity check to see
if X86_FEATURE_PCID is set. Presently, build_cr3_noflush() uses
"this_cpu_has(X86_FEATURE_PCID)" to perform the check but this_cpu_has()
works only after SMP is initialized (i.e. per cpu cpu_info's should be
populated) and this happens to be very late in the boot process (during
rest_init).

As efi_runtime_services() are called during (early) kernel boot time
and run time, modify build_cr3_noflush() to use boot_cpu_has() all the
time. As suggested by Dave, this should be OK because all cpu's have
same capabilities anyways (for x86).

Without this change we see below warning during kernel boot.

WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134
load_new_mm_cr3+0x114/0x170
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-02277-gbc16d4052f1a #1
Hardware name: System manufacturer System Product Name/Z170-K, BIOS 3301
02/08/2017
RIP: 0010:load_new_mm_cr3+0x114/0x170
RSP: :9b203e38 EFLAGS: 00010046
RAX:  RBX: 9b26f5a0 RCX: 
RDX:  RSI:  RDI: 9b20a000
RBP: 9b203e90 R08:  R09: 0f63eb29
R10: 9b203ea8 R11: c3292018 R12: 
R13: 9b2e1180 R14: 0001ee80 R15: 
FS:  () GS:968df6c0()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 968df6fff000 CR3: 0004261e6002 CR4: 000606b0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
switch_mm_irqs_off+0x267/0x590
switch_mm+0xe/0x20
efi_switch_mm+0x3e/0x50
efi_enter_virtual_mode+0x43f/0x4da
start_kernel+0x3bf/0x458
secondary_startup_64+0xa5/0xb0

Dave also suggested that we put a warning in this_cpu_has() if it's used
early in the boot process. This is still work in progress as it affects
MCE.

Signed-off-by: Sai Praneeth Prakhya 
Reported-by: Linus Torvalds 
Cc: Lee Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Andy Lutomirski 
Cc: Michael S. Tsirkin 
Cc: Ricardo Neri 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: Ravi Shankar 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Dave Hansen 
---
 arch/x86/include/asm/tlbflush.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 84137c22fdfa..42e040859067 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -131,7 +131,12 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid)
 static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
-	VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID));
+	/*
+	 * Use boot_cpu_has() instead of this_cpu_has() as this function
+	 * might be called during early boot. This should work even after
+	 * boot because all cpu's have same capabilities anyways.
+	 */
+	VM_WARN_ON_ONCE(!boot_cpu_has(X86_FEATURE_PCID));
 	return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH;
 }
 
-- 
2.7.4



[PATCH V2 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Invoking efi_runtime_services() through a workqueue (efi_rts_wq) means
that all accesses to efi_runtime_services() must happen after efi_rts_wq
has been created. efi_delete_dummy_variable() calls set_variable(), hence
it must also be called after efi_rts_wq has been created.

efi_delete_dummy_variable() is currently called from
efi_enter_virtual_mode(), which runs early in the boot phase (before
efi_rts_wq is created), so call it later in the boot phase instead, i.e.
while initializing the efi subsystem. The next patch creates efi_rts_wq
at this same place, and all the efi_runtime_services() will then be
called through efi_rts_wq.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 ------
 drivers/firmware/efi/efi.c  | 6 ++++++
 include/linux/efi.h         | 3 +++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index a399c1ebf6f0..43009e3f821b 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -143,7 +143,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
 	if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
 		runtime_code_page_mkexec();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 	 * necessary relocation fixups for the new virtual addresses.
 	 */
 	efi_runtime_update_mappings();
-
-	/* clean DUMMY object */
-	efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..838b8efe639c 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -328,6 +328,12 @@ static int __init efisubsys_init(void)
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
 
+	/*
+	 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+	 * it should be invoked only after efi_rts_workqueue is ready.
+	 */
+	efi_delete_dummy_variable();
+
 	/* We register the efi directory at /sys/firmware/efi */
 	efi_kobj = kobject_create_and_add("efi", firmware_kobj);
 	if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f5083aa72eae..c4efb3ef0dfa 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
 	return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.7.4



[PATCH V2 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at
https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use a kthread to run
efi_runtime_service(). When a user/kernel thread requests execution of
efi_runtime_service(), the kernel off-loads this work to a kthread, which
in turn uses efi_pgd. Anything that tries to touch user space addresses
while in a kthread is terminally broken. This patch set adds support to
the efi subsystem to handle all calls to efi_runtime_services() using a
work queue (which in turn uses a kthread).

Implementation summary:
---
1. When a user/kernel thread requests to execute efi_runtime_service(),
enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it depends
on the return status of efi_runtime_service() and, in specific cases, on
the arguments populated by efi_runtime_service(). Some
efi_runtime_services(), for instance efi_get_variable() and
efi_get_next_variable(), take a pointer to a buffer as an argument and
fill the buffer with the requested data. Hence, the caller cannot just
post the work and move on; it has to wait for results from firmware.
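
The two steps above (queue the request, then block until it has run) can
be sketched in user space with POSIX threads. This is a hedged
illustration, not the patch's implementation: struct model_wq,
model_queue_and_wait() and model_get_value() are hypothetical names, and
a mutex plus condition variable stand in for the kernel's workqueue and
completion machinery.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One unit of work: a function, its argument, its result, a done flag. */
struct model_work {
	long (*fn)(void *arg);
	void *arg;
	long status;
	bool done;
	struct model_work *next;
};

/* A queue served by exactly one worker, so work runs one at a time. */
struct model_wq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct model_work *head, *tail;
	bool stop;
	pthread_t worker;
};

static void *model_worker(void *data)
{
	struct model_wq *wq = data;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->head && !wq->stop)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (!wq->head)	/* stop requested and queue drained */
			break;

		struct model_work *w = wq->head;
		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		pthread_mutex_unlock(&wq->lock);

		long status = w->fn(w->arg);	/* run outside the lock */

		pthread_mutex_lock(&wq->lock);
		w->status = status;
		w->done = true;
		pthread_cond_broadcast(&wq->cond);	/* wake the caller */
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int model_wq_start(struct model_wq *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->head = wq->tail = NULL;
	wq->stop = false;
	return pthread_create(&wq->worker, NULL, model_worker, wq);
}

static void model_wq_stop(struct model_wq *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = true;
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
}

/* Queue fn(arg) and sleep until the worker has executed it. */
static long model_queue_and_wait(struct model_wq *wq,
				 long (*fn)(void *), void *arg)
{
	struct model_work w = { .fn = fn, .arg = arg };

	assert(fn != NULL);
	pthread_mutex_lock(&wq->lock);
	if (wq->tail)
		wq->tail->next = &w;
	else
		wq->head = &w;
	wq->tail = &w;
	pthread_cond_broadcast(&wq->cond);
	while (!w.done)
		pthread_cond_wait(&wq->cond, &wq->lock);
	pthread_mutex_unlock(&wq->lock);
	return w.status;
}

/* Stand-in for a service that fills a caller-supplied buffer. */
static long model_get_value(void *arg)
{
	*(int *)arg = 42;
	return 0;
}
```

Because the work item lives on the caller's stack and the service writes
into the caller's buffer, the caller must not return before the worker is
done, which is the reason given in step 2. The wait can sleep, so this
pattern is unsuitable for contexts that must not block.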

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services()
in atomic context, because the caller thread might sleep. Presently,
pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds
fine for arm and arm64. I would appreciate it if someone could test the
patches on real ARM/ARM64 machines.
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please
feel free to pour in your comments and concerns.
Note: Patches are based on Linus's kernel v4.16-rc4

Changes from V1 to V2:
--
1. Remove unnecessary include of asm/efi.h file - Fixes build error on
ia64, reported by 0-day
2. Use enum to identify efi_runtime_services()
3. Use alloc_ordered_workqueue() to create efi_rts_wq as
create_workqueue() is scheduled for deprecation.
4. Make efi_call_rts() static, as it has no callers outside
runtime-wrappers.c
5. Use BUG(), when we are unable to queue work or unable to identify
requested efi_runtime_service() - Because these two situations should
*never* happen.

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Introduce efi_rts_workqueue and some infrastructure to invoke all
efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  21 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h |  23 
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>

-- 
2.7.4



[PATCH V2 2/3] efi: Introduce efi_rts_workqueue and some infrastructure to invoke all efi_runtime_services()

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto a work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce some infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. populates efi_runtime_work
b. queues work onto efi_rts_wq and
c. waits until worker thread returns

The caller thread has to wait until the worker thread returns, because
it depends on the return status of efi_runtime_service() and, in
specific cases, on the arguments populated by efi_runtime_service(). Some
efi_runtime_services(), for instance efi_get_variable() and
efi_get_next_variable(), take a pointer to a buffer as an argument and
fill the buffer with the requested data. Hence, the caller cannot just
post the work and move on.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.
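
A request record of this shape could look roughly like the sketch below.
It is an assumption-laden illustration rather than the definition from
this patch: struct model_runtime_work, MODEL_FILL_WORK() and
model_work_nargs() are hypothetical names, and the real struct also has
to carry the work item used by the workqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers; the patch defines the real list in an enum. */
enum model_rts_id {
	MODEL_GET_TIME,
	MODEL_SET_TIME,
	MODEL_GET_VARIABLE,
};

/* One request: which service to call, five argument slots, the result. */
struct model_runtime_work {
	enum model_rts_id id;
	void *arg1;
	void *arg2;
	void *arg3;
	void *arg4;
	void *arg5;
	long status;
};

/* Populate a request; unused argument slots are passed as NULL. */
#define MODEL_FILL_WORK(w, _id, a1, a2, a3, a4, a5)	\
	do {						\
		(w)->id = (_id);			\
		(w)->arg1 = (a1);			\
		(w)->arg2 = (a2);			\
		(w)->arg3 = (a3);			\
		(w)->arg4 = (a4);			\
		(w)->arg5 = (a5);			\
		(w)->status = -1;			\
	} while (0)

/* Number of leading non-NULL argument slots, handy for sanity checks. */
static int model_work_nargs(const struct model_runtime_work *w)
{
	assert(w != NULL);

	const void *args[] = { w->arg1, w->arg2, w->arg3, w->arg4, w->arg5 };
	int n = 0;

	while (n < 5 && args[n] != NULL)
		n++;
	return n;
}
```

Using untyped void pointers keeps the record the same size for every
service, at the cost of losing type checking; the handler has to know,
per identifier, what each slot really points to.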

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/efi.c  | 15 
 drivers/firmware/efi/runtime-wrappers.c | 61 +
 include/linux/efi.h | 20 +++
 3 files changed, 96 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 838b8efe639c..04b46c62f3ce 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -75,6 +75,8 @@ static unsigned long *efi_tables[] = {
_attr_table,
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -329,6 +331,19 @@ static int __init efisubsys_init(void)
 		return 0;
 
 	/*
+	 * Since we process only one efi_runtime_service() at a time, an
+	 * ordered workqueue (which creates only one execution context)
+	 * should suffice all our needs.
+	 */
+	efi_rts_wq = alloc_ordered_workqueue("efi_rts_workqueue", 0);
+	if (!efi_rts_wq) {
+		pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+		       "disabled.\n");
+		clear_bit(EFI_RUNTIME_SERVICES, );
+		return 0;
+	}
+
+	/*
 	 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
 	 * it should be invoked only after efi_rts_workqueue is ready.
 	 */
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..649763171439 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_workqueue.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,57 @@
 #define __efi_call_virt(f, args...) \
 	__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* efi_runtime_service() function identifiers */
+enum {
+	GET_TIME,
+	SET_TIME,
+	GET_WAKEUP_TIME,
+	SET_WAKEUP_TIME,
+	GET_VARIABLE,
+	GET_NEXT_VARIABLE,
+	SET_VARIABLE,
+	SET_VARIABLE_NONBLOCKING,
+	QUERY_VARIABLE_INFO,
+	QUERY_VARIABLE_INFO_NONBLOCKING,
+	GET_NEXT_HIGH_MONO_COUNT,
+	RESET_SYSTEM,
+	UPDATE_CAPSULE,
+	QUERY_CAPSULE_CAPS,
+};
+
+/*
+ * efi_queue_work: Queue efi_runtime_service() and wait until it's done
+ * @rts:   efi_runtime_service() function identifier
+ * @rts_arg<1-5>:  efi_runtime_service() function arguments
+ *
+ * Accesses to efi_runtime_services() are serialized by a binary
+ * semaphore (efi_runtime_lock) and caller waits until the work is
+ * finished, hence _only_ one work is queued at a time and the queued
+ * work gets flushed.
+ */
+#define efi_queue_work(

[PATCH V2 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-03-05 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use a kthread to run
efi_runtime_service(). When a user/kernel thread requests execution of
efi_runtime_service(), the kernel off-loads this work to a kthread, which
in turn uses efi_pgd. Anything that tries to touch user space addresses
while in a kthread is terminally broken. This patch adds support to the
efi subsystem to handle all calls to efi_runtime_services() using a work
queue (which in turn uses a kthread).

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is
passed.

Introduce a handler function (called efi_call_rts()) that
a. understands efi_runtime_work and
b. invokes the appropriate efi_runtime_service() with the
appropriate arguments

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
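
The packing and unpacking rules above can be shown with a small
user-space sketch. Everything here is hypothetical and only illustrates
the convention: model_fw_set_flags() and model_fw_get_flags() stand in
for firmware services, model_call_rts() stands in for the handler, and
the handler is called directly instead of through a workqueue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_rts_id { MODEL_SET_FLAGS, MODEL_GET_FLAGS };

struct model_runtime_work {
	enum model_rts_id id;
	void *arg1;
	long status;
};

/* Stand-ins for firmware services (not real EFI calls). */
static uint32_t model_fw_flags;

static long model_fw_set_flags(uint32_t flags)	/* takes a value */
{
	model_fw_flags = flags;
	return 0;
}

static long model_fw_get_flags(uint32_t *out)	/* takes a pointer */
{
	*out = model_fw_flags;
	return 0;
}

/* Handler: undo the packing according to the service identifier. */
static void model_call_rts(struct model_runtime_work *w)
{
	switch (w->id) {
	case MODEL_SET_FLAGS:
		/* Value argument: recast to uint32_t * and dereference. */
		w->status = model_fw_set_flags(*(uint32_t *)w->arg1);
		break;
	case MODEL_GET_FLAGS:
		/* Pointer argument: recast to its original pointer type. */
		w->status = model_fw_get_flags((uint32_t *)w->arg1);
		break;
	default:
		w->status = -1;
		break;
	}
}

/* Caller side: a value is packed by passing its address. */
static long model_set_flags(uint32_t flags)
{
	struct model_runtime_work w = { .id = MODEL_SET_FLAGS, .arg1 = &flags };

	model_call_rts(&w);
	return w.status;
}

/* Caller side: a pointer is packed as is. */
static long model_get_flags(uint32_t *out)
{
	struct model_runtime_work w = { .id = MODEL_GET_FLAGS, .arg1 = out };

	assert(out != NULL);
	model_call_rts(&w);
	return w.status;
}
```

Passing the address of a by-value argument is only safe because the
caller waits for the handler to finish; otherwise the pointer would
refer to a stack slot that no longer exists.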

pstore writes could potentially be invoked in interrupt context, and
pstore uses set_variable<>() and query_variable_info<>() to store logs.
If we invoke efi_runtime_services() through efi_rts_wq while in atomic
context, the kernel issues a warning ("scheduling while atomic") and
prints a stack trace. One way to overcome this is to not make the caller
wait for the worker thread to finish, but this approach breaks pstore,
i.e. the log messages aren't written to efi variables. Hence, pstore
calls efi_runtime_services() without using efi_rts_wq; in other words,
efi_rts_wq will be used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().
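
One way a path that must not sleep could bypass the queue is sketched
below in user space. This is an assumption made for illustration, not
something stated in this patch: the try-lock on a binary semaphore, the
MODEL_NOT_READY status and all model_* names are hypothetical.

```c
#include <assert.h>
#include <semaphore.h>

#define MODEL_SUCCESS   0L
#define MODEL_NOT_READY 1L	/* hypothetical "busy, try again" status */

/* Binary semaphore serializing access to the (simulated) firmware. */
static sem_t model_runtime_lock;
static int model_store;

static void model_runtime_init(void)
{
	int ret = sem_init(&model_runtime_lock, 0, 1);

	assert(ret == 0);
	(void)ret;
}

/* Stand-in for a firmware call that stores a value. */
static long model_fw_set(int value)
{
	model_store = value;
	return MODEL_SUCCESS;
}

/*
 * Non-sleeping variant: never waits for the lock and never queues work.
 * If another call is in flight, report "not ready" instead of blocking.
 */
static long model_set_nonblocking(int value)
{
	long status;

	if (sem_trywait(&model_runtime_lock) != 0)
		return MODEL_NOT_READY;

	status = model_fw_set(value);	/* called directly, no workqueue */
	sem_post(&model_runtime_lock);
	return status;
}
```

The trade-off in this sketch is that a caller in atomic context may have
its request rejected when the lock is held, but it is never put to sleep.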

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 168 
 1 file changed, 148 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 649763171439..eff443bf942c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -151,13 +151,105 @@ void efi_call_virt_check_flags(unsigned long flags, const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+	struct efi_runtime_work *efi_rts_work;
+	void *arg1, *arg2, *arg3, *arg4, *arg5;
+	efi_status_t status = EFI_NOT_FOUND;
+
+	efi_rts_work = container_of(work, struct efi_runtime_work, work);
+	arg1 = efi_rts_work->arg1;
+	arg2 = efi_rts_work->arg2;
+	arg3 = efi_rts_work->arg3;
+	arg4 = efi_rts_work->arg4;
+	arg5 = efi_rts_work->arg5;
+
+	switch (efi_rts_work->func) {
+	case GET_TIME:
+		status = efi_call_virt(get_time, (efi_time_t *)arg1,
+				       (efi_time_cap_t *)arg2);
+		break;
+	case SET_TIME:
+		status = efi_call_virt(set_time, (efi_time_t *)arg1);
+		break;
+	case GET_WAKEUP_TIME:
+		status = efi_call_virt(get_

b/drivers/firmware/efi/runtime-wrappers.c
index 649763171439..eff443bf942c 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -151,13 +151,105 @@ void efi_call_virt_check_flags(unsigned long flags, 
const char *call)
  */
 static DEFINE_SEMAPHORE(efi_runtime_lock);
 
+/*
+ * Calls the appropriate efi_runtime_service() with the appropriate
+ * arguments.
+ *
+ * Semantics followed by efi_call_rts() to understand efi_runtime_work:
+ * 1. If argument was a pointer, recast it from void pointer to original
+ * pointer type.
+ * 2. If argument was a value, recast it from void pointer to original
+ * pointer type and dereference it.
+ */
+static void efi_call_rts(struct work_struct *work)
+{
+   struct efi_runtime_work *efi_rts_work;
+   void *arg1, *arg2, *arg3, *arg4, *arg5;
+   efi_status_t status = EFI_NOT_FOUND;
+
+   efi_rts_work = container_of(work, struct efi_runtime_work, work);
+   arg1 = efi_rts_work->arg1;
+   arg2 = efi_rts_work->arg2;
+   arg3 = efi_rts_work->arg3;
+   arg4 = efi_rts_work->arg4;
+   arg5 = efi_rts_work->arg5;
+
+   switch (efi_rts_work->func) {
+   case GET_TIME:
+   status = efi_call_virt(get_time, (efi_time_t *)arg1,
+  (efi_time_cap_t *)arg2);
+   break;
+   case SET_TIME:
+   status = efi_call_virt(set_time, (efi_time_t *)arg1);
+   break;
+   case GET_WAKEUP_TIME:
+   status = efi_call_virt(get_wakeup_time, (efi_bool_t *)arg1,
+  (efi_bool_t *)arg2, (efi_time_t *)arg3);
+   break;
+   case SET_WAKEUP_TIME:
+   status = efi_call_virt(set_wakeup_time, *(efi_bool_t *)arg1,
+  (efi_time_t *)arg2);
+   break;
+   case GET_VARIABLE:
+   status = efi_call_virt(get_variable, (efi_char16_t *)arg1,
+ 

[PATCH V1 1/3] x86/efi: Call efi_delete_dummy_variable() during efi subsystem initialization

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Invoking efi_runtime_services() through efi_rts_wq means that every call
to efi_runtime_services() must happen after efi_rts_wq has been created.
efi_delete_dummy_variable() calls set_variable(), hence it must also be
called after efi_rts_wq has been created.

efi_delete_dummy_variable() is currently called from
efi_enter_virtual_mode(), which runs early in the boot phase (efi_rts_wq
isn't created yet). So, call efi_delete_dummy_variable() later in the
boot phase, i.e. while initializing the efi subsystem. In the next patch,
this is the place where we create efi_rts_wq, and all the
efi_runtime_services() will then be called using efi_rts_wq.
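
For illustration only, the ordering constraint described above can be
modelled in plain userspace C. This is a rough sketch and not kernel
code: fake_wq, queue_rts_call(), delete_dummy_variable() and
subsys_init() are hypothetical names standing in for efi_rts_wq, a
queued runtime service call, efi_delete_dummy_variable() and
efisubsys_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the workqueue; it only counts queued calls. */
struct fake_wq {
	int queued;
};

static struct fake_wq wq_storage;
static struct fake_wq *rts_wq;	/* NULL until the subsystem is initialized */
static bool dummy_deleted;

/* Every runtime service call is routed through the queue. */
static bool queue_rts_call(void)
{
	if (!rts_wq)
		return false;	/* too early: nothing can service the call */
	rts_wq->queued++;
	return true;
}

/* Stand-in for a cleanup step that needs one runtime service call. */
static void delete_dummy_variable(void)
{
	dummy_deleted = queue_rts_call();
}

/* Stand-in for subsystem init: create the queue, then use it. */
static void subsys_init(void)
{
	assert(rts_wq == NULL);		/* this model initializes only once */
	rts_wq = &wq_storage;		/* the queue exists from here on */
	delete_dummy_variable();	/* so this call can now be serviced */
}
```

In this model, a call issued before subsys_init() fails because the
queue pointer is still NULL, which is why the cleanup step is issued
from the init function rather than from an earlier boot step.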

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 arch/x86/include/asm/efi.h  | 1 -
 arch/x86/platform/efi/efi.c | 6 --
 drivers/firmware/efi/efi.c  | 7 +++
 include/linux/efi.h | 3 +++
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..34b03440a80f 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -130,7 +130,6 @@ extern void __init efi_runtime_update_mappings(void);
 extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
-extern void efi_delete_dummy_variable(void);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9061babfbc83..a3169d14583f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -893,9 +893,6 @@ static void __init kexec_enter_virtual_mode(void)
 
if (efi_enabled(EFI_OLD_MEMMAP) && (__supported_pte_mask & _PAGE_NX))
runtime_code_page_mkexec();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 #endif
 }
 
@@ -1015,9 +1012,6 @@ static void __init __efi_enter_virtual_mode(void)
 * necessary relocation fixups for the new virtual addresses.
 */
efi_runtime_update_mappings();
-
-   /* clean DUMMY object */
-   efi_delete_dummy_variable();
 }
 
 void __init efi_enter_virtual_mode(void)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index cd42f66a7c85..ac5db5f8dbbf 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -33,6 +33,7 @@
 #include 
 
 #include 
+#include 
 
 struct efi __read_mostly efi = {
.mps= EFI_INVALID_TABLE_ADDR,
@@ -328,6 +329,12 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /*
+* Clean DUMMY object calls EFI Runtime Service, set_variable(), so
+* it should be invoked only after efi_rts_workqueue is ready.
+*/
+   efi_delete_dummy_variable();
+
/* We register the efi directory at /sys/firmware/efi */
efi_kobj = kobject_create_and_add("efi", firmware_kobj);
if (!efi_kobj) {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index f5083aa72eae..c4efb3ef0dfa 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -992,6 +992,7 @@ extern efi_status_t efi_query_variable_store(u32 attributes,
 unsigned long size,
 bool nonblocking);
 extern void efi_find_mirror(void);
+extern void efi_delete_dummy_variable(void);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
@@ -1002,6 +1003,8 @@ static inline efi_status_t efi_query_variable_store(u32 attributes,
 {
return EFI_SUCCESS;
 }
+
+static inline void efi_delete_dummy_variable(void) {}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 
-- 
2.1.4



[PATCH V1 3/3] efi: Use efi_rts_workqueue to invoke EFI Runtime Services

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch adds support to efi
subsystem to handle all calls to efi_runtime_services() using a work
queue (which in turn uses kthread).

Implementation summary:
---
1. When user/kernel thread requests to execute efi_runtime_service(),
enqueue work to efi_rts_workqueue.
2. Caller thread waits until the work is finished because it's dependent
on the return status of efi_runtime_service().

pstore writes could potentially be invoked in interrupt context, and
pstore uses set_variable<>() and query_variable_info<>() to store logs.
If we invoke efi_runtime_services() through efi_rts_wq while in atomic
context, the kernel issues a warning ("scheduling while atomic") and
prints a stack trace. One way to overcome this is to not make the caller
process wait for the worker thread to finish, but this approach breaks
pstore, i.e. the log messages aren't written to efi variables. Hence,
pstore calls efi_runtime_services() without using efi_rts_wq. In other
words, efi_rts_wq is used unconditionally for all the
efi_runtime_services() except set_variable<>() and
query_variable_info<>().

Semantics to pack arguments in efi_runtime_work (has void pointers):
1. If argument is a pointer (of any type), pass it as is.
2. If argument is a value (of any type), address of the value is
passed.
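
As a rough illustration of the packing rules above (a userspace sketch,
not the code in this patch): rts_args, wake_time and
fake_set_wakeup_time() are hypothetical stand-ins for efi_runtime_work,
efi_time_t and a runtime service that takes one value and one pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for efi_runtime_work: only two argument slots. */
struct rts_args {
	void *arg1;
	void *arg2;
};

/* Hypothetical stand-in for efi_time_t. */
struct wake_time {
	int hour;
	int minute;
};

/* Stand-in for a service taking one value and one pointer. */
static int fake_set_wakeup_time(bool enabled, struct wake_time *tm)
{
	return enabled ? tm->hour * 60 + tm->minute : -1;
}

/* Caller side: the value goes in by address, the pointer goes in as is. */
static struct rts_args pack_set_wakeup_time(bool *enabled, struct wake_time *tm)
{
	struct rts_args a = { .arg1 = enabled, .arg2 = tm };

	return a;
}

/* Handler side: cast back; dereference only what was packed as a value. */
static int unpack_and_call(const struct rts_args *a)
{
	assert(a && a->arg1 && a->arg2);
	return fake_set_wakeup_time(*(bool *)a->arg1,
				    (struct wake_time *)a->arg2);
}

/* Round trip: pack on the caller side, unpack on the handler side. */
static int demo_round_trip(bool enabled)
{
	struct wake_time tm = { .hour = 6, .minute = 30 };
	struct rts_args a = pack_set_wakeup_time(&enabled, &tm);

	return unpack_and_call(&a);
}
```

Note that the packed value must stay alive until the handler has run.
In the sketch that holds because everything happens in one function; in
the patch it holds because the caller waits for the work to finish.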

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/runtime-wrappers.c | 86 +
 1 file changed, 66 insertions(+), 20 deletions(-)

diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index 5cdb787da5d3..531d077aac70 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -68,6 +68,16 @@
  * semaphore (efi_runtime_lock) and caller waits until the work is
  * finished, hence _only_ one work is queued at a time. So, queue_work()
  * should never fail.
+ *
+ * efi_rts_workqueue to run efi_runtime_services() shouldn't be used
+ * while in atomic, because caller thread might sleep. pstore writes
+ * could potentially be invoked in interrupt context and it uses
+ * set_variable<>() and query_variable_info<>(), so pstore code doesn't
+ * use efi_rts_workqueue.
+ *
+ * Semantics that caller function should follow while passing arguments:
+ * 1. If argument is a pointer (of any type), pass it as is.
+ * 2. If argument is a value (of any type), address of the value is passed.
  */
 #define efi_queue_work(_rts, _arg1, _arg2, _arg3, _arg4, _arg5) \
 ({ \
@@ -150,7 +160,7 @@ static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(get_time, tm, tc);
+   status = efi_queue_work(GET_TIME, tm, tc, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
 }
@@ -161,7 +171,7 @@ static efi_status_t virt_efi_set_time(efi_time_t *tm)
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(set_time, tm);
+   status = efi_queue_work(SET_TIME, tm, NULL, NULL, NULL, NULL);
up(&efi_runtime_lock);
return status;
 }
@@ -174,7 +184,8 @@ static efi_status_t virt_efi_get_wakeup_time(efi_bool_t *enabled,
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(get_wakeup_time, enabled, pending, tm);
+   status = efi_queue_work(GET_WAKEUP_TIME, enabled, pending, tm, NULL,
+   NULL);
up(&efi_runtime_lock);
return status;
 }
@@ -185,7 +196,8 @@ static efi_status_t virt_efi_set_wakeup_time(efi_bool_t enabled, efi_time_t *tm)
 
if (down_interruptible(&efi_runtime_lock))
return EFI_ABORTED;
-   status = efi_call_virt(set_wakeup_time, enabled, tm);
+   status = efi_queue_work(SET_WAKEUP_TIME, &enabled, tm, NULL, NULL,
+

[PATCH V1 2/3] efi: Introduce efi_rts_workqueue and necessary infrastructure to invoke all efi_runtime_services()

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

When a process requests the kernel to execute any efi_runtime_service(),
the requested efi_runtime_service (represented as an identifier) and its
arguments are packed into a struct named efi_runtime_work and queued
onto work queue named efi_rts_wq. The caller then waits until the work
is completed.

Introduce necessary infrastructure:
1. Creating workqueue named efi_rts_wq
2. A macro (efi_queue_work()) that
a. populates efi_runtime_work
b. queues work onto efi_rts_wq and
c. waits until worker thread returns
3. A handler function that
a. understands efi_runtime_work and
b. invokes the appropriate efi_runtime_service() with the
appropriate arguments

The caller thread has to wait until the worker thread returns, because
it's dependent on the return status of efi_runtime_service() and, in
specific cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services(), for instance efi_get_variable() and
efi_get_next_variable(), take a pointer to a buffer as an argument and
fill up the buffer with the requested data. Hence, the caller process
cannot just post the work and get going.

Some facts about efi_runtime_services():
1. A quick look at all the efi_runtime_services() shows that any
efi_runtime_service() has five or fewer arguments.
2. An argument of efi_runtime_service() can be a value (of any type)
or a pointer (of any type).
Hence, efi_runtime_work has five void pointers to store these arguments.

Semantics followed by efi_call_rts() to understand efi_runtime_work:
1. If argument was a pointer, recast it from void pointer to original
pointer type.
2. If argument was a value, recast it from void pointer to original
pointer type and dereference it.
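
To make the handler semantics above concrete, here is a small userspace
sketch (an illustration under stated assumptions, not the handler from
this patch). The identifiers rts_id, rts_work, fw_get_counter() and
fw_set_counter() are hypothetical; they mimic an identifier-driven
handler that restores each argument's type before calling a service.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical service identifiers, one unique number per service. */
enum rts_id { RTS_GET_COUNTER, RTS_SET_COUNTER, RTS_UNKNOWN };

#define RTS_OK		0
#define RTS_NOT_FOUND	(-1)

/* Hypothetical work item: identifier, untyped argument slots, status. */
struct rts_work {
	enum rts_id func;
	void *arg1;
	void *arg2;
	int status;
};

static int counter;	/* state owned by the pretend "firmware" */

static int fw_get_counter(int *out)
{
	*out = counter;
	return RTS_OK;
}

static int fw_set_counter(int value)
{
	counter = value;
	return RTS_OK;
}

/* Handler: restore each argument's type according to the identifier. */
static void rts_handler(struct rts_work *w)
{
	assert(w != NULL);
	w->status = RTS_NOT_FOUND;	/* default for unknown identifiers */

	switch (w->func) {
	case RTS_GET_COUNTER:
		/* arg1 was a pointer: cast it back and pass it through */
		w->status = fw_get_counter((int *)w->arg1);
		break;
	case RTS_SET_COUNTER:
		/* arg1 was a value packed by address: cast and dereference */
		w->status = fw_set_counter(*(int *)w->arg1);
		break;
	default:
		break;
	}
}

/* Queue-free demo: run a "set" work item, then a "get" work item. */
static int demo_set_then_get(int value)
{
	int out = 0;
	struct rts_work set = { .func = RTS_SET_COUNTER, .arg1 = &value };
	struct rts_work get = { .func = RTS_GET_COUNTER, .arg1 = &out };

	rts_handler(&set);
	rts_handler(&get);
	return (set.status == RTS_OK && get.status == RTS_OK) ?
		out : RTS_NOT_FOUND;
}

/* An identifier the handler does not know keeps the default status. */
static int demo_unknown(void)
{
	struct rts_work w = { .func = RTS_UNKNOWN };

	rts_handler(&w);
	return w.status;
}
```

Starting from a "not found" status means an unrecognized identifier is
reported to the caller instead of silently succeeding.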

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>
---
 drivers/firmware/efi/efi.c  |  11 +++
 drivers/firmware/efi/runtime-wrappers.c | 143 
 include/linux/efi.h |  23 +
 3 files changed, 177 insertions(+)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index ac5db5f8dbbf..4714b305ca90 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -76,6 +76,8 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct workqueue_struct *efi_rts_wq;
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
@@ -329,6 +331,15 @@ static int __init efisubsys_init(void)
if (!efi_enabled(EFI_BOOT))
return 0;
 
+   /* Create a work queue to run EFI Runtime Services */
+   efi_rts_wq = create_workqueue("efi_rts_workqueue");
+   if (!efi_rts_wq) {
+   pr_err("Failed to create efi_rts_workqueue, EFI runtime services "
+  "disabled.\n");
+   clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+   return 0;
+   }
+
/*
 * Clean DUMMY object calls EFI Runtime Service, set_variable(), so
 * it should be invoked only after efi_rts_workqueue is ready.
diff --git a/drivers/firmware/efi/runtime-wrappers.c b/drivers/firmware/efi/runtime-wrappers.c
index ae54870b2788..5cdb787da5d3 100644
--- a/drivers/firmware/efi/runtime-wrappers.c
+++ b/drivers/firmware/efi/runtime-wrappers.c
@@ -1,6 +1,14 @@
 /*
  * runtime-wrappers.c - Runtime Services function call wrappers
  *
+ * Implementation summary:
+ * ---
+ * 1. When user/kernel thread requests to execute efi_runtime_service(),
+ * enqueue work to efi_rts_workqueue.
+ * 2. Caller thread waits until the work is finished because it's
+ * dependent on the return status and execution of efi_runtime_service().
+ * For instance, get_variable() and get_next_variable().
+ *
  * Copyright (C) 2014 Linaro Ltd. <ard.biesheu...@linaro.org>
  *
  * Split off from arch/x86/platform/efi/efi.c
@@ -22,6 +30,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 
 /*
@@ -33,6 +43,50 @@
 #define __efi_call_virt(f, args...) \
__efi_call_virt_pointer(efi.systab->runtime, f, args)
 
+/* Each EFI Runtime Service is represented with a unique number */
+#define GET_TIME   0
+#define SET_TIME   1
+#define GET_WAKEUP_TIME   2
+#define SET_WAKEUP_TIME   3
+#define GET_VARIABLE   4
+#define GET_NEXT_VARIABLE  5
+#define SET_VARIABLE   6
+#define SET_VARIABLE_NONBLOCKING   7
+#define QUERY_VARIABLE_INFO   8
+#define QUERY_VARIABLE_INFO_NONBLOCKING   9

[PATCH V1 0/3] Use efi_rts_workqueue to invoke EFI Runtime Services

2018-02-24 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

This patch set is an outcome of the discussion at
https://lkml.org/lkml/2017/8/21/607

Presently, efi_runtime_services() are executed by firmware in process
context. To execute efi_runtime_service(), kernel switches the page
directory from swapper_pgd to efi_pgd. However, efi_pgd doesn't have any
user space mappings. A potential issue could be, for instance, an NMI
interrupt (like perf) trying to profile some user data while in efi_pgd.

A solution for this issue could be to use kthread to run
efi_runtime_service(). When a user/kernel thread requests to execute
efi_runtime_service(), kernel off-loads this work to kthread which in
turn uses efi_pgd. Anything that tries to touch user space addresses
while in kthread is terminally broken. This patch set adds support to
the efi subsystem to handle all calls to efi_runtime_services() using a
work queue (which in turn uses kthread).

Implementation summary:
---
1. When a user/kernel thread requests to execute efi_runtime_service(),
enqueue work to a work queue, efi_rts_workqueue.
2. The caller thread waits until the work is finished because it's
dependent on the return status of efi_runtime_service() and, in specific
cases, the arguments populated by efi_runtime_service(). Some
efi_runtime_services(), for instance efi_get_variable() and
efi_get_next_variable(), take a pointer to a buffer as an argument and
fill up the buffer with the requested data. Hence, the caller process
cannot just post the work and get going; it has to wait for results from
firmware.

Caveat: efi_rts_workqueue shouldn't be used to run efi_runtime_services()
while in atomic context, because the caller thread might sleep. Presently,
pstore code doesn't use efi_rts_workqueue.

Tested using LUV (Linux UEFI Validation) for x86_64 and x86_32. Builds
fine for arm and arm64. I would appreciate it if someone could test
the patches on ARM (although I was able to boot with LUV for ARM).
LUV: https://01.org/linux-uefi-validation

Thanks to Ricardo and Dan for initial reviews and suggestions. Please
feel free to pour in your comments and concerns.
Note: Patches are based on Linus's kernel v4.16-rc2

Sai Praneeth (3):
  x86/efi: Call efi_delete_dummy_variable() during efi subsystem
initialization
  efi: Introduce efi_rts_workqueue and necessary infrastructure to
invoke all efi_runtime_services()
  efi: Use efi_rts_workqueue to invoke EFI Runtime Services

 arch/x86/include/asm/efi.h  |   1 -
 arch/x86/platform/efi/efi.c |   6 -
 drivers/firmware/efi/efi.c  |  18 +++
 drivers/firmware/efi/runtime-wrappers.c | 229 +---
 include/linux/efi.h |  26 
 5 files changed, 253 insertions(+), 27 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Suggested-by: Andy Lutomirski <l...@kernel.org>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Peter Zijlstra <peter.zijls...@intel.com>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Dan Williams <dan.j.willi...@intel.com>

-- 
2.1.4
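
As a rough illustration of the implementation summary in the cover letter
above, here is a user-space sketch of the "queue the work, then wait for the
result" pattern. It is not the kernel code: it uses POSIX threads instead of a
kernel work queue, it starts one worker thread per call rather than keeping a
persistent queue, and every name in it (rts_work, fake_runtime_service(),
rts_call_and_wait()) is made up for the example.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one queued runtime-service request. */
struct rts_work {
    int service_id;          /* which service to run (illustrative only) */
    int arg;                 /* input argument */
    int status;              /* result, filled in by the worker */
    bool done;               /* completion flag, protected by lock */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

/* Stand-in for the firmware call; a real service would talk to firmware. */
static int fake_runtime_service(int service_id, int arg)
{
    return service_id * 100 + arg;
}

/* Worker: runs the service, stores the result, signals completion. */
static void *rts_worker(void *data)
{
    struct rts_work *work = data;
    int status = fake_runtime_service(work->service_id, work->arg);

    pthread_mutex_lock(&work->lock);
    work->status = status;
    work->done = true;
    pthread_cond_signal(&work->cond);
    pthread_mutex_unlock(&work->lock);
    return NULL;
}

/*
 * Caller side: hand the request to the worker and sleep until it is done.
 * Because this sleeps, it must not be used where sleeping is forbidden.
 * Returns the service status, or -1 if the worker could not be started.
 */
static int rts_call_and_wait(int service_id, int arg)
{
    struct rts_work work = { .service_id = service_id, .arg = arg };
    pthread_t tid;
    int status = -1;

    pthread_mutex_init(&work.lock, NULL);
    pthread_cond_init(&work.cond, NULL);

    if (pthread_create(&tid, NULL, rts_worker, &work) == 0) {
        pthread_mutex_lock(&work.lock);
        while (!work.done)
            pthread_cond_wait(&work.cond, &work.lock);
        pthread_mutex_unlock(&work.lock);
        pthread_join(tid, NULL);
        status = work.status;
    }

    pthread_cond_destroy(&work.cond);
    pthread_mutex_destroy(&work.lock);
    return status;
}
```

A caller would simply write "status = rts_call_and_wait(id, arg);". The
blocking wait is the reason for the caveat about atomic context.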



[PATCH V4 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use helper function (efi_switch_mm()) to switch to/from efi_mm. We
switch to efi_mm before calling
1. efi_set_virtual_address_map() and
2. Invoking any efi_runtime_service()

Likewise, we need to switch back to the previous mm (the mm context stolen
by efi_mm) after the above calls return successfully. The efi_switch_mm()
helper can be used only on x86_64 kernels with "efi=old_map" disabled,
because x86_32 and efi=old_map don't use efi_pgd; they use swapper_pg_dir
instead.
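
The save/switch/restore idea described above can be modelled in plain C. This
is only a toy model under stated assumptions: mm_model, active_mm, scratch and
the function names are invented for the sketch, and the real helper also has to
deal with locking and the actual page-table switch, which are left out here.

```c
#include <stddef.h>

/* Stand-in for struct mm_struct; only an identity is needed here. */
struct mm_model {
    const char *name;
};

static struct mm_model efi_mm_model  = { "efi_mm" };
static struct mm_model user_mm_model = { "user" };

/* Models the currently active mm of the running task. */
static struct mm_model *active_mm = &user_mm_model;

/* Models the scratch area that remembers the stolen mm. */
static struct {
    struct mm_model *prev_mm;
} scratch;

/* Remember the mm being left, then make 'mm' the active one. */
static void switch_mm_model(struct mm_model *mm)
{
    scratch.prev_mm = active_mm;
    active_mm = mm;
}

/* Run 'service' with the EFI mm active, then restore the previous mm. */
static int call_with_efi_mm(int (*service)(void))
{
    int status;

    switch_mm_model(&efi_mm_model);
    status = service();
    switch_mm_model(scratch.prev_mm);
    return status;
}

/* Example service: succeeds only if it runs with the EFI mm active. */
static int sample_service(void)
{
    return active_mm == &efi_mm_model ? 0 : -1;
}
```

Calling call_with_efi_mm(sample_service) leaves active_mm pointing at the
original mm again, which is the property the setup/teardown pair relies on.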

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h   | 25 +-
 arch/x86/platform/efi/efi_64.c   | 40 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm:    store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct    *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -78,11 +77,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = __read_cr3();\
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm);                             \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -90,10 +86,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c93f59731608..d6892ad2a693 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)__read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
}
 
early_code_mapping_set_exec(1);
@@ -156,8 +155,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
pud_t *pud;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   write_cr3((unsigned long)save_pgd);
-   __flush_tlb_all();
+   efi_switch_mm(efi_scratch.prev_mm);
return;
}
 
@@ -346,13 +344,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)

[PATCH V4 1/3] efi: Use efi_mm in x86 as well as ARM

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage efi page tables and efi
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Especially for x86,
using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h | 4 
 arch/x86/platform/efi/efi_64.c | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c | 9 +
 include/linux/efi.h| 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 2dd15e967c3f..c9f8e6924df7 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -232,6 +232,9 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   mm_init_cpumask(&efi_mm);
+   init_new_context(NULL, &efi_mm);
+
return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-   .mm_rb  = RB_ROOT,
-   .mm_users   = ATOMIC_INIT(2),
-   .mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-   .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include 
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
_attr_table,
 };
 
+struct mm_struct efi_mm = {
+   .mm_rb  = RB_ROOT,
+   .mm_users   = ATOMIC_INIT(2),
+   .mm_count   = ATOMIC_INIT(1),
+   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .page_table_lock = __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 29fdf8029cf6..d79f1cc4c8bb 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -930,6 +930,8 @@ extern struct efi {
unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
-- 
2.1.4
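
The patch above moves one object definition into common code and exposes it
through an extern declaration. A small stand-alone sketch of that pattern
follows, including an initializer that refers to the object being defined, as
the .mmlist line does. The struct and variable names (mm_demo, demo_mm) are
invented for the example; only the shape of LIST_HEAD_INIT follows the usual
kernel list idiom.

```c
#include <stddef.h>

/* Minimal doubly linked list head in the style of the kernel's list API. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Cut-down stand-in for struct mm_struct. */
struct mm_demo {
    int mm_users;
    int mm_count;
    struct list_head mmlist;
};

/* What the shared header would carry: a declaration, no storage. */
extern struct mm_demo demo_mm;

/*
 * The single definition, placed in common code. The initializer may take
 * the address of demo_mm.mmlist because the object has static storage.
 */
struct mm_demo demo_mm = {
    .mm_users = 2,
    .mm_count = 1,
    .mmlist   = LIST_HEAD_INIT(demo_mm.mmlist),
};

/* True when the list contains no entries besides the head itself. */
static int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}
```

Dropping "static" from the definition is what lets other files see the
object; the extern line in the header is what lets them name it.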



[PATCH V4 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove global variable efi_pgd.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/platform/efi/efi_64.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index c9f8e6924df7..c93f59731608 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -191,8 +191,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
 }
 
-static pgd_t *efi_pgd;
-
 /*
  * We need our own copy of the higher levels of the page tables
  * because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -204,7 +202,7 @@ static pgd_t *efi_pgd;
  */
 int __init efi_alloc_page_tables(void)
 {
-   pgd_t *pgd;
+   pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -232,6 +230,7 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   efi_mm.pgd = efi_pgd;
 mm_init_cpumask(&efi_mm);
 init_new_context(NULL, &efi_mm);
 
@@ -247,6 +246,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+   pgd_t *efi_pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -340,7 +340,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
-   pgd_t *pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
@@ -350,8 +350,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * this value is loaded into cr3 the PGD will be decrypted during
 * the pagetable walk.
 */
-   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
-   pgd = efi_pgd;
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd);
 
/*
 * It can happen that the physical address of new_memmap lands in memory
@@ -421,7 +420,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -525,7 +524,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
 static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
 {
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
int err1, err2;
 
/* Update the 1:1 mapping */
@@ -622,7 +621,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
-   ptdump_walk_pgd_level(NULL, efi_pgd);
+   ptdump_walk_pgd_level(NULL, efi_mm.pgd);
 #endif
 }
 
-- 
2.1.4



[PATCH V4 0/3] Use mm_struct and switch_mm() instead of manually

2018-01-18 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, in x86, to invoke any efi function like
efi_set_virtual_address_map() or any efi_runtime_service() the code path
typically involves read_cr3() (save previous pgd), write_cr3()
(write efi_pgd) and calling efi function. Likewise after returning from
efi function the code path typically involves read_cr3() (save efi_pgd),
write_cr3() (write previous pgd). We do this a couple of times in the efi
subsystem of the Linux kernel; instead, we can use the helper function
efi_switch_mm() to do this. This improves readability and maintainability.
Also, instead of maintaining a separate struct "efi_scratch" to store/restore
efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
initialized, therefore initialize efi_mm in efi_alloc_page_tables().

Changes in V4:
1. Remove the unintended removal of local_irq_restore(flags) (in 3rd patch).
IRQ flags should be restored after switching to original mm.

Note:
This patch set is based on Linus's tree v4.15-rc8

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h   | 29 +-
 arch/x86/platform/efi/efi_64.c   | 58 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c   |  9 ++
 include/linux/efi.h  |  2 ++
 6 files changed, 57 insertions(+), 52 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>

-- 
2.1.4
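
The V3 change in the cover letter above (initialize efi_mm before anything sets
a CPU bit in its mask) can be pictured with a toy model. The assumption made
here, which may not match the real structure layout, is that the mask is
reached through a pointer that stays NULL until an init helper points it at
storage. All names ending in _demo are invented for the sketch, and the setter
returns false where the real code would dereference a bad pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_DEMO 64

/* One bit per CPU. */
struct cpumask_demo {
    uint64_t bits;
};

/* Cut-down mm with a mask that is reached through a pointer. */
struct mm_cpumask_demo {
    struct cpumask_demo *mask_var;     /* NULL until initialised */
    struct cpumask_demo mask_storage;  /* backing storage for the mask */
};

/* Point the mask pointer at its storage and clear all bits. */
static void mm_init_cpumask_demo(struct mm_cpumask_demo *mm)
{
    mm->mask_var = &mm->mask_storage;
    mm->mask_var->bits = 0;
}

/*
 * Mark 'cpu' as using this mm. Returns false for an uninitialised mm or an
 * out-of-range CPU number instead of touching invalid memory.
 */
static bool cpumask_set_cpu_demo(unsigned int cpu, struct mm_cpumask_demo *mm)
{
    if (mm->mask_var == NULL || cpu >= NR_CPUS_DEMO)
        return false;
    mm->mask_var->bits |= (uint64_t)1 << cpu;
    return true;
}
```

In this model, setting a bit on a statically defined, never-initialised mm
fails, while the same call after mm_init_cpumask_demo() succeeds.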



[PATCH 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove global variable efi_pgd.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/platform/efi/efi_64.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index ccf5239923e8..6b541bdbda5f 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -189,8 +189,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
 }
 
-static pgd_t *efi_pgd;
-
 /*
  * We need our own copy of the higher levels of the page tables
  * because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -199,7 +197,7 @@ static pgd_t *efi_pgd;
  */
 int __init efi_alloc_page_tables(void)
 {
-   pgd_t *pgd;
+   pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -227,6 +225,7 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   efi_mm.pgd = efi_pgd;
mm_init_cpumask(&efi_mm);
init_new_context(NULL, &efi_mm);
 
@@ -242,6 +241,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+   pgd_t *efi_pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -335,7 +335,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
unsigned long pfn, text, pf;
struct page *page;
unsigned npages;
-   pgd_t *pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
@@ -345,8 +345,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * this value is loaded into cr3 the PGD will be decrypted during
 * the pagetable walk.
 */
-   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
-   pgd = efi_pgd;
+   efi_scratch.efi_pgt = (pgd_t *)__sme_pa(pgd);
 
/*
 * It can happen that the physical address of new_memmap lands in memory
@@ -416,7 +415,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -520,7 +519,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
 static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
 {
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
int err1, err2;
 
/* Update the 1:1 mapping */
@@ -617,7 +616,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
-   ptdump_walk_pgd_level(NULL, efi_pgd);
+   ptdump_walk_pgd_level(NULL, efi_mm.pgd);
 #endif
 }
 
-- 
2.1.4



[PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use the helper function efi_switch_mm() to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map(), and
2. invoking any efi_runtime_service().

Likewise, we need to switch back to the previous mm (the mm context
stolen by efi_mm) after the above calls return successfully. We can use
the efi_switch_mm() helper only on x86_64 kernels with "efi=old_map"
disabled, because x86_32 and efi=old_map don't use efi_pgd; they use
swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Bhupesh Sharma <bhsha...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h   | 25 +-
 arch/x86/platform/efi/efi_64.c   | 41 ++--
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 00f977ddd718..cda9940bed7a 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -62,14 +62,13 @@ extern asmlinkage u64 efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm:    store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -78,11 +77,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = __read_cr3();\
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm);             \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -90,10 +86,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -135,6 +129,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6b541bdbda5f..c325b1cc4d1a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -82,9 +82,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)__read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
}
 
early_code_mapping_set_exec(1);
@@ -154,8 +153,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
   

pud_t *pud;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   write_cr3((unsigned long)save_pgd);
-   __flush_tlb_all();
+   efi_switch_mm(efi_scratch.prev_mm);
return;
}
 
@@ -341,13 +339,6 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
  

[PATCH 1/3] efi: Use efi_mm in x86 as well as ARM

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage EFI page tables and EFI
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Specifically, for x86,
using this data structure improves code maintainability and readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>
---
 arch/x86/include/asm/efi.h         | 4 ++++
 arch/x86/platform/efi/efi_64.c     | 3 +++
 drivers/firmware/efi/arm-runtime.c | 9 ---------
 drivers/firmware/efi/efi.c         | 9 +++++++++
 include/linux/efi.h                | 2 ++
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 85f6ccb80b91..00f977ddd718 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -2,10 +2,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 6a151ce70e86..ccf5239923e8 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -227,6 +227,9 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   mm_init_cpumask(&efi_mm);
+   init_new_context(NULL, &efi_mm);
+
return 0;
 }
 
diff --git a/drivers/firmware/efi/arm-runtime.c 
b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-   .mm_rb  = RB_ROOT,
-   .mm_users   = ATOMIC_INIT(2),
-   .mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include 
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 557a47829d03..760260b933b6 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -74,6 +74,15 @@ static unsigned long *efi_tables[] = {
&efi.mem_attr_table,
 };
 
+struct mm_struct efi_mm = {
+   .mm_rb  = RB_ROOT,
+   .mm_users   = ATOMIC_INIT(2),
+   .mm_count   = ATOMIC_INIT(1),
+   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index d813f7b04da7..6745f4dbbcc1 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -928,6 +928,8 @@ extern struct efi {
unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
-- 
2.1.4



[PATCH 0/3] Use mm_struct and switch_mm() instead of manually

2017-12-16 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, in x86, invoking any EFI function such as
efi_set_virtual_address_map() or any efi_runtime_service() typically
involves read_cr3() (save the previous pgd), write_cr3() (write efi_pgd)
and then calling the EFI function. Likewise, after returning from the
EFI function, the code path typically involves read_cr3() (save efi_pgd)
and write_cr3() (write the previous pgd). We do this in several places in
the EFI subsystem of the Linux kernel; instead, we can use a helper
function, efi_switch_mm(), to do this. This improves readability and
maintainability. Also, instead of maintaining a separate struct
"efi_scratch" to store/restore efi_pgd, we can use mm_struct to do this.
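
To make the save/switch/restore pattern described above concrete, here is
a minimal user-space sketch. It is not the kernel implementation: the
struct mm_struct below is a hypothetical stand-in, mock_cr3 and active_mm
only imitate CPU and task state, and the body of efi_switch_mm() is an
assumption, since the helper's definition does not appear in this part of
the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel type; not the real definition. */
struct mm_struct {
	unsigned long pgd;	/* pretend page-table root */
};

static struct mm_struct efi_mm  = { .pgd = 0xEF1000UL };
static struct mm_struct task_mm = { .pgd = 0x1000UL };

/* Mock of "the mm currently in use" and of the CR3 register. */
static struct mm_struct *active_mm = &task_mm;
static unsigned long mock_cr3 = 0x1000UL;

/* Mirrors the prev_mm field the series adds to struct efi_scratch. */
static struct {
	struct mm_struct *prev_mm;
} efi_scratch;

/* Assumed behaviour: remember the outgoing mm, then activate @mm. */
static void efi_switch_mm(struct mm_struct *mm)
{
	efi_scratch.prev_mm = active_mm;
	active_mm = mm;
	mock_cr3 = mm->pgd;	/* stands in for the real page-table switch */
}

/* Pretend firmware call: reports which page tables were live. */
static unsigned long mock_runtime_service(void)
{
	return mock_cr3;
}

/* One EFI call, bracketed by a switch to efi_mm and a switch back. */
static unsigned long call_efi_service(void)
{
	unsigned long seen;

	efi_switch_mm(&efi_mm);
	seen = mock_runtime_service();
	efi_switch_mm(efi_scratch.prev_mm);
	return seen;
}
```

In this model call_efi_service() runs the mock service on efi_mm's page
tables and leaves the caller's mm active again afterwards, with no
open-coded read/write of CR3 at the call site.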

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.

Changes in V3:
1. When CPUMASK_OFFSTACK is enabled, switch_mm_irqs_off() sets cpumask
by calling cpumask_set_cpu(). This panics kernel as efi_mm is not
initialized, therefore initialize efi_mm in efi_alloc_page_tables().
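
The V3 failure mode can be modelled in user space as follows. This is a
sketch under stated assumptions, not kernel code: struct mock_mm,
mock_mm_init_cpumask() and mock_cpumask_set_cpu() are hypothetical names,
and the model assumes that with off-stack cpumasks the mm only carries a
pointer to its mask, which stays NULL in a statically defined mm until it
is explicitly initialised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_CPUS 64
#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical model of an mm whose cpumask is reached via a pointer. */
struct mock_mm {
	unsigned long *cpumask;		/* NULL until initialised */
	unsigned long storage[MOCK_NR_CPUS / BITS_PER_WORD + 1];
};

/* Point the mask at its backing storage and clear it. */
static void mock_mm_init_cpumask(struct mock_mm *mm)
{
	size_t i;

	for (i = 0; i < sizeof(mm->storage) / sizeof(mm->storage[0]); i++)
		mm->storage[i] = 0;
	mm->cpumask = mm->storage;
}

/*
 * Set @cpu in the mask. Returns false where this model would otherwise
 * dereference a NULL mask pointer (the case that panics in the kernel).
 */
static bool mock_cpumask_set_cpu(struct mock_mm *mm, unsigned int cpu)
{
	if (!mm->cpumask || cpu >= MOCK_NR_CPUS)
		return false;
	mm->cpumask[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
	return true;
}
```

Setting a CPU bit on a zero-initialised struct mock_mm fails until
mock_mm_init_cpumask() has run, which is the ordering the V3 change
enforces by initialising efi_mm in efi_alloc_page_tables().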

Note:
This patch set is based on Linus's tree v4.15-rc3

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h   | 29 +-
 arch/x86/platform/efi/efi_64.c   | 59 +++-
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 --
 drivers/firmware/efi/efi.c   |  9 ++
 include/linux/efi.h  |  2 ++
 6 files changed, 57 insertions(+), 53 deletions(-)

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
Tested-by: Bhupesh Sharma <bhsha...@redhat.com>

-- 
2.1.4



Re: [PATCH v4 00/10] PCID and improved laziness

2017-09-12 Thread Sai Praneeth Prakhya
> 
> 
> Hi Andy,
> 
> I have booted Linus's tree (8fac2f96ab86b0e14ec4e42851e21e9b518bdc55) on
> Skylake server and noticed that it reboots automatically.
> 
> When I booted the same kernel with command line arg "nopcid" it works
> fine. Please find below a snippet of dmesg. Please let me know if you
> need more info to debug.
> 
> [0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0+
> root=UUID=3b8e9636-6e23-4785-a4e2-5954bfe86fd9 ro console=tty0
> console=ttyS0,115200n8
> [0.00] log_buf_len individual max cpu contribution: 4096 bytes
> [0.00] log_buf_len total cpu_extra contributions: 258048 bytes
> [0.00] log_buf_len min size: 262144 bytes
> [0.00] log_buf_len: 524288 bytes
> [0.00] early log buf free: 212560(81%)
> [0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [0.00] [ cut here ]
> [0.00] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:245
> initialize_tlbstate_and_flush+0x6c/0xf0
> [0.00] Modules linked in:
> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.13.0+ #5
> [0.00] task: 8960f480 task.stack: 8960
> [0.00] RIP: 0010:initialize_tlbstate_and_flush+0x6c/0xf0
> [0.00] RSP: :89603e60 EFLAGS: 00010046
> [0.00] RAX: 000406b0 RBX: 9f1700a17880 RCX:
> 8965de60
> [0.00] RDX: 008383a0a000 RSI: 0960a000 RDI:
> 008383a0a000
> [0.00] RBP: 89603e60 R08:  R09:
> 
> [0.00] R10: 89603ee8 R11:  R12:
> 
> [0.00] R13: 9f1700a0c3e0 R14: 8960f480 R15:
> 
> [0.00] FS:  () GS:9f1700a0()
> knlGS:
> [0.00] CS:  0010 DS:  ES:  CR0: 80050033
> [0.00] CR2: 9fa7b000 CR3: 008383a0a000 CR4:
> 000406b0
> [0.00] Call Trace:
> [0.00]  cpu_init+0x206/0x4f0
> [0.00]  ? __set_pte_vaddr+0x1d/0x30
> [0.00]  trap_init+0x3e/0x50
> [0.00]  ? trap_init+0x3e/0x50
> [0.00]  start_kernel+0x1e2/0x3f2
> [0.00]  x86_64_start_reservations+0x24/0x26
> [0.00]  x86_64_start_kernel+0x6f/0x72
> [0.00]  secondary_startup_64+0xa5/0xa5
> [0.00] Code: de 00 48 01 f0 48 39 c7 0f 85 92 00 00 00 48 8b 05
> ee e2 ee 00 a9 00 00 02 00 74 11 65 48 8b 05 8b 9d 7c 77 a9 00 00 02 00
> 75 02 <0f> ff 48 81 e2 00 f0 ff ff 0f 22 da 65 66 c7 05 66 9d 7c 77 00 
> [0.00] ---[ end trace c258f2d278fe031f ]---
> [0.00] Memory: 791050356K/803934656K available (9585K kernel
> code, 1313K rwdata, 3000K rodata, 1176K init, 680K bss, 12884300K
> reserved, 0K cma-reserved)
> [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=64,
> Nodes=4
> [0.00] Hierarchical RCU implementation.
> [0.00]RCU event tracing is enabled.
> [0.00] NR_IRQS: 4352, nr_irqs: 3928, preallocated irqs: 16
> [0.00] Console: colour dummy device 80x25
> [0.00] console [tty0] enabled
> [0.00] console [ttyS0] enabled
> [0.00] clocksource: hpet: mask: 0x max_cycles:
> 0x, max_idle_ns: 79635855245 ns
> [0.001000] tsc: Detected 2000.000 MHz processor
> [0.002000] Calibrating delay loop (skipped), value calculated using
> timer frequency.. 4000.00 BogoMIPS (lpj=200)
> [0.003003] pid_max: default: 65536 minimum: 512
> [0.004030] ACPI: Core revision 20170728
> [0.091853] ACPI: 6 ACPI AML tables successfully acquired and loaded
> [0.094143] Security Framework initialized
> [0.095004] SELinux:  Initializing.
> [0.145612] Dentry cache hash table entries: 33554432 (order: 16,
> 268435456 bytes)
> [0.170544] Inode-cache hash table entries: 16777216 (order: 15,
> 134217728 bytes)
> [0.172699] Mount-cache hash table entries: 524288 (order: 10,
> 4194304 bytes)
> [0.174441] Mountpoint-cache hash table entries: 524288 (order: 10,
> 4194304 bytes)
> [0.176351] CPU: Physical Processor ID: 0
> [0.177003] CPU: Processor Core ID: 0
> [0.178007] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> [0.179003] ENERGY_PERF_BIAS: View and update with
> x86_energy_perf_policy(8)
> [0.180013] mce: CPU supports 20 MCE banks
> [0.181018] CPU0: Thermal monitoring enabled (TM1)
> [0.182057] process: using mwait in idle threads
> [0.183005] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
> [0.184003] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
> [0.185223] Freeing SMP alternatives memory: 36K
> [0.193912] smpboot: Max logical packages: 8
> [0.194017] Switched APIC routing to physical flat.
> [0.196496] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [0.206252] smpboot: CPU0: Intel(R) Xeon(R) Platinum 8164 CPU @
> 2.00GHz (family: 0x6, model: 0x55, stepping: 0x4)
> [   

Re: [PATCH v4 00/10] PCID and improved laziness

2017-09-12 Thread Sai Praneeth Prakhya
> From: Andy Lutomirski 
> Date: Thu, Jun 29, 2017 at 8:53 AM
> Subject: [PATCH v4 00/10] PCID and improved laziness
> To: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org, Borislav Petkov, Linus Torvalds,
> Andrew Morton, Mel Gorman, "linux...@kvack.org", Nadav Amit,
> Rik van Riel, Dave Hansen, Arjan van de Ven, Peter Zijlstra,
> Andy Lutomirski
> 
> *** Ingo, even if this misses 4.13, please apply the first patch before
> *** the merge window.
> 
> There are three performance benefits here:
> 
> 1. TLB flushing is slow.  (I.e. the flush itself takes a while.)
>    This avoids many of them when switching tasks by using PCID.  In
>    a stupid little benchmark I did, it saves about 100ns on my laptop
>    per context switch.  I'll try to improve that benchmark.
> 
> 2. Mms that have been used recently on a given CPU might get to keep
>    their TLB entries alive across process switches with this patch
>    set.  TLB fills are pretty fast on modern CPUs, but they're even
>    faster when they don't happen.
> 
> 3. Lazy TLB is way better.  We used to do two stupid things when we
>    ran kernel threads: we'd send IPIs to flush user contexts on their
>    CPUs and then we'd write to CR3 for no particular reason as an
>    excuse to stop further IPIs.  With this patch, we do neither.
> 
> This will, in general, perform suboptimally if paravirt TLB flushing
> is in use (currently just Xen, I think, but Hyper-V is in the works).
> The code is structured so we could fix it in one of two ways: we
> could take a spinlock when touching the percpu state so we can update
> it remotely after a paravirt flush, or we could be more careful about
> exactly how we access the state and use cmpxchg16b to do atomic
> remote updates.  (On SMP systems without cmpxchg16b, we'd just skip
> the optimization entirely.)
> 
> This is still missing a final comment-only patch to add overall
> documentation for the whole thing, but I didn't want to block sending
> the maybe-hopefully-final code on that.
> 
> This is based on tip:x86/mm.  The branch is here if you want to play:
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/pcid
> 
> In general, performance seems to exceed my expectations.  Here are
> some performance numbers copy-and-pasted from the changelogs for
> "Rework lazy TLB mode and TLB freshness" and "Try to preserve old
> TLB entries using PCID":
> 
> 

Hi Andy,

I have booted Linus's tree (8fac2f96ab86b0e14ec4e42851e21e9b518bdc55) on
a Skylake server and noticed that it reboots automatically.

When I boot the same kernel with the command line argument "nopcid", it
boots fine. Please find a snippet of dmesg below, and let me know if you
need more info to debug.

[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0+
root=UUID=3b8e9636-6e23-4785-a4e2-5954bfe86fd9 ro console=tty0
console=ttyS0,115200n8
[0.00] log_buf_len individual max cpu contribution: 4096 bytes
[0.00] log_buf_len total cpu_extra contributions: 258048 bytes
[0.00] log_buf_len min size: 262144 bytes
[0.00] log_buf_len: 524288 bytes
[0.00] early log buf free: 212560(81%)
[0.00] PID hash table entries: 4096 (order: 3, 32768 bytes)
[0.00] ------------[ cut here ]------------
[0.00] WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:245
initialize_tlbstate_and_flush+0x6c/0xf0
[0.00] Modules linked in:
[0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 4.13.0+ #5
[0.00] task: 8960f480 task.stack: 8960
[0.00] RIP: 0010:initialize_tlbstate_and_flush+0x6c/0xf0
[0.00] RSP: :89603e60 EFLAGS: 00010046
[0.00] RAX: 000406b0 RBX: 9f1700a17880 RCX:
8965de60
[0.00] RDX: 008383a0a000 RSI: 0960a000 RDI:
008383a0a000
[0.00] RBP: 89603e60 R08:  R09:

[0.00] R10: 89603ee8 R11:  R12:

[0.00] R13: 9f1700a0c3e0 R14: 8960f480 R15:

[0.00] FS:  () GS:9f1700a0()
knlGS:
[0.00] CS:  0010 DS:  ES:  CR0: 80050033
[0.00] CR2: 9fa7b000 CR3: 008383a0a000 CR4:
000406b0
[0.00] Call Trace:
[0.00]  cpu_init+0x206/0x4f0
[0.00]  ? __set_pte_vaddr+0x1d/0x30
[0.00]  trap_init+0x3e/0x50
[0.00]  ? trap_init+0x3e/0x50
[0.00]  start_kernel+0x1e2/0x3f2
[0.00]  x86_64_start_reservations+0x24/0x26
[0.00]  x86_64_start_kernel+0x6f/0x72
[0.00]  secondary_startup_64+0xa5/0xa5
[0.00] Code: de 00 48 01 f0 48 39 c7 0f 85 92 00 00 00 48 8b 05
ee e2 ee 00 a9 00 00 02 00 74 11 65 48 8b 05 8b 9d 7c 77 a9 00 00 02 00
75 02 <0f> ff 48 81 e2 00 f0 ff ff 0f 22 da 65 66 c7 05 66 9d 7c 77 00 
[0.00] ---[ end trace c258f2d278fe031f ]---
[0.00] Memory: 

Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-05 Thread Sai Praneeth Prakhya
On Tue, 2017-09-05 at 19:21 -0700, Sai Praneeth Prakhya wrote:
> > I get a similar crash on Qemu with linus's master branch and the V2
> > applied on top of it. Here are the details of my test environment:
> > 
> > 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> > edk2.git/ovmf-x64
> > 
> > 2. I used linus's master branch (HEAD - commit:
> > b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> > of the same.
> > 
> > 3. I use the following qemu command line to launch the test:
> > 
> > # /usr/local/bin/qemu-system-x86_64 --version
> > QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> > Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> > 
> > # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap  -m
> > $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> > -vga std -boot c -cpu host -kernel $KERNEL -append
> > "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> > $INITRAMFS -bios $OVMF_FW_PATH
> > 
> > And here is the crash log:
> > 
> > [0.006054] general protection fault:  [#1] SMP
> > [0.006459] Modules linked in:
> > [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> > [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 0.0.0 02/06/2015
> > [0.007000] task: 81e0f480 task.stack: 81e0
> > [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> > [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> > [0.007000] RAX: 80007d084000 RBX:  RCX: 
> > 77ff8000
> > [0.007000] RDX: 7d084000 RSI: 8000 RDI: 
> > 00019a00
> > [0.007000] RBP: 81e03dc0 R08:  R09: 
> > 88007d085000
> > [0.007000] R10: 81e03dd8 R11: 7d095063 R12: 
> > 81e5c6a0
> > [0.007000] R13: 81ed4f40 R14: 0030 R15: 
> > 0001
> > [0.007000] FS:  () GS:88007d40()
> > knlGS:
> > [0.007000] CS:  0010 DS:  ES:  CR0: 80050033
> > [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4: 
> > 000406b0
> > [0.007000] Call Trace:
> > [0.007000]  switch_mm+0xd/0x20
> > [0.007000]  ? switch_mm+0xd/0x20
> > [0.007000]  efi_switch_mm+0x3e/0x4a
> > [0.007000]  efi_call_phys_prolog+0x28/0x1ac
> > [0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> > [0.007000]  start_kernel+0x332/0x3b8
> > [0.007000]  x86_64_start_reservations+0x2a/0x2c
> > [0.007000]  x86_64_start_kernel+0x178/0x18b
> > [0.007000]  secondary_startup_64+0xa5/0xa5
> > [0.007000]  ? secondary_startup_64+0xa5/0xa5
> > [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> > 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> > f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> > 7e 89
> > [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> > [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> > [0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> > the idle task!
> > 
> > 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> > firmware and 64-bit x86 kernel) with your patches, the primary kernel
> > boots fine on Qemu:
> > 
> > ovmf firmware used in this case - edk2.git/ovmf-ia32
> > 
> > 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> > case in point 3 above), I see the primary kernel boots fine on Qemu as
> > well.
> > 
> > Regards,
> > Bhupesh
> 
> Hi Bhupesh,
> 
> Thanks a lot for the detailed explanation. They are helpful to reproduce
> the issue quickly. From my initial debug, I think that AMD SME +
> efi_mm_struct patches + -cpu host (in qemu) are required to reproduce
> the issue on qemu.
> 
> I have tried the following combinations (all tests are on qemu):
> On Linus's tree:
> 1. With  SME and  efi_mm and  -cpu host -> panics
> 2. With  SME and  efi_mm and !-cpu host -> boots
> 3. With  SME and !efi_mm and  -cpu host -> boots
> 4. With  SME and !efi_mm and !-cpu host -> boots
> 5. With !SME and  efi_mm and  -cpu host -> boots
> 6. With !SME and  efi_mm and !-cpu host -> boots
> 7. With !SME and !efi_mm and  -cpu host -> boots
> 8. With !SME and 

Re: [PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-09-05 Thread Sai Praneeth Prakhya

> I get a similar crash on Qemu with linus's master branch and the V2
> applied on top of it. Here are the details of my test environment:
> 
> 1. I use the OVMF (EDK2) EFI firmware to launch the kernel:
> edk2.git/ovmf-x64
> 
> 2. I used linus's master branch (HEAD - commit:
> b1b6f83ac938d176742c85757960dec2cf10e468) and applied your v2 on top
> of the same.
> 
> 3. I use the following qemu command line to launch the test:
> 
> # /usr/local/bin/qemu-system-x86_64 --version
> QEMU emulator version 2.9.50 (v2.9.0-526-g76d20ea)
> Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
> 
> # /usr/local/bin/qemu-system-x86_64 -enable-kvm  -net nic -net tap  -m
> $MEMSIZE -nographic -drive file=$DISK_IMAGE,if=virtio,format=qcow2
> -vga std -boot c -cpu host -kernel $KERNEL -append
> "crashkernel=$CRASH_MEMSIZE console=ttyS0,115200n81"  -initrd
> $INITRAMFS -bios $OVMF_FW_PATH
> 
> And here is the crash log:
> 
> [0.006054] general protection fault:  [#1] SMP
> [0.006459] Modules linked in:
> [0.006711] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3
> [0.007000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 0.0.0 02/06/2015
> [0.007000] task: 81e0f480 task.stack: 81e0
> [0.007000] RIP: 0010:switch_mm_irqs_off+0x1bc/0x440
> [0.007000] RSP: :81e03d80 EFLAGS: 00010086
> [0.007000] RAX: 80007d084000 RBX:  RCX: 
> 77ff8000
> [0.007000] RDX: 7d084000 RSI: 8000 RDI: 
> 00019a00
> [0.007000] RBP: 81e03dc0 R08:  R09: 
> 88007d085000
> [0.007000] R10: 81e03dd8 R11: 7d095063 R12: 
> 81e5c6a0
> [0.007000] R13: 81ed4f40 R14: 0030 R15: 
> 0001
> [0.007000] FS:  () GS:88007d40()
> knlGS:
> [0.007000] CS:  0010 DS:  ES:  CR0: 80050033
> [0.007000] CR2: 88007d754000 CR3: 0220a000 CR4: 
> 000406b0
> [0.007000] Call Trace:
> [0.007000]  switch_mm+0xd/0x20
> [0.007000]  ? switch_mm+0xd/0x20
> [0.007000]  efi_switch_mm+0x3e/0x4a
> [0.007000]  efi_call_phys_prolog+0x28/0x1ac
> [0.007000]  efi_enter_virtual_mode+0x35a/0x48f
> [0.007000]  start_kernel+0x332/0x3b8
> [0.007000]  x86_64_start_reservations+0x2a/0x2c
> [0.007000]  x86_64_start_kernel+0x178/0x18b
> [0.007000]  secondary_startup_64+0xa5/0xa5
> [0.007000]  ? secondary_startup_64+0xa5/0xa5
> [0.007000] Code: 00 00 00 80 49 03 55 50 0f 82 7f 02 00 00 48 b9
> 00 00 00 80 ff 77 00 00 48 be 00 00 00 00 00 00 00 80 48 01 ca 48 09
> f0 48 09 d0 <0f> 22 d8 0f 1f 44 00 00 e9 47 ff ff ff 65 8b 05 b8 87 fb
> 7e 89
> [0.007000] RIP: switch_mm_irqs_off+0x1bc/0x440 RSP: 81e03d80
> [0.007000] ---[ end trace bfa55bf4e4765255 ]---
> [0.007000] Kernel panic - not syncing: Attempted to kill the idle task!
> [0.007000] ---[ end Kernel panic - not syncing: Attempted to kill
> the idle task!
> 
> 4. Note though that if I use the EFI_MIXED mode (i.e. 32-bit ovmf
> firmware and 64-bit x86 kernel) with your patches, the primary kernel
> boots fine on Qemu:
> 
> ovmf firmware used in this case - edk2.git/ovmf-ia32
> 
> 5. Also, if I append 'efi=old_map' to the bootargs (for the failing
> case in point 3 above), I see the primary kernel boots fine on Qemu as
> well.
> 
> Regards,
> Bhupesh

Hi Bhupesh,

Thanks a lot for the detailed explanation; it makes the issue quick to
reproduce. From my initial debugging, I think that AMD SME +
efi_mm_struct patches + -cpu host (in qemu) are all required to reproduce
the issue on qemu.

I have tried the following combinations (all tests are on qemu):
On Linus's tree:
1. With  SME and  efi_mm and  -cpu host -> panics
2. With  SME and  efi_mm and !-cpu host -> boots
3. With  SME and !efi_mm and  -cpu host -> boots
4. With  SME and !efi_mm and !-cpu host -> boots
5. With !SME and  efi_mm and  -cpu host -> boots
6. With !SME and  efi_mm and !-cpu host -> boots
7. With !SME and !efi_mm and  -cpu host -> boots
8. With !SME and !efi_mm and !-cpu host -> boots

On Matt's tree (no SME):
1. With  efi_mm and  -cpu host -> boots
2. With  efi_mm and !-cpu host -> boots
3. With !efi_mm and  -cpu host -> boots
4. With !efi_mm and !-cpu host -> boots

Summary:
On Matt's tree (next branch), I am unable to reproduce the issue because
it doesn't have the SME patches.

On Linus's tree, with SME patches
(b1b6f83ac938d176742c85757960dec2cf10e468) and my patches and -cpu host
switch enabled in qemu, I was able to reproduce the issue.

Could you please confirm whether you are seeing the same behavior,
especially on real machines (which I think are equivalent to -cpu host on
qemu)? In earlier mails you mentioned that you were able to reproduce
this on Matt's tree, but according to my theory that shouldn't be the
case because Matt's tree doesn't have the SME patches.
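
To make the pattern in the test matrix explicit, here is a small
user-space C sketch (not kernel code). The struct and function names are
made up for illustration, and the rule it encodes is only a hypothesis
read off the results listed above, not a confirmed root cause:

```c
#include <stdbool.h>
#include <stdio.h>

/* One row of the test matrix: which of the three ingredients is present. */
struct config {
    bool sme;       /* tree carries the AMD SME patches */
    bool efi_mm;    /* efi_mm/efi_switch_mm() patches applied */
    bool cpu_host;  /* qemu started with "-cpu host" */
};

/*
 * Hypothesis only: the boot panics when all three ingredients are
 * combined and boots otherwise.
 */
bool predicts_panic(struct config c)
{
    return c.sme && c.efi_mm && c.cpu_host;
}

/*
 * Walk the eight combinations in the same order as the list for Linus's
 * tree, print each row and return how many rows are predicted to panic.
 */
int count_panics(void)
{
    int panics = 0;

    for (int i = 0; i < 8; i++) {
        struct config c = {
            .sme      = !(i & 4),
            .efi_mm   = !(i & 2),
            .cpu_host = !(i & 1),
        };
        bool panic = predicts_panic(c);

        printf("%d. With %sSME and %sefi_mm and %s-cpu host -> %s\n",
               i + 1, c.sme ? " " : "!", c.efi_mm ? " " : "!",
               c.cpu_host ? " " : "!", panic ? "panics" : "boots");
        panics += panic;
    }
    return panics;
}
```

Calling count_panics() from a main() prints the matrix; under the
hypothesis, only the first row (all three present) is marked as a panic,
which matches the observed results.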

[PATCH V2 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Use helper function (efi_switch_mm()) to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service()

Likewise, we need to switch back to the previous mm (the mm context stolen
by efi_mm) after the above calls return successfully. We can use the
efi_switch_mm() helper function only on x86_64 kernels with
"efi=old_map" disabled because x86_32 and efi=old_map don't use
efi_pgd; they use swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
---
 arch/x86/include/asm/efi.h   | 29 ++---
 arch/x86/platform/efi/efi_64.c   | 36 +---
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2f77bcefe6b4..23b2137a95e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -1,10 +1,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -57,14 +61,13 @@ extern u64 asmlinkage efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm:    store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   bool    use_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct *prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -73,11 +76,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = read_cr3();  \
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm); \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -85,10 +85,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -130,6 +128,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 0bb98c35e178..e0545f56d703 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -80,9 +80,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
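
The save/switch/restore pattern described in the commit message can be
modelled in plain user-space C. This is only a sketch under assumptions:
struct mm_model, model_switch_mm(), model_efi_call() and the other names
are hypothetical stand-ins, and nothing here touches real page tables or
%cr3; it only tracks which mm is considered active.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct mm_struct; only identity matters here. */
struct mm_model {
    const char *name;
};

/* Hypothetical stand-ins for efi_mm, the active mm and efi_scratch.prev_mm. */
struct mm_model efi_mm_model = { .name = "efi_mm" };
struct mm_model *active_mm_model;   /* mm the "CPU" is currently using */
struct mm_model *scratch_prev_mm;   /* models efi_scratch.prev_mm */

/*
 * Model of the helper: remember the mm being replaced, then make @mm
 * active. A real helper would also have to switch page tables; this one
 * only tracks pointers.
 */
void model_switch_mm(struct mm_model *mm)
{
    scratch_prev_mm = active_mm_model;
    active_mm_model = mm;
}

/* Example "firmware call": reports whether it runs on the EFI mm. */
int model_fw_call(void)
{
    return active_mm_model == &efi_mm_model;
}

/* Wrap a firmware call the way the setup/teardown macros are described. */
int model_efi_call(int (*fw_call)(void))
{
    int ret;

    model_switch_mm(&efi_mm_model);     /* setup: steal the current mm */
    ret = fw_call();
    model_switch_mm(scratch_prev_mm);   /* teardown: hand it back */
    return ret;
}
```

With active_mm_model pointing at some "user" mm beforehand,
model_efi_call(model_fw_call) returns 1 and leaves active_mm_model
pointing at the same user mm again. The point of the design is that
callers never handle the page-table root themselves; they only name the
mm they want.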

[PATCH V2 1/3] efi: Use efi_mm in x86 as well as ARM

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Presently, only ARM uses mm_struct to manage EFI page tables and EFI
runtime region mappings. As this is the preferred approach, let's make
this data structure common across architectures. Especially for
x86, using this data structure improves code maintainability and
readability.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
---
 drivers/firmware/efi/arm-runtime.c | 9 -
 drivers/firmware/efi/efi.c | 9 +
 include/linux/efi.h| 2 ++
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3d6315..d6b26534812b 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -31,15 +31,6 @@
 
 extern u64 efi_system_table;
 
-static struct mm_struct efi_mm = {
-   .mm_rb  = RB_ROOT,
-   .mm_users   = ATOMIC_INIT(2),
-   .mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
-   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
-   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
-};
-
 #ifdef CONFIG_ARM64_PTDUMP_DEBUGFS
 #include 
 
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index b372aad3b449..3abbb25602bc 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -55,6 +55,15 @@ struct efi __read_mostly efi = {
 };
 EXPORT_SYMBOL(efi);
 
+struct mm_struct efi_mm = {
+   .mm_rb  = RB_ROOT,
+   .mm_users   = ATOMIC_INIT(2),
+   .mm_count   = ATOMIC_INIT(1),
+   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
+   .mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
+};
+
 static bool disable_runtime;
 static int __init setup_noefi(char *arg)
 {
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 8269bcb8ccf7..d1f261d2ce69 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -927,6 +927,8 @@ extern struct efi {
unsigned long flags;
 } efi;
 
+extern struct mm_struct efi_mm;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
-- 
2.1.4



[PATCH V2 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Use the helper function efi_switch_mm() to switch to/from efi_mm. We
switch to efi_mm before
1. calling efi_set_virtual_address_map() and
2. invoking any efi_runtime_service()

Likewise, we need to switch back to the previous mm (the mm context stolen
by efi_mm) after the above calls return successfully. We can use the
efi_switch_mm() helper only on an x86_64 kernel with "efi=old_map"
disabled, because x86_32 and efi=old_map don't use efi_pgd; they use
swapper_pg_dir instead.

Signed-off-by: Sai Praneeth Prakhya 
Cc: Lee, Chun-Yi 
Cc: Borislav Petkov 
Cc: Tony Luck 
Cc: Andy Lutomirski 
Cc: Michael S. Tsirkin 
Cc: Ricardo Neri 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: Ravi Shankar 
---
 arch/x86/include/asm/efi.h   | 29 ++---
 arch/x86/platform/efi/efi_64.c   | 36 +---
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 3 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 2f77bcefe6b4..23b2137a95e5 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -1,10 +1,14 @@
 #ifndef _ASM_X86_EFI_H
 #define _ASM_X86_EFI_H
 
+#include 
+#include 
+
 #include 
 #include 
 #include 
 #include 
+#include 
 
 /*
  * We map the EFI regions needed for runtime services non-contiguously,
@@ -57,14 +61,13 @@ extern u64 asmlinkage efi_call(void *fp, ...);
 #define efi_call_phys(f, args...)  efi_call((f), args)
 
 /*
- * Scratch space used for switching the pagetable in the EFI stub
+ * struct efi_scratch - Scratch space used while switching to/from efi_mm
+ * @phys_stack: stack used during EFI Mixed Mode
+ * @prev_mm:store/restore stolen mm_struct while switching to/from efi_mm
  */
 struct efi_scratch {
-   u64 r15;
-   u64 prev_cr3;
-   pgd_t   *efi_pgt;
-   booluse_pgd;
-   u64 phys_stack;
+   u64 phys_stack;
+   struct mm_struct*prev_mm;
 } __packed;
 
 #define arch_efi_call_virt_setup() \
@@ -73,11 +76,8 @@ struct efi_scratch {
preempt_disable();  \
__kernel_fpu_begin();   \
\
-   if (efi_scratch.use_pgd) {  \
-   efi_scratch.prev_cr3 = read_cr3();  \
-   write_cr3((unsigned long)efi_scratch.efi_pgt);  \
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(&efi_mm); \
 })
 
 #define arch_efi_call_virt(p, f, args...)  \
@@ -85,10 +85,8 @@ struct efi_scratch {
 
 #define arch_efi_call_virt_teardown()  \
 ({ \
-   if (efi_scratch.use_pgd) {  \
-   write_cr3(efi_scratch.prev_cr3);\
-   __flush_tlb_all();  \
-   }   \
+   if (!efi_enabled(EFI_OLD_MEMMAP))   \
+   efi_switch_mm(efi_scratch.prev_mm); \
\
__kernel_fpu_end(); \
preempt_enable();   \
@@ -130,6 +128,7 @@ extern void __init efi_dump_pagetable(void);
 extern void __init efi_apply_memmap_quirks(void);
 extern int __init efi_reuse_config(u64 tables, int nr_tables);
 extern void efi_delete_dummy_variable(void);
+extern void efi_switch_mm(struct mm_struct *mm);
 
 struct efi_setup_data {
u64 fw_vendor;
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 0bb98c35e178..e0545f56d703 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -80,9 +80,8 @@ pgd_t * __init efi_call_phys_prolog(void)
int n_pgds, i, j;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   save_pgd = (pgd_t *)read_cr3();
-   write_cr3((unsigned long)efi_scratch.efi_pgt);
-   goto out;
+   efi_switch_mm(&efi_mm);
+   return NULL;
}
 
early_code_mapping_set_exec(1);
@@ -152,8 +151,7 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
pud_t *pud;
 
if (!efi_enabled(EFI_OLD_MEMMAP)) {
-   write_cr3((unsigned long)save_pgd);
-   __flush_tlb_all();
+   efi_switch_mm(efi_scrat

[PATCH V2 0/3] Use mm_struct and switch_mm() instead of manually

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth 

Presently, in x86, invoking any efi function such as
efi_set_virtual_address_map() or any efi_runtime_service() typically
involves read_cr3() (save previous pgd), write_cr3() (write efi_pgd)
and then calling the efi function. Likewise, after returning from the
efi function, the code path typically involves read_cr3() (save efi_pgd)
and write_cr3() (write previous pgd). We do this a couple of times in the
efi subsystem of the Linux kernel; instead, we can use the helper function
efi_switch_mm() to do this. This improves readability and maintainability.
Also, instead of maintaining a separate struct "efi_scratch" to store/restore
efi_pgd, we can use mm_struct to do this.

I have tested this patch set against LUV (Linux UEFI Validation), so I
think I didn't break any existing configurations. I have tested this
patch set for
1. x86_64,
2. x86_32,
3. Mixed mode
with efi=old_map and for kexec kernel. Please let me know if I have
missed any other configurations.

Changes in V2:
1. Resolve mm_dropping() issue by not mm_dropping()/mm_grabbing() any mm,
as we are not losing/creating any references.

Sai Praneeth (3):
  efi: Use efi_mm in x86 as well as ARM
  x86/efi: Replace efi_pgd with efi_mm.pgd
  x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3

 arch/x86/include/asm/efi.h   | 29 ++--
 arch/x86/platform/efi/efi_64.c   | 52 
 arch/x86/platform/efi/efi_thunk_64.S |  2 +-
 drivers/firmware/efi/arm-runtime.c   |  9 ---
 drivers/firmware/efi/efi.c   |  9 +++
 include/linux/efi.h  |  2 ++
 6 files changed, 55 insertions(+), 48 deletions(-)

-- 
2.1.4
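
The save/switch/call/restore sequence described in this cover letter can be
sketched in plain user-space C. This is only a model under stated
assumptions, not the kernel implementation: every name ending in _model is
hypothetical, the "page table root" is just an integer, and the sketch
assumes the helper records the displaced mm in a prev_mm slot, as the
struct efi_scratch comment in patch 3/3 suggests.

```c
/* User-space model only: none of this is kernel code. */
#include <assert.h>
#include <stddef.h>

struct mm_model {		/* stand-in for struct mm_struct */
	unsigned long pgd;	/* stand-in for the page table root */
};

static struct mm_model efi_mm_model  = { .pgd = 0x2000 };
static struct mm_model task_mm_model = { .pgd = 0x1000 };

static struct mm_model *active_mm = &task_mm_model;	/* models the mm in use */
static struct mm_model *prev_mm;			/* models efi_scratch.prev_mm */
static unsigned long cr3_model = 0x1000;		/* models %cr3 */

/* Hypothetical helper playing the role of efi_switch_mm(). */
static void switch_mm_model(struct mm_model *mm)
{
	assert(mm != NULL);
	prev_mm = active_mm;	/* remember the mm being displaced */
	active_mm = mm;
	cr3_model = mm->pgd;	/* the real switch would load %cr3 here */
}

static unsigned long read_cr3_model(void)
{
	return cr3_model;
}

/* Run fn with the EFI model active, then restore the previous mm. */
static unsigned long call_with_efi_mm(unsigned long (*fn)(void))
{
	unsigned long ret;

	switch_mm_model(&efi_mm_model);
	ret = fn();
	switch_mm_model(prev_mm);
	return ret;
}
```

Calling call_with_efi_mm(read_cr3_model) observes the EFI root while the
callback runs, and the task's root is back in place once it returns. The
caller never touches the register model directly, which is the readability
gain the series is after.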



[PATCH V2 2/3] x86/efi: Replace efi_pgd with efi_mm.pgd

2017-08-28 Thread Sai Praneeth Prakhya
From: Sai Praneeth <sai.praneeth.prak...@intel.com>

Since the previous patch added support for efi_mm, let's handle efi_pgd
through efi_mm and remove the global variable efi_pgd.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prak...@intel.com>
Cc: Lee, Chun-Yi <j...@suse.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Tony Luck <tony.l...@intel.com>
Cc: Andy Lutomirski <l...@kernel.org>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Ricardo Neri <ricardo.n...@intel.com>
Cc: Matt Fleming <m...@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheu...@linaro.org>
Cc: Ravi Shankar <ravi.v.shan...@intel.com>
---
 arch/x86/platform/efi/efi_64.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 8ff1f95627f9..0bb98c35e178 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -187,8 +187,6 @@ void __init efi_call_phys_epilog(pgd_t *save_pgd)
early_code_mapping_set_exec(0);
 }
 
-static pgd_t *efi_pgd;
-
 /*
  * We need our own copy of the higher levels of the page tables
  * because we want to avoid inserting EFI region mappings (EFI_VA_END
@@ -197,7 +195,7 @@ static pgd_t *efi_pgd;
  */
 int __init efi_alloc_page_tables(void)
 {
-   pgd_t *pgd;
+   pgd_t *pgd, *efi_pgd;
p4d_t *p4d;
pud_t *pud;
gfp_t gfp_mask;
@@ -225,6 +223,8 @@ int __init efi_alloc_page_tables(void)
return -ENOMEM;
}
 
+   efi_mm.pgd = efi_pgd;
+
return 0;
 }
 
@@ -237,6 +237,7 @@ void efi_sync_low_kernel_mappings(void)
pgd_t *pgd_k, *pgd_efi;
p4d_t *p4d_k, *p4d_efi;
pud_t *pud_k, *pud_efi;
+   pgd_t *efi_pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return;
@@ -330,13 +331,12 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
unsigned long pfn, text;
struct page *page;
unsigned npages;
-   pgd_t *pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
 
-   efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
-   pgd = efi_pgd;
+   efi_scratch.efi_pgt = (pgd_t *)__pa(pgd);
 
/*
 * It can happen that the physical address of new_memmap lands in memory
@@ -400,7 +400,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
 {
unsigned long flags = _PAGE_RW;
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
 
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -501,7 +501,7 @@ void __init parse_efi_setup(u64 phys_addr, u32 data_len)
 static int __init efi_update_mappings(efi_memory_desc_t *md, unsigned long pf)
 {
unsigned long pfn;
-   pgd_t *pgd = efi_pgd;
+   pgd_t *pgd = efi_mm.pgd;
int err1, err2;
 
/* Update the 1:1 mapping */
@@ -592,7 +592,7 @@ void __init efi_dump_pagetable(void)
if (efi_enabled(EFI_OLD_MEMMAP))
ptdump_walk_pgd_level(NULL, swapper_pg_dir);
else
-   ptdump_walk_pgd_level(NULL, efi_pgd);
+   ptdump_walk_pgd_level(NULL, efi_mm.pgd);
 #endif
 }
 
-- 
2.1.4
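
The shape of this change (allocate the root locally, publish it through the
shared mm, and have later users read it from there instead of from a
file-local global) can be modelled in user-space C. This is a rough sketch
under assumptions: the _model names and the table size are hypothetical,
and calloc() merely stands in for the kernel's page allocation.

```c
/* User-space model only; names ending in _model are hypothetical. */
#include <assert.h>
#include <stdlib.h>

#define PTRS_PER_PGD_MODEL 512

struct mm_model {
	unsigned long *pgd;	/* stand-in for mm_struct.pgd */
};

/* Models the shared efi_mm that patch 1/3 made visible to x86 code. */
static struct mm_model efi_mm_model;

/* Models efi_alloc_page_tables(): allocate locally, publish via efi_mm. */
static int alloc_page_tables_model(void)
{
	unsigned long *efi_pgd = calloc(PTRS_PER_PGD_MODEL, sizeof(*efi_pgd));

	if (!efi_pgd)
		return -1;	/* the kernel code returns -ENOMEM here */

	efi_mm_model.pgd = efi_pgd;
	return 0;
}

/* Models a later user such as __map_region(): read the root from efi_mm. */
static void set_pgd_entry_model(size_t idx, unsigned long val)
{
	unsigned long *pgd = efi_mm_model.pgd;

	assert(pgd != NULL && idx < PTRS_PER_PGD_MODEL);
	pgd[idx] = val;
}
```

With the root stored in one shared structure, no function needs its own
global pointer; each one takes a local copy of efi_mm_model.pgd, mirroring
the "pgd_t *pgd = efi_mm.pgd;" lines in the diff above.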


