Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread lijiang


在 2018年09月27日 22:03, Bjorn Helgaas 写道:
> On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote:
>> 在 2018年09月25日 06:15, Bjorn Helgaas 写道:
>>> From: Bjorn Helgaas 
>>>
>>> Previously find_next_iomem_res() used "*res" as both an input parameter for
>>> the range to search and the type of resource to search for, and an output
>>> parameter for the resource we found, which makes the interface confusing
>>> and hard to use correctly.
>>>
>>> All callers allocate a single struct resource and use it for repeated calls
>>> to find_next_iomem_res().  When find_next_iomem_res() returns a resource,
>>> it overwrites the start, end, flags, and desc members of the struct.  If we
>>> call find_next_iomem_res() again, we must update or restore these fields.
>>>
>>> The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not
>>> restore res->flags, so if the caller is searching for flags of
>>> IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of
>>> IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will
>>> find only resources marked as IORESOURCE_SYSRAM.
>>>
>>> Fix this by restructuring the interface so it takes explicit "start, end,
>>> flags" parameters and uses "*res" only as an output parameter.
>>
>> Hi, Bjorn
>> I personally suggest that some comments might be added in the code, make it 
>> clear
>> and easy to understand, then which could avoid the old confusion and more 
>> code changes.
> 
> Since I think the current interface (using *res as both input and
> output parameters that have very different meanings) is confusing,
> it's hard for *me* to write comments that make it less confusing, but
> of course, you're welcome to propose something.
> 
> My opinion (probably not universally shared) is that my proposal would
> make the code more readable, and it's worth doing even though the diff
> is larger.
> 
> Anyway, I'll post these patches independently and see if anybody else
> has an opinion.
> 

I don't mind at all, because that is just my own opinion about more code
changes.

Anyway, thank you to improve this patch.

Regards,
Lianbo

> Bjorn
> 
>>> Original-patch: 
>>> http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
>>> Based-on-patch-by: Lianbo Jiang 
>>> Signed-off-by: Bjorn Helgaas 
>>> ---
>>>  kernel/resource.c |   94 
>>> +++--
>>>  1 file changed, 41 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/kernel/resource.c b/kernel/resource.c
>>> index 155ec873ea4d..9891ea90cc8f 100644
>>> --- a/kernel/resource.c
>>> +++ b/kernel/resource.c
>>> @@ -319,23 +319,26 @@ int release_resource(struct resource *old)
>>>  EXPORT_SYMBOL(release_resource);
>>>  
>>>  /*
>>> - * Finds the lowest iomem resource existing within [res->start..res->end].
>>> - * The caller must specify res->start, res->end, res->flags, and optionally
>>> - * desc.  If found, returns 0, res is overwritten, if not found, returns 
>>> -1.
>>> - * This function walks the whole tree and not just first level children 
>>> until
>>> - * and unless first_level_children_only is true.
>>> + * Finds the lowest iomem resource that covers part of [start..end].  The
>>> + * caller must specify start, end, flags, and desc (which may be
>>> + * IORES_DESC_NONE).
>>> + *
>>> + * If a resource is found, returns 0 and *res is overwritten with the part
>>> + * of the resource that's within [start..end]; if none is found, returns
>>> + * -1.
>>> + *
>>> + * This function walks the whole tree and not just first level children
>>> + * unless first_level_children_only is true.
>>>   */
>>> -static int find_next_iomem_res(struct resource *res, unsigned long desc,
>>> -  bool first_level_children_only)
>>> +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
>>> +  unsigned long flags, unsigned long desc,
>>> +  bool first_level_children_only,
>>> +  struct resource *res)
>>>  {
>>> -   resource_size_t start, end;
>>> struct resource *p;
>>> bool sibling_only = false;
>>>  
>>> BUG_ON(!res);
>>> -
>>> -   start = res->start;
>>> -   end = res->end;
>>> BUG_ON(start >= end);
>>>  
>>> if (first_level_children_only)
>>> @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
>>> unsigned long desc,
>>> read_lock(_lock);
>>>  
>>> for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
>>> -   if ((p->flags & res->flags) != res->flags)
>>> +   if ((p->flags & flags) != flags)
>>> continue;
>>> if ((desc != IORES_DESC_NONE) && (desc != p->desc))
>>> continue;
>>> @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
>>> unsigned long desc,
>>> read_unlock(_lock);
>>> if (!p)
>>> return -1;
>>> +
>>> /* copy data */
>>> -   if (res->start < p->start)
>>> -   

Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread lijiang
在 2018年09月28日 00:53, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote:
>> When SME is enabled in the first kernel, we will allocate unencrypted pages
>> for kdump in order to be able to boot the kdump kernel like kexec.
> 
> This is not what the commit does - it marks the control pages as
> decrypted when SME. Why doesn't the commit message state that and why is
> this being done?
> 
Ok, i will improve this patch log.

If SME is active, we need to mark the control pages as decrypted, because
when we boot to the kdump kernel, these pages won't be accessed encrypted
at the initial stage, in order to boot the kdump kernel in the same manner
as originally booted.

>> Signed-off-by: Lianbo Jiang 
>> Reviewed-by: Tom Lendacky 
>> ---
>>  kernel/kexec_core.c | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
>> index 23a83a4da38a..e7efcd1a977b 100644
>> --- a/kernel/kexec_core.c
>> +++ b/kernel/kexec_core.c
>> @@ -471,6 +471,16 @@ static struct page 
>> *kimage_alloc_crash_control_pages(struct kimage *image,
>>  }
>>  }
>>  
>> +if (pages) {
>> +/*
>> + * For kdump, we need to ensure that these pages are
>> + * unencrypted pages if SME is enabled.
> 
> Remember to always call unencrypted pages "decrypted" - this is the
> convention we agreed upon and it should keep the confusion level at
> minimum to others staring at this code.
> 
Ok, i will improve this comment.

>> + * By the way, it is unnecessary to call the arch_
>> + * kexec_pre_free_pages(), which will make the code
>> + * become more simple.
>> + */
> 
> This second sentence I don't understand...
> 

There are two functions that are usually called in pairs, they are:
arch_kexec_post_alloc_pages() and arch_kexec_pre_free_pages().

One marks the pages as decrypted, another one marks the pages as encrypted.

But for the crash control pages, no need to call arch_kexec_pre_free_pages(),
there are three reasons:
1. Crash pages are reserved in memblock, these pages are only used by kdump,
   no other people uses these pages;

2. Whenever crash pages are allocated, these pages are always marked as
   decrypted(when SME is active);

3. If we plan to call the arch_kexe_pre_free_pages(), we have to store these
pages to somewhere, which will have more code changes.

So, here it is redundant to call the arch_kexe_pre_free_pages().

Thanks.
Lianbo
>> +arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
>> +}
>>  return pages;
>>  }
>>  
>> @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage 
>> *image,
>>  result  = -ENOMEM;
>>  goto out;
>>  }
>> +arch_kexec_post_alloc_pages(page_address(page), 1, 0);
>>  ptr = kmap(page);
>>  ptr += maddr & ~PAGE_MASK;
>>  mchunk = min_t(size_t, mbytes,
>> @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage 
>> *image,
>>  result = copy_from_user(ptr, buf, uchunk);
>>  kexec_flush_icache_page(page);
>>  kunmap(page);
>> +arch_kexec_pre_free_pages(page_address(page), 1);
>>  if (result) {
>>  result = -EFAULT;
>>  goto out;
>> -- 
>> 2.17.1
>>
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月28日 00:10, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
>> If no need to break this line, it will cause a warning of exceeding 80 
>> characters per line.
> 
> That's fine - we don't take the 80 cols rule blindly but apply common
> sense. In this particular case the lines can stick out because they're
> simply externs and are meant to be skimmed over and not really read. :)
> 
Ok, i see. Thanks.

>> Thank you for pointing out this issue, i forgot to remove this header file. 
>>
>> I will get rid of this header file and post this patch again.
> 
> No need - already fixed that up.
> 
Great, thank you so much for your help.

Regards,
Lianbo
> Thx.
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 03:19:52PM +0800, Lianbo Jiang wrote:
> When SME is enabled in the first kernel, we will allocate unencrypted pages
> for kdump in order to be able to boot the kdump kernel like kexec.

This is not what the commit does - it marks the control pages as
decrypted when SME. Why doesn't the commit message state that and why is
this being done?

> Signed-off-by: Lianbo Jiang 
> Reviewed-by: Tom Lendacky 
> ---
>  kernel/kexec_core.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 23a83a4da38a..e7efcd1a977b 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -471,6 +471,16 @@ static struct page 
> *kimage_alloc_crash_control_pages(struct kimage *image,
>   }
>   }
>  
> + if (pages) {
> + /*
> +  * For kdump, we need to ensure that these pages are
> +  * unencrypted pages if SME is enabled.

Remember to always call unencrypted pages "decrypted" - this is the
convention we agreed upon and it should keep the confusion level at
minimum to others staring at this code.

> +  * By the way, it is unnecessary to call the arch_
> +  * kexec_pre_free_pages(), which will make the code
> +  * become more simple.
> +  */

This second sentence I don't understand...

> + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
> + }
>   return pages;
>  }
>  
> @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result  = -ENOMEM;
>   goto out;
>   }
> + arch_kexec_post_alloc_pages(page_address(page), 1, 0);
>   ptr = kmap(page);
>   ptr += maddr & ~PAGE_MASK;
>   mchunk = min_t(size_t, mbytes,
> @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
>   result = copy_from_user(ptr, buf, uchunk);
>   kexec_flush_icache_page(page);
>   kunmap(page);
> + arch_kexec_pre_free_pages(page_address(page), 1);
>   if (result) {
>   result = -EFAULT;
>   goto out;
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread Borislav Petkov
On Thu, Sep 27, 2018 at 10:53:47PM +0800, lijiang wrote:
> If no need to break this line, it will cause a warning of exceeding 80 
> characters per line.

That's fine - we don't take the 80 cols rule blindly but apply common
sense. In this particular case the lines can stick out because they're
simply externs and are meant to be skimmed over and not really read. :)

> Thank you for pointing out this issue, i forgot to remove this header file. 
> 
> I will get rid of this header file and post this patch again.

No need - already fixed that up.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread lijiang
在 2018年09月27日 21:17, Borislav Petkov 写道:
> On Thu, Sep 27, 2018 at 03:19:51PM +0800, Lianbo Jiang wrote:
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index 6de64840dd22..f8795f9581c7 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t 
>> offset, unsigned long size);
>>  #define ioremap_cache ioremap_cache
>>  extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long 
>> size, unsigned long prot_val);
>>  #define ioremap_prot ioremap_prot
>> +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
>> +unsigned long size);
> 
> No need to break this line - see how the other externs don't.
> 
Ok, i will fix it. 

If no need to break this line, it will cause a warning of exceeding 80 
characters per line.

>> +#define ioremap_encrypted ioremap_encrypted
>>  
>>  /**
>>   * ioremap -   map bus memory into CPU space
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c63a545ec199..e01e6c695add 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
> 
> What is that include for and why is it not up there with the  includes instead here with the  ones?
> 
Thank you for pointing out this issue, i forgot to remove this header file. 

I will get rid of this header file and post this patch again.

Regards,
Lianbo

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Previously find_next_iomem_res() used "*res" as both an input parameter for
the range to search and the type of resource to search for, and an output
parameter for the resource we found, which makes the interface confusing.

The current callers use find_next_iomem_res() incorrectly because they
allocate a single struct resource and use it for repeated calls to
find_next_iomem_res().  When find_next_iomem_res() returns a resource, it
overwrites the start, end, flags, and desc members of the struct.  If we
call find_next_iomem_res() again, we must update or restore these fields.
The previous code restored res.start and res.end, but not res.flags or
res.desc.

Since the callers did not restore res.flags, if they searched for flags
IORESOURCE_MEM | IORESOURCE_BUSY and found a resource with flags
IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search would
incorrectly skip resources unless they were also marked as
IORESOURCE_SYSRAM.

Fix this by restructuring the interface so it takes explicit "start, end,
flags" parameters and uses "*res" only as an output parameter.

Original-patch: 
http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
Based-on-patch-by: Lianbo Jiang 
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |   94 +++--
 1 file changed, 41 insertions(+), 53 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 155ec873ea4d..9891ea90cc8f 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,23 +319,26 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start..res->end].
- * The caller must specify res->start, res->end, res->flags, and optionally
- * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
- * This function walks the whole tree and not just first level children until
- * and unless first_level_children_only is true.
+ * Finds the lowest iomem resource that covers part of [start..end].  The
+ * caller must specify start, end, flags, and desc (which may be
+ * IORES_DESC_NONE).
+ *
+ * If a resource is found, returns 0 and *res is overwritten with the part
+ * of the resource that's within [start..end]; if none is found, returns
+ * -1.
+ *
+ * This function walks the whole tree and not just first level children
+ * unless first_level_children_only is true.
  */
-static int find_next_iomem_res(struct resource *res, unsigned long desc,
-  bool first_level_children_only)
+static int find_next_iomem_res(resource_size_t start, resource_size_t end,
+  unsigned long flags, unsigned long desc,
+  bool first_level_children_only,
+  struct resource *res)
 {
-   resource_size_t start, end;
struct resource *p;
bool sibling_only = false;
 
BUG_ON(!res);
-
-   start = res->start;
-   end = res->end;
BUG_ON(start >= end);
 
if (first_level_children_only)
@@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
read_lock(_lock);
 
for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
-   if ((p->flags & res->flags) != res->flags)
+   if ((p->flags & flags) != flags)
continue;
if ((desc != IORES_DESC_NONE) && (desc != p->desc))
continue;
@@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
read_unlock(_lock);
if (!p)
return -1;
+
/* copy data */
-   if (res->start < p->start)
-   res->start = p->start;
-   if (res->end > p->end)
-   res->end = p->end;
+   res->start = max(start, p->start);
+   res->end = min(end, p->end);
res->flags = p->flags;
res->desc = p->desc;
return 0;
 }
 
-static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
-bool first_level_children_only,
-void *arg,
+static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
+unsigned long flags, unsigned long desc,
+bool first_level_children_only, void *arg,
 int (*func)(struct resource *, void *))
 {
-   u64 orig_end = res->end;
+   struct resource res;
int ret = -1;
 
-   while ((res->start < res->end) &&
-  !find_next_iomem_res(res, desc, first_level_children_only)) {
-   ret = (*func)(res, arg);
+   while (start < end &&
+  !find_next_iomem_res(start, end, flags, desc,
+   first_level_children_only, )) {
+   ret = (*func)(, arg);

[PATCH 2/3] resource: Include resource end in walk_*() interfaces

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

find_next_iomem_res() finds an iomem resource that covers part of a range
described by "start, end".  All callers expect that range to be inclusive,
i.e., both start and end are included, but find_next_iomem_res() doesn't
handle the end address correctly.

If it finds an iomem resource that contains exactly the end address, it
skips it, e.g., if "start, end" is [0x0-0x1] and there happens to be an
iomem resource [mem 0x1-0x1] (the single byte at 0x1), we skip
it:

  find_next_iomem_res(...)
  {
start = 0x0;
end = 0x1;
for (p = next_resource(...)) {
  # p->start = 0x1;
  # p->end = 0x1;
  # we *should* return this resource, but this condition is false:
  if ((p->end >= start) && (p->start < end))
break;

Adjust find_next_iomem_res() so it allows a resource that includes the
single byte at the end of the range.  This is a corner case that we
probably don't see in practice.

Fixes: 58c1b5b07907 ("[PATCH] memory hotadd fixes: find_next_system_ram catch 
range fix")
Signed-off-by: Bjorn Helgaas 
---
 kernel/resource.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 30e1bc68503b..155ec873ea4d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -319,7 +319,7 @@ int release_resource(struct resource *old)
 EXPORT_SYMBOL(release_resource);
 
 /*
- * Finds the lowest iomem resource existing within [res->start.res->end).
+ * Finds the lowest iomem resource existing within [res->start..res->end].
  * The caller must specify res->start, res->end, res->flags, and optionally
  * desc.  If found, returns 0, res is overwritten, if not found, returns -1.
  * This function walks the whole tree and not just first level children until
@@ -352,7 +352,7 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
p = NULL;
break;
}
-   if ((p->end >= start) && (p->start < end))
+   if ((p->end >= start) && (p->start <= end))
break;
}
 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 0/3] find_next_iomem_res() fixes

2018-09-27 Thread Bjorn Helgaas
These fix:

  - A kexec off-by-one error that we probably never see in practice (only
affects a resource starting at exactly 640KB).

  - A corner case in walk_iomem_res_desc(), walk_system_ram_res(), etc that
we probably also never see in practice (only affects a single byte
resource at the very end of the region we're searching)

  - An issue in walk_iomem_res_desc() that apparently causes a kdump issue
(see Lianbo's note at [1]).

I think we need to fix the kdump issue either by these patches
(specifically the last one) or by the patch Lianbo posted [2].

I'm hoping to avoid deciding and merging these myself, but I'm not
sure who really wants to own kernel/resource.c.

[1] https://lore.kernel.org/lkml/01551d06-c421-5df3-b19f-fc66f3639...@redhat.com
[2] https://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com

---

Bjorn Helgaas (3):
  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
  resource: Include resource end in walk_*() interfaces
  resource: Fix find_next_iomem_res() iteration issue


 arch/x86/include/asm/kexec.h |2 -
 kernel/resource.c|   96 ++
 2 files changed, 43 insertions(+), 55 deletions(-)

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/3] x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error

2018-09-27 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The only use of KEXEC_BACKUP_SRC_END is as an argument to
walk_system_ram_res():

  int crash_load_segments(struct kimage *image)
  {
...
walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END,
image, determine_backup_region);

walk_system_ram_res() expects "start, end" arguments that are inclusive,
i.e., the range to be walked includes both the start and end addresses.

KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the
first address *past* the desired 0-640KB range.

Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC
region is [0-0x9], not [0-0xa].

Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Bjorn Helgaas 
---
 arch/x86/include/asm/kexec.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..5125fca472bb 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -67,7 +67,7 @@ struct kimage;
 
 /* Memory to backup during crash kdump */
 #define KEXEC_BACKUP_SRC_START (0UL)
-#define KEXEC_BACKUP_SRC_END   (640 * 1024UL)  /* 640K */
+#define KEXEC_BACKUP_SRC_END   (640 * 1024UL - 1)  /* 640K */
 
 /*
  * CPU does not save ss and sp on stack if execution is already


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 3/3] resource: Fix find_next_iomem_res() iteration issue

2018-09-27 Thread Bjorn Helgaas
On Thu, Sep 27, 2018 at 01:27:41PM +0800, lijiang wrote:
> 在 2018年09月25日 06:15, Bjorn Helgaas 写道:
> > From: Bjorn Helgaas 
> > 
> > Previously find_next_iomem_res() used "*res" as both an input parameter for
> > the range to search and the type of resource to search for, and an output
> > parameter for the resource we found, which makes the interface confusing
> > and hard to use correctly.
> > 
> > All callers allocate a single struct resource and use it for repeated calls
> > to find_next_iomem_res().  When find_next_iomem_res() returns a resource,
> > it overwrites the start, end, flags, and desc members of the struct.  If we
> > call find_next_iomem_res() again, we must update or restore these fields.
> > 
> > The callers (__walk_iomem_res_desc() and walk_system_ram_range()) do not
> > restore res->flags, so if the caller is searching for flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY and finds a resource with flags of
> > IORESOURCE_MEM | IORESOURCE_BUSY | IORESOURCE_SYSRAM, the next search will
> > find only resources marked as IORESOURCE_SYSRAM.
> > 
> > Fix this by restructuring the interface so it takes explicit "start, end,
> > flags" parameters and uses "*res" only as an output parameter.
> 
> Hi, Bjorn
> I personally suggest that some comments might be added in the code, make it 
> clear
> and easy to understand, then which could avoid the old confusion and more 
> code changes.

Since I think the current interface (using *res as both input and
output parameters that have very different meanings) is confusing,
it's hard for *me* to write comments that make it less confusing, but
of course, you're welcome to propose something.

My opinion (probably not universally shared) is that my proposal would
make the code more readable, and it's worth doing even though the diff
is larger.

Anyway, I'll post these patches independently and see if anybody else
has an opinion.

Bjorn

> > Original-patch: 
> > http://lore.kernel.org/lkml/20180921073211.20097-2-liji...@redhat.com
> > Based-on-patch-by: Lianbo Jiang 
> > Signed-off-by: Bjorn Helgaas 
> > ---
> >  kernel/resource.c |   94 
> > +++--
> >  1 file changed, 41 insertions(+), 53 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 155ec873ea4d..9891ea90cc8f 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -319,23 +319,26 @@ int release_resource(struct resource *old)
> >  EXPORT_SYMBOL(release_resource);
> >  
> >  /*
> > - * Finds the lowest iomem resource existing within [res->start..res->end].
> > - * The caller must specify res->start, res->end, res->flags, and optionally
> > - * desc.  If found, returns 0, res is overwritten, if not found, returns 
> > -1.
> > - * This function walks the whole tree and not just first level children 
> > until
> > - * and unless first_level_children_only is true.
> > + * Finds the lowest iomem resource that covers part of [start..end].  The
> > + * caller must specify start, end, flags, and desc (which may be
> > + * IORES_DESC_NONE).
> > + *
> > + * If a resource is found, returns 0 and *res is overwritten with the part
> > + * of the resource that's within [start..end]; if none is found, returns
> > + * -1.
> > + *
> > + * This function walks the whole tree and not just first level children
> > + * unless first_level_children_only is true.
> >   */
> > -static int find_next_iomem_res(struct resource *res, unsigned long desc,
> > -  bool first_level_children_only)
> > +static int find_next_iomem_res(resource_size_t start, resource_size_t end,
> > +  unsigned long flags, unsigned long desc,
> > +  bool first_level_children_only,
> > +  struct resource *res)
> >  {
> > -   resource_size_t start, end;
> > struct resource *p;
> > bool sibling_only = false;
> >  
> > BUG_ON(!res);
> > -
> > -   start = res->start;
> > -   end = res->end;
> > BUG_ON(start >= end);
> >  
> > if (first_level_children_only)
> > @@ -344,7 +347,7 @@ static int find_next_iomem_res(struct resource *res, 
> > unsigned long desc,
> > read_lock(_lock);
> >  
> > for (p = iomem_resource.child; p; p = next_resource(p, sibling_only)) {
> > -   if ((p->flags & res->flags) != res->flags)
> > +   if ((p->flags & flags) != flags)
> > continue;
> > if ((desc != IORES_DESC_NONE) && (desc != p->desc))
> > continue;
> > @@ -359,32 +362,31 @@ static int find_next_iomem_res(struct resource *res, 
> > unsigned long desc,
> > read_unlock(_lock);
> > if (!p)
> > return -1;
> > +
> > /* copy data */
> > -   if (res->start < p->start)
> > -   res->start = p->start;
> > -   if (res->end > p->end)
> > -   res->end = p->end;
> > +   res->start = max(start, p->start);
> > +   res->end = min(end, p->end);
> > res->flags = p->flags;
> > 

[PATCH] x86/boot: Fix kexec booting failure after SEV early boot support

2018-09-27 Thread Kairui Song
Commit 1958b5fc4010 ("x86/boot: Add early boot support when running
with SEV active") is causing kexec becomes sometimes unstable even if
SEV is not active. kexec reboot won't start a second kernel bypassing
BIOS boot process, instead, the system got reset.

That's because, in get_sev_encryption_bit function, we are using
32-bit RIP-relative addressing to read the value of enc_bit, but
kexec may alloc the early boot up code to a higher location, which
is beyond 32-bit addressing limit. Some garbage will be read and
get_sev_encryption_bit will return the wrong value, which leads to
wrong memory page flag.

This patch removes the use of enc_bit, as currently, enc_bit's only
purpose is to avoid duplicated encryption bit reading, but the overhead
of reading encryption bit is so tiny, so no need to cache that.

Fixes: 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV 
active")
Suggested-by: Borislav Petkov 
Signed-off-by: Kairui Song 
---
 arch/x86/boot/compressed/mem_encrypt.S | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
index eaa843a52907..a480356e0ed8 100644
--- a/arch/x86/boot/compressed/mem_encrypt.S
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -25,20 +25,6 @@ ENTRY(get_sev_encryption_bit)
push%ebx
push%ecx
push%edx
-   push%edi
-
-   /*
-* RIP-relative addressing is needed to access the encryption bit
-* variable. Since we are running in 32-bit mode we need this call/pop
-* sequence to get the proper relative addressing.
-*/
-   call1f
-1: popl%edi
-   subl$1b, %edi
-
-   movlenc_bit(%edi), %eax
-   cmpl$0, %eax
-   jge .Lsev_exit
 
/* Check if running under a hypervisor */
movl$1, %eax
@@ -69,15 +55,12 @@ ENTRY(get_sev_encryption_bit)
 
movl%ebx, %eax
andl$0x3f, %eax /* Return the encryption bit location */
-   movl%eax, enc_bit(%edi)
jmp .Lsev_exit
 
 .Lno_sev:
xor %eax, %eax
-   movl%eax, enc_bit(%edi)
 
 .Lsev_exit:
-   pop %edi
pop %edx
pop %ecx
pop %ebx
@@ -113,8 +96,6 @@ ENTRY(set_sev_encryption_mask)
 ENDPROC(set_sev_encryption_mask)
 
.data
-enc_bit:
-   .int0x
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
.balign 8
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 2/6] ima: prevent kexec_load syscall based on runtime secureboot flag

2018-09-27 Thread Mimi Zohar
[Cc'ing the kexec mailing list, and Seth]

On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote:
> When CONFIG_KEXEC_VERIFY_SIG is enabled, the kexec_file_load syscall
> requires the kexec'd kernel image to be signed. Distros are concerned
> about totally disabling the kexec_load syscall. As a compromise, the
> kexec_load syscall will only be disabled when CONFIG_KEXEC_VERIFY_SIG
> is configured and the system is booted with secureboot enabled.
> 
> This patch disables the kexec_load syscall only for systems booted with
> secureboot enabled.
> 
> Signed-off-by: Nayna Jain 

Nice!

Mimi

> ---
>  security/integrity/ima/ima_main.c | 17 +++--
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/security/integrity/ima/ima_main.c 
> b/security/integrity/ima/ima_main.c
> index dce0a8a217bb..bdb6e5563d05 100644
> --- a/security/integrity/ima/ima_main.c
> +++ b/security/integrity/ima/ima_main.c
> @@ -505,20 +505,24 @@ int ima_post_read_file(struct file *file, void *buf, 
> loff_t size,
>   */
>  int ima_load_data(enum kernel_load_data_id id)
>  {
> - bool sig_enforce;
> + bool ima_enforce, sig_enforce;
>  
> - if ((ima_appraise & IMA_APPRAISE_ENFORCE) != IMA_APPRAISE_ENFORCE)
> - return 0;
> + ima_enforce =
> + (ima_appraise & IMA_APPRAISE_ENFORCE) == IMA_APPRAISE_ENFORCE;
>  
>   switch (id) {
>   case LOADING_KEXEC_IMAGE:
> - if (ima_appraise & IMA_APPRAISE_KEXEC) {
> +#ifdef CONFIG_KEXEC_VERIFY_SIG
> + if (arch_ima_get_secureboot())
> + return -EACCES;
> +#endif
> + if (ima_enforce && (ima_appraise & IMA_APPRAISE_KEXEC)) {
>   pr_err("impossible to appraise a kernel image without a 
> file descriptor; try using kexec_file_load syscall.\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }
>   break;
>   case LOADING_FIRMWARE:
> - if (ima_appraise & IMA_APPRAISE_FIRMWARE) {
> + if (ima_enforce && (ima_appraise & IMA_APPRAISE_FIRMWARE)) {
>   pr_err("Prevent firmware sysfs fallback loading.\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }
> @@ -526,7 +530,8 @@ int ima_load_data(enum kernel_load_data_id id)
>   case LOADING_MODULE:
>   sig_enforce = is_module_sig_enforced();
>  
> - if (!sig_enforce && (ima_appraise & IMA_APPRAISE_MODULES)) {
> + if (ima_enforce && (!sig_enforce
> + && (ima_appraise & IMA_APPRAISE_MODULES))) {
>   pr_err("impossible to appraise a module without a file 
> descriptor. sig_enforce kernel parameter might help\n");
>   return -EACCES; /* INTEGRITY_UNKNOWN */
>   }


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 1/6] x86/ima: define arch_ima_get_secureboot

2018-09-27 Thread Mimi Zohar
[Cc'ing the kexec mailing list, and Seth]

On Wed, 2018-09-26 at 17:52 +0530, Nayna Jain wrote:
> Distros are concerned about totally disabling the kexec_load syscall.
> As a compromise, the kexec_load syscall will only be disabled when
> CONFIG_KEXEC_VERIFY_SIG is configured and the system is booted with
> secureboot enabled.
> 
> This patch defines the new arch specific function called
> arch_ima_get_secureboot() to retrieve the secureboot state of the system.
> 
> Signed-off-by: Nayna Jain 
> Suggested-by: Seth Forshee 

Nice!

Mimi

> ---
>  arch/x86/kernel/Makefile   |  2 ++
>  arch/x86/kernel/ima_arch.c | 17 +
>  include/linux/ima.h|  9 +
>  3 files changed, 28 insertions(+)
>  create mode 100644 arch/x86/kernel/ima_arch.c
> 
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 02d6f5cf4e70..f32406e51424 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -149,3 +149,5 @@ ifeq ($(CONFIG_X86_64),y)
>   obj-$(CONFIG_MMCONF_FAM10H) += mmconf-fam10h_64.o
>   obj-y   += vsmp_64.o
>  endif
> +
> +obj-$(CONFIG_IMA)+= ima_arch.o
> diff --git a/arch/x86/kernel/ima_arch.c b/arch/x86/kernel/ima_arch.c
> new file mode 100644
> index ..bb5a88d2b271
> --- /dev/null
> +++ b/arch/x86/kernel/ima_arch.c
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * Copyright (C) 2018 IBM Corporation
> + */
> +#include 
> +#include 
> +
> +extern struct boot_params boot_params;
> +
> +bool arch_ima_get_secureboot(void)
> +{
> + if (efi_enabled(EFI_BOOT) &&
> + (boot_params.secure_boot == efi_secureboot_mode_enabled))
> + return true;
> + else
> + return false;
> +}
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 84806b54b50a..4852255aa4f4 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -30,6 +30,15 @@ extern void ima_post_path_mknod(struct dentry *dentry);
>  extern void ima_add_kexec_buffer(struct kimage *image);
>  #endif
>  
> +#ifdef CONFIG_X86
> +extern bool arch_ima_get_secureboot(void);
> +#else
> +static inline bool arch_ima_get_secureboot(void)
> +{
> + return false;
> +}
> +#endif
> +
>  #else
>  static inline int ima_bprm_check(struct linux_binprm *bprm)
>  {


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v7 RESEND 4/4] kdump/vmcore: support encrypted old memory with SME enabled

2018-09-27 Thread Lianbo Jiang
In kdump kernel, we need to dump the old memory into vmcore file,if SME
is enabled in the first kernel, we have to remap the old memory with the
memory encryption mask, which will be automatically decrypted when we
read from DRAM.

For SME kdump, there are two cases that doesn't support:

 --
| first-kernel | second-kernel | kdump support |
|  (mem_encrypt=on|off)|   (yes|no)|
|--+---+---|
| on   | on| yes   |
| off  | off   | yes   |
| on   | off   | no|
| off  | on| no|
|__|___|___|

1. SME is enabled in the first kernel, but SME is disabled in kdump kernel
In this case, because the old memory is encrypted, we can't decrypt the
old memory.

2. SME is disabled in the first kernel, but SME is enabled in kdump kernel
On the one hand, the old memory is unencrypted, the old memory can be dumped
as usual, we don't need to enable SME in kdump kernel; On the other hand, it
will increase the complexity of the code, we will have to consider how to
pass the SME flag from the first kernel to the kdump kernel, it is really
too expensive to do this.

This patches are only for SME kdump, the patches don't support SEV kdump.

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 +++
 include/linux/crash_dump.h   | 12 +++
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8824d01c0c35..dfbeae0e35ce 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c 
b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index ..e1b1a577f197
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cbde728f8ac6..3065c8bada6a 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
   

[PATCH v7 RESEND 2/4] kexec: allocate unencrypted control pages for kdump in case SME is enabled

2018-09-27 Thread Lianbo Jiang
When SME is enabled in the first kernel, we will allocate unencrypted pages
for kdump in order to be able to boot the kdump kernel like kexec.

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4da38a..e7efcd1a977b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page 
*kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, we need to ensure that these pages are
+* unencrypted pages if SME is enabled.
+* By the way, it is unnecessary to call the arch_
+* kexec_pre_free_pages(), which will make the code
+* become more simple.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v7 RESEND 1/4] x86/ioremap: add a function ioremap_encrypted() to remap kdump old memory

2018-09-27 Thread Lianbo Jiang
When SME is enabled on AMD machine, the memory is encrypted in the first
kernel. In this case, SME also needs to be enabled in kdump kernel, and
we have to remap the old memory with the memory encryption mask.

Here we only talk about the case that SME is active in the first kernel,
and only care it's active too in kdump kernel. there are four cases we
need considered.

a. dump vmcore
   It is encrypted in the first kernel, and needs be read out in kdump
   kernel.

b. crash notes
   When dumping vmcore, the people usually need to read the useful
   information from notes, and the notes is also encrypted.

c. iommu device table
   It is allocated by kernel, need fill its pointer into mmio of amd iommu.
   It's encrypted in the first kernel, need read the old content to analyze
   and get useful information.

d. mmio of amd iommu
   Register reported by amd firmware, it's not RAM, we don't encrypt in
   both the first kernel and kdump kernel.

To achieve the goal, the solution is:
1. add a new bool parameter "encrypted" to __ioremap_caller()
   It is a low level function, and check the newly added parameter, if it's
   true and in kdump kernel, will remap the memory with sme mask.

2. add a new function ioremap_encrypted() to explicitly passed in a "true"
   value for "encrypted".
   For above a, b, c, we will call ioremap_encrypted();

3. adjust all existed ioremap wrapper functions, passed in "false" for
   encrypted to make them an before.

   ioremap_encrypted()\
   ioremap_cache() |
   ioremap_prot()  |
   ioremap_wt()|->__ioremap_caller()
   ioremap_wc()|
   ioremap_uc()|
   ioremap_nocache()  /

Signed-off-by: Lianbo Jiang 
Reviewed-by: Tom Lendacky 
---
 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de64840dd22..f8795f9581c7 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, 
unsigned long size);
 #define ioremap_cache ioremap_cache
 extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, 
unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545ec199..e01e6c695add 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, 
unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, 
unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   

[PATCH v7 RESEND 0/4] Support kdump for AMD secure memory encryption(SME)

2018-09-27 Thread Lianbo Jiang
When SME is enabled on AMD machine, we also need to support kdump. Because
the memory is encrypted in the first kernel, we will remap the old memory
to the kdump kernel for dumping data, and SME is also enabled in the kdump
kernel, otherwise the old memory can not be decrypted.

For the kdump, it is necessary to distinguish whether the memory is encrypted.
Furthermore, we should also know which part of the memory is encrypted or
decrypted. We will appropriately remap the memory according to the specific
situation in order to tell cpu how to access the memory.

As we know, a page of memory that is marked as encrypted, which will be
automatically decrypted when read from DRAM, and will also be automatically
encrypted when written to DRAM. If the old memory is encrypted, we have to
remap the old memory with the memory encryption mask, which will automatically
decrypt the old memory when we read those data.

For kdump(SME), there are two cases that doesn't support:

 --
| first-kernel | second-kernel | kdump support |
|  (mem_encrypt=on|off)|   (yes|no)|
|--+---+---|
| on   | on| yes   |
| off  | off   | yes   |
| on   | off   | no|
| off  | on| no|
|__|___|___|

1. SME is enabled in the first kernel, but SME is disabled in kdump kernel
In this case, because the old memory is encrypted, we can't decrypt the
old memory.

2. SME is disabled in the first kernel, but SME is enabled in kdump kernel
It is unnecessary to support in this case, because the old memory is
unencrypted, the old memory can be dumped as usual, we don't need to enable
SME in kdump kernel. Another, If we must support the scenario, it will
increase the complexity of the code, we will have to consider how to pass
the SME flag from the first kernel to the kdump kernel, in order to let the
kdump kernel know that whether the old memory is encrypted.

There are two methods to pass the SME flag to the kdump kernel. The first
method is to modify the assembly code, which includes some common code and
the path is too long. The second method is to use kexec tool, which could
require the SME flag to be exported in the first kernel by "proc" or "sysfs",
kexec tools will read the SME flag from "proc" or "sysfs" when we use kexec
tools to load image, subsequently the SME flag will be saved in boot_params,
we can properly remap the old memory according to the previously saved SME
flag. But it is too expensive to do this.

This patches are only for SME kdump, the patches don't support SEV kdump.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit  "A draft for kdump vmcore about AMD SME"
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.3: https://github.com/crash-utility/crash.git
commit <001f77a05585> "Fix for Linux 4.19-rc1 and later kernels that contain 
kernel commit <7290d5809571>"

kexec-tools-2.0.17: 
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
commit  "kexec: fix for "Unhandled rela relocation: 
R_X86_64_PLT32" error"

Note:
Before you load the kernel and initramfs for kdump, this patch(
http://lists.infradead.org/pipermail/kexec/2018-September/021460.html) must be 
merged
to kexec-tools, and then the kdump kernel will work well. Because there is a 
patch
which is removed based on v6(x86/ioremap: strengthen the logic in
early_memremap_pgprot_adjust() to adjust encryption mask).

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.19-rc5:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit <6bf4ca7fbc85> "Linux 4.19-rc5"

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Changes since v6:
1. There is a patch which is removed based on v6.
(x86/ioremap: strengthen the logic in early_memremap_pgprot_adjust() to adjust 
encryption mask)
Dave Young suggests that this patch can be removed and fix the kexec-tools.
Reference: 
http://lists.infradead.org/pipermail/kexec/2018-September/021460.html)
2. Update the patch log.

Changes since v7:
1. Improve patch log for patch 1/4(Suggested by Baoquan He)
2. Add Reviewed-by for all patches(Tom Lendacky )
3. Add Acked-by for patch 3/4(Joerg Roedel )

Some known issues:
1. about SME
Upstream kernel will hang on HP machine(DL385Gen10 AMD EPYC 7251) when
we execute the kexec command as follow:

# kexec -l /boot/vmlinuz-4.19.0-rc5+ --initrd=/boot/initramfs-4.19.0-rc5+.img 
--command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on 
rd.lvm.lv=rhel_hp-dl385g10-03/root rd.lvm.lv=rhel_hp-dl385g10-03/swap 
console=ttyS0,115200n81 LANG=en_US.UTF-8 earlyprintk=serial debug nokaslr"
# kexec -e (or reboot)

But this issue can not be reproduced on speedway machine, and this