Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-02 Thread lijiang
在 2018年07月02日 18:14, Borislav Petkov 写道:
> On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote:
>> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
>>   * caller shouldn't need to know that small detail.
>>   */
>>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>> -unsigned long size, enum page_cache_mode pcm, void *caller)
>> +unsigned long size, enum page_cache_mode pcm,
>> +void *caller, bool encrypted)
> 
> So instead of sprinkling that @encrypted argument everywhere and then
> setting it based on sme_active() ...
> 
>>  {
>>  unsigned long offset, vaddr;
>>  resource_size_t last_addr;
>> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>   * resulting mapping.
>>   */
>>  prot = PAGE_KERNEL_IO;
>> -if (sev_active() && mem_flags.desc_other)
>> +if ((sev_active() && mem_flags.desc_other) || encrypted)
> 
> ... why can't you simply do your checks:
> 
>   sme_active() && is_kdump_kernel()
> 
> here so that __ioremap_caller() can automatically choose the proper
> pgprot value when ioremapping the memory in the kdump kernel?
> 
> And this way the callers don't even have to care whether the memory is
> encrypted or not?
> 
Thank you, Boris. I'm very glad to read your comments. That's a good idea,
but some memory is unencrypted in kdump mode, for example the elfcorehdr.
In fact, the elfcorehdr and the notes go through the same function
(read_from_oldmem->ioremap_cache), so in this case it is very difficult to
properly remap the memory if the caller doesn't care whether the memory is
encrypted; see the sketch below.
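
To make that concrete, here is a minimal sketch of the shape the dispatch
ends up taking (it mirrors what patch 5/5 of this series does in
read_from_oldmem(); illustration only, not additional code):

    /*
     * The same read path serves both the old kernel's ELF notes
     * (encrypted when SME was on) and the elfcorehdr (unencrypted),
     * so the caller has to say which kind of mapping it needs.
     */
    tmp = encrypted
            ? copy_oldmem_page_encrypted(pfn, buf, nr_bytes, offset, userbuf)
            : copy_oldmem_page(pfn, buf, nr_bytes, offset, userbuf);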

Regards,
Lianbo
>>  prot = pgprot_encrypted(prot);
>>  
>>  switch (pcm) {
> 

Re: [PATCH] iommu/amd: remove redundant variable tag

2018-07-02 Thread Gary R Hook

On 07/02/2018 11:37 AM, Colin King wrote:

From: Colin Ian King 

Variable tag is being assigned but is never used, hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'tag' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King 
---
  drivers/iommu/amd_iommu.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 596b95c50051..c6aed1b088e9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
  static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
  {
struct device *dev = iommu->iommu.dev;
-   int type, devid, pasid, flags, tag;
+   int type, devid, pasid, flags;
volatile u32 *event = __evt;
int count = 0;
u64 address;
@@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
case EVENT_TYPE_INV_PPR_REQ:
	pasid = ((event[0] >> 16) & 0xFFFF)
		| ((event[1] << 6) & 0xF0000);
-   tag = event[1] & 0x03FF;
dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x 
address=0x%016llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);



If it's all the same to you, the intent was to print the tag as well. 
Could we add that to the message, rather than removing the variable, please?
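
Something along these lines would cover that (just a sketch of the suggested
direction, keeping the variable and extending the existing format string; not
a posted patch):

    tag = event[1] & 0x03FF;
    dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%016llx flags=0x%04x tag=0x%03x]\n",
            PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
            pasid, address, flags, tag);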





[PATCH] iommu/amd: remove redundant variable tag

2018-07-02 Thread Colin King
From: Colin Ian King 

Variable tag is being assigned but is never used, hence it is
redundant and can be removed.

Cleans up clang warning:
warning: variable 'tag' set but not used [-Wunused-but-set-variable]

Signed-off-by: Colin Ian King 
---
 drivers/iommu/amd_iommu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 596b95c50051..c6aed1b088e9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 {
struct device *dev = iommu->iommu.dev;
-   int type, devid, pasid, flags, tag;
+   int type, devid, pasid, flags;
volatile u32 *event = __evt;
int count = 0;
u64 address;
@@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
case EVENT_TYPE_INV_PPR_REQ:
	pasid = ((event[0] >> 16) & 0xFFFF)
		| ((event[1] << 6) & 0xF0000);
-   tag = event[1] & 0x03FF;
dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x 
pasid=0x%05x address=0x%016llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
-- 
2.17.1



Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()

2018-07-02 Thread Christoph Hellwig
Looks good:

Acked-by: Christoph Hellwig 


Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()

2018-07-02 Thread Russell King - ARM Linux
On Mon, Jul 02, 2018 at 01:53:17PM +0200, Thierry Reding wrote:
> On Wed, May 30, 2018 at 04:06:24PM +0200, Thierry Reding wrote:
> > From: Thierry Reding 
> > 
> > Instead of setting the DMA ops pointer to NULL, set the correct,
> > non-IOMMU ops depending on the device's coherency setting.
> > 
> > Signed-off-by: Thierry Reding 
> > ---
> > Changes in v4:
> > - new patch to fix existing arm_iommu_detach_device() to do what we need
> > 
> >  arch/arm/mm/dma-mapping.c | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> Christoph, Russell,
> 
> could either of you provide an Acked-by for this? I think it makes the
> most sense for Ben to pick this up into the Nouveau tree along with
> patch 2/2.

Looks fine to me.

Acked-by: Russell King 

Thanks.

> 
> Thierry
> 
> > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> > index af27f1c22d93..87a0037574e4 100644
> > --- a/arch/arm/mm/dma-mapping.c
> > +++ b/arch/arm/mm/dma-mapping.c
> > @@ -1151,6 +1151,11 @@ int arm_dma_supported(struct device *dev, u64 mask)
> > return __dma_supported(dev, mask, false);
> >  }
> >  
> > +static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
> > +{
> > +   return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
> > +}
> > +
> >  #ifdef CONFIG_ARM_DMA_USE_IOMMU
> >  
> >  static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
> > @@ -2296,7 +2301,7 @@ void arm_iommu_detach_device(struct device *dev)
> > iommu_detach_device(mapping->domain, dev);
> > kref_put(&mapping->kref, release_iommu_mapping);
> > to_dma_iommu_mapping(dev) = NULL;
> > -   set_dma_ops(dev, NULL);
> > +   set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
> >  
> > pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
> >  }
> > @@ -2357,11 +2362,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev) { }
> >  
> >  #endif /* CONFIG_ARM_DMA_USE_IOMMU */
> >  
> > -static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
> > -{
> > -   return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
> > -}
> > -
> >  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > const struct iommu_ops *iommu, bool coherent)
> >  {
> > -- 
> > 2.17.0
> > 



-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


Re: DMA mappings and crossing boundaries

2018-07-02 Thread Benjamin Herrenschmidt
On Mon, 2018-07-02 at 14:06 +0100, Robin Murphy wrote:

 .../...

Thanks Robin, I was starting to despair that anybody would reply ;-)

> > AFAIK, dma_alloc_coherent() is defined (Documentation/DMA-API-
> > HOWTO.txt) as always allocating to the next power-of-2 order, so we
> > should never have the problem unless we allocate a single chunk larger
> > than the IOMMU page size.
> 
> (and even then it's not *that* much of a problem, since it comes down to 
> just finding n > 1 consecutive unused IOMMU entries for exclusive use by 
> that new chunk)

Yes, this case is not my biggest worry.

>> For dma_map_sg() however, if a request has a single "entry"
>> spanning such a boundary, we need to ensure that the resulting mapping is
> > 2 contiguous "large" iommu pages as well.
> > 
> > However, that doesn't fit well with us re-using existing mappings since
> > they may already exist and either not be contiguous, or partially exist
> > with no free hole around them.
> > 
> > Now, we *could* possibly construe a way to solve this by detecting this
> > case and just allocating another "pair" (or set if we cross even more
> > pages) of IOMMU pages elsewhere, thus partially breaking our re-use
> > scheme.
> > 
>> But while doable, this introduces some serious complexity in the
> > implementation, which I would very much like to avoid.
> > 
> > So I was wondering if you guys thought that was ever likely to happen ?
> > Do you see reasonable cases where dma_map_sg() would be called with a
> > list in which a single entry crosses a 256M or 1G boundary ?
> 
> For streaming mappings of buffers cobbled together out of any old CPU 
> pages (e.g. user memory), you may well happen to get two 
> physically-adjacent pages falling either side of an IOMMU boundary, 
> which comprise all or part of a single request - note that whilst it's 
> probably less likely than the scatterlist case, this could technically 
> happen for dma_map_{page, single}() calls too.

Could it? I wouldn't think dma_map_page() is allowed to cross page
boundaries ... what about single()? The main worry is people using
these things on kmalloc'ed memory.

> Conceptually it looks pretty easy to extend the allocation constraints 
> to cope with that - even the pathological worst case would have an 
> absolute upper bound of 3 IOMMU entries for any one physical region - 
> but if in practice it's a case of mapping arbitrary CPU pages to 32-bit 
> DMA addresses having only 4 1GB slots to play with, I can't really see a 
> way to make that practical :(

No, we are talking about 40-ish bits of address space, so there's a bit
of leeway. Of course no scheme will work if the user app tries to map
more than the GPU can possibly access.

But with newer AMD adding a few more bits and nVidia being at 47-bits,
I think we have some margin, it's just that they can't reach our
discontiguous memory with a normal 'bypass' mapping and I'd rather not
teach Linux about every single way our HW can scatter memory across
nodes, so an "on demand" mechanism is by far the most flexible way to
deal with all configurations.

> Maybe the best compromise would be some sort of hybrid scheme which 
> makes sure that one of the IOMMU entries always covers the SWIOTLB 
> buffer, and invokes software bouncing for the awkward cases.

Hrm... not too sure about that. I'm happy to limit that scheme to well
known GPU vendor/device IDs, and SW bouncing is pointless in these
cases. It would be nice if we could have some kind of guarantee that a
single mapping or sglist entry never crossed a specific boundary
though... We more or less have that for 4G already (well, we are supposed
to at least). Who are the main potentially problematic subsystems here?
I'm thinking network skb allocation pools ... and the page cache, if it
tries to coalesce entries before issuing the map request; does it?
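
For what it's worth, the check such a guarantee would have to provide is
tiny; the hard part is deciding who enforces it. Illustrative only:
GIANT_PAGE_SHIFT below is a stand-in for the 256M/1G IOMMU page size, not
an existing symbol:

    /* Does [phys, phys + len) cross a giant IOMMU page boundary?
     * With 256M pages (shift 28), a two-page kmalloc'ed buffer
     * starting at 0x0ffff000 already trips this. */
    bool crosses = (phys >> GIANT_PAGE_SHIFT) !=
                   ((phys + len - 1) >> GIANT_PAGE_SHIFT);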

Ben.

> Robin.



Re: DMA mappings and crossing boundaries

2018-07-02 Thread Robin Murphy

Hi Ben,

On 24/06/18 08:32, Benjamin Herrenschmidt wrote:

Hi Folks !

So, due to working around issues with devices having too strict limitations
in DMA address bits (GPUs, ugh) on POWER, we've been playing with a
mechanism that does dynamic mapping in the IOMMU but using a very large
IOMMU page size (256M on POWER8 and 1G on POWER9) for performances.

Now, with such page size, we can't just pop out new entries for every
DMA map, we need to try to re-use entries for mappings in the same
"area".

We've prototyped something using refcounts on the entries. It does
imply some locking which is potentially problematic, and we'll be
looking at options there long run, but it works... so far.

My worry is that it will fail if we ever get a mapping request (or
coherent allocation request) that spans one of those giant page
boundaries. At least with our current implementation.

AFAIK, dma_alloc_coherent() is defined (Documentation/DMA-API-
HOWTO.txt) as always allocating to the next power-of-2 order, so we
should never have the problem unless we allocate a single chunk larger
than the IOMMU page size.


(and even then it's not *that* much of a problem, since it comes down to 
just finding n > 1 consecutive unused IOMMU entries for exclusive use by 
that new chunk)



For dma_map_sg() however, if a request has a single "entry"
spanning such a boundary, we need to ensure that the resulting mapping is
2 contiguous "large" iommu pages as well.

However, that doesn't fit well with us re-using existing mappings since
they may already exist and either not be contiguous, or partially exist
with no free hole around them.

Now, we *could* possibly construe a way to solve this by detecting this
case and just allocating another "pair" (or set if we cross even more
pages) of IOMMU pages elsewhere, thus partially breaking our re-use
scheme.

But while doable, this introduces some serious complexity in the
implementation, which I would very much like to avoid.

So I was wondering if you guys thought that was ever likely to happen ?
Do you see reasonable cases where dma_map_sg() would be called with a
list in which a single entry crosses a 256M or 1G boundary ?


For streaming mappings of buffers cobbled together out of any old CPU 
pages (e.g. user memory), you may well happen to get two 
physically-adjacent pages falling either side of an IOMMU boundary, 
which comprise all or part of a single request - note that whilst it's 
probably less likely than the scatterlist case, this could technically 
happen for dma_map_{page, single}() calls too.


Conceptually it looks pretty easy to extend the allocation constraints 
to cope with that - even the pathological worst case would have an 
absolute upper bound of 3 IOMMU entries for any one physical region - 
but if in practice it's a case of mapping arbitrary CPU pages to 32-bit 
DMA addresses having only 4 1GB slots to play with, I can't really see a 
way to make that practical :(


Maybe the best compromise would be some sort of hybrid scheme which 
makes sure that one of the IOMMU entries always covers the SWIOTLB 
buffer, and invokes software bouncing for the awkward cases.


Robin.


Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()

2018-07-02 Thread Thierry Reding
On Wed, May 30, 2018 at 04:06:24PM +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Instead of setting the DMA ops pointer to NULL, set the correct,
> non-IOMMU ops depending on the device's coherency setting.
> 
> Signed-off-by: Thierry Reding 
> ---
> Changes in v4:
> - new patch to fix existing arm_iommu_detach_device() to do what we need
> 
>  arch/arm/mm/dma-mapping.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)

Christoph, Russell,

could either of you provide an Acked-by for this? I think it makes the
most sense for Ben to pick this up into the Nouveau tree along with
patch 2/2.

Thierry

> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index af27f1c22d93..87a0037574e4 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1151,6 +1151,11 @@ int arm_dma_supported(struct device *dev, u64 mask)
>   return __dma_supported(dev, mask, false);
>  }
>  
> +static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
> +{
> + return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
> +}
> +
>  #ifdef CONFIG_ARM_DMA_USE_IOMMU
>  
>  static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
> @@ -2296,7 +2301,7 @@ void arm_iommu_detach_device(struct device *dev)
>   iommu_detach_device(mapping->domain, dev);
> kref_put(&mapping->kref, release_iommu_mapping);
>   to_dma_iommu_mapping(dev) = NULL;
> - set_dma_ops(dev, NULL);
> + set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent));
>  
>   pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
>  }
> @@ -2357,11 +2362,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev) { }
>  
>  #endif   /* CONFIG_ARM_DMA_USE_IOMMU */
>  
> -static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
> -{
> - return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
> -}
> -
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   const struct iommu_ops *iommu, bool coherent)
>  {
> -- 
> 2.17.0
> 



Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-02 Thread Borislav Petkov
On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote:
> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
>   * caller shouldn't need to know that small detail.
>   */
>  static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> - unsigned long size, enum page_cache_mode pcm, void *caller)
> + unsigned long size, enum page_cache_mode pcm,
> + void *caller, bool encrypted)

So instead of sprinkling that @encrypted argument everywhere and then
setting it based on sme_active() ...

>  {
>   unsigned long offset, vaddr;
>   resource_size_t last_addr;
> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>* resulting mapping.
>*/
>   prot = PAGE_KERNEL_IO;
> - if (sev_active() && mem_flags.desc_other)
> + if ((sev_active() && mem_flags.desc_other) || encrypted)

... why can't you simply do your checks:

sme_active() && is_kdump_kernel()

here so that __ioremap_caller() can automatically choose the proper
pgprot value when ioremapping the memory in the kdump kernel?

And this way the callers don't even have to care whether the memory is
encrypted or not?
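
For illustration, that check would sit directly in __ioremap_caller(),
roughly like this (a sketch only, assuming is_kdump_kernel() is usable
there, i.e. linux/crash_dump.h is included; not a posted patch):

    prot = PAGE_KERNEL_IO;
    if ((sev_active() && mem_flags.desc_other) ||
        (sme_active() && is_kdump_kernel()))
            prot = pgprot_encrypted(prot);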

>   prot = pgprot_encrypted(prot);
>  
>   switch (pcm) {

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


[PATCH 5/5 V5] Help to dump the old memory encrypted into vmcore file

2018-07-02 Thread Lianbo Jiang
In kdump mode, we need to dump the old memory into the vmcore file.
If SME is enabled in the first kernel, we must remap the old memory
in an encrypted manner so that it is automatically decrypted when
read from DRAM. This helps tools that parse the vmcore.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add a new file and modify Makefile.
2. revert some code about previously using sev_active().

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/crash_dump_encrypt.c | 53 
 fs/proc/vmcore.c | 21 ++
 include/linux/crash_dump.h   | 12 
 4 files changed, 81 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/kernel/crash_dump_encrypt.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 02d6f5c..afb5bad 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_KEXEC_CORE)  += machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC_CORE)   += relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_KEXEC_FILE)   += kexec-bzimage64.o
 obj-$(CONFIG_CRASH_DUMP)   += crash_dump_$(BITS).o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += crash_dump_encrypt.o
 obj-y  += kprobes/
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault.o
diff --git a/arch/x86/kernel/crash_dump_encrypt.c b/arch/x86/kernel/crash_dump_encrypt.c
new file mode 100644
index 0000000..e1b1a57
--- /dev/null
+++ b/arch/x86/kernel/crash_dump_encrypt.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Memory preserving reboot related code.
+ *
+ * Created by: Lianbo Jiang (liji...@redhat.com)
+ * Copyright (C) RedHat Corporation, 2018. All rights reserved
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted"
+ * @pfn: page frame number to be copied
+ * @buf: target memory address for the copy; this can be in kernel address
+ * space or user address space (see @userbuf)
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page (based on pfn) to begin the copy
+ * @userbuf: if set, @buf is in user address space, use copy_to_user(),
+ * otherwise @buf is in kernel address space, use memcpy().
+ *
+ * Copy a page from "oldmem encrypted". For this page, there is no pte
+ * mapped in the current kernel. We stitch up a pte, similar to
+ * kmap_atomic.
+ */
+
+ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf,
+   size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT,
+ PAGE_SIZE);
+   if (!vaddr)
+   return -ENOMEM;
+
+   if (userbuf) {
+   if (copy_to_user((void __user *)buf, vaddr + offset, csize)) {
+   iounmap((void __iomem *)vaddr);
+   return -EFAULT;
+   }
+   } else
+   memcpy(buf, vaddr + offset, csize);
+
+   set_iounmap_nonlazy();
+   iounmap((void __iomem *)vaddr);
+   return csize;
+}
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index cfb6674..07c1934 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -25,6 +25,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "internal.h"
 
 /* List representing chunks of contiguous memory areas and their offsets in
@@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn)
 
 /* Reads a page from the oldmem device from given offset. */
 static ssize_t read_from_oldmem(char *buf, size_t count,
-   u64 *ppos, int userbuf)
+   u64 *ppos, int userbuf,
+   bool encrypted)
 {
unsigned long pfn, offset;
size_t nr_bytes;
@@ -120,8 +124,11 @@ static ssize_t read_from_oldmem(char *buf, size_t count,
if (pfn_is_ram(pfn) == 0)
memset(buf, 0, nr_bytes);
else {
-   tmp = copy_oldmem_page(pfn, buf, nr_bytes,
-   offset, userbuf);
+   tmp = encrypted ? copy_oldmem_page_encrypted(pfn,
+   buf, nr_bytes, offset, userbuf)
+   : copy_oldmem_page(pfn, buf, nr_bytes,
+  offset, userbuf);
+
if (tmp < 0)
return tmp;
}
@@ -155,7 +162,7 @@ void __weak elfcorehdr_free(unsigned long long addr)
  */
 ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0);
+   return read_from_oldmem(buf, count, ppos, 0, false);
 }
 
 /*
@@ -163,7 +170,7 

[PATCH 4/5 V5] Adjust some permanent mappings in unencrypted ways for kdump when SME is enabled.

2018-07-02 Thread Lianbo Jiang
For kdump, the ACPI table and DMI table need to be remapped in an
unencrypted way during early init. They have just a simple wrapper
around early_memremap(), but early_memremap() remaps the memory in an
encrypted way by default when SME is enabled, so we put some logic
into early_memremap_pgprot_adjust(), which will have an opportunity
to adjust it.

Signed-off-by: Lianbo Jiang 
---
 arch/x86/mm/ioremap.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index e01e6c6..3c1c8c4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -689,8 +689,17 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
encrypted_prot = true;
 
if (sme_active()) {
+   /*
+* In kdump mode, the acpi table and dmi table will need to
+* be remapped in unencrypted ways during early init when
+* SME is enabled. They have just a simple wrapper around
+* early_memremap(), but the early_memremap() remaps the
+* memory in encrypted ways by default when SME is enabled,
+* so we must adjust it.
+*/
if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size))
+   memremap_is_efi_data(phys_addr, size) ||
+   is_kdump_kernel())
encrypted_prot = false;
}
 
-- 
2.9.5



[PATCH 3/5 V5] Remap the device table of IOMMU in encrypted manner for kdump

2018-07-02 Thread Lianbo Jiang
In kdump mode, the kernel copies the IOMMU device table from the old
device table, which is encrypted when SME is enabled in the first
kernel. So we must remap it in an encrypted manner so that it is
automatically decrypted when read.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. add some comments
2. clean compile warning.
3. remove unnecessary code when we clear sme mask bit.

 drivers/iommu/amd_iommu_init.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 904c575..4cebb00 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -888,12 +888,22 @@ static bool copy_device_table(void)
}
}
 
-   old_devtb_phys = entry & PAGE_MASK;
+   /*
+* When SME is enabled in the first kernel, the entry includes the
+* memory encryption mask(sme_me_mask), we must remove the memory
+* encryption mask to obtain the true physical address in kdump mode.
+*/
+   old_devtb_phys = __sme_clr(entry) & PAGE_MASK;
+
if (old_devtb_phys >= 0x1ULL) {
pr_err("The address of old device table is above 4G, not 
trustworthy!\n");
return false;
}
-   old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   old_devtb = (sme_active() && is_kdump_kernel())
+   ? (__force void *)ioremap_encrypted(old_devtb_phys,
+   dev_table_size)
+   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+
if (!old_devtb)
return false;
 
-- 
2.9.5



[PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled

2018-07-02 Thread Lianbo Jiang
It is convenient to remap the old memory as encrypted for the second
kernel by calling ioremap_encrypted().

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. revert some logic in the early_memremap_pgprot_adjust() for
early memremap and make it separate a new patch.

 arch/x86/include/asm/io.h |  3 +++
 arch/x86/mm/ioremap.c | 25 +
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 6de6484..f8795f9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 #define ioremap_cache ioremap_cache
extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val);
 #define ioremap_prot ioremap_prot
+extern void __iomem *ioremap_encrypted(resource_size_t phys_addr,
+   unsigned long size);
+#define ioremap_encrypted ioremap_encrypted
 
 /**
  * ioremap -   map bus memory into CPU space
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c63a545..e01e6c6 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "physaddr.h"
 
@@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
  * caller shouldn't need to know that small detail.
  */
 static void __iomem *__ioremap_caller(resource_size_t phys_addr,
-   unsigned long size, enum page_cache_mode pcm, void *caller)
+   unsigned long size, enum page_cache_mode pcm,
+   void *caller, bool encrypted)
 {
unsigned long offset, vaddr;
resource_size_t last_addr;
@@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
 * resulting mapping.
 */
prot = PAGE_KERNEL_IO;
-   if (sev_active() && mem_flags.desc_other)
+   if ((sev_active() && mem_flags.desc_other) || encrypted)
prot = pgprot_encrypted(prot);
 
switch (pcm) {
@@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
@@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size)
enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC;
 
return __ioremap_caller(phys_addr, size, pcm,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL_GPL(ioremap_uc);
 
@@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc);
 void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wc);
 
@@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc);
 void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_wt);
 
+void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size)
+{
+   return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
+   __builtin_return_address(0), true);
+}
+EXPORT_SYMBOL(ioremap_encrypted);
+
 void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB,
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_cache);
 
@@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
 {
return __ioremap_caller(phys_addr, size,
pgprot2cachemode(__pgprot(prot_val)),
-   __builtin_return_address(0));
+   __builtin_return_address(0), false);
 }
 EXPORT_SYMBOL(ioremap_prot);
 
-- 
2.9.5



[PATCH 2/5 V5] Allocate pages for kdump without encryption when SME is enabled

2018-07-02 Thread Lianbo Jiang
When SME is enabled in the first kernel, we allocate the pages for
kdump without encryption in order to be able to boot the second
kernel in the same manner as kexec, which helps to keep the code
consistent.

Signed-off-by: Lianbo Jiang 
---
Some changes:
1. remove some redundant codes for crash control pages.
2. add some comments.

 kernel/kexec_core.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 23a83a4..e7efcd1 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image,
}
}
 
+   if (pages) {
+   /*
+* For kdump, we need to ensure that these pages are
+* unencrypted pages if SME is enabled.
+* By the way, it is unnecessary to call the arch_
+* kexec_pre_free_pages(), which will make the code
+* become more simple.
+*/
+   arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0);
+   }
return pages;
 }
 
@@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result  = -ENOMEM;
goto out;
}
+   arch_kexec_post_alloc_pages(page_address(page), 1, 0);
ptr = kmap(page);
ptr += maddr & ~PAGE_MASK;
mchunk = min_t(size_t, mbytes,
@@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image,
result = copy_from_user(ptr, buf, uchunk);
kexec_flush_icache_page(page);
kunmap(page);
+   arch_kexec_pre_free_pages(page_address(page), 1);
if (result) {
result = -EFAULT;
goto out;
-- 
2.9.5



[PATCH 0/5 V5] Support kdump for AMD secure memory encryption(SME)

2018-07-02 Thread Lianbo Jiang
When SME is enabled on an AMD server, we also need to support kdump.
Because the memory is encrypted in the first kernel, we remap the old
memory to the second kernel (crash kernel) as encrypted, and SME must
also be enabled in the second kernel, otherwise the encrypted old
memory cannot be decrypted. Simply changing the value of the C-bit on
a page will not automatically encrypt its existing contents, and any
data in the page prior to the C-bit modification will become
unintelligible. A page of memory that is marked encrypted is
automatically decrypted when read from DRAM and automatically
encrypted when written to DRAM.

For kdump, it is necessary to distinguish whether the memory is
encrypted, and also to know which parts of the memory are encrypted
or unencrypted. We remap the memory according to the specific
situation in order to tell the CPU how to treat the data (encrypted
or unencrypted). For example, when SME is enabled and the old memory
is encrypted, we remap the old memory as encrypted, which
automatically decrypts it when we read the data through the remapped
address, as sketched below.
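
As a concrete illustration, the decision boils down to something like
the following (a sketch mirroring what patch 3/5 of this series does for
the old IOMMU device table; not additional code):

    /* Remap the old memory as encrypted only when the first kernel
     * ran with SME and we are now in the kdump kernel. */
    old_devtb = (sme_active() && is_kdump_kernel())
            ? (__force void *)ioremap_encrypted(old_devtb_phys, dev_table_size)
            : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);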

 ----------------------------------------------
| first-kernel | second-kernel | kdump support |
|      (mem_encrypt=on|off)    |   (yes|no)    |
|--------------+---------------+---------------|
|     on       |      on       |     yes       |
|     off      |      off      |     yes       |
|     on       |      off      |     no        |
|     off      |      on       |     no        |
|______________|_______________|_______________|

This patch series is only for SME kdump; it does not support SEV kdump.

For kdump with SME, there are two cases that are not supported:
1. SME is enabled in the first kernel, but SME is disabled in the
second kernel.
Because the old memory is encrypted, we can't decrypt the old memory
if SME is off in the second kernel.

2. SME is disabled in the first kernel, but SME is enabled in the
second kernel.
It is probably unnecessary to support this case: the old memory is
unencrypted and can be dumped as usual, so we don't need to enable
SME in the second kernel, and the requirement is rare in actual
deployment. Also, if we had to support this scenario, it would
increase the complexity of the code, since we would have to consider
how to transfer the SME flag from the first kernel to the second
kernel so that the second kernel knows whether the old memory is
encrypted.
There are two ways to transfer the SME flag to the second kernel. The
first is to modify the assembly code, which touches some common code
and makes the path too long. The second is to use the kexec tool: the
SME flag would be exported by the first kernel via "proc" or "sysfs",
kexec would read it when loading the image, and the flag would then
be saved in boot_params, so we could properly remap the old memory
according to the previously saved SME flag. Although we could fix
this, it is probably too expensive to do so, and we won't fix it
unless someone thinks it is necessary.

Test tools:
makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
Author: Lianbo Jiang 
Date:   Mon May 14 17:02:40 2018 +0800
Note: This patch can only dump vmcore in the case of SME enabled.

crash-7.2.1: https://github.com/crash-utility/crash.git
commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
Author: Dave Anderson 
Date:   Fri May 11 15:54:32 2018 -0400

Test environment:
HP ProLiant DL385Gen10 AMD EPYC 7251
8-Core Processor
32768 MB memory
600 GB disk space

Linux 4.18-rc3:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 021c91791a5e7e85c567452f1be3e4c2c6cb6063
Author: Linus Torvalds 
Date:   Sun Jul 1 16:04:53 2018 -0700

Reference:
AMD64 Architecture Programmer's Manual
https://support.amd.com/TechDocs/24593.pdf

Some changes:
1. remove the sme_active() check in __ioremap_caller().
2. remove the '#ifdef' stuff throughout this patch.
3. put some logic into the early_memremap_pgprot_adjust() and clean the
previous unnecessary changes, for example: arch/x86/include/asm/dmi.h,
arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c.
4. add a new file and modify Makefile.
5. clean compile warning in copy_device_table() and some compile error.
6. split the original patch into five patches, it will be better for
review.
7. add some comments.

Some known issues:
1. about SME
The upstream kernel doesn't work when we use kexec with the following
command. The system hangs on 'HP ProLiant DL385Gen10 AMD EPYC 7251', but
it can't be reproduced on speedway.
(This issue is unrelated to the kdump patches.)

Reproduce steps:
 # kexec -l /boot/vmlinuz-4.18.0-rc3+ --initrd=/boot/initramfs-4.18.0-rc3+.img 
--command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on