Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled
在 2018年07月02日 18:14, Borislav Petkov 写道: > On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote: >> @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, >> unsigned long size, >> * caller shouldn't need to know that small detail. >> */ >> static void __iomem *__ioremap_caller(resource_size_t phys_addr, >> -unsigned long size, enum page_cache_mode pcm, void *caller) >> +unsigned long size, enum page_cache_mode pcm, >> +void *caller, bool encrypted) > > So instead of sprinkling that @encrypted argument everywhere and then > setting it based on sme_active() ... > >> { >> unsigned long offset, vaddr; >> resource_size_t last_addr; >> @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t >> phys_addr, >> * resulting mapping. >> */ >> prot = PAGE_KERNEL_IO; >> -if (sev_active() && mem_flags.desc_other) >> +if ((sev_active() && mem_flags.desc_other) || encrypted) > > ... why can't you simply do your checks: > > sme_active() && is_kdump_kernel() > > here so that __ioremap_caller() can automatically choose the proper > pgprot value when ioremapping the memory in the kdump kernel? > > And this way the callers don't even have to care whether the memory is > encrypted or not? > Thank you, Boris. I'm very glad to read your comments. That's a good idea, but it has some unencrypted memory in kdump mode, for example, the elfcorehdr. In fact, the elfcorehdr and notes call the same function(read_from_oldmem->ioremap_cache), in this case, it is very difficult to properly remap the memory if the caller don't care whether the memory is encrypted. Regards, Lianbo >> prot = pgprot_encrypted(prot); >> >> switch (pcm) { > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: remove redundant variable tag
On 07/02/2018 11:37 AM, Colin King wrote: From: Colin Ian King Variable tag is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'tag' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King --- drivers/iommu/amd_iommu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 596b95c50051..c6aed1b088e9 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id, static void iommu_print_event(struct amd_iommu *iommu, void *__evt) { struct device *dev = iommu->iommu.dev; - int type, devid, pasid, flags, tag; + int type, devid, pasid, flags; volatile u32 *event = __evt; int count = 0; u64 address; @@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt) case EVENT_TYPE_INV_PPR_REQ: pasid = ((event[0] >> 16) & 0x) | ((event[1] << 6) & 0xF); - tag = event[1] & 0x03FF; dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%016llx flags=0x%04x]\n", PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), pasid, address, flags); If it's all the same to you, the intent was to print the tag as well. Could we add that to the message, rather than removing the variable, please? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/amd: remove redundant variable tag
From: Colin Ian King Variable tag is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'tag' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King --- drivers/iommu/amd_iommu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 596b95c50051..c6aed1b088e9 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -547,7 +547,7 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id, static void iommu_print_event(struct amd_iommu *iommu, void *__evt) { struct device *dev = iommu->iommu.dev; - int type, devid, pasid, flags, tag; + int type, devid, pasid, flags; volatile u32 *event = __evt; int count = 0; u64 address; @@ -615,7 +615,6 @@ static void iommu_print_event(struct amd_iommu *iommu, void *__evt) case EVENT_TYPE_INV_PPR_REQ: pasid = ((event[0] >> 16) & 0x) | ((event[1] << 6) & 0xF); - tag = event[1] & 0x03FF; dev_err(dev, "INVALID_PPR_REQUEST device=%02x:%02x.%x pasid=0x%05x address=0x%016llx flags=0x%04x]\n", PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid), pasid, address, flags); -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()
Looks good: Acked-by: Christoph Hellwig ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()
On Mon, Jul 02, 2018 at 01:53:17PM +0200, Thierry Reding wrote: > On Wed, May 30, 2018 at 04:06:24PM +0200, Thierry Reding wrote: > > From: Thierry Reding > > > > Instead of setting the DMA ops pointer to NULL, set the correct, > > non-IOMMU ops depending on the device's coherency setting. > > > > Signed-off-by: Thierry Reding > > --- > > Changes in v4: > > - new patch to fix existing arm_iommu_detach_device() to do what we need > > > > arch/arm/mm/dma-mapping.c | 12 ++-- > > 1 file changed, 6 insertions(+), 6 deletions(-) > > Christoph, Russell, > > could either of you provide an Acked-by for this? I think it makes the > most sense for Ben to pick this up into the Nouveau tree along with > patch 2/2. Looks fine to me. Acked-by: Russell King Thanks. > > Thierry > > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > > index af27f1c22d93..87a0037574e4 100644 > > --- a/arch/arm/mm/dma-mapping.c > > +++ b/arch/arm/mm/dma-mapping.c > > @@ -1151,6 +1151,11 @@ int arm_dma_supported(struct device *dev, u64 mask) > > return __dma_supported(dev, mask, false); > > } > > > > +static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) > > +{ > > + return coherent ? _coherent_dma_ops : _dma_ops; > > +} > > + > > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > > > static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long > > attrs) > > @@ -2296,7 +2301,7 @@ void arm_iommu_detach_device(struct device *dev) > > iommu_detach_device(mapping->domain, dev); > > kref_put(>kref, release_iommu_mapping); > > to_dma_iommu_mapping(dev) = NULL; > > - set_dma_ops(dev, NULL); > > + set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent)); > > > > pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev)); > > } > > @@ -2357,11 +2362,6 @@ static void arm_teardown_iommu_dma_ops(struct device > > *dev) { } > > > > #endif /* CONFIG_ARM_DMA_USE_IOMMU */ > > > > -static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) > > -{ > > - return coherent ? _coherent_dma_ops : _dma_ops; > > -} > > - > > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > > const struct iommu_ops *iommu, bool coherent) > > { > > -- > > 2.17.0 > > -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: DMA mappings and crossing boundaries
On Mon, 2018-07-02 at 14:06 +0100, Robin Murphy wrote: .../... Thanks Robin, I was starting to depair anybody would reply ;-) > > AFAIK, dma_alloc_coherent() is defined (Documentation/DMA-API- > > HOWTO.txt) as always allocating to the next power-of-2 order, so we > > should never have the problem unless we allocate a single chunk larger > > than the IOMMU page size. > > (and even then it's not *that* much of a problem, since it comes down to > just finding n > 1 consecutive unused IOMMU entries for exclusive use by > that new chunk) Yes, this case is not my biggest worry. > > For dma_map_sg() however, if a request that has a single "entry" > > spawning such a boundary, we need to ensure that the result mapping is > > 2 contiguous "large" iommu pages as well. > > > > However, that doesn't fit well with us re-using existing mappings since > > they may already exist and either not be contiguous, or partially exist > > with no free hole around them. > > > > Now, we *could* possibly construe a way to solve this by detecting this > > case and just allocating another "pair" (or set if we cross even more > > pages) of IOMMU pages elsewhere, thus partially breaking our re-use > > scheme. > > > > But while doable, this introduce some serious complexity in the > > implementation, which I would very much like to avoid. > > > > So I was wondering if you guys thought that was ever likely to happen ? > > Do you see reasonable cases where dma_map_sg() would be called with a > > list in which a single entry crosses a 256M or 1G boundary ? > > For streaming mappings of buffers cobbled together out of any old CPU > pages (e.g. user memory), you may well happen to get two > physically-adjacent pages falling either side of an IOMMU boundary, > which comprise all or part of a single request - note that whilst it's > probably less likely than the scatterlist case, this could technically > happen for dma_map_{page, single}() calls too. Could it ? I wouldn't think dma_map_page is allows to cross page boundaries ... what about single() ? The main worry is people using these things on kmalloc'ed memory > Conceptually it looks pretty easy to extend the allocation constraints > to cope with that - even the pathological worst case would have an > absolute upper bound of 3 IOMMU entries for any one physical region - > but if in practice it's a case of mapping arbitrary CPU pages to 32-bit > DMA addresses having only 4 1GB slots to play with, I can't really see a > way to make that practical :( No we are talking about 40-ish-bits of address space, so there's a bit of leeway. Of course no scheme will work if the user app tries to map more than the GPU can possibly access. But with newer AMD adding a few more bits and nVidia being at 47-bits, I think we have some margin, it's just that they can't reach our discontiguous memory with a normal 'bypass' mapping and I'd rather not teach Linux about every single way our HW can scatter memory accross nodes, so an "on demand" mechanism is by far the most flexible way to deal with all configurations. > Maybe the best compromise would be some sort of hybrid scheme which > makes sure that one of the IOMMU entries always covers the SWIOTLB > buffer, and invokes software bouncing for the awkward cases. Hrm... not too sure about that. I'm happy to limit that scheme to well known GPU vendor/device IDs, and SW bouncing is pointless in these cases. It would be nice if we could have some kind of guarantee that a single mapping or sglist entry never crossed a specific boundary though... We more/less have that for 4G already (well, we are supposed to at least). Who are the main potential problematic subsystems here ? I'm thinking network skb allocation pools ... and page cache if it tries to coalesce entries before issuing the map request, does it ? Ben. > Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: DMA mappings and crossing boundaries
Hi Ben, On 24/06/18 08:32, Benjamin Herrenschmidt wrote: Hi Folks ! So due work around issues with devices having to strict limitations in DMA address bits (GPUs ugh) on POWER, we've been playing with a mechanism that does dynamic mapping in the IOMMU but using a very large IOMMU page size (256M on POWER8 and 1G on POWER9) for performances. Now, with such page size, we can't just pop out new entries for every DMA map, we need to try to re-use entries for mappings in the same "area". We've prototypes something using refcounts on the entires. It does imply some locking which is potentially problematic, and we'll be looking at options there long run, but it works... so far. My worry is that it will fail if we ever get a mapping request (or coherent allocation request) that spawns one of those giant pages boundaries. At least our current implementation. AFAIK, dma_alloc_coherent() is defined (Documentation/DMA-API- HOWTO.txt) as always allocating to the next power-of-2 order, so we should never have the problem unless we allocate a single chunk larger than the IOMMU page size. (and even then it's not *that* much of a problem, since it comes down to just finding n > 1 consecutive unused IOMMU entries for exclusive use by that new chunk) For dma_map_sg() however, if a request that has a single "entry" spawning such a boundary, we need to ensure that the result mapping is 2 contiguous "large" iommu pages as well. However, that doesn't fit well with us re-using existing mappings since they may already exist and either not be contiguous, or partially exist with no free hole around them. Now, we *could* possibly construe a way to solve this by detecting this case and just allocating another "pair" (or set if we cross even more pages) of IOMMU pages elsewhere, thus partially breaking our re-use scheme. But while doable, this introduce some serious complexity in the implementation, which I would very much like to avoid. So I was wondering if you guys thought that was ever likely to happen ? Do you see reasonable cases where dma_map_sg() would be called with a list in which a single entry crosses a 256M or 1G boundary ? For streaming mappings of buffers cobbled together out of any old CPU pages (e.g. user memory), you may well happen to get two physically-adjacent pages falling either side of an IOMMU boundary, which comprise all or part of a single request - note that whilst it's probably less likely than the scatterlist case, this could technically happen for dma_map_{page, single}() calls too. Conceptually it looks pretty easy to extend the allocation constraints to cope with that - even the pathological worst case would have an absolute upper bound of 3 IOMMU entries for any one physical region - but if in practice it's a case of mapping arbitrary CPU pages to 32-bit DMA addresses having only 4 1GB slots to play with, I can't really see a way to make that practical :( Maybe the best compromise would be some sort of hybrid scheme which makes sure that one of the IOMMU entries always covers the SWIOTLB buffer, and invokes software bouncing for the awkward cases. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 1/2] ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()
On Wed, May 30, 2018 at 04:06:24PM +0200, Thierry Reding wrote: > From: Thierry Reding > > Instead of setting the DMA ops pointer to NULL, set the correct, > non-IOMMU ops depending on the device's coherency setting. > > Signed-off-by: Thierry Reding > --- > Changes in v4: > - new patch to fix existing arm_iommu_detach_device() to do what we need > > arch/arm/mm/dma-mapping.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) Christoph, Russell, could either of you provide an Acked-by for this? I think it makes the most sense for Ben to pick this up into the Nouveau tree along with patch 2/2. Thierry > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index af27f1c22d93..87a0037574e4 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -1151,6 +1151,11 @@ int arm_dma_supported(struct device *dev, u64 mask) > return __dma_supported(dev, mask, false); > } > > +static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) > +{ > + return coherent ? _coherent_dma_ops : _dma_ops; > +} > + > #ifdef CONFIG_ARM_DMA_USE_IOMMU > > static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long > attrs) > @@ -2296,7 +2301,7 @@ void arm_iommu_detach_device(struct device *dev) > iommu_detach_device(mapping->domain, dev); > kref_put(>kref, release_iommu_mapping); > to_dma_iommu_mapping(dev) = NULL; > - set_dma_ops(dev, NULL); > + set_dma_ops(dev, arm_get_dma_map_ops(dev->archdata.dma_coherent)); > > pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev)); > } > @@ -2357,11 +2362,6 @@ static void arm_teardown_iommu_dma_ops(struct device > *dev) { } > > #endif /* CONFIG_ARM_DMA_USE_IOMMU */ > > -static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) > -{ > - return coherent ? _coherent_dma_ops : _dma_ops; > -} > - > void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > const struct iommu_ops *iommu, bool coherent) > { > -- > 2.17.0 > signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled
On Mon, Jul 02, 2018 at 03:26:35PM +0800, Lianbo Jiang wrote: > @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, > unsigned long size, > * caller shouldn't need to know that small detail. > */ > static void __iomem *__ioremap_caller(resource_size_t phys_addr, > - unsigned long size, enum page_cache_mode pcm, void *caller) > + unsigned long size, enum page_cache_mode pcm, > + void *caller, bool encrypted) So instead of sprinkling that @encrypted argument everywhere and then setting it based on sme_active() ... > { > unsigned long offset, vaddr; > resource_size_t last_addr; > @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t > phys_addr, >* resulting mapping. >*/ > prot = PAGE_KERNEL_IO; > - if (sev_active() && mem_flags.desc_other) > + if ((sev_active() && mem_flags.desc_other) || encrypted) ... why can't you simply do your checks: sme_active() && is_kdump_kernel() here so that __ioremap_caller() can automatically choose the proper pgprot value when ioremapping the memory in the kdump kernel? And this way the callers don't even have to care whether the memory is encrypted or not? > prot = pgprot_encrypted(prot); > > switch (pcm) { -- Regards/Gruss, Boris. Good mailing practices for 400: avoid top-posting and trim the reply. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/5 V5] Help to dump the old memory encrypted into vmcore file
In kdump mode, we need to dump the old memory into vmcore file, if SME is enabled in the first kernel, we must remap the old memory in encrypted manner, which will be automatically decrypted when we read from DRAM. It helps to parse the vmcore for some tools. Signed-off-by: Lianbo Jiang --- Some changes: 1. add a new file and modify Makefile. 2. revert some code about previously using sev_active(). arch/x86/kernel/Makefile | 1 + arch/x86/kernel/crash_dump_encrypt.c | 53 fs/proc/vmcore.c | 21 ++ include/linux/crash_dump.h | 12 4 files changed, 81 insertions(+), 6 deletions(-) create mode 100644 arch/x86/kernel/crash_dump_encrypt.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 02d6f5c..afb5bad 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -96,6 +96,7 @@ obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o obj-$(CONFIG_KEXEC_FILE) += kexec-bzimage64.o obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o +obj-$(CONFIG_AMD_MEM_ENCRYPT) += crash_dump_encrypt.o obj-y += kprobes/ obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_DOUBLEFAULT) += doublefault.o diff --git a/arch/x86/kernel/crash_dump_encrypt.c b/arch/x86/kernel/crash_dump_encrypt.c new file mode 100644 index 000..e1b1a57 --- /dev/null +++ b/arch/x86/kernel/crash_dump_encrypt.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Memory preserving reboot related code. + * + * Created by: Lianbo Jiang (liji...@redhat.com) + * Copyright (C) RedHat Corporation, 2018. All rights reserved + */ + +#include +#include +#include +#include + +/** + * copy_oldmem_page_encrypted - copy one page from "oldmem encrypted" + * @pfn: page frame number to be copied + * @buf: target memory address for the copy; this can be in kernel address + * space or user address space (see @userbuf) + * @csize: number of bytes to copy + * @offset: offset in bytes into the page (based on pfn) to begin the copy + * @userbuf: if set, @buf is in user address space, use copy_to_user(), + * otherwise @buf is in kernel address space, use memcpy(). + * + * Copy a page from "oldmem encrypted". For this page, there is no pte + * mapped in the current kernel. We stitch up a pte, similar to + * kmap_atomic. + */ + +ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, + size_t csize, unsigned long offset, int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + vaddr = (__force void *)ioremap_encrypted(pfn << PAGE_SHIFT, + PAGE_SIZE); + if (!vaddr) + return -ENOMEM; + + if (userbuf) { + if (copy_to_user((void __user *)buf, vaddr + offset, csize)) { + iounmap((void __iomem *)vaddr); + return -EFAULT; + } + } else + memcpy(buf, vaddr + offset, csize); + + set_iounmap_nonlazy(); + iounmap((void __iomem *)vaddr); + return csize; +} diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c index cfb6674..07c1934 100644 --- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -25,6 +25,9 @@ #include #include #include +#include +#include +#include #include "internal.h" /* List representing chunks of contiguous memory areas and their offsets in @@ -98,7 +101,8 @@ static int pfn_is_ram(unsigned long pfn) /* Reads a page from the oldmem device from given offset. */ static ssize_t read_from_oldmem(char *buf, size_t count, - u64 *ppos, int userbuf) + u64 *ppos, int userbuf, + bool encrypted) { unsigned long pfn, offset; size_t nr_bytes; @@ -120,8 +124,11 @@ static ssize_t read_from_oldmem(char *buf, size_t count, if (pfn_is_ram(pfn) == 0) memset(buf, 0, nr_bytes); else { - tmp = copy_oldmem_page(pfn, buf, nr_bytes, - offset, userbuf); + tmp = encrypted ? copy_oldmem_page_encrypted(pfn, + buf, nr_bytes, offset, userbuf) + : copy_oldmem_page(pfn, buf, nr_bytes, + offset, userbuf); + if (tmp < 0) return tmp; } @@ -155,7 +162,7 @@ void __weak elfcorehdr_free(unsigned long long addr) */ ssize_t __weak elfcorehdr_read(char *buf, size_t count, u64 *ppos) { - return read_from_oldmem(buf, count, ppos, 0); + return read_from_oldmem(buf, count, ppos, 0, false); } /* @@ -163,7 +170,7
[PATCH 4/5 V5] Adjust some permanent mappings in unencrypted ways for kdump when SME is enabled.
For kdump, the acpi table and dmi table will need to be remapped in unencrypted ways during early init, they have just a simple wrapper around early_memremap(), but the early_memremap() remaps the memory in encrypted ways by default when SME is enabled, so we put some logic into the early_memremap_pgprot_adjust(), which will have an opportunity to adjust it. Signed-off-by: Lianbo Jiang --- arch/x86/mm/ioremap.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index e01e6c6..3c1c8c4 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -689,8 +689,17 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr, encrypted_prot = true; if (sme_active()) { + /* +* In kdump mode, the acpi table and dmi table will need to +* be remapped in unencrypted ways during early init when +* SME is enabled. They have just a simple wrapper around +* early_memremap(), but the early_memremap() remaps the +* memory in encrypted ways by default when SME is enabled, +* so we must adjust it. +*/ if (early_memremap_is_setup_data(phys_addr, size) || - memremap_is_efi_data(phys_addr, size)) + memremap_is_efi_data(phys_addr, size) || + is_kdump_kernel()) encrypted_prot = false; } -- 2.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/5 V5] Remap the device table of IOMMU in encrypted manner for kdump
In kdump mode, it will copy the device table of IOMMU from the old device table, which is encrypted when SME is enabled in the first kernel. So we must remap it in encrypted manner in order to be automatically decrypted when we read. Signed-off-by: Lianbo Jiang --- Some changes: 1. add some comments 2. clean compile warning. 3. remove unnecessary code when we clear sme mask bit. drivers/iommu/amd_iommu_init.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 904c575..4cebb00 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -888,12 +888,22 @@ static bool copy_device_table(void) } } - old_devtb_phys = entry & PAGE_MASK; + /* +* When SME is enabled in the first kernel, the entry includes the +* memory encryption mask(sme_me_mask), we must remove the memory +* encryption mask to obtain the true physical address in kdump mode. +*/ + old_devtb_phys = __sme_clr(entry) & PAGE_MASK; + if (old_devtb_phys >= 0x1ULL) { pr_err("The address of old device table is above 4G, not trustworthy!\n"); return false; } - old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + old_devtb = (sme_active() && is_kdump_kernel()) + ? (__force void *)ioremap_encrypted(old_devtb_phys, + dev_table_size) + : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB); + if (!old_devtb) return false; -- 2.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/5 V5] Add a function(ioremap_encrypted) for kdump when AMD sme enabled
It is convenient to remap the old memory encrypted to the second kernel by calling ioremap_encrypted(). Signed-off-by: Lianbo Jiang --- Some changes: 1. remove the sme_active() check in __ioremap_caller(). 2. revert some logic in the early_memremap_pgprot_adjust() for early memremap and make it separate a new patch. arch/x86/include/asm/io.h | 3 +++ arch/x86/mm/ioremap.c | 25 + 2 files changed, 20 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h index 6de6484..f8795f9 100644 --- a/arch/x86/include/asm/io.h +++ b/arch/x86/include/asm/io.h @@ -192,6 +192,9 @@ extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size); #define ioremap_cache ioremap_cache extern void __iomem *ioremap_prot(resource_size_t offset, unsigned long size, unsigned long prot_val); #define ioremap_prot ioremap_prot +extern void __iomem *ioremap_encrypted(resource_size_t phys_addr, + unsigned long size); +#define ioremap_encrypted ioremap_encrypted /** * ioremap - map bus memory into CPU space diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index c63a545..e01e6c6 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "physaddr.h" @@ -131,7 +132,8 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, * caller shouldn't need to know that small detail. */ static void __iomem *__ioremap_caller(resource_size_t phys_addr, - unsigned long size, enum page_cache_mode pcm, void *caller) + unsigned long size, enum page_cache_mode pcm, + void *caller, bool encrypted) { unsigned long offset, vaddr; resource_size_t last_addr; @@ -199,7 +201,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr, * resulting mapping. */ prot = PAGE_KERNEL_IO; - if (sev_active() && mem_flags.desc_other) + if ((sev_active() && mem_flags.desc_other) || encrypted) prot = pgprot_encrypted(prot); switch (pcm) { @@ -291,7 +293,7 @@ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC_MINUS; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_nocache); @@ -324,7 +326,7 @@ void __iomem *ioremap_uc(resource_size_t phys_addr, unsigned long size) enum page_cache_mode pcm = _PAGE_CACHE_MODE_UC; return __ioremap_caller(phys_addr, size, pcm, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL_GPL(ioremap_uc); @@ -341,7 +343,7 @@ EXPORT_SYMBOL_GPL(ioremap_uc); void __iomem *ioremap_wc(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WC, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_wc); @@ -358,14 +360,21 @@ EXPORT_SYMBOL(ioremap_wc); void __iomem *ioremap_wt(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WT, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_wt); +void __iomem *ioremap_encrypted(resource_size_t phys_addr, unsigned long size) +{ + return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB, + __builtin_return_address(0), true); +} +EXPORT_SYMBOL(ioremap_encrypted); + void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size) { return __ioremap_caller(phys_addr, size, _PAGE_CACHE_MODE_WB, - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_cache); @@ -374,7 +383,7 @@ void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size, { return __ioremap_caller(phys_addr, size, pgprot2cachemode(__pgprot(prot_val)), - __builtin_return_address(0)); + __builtin_return_address(0), false); } EXPORT_SYMBOL(ioremap_prot); -- 2.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/5 V5] Allocate pages for kdump without encryption when SME is enabled
When SME is enabled in the first kernel, we will allocate pages for kdump without encryption in order to be able to boot the second kernel in the same manner as kexec, which helps to keep the same code style. Signed-off-by: Lianbo Jiang --- Some changes: 1. remove some redundant codes for crash control pages. 2. add some comments. kernel/kexec_core.c | 12 1 file changed, 12 insertions(+) diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 23a83a4..e7efcd1 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -471,6 +471,16 @@ static struct page *kimage_alloc_crash_control_pages(struct kimage *image, } } + if (pages) { + /* +* For kdump, we need to ensure that these pages are +* unencrypted pages if SME is enabled. +* By the way, it is unnecessary to call the arch_ +* kexec_pre_free_pages(), which will make the code +* become more simple. +*/ + arch_kexec_post_alloc_pages(page_address(pages), 1 << order, 0); + } return pages; } @@ -867,6 +877,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = -ENOMEM; goto out; } + arch_kexec_post_alloc_pages(page_address(page), 1, 0); ptr = kmap(page); ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, @@ -884,6 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image, result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); kunmap(page); + arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT; goto out; -- 2.9.5 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/5 V5] Support kdump for AMD secure memory encryption(SME)
When sme enabled on AMD server, we also need to support kdump. Because the memory is encrypted in the first kernel, we will remap the old memory encrypted to the second kernel(crash kernel), and sme is also enabled in the second kernel, otherwise the old memory encrypted can not be decrypted. Because simply changing the value of a C-bit on a page will not automatically encrypt the existing contents of a page, and any data in the page prior to the C-bit modification will become unintelligible. A page of memory that is marked encrypted will be automatically decrypted when read from DRAM and will be automatically encrypted when written to DRAM. For the kdump, it is necessary to distinguish whether the memory is encrypted. Furthermore, we should also know which part of the memory is encrypted or decrypted. We will appropriately remap the memory according to the specific situation in order to tell cpu how to deal with the data(encrypted or decrypted). For example, when sme enabled, if the old memory is encrypted, we will remap the old memory in encrypted way, which will automatically decrypt the old memory encrypted when we read those data from the remapping address. -- | first-kernel | second-kernel | kdump support | | (mem_encrypt=on|off)| (yes|no)| |--+---+---| | on | on| yes | | off | off | yes | | on | off | no| | off | on| no| |__|___|___| This patch is only for SME kdump, it is not support SEV kdump. For kdump(SME), there are two cases that doesn't support: 1. SME is enabled in the first kernel, but SME is disabled in the second kernel Because the old memory is encrypted, we can't decrypt the old memory if SME is off in the second kernel. 2. SME is disabled in the first kernel, but SME is enabled in the second kernel Maybe it is unnecessary to support this case, because the old memory is unencrypted, the old memory can be dumped as usual, we don't need to enable sme in the second kernel, furthermore the requirement is rare in actual deployment. Another, If we must support the scenario, it will increase the complexity of the code, we will have to consider how to transfer the sme flag from the first kernel to the second kernel, in order to let the second kernel know that whether the old memory is encrypted. There are two manners to transfer the SME flag to the second kernel, the first way is to modify the assembly code, which includes some common code and the path is too long. The second way is to use kexec tool, which could require the sme flag to be exported in the first kernel by "proc" or "sysfs", kexec will read the sme flag from "proc" or "sysfs" when we use kexec tool to load image, subsequently the sme flag will be saved in boot_params, we can properly remap the old memory according to the previously saved sme flag. Although we can fix this issue, maybe it is too expensive to do this. By the way, we won't fix the problem unless someone thinks it is necessary to do it. Test tools: makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile commit e1de103eca8f (A draft for kdump vmcore about AMD SME) Author: Lianbo Jiang Date: Mon May 14 17:02:40 2018 +0800 Note: This patch can only dump vmcore in the case of SME enabled. crash-7.2.1: https://github.com/crash-utility/crash.git commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1) Author: Dave Anderson Date: Fri May 11 15:54:32 2018 -0400 Test environment: HP ProLiant DL385Gen10 AMD EPYC 7251 8-Core Processor 32768 MB memory 600 GB disk space Linux 4.18-rc3: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git commit 021c91791a5e7e85c567452f1be3e4c2c6cb6063 Author: Linus Torvalds Date: Sun Jul 1 16:04:53 2018 -0700 Reference: AMD64 Architecture Programmer's Manual https://support.amd.com/TechDocs/24593.pdf Some changes: 1. remove the sme_active() check in __ioremap_caller(). 2. remove the '#ifdef' stuff throughout this patch. 3. put some logic into the early_memremap_pgprot_adjust() and clean the previous unnecessary changes, for example: arch/x86/include/asm/dmi.h, arch/x86/kernel/acpi/boot.c, drivers/acpi/tables.c. 4. add a new file and modify Makefile. 5. clean compile warning in copy_device_table() and some compile error. 6. split the original patch into five patches, it will be better for review. 7. add some comments. Some known issues: 1. about SME Upstream kernel doesn't work when we use kexec in the follow command. The system will hang on 'HP ProLiant DL385Gen10 AMD EPYC 7251'. But it can't reproduce on speedway. (This issue doesn't matter with the kdump patch.) Reproduce steps: # kexec -l /boot/vmlinuz-4.18.0-rc3+ --initrd=/boot/initramfs-4.18.0-rc3+.img --command-line="root=/dev/mapper/rhel_hp--dl385g10--03-root ro mem_encrypt=on