[PATCH 3/7] x86, LLVM: suppress clang warnings about unaligned accesses
Suppress clang warnings about potential unaligned accesses to members
in packed structs. This gets rid of almost 10,000 warnings about
accesses to the ring 0 stack pointer in the TSS.

Signed-off-by: Michael Davidson
---
 arch/x86/Makefile | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 894a8d18bf97..7f21703c475d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -128,6 +128,11 @@ endif
 KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
+ifeq ($(cc-name),clang)
+# Suppress clang warnings about potential unaligned accesses.
+KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
+endif
+
 ifdef CONFIG_X86_X32
 	x32_ld_ok := $(call try-run,\
 			/bin/echo -e '1: .quad 1b' | \
-- 
2.12.0.367.g23dc2f6d3c-goog
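The warning class being disabled here can be reproduced in a few lines. Below is a hypothetical sketch (the struct and field names are mine, not the kernel's) of the pattern clang's -Waddress-of-packed-member flags, along with a memcpy-based access that avoids forming an unaligned pointer in the first place:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-in for the packed TSS layout; not the kernel's
 * actual struct. */
struct __attribute__((packed)) demo_tss {
	uint8_t  pad;	/* forces the next member onto an odd address */
	uint32_t sp0;	/* stand-in for the ring 0 stack pointer */
};

/* Passing &t->sp0 around is what clang warns about: the resulting
 * pointer silently loses the "possibly unaligned" information.
 * Copying through memcpy() never forms the unaligned pointer. */
static uint32_t read_sp0(const struct demo_tss *t)
{
	uint32_t v;

	memcpy(&v, (const uint8_t *)t + offsetof(struct demo_tss, sp0),
	       sizeof(v));
	return v;
}
```

On x86 the direct unaligned load is in fact fine, which is why the kernel simply disables the warning rather than rewriting thousands of accesses.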
[PATCH 5/7] x86, boot, LLVM: Use regparm=0 for memcpy and memset
Use the standard regparm=0 calling convention for memcpy and memset
when building with clang.

This is a work around for a long standing clang bug
(see https://llvm.org/bugs/show_bug.cgi?id=3997) where clang always
uses the standard regparm=0 calling convention for any implicit calls
to memcpy and memset that it generates (e.g. for structure assignments
and initialization) even if an alternate calling convention such as
regparm=3 has been specified.

Signed-off-by: Michael Davidson
---
 arch/x86/boot/copy.S   | 15 +++++++++++++--
 arch/x86/boot/string.h | 13 +++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/copy.S b/arch/x86/boot/copy.S
index 1eb7d298b47d..57142d1ad0d2 100644
--- a/arch/x86/boot/copy.S
+++ b/arch/x86/boot/copy.S
@@ -18,6 +18,12 @@
 	.text
 
 GLOBAL(memcpy)
+#ifdef __clang__	/* Use normal ABI calling conventions */
+	movw	4(%esp), %ax
+	movw	8(%esp), %dx
+	movw	12(%esp), %cx
+#endif
+_memcpy:
 	pushw	%si
 	pushw	%di
 	movw	%ax, %di
@@ -34,6 +40,11 @@ GLOBAL(memcpy)
 ENDPROC(memcpy)
 
 GLOBAL(memset)
+#ifdef __clang__	/* Use normal ABI calling conventions */
+	movw	4(%esp), %ax
+	movw	8(%esp), %dx
+	movw	12(%esp), %cx
+#endif
 	pushw	%di
 	movw	%ax, %di
 	movzbl	%dl, %eax
@@ -52,7 +63,7 @@ GLOBAL(copy_from_fs)
 	pushw	%ds
 	pushw	%fs
 	popw	%ds
-	calll	memcpy
+	calll	_memcpy
 	popw	%ds
 	retl
 ENDPROC(copy_from_fs)
@@ -61,7 +72,7 @@ GLOBAL(copy_to_fs)
 	pushw	%es
 	pushw	%fs
 	popw	%es
-	calll	memcpy
+	calll	_memcpy
 	popw	%es
 	retl
 ENDPROC(copy_to_fs)
diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h
index 113588ddb43f..e735cccb3fc8 100644
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -6,8 +6,21 @@
 #undef memset
 #undef memcmp
 
+/*
+ * Use normal ABI calling conventions - i.e. regparm(0) -
+ * for memcpy() and memset() if we are building the real
+ * mode setup code with clang since clang may make implicit
+ * calls to these functions that assume regparm(0).
+ */
+#if defined(_SETUP) && defined(__clang__)
+void __attribute__((regparm(0))) *memcpy(void *dst, const void *src,
+					 size_t len);
+void __attribute__((regparm(0))) *memset(void *dst, int c, size_t len);
+#else
 void *memcpy(void *dst, const void *src, size_t len);
 void *memset(void *dst, int c, size_t len);
+#endif
+
 int memcmp(const void *s1, const void *s2, size_t len);
 
 /*
-- 
2.12.0.367.g23dc2f6d3c-goog
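The "implicit call" half of the bug is easy to see in plain C. A hedged sketch (the struct and function names are illustrative): no memcpy() appears in the source, yet the compiler is free to lower the struct assignment to a call to it, and clang emits that call assuming the standard regparm=0 convention regardless of -mregparm=3:

```c
#include <assert.h>
#include <string.h>

/* A struct large enough that compilers typically lower the assignment
 * below to an out-of-line memcpy() call rather than inline moves. */
struct big {
	char buf[256];
};

static void copy_big(struct big *dst, const struct big *src)
{
	/* No memcpy() is written here, but the compiler may generate a
	 * call to it anyway - the implicit call whose calling
	 * convention this patch pins down for clang. */
	*dst = *src;
}
```

Since the compiler owns that call site, the only portable fix is to make the real-mode memcpy/memset accept the convention clang assumes, which is what the #ifdef __clang__ argument-loading stubs above do.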
[PATCH 0/7] LLVM: make x86_64 kernel build with clang.
This patch set is sufficient to get the x86_64 kernel to build and boot
correctly with clang-3.8 or greater.

The resulting build still has about 300 warnings, very few of which
appear to be significant. Most of them should be fixable with some
minor code refactoring, although a few of them, such as the complaints
about implicit conversions between enumerated types, may be candidates
for just being disabled.

Michael Davidson (7):
  Makefile, LLVM: add -no-integrated-as to KBUILD_[AC]FLAGS
  Makefile, x86, LLVM: disable unsupported optimization flags
  x86, LLVM: suppress clang warnings about unaligned accesses
  x86, boot, LLVM: #undef memcpy etc in string.c
  x86, boot, LLVM: Use regparm=0 for memcpy and memset
  md/raid10, LLVM: get rid of variable length array
  crypto, x86, LLVM: aesni - fix token pasting

 Makefile                                |  4 ++++
 arch/x86/Makefile                       |  7 +++++++
 arch/x86/boot/copy.S                    | 15 +++++++++++++--
 arch/x86/boot/string.c                  |  9 +++++++++
 arch/x86/boot/string.h                  | 13 +++++++++++++
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |  7 ++-----
 drivers/md/raid10.c                     |  9 ++++-----
 7 files changed, 52 insertions(+), 12 deletions(-)

-- 
2.12.0.367.g23dc2f6d3c-goog
[PATCH 2/7] Makefile, x86, LLVM: disable unsupported optimization flags
Unfortunately, while clang generates a warning about these flags being
unsupported, it still exits with a status of 0, so we have to explicitly
disable them instead of just using a cc-option check.

Signed-off-by: Michael Davidson
---
 Makefile          | 2 ++
 arch/x86/Makefile | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/Makefile b/Makefile
index b21fd0ca2946..5e97e5fc1eea 100644
--- a/Makefile
+++ b/Makefile
@@ -629,7 +629,9 @@ ARCH_AFLAGS :=
 ARCH_CFLAGS :=
 include arch/$(SRCARCH)/Makefile
 
+ifneq ($(cc-name),clang)
 KBUILD_CFLAGS	+= $(call cc-option,-fno-delete-null-pointer-checks,)
+endif
 KBUILD_CFLAGS	+= $(call cc-disable-warning,frame-address,)
 
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d449337a360..894a8d18bf97 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -87,11 +87,13 @@ else
         KBUILD_AFLAGS += -m64
         KBUILD_CFLAGS += -m64
 
+ifneq ($(cc-name),clang)
         # Align jump targets to 1 byte, not the default 16 bytes:
         KBUILD_CFLAGS += -falign-jumps=1
 
         # Pack loops tightly as well:
         KBUILD_CFLAGS += -falign-loops=1
+endif
 
         # Don't autogenerate traditional x87 instructions
         KBUILD_CFLAGS += $(call cc-option,-mno-80387)
-- 
2.12.0.367.g23dc2f6d3c-goog
[PATCH 4/7] x86, boot, LLVM: #undef memcpy etc in string.c
#undef memcpy and friends in boot/string.c so that the functions
defined here will have the correct names, otherwise we end up trying
to redefine __builtin_memcpy etc.

Surprisingly, gcc allows this (and, helpfully, discards the __builtin_
prefix from the function name when compiling it), but clang does not.

Adding these #undef's appears to preserve what I assume was the
original intent of the code.

Signed-off-by: Michael Davidson
---
 arch/x86/boot/string.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/boot/string.c b/arch/x86/boot/string.c
index 5457b02fc050..b40266850869 100644
--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -16,6 +16,15 @@
 #include "ctype.h"
 #include "string.h"
 
+/*
+ * Undef these macros so that the functions that we provide
+ * here will have the correct names regardless of how string.h
+ * may have chosen to #define them.
+ */
+#undef memcpy
+#undef memset
+#undef memcmp
+
 int memcmp(const void *s1, const void *s2, size_t len)
 {
 	bool diff;
-- 
2.12.0.367.g23dc2f6d3c-goog
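The name collision can be sketched outside the kernel. Assuming a header that maps the name to the builtin (as boot code headers may for inlining), the #undef is what lets the out-of-line definition get the plain name rather than expanding into a definition of __builtin_memcpy:

```c
#include <assert.h>
#include <stddef.h>

/* A header might alias the name to the builtin: */
#define memcpy __builtin_memcpy

/* Without this #undef, the definition below would expand to an
 * attempt to define a function literally named __builtin_memcpy,
 * which clang rejects. */
#undef memcpy

void *memcpy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}
```

In a freestanding boot image this definition is the only memcpy there is; in hosted code it merely shadows libc's for the test below.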
[PATCH 6/7] md/raid10, LLVM: get rid of variable length array
Replace a variable length array in a struct by allocating the memory
for the entire struct in a char array on the stack.

Signed-off-by: Michael Davidson
---
 drivers/md/raid10.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 063c43d83b72..158ebdff782c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4654,11 +4654,10 @@ static int handle_reshape_read_error(struct mddev *mddev,
 	/* Use sync reads to get the blocks from somewhere else */
 	int sectors = r10_bio->sectors;
 	struct r10conf *conf = mddev->private;
-	struct {
-		struct r10bio r10_bio;
-		struct r10dev devs[conf->copies];
-	} on_stack;
-	struct r10bio *r10b = &on_stack.r10_bio;
+	char on_stack_r10_bio[sizeof(struct r10bio) +
+			      conf->copies * sizeof(struct r10dev)]
+		__aligned(__alignof__(struct r10bio));
+	struct r10bio *r10b = (struct r10bio *)on_stack_r10_bio;
 	int slot = 0;
 	int idx = 0;
 	struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
-- 
2.12.0.367.g23dc2f6d3c-goog
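The replacement pattern is general enough to demonstrate standalone. A sketch with stand-in struct names (not the raid10 types): a plain char VLA is standard C99, while the variably sized array *inside* a struct type that the patch removes is a GNU extension clang rejects. The char buffer is aligned for the header type and used through a cast:

```c
#include <assert.h>

/* Illustrative stand-ins for r10bio/r10dev. */
struct dev { int n; };
struct bio {
	int sectors;
	struct dev devs[];	/* flexible array member */
};

static int sum_devs(int copies)
{
	/* The patch's pattern: a runtime-sized char array covering the
	 * header plus 'copies' trailing elements, aligned as the
	 * header type requires, then accessed through a cast. */
	char storage[sizeof(struct bio) + copies * sizeof(struct dev)]
		__attribute__((aligned(__alignof__(struct bio))));
	struct bio *b = (struct bio *)storage;
	int i, sum = 0;

	b->sectors = copies;
	for (i = 0; i < copies; i++)
		b->devs[i].n = i + 1;
	for (i = 0; i < copies; i++)
		sum += b->devs[i].n;
	return sum;
}
```

The explicit aligned attribute matters: a bare char array carries no alignment guarantee, so the cast alone would be undefined behavior on stricter targets.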
[PATCH 7/7] crypto, x86, LLVM: aesni - fix token pasting
aes_ctrby8_avx-x86_64.S uses the C preprocessor for token pasting of
character sequences that are not valid preprocessor tokens. While this
is allowed when preprocessing assembler files, it exposes an
incompatibility between the clang and gcc preprocessors where clang
does not strip leading white space from macro parameters, leading to
the CONCAT(%xmm, i) macro expansion on line 96 resulting in a token
with a space character embedded in it.

While this could be fixed by deleting the offending space character,
the assembler is perfectly capable of handling the token pasting
correctly for itself, so it seems preferable to let it do so and to
get rid of the CONCAT(), DDQ() and XMM() preprocessor macros.

Signed-off-by: Michael Davidson
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a916c4a61165..5f6a5af9c489 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -65,7 +65,6 @@
 #include
 #include
 
-#define CONCAT(a,b)	a##b
 #define VMOVDQ		vmovdqu
 
 #define xdata0		%xmm0
@@ -92,8 +91,6 @@
 #define num_bytes	%r8
 
 #define tmp		%r10
-#define	DDQ(i)		CONCAT(ddq_add_,i)
-#define	XMM(i)		CONCAT(%xmm, i)
 #define	DDQ_DATA	0
 #define	XDATA		1
 #define KEY_128		1
@@ -131,12 +128,12 @@ ddq_add_8:
 
 /* generate a unique variable for ddq_add_x */
 
 .macro setddq n
-	var_ddq_add = DDQ(\n)
+	var_ddq_add = ddq_add_\n
 .endm
 
 /* generate a unique variable for xmm register */
 .macro setxdata n
-	var_xdata = XMM(\n)
+	var_xdata = %xmm\n
 .endm
 
 /* club the numeric 'id' to the symbol 'name' */
-- 
2.12.0.367.g23dc2f6d3c-goog
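For contrast, here is a minimal example of the two-level paste that does work, because both halves are valid preprocessor tokens. `%xmm` is not a valid token (the `%` stands alone), which is why pasting it is only quasi-legal in assembler preprocessing and why the patch hands the job to the assembler's own `.macro` substitution instead:

```c
#include <assert.h>

/* The pattern the patch removes, shown in its well-defined form:
 * ## pasting an identifier fragment with a number is fine. */
#define CONCAT(a, b) a##b
#define DDQ(i) CONCAT(ddq_add_, i)

/* DDQ(3) expands to the single identifier ddq_add_3. */
static int DDQ(3) = 3;
```

Pasting `%xmm` with `i` the same way produces, at best, two adjacent tokens; clang additionally keeps the leading space from the macro argument, yielding `%xmm 3`-style output the assembler rejects.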
[PATCH 1/7] Makefile, LLVM: add -no-integrated-as to KBUILD_[AC]FLAGS
Add -no-integrated-as to KBUILD_AFLAGS and KBUILD_CFLAGS for clang.

Signed-off-by: Michael Davidson
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index b841fb36beb2..b21fd0ca2946 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,8 @@ KBUILD_CFLAGS += $(call cc-disable-warning, tautological-compare)
 # See modpost pattern 2
 KBUILD_CFLAGS += $(call cc-option, -mno-global-merge,)
 KBUILD_CFLAGS += $(call cc-option, -fcatch-undefined-behavior)
+KBUILD_CFLAGS += $(call cc-option, -no-integrated-as)
+KBUILD_AFLAGS += $(call cc-option, -no-integrated-as)
 else
 
 # These warnings generated too much noise in a regular build.
-- 
2.12.0.367.g23dc2f6d3c-goog
Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support
On Thu, 2017-03-16 at 14:13 +, Horia Geantă wrote:
> On 3/16/2017 10:43 AM, Herbert Xu wrote:
> > On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote:
> > > The patchset adds support for CAAM Queue Interface (QI), the additional
> > > interface (besides job ring) available for submitting jobs to the engine
> > > on platforms having DPAA (Datapath Acceleration Architecture).
> > >
> > > Patches 1-4 are QMan dependencies.
> > > I would prefer to take them through the crypto tree,
> > > but I am open to suggestions.
> > >
> > > Patch 5 adds a missing double inclusion guard in desc_constr.h.
> > >
> > > Patch 6 adds the caam/qi job submission backend.
> > >
> > > Patch 7 adds algorithms (ablkcipher and authenc) that run on top
> > > of caam/qi. For now, their priority is set lower than caam/jr.
> >
> > I'm fine with the crypto bits.
>
> Herbert,
>
> Thanks for the review.
> Should I submit formally, i.e. without the [RFC] prefix?
>
> Scott, Roy,
>
> Do you have any comments wrt. soc/qman patches or with them going
> through the crypto tree?

Going through the crypto tree is fine.

-Scott
Re: [RFC PATCH v2 08/32] x86: Use PAGE_KERNEL protection for ioremap of memory page
On 3/7/2017 8:59 AM, Borislav Petkov wrote:
> On Thu, Mar 02, 2017 at 10:13:32AM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky
>>
>> In order for memory pages to be properly mapped when SEV is active, we
>> need to use the PAGE_KERNEL protection attribute as the base
>> protection. This will insure that memory mapping of, e.g. ACPI tables,
>> receives the proper mapping attributes.
>>
>> Signed-off-by: Tom Lendacky
>> ---
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index c400ab5..481c999 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -151,7 +151,15 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
>>  		pcm = new_pcm;
>>  	}
>>
>> +	/*
>> +	 * If the page being mapped is in memory and SEV is active then
>> +	 * make sure the memory encryption attribute is enabled in the
>> +	 * resulting mapping.
>> +	 */
>>  	prot = PAGE_KERNEL_IO;
>> +	if (sev_active() && page_is_mem(pfn))
>
> Hmm, a resource tree walk per ioremap call. This could get expensive for
> ioremap-heavy workloads.
>
> __ioremap_caller() gets called here during boot 55 times so not a whole
> lot but I wouldn't be surprised if there were some nasty use cases which
> ioremap a lot.
>
> ...
>
>> diff --git a/kernel/resource.c b/kernel/resource.c
>> index 9b5f044..db56ba3 100644
>> --- a/kernel/resource.c
>> +++ b/kernel/resource.c
>> @@ -518,6 +518,46 @@ int __weak page_is_ram(unsigned long pfn)
>>  }
>>  EXPORT_SYMBOL_GPL(page_is_ram);
>>
>> +/*
>> + * This function returns true if the target memory is marked as
>> + * IORESOURCE_MEM and IORESOUCE_BUSY and described as other than
>> + * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
>> + */
>> +static int walk_mem_range(unsigned long start_pfn, unsigned long nr_pages)
>> +{
>> +	struct resource res;
>> +	unsigned long pfn, end_pfn;
>> +	u64 orig_end;
>> +	int ret = -1;
>> +
>> +	res.start = (u64) start_pfn << PAGE_SHIFT;
>> +	res.end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
>> +	res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
>> +	orig_end = res.end;
>> +	while ((res.start < res.end) &&
>> +	       (find_next_iomem_res(&res, IORES_DESC_NONE, true) >= 0)) {
>> +		pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT;
>> +		end_pfn = (res.end + 1) >> PAGE_SHIFT;
>> +		if (end_pfn > pfn)
>> +			ret = (res.desc != IORES_DESC_NONE) ? 1 : 0;
>> +		if (ret)
>> +			break;
>> +		res.start = res.end + 1;
>> +		res.end = orig_end;
>> +	}
>> +	return ret;
>> +}
>
> So the relevant difference between this one and walk_system_ram_range()
> is this:
>
> -		ret = (*func)(pfn, end_pfn - pfn, arg);
> +		ret = (res.desc != IORES_DESC_NONE) ? 1 : 0;
>
> so it seems to me you can have your own *func() pointer which does that
> IORES_DESC_NONE comparison.
>
> And then you can define your own workhorse __walk_memory_range() which
> gets called by both walk_mem_range() and walk_system_ram_range() instead
> of almost duplicating them.
>
> And looking at walk_system_ram_res(), that one looks similar too except
> the pfn computation. But AFAICT the pfn/end_pfn things are computed from
> res.start and res.end so it looks to me like all those three functions
> are crying for unification...

I'll take a look at what it takes to consolidate these with a pre-patch.
Then I'll add the new support.

Thanks,
Tom
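Boris's consolidation suggestion can be sketched in miniature with toy types (not the kernel's resource tree): one generic walker taking a callback, so that the IORES_DESC_NONE comparison and the "is this RAM?" check become small callbacks instead of near-duplicate loops. All names here are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an iomem resource range. */
struct res { unsigned long start, end; int desc; };

/* The single workhorse: walk ranges until a callback returns
 * nonzero, mirroring the __walk_memory_range() idea. */
static int walk_ranges(const struct res *r, size_t n,
		       int (*fn)(const struct res *r, void *arg), void *arg)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n && !ret; i++)
		ret = fn(&r[i], arg);
	return ret;
}

/* The would-be walk_mem_range() body, reduced to a callback. */
static int desc_is_not_none(const struct res *r, void *arg)
{
	(void)arg;
	return r->desc != 0 /* stand-in for IORES_DESC_NONE */;
}
```

A second callback implementing the RAM check would reuse the same walker, which is the whole point of the suggested unification.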
Re: [RFC 4/7] soc/qman: add helper functions needed by caam/qi driver
On Fri, 2017-03-03 at 16:52 +0200, Horia Geantă wrote:
> Add helper functions, macros, #defines for accessing / enabling
> qman functionality from caam/qi driver, such that this driver
> is not aware of the need for data conversion to big endian.

Why? I complained about that (probably internally) when this driver was
originally submitted.

Having a bunch of accessors in a header file that just do an assignment
with an endian conversion just obfuscates things. The driver still
needs to know that the conversion needs to happen, or else it wouldn't
know that the fields can't be accessed directly... and it gets
particularly ridiculous when a single field has a growing pile of
accessors depending on exactly what flags you want to set (e.g.
qm_sg_entry_set_*). Just open-code it unless an accessor is justified
by the call sites getting unwieldy or numerous.

It looks like GCC 6 has added an attribute (scalar_storage_order) that
could be used on structs to avoid having to manually swap such things.
I look forward to the day when GCC 6 is old enough for the kernel to
depend on this.

-Scott
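The conversion such accessors wrap is small enough to open-code, which is Scott's point. A host-endian-independent sketch of a big-endian 32-bit store, the kind of operation hidden behind the helpers in question (the function name is mine):

```c
#include <stdint.h>

/* Store a 32-bit value in big-endian byte order, the hardware-visible
 * layout QMan descriptors expect. Writing it at the call site keeps
 * the byte-order requirement visible instead of burying it in a
 * one-line accessor. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
}
```

In the kernel the same thing is spelled `cpu_to_be32()` plus an assignment, which is exactly the "just do an assignment with an endian conversion" Scott would rather see written out.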
Re: [RFC PATCH v2 05/32] x86: Use encrypted access of BOOT related data with SEV
On 3/7/2017 5:09 AM, Borislav Petkov wrote:
> On Thu, Mar 02, 2017 at 10:12:59AM -0500, Brijesh Singh wrote:
>> From: Tom Lendacky
>>
>> When Secure Encrypted Virtualization (SEV) is active, BOOT data (such
>> as EFI related data, setup data) is encrypted and needs to be accessed
>> as such when mapped. Update the architecture override in early_memremap
>> to keep the encryption attribute when mapping this data.
>
> This could also explain why persistent memory needs to be accessed
> decrypted with SEV.

I'll add some comments about why persistent memory needs to be accessed
decrypted (because the encryption key changes across reboots) for both
SME and SEV.

> In general, what the difference in that aspect is in respect to SME.
> And I'd write that in the comment over the function.
>
> And not say "E820 areas are checked in making this determination."
> because that is visible but say *why* we need to check those ranges and
> determine access depending on their type.

Will do.

Thanks,
Tom
Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command
On 03/16/2017 05:54 AM, Paolo Bonzini wrote:
> On 02/03/2017 16:18, Brijesh Singh wrote:
>> +static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src,
>> +				  void *dst, int *error)
>> +{
>> +	inpages = sev_pin_memory(src, PAGE_SIZE, &npages);
>> +	if (!inpages) {
>> +		ret = -ENOMEM;
>> +		goto err_1;
>> +	}
>> +
>> +	data->handle = sev_get_handle(kvm);
>> +	data->dst_addr = __psp_pa(dst);
>> +	data->src_addr = __sev_page_pa(inpages[0]);
>> +	data->length = PAGE_SIZE;
>> +
>> +	ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error);
>> +	if (ret)
>> +		printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n",
>> +		       ret, *error);
>> +	sev_unpin_memory(inpages, npages);
>> +err_1:
>> +	kfree(data);
>> +	return ret;
>> +}
>> +
>> +static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
>> +{
>> +	void *data;
>> +	int ret, offset, len;
>> +	struct kvm_sev_dbg debug;
>> +
>> +	if (!sev_guest(kvm))
>> +		return -ENOTTY;
>> +
>> +	if (copy_from_user(&debug, (void *)argp->data,
>> +			   sizeof(struct kvm_sev_dbg)))
>> +		return -EFAULT;
>> +
>> +	/*
>> +	 * TODO: add support for decrypting length which crosses the
>> +	 * page boundary.
>> +	 */
>> +	offset = debug.src_addr & (PAGE_SIZE - 1);
>> +	if (offset + debug.length > PAGE_SIZE)
>> +		return -EINVAL;
>> +
>
> Please do add it, it doesn't seem very different from what you're doing
> in LAUNCH_UPDATE_DATA. There's no need for a separate
> __sev_dbg_decrypt_page function, you can just pin/unpin here and do a
> per-page loop as in LAUNCH_UPDATE_DATA.

I can certainly add support to handle crossing the page boundary cases.
Should we limit the size to prevent user passing arbitrary long length
and we end up looping inside the kernel? I was thinking to limit to a
PAGE_SIZE.

~ Brijesh
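The per-page loop Paolo suggests amounts to splitting [addr, addr + len) into chunks that never cross a page boundary. A hedged sketch, with an assumed 4 KiB page and an illustrative function name (not the eventual kernel code):

```c
#define PAGE_SIZE 4096UL

/* Split [addr, addr + len) into page-bounded chunks, invoking fn()
 * once per chunk (in the real code, one DBG_DECRYPT per chunk).
 * Returns the number of chunks processed. */
static unsigned long for_each_page_chunk(unsigned long addr,
					 unsigned long len,
					 void (*fn)(unsigned long addr,
						    unsigned long n))
{
	unsigned long chunks = 0;

	while (len) {
		unsigned long off = addr & (PAGE_SIZE - 1);
		unsigned long n = PAGE_SIZE - off;

		if (n > len)
			n = len;
		if (fn)
			fn(addr, n);	/* process one in-page span */
		addr += n;
		len -= n;
		chunks++;
	}
	return chunks;
}
```

The first chunk absorbs the sub-page offset, so every later iteration starts page-aligned, which is also why the offset special-casing disappears once the loop exists.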
Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command
On 03/16/2017 06:03 AM, Paolo Bonzini wrote:
> On 02/03/2017 16:18, Brijesh Singh wrote:
>> +	data = (void *) get_zeroed_page(GFP_KERNEL);
>
> The page does not need to be zeroed, does it?

No, we don't have to zero it. I will fix it.

>> +	if ((len & 15) || (dst_addr & 15)) {
>> +		/* if destination address and length are not 16-byte
>> +		 * aligned then:
>> +		 *  a) decrypt destination page into temporary buffer
>> +		 *  b) copy source data into temporary buffer at correct offset
>> +		 *  c) encrypt temporary buffer
>> +		 */
>> +		ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);
>
> Ah, I see now you're using this function here for read-modify-write.
> data is already pinned here, so even if you keep the function it makes
> sense to push pinning out of __sev_dbg_decrypt_page and into
> sev_dbg_decrypt.

I can push the pinning part outside __sev_dbg_decrypt_page.

>> +		if (ret)
>> +			goto err_3;
>> +		d_off = dst_addr & (PAGE_SIZE - 1);
>> +
>> +		if (copy_from_user(data + d_off,
>> +				   (uint8_t *)debug.src_addr, len)) {
>> +			ret = -EFAULT;
>> +			goto err_3;
>> +		}
>> +
>> +		encrypt->length = PAGE_SIZE;
>
> Why decrypt/re-encrypt all the page instead of just the 16 byte area
> around the [dst_addr, dst_addr+len) range?

Good catch, I should be fine just decrypting a 16 byte area. Will fix
in next rev.

>> +		encrypt->src_addr = __psp_pa(data);
>> +		encrypt->dst_addr = __sev_page_pa(inpages[0]);
>> +	} else {
>> +		if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
>> +			ret = -EFAULT;
>> +			goto err_3;
>> +		}
>
> Do you need copy_from_user, or can you just pin/unpin memory as for
> DEBUG_DECRYPT?

We can work either with pin/unpin or copy_from_user. I think I chose
copy_from_user because most of the time the ENCRYPT path was used when
I set a breakpoint through gdb, which basically requires copying pretty
small data into guest memory. It may be very much possible that someone
can try to copy a lot more data, and then pin/unpin can speed things up.

-Brijesh
Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
On Fri, Mar 10, 2017 at 04:41:56PM -0600, Brijesh Singh wrote:
> I can take a look at fixing those warning. In my initial attempt was to
> create a new function to clear encryption bit but it ended up looking
> very similar to __change_page_attr_set_clr() hence decided to extend
> the exiting function to use memblock_alloc().

... except that having all that SEV-specific code in main code paths is
yucky and I'd like to avoid it, if possible.

> Early in boot process, guest kernel allocates some structure (its
> either statically allocated or dynamic allocated via memblock_alloc).
> And shares the physical address of these structure with hypervisor.
> Since entire guest memory area is mapped as encrypted hence those
> structure's are mapped as encrypted memory range. We need a method to
> clear the encryption bit. Sometime these structure maybe part of 2M
> pages and need to split into smaller pages.

So how hard would it be if the hypervisor allocated that memory for the
guest instead? It would allocate it decrypted and guest would need to
access it decrypted too. All in preparation for SEV-ES which will need
a block of unencrypted memory for the guest anyway...

> In most cases, guest and hypervisor communication starts as soon as
> guest provides the physical address to hypervisor. So we must map the
> pages as decrypted before sharing the physical address to hypervisor.

See above: so purely theoretically speaking, the hypervisor could prep
that decrypted range for the guest.

I'd look in Paolo's direction, though, for the feasibility of something
like that.

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Re: [PATCH v3 1/3] clk: meson-gxbb: expose clock CLKID_RNG0
Hi Herbert,

Herbert Xu writes:

> On Wed, Feb 22, 2017 at 07:55:24AM +0100, Heiner Kallweit wrote:
>> Expose clock CLKID_RNG0 which is needed for the HW random number generator.
>>
>> Signed-off-by: Heiner Kallweit
>
> All patches applied. Thanks.

Actually, can you just apply [PATCH 4/4] to your tree? The clock and DT
patches need to go through their respective trees or will otherwise
have conflicts with other things going in via those trees.

Thanks,

Kevin
Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command
On 03/16/2017 05:48 AM, Paolo Bonzini wrote:
> On 02/03/2017 16:17, Brijesh Singh wrote:
>> +static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
>> +				    unsigned long *n)
>> +{
>> +	struct page **pages;
>> +	int first, last;
>> +	unsigned long npages, pinned;
>> +
>> +	/* Get number of pages */
>> +	first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
>> +	last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
>> +	npages = (last - first + 1);
>> +
>> +	pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
>> +	if (!pages)
>> +		return NULL;
>> +
>> +	/* pin the user virtual address */
>> +	down_read(&current->mm->mmap_sem);
>> +	pinned = get_user_pages_fast(uaddr, npages, 1, pages);
>> +	up_read(&current->mm->mmap_sem);
>
> get_user_pages_fast, like get_user_pages_unlocked, must be called
> without mmap_sem held.

Sure.

>> +	if (pinned != npages) {
>> +		printk(KERN_ERR "SEV: failed to pin %ld pages (got %ld)\n",
>> +		       npages, pinned);
>> +		goto err;
>> +	}
>> +
>> +	*n = npages;
>> +	return pages;
>> +err:
>> +	if (pinned > 0)
>> +		release_pages(pages, pinned, 0);
>> +	kfree(pages);
>> +
>> +	return NULL;
>> +}
>
>> +	/* the array of pages returned by get_user_pages() is a page-aligned
>> +	 * memory. Since the user buffer is probably not page-aligned, we need
>> +	 * to calculate the offset within a page for first update entry.
>> +	 */
>> +	offset = uaddr & (PAGE_SIZE - 1);
>> +	len = min_t(size_t, (PAGE_SIZE - offset), ulen);
>> +	ulen -= len;
>> +
>> +	/* update first page -
>> +	 * special care need to be taken for the first page because we might
>> +	 * be dealing with offset within the page
>> +	 */
>
> No need to special case the first page; just set "offset = 0" inside
> the loop after the first iteration.

Will do.

-Brijesh
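The first/last/npages computation quoted above is self-contained enough to check on its own. A sketch with an assumed 4 KiB page size, using the same masking arithmetic as sev_pin_memory():

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Number of pages spanned by the user buffer [uaddr, uaddr + ulen):
 * round the start down and the last byte down to page frames, then
 * count frames inclusively. Assumes ulen > 0. */
static unsigned long span_pages(unsigned long uaddr, unsigned long ulen)
{
	unsigned long first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
	unsigned long last  = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;

	return last - first + 1;
}
```

The `- 1` on the end address is what keeps an exactly page-sized, page-aligned buffer from being counted as two pages.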
Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active
On 03/16/2017 05:38 AM, Paolo Bonzini wrote:
> On 02/03/2017 16:18, Brijesh Singh wrote:
>> The SEV memory encryption engine uses a tweak such that two identical
>> plaintexts at different location will have a different ciphertexts.
>> So swapping or moving ciphertexts of two pages will not result in
>> plaintexts being swapped. Relocating (or migrating) a physical backing
>> pages for SEV guest will require some additional steps. The current
>> SEV key management spec [1] does not provide commands to swap or
>> migrate (move) ciphertexts. For now we pin the memory allocated for
>> the SEV guest. In future when SEV key management spec provides the
>> commands to support the page migration we can update the KVM code to
>> remove the pinning logic without making any changes into userspace
>> (qemu).
>>
>> The patch pins userspace memory when a new slot is created and unpins
>> the memory when slot is removed.
>>
>> [1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf
>
> This is not enough, because memory can be hidden temporarily from the
> guest and remapped later. Think of a PCI BAR that is backed by RAM, or
> also SMRAM. The pinning must be kept even in that case.
>
> You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
> to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM. In
> QEMU you can use a RAMBlockNotifier to invoke the ioctls.

I was hoping to avoid adding a new ioctl, but I see your point. Will
add a pair of ioctls and use a RAMBlockNotifier to invoke them.

-Brijesh
Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
On Thu, Mar 16, 2017 at 11:11:26AM -0500, Tom Lendacky wrote:
> Not quite. The guest still needs to understand about the encryption mask
> so that it can protect memory by setting the encryption mask in the
> pagetable entries. It can also decide when to share memory with the
> hypervisor by not setting the encryption mask in the pagetable entries.

Ok, so the kernel - by that I mean both the baremetal and guest kernel -
needs to know whether we're encrypting stuff. So it needs to know about
SME.

> "Instruction fetches are always decrypted under SEV" means that,
> regardless of how a virtual address is mapped, encrypted or decrypted,
> if an instruction fetch is performed by the CPU from that address it
> will always be decrypted. This is to prevent the hypervisor from
> injecting executable code into the guest since it would have to be
> valid encrypted instructions.

Ok, so the guest needs to map its pages encrypted.

Which reminds me, KSM might be a PITA to enable with SEV but that's a
different story. :)

> There are many areas that use the same logic, but there are certain
> situations where we need to check between SME vs SEV (e.g. DMA operation
> setup or decrypting the trampoline area) and act accordingly.

Right, and I'd like to keep those areas where it differs at minimum and
nicely cordoned off from the main paths.

So looking back at the current patch in this subthread: we do check

* CPUID 0x40000000
* 8000_001F[EAX] for SME
* 8000_001F[EBX][5:0] for the encryption bits.

So how about we generate the following CPUID picture for the guest:

CPUID_Fn8000_001F_EAX = ...10b

That is, SME bit is cleared, SEV is set. This will mean for the guest
kernel that SEV is enabled and you can avoid yourself the 0x40000000
leaf check and the additional KVM feature bit glue.

10b configuration will be invalid for baremetal as - I'm assuming - you
can't have SEV=1b with SME=0b. It will be a virt-only configuration and
this way you can even avoid the hypervisor-specific detection but do
that for all.

Hmmm?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
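The proposed "...10b" encoding is easy to pin down in code. A sketch (macro and function names are mine) decoding CPUID 8000_001F[EAX], where bit 0 reports SME and bit 1 reports SEV:

```c
#include <stdint.h>

/* CPUID Fn8000_001F EAX feature bits, as discussed in the thread:
 * bit 0 = SME, bit 1 = SEV. */
#define FEAT_SME (1u << 0)
#define FEAT_SEV (1u << 1)

/* Boris's proposal: a guest sees SEV=1 with SME=0, a combination that
 * cannot occur on bare metal, so this test alone identifies an SEV
 * guest without any hypervisor-leaf (0x40000000) probing. */
static int is_sev_guest_config(uint32_t eax)
{
	return (eax & (FEAT_SME | FEAT_SEV)) == FEAT_SEV;
}
```

Bare metal would report SME=1 (with SEV possibly set too), so both 01b and 11b fall through to the existing SME paths.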
Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
On 3/16/2017 10:09 AM, Borislav Petkov wrote:
> On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:
>> Because there are differences between how SME and SEV behave
>> (instruction fetches are always decrypted under SEV, DMA to an
>> encrypted location is not supported under SEV, etc.) we need to
>> determine which mode we are in so that things can be setup properly
>> during boot. For example, if SEV is active the kernel will already
>> be encrypted and so we don't perform that step or the trampoline area
>> for bringing up an AP must be decrypted for SME but encrypted for SEV.
>
> So with SEV enabled, it seems to me a guest doesn't know anything about
> encryption and can run as if SME is disabled. So sme_active() will be
> false. And then the kernel can bypass all that code dealing with SME.
>
> So a guest should simply run like on baremetal with no SME, IMHO.

Not quite. The guest still needs to understand about the encryption mask
so that it can protect memory by setting the encryption mask in the
pagetable entries. It can also decide when to share memory with the
hypervisor by not setting the encryption mask in the pagetable entries.

> But then there's that part: "instruction fetches are always decrypted
> under SEV". What does that mean exactly? And how much of that code can

"Instruction fetches are always decrypted under SEV" means that,
regardless of how a virtual address is mapped, encrypted or decrypted,
if an instruction fetch is performed by the CPU from that address it
will always be decrypted. This is to prevent the hypervisor from
injecting executable code into the guest since it would have to be
valid encrypted instructions.

> be reused so that
>
> * SME on baremetal
> * SEV on guest
>
> use the same logic?

There are many areas that use the same logic, but there are certain
situations where we need to check between SME vs SEV (e.g. DMA operation
setup or decrypting the trampoline area) and act accordingly.

Thanks,
Tom

> Having the larger SEV preparation part on the kvm host side is
> perfectly fine. But I'd like to keep kernel initialization paths clean.
>
> Thanks.
Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:
> Because there are differences between how SME and SEV behave
> (instruction fetches are always decrypted under SEV, DMA to an
> encrypted location is not supported under SEV, etc.) we need to
> determine which mode we are in so that things can be setup properly
> during boot. For example, if SEV is active the kernel will already
> be encrypted and so we don't perform that step or the trampoline area
> for bringing up an AP must be decrypted for SME but encrypted for SEV.

So with SEV enabled, it seems to me a guest doesn't know anything about
encryption and can run as if SME is disabled. So sme_active() will be
false. And then the kernel can bypass all that code dealing with SME.

So a guest should simply run like on baremetal with no SME, IMHO.

But then there's that part: "instruction fetches are always decrypted
under SEV". What does that mean exactly? And how much of that code can
be reused so that

* SME on baremetal
* SEV on guest

use the same logic?

Having the larger SEV preparation part on the kvm host side is
perfectly fine. But I'd like to keep kernel initialization paths clean.

Thanks.

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
On 3/16/2017 5:16 AM, Borislav Petkov wrote:
> On Fri, Mar 10, 2017 at 10:35:30AM -0600, Brijesh Singh wrote:
>> We could update this patch to use the below logic:
>>
>>  * CPUID(0) - Check for AuthenticAMD
>>  * CPUID(1) - Check if under hypervisor
>>  * CPUID(0x80000000) - Check for highest supported leaf
>>  * CPUID(0x8000001F).EAX - Check for SME and SEV support
>>  * rdmsr (MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set
>
> Actually, it is still not clear to me *why* we need to do anything
> special wrt SEV in the guest.
>
> Lemme clarify: why can't the guest boot just like a normal Linux on
> baremetal and use the SME(!) detection code to set sme_enable and so
> on? IOW, I'd like to avoid all those checks whether we're running
> under hypervisor and handle all that like we're running on baremetal.

Because there are differences between how SME and SEV behave
(instruction fetches are always decrypted under SEV, DMA to an
encrypted location is not supported under SEV, etc.) we need to
determine which mode we are in so that things can be setup properly
during boot. For example, if SEV is active the kernel will already
be encrypted and so we don't perform that step or the trampoline area
for bringing up an AP must be decrypted for SME but encrypted for SEV.
The hypervisor check will provide that ability to determine how we
handle things.

Thanks,
Tom
[PATCH] md5: remove from lib and only live in crypto
The md5_transform function is no longer used anywhere in the tree, except for the crypto api's actual implementation of md5, so we can drop the function from lib and put it as a static function of the crypto file, where it belongs. There should be no new users of md5_transform, anyway, since there are more modern ways of doing what it once achieved. Signed-off-by: Jason A. Donenfeld --- In the last patch like this, we managed to get rid of halfmd4 from this file. In this series we get rid of md5, now that the patches have landed that remove such improper md5 usage from the kernel. When a final dependency on the (dead) sha1 is removed, then cryptohash.h will be removed altogether. This patch is for the md5 removal. crypto/md5.c | 95 +- include/linux/cryptohash.h | 5 --- lib/Makefile | 2 +- lib/md5.c | 95 -- 4 files changed, 95 insertions(+), 102 deletions(-) delete mode 100644 lib/md5.c diff --git a/crypto/md5.c b/crypto/md5.c index 2355a7c25c45..f7ae1a48225b 100644 --- a/crypto/md5.c +++ b/crypto/md5.c @@ -21,9 +21,11 @@ #include #include #include -#include <linux/cryptohash.h> #include +#define MD5_DIGEST_WORDS 4 +#define MD5_MESSAGE_BYTES 64 + const u8 md5_zero_message_hash[MD5_DIGEST_SIZE] = { 0xd4, 0x1d, 0x8c, 0xd9, 0x8f, 0x00, 0xb2, 0x04, 0xe9, 0x80, 0x09, 0x98, 0xec, 0xf8, 0x42, 0x7e, @@ -47,6 +49,97 @@ static inline void cpu_to_le32_array(u32 *buf, unsigned int words) } } +#define F1(x, y, z)(z ^ (x & (y ^ z))) +#define F2(x, y, z)F1(z, x, y) +#define F3(x, y, z)(x ^ y ^ z) +#define F4(x, y, z)(y ^ (x | ~z)) + +#define MD5STEP(f, w, x, y, z, in, s) \ + (w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x) + +static void md5_transform(__u32 *hash, __u32 const *in) +{ + u32 a, b, c, d; + + a = hash[0]; + b = hash[1]; + c = hash[2]; + d = hash[3]; + + MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7); + MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12); + MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17); + MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22); + MD5STEP(F1, a, b, c, d, in[4] + 
0xf57c0faf, 7); + MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12); + MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17); + MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22); + MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7); + MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12); + MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17); + MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22); + MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7); + MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12); + MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17); + MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22); + + MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5); + MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9); + MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14); + MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20); + MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5); + MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9); + MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14); + MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20); + MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5); + MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9); + MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14); + MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20); + MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5); + MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9); + MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14); + MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20); + + MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4); + MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11); + MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16); + MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23); + MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4); + MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11); + MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16); + MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23); + MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4); + MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11); + MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16); + MD5STEP(F3, b, c, d, a, in[6] + 
0x04881d05, 23); + MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4); + MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11); + MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16); + MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23); + + MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6); + MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10); + MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15); + MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21); +
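The unrolled transform quoted above is easy to sanity-check in userspace against the md5_zero_message_hash table from the same patch. Below is a compact, table-driven equivalent using the RFC 1321 constants (this is not the kernel's code, just a reference model); for the empty message's single padded block it must produce the well-known digest d41d8cd98f00b204e9800998ecf8427e. The final memcpy assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* RFC 1321 sine-derived constants, rounds 1-4 */
static const uint32_t K[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391,
};

static const uint8_t S[64] = {
    7,12,17,22, 7,12,17,22, 7,12,17,22, 7,12,17,22,
    5, 9,14,20, 5, 9,14,20, 5, 9,14,20, 5, 9,14,20,
    4,11,16,23, 4,11,16,23, 4,11,16,23, 4,11,16,23,
    6,10,15,21, 6,10,15,21, 6,10,15,21, 6,10,15,21,
};

static uint32_t rol32(uint32_t w, unsigned s)
{
    return (w << s) | (w >> (32 - s));
}

/* loop-based equivalent of the unrolled MD5STEP sequence */
static void md5_transform_ref(uint32_t *hash, const uint32_t *in)
{
    uint32_t a = hash[0], b = hash[1], c = hash[2], d = hash[3];

    for (int i = 0; i < 64; i++) {
        uint32_t f, g, tmp;

        if (i < 16)      { f = d ^ (b & (c ^ d)); g = i; }
        else if (i < 32) { f = c ^ (d & (b ^ c)); g = (5 * i + 1) % 16; }
        else if (i < 48) { f = b ^ c ^ d;         g = (3 * i + 5) % 16; }
        else             { f = c ^ (b | ~d);      g = (7 * i) % 16; }

        tmp = d;
        d = c;
        c = b;
        b = b + rol32(a + f + K[i] + in[g], S[i]);
        a = tmp;
    }
    hash[0] += a; hash[1] += b; hash[2] += c; hash[3] += d;
}

/* digest of the empty message: one padded block, 0x80 then zeros */
static void md5_empty_digest(uint8_t out[16])
{
    uint32_t hash[4] = { 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476 };
    uint32_t block[16] = { 0x00000080 };  /* pad byte in little-endian word 0 */

    md5_transform_ref(hash, block);
    memcpy(out, hash, 16);  /* assumes a little-endian host */
}
```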
Re: CRYPTO_MAX_ALG_NAME is too low
Hello! On 10/03/17 12:55, Alexander Sverdlin wrote: > Hello crypto maintainers! > > We've found an example of the ipsec algorithm combination, which doesn't fit > into CRYPTO_MAX_ALG_NAME long buffers: > > ip x s add src 1.1.1.1 dst 1.1.1.2 proto esp spi 0 mode tunnel enc des3_ede > 0x0 auth sha256 0x0 flag esn replay-window 256 > > produces "echainiv(authencesn(hmac(sha256-generic),cbc(des3_ede-generic)))" > on the machines without optimized crypto drivers, which doesn't fit into > current > 64-byte buffers. > > I see two possible options: > > a) split CRYPTO_MAX_ALG_NAME into CRYPTO_MAX_ALG_NAME + CRYPTO_MAX_DRV_NAME > pair > and make the latter, say, 96, because the former probably cannot be changed > because of > numerous user-space exports. And change half of the code to use the new define. > > b) rename *-generic algorithms to *-gen, so that cra_driver_name will be > shortened, > while MODULE_ALIAS_CRYPTO() could still be maintained in old and new form. > > What are your thoughts? Any? This is a regression caused by 856e3f4092 ("crypto: seqiv - Add support for new AEAD interface") As I've said above, I can offer one of the two solutions, which patch should I send? Or do you see any better alternatives? -- Best regards, Alexander Sverdlin.
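The arithmetic behind the report is easy to verify: the driver-name string above is exactly 64 characters, so it needs 65 bytes including the NUL terminator, one more than CRYPTO_MAX_ALG_NAME, while the hypothetical "-gen" renaming from option (b) would fit. A quick sketch:

```c
#include <string.h>

#define CRYPTO_MAX_ALG_NAME 64  /* current value, per the report */

/* the driver-name string from the report ... */
static const char reported[] =
    "echainiv(authencesn(hmac(sha256-generic),cbc(des3_ede-generic)))";

/* ... and the same name under the hypothetical "-gen" renaming */
static const char shortened[] =
    "echainiv(authencesn(hmac(sha256-gen),cbc(des3_ede-gen)))";

/* bytes required to store a name, including the NUL terminator */
static size_t needed(const char *name)
{
    return strlen(name) + 1;
}
```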
Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support
On 3/16/2017 10:43 AM, Herbert Xu wrote: > On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote: >> The patchset adds support for CAAM Queue Interface (QI), the additional >> interface (besides job ring) available for submitting jobs to the engine >> on platforms having DPAA (Datapath Acceleration Architecture). >> >> Patches 1-4 are QMan dependencies. >> I would prefer to take them through the crypto tree, >> but I am open to suggestions. >> >> Patch 5 adds a missing double inclusion guard in desc_constr.h. >> >> Patch 6 adds the caam/qi job submission backend. >> >> Patch 7 adds algorithms (ablkcipher and authenc) that run on top >> of caam/qi. For now, their priority is set lower than caam/jr. > > I'm fine with the crypto bits. > Herbert, Thanks for the review. Should I submit formally, i.e. without the [RFC] prefix? Scott, Roy, Do you have any comments wrt. soc/qman patches or with them going through the crypto tree? Thanks, Horia
Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
On 10/03/2017 23:41, Brijesh Singh wrote: >> Maybe there's a reason this fires: >> >> WARNING: modpost: Found 2 section mismatch(es). >> To see full details build your kernel with: >> 'make CONFIG_DEBUG_SECTION_MISMATCH=y' >> >> WARNING: vmlinux.o(.text+0x48edc): Section mismatch in reference from >> the function __change_page_attr() to the function >> .init.text:memblock_alloc() >> The function __change_page_attr() references >> the function __init memblock_alloc(). >> This is often because __change_page_attr lacks a __init >> annotation or the annotation of memblock_alloc is wrong. >> >> WARNING: vmlinux.o(.text+0x491d1): Section mismatch in reference from >> the function __change_page_attr() to the function >> .meminit.text:memblock_free() >> The function __change_page_attr() references >> the function __meminit memblock_free(). >> This is often because __change_page_attr lacks a __meminit >> annotation or the annotation of memblock_free is wrong. >> >> But maybe Paolo might have an even better idea... > > I am sure he will have better idea :) Not sure if it's better or worse, but an alternative idea is to turn __change_page_attr and __change_page_attr_set_clr inside out, so that: 1) the alloc_pages/__free_page happens in __change_page_attr_set_clr; 2) __change_page_attr_set_clr overall does not become more complex. Then you can introduce __early_change_page_attr_set_clr and/or early_kernel_map_pages_in_pgd, for use in your next patches. They use the memblock allocator instead of alloc/free_page. The attached patch is compile-tested only and almost certainly has some thinko in it. But it even skims a few lines from the code so the idea might have some merit. 
Paolo diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index 28d42130243c..953c8e697562 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -490,11 +490,12 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte) } static int -try_preserve_large_page(pte_t *kpte, unsigned long address, +try_preserve_large_page(pte_t **p_kpte, unsigned long address, struct cpa_data *cpa) { unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn; - pte_t new_pte, old_pte, *tmp; + pte_t *kpte = *p_kpte; + pte_t new_pte, old_pte; pgprot_t old_prot, new_prot, req_prot; int i, do_split = 1; enum pg_level level; @@ -507,8 +508,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, * Check for races, another CPU might have split this page * up already: */ - tmp = _lookup_address_cpa(cpa, address, &level); - if (tmp != kpte) + *p_kpte = _lookup_address_cpa(cpa, address, &level); + if (*p_kpte != kpte) goto out_unlock; switch (level) { @@ -634,17 +635,18 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address, unsigned int i, level; pte_t *tmp; pgprot_t ref_prot; + int retry = 1; + if (!debug_pagealloc_enabled()) + spin_lock(&cpa_lock); spin_lock(&pgd_lock); /* * Check for races, another CPU might have split this page * up for us already: */ tmp = _lookup_address_cpa(cpa, address, &level); - if (tmp != kpte) { - spin_unlock(&pgd_lock); - return 1; - } + if (tmp != kpte) + goto out; paravirt_alloc_pte(&init_mm, page_to_pfn(base)); @@ -671,10 +673,11 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address, break; default: - spin_unlock(&pgd_lock); - return 1; + goto out; } + retry = 0; + /* * Set the GLOBAL flags only if the PRESENT flag is set * otherwise pmd/pte_present will return true even on a non @@ -718,28 +721,34 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address, * going on. 
*/ __flush_tlb_all(); - spin_unlock(&pgd_lock); - return 0; -} - -static int split_large_page(struct cpa_data *cpa, pte_t *kpte, - unsigned long address) -{ - struct page *base; +out: + spin_unlock(&pgd_lock); + /* + * Do a global flush tlb after splitting the large page + * and before we do the actual change page attribute in the PTE. + * + * With out this, we violate the TLB application note, that says + * "The TLBs may contain both ordinary and large-page + * translations for a 4-KByte range of linear addresses. This + * may occur if software modifies the paging structures so that + * the page size used for the address range changes. If the two + * translations differ with respect to page frame or attributes + * (e.g., permissions), processor behavior is undefined and may + * be implementation-specific." + * + * We do this global tlb flush inside the cpa_lock, so that we + * don't allow any other cpu, with stale tlb entries change the + * page attribute in parallel, that also falls into the + * just split large page entry. + */ + if (!retry) + flush_tlb_all(); if (!debug_pagealloc_enabled()) spin_unlock(&cpa_lock); - base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0); - if (!debug_pagealloc_enabled()) - spin_lock(&cpa_lock); - if (!base) - return
Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
> So although this sits in arch/powerpc, it's heavy on the crypto which is > not my area of expertise (to say the least!), so I think it should > probably go via Herbert and the crypto tree? That was my thought as well. Sorry - probably should have put that in the comments somewhere. Regards, Daniel
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
Am Donnerstag, 16. März 2017, 11:18:33 CET schrieb Stephan Müller: Hi Stephan, > Am Donnerstag, 16. März 2017, 10:52:48 CET schrieb Herbert Xu: > > Hi Herbert, > > > First of all you're only limiting the amount of memory occupied > > by the SG list which is not the same thing as the memory pinned > > down by the actual recvmsg. > > I am fully aware of that. As this was present in the code, I thought I could > reuse that approach. > > Are you saying that you want to stop this approach? I would like to bring another point to the table for this discussion relating to the maximum amount of memory to be used as well as for the size of the RX-SGL: allow us please to consider skcipher_sendmsg (sendpage works conceptually similarly, so it should be covered with this discussion as well). skcipher_alloc_sgl used in the sendmsg code path is conceptually identical to the RX-SGL allocation that I propose here: it allocates a new SGL with MAX_SGL_ENTS entries as long as the caller sends data. It limits the amount of memory to be allocated based on the maximum size of the SG management data and not the actual plaintext/ciphertext data sent by user space. I again was therefore under the impression that my suggested approach in the recvmsg code path regarding allocation of memory would be allowed. Regarding the amount of data processed with the RX-SGL, I would like to consider the size of the TX-SGL. In the skcipher case, the RX-SGL (or the anticipated RX data to be processed) does not need to be larger than the TX-SGL. Hence the currently existing while() loop of the upstream code that we discuss here will only be executed as often as needed to fulfill the available TX-SGL data size. Given that both are limited by the sock_kmalloc memory limitation, I would imply that using a conceptually identical allocation approach for the TX SGL and RX SGL would be a sufficient replacement for the while-loop without being visible to user space (i.e. without causing a limit that was not there before). 
Ciao Stephan
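The sock_kmalloc-style cap Stephan refers to can be modeled as a toy accounting check: keep appending fixed-size SGL chunks while the socket's memory budget allows it. This is a sketch with invented names, not the algif_skcipher code; in the real code each chunk holds MAX_SGL_ENTS entries and the budget corresponds to the optmem_max check inside sock_kmalloc().

```c
#include <stdbool.h>
#include <stddef.h>

/* toy stand-in for the socket's option-memory accounting */
struct sk_budget { size_t used, limit; };

/* charge one SGL chunk against the budget; refuse if it would overflow */
static bool sgl_chunk_allowed(struct sk_budget *sk, size_t chunk_bytes)
{
    if (sk->used + chunk_bytes > sk->limit)
        return false;
    sk->used += chunk_bytes;
    return true;
}

/* how many chunks a given budget admits before allocation starts failing */
static size_t max_sgl_chunks(size_t limit, size_t chunk_bytes)
{
    struct sk_budget sk = { 0, limit };
    size_t n = 0;

    while (sgl_chunk_allowed(&sk, chunk_bytes))
        n++;
    return n;
}
```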
Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages
On 02/03/2017 16:15, Brijesh Singh wrote: > > __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address, > -struct page *base) > + pte_t *pbase, unsigned long new_pfn) > { > - pte_t *pbase = (pte_t *)page_address(base); Just one comment and I'll reply to Boris, I think you can compute pbase with pfn_to_kaddr, and avoid adding a new argument. >*/ > - __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE))); > + __set_pmd_pte(kpte, address, > + native_make_pte((new_pfn << PAGE_SHIFT) + _KERNPG_TABLE)); And this probably is better written as: __set_pmd_pte(kpte, address, pfn_pte(new_pfn, __pgprot(_KERNPG_TABLE))); Paolo
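The two spellings are equivalent because _KERNPG_TABLE only sets attribute bits below PAGE_SHIFT, so adding and OR-ing cannot differ (no carry into the pfn field). A toy check with mocked values — MOCK_KERNPG_TABLE is illustrative, not the real constant:

```c
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12
/* stand-in for _KERNPG_TABLE's attribute bits (present|rw|accessed|dirty);
 * the exact value is illustrative — the point is that all bits sit
 * below MOCK_PAGE_SHIFT */
#define MOCK_KERNPG_TABLE 0x63ull

/* the patch's spelling: (new_pfn << PAGE_SHIFT) + _KERNPG_TABLE */
static uint64_t pte_by_shift_add(uint64_t pfn)
{
    return (pfn << MOCK_PAGE_SHIFT) + MOCK_KERNPG_TABLE;
}

/* the pfn_pte-style spelling: OR the protection bits in */
static uint64_t pte_by_pfn_pte(uint64_t pfn)
{
    return (pfn << MOCK_PAGE_SHIFT) | MOCK_KERNPG_TABLE;
}
```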
Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
Hi David, > While not part of this change, the unrolled loops look as though > they just destroy the cpu cache. > I'd like be convinced that anything does CRC over long enough buffers > to make it a gain at all. btrfs data checksumming is one area. > With modern (not that modern now) superscalar cpus you can often > get the loop instructions 'for free'. A branch on POWER8 is a three cycle redirect. The vpmsum instructions are 6 cycles. > Sometimes pipelining the loop is needed to get full throughput. > Unlike the IP checksum, you don't even have to 'loop carry' the > cpu carry flag. It went through quite a lot of simulation to reach peak performance. The loop is quite delicate, we have to pace it just right to avoid some pipeline reject conditions. Note also that we already modulo schedule the loop across three iterations, required to hide the latency of the vpmsum instructions. Anton
Re: [RFC PATCH v2 16/32] x86: kvm: Provide support to create Guest and HV shared per-CPU variables
On 02/03/2017 16:15, Brijesh Singh wrote: > Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU > variable at compile time and share its physical address with hypervisor. > It presents a challenge when SEV is active in guest OS. When SEV is active, > guest memory is encrypted with guest key and hypervisor will no longer be able > to modify the guest memory. When SEV is active, we need to clear the > encryption attribute of shared physical addresses so that both guest and > hypervisor can access the data. > > To solve this problem, I have tried these three options: > > 1) Convert the static per-CPU to dynamic per-CPU allocation. When SEV is > detected then clear the encryption attribute. But while doing so I found > that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init was > called. > > 2) Since the encryption attributes works on PAGE_SIZE hence add some extra > padding to 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime > clear the encryption attribute of the full PAGE. The downside of this was > now we need to modify structure which may break the compatibility. > > 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be > used to hold the compile time shared per-CPU variables. When SEV is > detected we map this section with encryption attribute cleared. > > This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHARED > macro to create a compile time per-CPU variable. When SEV is detected we > map the per-CPU variable as decrypted (i.e with encryption attribute cleared). > > Signed-off-by: Brijesh Singh Looks good to me. 
Paolo > --- > arch/x86/kernel/kvm.c | 43 > +++-- > include/asm-generic/vmlinux.lds.h |3 +++ > include/linux/percpu-defs.h |9 > 3 files changed, 48 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c > index 099fcba..706a08e 100644 > --- a/arch/x86/kernel/kvm.c > +++ b/arch/x86/kernel/kvm.c > @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg) > > early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall); > > -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64); > -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64); > +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) > __aligned(64); > +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) > __aligned(64); > static int has_steal_clock = 0; > > /* > @@ -290,6 +290,22 @@ static void __init paravirt_ops_setup(void) > #endif > } > > +static int kvm_map_percpu_hv_shared(void *addr, unsigned long size) > +{ > + /* When SEV is active, the percpu static variables initialized > + * in data section will contain the encrypted data so we first > + * need to decrypt it and then map it as decrypted. 
> + */ > + if (sev_active()) { > + unsigned long pa = slow_virt_to_phys(addr); > + > + sme_early_decrypt(pa, size); > + return early_set_memory_decrypted(addr, size); > + } > + > + return 0; > +} > + > static void kvm_register_steal_time(void) > { > int cpu = smp_processor_id(); > @@ -298,12 +314,17 @@ static void kvm_register_steal_time(void) > if (!has_steal_clock) > return; > > + if (kvm_map_percpu_hv_shared(st, sizeof(*st))) { > + pr_err("kvm-stealtime: failed to map hv_shared percpu\n"); > + return; > + } > + > wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED)); > pr_info("kvm-stealtime: cpu %d, msr %llx\n", > cpu, (unsigned long long) slow_virt_to_phys(st)); > } > > -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED; > +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = > KVM_PV_EOI_DISABLED; > > static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val) > { > @@ -327,25 +348,33 @@ static void kvm_guest_cpu_init(void) > if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) { > u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason)); > > + if (kvm_map_percpu_hv_shared(this_cpu_ptr(&apf_reason), > + sizeof(struct kvm_vcpu_pv_apf_data))) > + goto skip_asyncpf; > #ifdef CONFIG_PREEMPT > pa |= KVM_ASYNC_PF_SEND_ALWAYS; > #endif > wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED); > __this_cpu_write(apf_reason.enabled, 1); > - printk(KERN_INFO"KVM setup async PF for cpu %d\n", > -smp_processor_id()); > + printk(KERN_INFO"KVM setup async PF for cpu %d msr %llx\n", > +smp_processor_id(), pa); > } > - > +skip_asyncpf: > if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) { > unsigned long pa;
Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command
On 02/03/2017 16:18, Brijesh Singh wrote: > + data = (void *) get_zeroed_page(GFP_KERNEL); The page does not need to be zeroed, does it? > + > + if ((len & 15) || (dst_addr & 15)) { > + /* if destination address and length are not 16-byte > + * aligned then: > + * a) decrypt destination page into temporary buffer > + * b) copy source data into temporary buffer at correct offset > + * c) encrypt temporary buffer > + */ > + ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error); Ah, I see now you're using this function here for read-modify-write. data is already pinned here, so even if you keep the function it makes sense to push pinning out of __sev_dbg_decrypt_page and into sev_dbg_decrypt. > + if (ret) > + goto err_3; > + d_off = dst_addr & (PAGE_SIZE - 1); > + > + if (copy_from_user(data + d_off, > + (uint8_t *)debug.src_addr, len)) { > + ret = -EFAULT; > + goto err_3; > + } > + > + encrypt->length = PAGE_SIZE; Why decrypt/re-encrypt all the page instead of just the 16 byte area around the [dst_addr, dst_addr+len) range? > + encrypt->src_addr = __psp_pa(data); > + encrypt->dst_addr = __sev_page_pa(inpages[0]); > + } else { > + if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) { > + ret = -EFAULT; > + goto err_3; > + } Do you need copy_from_user, or can you just pin/unpin memory as for DEBUG_DECRYPT? Paolo > + d_off = dst_addr & (PAGE_SIZE - 1); > + encrypt->length = len; > + encrypt->src_addr = __psp_pa(data); > + encrypt->dst_addr = __sev_page_pa(inpages[0]); > + encrypt->dst_addr += d_off; > + } > + > + encrypt->handle = sev_get_handle(kvm); > + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_ENCRYPT, encrypt, &argp->error);
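The 16-byte window Paolo suggests instead of the whole-page decrypt/re-encrypt is just an align-down/align-up computation over [dst_addr, dst_addr+len). A userspace sketch (not kernel code):

```c
#include <stdint.h>

/* smallest 16-byte-aligned window covering [dst, dst + len):
 * only this span would need the decrypt/modify/re-encrypt treatment */
static void enc_window(uint64_t dst, uint64_t len,
                       uint64_t *start, uint64_t *span)
{
    *start = dst & ~15ull;                          /* round start down */
    *span  = ((dst + len + 15) & ~15ull) - *start;  /* round end up */
}
```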
Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command
On 02/03/2017 16:18, Brijesh Singh wrote: > +static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src, > + void *dst, int *error) > +{ > + inpages = sev_pin_memory(src, PAGE_SIZE, &npages); > + if (!inpages) { > + ret = -ENOMEM; > + goto err_1; > + } > + > + data->handle = sev_get_handle(kvm); > + data->dst_addr = __psp_pa(dst); > + data->src_addr = __sev_page_pa(inpages[0]); > + data->length = PAGE_SIZE; > + > + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error); > + if (ret) > + printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n", > + ret, *error); > + sev_unpin_memory(inpages, npages); > +err_1: > + kfree(data); > + return ret; > +} > + > +static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp) > +{ > + void *data; > + int ret, offset, len; > + struct kvm_sev_dbg debug; > + > + if (!sev_guest(kvm)) > + return -ENOTTY; > + > + if (copy_from_user(&debug, (void *)argp->data, > + sizeof(struct kvm_sev_dbg))) > + return -EFAULT; > + /* > + * TODO: add support for decrypting length which crosses the > + * page boundary. > + */ > + offset = debug.src_addr & (PAGE_SIZE - 1); > + if (offset + debug.length > PAGE_SIZE) > + return -EINVAL; > + Please do add it, it doesn't seem very different from what you're doing in LAUNCH_UPDATE_DATA. There's no need for a separate __sev_dbg_decrypt_page function, you can just pin/unpin here and do a per-page loop as in LAUNCH_UPDATE_DATA. Paolo
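The boundary check quoted above is simple to model in isolation: a range is rejected when its in-page offset plus its length spills into the next page. A userspace sketch:

```c
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096ull

/* the TODO'd limitation: does [src_addr, src_addr + length) cross a
 * page boundary? */
static int crosses_page(uint64_t src_addr, uint64_t length)
{
    uint64_t offset = src_addr & (MOCK_PAGE_SIZE - 1);

    return offset + length > MOCK_PAGE_SIZE;
}
```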
Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command
On 02/03/2017 16:17, Brijesh Singh wrote: > +static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen, > + unsigned long *n) > +{ > + struct page **pages; > + int first, last; > + unsigned long npages, pinned; > + > + /* Get number of pages */ > + first = (uaddr & PAGE_MASK) >> PAGE_SHIFT; > + last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT; > + npages = (last - first + 1); > + > + pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL); > + if (!pages) > + return NULL; > + > + /* pin the user virtual address */ > + down_read(&current->mm->mmap_sem); > + pinned = get_user_pages_fast(uaddr, npages, 1, pages); > + up_read(&current->mm->mmap_sem); get_user_pages_fast, like get_user_pages_unlocked, must be called without mmap_sem held. > + if (pinned != npages) { > + printk(KERN_ERR "SEV: failed to pin %ld pages (got %ld)\n", > + npages, pinned); > + goto err; > + } > + > + *n = npages; > + return pages; > +err: > + if (pinned > 0) > + release_pages(pages, pinned, 0); > + kfree(pages); > + > + return NULL; > +} > > + /* the array of pages returned by get_user_pages() is a page-aligned > + * memory. Since the user buffer is probably not page-aligned, we need > + * to calculate the offset within a page for first update entry. > + */ > + offset = uaddr & (PAGE_SIZE - 1); > + len = min_t(size_t, (PAGE_SIZE - offset), ulen); > + ulen -= len; > + > + /* update first page - > + * special care need to be taken for the first page because we might > + * be dealing with offset within the page > + */ No need to special case the first page; just set "offset = 0" inside the loop after the first iteration. 
Paolo > + data->handle = sev_get_handle(kvm); > + data->length = len; > + data->address = __sev_page_pa(inpages[0]) + offset; > + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA, > + data, &argp->error); > + if (ret) > + goto err_3; > + > + /* update remaining pages */ > + for (i = 1; i < nr_pages; i++) { > + > + len = min_t(size_t, PAGE_SIZE, ulen); > + ulen -= len; > + data->length = len; > + data->address = __sev_page_pa(inpages[i]); > + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA, > + data, &argp->error); > + if (ret) > + goto err_3; > + }
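The loop shape Paolo suggests — compute the in-page offset once and reset it to zero after the first iteration, with no special case — can be sketched and cross-checked against the first/last/npages arithmetic from sev_pin_memory:

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096ull

/* split [uaddr, uaddr + ulen) into per-page chunk lengths; no special
 * case for the first page — just reset offset after iteration one */
static size_t split_range(uint64_t uaddr, uint64_t ulen,
                          uint64_t *lens, size_t max_chunks)
{
    uint64_t offset = uaddr & (MOCK_PAGE_SIZE - 1);
    size_t i = 0;

    while (ulen && i < max_chunks) {
        uint64_t len = MOCK_PAGE_SIZE - offset;

        if (len > ulen)
            len = ulen;
        lens[i++] = len;
        ulen -= len;
        offset = 0;  /* every page after the first starts at offset 0 */
    }
    return i;
}

/* the pin-side page count: first/last page indices, inclusive */
static uint64_t npages_for(uint64_t uaddr, uint64_t ulen)
{
    uint64_t first = uaddr / MOCK_PAGE_SIZE;
    uint64_t last = (uaddr + ulen - 1) / MOCK_PAGE_SIZE;

    return last - first + 1;
}
```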
Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
Daniel Axtens writes: > The core nuts and bolts of the crc32c vpmsum algorithm will > also work for a number of other CRC algorithms with different > polynomials. Factor out the function into a new asm file. > > To handle multiple users of the function, a user simply > provides constants, defines the name of their CRC function, > and then #includes the core algorithm file. > > Cc: Anton Blanchard > Signed-off-by: Daniel Axtens > > -- > > It's possible at this point to argue that the address > of the constant tables should be passed in to the function, > rather than doing this somewhat unconventional #include. > > However, we're about to add further #ifdef's back into the core > that will be provided by the encapsulating code, and which couldn't > be done as a variable without performance loss. > --- > arch/powerpc/crypto/crc32-vpmsum_core.S | 726 > > arch/powerpc/crypto/crc32c-vpmsum_asm.S | 714 +-- > 2 files changed, 729 insertions(+), 711 deletions(-) > create mode 100644 arch/powerpc/crypto/crc32-vpmsum_core.S So although this sits in arch/powerpc, it's heavy on the crypto which is not my area of expertise (to say the least!), so I think it should probably go via Herbert and the crypto tree? cheers
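The "shared core, per-user constants" pattern in the patch has a simple userspace analogue: one bitwise CRC routine stamped out once per polynomial via a macro, verified against the well-known check values for CRC-32 and CRC-32C. This is plain C, not the vpmsum assembler — it only illustrates how the same core serves multiple polynomials.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit-at-a-time core, parameterized by a reflected polynomial.
 * Each "user" provides a name and constant, mirroring how the vpmsum
 * core is #included with per-algorithm constants. */
#define DEFINE_CRC32(name, poly)                               \
static uint32_t name(const uint8_t *p, size_t len)             \
{                                                              \
    uint32_t crc = ~0u;                                        \
    while (len--) {                                            \
        crc ^= *p++;                                           \
        for (int i = 0; i < 8; i++)                            \
            crc = (crc >> 1) ^ ((poly) & -(crc & 1));          \
    }                                                          \
    return ~crc;                                               \
}

DEFINE_CRC32(crc32_le,  0xedb88320u)   /* CRC-32 (IEEE) */
DEFINE_CRC32(crc32c_le, 0x82f63b78u)   /* CRC-32C (Castagnoli) */
```

The standard check input "123456789" yields 0xcbf43926 for CRC-32 and 0xe3069283 for CRC-32C, which catches a wrong polynomial immediately.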
Re: [PATCH 2/4] crypto: powerpc - Re-enable non-REFLECTed CRCs
Daniel Axtens writes: > When CRC32c was included in the kernel, Anton ripped out > the #ifdefs around reflected polynomials, because CRC32c > is always reflected. However, not all CRCs use reflection > so we'd like to make it optional. > > Restore the REFLECT parts from Anton's original CRC32 > implementation (https://github.com/antonblanchard/crc32-vpmsum) > > That implementation is available under GPLv2+, so we're OK > from a licensing point of view: > https://github.com/antonblanchard/crc32-vpmsum/blob/master/LICENSE.TXT It's also written by Anton and copyright IBM, so you (we (IBM)) could always just relicense it anyway. So doubly OK IMO. cheers
Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active
On 02/03/2017 16:18, Brijesh Singh wrote: > The SEV memory encryption engine uses a tweak such that two identical > plaintexts at different locations will have different ciphertexts. > So swapping or moving ciphertexts of two pages will not result in > plaintexts being swapped. Relocating (or migrating) the physical backing pages > for an SEV guest will require some additional steps. The current SEV key > management spec [1] does not provide commands to swap or migrate (move) > ciphertexts. For now we pin the memory allocated for the SEV guest. In > future when the SEV key management spec provides the commands to support > page migration we can update the KVM code to remove the pinning logic > without making any changes into userspace (qemu). > > The patch pins userspace memory when a new slot is created and unpins the > memory when the slot is removed. > > [1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf This is not enough, because memory can be hidden temporarily from the guest and remapped later. Think of a PCI BAR that is backed by RAM, or also SMRAM. The pinning must be kept even in that case. You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM. In QEMU you can use a RAMBlockNotifier to invoke the ioctls. 
Paolo > Signed-off-by: Brijesh Singh> --- > arch/x86/include/asm/kvm_host.h |6 +++ > arch/x86/kvm/svm.c | 93 > +++ > arch/x86/kvm/x86.c |3 + > 3 files changed, 102 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index fcc4710..9dc59f0 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -723,6 +723,7 @@ struct kvm_sev_info { > unsigned int handle;/* firmware handle */ > unsigned int asid; /* asid for this guest */ > int sev_fd; /* SEV device fd */ > + struct list_head pinned_memory_slot; > }; > > struct kvm_arch { > @@ -1043,6 +1044,11 @@ struct kvm_x86_ops { > void (*setup_mce)(struct kvm_vcpu *vcpu); > > int (*memory_encryption_op)(struct kvm *kvm, void __user *argp); > + > + void (*prepare_memory_region)(struct kvm *kvm, > + struct kvm_memory_slot *memslot, > + const struct kvm_userspace_memory_region *mem, > + enum kvm_mr_change change); > }; > > struct kvm_arch_async_pf { > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index 13996d6..ab973f9 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -498,12 +498,21 @@ static inline bool gif_set(struct vcpu_svm *svm) > } > > /* Secure Encrypted Virtualization */ > +struct kvm_sev_pinned_memory_slot { > + struct list_head list; > + unsigned long npages; > + struct page **pages; > + unsigned long userspace_addr; > + short id; > +}; > + > static unsigned int max_sev_asid; > static unsigned long *sev_asid_bitmap; > static void sev_deactivate_handle(struct kvm *kvm); > static void sev_decommission_handle(struct kvm *kvm); > static int sev_asid_new(void); > static void sev_asid_free(int asid); > +static void sev_unpin_memory(struct page **pages, unsigned long npages); > #define __sev_page_pa(x) ((page_to_pfn(x) << PAGE_SHIFT) | sme_me_mask) > > static bool kvm_sev_enabled(void) > @@ -1544,9 +1553,25 @@ static inline int avic_free_vm_id(int id) > > static void sev_vm_destroy(struct kvm *kvm) > { > + struct 
list_head *pos, *q; > + struct kvm_sev_pinned_memory_slot *pinned_slot; > + struct list_head *head = &kvm->arch.sev_info.pinned_memory_slot; > + > if (!sev_guest(kvm)) > return; > > + /* if guest memory is pinned then unpin it now */ > + if (!list_empty(head)) { > + list_for_each_safe(pos, q, head) { > + pinned_slot = list_entry(pos, > + struct kvm_sev_pinned_memory_slot, list); > + sev_unpin_memory(pinned_slot->pages, > + pinned_slot->npages); > + list_del(pos); > + kfree(pinned_slot); > + } > + } > + > /* release the firmware resources */ > sev_deactivate_handle(kvm); > sev_decommission_handle(kvm); > @@ -5663,6 +5688,8 @@ static int sev_pre_start(struct kvm *kvm, int *asid) > } > *asid = ret; > ret = 0; > + > + INIT_LIST_HEAD(&kvm->arch.sev_info.pinned_memory_slot); > } > > return ret; > @@ -6189,6 +6216,71 @@ static int sev_launch_measure(struct kvm *kvm, struct > kvm_sev_cmd *argp) > return ret; > } > > +static struct kvm_sev_pinned_memory_slot *sev_find_pinned_memory_slot( > + struct kvm *kvm, struct kvm_memory_slot *slot) > +{ > + struct
Re: [RFC PATCH v2 24/32] kvm: x86: prepare for SEV guest management API support
On 02/03/2017 16:17, Brijesh Singh wrote: > ASID management: > - Reserve asid range for SEV guest, SEV asid range is obtained through >CPUID Fn8000_001f[ECX]. A non-SEV guest can use any asid outside the SEV >asid range. How is backwards compatibility handled? > - SEV guest must have asid value within asid range obtained through CPUID. > - SEV guest must have the same asid for all vcpu's. A TLB flush is required >if different vcpu for the same ASID is to be run on the same host CPU. [...] > + > + /* which host cpu was used for running this vcpu */ > + bool last_cpuid; Should be unsigned int. > > + /* Assign the asid allocated for this SEV guest */ > + svm->vmcb->control.asid = asid; > + > + /* Flush guest TLB: > + * - when different VMCB for the same ASID is to be run on the > + * same host CPU > + * or > + * - this VMCB was executed on different host cpu in previous VMRUNs. > + */ > + if (sd->sev_vmcbs[asid] != (void *)svm->vmcb || Why the cast? > + svm->last_cpuid != cpu) > + svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID; If there is a match, you don't need to do anything else (neither reset the asid, nor mark it as dirty, nor update the fields), so: if (sd->sev_vmcbs[asid] == svm->vmcb && svm->last_cpuid == cpu) return; svm->last_cpuid = cpu; sd->sev_vmcbs[asid] = svm->vmcb; svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID; svm->vmcb->control.asid = asid; mark_dirty(svm->vmcb, VMCB_ASID); (plus comments ;)). Also, why not TLB_CONTROL_FLUSH_ASID if possible? > + svm->last_cpuid = cpu; > + sd->sev_vmcbs[asid] = (void *)svm->vmcb; > + > + mark_dirty(svm->vmcb, VMCB_ASID); [...] 
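Paolo's suggested restructuring boils down to one question per VMRUN: may we skip the flush? A minimal userspace sketch of that decision, as a pure function (the `vmcb`/`last_cpu` names mirror the review; the struct and function here are hypothetical models, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VMCB type; only pointer identity matters. */
struct vmcb { int unused; };

/* Model of the suggested check: skip the TLB flush only when the same
 * VMCB runs with this ASID on the same host CPU as last time. */
static bool needs_tlb_flush(const struct vmcb *last_vmcb_for_asid,
                            const struct vmcb *vmcb,
                            int last_cpu, int cpu)
{
    if (last_vmcb_for_asid == vmcb && last_cpu == cpu)
        return false;   /* nothing changed: the ASID's TLB entries are ours */
    return true;        /* ASID reused by another VMCB, or vcpu migrated */
}
```

When the function returns false, none of the bookkeeping (asid assignment, mark_dirty, tlb_ctl) needs to run, which is exactly the early-return shape Paolo proposes.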
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index fef7d83..9df37a2 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1284,6 +1284,104 @@ struct kvm_s390_ucas_mapping { > /* Memory Encryption Commands */ > #define KVM_MEMORY_ENCRYPT_OP _IOWR(KVMIO, 0xb8, unsigned long) > > +/* Secure Encrypted Virtualization mode */ > +enum sev_cmd_id { Please add documentation in Documentation/virtual/kvm/memory_encrypt.txt. Paolo
Re: [RFC PATCH v2 23/32] kvm: introduce KVM_MEMORY_ENCRYPT_OP ioctl
On 02/03/2017 16:17, Brijesh Singh wrote: > If hardware supports encrypting then KVM_MEMORY_ENCRYPT_OP ioctl can > be used by qemu to issue platform specific memory encryption commands. > > Signed-off-by: Brijesh Singh > --- > arch/x86/include/asm/kvm_host.h |2 ++ > arch/x86/kvm/x86.c | 12 > include/uapi/linux/kvm.h|2 ++ > 3 files changed, 16 insertions(+) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index bff1f15..62651ad 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -1033,6 +1033,8 @@ struct kvm_x86_ops { > void (*cancel_hv_timer)(struct kvm_vcpu *vcpu); > > void (*setup_mce)(struct kvm_vcpu *vcpu); > + > + int (*memory_encryption_op)(struct kvm *kvm, void __user *argp); > }; > > struct kvm_arch_async_pf { > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 2099df8..6a737e9 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3926,6 +3926,14 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, > return r; > } > > +static int kvm_vm_ioctl_memory_encryption_op(struct kvm *kvm, void __user > *argp) > +{ > + if (kvm_x86_ops->memory_encryption_op) > + return kvm_x86_ops->memory_encryption_op(kvm, argp); > + > + return -ENOTTY; > +} > + > long kvm_arch_vm_ioctl(struct file *filp, > unsigned int ioctl, unsigned long arg) > { > @@ -4189,6 +4197,10 @@ long kvm_arch_vm_ioctl(struct file *filp, > r = kvm_vm_ioctl_enable_cap(kvm, &cap); > break; > } > + case KVM_MEMORY_ENCRYPT_OP: { > + r = kvm_vm_ioctl_memory_encryption_op(kvm, argp); > + break; > + } > default: > r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg); > } > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > index cac48ed..fef7d83 100644 > --- a/include/uapi/linux/kvm.h > +++ b/include/uapi/linux/kvm.h > @@ -1281,6 +1281,8 @@ struct kvm_s390_ucas_mapping { > #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct > kvm_s390_irq_state) > /* Available with KVM_CAP_X86_SMM */ > #define KVM_SMI _IO(KVMIO, 0xb7) > +/* Memory Encryption Commands */ > +#define KVM_MEMORY_ENCRYPT_OP _IOWR(KVMIO, 0xb8, unsigned long) > > #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) > #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) >

Reviewed-by: Paolo Bonzini
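The dispatch in the patch follows a common kernel pattern: try an optional backend hook, and fall back to -ENOTTY (the usual "unknown ioctl" answer) when no backend installed one. A runnable userspace model of that pattern (the struct and function names are illustrative stand-ins for kvm_x86_ops, not the real types):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kvm_x86_ops vtable entry. */
struct x86_ops {
    int (*memory_encryption_op)(void *kvm, void *argp);
};

/* Only a backend that supports memory encryption (SVM with SEV) installs
 * the hook; everyone else gets -ENOTTY, as in kvm_vm_ioctl_memory_encryption_op. */
static int dispatch_encryption_op(const struct x86_ops *ops,
                                  void *kvm, void *argp)
{
    if (ops->memory_encryption_op)
        return ops->memory_encryption_op(kvm, argp);
    return -ENOTTY;
}

/* Dummy backend hook standing in for the SVM implementation. */
static int fake_encryption_op(void *kvm, void *argp)
{
    (void)kvm; (void)argp;
    return 0;
}
```

The pattern keeps the generic ioctl path free of vendor checks: VMX simply leaves the pointer NULL.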
RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm
From: Daniel Axtens > Sent: 15 March 2017 22:30 > Hi David, > > > While not part of this change, the unrolled loops look as though > > they just destroy the cpu cache. > > I'd like be convinced that anything does CRC over long enough buffers > > to make it a gain at all. > > > > With modern (not that modern now) superscalar cpus you can often > > get the loop instructions 'for free'. > > Sometimes pipelining the loop is needed to get full throughput. > > Unlike the IP checksum, you don't even have to 'loop carry' the > > cpu carry flag. > > Internal testing on a NVMe device with T10DIF enabled on 4k blocks > shows a 20x - 30x improvement. Without these patches, crc_t10dif_generic > uses over 60% of CPU time - with these patches CRC drops to single > digits. > > I should probably have lead with that, sorry. I'm not doubting that using the cpu instruction for crcs gives a massive performance boost. Just that the heavily unrolled loop is unlikely to help overall. Some 'cold cache' tests on shorter buffers might be illuminating. > FWIW, the original patch showed a 3.7x gain on btrfs as well - > 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c") > > When Anton wrote the original code he had access to IBM's internal > tooling for looking at how instructions flow through the various stages > of the CPU, so I trust it's pretty much optimal from that point of view. Doesn't mean he used it :-) David
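For scale: the software baseline that the vpmsum code is being measured against is essentially a bitwise or table-driven loop like the one below. This is a plain reflected CRC-32 (IEEE polynomial 0xEDB88320), not the T10DIF polynomial from the thread, shown only as a sketch of the per-byte work that a CPU-instruction or vector implementation amortizes away:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32: one conditional XOR per bit, eight per byte.
 * Chaining works zlib-style: pass the previous return value as `crc`. */
static uint32_t crc32_bitwise(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}
```

An unrolled/vectorized version trades instruction-cache footprint for throughput on long buffers, which is exactly the trade-off being debated above.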
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
Am Donnerstag, 16. März 2017, 10:52:48 CET schrieb Herbert Xu: Hi Herbert, > First of all you're only limiting the amount of memory occupied > by the SG list which is not the same thing as the memory pinned > down by the actual recvmsg. I am fully aware of that. As this was present in the code, I thought I could reuse that approach. Are you saying that you want to stop this approach? > > More importantly, with the current code, a very large recvmsg > would still work by doing it piecemeal. This piecemeal processing would be done in the sync case. In the current AIO code path, no such piecemeal processing happens. > With your patch, won't > it fail because sock_kmalloc could fail to allocate memory for > the whole thing? In this case, yes. To handle AIO and sync in a common code path (and thus make both behave identically), this would be the result. If very large recvmsg requests are to be handled, the socket option could be increased or the caller would piecemeal the request. Otherwise, if we do piecemeal handling as it is done now, I see the following drawbacks: - no common sync/AIO code and no common behavior - no common approach between skcipher and other implementations (algif_aead and the algif_akcipher that I have in the pipe, which looks identical to the proposed patchset) Ciao Stephan
Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active
On Fri, Mar 10, 2017 at 10:35:30AM -0600, Brijesh Singh wrote: > We could update this patch to use the below logic: > > * CPUID(0) - Check for AuthenticAMD > * CPID(1) - Check if under hypervisor > * CPUID(0x8000) - Check for highest supported leaf > * CPUID(0x801F).EAX - Check for SME and SEV support > * rdmsr (MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set Actually, it is still not clear to me *why* we need to do anything special wrt SEV in the guest. Lemme clarify: why can't the guest boot just like a normal Linux on baremetal and use the SME(!) detection code to set sme_enable and so on? IOW, I'd like to avoid all those checks whether we're running under hypervisor and handle all that like we're running on baremetal. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) --
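Brijesh's proposed check sequence can be modeled as a pure function over the already-fetched register values, which makes the ordering explicit. A sketch assuming the AMD APM bit positions (CPUID 1 ECX[31] = hypervisor present, CPUID 0x8000001F EAX[0] = SME, EAX[1] = SEV, SYSCFG MSR bit 23 = SMEE); the struct and function names are hypothetical, not the patch's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of the values the proposed checks consume. */
struct sev_probe {
    char vendor[13];        /* CPUID(0) EBX/EDX/ECX as a string + NUL */
    uint32_t cpuid1_ecx;    /* CPUID(1) ECX */
    uint32_t max_ext_leaf;  /* CPUID(0x80000000) EAX */
    uint32_t leaf_801f_eax; /* CPUID(0x8000001F) EAX */
    uint64_t syscfg;        /* MSR_K8_SYSCFG */
};

static bool sev_guest_detected(const struct sev_probe *p)
{
    if (strcmp(p->vendor, "AuthenticAMD"))
        return false;                        /* not an AMD CPU */
    if (!(p->cpuid1_ecx & (1u << 31)))
        return false;                        /* not under a hypervisor */
    if (p->max_ext_leaf < 0x8000001f)
        return false;                        /* leaf not implemented */
    if (!(p->leaf_801f_eax & (1u << 1)))
        return false;                        /* no SEV support */
    return (p->syscfg & (1ull << 23)) != 0;  /* SMEE enabled */
}
```

Boris's counterpoint is that most of these steps could disappear if the guest relied on the baremetal SME detection path instead.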
Re: [PATCH] crypto: doc - fix typo (struct sdesc)
Fabien Dessenne wrote: > Add missing " " in api-samples.rst > > Signed-off-by: Fabien Dessenne

Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2] MAINTAINERS: Add maintainer entry for crypto/s5p-sss
On Sat, Mar 11, 2017 at 08:11:00AM +0200, Krzysztof Kozlowski wrote: > Add Krzysztof Kozlowski and Vladimir Zapolskiy as maintainers of s5p-sss > driver for handling reviews, testing and getting bug reports from the > users. > > Cc: Vladimir Zapolskiy> Cc: Herbert Xu > Signed-off-by: Krzysztof Kozlowski Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v1 0/8] improve performances on mediatek crypto driver
On Thu, Mar 09, 2017 at 10:11:11AM +0800, Ryder Lee wrote: > Hi all, > > Some patches of this series improve the performances whereas others > clean up code and refine data structure to make it more efficient > > Changes since v1: > - drop OFB and CFB patch > > Ryder Lee (8): > crypto: mediatek - rework interrupt handler > crypto: mediatek - add MTK_* prefix and correct annotations. > crypto: mediatek - make mtk_sha_xmit() more generic > crypto: mediatek - simplify descriptor ring management > crypto: mediatek - add queue_task tasklet > crypto: mediatek - fix error handling in mtk_aes_complete() > crypto: mediatek - add mtk_aes_gcm_tag_verify() > crypto: mediatek - make hardware operation flow more efficient

All applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/4] hwrng: omap - fixes and improvements
On Tue, Mar 07, 2017 at 03:14:45PM +0100, Thomas Petazzoni wrote: > Hello, > > This small patch series brings a few fixes and improvements to the > omap_rng driver. The first fix is particularly important, as it fixes > using the driver built as a module on SoCs that require a clock for > the IP to work properly. > > Thanks, > > Thomas > > Thomas Petazzoni (4): > hwrng: omap - write registers after enabling the clock > hwrng: omap - use devm_clk_get() instead of of_clk_get() > hwrng: omap - Do not access INTMASK_REG on EIP76 > hwrng: omap - move clock related code to omap_rng_probe()

Patches 1-3 applied to crypto and patch 4 applied to cryptodev. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v3 1/3] clk: meson-gxbb: expose clock CLKID_RNG0
On Wed, Feb 22, 2017 at 07:55:24AM +0100, Heiner Kallweit wrote: > Expose clock CLKID_RNG0 which is needed for the HW random number generator. > > Signed-off-by: Heiner Kallweit

All patches applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH 0/2] hwrng: revert managed API changes for amd and geode
On Tue, Mar 14, 2017 at 07:36:00AM -0400, Prarit Bhargava wrote: > When booting top-of-tree the following WARN_ON triggers in the kernel on > a 15h AMD system. > > WARNING: CPU: 2 PID: 621 at drivers/base/dd.c:349 driver_probe_device+0x38c > Modules linked in: i2c_amd756(+) amd_rng sg pcspkr parport_pc(+) parport k8 > CPU: 2 PID: 621 Comm: systemd-udevd Not tainted 4.11.0-0.rc1.git0.1.el7_UNS > Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TYAN High-End > Call Trace: > dump_stack+0x63/0x8e > __warn+0xd1/0xf0 > warn_slowpath_null+0x1d/0x20 > driver_probe_device+0x38c/0x470 > __driver_attach+0xc9/0xf0 > ? driver_probe_device+0x470/0x470 > bus_for_each_dev+0x5d/0x90 > driver_attach+0x1e/0x20 > bus_add_driver+0x1d0/0x290 > driver_register+0x60/0xe0 > ? 0xa0037000 > __pci_register_driver+0x4c/0x50 > amd756_driver_init+0x1e/0x1000 [i2c_amd756] > do_one_initcall+0x51/0x1b0 > ? __vunmap+0x85/0xd0 > ? do_init_module+0x27/0x1fa > do_init_module+0x60/0x1fa > load_module+0x15d1/0x1ad0 > ? m_show+0x1c0/0x1c0 > SYSC_finit_module+0xa9/0xd0 > > There are PCI devices that contain both a RNG and SMBUS device. The > RNG device is initialized by the amd-rng driver but the driver does not > register against the device. The SMBUS device is initialized by the > i2c-amd756 driver and registers against the device and hits the WARN_ON() > because the amd-rng driver has already allocated resources against the > device. > > The amd-rng driver was incorrectly migrated to the device resource model > (devres), and after code inspection I found that the geode-rng driver was also > incorrectly migrated. These drivers are using devres but do not register a > driver against the device, and both drivers are expecting a memory cleanup on > a driver detach that will never happen. This results in a memory leak when > the > driver is unloaded and the inability to reload the driver. 
> > Revert 31b2a73c9c5f ("hwrng: amd - Migrate to managed API"), and 6e9b5e76882c > ("hwrng: geode - Migrate to managed API"). > > Signed-off-by: Prarit Bhargava> Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API"). > Fixes: 6e9b5e76882c ("hwrng: geode - Migrate to managed API") > Cc: Matt Mackall > Cc: Herbert Xu > Cc: Corentin LABBE > Cc: PrasannaKumar Muralidharan > Cc: Wei Yongjun > Cc: linux-crypto@vger.kernel.org > Cc: linux-ge...@lists.infradead.org Both patches applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: ccp - Assign DMA commands to the channel's CCP
On Fri, Mar 10, 2017 at 12:28:18PM -0600, Gary R Hook wrote: > From: Gary R Hook> > The CCP driver generally uses a round-robin approach when > assigning operations to available CCPs. For the DMA engine, > however, the DMA mappings of the SGs are associated with a > specific CCP. When an IOMMU is enabled, the IOMMU is > programmed based on this specific device. > > If the DMA operations are not performed by that specific > CCP then addressing errors and I/O page faults will occur. > > Update the CCP driver to allow a specific CCP device to be > requested for an operation and use this in the DMA engine > support. > > Cc: # 4.9.x- > Signed-off-by: Gary R Hook Patch applied. Thanks. -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
On Thu, Mar 16, 2017 at 10:23:49AM +0100, Stephan Müller wrote: > Am Donnerstag, 16. März 2017, 10:08:23 CET schrieb Herbert Xu: > > Hi Herbert, > > > On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote: > > > With this approach I thought that the while loop could be a thing of the > > > past, considering that this is also the approach taken in > > > skcipher_recvmsg_async that is present in the current upstream code base. > > > > The reason there is a limit is so that user-space doesn't pin down > > unlimited amounts of memory. How is this addressed under your > > scheme? > > sock_kmalloc limits the number of SG tables that we can manage. On my system, > sock_kmalloc has 20480 bytes at its disposal as the limiting factor. > > As this concept for limiting the impact of user space on kernel memory is > used > in the current upstream skcipher_recvmsg_async, I simply re-used that > approach: > > while (iov_iter_count(&msg->msg_iter)) { > struct skcipher_async_rsgl *rsgl; > ... > rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL); > ... > > Note I use sock_kmalloc instead of the currently existing kmalloc to ensure > such limitation.

First of all you're only limiting the amount of memory occupied by the SG list which is not the same thing as the memory pinned down by the actual recvmsg. More importantly, with the current code, a very large recvmsg would still work by doing it piecemeal. With your patch, won't it fail because sock_kmalloc could fail to allocate memory for the whole thing? Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
Am Donnerstag, 16. März 2017, 10:08:23 CET schrieb Herbert Xu: Hi Herbert, > On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote: > > With this approach I thought that the while loop could be a thing of the > > past, considering that this is also the approach taken in > > skcipher_recvmsg_async that is present in the current upstream code base. > > The reason there is a limit is so that user-space doesn't pin down > unlimited amounts of memory. How is this addressed under your > scheme? sock_kmalloc limits the number of SG tables that we can manage. On my system, sock_kmalloc has 20480 bytes at its disposal as the limiting factor. As this concept for limiting the impact of user space on kernel memory is used in the current upstream skcipher_recvmsg_async, I simply re-used that approach: while (iov_iter_count(&msg->msg_iter)) { struct skcipher_async_rsgl *rsgl; ... rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL); ... Note I use sock_kmalloc instead of the currently existing kmalloc to ensure such limitation. Also, please note that this approach is identical to the currently used code in algif_aead:aead_recvmsg_sync. This implementation led me to believe that my new code would be appropriate in this case, too. Ciao Stephan
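The limiting behavior Stephan relies on - sock_kmalloc charges each allocation against a per-socket budget and fails once it is exhausted - can be modeled in a few lines. The 20480-byte figure is the optmem budget mentioned above; the names here are a toy model, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of sock_kmalloc(): every allocation is charged against the
 * socket's optmem budget, and a request that would exceed it fails
 * instead of letting user space pin ever more kernel memory. */
struct toy_sock {
    size_t optmem_used;
    size_t optmem_max;
};

static void *toy_sock_kmalloc(struct toy_sock *sk, size_t size)
{
    if (sk->optmem_used + size > sk->optmem_max)
        return NULL;            /* the real sock_kmalloc also fails here */
    sk->optmem_used += size;
    return malloc(size);
}
```

Herbert's objection in the parent mail maps directly onto this model: the budget caps the SG-list bookkeeping, not the user pages a large recvmsg pins, and a request that overflows the budget fails outright instead of being split.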
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote: > > With this approach I thought that the while loop could be a thing of the > past, > considering that this is also the approach taken in skcipher_recvmsg_async > that is present in the current upstream code base. The reason there is a limit is so that user-space doesn't pin down unlimited amounts of memory. How is this addressed under your scheme? Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
Am Donnerstag, 16. März 2017, 09:39:23 CET schrieb Herbert Xu: Hi Herbert, > On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote: > > + } else { > > + /* Synchronous operation */ > > + skcipher_request_set_callback(&ctx->req, > > + CRYPTO_TFM_REQ_MAY_SLEEP | > > + CRYPTO_TFM_REQ_MAY_BACKLOG, > > + af_alg_complete, > > + &ctx->completion); > > + err = af_alg_wait_for_completion(ctx->enc ? > > + crypto_skcipher_encrypt(&ctx->req) : > > + crypto_skcipher_decrypt(&ctx->req), > > + &ctx->completion); > > + } > > This is now outside of the loop for the sync case. The purpose > of the loop in the sync case was to segment the data when we get > a very large SG list that does not fit inside a single call. > > Or did I miss something? The while loop in skcipher_recvmsg_sync in the current upstream kernel operates on ctx->rsgl. That data structure is defined as: struct af_alg_sgl { struct scatterlist sg[ALG_MAX_PAGES + 1]; struct page *pages[ALG_MAX_PAGES]; unsigned int npages; }; I.e. recvmsg operates on at most 16 SGs where each can take one page of data. Thus, if a user provides more data than 16 pages, such a while loop makes much sense. The patch proposed here is not limited to a fixed set of 16 SGs in the RX-SGL. The recvmsg code path allocates the required amount of SGL space: /* convert iovecs of output buffers into RX SGL */ while (len < ctx->used && iov_iter_count(&msg->msg_iter)) { struct skcipher_rsgl *rsgl; ... rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL); ... struct skcipher_rsgl uses the af_alg_sgl data structure with its 16 SG entries. But now you can have arbitrary numbers of af_alg_sgl instances up to the memory provided by sock_kmalloc. I.e. recvmsg can now operate on multiples of af_alg_sgl in one go. With this approach I thought that the while loop could be a thing of the past, considering that this is also the approach taken in skcipher_recvmsg_async that is present in the current upstream code base. > > Overall I like this patch. Thank you very much. > > Thanks, Ciao Stephan
Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support
On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote: > The patchset adds support for CAAM Queue Interface (QI), the additional > interface (besides job ring) available for submitting jobs to the engine > on platforms having DPAA (Datapath Acceleration Architecture). > > Patches 1-4 are QMan dependencies. > I would prefer to take them through the crypto tree, > but I am open to suggestions. > > Patch 5 adds a missing double inclusion guard in desc_constr.h. > > Patch 6 adds the caam/qi job submission backend. > > Patch 7 adds algorithms (ablkcipher and authenc) that run on top > of caam/qi. For now, their priority is set lower than caam/jr. I'm fine with the crypto bits. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management
On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote: > > + } else { > + /* Synchronous operation */ > + skcipher_request_set_callback(&ctx->req, > + CRYPTO_TFM_REQ_MAY_SLEEP | > + CRYPTO_TFM_REQ_MAY_BACKLOG, > + af_alg_complete, > + &ctx->completion); > + err = af_alg_wait_for_completion(ctx->enc ? > + crypto_skcipher_encrypt(&ctx->req) : > + crypto_skcipher_decrypt(&ctx->req), > + &ctx->completion); > + } This is now outside of the loop for the sync case. The purpose of the loop in the sync case was to segment the data when we get a very large SG list that does not fit inside a single call. Or did I miss something? Overall I like this patch. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
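The piecemeal segmentation Herbert describes - walking a request larger than one call can cover in bounded chunks - has this rough shape (the chunk size and function names are illustrative, not the actual algif_skcipher limits):

```c
#include <assert.h>
#include <stddef.h>

/* Model of segmented processing: a large request is handled as a
 * sequence of calls, each bounded by max_chunk. Returns the number of
 * calls made and reports the size of the final chunk via out_last. */
static size_t process_piecemeal(size_t total, size_t max_chunk,
                                size_t *out_last)
{
    size_t calls = 0, done = 0;

    while (done < total) {
        size_t n = total - done;
        if (n > max_chunk)
            n = max_chunk;      /* clamp to the per-call limit */
        /* ... submit the n-byte segment to the cipher here ... */
        done += n;
        *out_last = n;
        calls++;
    }
    return calls;
}
```

This is the behavior the loop in skcipher_recvmsg_sync provides and that the question above asks the patch to preserve.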