[PATCH 3/7] x86, LLVM: suppress clang warnings about unaligned accesses

2017-03-16 Thread Michael Davidson
Suppress clang warnings about potential unaligned accesses
to members in packed structs. This gets rid of almost 10,000
warnings about accesses to the ring 0 stack pointer in the TSS.
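
For illustration, a minimal (non-kernel) example of the construct that
clang's address-of-packed-member warning flags; the struct and field
names here are made up:

struct __attribute__((packed)) example_tss {
	unsigned short	pad;
	unsigned long	sp0;	/* misaligned because of the packing */
};

unsigned long *sp0_ptr(struct example_tss *t)
{
	/* clang: taking address of packed member 'sp0' may result in an
	 * unaligned pointer value */
	return &t->sp0;
}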

Signed-off-by: Michael Davidson 
---
 arch/x86/Makefile | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 894a8d18bf97..7f21703c475d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -128,6 +128,11 @@ endif
 KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
+ifeq ($(cc-name),clang)
+# Suppress clang warnings about potential unaligned accesses.
+KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
+endif
+
 ifdef CONFIG_X86_X32
x32_ld_ok := $(call try-run,\
/bin/echo -e '1: .quad 1b' | \
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 5/7] x86, boot, LLVM: Use regparm=0 for memcpy and memset

2017-03-16 Thread Michael Davidson
Use the standard regparm=0 calling convention for memcpy and
memset when building with clang.

This is a workaround for a long-standing clang bug
(see https://llvm.org/bugs/show_bug.cgi?id=3997) where
clang always uses the standard regparm=0 calling convention
for any implicit calls to memcpy and memset that it generates
(e.g. for structure assignments and initialization), even if an
alternate calling convention such as regparm=3 has been specified.
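
As a rough illustration (not kernel code), a plain struct assignment is
enough to make the compiler emit an implicit memcpy() call, which is why
the prototypes are switched to regparm(0) under clang:

struct blob { char bytes[64]; };

/* With -mregparm=3, explicit callers pass arguments in registers, but
 * the call clang generates for the assignment below assumes the default
 * stack-based (regparm=0) convention - the bug referenced above. */
void copy_blob(struct blob *dst, const struct blob *src)
{
	*dst = *src;		/* may be lowered to a call to memcpy() */
}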

Signed-off-by: Michael Davidson 
---
 arch/x86/boot/copy.S   | 15 +--
 arch/x86/boot/string.h | 13 +
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/copy.S b/arch/x86/boot/copy.S
index 1eb7d298b47d..57142d1ad0d2 100644
--- a/arch/x86/boot/copy.S
+++ b/arch/x86/boot/copy.S
@@ -18,6 +18,12 @@
.text
 
 GLOBAL(memcpy)
+#ifdef __clang__   /* Use normal ABI calling conventions */
+	movw	4(%esp), %ax
+	movw	8(%esp), %dx
+	movw	12(%esp), %cx
+#endif
+_memcpy:
	pushw	%si
	pushw	%di
	movw	%ax, %di
@@ -34,6 +40,11 @@ GLOBAL(memcpy)
 ENDPROC(memcpy)
 
 GLOBAL(memset)
+#ifdef __clang__   /* Use normal ABI calling conventions */
+	movw	4(%esp), %ax
+	movw	8(%esp), %dx
+	movw	12(%esp), %cx
+#endif
	pushw	%di
	movw	%ax, %di
	movzbl	%dl, %eax
@@ -52,7 +63,7 @@ GLOBAL(copy_from_fs)
pushw   %ds
pushw   %fs
	popw	%ds
-	calll	memcpy
+	calll	_memcpy
	popw	%ds
retl
 ENDPROC(copy_from_fs)
@@ -61,7 +72,7 @@ GLOBAL(copy_to_fs)
pushw   %es
pushw   %fs
	popw	%es
-	calll	memcpy
+	calll	_memcpy
	popw	%es
retl
 ENDPROC(copy_to_fs)
diff --git a/arch/x86/boot/string.h b/arch/x86/boot/string.h
index 113588ddb43f..e735cccb3fc8 100644
--- a/arch/x86/boot/string.h
+++ b/arch/x86/boot/string.h
@@ -6,8 +6,21 @@
 #undef memset
 #undef memcmp
 
+/*
+ * Use normal ABI calling conventions - i.e. regparm(0) -
+ * for memcpy() and memset() if we are building the real
+ * mode setup code with clang since clang may make implicit
+ * calls to these functions that assume regparm(0).
+ */
+#if defined(_SETUP) && defined(__clang__)
+void __attribute__((regparm(0))) *memcpy(void *dst, const void *src,
+size_t len);
+void __attribute__((regparm(0))) *memset(void *dst, int c, size_t len);
+#else
 void *memcpy(void *dst, const void *src, size_t len);
 void *memset(void *dst, int c, size_t len);
+#endif
+
 int memcmp(const void *s1, const void *s2, size_t len);
 
 /*
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 0/7] LLVM: make x86_64 kernel build with clang.

2017-03-16 Thread Michael Davidson
This patch set is sufficient to get the x86_64 kernel to build
and boot correctly with clang-3.8 or greater.

The resulting build still has about 300 warnings, very few of
which appear to be significant. Most of them should be fixable
with some minor code refactoring, although a few of them, such
as the complaints about implicit conversions between enumerated
types, may be candidates for simply being disabled.

Michael Davidson (7):
  Makefile, LLVM: add -no-integrated-as to KBUILD_[AC]FLAGS
  Makefile, x86, LLVM: disable unsupported optimization flags
  x86, LLVM: suppress clang warnings about unaligned accesses
  x86, boot, LLVM: #undef memcpy etc in string.c
  x86, boot, LLVM: Use regparm=0 for memcpy and memset
  md/raid10, LLVM: get rid of variable length array
  crypto, x86, LLVM: aesni - fix token pasting

 Makefile|  4 
 arch/x86/Makefile   |  7 +++
 arch/x86/boot/copy.S| 15 +--
 arch/x86/boot/string.c  |  9 +
 arch/x86/boot/string.h  | 13 +
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |  7 ++-
 drivers/md/raid10.c |  9 -
 7 files changed, 52 insertions(+), 12 deletions(-)

-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 2/7] Makefile, x86, LLVM: disable unsupported optimization flags

2017-03-16 Thread Michael Davidson
Unfortunately, while clang generates a warning about these flags
being unsupported, it still exits with a status of 0, so we have
to disable them explicitly instead of just using a cc-option check.

Signed-off-by: Michael Davidson 
---
 Makefile  | 2 ++
 arch/x86/Makefile | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/Makefile b/Makefile
index b21fd0ca2946..5e97e5fc1eea 100644
--- a/Makefile
+++ b/Makefile
@@ -629,7 +629,9 @@ ARCH_AFLAGS :=
 ARCH_CFLAGS :=
 include arch/$(SRCARCH)/Makefile
 
+ifneq ($(cc-name),clang)
 KBUILD_CFLAGS  += $(call cc-option,-fno-delete-null-pointer-checks,)
+endif
 KBUILD_CFLAGS  += $(call cc-disable-warning,frame-address,)
 
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d449337a360..894a8d18bf97 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -87,11 +87,13 @@ else
 KBUILD_AFLAGS += -m64
 KBUILD_CFLAGS += -m64
 
+ifneq ($(cc-name),clang)
 # Align jump targets to 1 byte, not the default 16 bytes:
 KBUILD_CFLAGS += -falign-jumps=1
 
 # Pack loops tightly as well:
 KBUILD_CFLAGS += -falign-loops=1
+endif
 
 # Don't autogenerate traditional x87 instructions
 KBUILD_CFLAGS += $(call cc-option,-mno-80387)
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 4/7] x86, boot, LLVM: #undef memcpy etc in string.c

2017-03-16 Thread Michael Davidson
#undef memcpy and friends in boot/string.c so that the functions
defined here have the correct names; otherwise we end up trying
to redefine __builtin_memcpy etc.
Surprisingly, gcc allows this (and, helpfully, discards the
__builtin_ prefix from the function name when compiling it),
but clang does not.

Adding these #undefs appears to preserve what I assume was
the original intent of the code.
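
A hypothetical sketch of the situation the #undefs avoid (the macro
shown stands in for whatever maps memcpy onto the builtin; it is
illustrative, not the actual header contents):

#include <stddef.h>

#define memcpy(d, s, n)	__builtin_memcpy(d, s, n)	/* hypothetical */

/* Without this #undef, the definition below would be preprocessed into
 * a definition of __builtin_memcpy(), which gcc tolerates (quietly
 * dropping the __builtin_ prefix) but clang rejects: */
#undef memcpy

void *memcpy(void *dst, const void *src, size_t len)
{
	char *d = dst;
	const char *s = src;

	while (len--)
		*d++ = *s++;
	return dst;
}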

Signed-off-by: Michael Davidson 
---
 arch/x86/boot/string.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/boot/string.c b/arch/x86/boot/string.c
index 5457b02fc050..b40266850869 100644
--- a/arch/x86/boot/string.c
+++ b/arch/x86/boot/string.c
@@ -16,6 +16,15 @@
 #include "ctype.h"
 #include "string.h"
 
+/*
+ * Undef these macros so that the functions that we provide
+ * here will have the correct names regardless of how string.h
+ * may have chosen to #define them.
+ */
+#undef memcpy
+#undef memset
+#undef memcmp
+
 int memcmp(const void *s1, const void *s2, size_t len)
 {
bool diff;
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 6/7] md/raid10, LLVM: get rid of variable length array

2017-03-16 Thread Michael Davidson
Replace a variable-length array in a struct by allocating
the memory for the entire struct in a char array on the stack.
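
The pattern, reduced to a standalone sketch (the type names here are
placeholders): clang rejects a variable-length array inside a struct,
but a plain char VLA sized and aligned by hand works with both
compilers:

struct hdr  { int copies; };
struct item { int v; };

void use_on_stack(int copies)
{
	/* room for the header plus 'copies' trailing items, aligned as
	 * the header type requires */
	char buf[sizeof(struct hdr) + copies * sizeof(struct item)]
		__attribute__((aligned(__alignof__(struct hdr))));
	struct hdr *h = (struct hdr *)buf;

	h->copies = copies;
}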

Signed-off-by: Michael Davidson 
---
 drivers/md/raid10.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 063c43d83b72..158ebdff782c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4654,11 +4654,10 @@ static int handle_reshape_read_error(struct mddev 
*mddev,
/* Use sync reads to get the blocks from somewhere else */
int sectors = r10_bio->sectors;
struct r10conf *conf = mddev->private;
-   struct {
-   struct r10bio r10_bio;
-   struct r10dev devs[conf->copies];
-   } on_stack;
-	struct r10bio *r10b = &on_stack.r10_bio;
+   char on_stack_r10_bio[sizeof(struct r10bio) +
+ conf->copies * sizeof(struct r10dev)]
+ __aligned(__alignof__(struct r10bio));
+   struct r10bio *r10b = (struct r10bio *)on_stack_r10_bio;
int slot = 0;
int idx = 0;
struct bio_vec *bvec = r10_bio->master_bio->bi_io_vec;
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 7/7] crypto, x86, LLVM: aesni - fix token pasting

2017-03-16 Thread Michael Davidson
aes_ctrby8_avx-x86_64.S uses the C preprocessor for token pasting
of character sequences that are not valid preprocessor tokens.
While this is allowed when preprocessing assembler files, it exposes
an incompatibility between the clang and gcc preprocessors where
clang does not strip leading white space from macro parameters,
leading to the CONCAT(%xmm, i) macro expansion on line 96 resulting
in a token with a space character embedded in it.

While this could be fixed by deleting the offending space character,
the assembler is perfectly capable of handling the token pasting
correctly for itself, so it seems preferable to let it do so and to
get rid of the CONCAT(), DDQ() and XMM() preprocessor macros.

Signed-off-by: Michael Davidson 
---
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S 
b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
index a916c4a61165..5f6a5af9c489 100644
--- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
+++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S
@@ -65,7 +65,6 @@
 #include 
 #include 
 
-#define CONCAT(a,b)a##b
 #define VMOVDQ vmovdqu
 
 #define xdata0 %xmm0
@@ -92,8 +91,6 @@
 #define num_bytes  %r8
 
 #define tmp%r10
-#defineDDQ(i)  CONCAT(ddq_add_,i)
-#defineXMM(i)  CONCAT(%xmm, i)
 #defineDDQ_DATA0
 #defineXDATA   1
 #define KEY_1281
@@ -131,12 +128,12 @@ ddq_add_8:
 /* generate a unique variable for ddq_add_x */
 
 .macro setddq n
-   var_ddq_add = DDQ(\n)
+   var_ddq_add = ddq_add_\n
 .endm
 
 /* generate a unique variable for xmm register */
 .macro setxdata n
-   var_xdata = XMM(\n)
+   var_xdata = %xmm\n
 .endm
 
 /* club the numeric 'id' to the symbol 'name' */
-- 
2.12.0.367.g23dc2f6d3c-goog



[PATCH 1/7] Makefile, LLVM: add -no-integrated-as to KBUILD_[AC]FLAGS

2017-03-16 Thread Michael Davidson
Add -no-integrated-as to KBUILD_AFLAGS and KBUILD_CFLAGS
for clang.

Signed-off-by: Michael Davidson 
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index b841fb36beb2..b21fd0ca2946 100644
--- a/Makefile
+++ b/Makefile
@@ -704,6 +704,8 @@ KBUILD_CFLAGS += $(call cc-disable-warning, 
tautological-compare)
 # See modpost pattern 2
 KBUILD_CFLAGS += $(call cc-option, -mno-global-merge,)
 KBUILD_CFLAGS += $(call cc-option, -fcatch-undefined-behavior)
+KBUILD_CFLAGS += $(call cc-option, -no-integrated-as)
+KBUILD_AFLAGS += $(call cc-option, -no-integrated-as)
 else
 
 # These warnings generated too much noise in a regular build.
-- 
2.12.0.367.g23dc2f6d3c-goog



Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support

2017-03-16 Thread Scott Wood
On Thu, 2017-03-16 at 14:13 +, Horia Geantă wrote:
> On 3/16/2017 10:43 AM, Herbert Xu wrote:
> > 
> > On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote:
> > > 
> > > The patchset adds support for CAAM Queue Interface (QI), the additional
> > > interface (besides job ring) available for submitting jobs to the engine
> > > on platforms having DPAA (Datapath Acceleration Architecture).
> > > 
> > > Patches 1-4 are QMan dependencies.
> > > I would prefer to take them through the crypto tree,
> > > but I am open to suggestions.
> > > 
> > > Patch 5 adds a missing double inclusion guard in desc_constr.h.
> > > 
> > > Patch 6 adds the caam/qi job submission backend.
> > > 
> > > Patch 7 adds algorithms (ablkcipher and authenc) that run on top
> > > of caam/qi. For now, their priority is set lower than caam/jr.
> > I'm fine with the crypto bits.
> > 
> Herbert,
> 
> Thanks for the review.
> Should I submit formally, i.e. without the [RFC] prefix?
> 
> Scott, Roy,
> 
> Do you have any comments wrt. soc/qman patches or with them going
> through the crypto tree?

Going through the crypto tree is fine.

-Scott



Re: [RFC PATCH v2 08/32] x86: Use PAGE_KERNEL protection for ioremap of memory page

2017-03-16 Thread Tom Lendacky

On 3/7/2017 8:59 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:13:32AM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that the memory mapping of, e.g., ACPI tables receives
the proper mapping attributes.

Signed-off-by: Tom Lendacky 
---



diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c400ab5..481c999 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -151,7 +151,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
pcm = new_pcm;
}

+   /*
+* If the page being mapped is in memory and SEV is active then
+* make sure the memory encryption attribute is enabled in the
+* resulting mapping.
+*/
prot = PAGE_KERNEL_IO;
+   if (sev_active() && page_is_mem(pfn))


Hmm, a resource tree walk per ioremap call. This could get expensive for
ioremap-heavy workloads.

__ioremap_caller() gets called here during boot 55 times so not a whole
lot but I wouldn't be surprised if there were some nasty use cases which
ioremap a lot.

...


diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..db56ba3 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -518,6 +518,46 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);

+/*
+ * This function returns true if the target memory is marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
+ * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static int walk_mem_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+   struct resource res;
+   unsigned long pfn, end_pfn;
+   u64 orig_end;
+   int ret = -1;
+
+   res.start = (u64) start_pfn << PAGE_SHIFT;
+   res.end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
+   res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+   orig_end = res.end;
+   while ((res.start < res.end) &&
+   (find_next_iomem_res(&res, IORES_DESC_NONE, true) >= 0)) {
+   pfn = (res.start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   end_pfn = (res.end + 1) >> PAGE_SHIFT;
+   if (end_pfn > pfn)
+   ret = (res.desc != IORES_DESC_NONE) ? 1 : 0;
+   if (ret)
+   break;
+   res.start = res.end + 1;
+   res.end = orig_end;
+   }
+   return ret;
+}


So the relevant difference between this one and walk_system_ram_range()
is this:

-   ret = (*func)(pfn, end_pfn - pfn, arg);
+   ret = (res.desc != IORES_DESC_NONE) ? 1 : 0;

so it seems to me you can have your own *func() pointer which does that
IORES_DESC_NONE comparison. And then you can define your own workhorse
__walk_memory_range() which gets called by both walk_mem_range() and
walk_system_ram_range() instead of almost duplicating them.

And looking at walk_system_ram_res(), that one looks similar too except
the pfn computation. But AFAICT the pfn/end_pfn things are computed from
res.start and res.end so it looks to me like all those three functions
are crying for unification...
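
Something along these lines, perhaps (a rough, untested sketch of that
consolidation; it assumes find_next_iomem_res() keeps its current
signature and that the callback type is free to change, and the names
are illustrative):

static int __walk_memory_range(u64 start, u64 end, unsigned long desc,
			       void *arg,
			       int (*func)(struct resource *, void *))
{
	struct resource res;
	u64 orig_end;
	int ret = -1;

	res.start = start;
	res.end = end;
	res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
	orig_end = res.end;

	while (res.start < res.end &&
	       find_next_iomem_res(&res, desc, true) >= 0) {
		ret = (*func)(&res, arg);
		if (ret)
			break;
		res.start = res.end + 1;
		res.end = orig_end;
	}
	return ret;
}

/* walk_mem_range()'s per-range check then shrinks to a callback: */
static int is_mem_desc(struct resource *res, void *arg)
{
	return res->desc != IORES_DESC_NONE ? 1 : 0;
}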


I'll take a look at what it takes to consolidate these with a pre-patch. 
Then I'll add the new support.


Thanks,
Tom





Re: [RFC 4/7] soc/qman: add helper functions needed by caam/qi driver

2017-03-16 Thread Scott Wood
On Fri, 2017-03-03 at 16:52 +0200, Horia Geantă wrote:
> Add helper functions, macros, #defines for accessing / enabling
> qman functionality from caam/qi driver, such that this driver
> is not aware of the need for data conversion to big endian.

Why?  I complained about that (probably internally) when this driver was
originally submitted.

Having a bunch of accessors in a header file that just do an assignment with
an endian conversion just obfuscates things.  The driver still needs to know
that the conversion needs to happen, or else it wouldn't know that the fields
can't be accessed directly... and it gets particularly ridiculous when a
single field has a growing pile of accessors depending on exactly what flags
you want to set (e.g. qm_sg_entry_set_*).  Just open-code it unless an
accessor is justified by the call sites getting unwieldy or numerous.

It looks like GCC 6 has added an attribute (scalar_storage_order) that could
be used on structs to avoid having to manually swap such things.  I look
forward to the day when GCC 6 is old enough for the kernel to depend on this.
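
For reference, a minimal sketch of what that would look like (the field
names here are made up; GCC >= 6 only, and clang has no equivalent):

/* The compiler stores and fetches these fields in big-endian byte
 * order, so driver code can access them directly without
 * cpu_to_be32()/be32_to_cpu() wrappers: */
struct qm_sg_entry_example {
	unsigned int	addr_hi;
	unsigned int	addr_lo;
	unsigned int	length;
} __attribute__((scalar_storage_order("big-endian")));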

-Scott



Re: [RFC PATCH v2 05/32] x86: Use encrypted access of BOOT related data with SEV

2017-03-16 Thread Tom Lendacky

On 3/7/2017 5:09 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:12:59AM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

When Secure Encrypted Virtualization (SEV) is active, BOOT data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.


This could also explain why persistent memory needs to be accessed
decrypted with SEV.


I'll add some comments about why persistent memory needs to be accessed
decrypted (because the encryption key changes across reboots) for both
SME and SEV.



In general, what the difference in that aspect is in respect to SME. And
I'd write that in the comment over the function. And not say "E820 areas
are checked in making this determination." because that is visible but
say *why* we need to check those ranges and determine access depending
on their type.


Will do.

Thanks,
Tom





Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command

2017-03-16 Thread Brijesh Singh



On 03/16/2017 05:54 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

+static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src,
+   void *dst, int *error)
+{
+   inpages = sev_pin_memory(src, PAGE_SIZE, &npages);
+   if (!inpages) {
+   ret = -ENOMEM;
+   goto err_1;
+   }
+
+   data->handle = sev_get_handle(kvm);
+   data->dst_addr = __psp_pa(dst);
+   data->src_addr = __sev_page_pa(inpages[0]);
+   data->length = PAGE_SIZE;
+
+   ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error);
+   if (ret)
+   printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n",
+   ret, *error);
+   sev_unpin_memory(inpages, npages);
+err_1:
+   kfree(data);
+   return ret;
+}
+
+static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   void *data;
+   int ret, offset, len;
+   struct kvm_sev_dbg debug;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&debug, (void *)argp->data,
+   sizeof(struct kvm_sev_dbg)))
+   return -EFAULT;
+   /*
+* TODO: add support for decrypting length which crosses the
+* page boundary.
+*/
+   offset = debug.src_addr & (PAGE_SIZE - 1);
+   if (offset + debug.length > PAGE_SIZE)
+   return -EINVAL;
+


Please do add it, it doesn't seem very different from what you're doing
in LAUNCH_UPDATE_DATA.  There's no need for a separate
__sev_dbg_decrypt_page function, you can just pin/unpin here and do a
per-page loop as in LAUNCH_UPDATE_DATA.



I can certainly add support to handle crossing the page boundary cases.
Should we limit the size to prevent user space from passing an arbitrarily
long length and having us loop inside the kernel for a long time? I was
thinking of limiting it to a PAGE_SIZE.
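
A rough sketch of the per-page loop Paolo suggests, reusing the pinning
helpers from this series; the length cap and the names 'done' and
'SEV_DBG_MAX_LEN' are illustrative assumptions, not part of the posted
code:

	if (debug.length > SEV_DBG_MAX_LEN)	/* hypothetical upper bound */
		return -EINVAL;

	inpages = sev_pin_memory(debug.src_addr, debug.length, &npages);
	if (!inpages)
		return -ENOMEM;

	offset = debug.src_addr & (PAGE_SIZE - 1);
	done = 0;
	for (i = 0; i < npages; i++) {
		len = min_t(unsigned long, debug.length - done,
			    PAGE_SIZE - offset);
		/* issue SEV_CMD_DBG_DECRYPT for inpages[i] at 'offset'
		 * and length 'len', as LAUNCH_UPDATE_DATA does per page */
		done += len;
		offset = 0;	/* only the first page starts mid-page */
	}

	sev_unpin_memory(inpages, npages);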

~ Brijesh


Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command

2017-03-16 Thread Brijesh Singh



On 03/16/2017 06:03 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

+   data = (void *) get_zeroed_page(GFP_KERNEL);


The page does not need to be zeroed, does it?



No, we don't have to zero it. I will fix it.


+
+   if ((len & 15) || (dst_addr & 15)) {
+   /* if destination address and length are not 16-byte
+* aligned then:
+* a) decrypt destination page into temporary buffer
+* b) copy source data into temporary buffer at correct offset
+* c) encrypt temporary buffer
+*/
+   ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);


Ah, I see now you're using this function here for read-modify-write.
data is already pinned here, so even if you keep the function it makes
sense to push pinning out of __sev_dbg_decrypt_page and into
sev_dbg_decrypt.


I can push out pinning part outside __sev_dbg_decrypt_page




+   if (ret)
+   goto err_3;
+   d_off = dst_addr & (PAGE_SIZE - 1);
+
+   if (copy_from_user(data + d_off,
+   (uint8_t *)debug.src_addr, len)) {
+   ret = -EFAULT;
+   goto err_3;
+   }
+
+   encrypt->length = PAGE_SIZE;


Why decrypt/re-encrypt all the page instead of just the 16 byte area
around the [dst_addr, dst_addr+len) range?



good catch, I should be fine just decrypting a 16 byte area. Will fix in next 
rev


+   encrypt->src_addr = __psp_pa(data);
+   encrypt->dst_addr =  __sev_page_pa(inpages[0]);
+   } else {
+   if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
+   ret = -EFAULT;
+   goto err_3;
+   }


Do you need copy_from_user, or can you just pin/unpin memory as for
DEBUG_DECRYPT?



We can work either with pin/unpin or copy_from_user. I think I chose
copy_from_user because most of the time the ENCRYPT path was used when I set
a breakpoint through gdb, which basically requires copying only a small
amount of data into guest memory. It is quite possible that someone will try
to copy a lot more data, in which case pin/unpin can speed things up.

-Brijesh


Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-03-16 Thread Borislav Petkov
On Fri, Mar 10, 2017 at 04:41:56PM -0600, Brijesh Singh wrote:
> I can take a look at fixing those warnings. My initial attempt was to create
> a new function to clear the encryption bit, but it ended up looking very
> similar to __change_page_attr_set_clr(), hence I decided to extend the
> existing function to use memblock_alloc().

... except that having all that SEV-specific code in main code paths is
yucky and I'd like to avoid it, if possible.

> Early in the boot process, the guest kernel allocates some structures (either
> statically allocated or dynamically allocated via memblock_alloc) and shares
> the physical address of these structures with the hypervisor. Since the
> entire guest memory area is mapped as encrypted, those structures are mapped
> as an encrypted memory range. We need a method to clear the encryption bit.
> Sometimes these structures may be part of 2M pages and need to be split into
> smaller pages.

So how hard would it be if the hypervisor allocated that memory for the
guest instead? It would allocate it decrypted and guest would need to
access it decrypted too. All in preparation for SEV-ES which will need a
block of unencrypted memory for the guest anyway...

> In most cases, guest and hypervisor communication starts as soon as the
> guest provides the physical address to the hypervisor. So we must map the
> pages as decrypted before sharing the physical address with the hypervisor.

See above: so purely theoretically speaking, the hypervisor could prep
that decrypted range for the guest. I'd look in Paolo's direction,
though, for the feasibility of something like that.

Thanks.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v3 1/3] clk: meson-gxbb: expose clock CLKID_RNG0

2017-03-16 Thread Kevin Hilman
Hi Herbert,

Herbert Xu  writes:

> On Wed, Feb 22, 2017 at 07:55:24AM +0100, Heiner Kallweit wrote:
>> Expose clock CLKID_RNG0 which is needed for the HW random number generator.
>> 
>> Signed-off-by: Heiner Kallweit 
>
> All patches applied.  Thanks.

Actually, can you just apply [PATCH 4/4] to your tree?

The clock and DT patches need to go through their respective trees or
will otherwise have conflicts with other things going in via those
trees.

Thanks,

Kevin



Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command

2017-03-16 Thread Brijesh Singh


On 03/16/2017 05:48 AM, Paolo Bonzini wrote:



On 02/03/2017 16:17, Brijesh Singh wrote:

+static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
+   unsigned long *n)
+{
+   struct page **pages;
+   int first, last;
+   unsigned long npages, pinned;
+
+   /* Get number of pages */
+   first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
+   last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
+   npages = (last - first + 1);
+
+   pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
+   if (!pages)
+   return NULL;
+
+   /* pin the user virtual address */
+   down_read(&current->mm->mmap_sem);
+   pinned = get_user_pages_fast(uaddr, npages, 1, pages);
+   up_read(&current->mm->mmap_sem);


get_user_pages_fast, like get_user_pages_unlocked, must be called
without mmap_sem held.


Sure.
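
For reference, a minimal sketch of the adjusted call, keeping the rest of
the posted sev_pin_memory() as is:

	/* get_user_pages_fast() handles mmap_sem internally (lockless fast
	 * path with a locked fallback), so the down_read()/up_read() pair
	 * around it simply goes away: */
	pinned = get_user_pages_fast(uaddr, npages, 1 /* write */, pages);
	if (pinned != npages)
		goto err;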




+   if (pinned != npages) {
+   printk(KERN_ERR "SEV: failed to pin  %ld pages (got %ld)\n",
+   npages, pinned);
+   goto err;
+   }
+
+   *n = npages;
+   return pages;
+err:
+   if (pinned > 0)
+   release_pages(pages, pinned, 0);
+   kfree(pages);
+
+   return NULL;
+}

+   /* the array of pages returned by get_user_pages() is a page-aligned
+* memory. Since the user buffer is probably not page-aligned, we need
+* to calculate the offset within a page for first update entry.
+*/
+   offset = uaddr & (PAGE_SIZE - 1);
+   len = min_t(size_t, (PAGE_SIZE - offset), ulen);
+   ulen -= len;
+
+   /* update first page -
+* special care need to be taken for the first page because we might
+* be dealing with offset within the page
+*/


No need to special case the first page; just set "offset = 0" inside the
loop after the first iteration.



Will do.

-Brijesh


Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active

2017-03-16 Thread Brijesh Singh



On 03/16/2017 05:38 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

The SEV memory encryption engine uses a tweak such that two identical
plaintexts at different location will have a different ciphertexts.
So swapping or moving ciphertexts of two pages will not result in
plaintexts being swapped. Relocating (or migrating) a physical backing pages
for SEV guest will require some additional steps. The current SEV key
management spec [1] does not provide commands to swap or migrate (move)
ciphertexts. For now we pin the memory allocated for the SEV guest. In
future when SEV key management spec provides the commands to support the
page migration we can update the KVM code to remove the pinning logical
without making any changes into userspace (qemu).

The patch pins userspace memory when a new slot is created and unpin the
memory when slot is removed.

[1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf


This is not enough, because memory can be hidden temporarily from the
guest and remapped later.  Think of a PCI BAR that is backed by RAM, or
also SMRAM.  The pinning must be kept even in that case.

You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM.  In
QEMU you can use a RAMBlockNotifier to invoke the ioctls.



I was hoping to avoid adding a new ioctl, but I see your point. Will add a
pair of ioctls and use a RAMBlockNotifier to invoke them.

-Brijesh


Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-16 Thread Borislav Petkov
On Thu, Mar 16, 2017 at 11:11:26AM -0500, Tom Lendacky wrote:
> Not quite. The guest still needs to understand about the encryption mask
> so that it can protect memory by setting the encryption mask in the
> pagetable entries.  It can also decide when to share memory with the
> hypervisor by not setting the encryption mask in the pagetable entries.

Ok, so the kernel - by that I mean both the baremetal and guest kernel -
needs to know whether we're encrypting stuff. So it needs to know about
SME.

> "Instruction fetches are always decrypted under SEV" means that,
> regardless of how a virtual address is mapped, encrypted or decrypted,
> if an instruction fetch is performed by the CPU from that address it
> will always be decrypted. This is to prevent the hypervisor from
> injecting executable code into the guest since it would have to be
> valid encrypted instructions.

Ok, so the guest needs to map its pages encrypted.

Which reminds me, KSM might be a PITA to enable with SEV but that's a
different story. :)

> There are many areas that use the same logic, but there are certain
> situations where we need to check between SME vs SEV (e.g. DMA operation
> setup or decrypting the trampoline area) and act accordingly.

Right, and I'd like to keep those areas where it differs at minimum and
nicely cordoned off from the main paths.

So looking back at the current patch in this subthread:

we do check

* CPUID 0x40000000
* 8000_001F[EAX] for SME
* 8000_001F[EBX][5:0] for the encryption bits.

So how about we generate the following CPUID picture for the guest:

CPUID_Fn8000_001F_EAX = ...10b

That is, SME bit is cleared, SEV is set. This will mean for the guest
kernel that SEV is enabled and you can avoid yourself the 0x40000000
leaf check and the additional KVM feature bit glue.

10b configuration will be invalid for baremetal as - I'm assuming - you
can't have SEV=1b with SME=0b. It will be a virt-only configuration and
this way you can even avoid the hypervisor-specific detection but do
that for all.

Hmmm?
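
(For reference, a quick user-space sketch of the leaf in question, with
the bit layout as documented in the APM; the guest-side plumbing is of
course more involved:)

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx))
		return 1;

	/* Fn8000_001F: EAX bit 0 = SME supported, bit 1 = SEV supported;
	 * EBX bits 5:0 = position of the C-bit in a page table entry */
	printf("SME=%u SEV=%u C-bit=%u\n",
	       eax & 1, (eax >> 1) & 1, ebx & 0x3f);
	return 0;
}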

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-16 Thread Tom Lendacky

On 3/16/2017 10:09 AM, Borislav Petkov wrote:

On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:

Because there are differences between how SME and SEV behave
(instruction fetches are always decrypted under SEV, DMA to an
encrypted location is not supported under SEV, etc.) we need to
determine which mode we are in so that things can be setup properly
during boot. For example, if SEV is active the kernel will already
be encrypted and so we don't perform that step or the trampoline area
for bringing up an AP must be decrypted for SME but encrypted for SEV.


So with SEV enabled, it seems to me a guest doesn't know anything about
encryption and can run as if SME is disabled. So sme_active() will be
false. And then the kernel can bypass all that code dealing with SME.

So a guest should simply run like on baremetal with no SME, IMHO.



Not quite. The guest still needs to understand about the encryption mask
so that it can protect memory by setting the encryption mask in the
pagetable entries.  It can also decide when to share memory with the
hypervisor by not setting the encryption mask in the pagetable entries.


But then there's that part: "instruction fetches are always decrypted
under SEV". What does that mean exactly? And how much of that code can


"Instruction fetches are always decrypted under SEV" means that,
regardless of how a virtual address is mapped, encrypted or decrypted,
if an instruction fetch is performed by the CPU from that address it
will always be decrypted. This is to prevent the hypervisor from
injecting executable code into the guest since it would have to be
valid encrypted instructions.


be reused so that

* SME on baremetal
* SEV on guest

use the same logic?


There are many areas that use the same logic, but there are certain
situations where we need to check between SME vs SEV (e.g. DMA operation
setup or decrypting the trampoline area) and act accordingly.

Thanks,
Tom



Having the larger SEV preparation part on the kvm host side is perfectly
fine. But I'd like to keep kernel initialization paths clean.

Thanks.



Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-16 Thread Borislav Petkov
On Thu, Mar 16, 2017 at 09:28:58AM -0500, Tom Lendacky wrote:
> Because there are differences between how SME and SEV behave
> (instruction fetches are always decrypted under SEV, DMA to an
> encrypted location is not supported under SEV, etc.) we need to
> determine which mode we are in so that things can be setup properly
> during boot. For example, if SEV is active the kernel will already
> be encrypted and so we don't perform that step or the trampoline area
> for bringing up an AP must be decrypted for SME but encrypted for SEV.

So with SEV enabled, it seems to me a guest doesn't know anything about
encryption and can run as if SME is disabled. So sme_active() will be
false. And then the kernel can bypass all that code dealing with SME.

So a guest should simply run like on baremetal with no SME, IMHO.

But then there's that part: "instruction fetches are always decrypted
under SEV". What does that mean exactly? And how much of that code can
be reused so that

* SME on baremetal
* SEV on guest

use the same logic?

Having the larger SEV preparation part on the kvm host side is perfectly
fine. But I'd like to keep kernel initialization paths clean.

Thanks.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-16 Thread Tom Lendacky

On 3/16/2017 5:16 AM, Borislav Petkov wrote:

On Fri, Mar 10, 2017 at 10:35:30AM -0600, Brijesh Singh wrote:

We could update this patch to use the below logic:

 * CPUID(0) - Check for AuthenticAMD
 * CPUID(1) - Check if under hypervisor
 * CPUID(0x80000000) - Check for highest supported leaf
 * CPUID(0x8000001F).EAX - Check for SME and SEV support
 * rdmsr (MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set


Actually, it is still not clear to me *why* we need to do anything
special wrt SEV in the guest.

Lemme clarify: why can't the guest boot just like a normal Linux on
baremetal and use the SME(!) detection code to set sme_enable and so
on? IOW, I'd like to avoid all those checks whether we're running under
hypervisor and handle all that like we're running on baremetal.


Because there are differences between how SME and SEV behave
(instruction fetches are always decrypted under SEV, DMA to an
encrypted location is not supported under SEV, etc.) we need to
determine which mode we are in so that things can be setup properly
during boot. For example, if SEV is active the kernel will already
be encrypted and so we don't perform that step or the trampoline area
for bringing up an AP must be decrypted for SME but encrypted for SEV.
The hypervisor check will provide that ability to determine how we
handle things.

Thanks,
Tom





[PATCH] md5: remove from lib and only live in crypto

2017-03-16 Thread Jason A. Donenfeld
The md5_transform function is no longer used anywhere in the tree,
except for the crypto api's actual implementation of md5, so we can drop
the function from lib and put it as a static function of the crypto
file, where it belongs. There should be no new users of md5_transform,
anyway, since there are more modern ways of doing what it once achieved.

Signed-off-by: Jason A. Donenfeld 
---
In the last patch like this, we managed to get rid of halfmd4 from this
file. In this series we get rid of md5, now that the patches have landed
that remove such improper md5 usage from the kernel. When a final
dependency on the (dead) sha1 is removed, then cryptohash.h will be removed
all together. This patch is for the md5 removal.


 crypto/md5.c   | 95 +-
 include/linux/cryptohash.h |  5 ---
 lib/Makefile   |  2 +-
 lib/md5.c  | 95 --
 4 files changed, 95 insertions(+), 102 deletions(-)
 delete mode 100644 lib/md5.c

diff --git a/crypto/md5.c b/crypto/md5.c
index 2355a7c25c45..f7ae1a48225b 100644
--- a/crypto/md5.c
+++ b/crypto/md5.c
@@ -21,9 +21,11 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
+#define MD5_DIGEST_WORDS 4
+#define MD5_MESSAGE_BYTES 64
+
 const u8 md5_zero_message_hash[MD5_DIGEST_SIZE] = {
0xd4, 0x1d, 0x8c, 0xd9, 0x8f, 0x00, 0xb2, 0x04,
0xe9, 0x80, 0x09, 0x98, 0xec, 0xf8, 0x42, 0x7e,
@@ -47,6 +49,97 @@ static inline void cpu_to_le32_array(u32 *buf, unsigned int 
words)
}
 }
 
+#define F1(x, y, z)(z ^ (x & (y ^ z)))
+#define F2(x, y, z)F1(z, x, y)
+#define F3(x, y, z)(x ^ y ^ z)
+#define F4(x, y, z)(y ^ (x | ~z))
+
+#define MD5STEP(f, w, x, y, z, in, s) \
+   (w += f(x, y, z) + in, w = (w<<s | w>>(32-s)) + x)
+
+static void md5_transform(__u32 *hash, __u32 const *in)
+{
+   u32 a, b, c, d;
+
+   a = hash[0];
+   b = hash[1];
+   c = hash[2];
+   d = hash[3];
+
+   MD5STEP(F1, a, b, c, d, in[0] + 0xd76aa478, 7);
+   MD5STEP(F1, d, a, b, c, in[1] + 0xe8c7b756, 12);
+   MD5STEP(F1, c, d, a, b, in[2] + 0x242070db, 17);
+   MD5STEP(F1, b, c, d, a, in[3] + 0xc1bdceee, 22);
+   MD5STEP(F1, a, b, c, d, in[4] + 0xf57c0faf, 7);
+   MD5STEP(F1, d, a, b, c, in[5] + 0x4787c62a, 12);
+   MD5STEP(F1, c, d, a, b, in[6] + 0xa8304613, 17);
+   MD5STEP(F1, b, c, d, a, in[7] + 0xfd469501, 22);
+   MD5STEP(F1, a, b, c, d, in[8] + 0x698098d8, 7);
+   MD5STEP(F1, d, a, b, c, in[9] + 0x8b44f7af, 12);
+   MD5STEP(F1, c, d, a, b, in[10] + 0xffff5bb1, 17);
+   MD5STEP(F1, b, c, d, a, in[11] + 0x895cd7be, 22);
+   MD5STEP(F1, a, b, c, d, in[12] + 0x6b901122, 7);
+   MD5STEP(F1, d, a, b, c, in[13] + 0xfd987193, 12);
+   MD5STEP(F1, c, d, a, b, in[14] + 0xa679438e, 17);
+   MD5STEP(F1, b, c, d, a, in[15] + 0x49b40821, 22);
+
+   MD5STEP(F2, a, b, c, d, in[1] + 0xf61e2562, 5);
+   MD5STEP(F2, d, a, b, c, in[6] + 0xc040b340, 9);
+   MD5STEP(F2, c, d, a, b, in[11] + 0x265e5a51, 14);
+   MD5STEP(F2, b, c, d, a, in[0] + 0xe9b6c7aa, 20);
+   MD5STEP(F2, a, b, c, d, in[5] + 0xd62f105d, 5);
+   MD5STEP(F2, d, a, b, c, in[10] + 0x02441453, 9);
+   MD5STEP(F2, c, d, a, b, in[15] + 0xd8a1e681, 14);
+   MD5STEP(F2, b, c, d, a, in[4] + 0xe7d3fbc8, 20);
+   MD5STEP(F2, a, b, c, d, in[9] + 0x21e1cde6, 5);
+   MD5STEP(F2, d, a, b, c, in[14] + 0xc33707d6, 9);
+   MD5STEP(F2, c, d, a, b, in[3] + 0xf4d50d87, 14);
+   MD5STEP(F2, b, c, d, a, in[8] + 0x455a14ed, 20);
+   MD5STEP(F2, a, b, c, d, in[13] + 0xa9e3e905, 5);
+   MD5STEP(F2, d, a, b, c, in[2] + 0xfcefa3f8, 9);
+   MD5STEP(F2, c, d, a, b, in[7] + 0x676f02d9, 14);
+   MD5STEP(F2, b, c, d, a, in[12] + 0x8d2a4c8a, 20);
+
+   MD5STEP(F3, a, b, c, d, in[5] + 0xfffa3942, 4);
+   MD5STEP(F3, d, a, b, c, in[8] + 0x8771f681, 11);
+   MD5STEP(F3, c, d, a, b, in[11] + 0x6d9d6122, 16);
+   MD5STEP(F3, b, c, d, a, in[14] + 0xfde5380c, 23);
+   MD5STEP(F3, a, b, c, d, in[1] + 0xa4beea44, 4);
+   MD5STEP(F3, d, a, b, c, in[4] + 0x4bdecfa9, 11);
+   MD5STEP(F3, c, d, a, b, in[7] + 0xf6bb4b60, 16);
+   MD5STEP(F3, b, c, d, a, in[10] + 0xbebfbc70, 23);
+   MD5STEP(F3, a, b, c, d, in[13] + 0x289b7ec6, 4);
+   MD5STEP(F3, d, a, b, c, in[0] + 0xeaa127fa, 11);
+   MD5STEP(F3, c, d, a, b, in[3] + 0xd4ef3085, 16);
+   MD5STEP(F3, b, c, d, a, in[6] + 0x04881d05, 23);
+   MD5STEP(F3, a, b, c, d, in[9] + 0xd9d4d039, 4);
+   MD5STEP(F3, d, a, b, c, in[12] + 0xe6db99e5, 11);
+   MD5STEP(F3, c, d, a, b, in[15] + 0x1fa27cf8, 16);
+   MD5STEP(F3, b, c, d, a, in[2] + 0xc4ac5665, 23);
+
+   MD5STEP(F4, a, b, c, d, in[0] + 0xf4292244, 6);
+   MD5STEP(F4, d, a, b, c, in[7] + 0x432aff97, 10);
+   MD5STEP(F4, c, d, a, b, in[14] + 0xab9423a7, 15);
+   MD5STEP(F4, b, c, d, a, in[5] + 0xfc93a039, 21);
+   

Re: CRYPTO_MAX_ALG_NAME is too low

2017-03-16 Thread Alexander Sverdlin
Hello!

On 10/03/17 12:55, Alexander Sverdlin wrote:
> Hello crypto maintainers!
> 
> We've found an example of the ipsec algorithm combination, which doesn't fit
> into CRYPTO_MAX_ALG_NAME long buffers:
> 
> ip x s add src 1.1.1.1 dst 1.1.1.2 proto esp spi 0 mode tunnel enc des3_ede 
> 0x0 auth sha256 0x0 flag esn replay-window 256
> 
> produces "echainiv(authencesn(hmac(sha256-generic),cbc(des3_ede-generic)))"
> on the machines without optimized crypto drivers, which doesn't fit into
> the current 64-byte buffers.
> 
> I see two possible options:
> 
> a) split CRYPTO_MAX_ALG_NAME into CRYPTO_MAX_ALG_NAME + CRYPTO_MAX_DRV_NAME 
> pair
> and make later, say, 96, because the former probably cannot be changed 
> because of
> numerous user-space exports. And change half of the code to use new define.
> 
> b) rename *-generic algorithms to *-gen, so that cra_driver_name will be 
> shortened,
> while MODULE_ALIAS_CRYPTO() could still be maintained in old and new form.
> 
> What are your thoughts?

Any?

This is a regression caused by 856e3f4092
("crypto: seqiv - Add support for new AEAD interface")

As I've said above, I can offer one of the two solutions, which patch should I 
send?
Or do you see any better alternatives?
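
For reference, a stand-alone check of the name that overflows (the 64
here mirrors the current CRYPTO_MAX_ALG_NAME value):

#include <stdio.h>
#include <string.h>

#define CRYPTO_MAX_ALG_NAME 64	/* current kernel value */

int main(void)
{
	const char *name =
	  "echainiv(authencesn(hmac(sha256-generic),cbc(des3_ede-generic)))";

	/* the name plus its terminating NUL must fit in the buffer */
	printf("len=%zu, fits=%s\n", strlen(name),
	       strlen(name) < CRYPTO_MAX_ALG_NAME ? "yes" : "no");
	return 0;
}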

-- 
Best regards,
Alexander Sverdlin.


Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support

2017-03-16 Thread Horia Geantă
On 3/16/2017 10:43 AM, Herbert Xu wrote:
> On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote:
>> The patchset adds support for CAAM Queue Interface (QI), the additional
>> interface (besides job ring) available for submitting jobs to the engine
>> on platforms having DPAA (Datapath Acceleration Architecture).
>>
>> Patches 1-4 are QMan dependencies.
>> I would prefer to take them through the crypto tree,
>> but I am open to suggestions.
>>
>> Patch 5 adds a missing double inclusion guard in desc_constr.h.
>>
>> Patch 6 adds the caam/qi job submission backend.
>>
>> Patch 7 adds algorithms (ablkcipher and authenc) that run on top
>> of caam/qi. For now, their priority is set lower than caam/jr.
> 
> I'm fine with the crypto bits.
> 
Herbert,

Thanks for the review.
Should I submit formally, i.e. without the [RFC] prefix?

Scott, Roy,

Do you have any comments wrt. soc/qman patches or with them going
through the crypto tree?

Thanks,
Horia



Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-03-16 Thread Paolo Bonzini


On 10/03/2017 23:41, Brijesh Singh wrote:
>> Maybe there's a reason this fires:
>>
>> WARNING: modpost: Found 2 section mismatch(es).
>> To see full details build your kernel with:
>> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>>
>> WARNING: vmlinux.o(.text+0x48edc): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .init.text:memblock_alloc()
>> The function __change_page_attr() references
>> the function __init memblock_alloc().
>> This is often because __change_page_attr lacks a __init
>> annotation or the annotation of memblock_alloc is wrong.
>>
>> WARNING: vmlinux.o(.text+0x491d1): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .meminit.text:memblock_free()
>> The function __change_page_attr() references
>> the function __meminit memblock_free().
>> This is often because __change_page_attr lacks a __meminit
>> annotation or the annotation of memblock_free is wrong.
>> 
>> But maybe Paolo might have an even better idea...
> 
> I am sure he will have better idea :)

Not sure if it's better or worse, but an alternative idea is to turn
__change_page_attr and __change_page_attr_set_clr inside out, so that:
1) the alloc_pages/__free_page happens in __change_page_attr_set_clr;
2) __change_page_attr_set_clr overall does not beocome more complex.

Then you can introduce __early_change_page_attr_set_clr and/or
early_kernel_map_pages_in_pgd, for use in your next patches.  They use
the memblock allocator instead of alloc/free_page

The attached patch is compile-tested only and almost certainly has some
thinko in it.  But it even skims a few lines from the code so the idea
might have some merit.

Paolo
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 28d42130243c..953c8e697562 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -490,11 +490,12 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
 }
 
 static int
-try_preserve_large_page(pte_t *kpte, unsigned long address,
+try_preserve_large_page(pte_t **p_kpte, unsigned long address,
 			struct cpa_data *cpa)
 {
 	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
-	pte_t new_pte, old_pte, *tmp;
+	pte_t *kpte = *p_kpte;
+	pte_t new_pte, old_pte;
 	pgprot_t old_prot, new_prot, req_prot;
 	int i, do_split = 1;
 	enum pg_level level;
@@ -507,8 +508,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	 * Check for races, another CPU might have split this page
 	 * up already:
 	 */
-	tmp = _lookup_address_cpa(cpa, address, &level);
-	if (tmp != kpte)
+	*p_kpte = _lookup_address_cpa(cpa, address, &level);
+	if (*p_kpte != kpte)
 		goto out_unlock;
 
 	switch (level) {
@@ -634,17 +635,18 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	unsigned int i, level;
 	pte_t *tmp;
 	pgprot_t ref_prot;
+	int retry = 1;
 
+	if (!debug_pagealloc_enabled())
+		spin_lock(&cpa_lock);
 	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up for us already:
 	 */
 	tmp = _lookup_address_cpa(cpa, address, &level);
-	if (tmp != kpte) {
-		spin_unlock(&pgd_lock);
-		return 1;
-	}
+	if (tmp != kpte)
+		goto out;
 
 	paravirt_alloc_pte(&init_mm, page_to_pfn(base));
 
@@ -671,10 +673,11 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 		break;
 
 	default:
-		spin_unlock(&pgd_lock);
-		return 1;
+		goto out;
 	}
 
+	retry = 0;
+
 	/*
 	 * Set the GLOBAL flags only if the PRESENT flag is set
 	 * otherwise pmd/pte_present will return true even on a non
@@ -718,28 +721,34 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	 * going on.
 	 */
 	__flush_tlb_all();
-	spin_unlock(&pgd_lock);
 
-	return 0;
-}
-
-static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
-			unsigned long address)
-{
-	struct page *base;
+out:
+	spin_unlock(&pgd_lock);
 
+	/*
+	 * Do a global flush tlb after splitting the large page
+ 	 * and before we do the actual change page attribute in the PTE.
+ 	 *
+ 	 * With out this, we violate the TLB application note, that says
+ 	 * "The TLBs may contain both ordinary and large-page
+	 *  translations for a 4-KByte range of linear addresses. This
+	 *  may occur if software modifies the paging structures so that
+	 *  the page size used for the address range changes. If the two
+	 *  translations differ with respect to page frame or attributes
+	 *  (e.g., permissions), processor behavior is undefined and may
+	 *  be implementation-specific."
+ 	 *
+ 	 * We do this global tlb flush inside the cpa_lock, so that we
+	 * don't allow any other cpu, with stale tlb entries change the
+	 * page attribute in parallel, that also falls into the
+	 * just split large page entry.
+ 	 */
+	if (!retry)
+		flush_tlb_all();
 	if (!debug_pagealloc_enabled())
 		spin_unlock(&cpa_lock);
-	base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
-	if (!debug_pagealloc_enabled())
-		spin_lock(&cpa_lock);
-	if (!base)
-		return 

Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread Daniel Axtens
> So although this sits in arch/powerpc, it's heavy on the crypto which is
> not my area of expertise (to say the least!), so I think it should
> probably go via Herbert and the crypto tree?

That was my thought as well. Sorry - probably should have put that in
the comments somewhere.

Regards,
Daniel


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Stephan Müller
On Thursday, 16 March 2017, 11:18:33 CET, Stephan Müller wrote:

Hi Stephan,

> On Thursday, 16 March 2017, 10:52:48 CET, Herbert Xu wrote:
> 
> Hi Herbert,
> 
> > First of all you're only limiting the amount of memory occupied
> > by the SG list which is not the same thing as the memory pinned
> > down by the actual recvmsg.
> 
> I am fully aware of that. As this was present in the code, I thought I could
> reuse that approach.
> 
> Are you saying that you want to stop this approach?

I would like to bring another point to the table for this discussion relating 
to the maximum amount of memory to be used as well as for the size of the RX-
SGL: allow us please to consider skcipher_sendmsg (sendpage works conceptually 
similar, so it should be covered with this discussion as well).

skcipher_alloc_sgl used in the sendmsg code path is conceptually identical to 
the RX-SGL allocation that I propose here: it allocates a new SGL with 
MAX_SGL_ENTS entries as long as the caller sends data. It limits the amount of 
memory to be allocated based on the maximum size of the SG management data and 
not the actual plaintext/ciphertext data sent by user space. I again was 
therefore under the impression that my suggested approach in the recvmsg code 
path regarding allocation of memory would be allowed.

Regarding the amount of data processed with the RX-SGL, I would like to 
consider the size of the TX-SGL. In the skcipher case, the RX-SGL (or the 
anticipated RX data to be processed) does not need to be larger than the TX-
SGL. Hence the currently existing while() loop of the upstream code that we 
discuss here will only be executed as often to fulfill the available TX-SGL 
data size. Given that both are limited on the sock_kmalloc memory limitation, 
I would imply that using a conceptually identical allocation approach for the 
TX SGL and RX SGL would be a sufficient replacement for the while-loop without 
being visible to user space (i.e. without causing a limit that was not there 
before).

Ciao
Stephan


Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:15, Brijesh Singh wrote:
> 
>  __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
> -struct page *base)
> +   pte_t *pbase, unsigned long new_pfn)
>  {
> - pte_t *pbase = (pte_t *)page_address(base);

Just one comment and I'll reply to Boris, I think you can compute pbase 
with pfn_to_kaddr, and avoid adding a new argument.

>*/
> - __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
> + __set_pmd_pte(kpte, address,
> + native_make_pte((new_pfn << PAGE_SHIFT) + _KERNPG_TABLE));

And this probably is better written as:

__set_pmd_pte(kpte, address, pfn_pte(new_pfn, __pgprot(_KERNPG_TABLE)));

Paolo


Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread Anton Blanchard
Hi David,

> While not part of this change, the unrolled loops look as though
> they just destroy the cpu cache.
> I'd like be convinced that anything does CRC over long enough buffers
> to make it a gain at all.

btrfs data checksumming is one area.

> With modern (not that modern now) superscalar cpus you can often
> get the loop instructions 'for free'.

A branch on POWER8 is a three cycle redirect. The vpmsum instructions
are 6 cycles.

> Sometimes pipelining the loop is needed to get full throughput.
> Unlike the IP checksum, you don't even have to 'loop carry' the
> cpu carry flag.

It went through quite a lot of simulation to reach peak performance.
The loop is quite delicate, we have to pace it just right to avoid
some pipeline reject conditions.

Note also that we already modulo schedule the loop across three
iterations, required to hide the latency of the vpmsum instructions.

Anton


Re: [RFC PATCH v2 16/32] x86: kvm: Provide support to create Guest and HV shared per-CPU variables

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:15, Brijesh Singh wrote:
> Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU
> variable at compile time and share its physical address with hypervisor.
> It presents a challege when SEV is active in guest OS. When SEV is active,
> guest memory is encrypted with guest key and hypervisor will no longer able
> to modify the guest memory. When SEV is active, we need to clear the
> encryption attribute of shared physical addresses so that both guest and
> hypervisor can access the data.
> 
> To solve this problem, I have tried these three options:
> 
> 1) Convert the static per-CPU to dynamic per-CPU allocation. When SEV is
> detected then clear the encryption attribute. But while doing so I found
> that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init was
> called.
> 
> 2) Since the encryption attributes works on PAGE_SIZE hence add some extra
> padding to 'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime
> clear the encryption attribute of the full PAGE. The downside of this was
> now we need to modify structure which may break the compatibility.
> 
> 3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
> used to hold the compile time shared per-CPU variables. When SEV is
> detected we map this section with encryption attribute cleared.
> 
> This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHARED
> macro to create a compile time per-CPU variable. When SEV is detected we
> map the per-CPU variable as decrypted (i.e with encryption attribute cleared).
> 
> Signed-off-by: Brijesh Singh 

Looks good to me.

Paolo

> ---
>  arch/x86/kernel/kvm.c |   43 
> +++--
>  include/asm-generic/vmlinux.lds.h |3 +++
>  include/linux/percpu-defs.h   |9 
>  3 files changed, 48 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 099fcba..706a08e 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
>  
>  early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
>  
> -static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
> -static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) 
> __aligned(64);
> +static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) 
> __aligned(64);
>  static int has_steal_clock = 0;
>  
>  /*
> @@ -290,6 +290,22 @@ static void __init paravirt_ops_setup(void)
>  #endif
>  }
>  
> +static int kvm_map_percpu_hv_shared(void *addr, unsigned long size)
> +{
> + /* When SEV is active, the percpu static variables initialized
> +  * in data section will contain the encrypted data so we first
> +  * need to decrypt it and then map it as decrypted.
> +  */
> + if (sev_active()) {
> + unsigned long pa = slow_virt_to_phys(addr);
> +
> + sme_early_decrypt(pa, size);
> + return early_set_memory_decrypted(addr, size);
> + }
> +
> + return 0;
> +}
> +
>  static void kvm_register_steal_time(void)
>  {
>   int cpu = smp_processor_id();
> @@ -298,12 +314,17 @@ static void kvm_register_steal_time(void)
>   if (!has_steal_clock)
>   return;
>  
> + if (kvm_map_percpu_hv_shared(st, sizeof(*st))) {
> + pr_err("kvm-stealtime: failed to map hv_shared percpu\n");
> + return;
> + }
> +
>   wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
>   pr_info("kvm-stealtime: cpu %d, msr %llx\n",
>   cpu, (unsigned long long) slow_virt_to_phys(st));
>  }
>  
> -static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> +static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = 
> KVM_PV_EOI_DISABLED;
>  
>  static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>  {
> @@ -327,25 +348,33 @@ static void kvm_guest_cpu_init(void)
>   if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
> + u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>  
> + if (kvm_map_percpu_hv_shared(this_cpu_ptr(&apf_reason),
> + sizeof(struct kvm_vcpu_pv_apf_data)))
> + goto skip_asyncpf;
>  #ifdef CONFIG_PREEMPT
>   pa |= KVM_ASYNC_PF_SEND_ALWAYS;
>  #endif
>   wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
>   __this_cpu_write(apf_reason.enabled, 1);
> - printk(KERN_INFO"KVM setup async PF for cpu %d\n",
> -smp_processor_id());
> + printk(KERN_INFO"KVM setup async PF for cpu %d msr %llx\n",
> +smp_processor_id(), pa);
>   }
> -
> +skip_asyncpf:
>   if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
>   unsigned long pa;

Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:18, Brijesh Singh wrote:
> + data = (void *) get_zeroed_page(GFP_KERNEL);

The page does not need to be zeroed, does it?

> +
> + if ((len & 15) || (dst_addr & 15)) {
> + /* if destination address and length are not 16-byte
> +  * aligned then:
> +  * a) decrypt destination page into temporary buffer
> +  * b) copy source data into temporary buffer at correct offset
> +  * c) encrypt temporary buffer
> +  */
> + ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);

Ah, I see now you're using this function here for read-modify-write.
data is already pinned here, so even if you keep the function it makes
sense to push pinning out of __sev_dbg_decrypt_page and into
sev_dbg_decrypt.

> + if (ret)
> + goto err_3;
> + d_off = dst_addr & (PAGE_SIZE - 1);
> +
> + if (copy_from_user(data + d_off,
> + (uint8_t *)debug.src_addr, len)) {
> + ret = -EFAULT;
> + goto err_3;
> + }
> +
> + encrypt->length = PAGE_SIZE;

Why decrypt/re-encrypt all the page instead of just the 16 byte area
around the [dst_addr, dst_addr+len) range?
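
Concretely, the aligned window could be computed along these lines (just a
sketch, reusing the names from the quoted patch and assuming 16 bytes is the
granularity the firmware needs):

	/* smallest 16-byte-aligned window covering [dst_addr, dst_addr + len) */
	unsigned long start = dst_addr & ~15UL;
	unsigned long end   = ALIGN(dst_addr + len, 16);
	unsigned long span  = end - start;	/* at most len + 30 bytes */

That would bound the decrypt/copy/re-encrypt work to a few cache lines instead
of a full page.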

> + encrypt->src_addr = __psp_pa(data);
> + encrypt->dst_addr =  __sev_page_pa(inpages[0]);
> + } else {
> + if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
> + ret = -EFAULT;
> + goto err_3;
> + }

Do you need copy_from_user, or can you just pin/unpin memory as for
DEBUG_DECRYPT?
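
If pinning is possible, the aligned branch could look roughly like this (a
sketch only; sev_pin_memory()/__sev_page_pa() as defined elsewhere in this
series, src_pages/src_n are made-up names, and the source is assumed not to
cross a page boundary):

	struct page **src_pages;
	unsigned long src_n;

	src_pages = sev_pin_memory(debug.src_addr, len, &src_n);
	if (!src_pages) {
		ret = -EFAULT;
		goto err_3;
	}

	encrypt->handle   = sev_get_handle(kvm);
	encrypt->length   = len;
	encrypt->src_addr = __sev_page_pa(src_pages[0]) +
			    (debug.src_addr & (PAGE_SIZE - 1));
	encrypt->dst_addr = __sev_page_pa(inpages[0]) + d_off;

	ret = sev_issue_cmd(kvm, SEV_CMD_DBG_ENCRYPT, encrypt, &argp->error);
	sev_unpin_memory(src_pages, src_n);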

Paolo

> + d_off = dst_addr & (PAGE_SIZE - 1);
> + encrypt->length = len;
> + encrypt->src_addr = __psp_pa(data);
> + encrypt->dst_addr = __sev_page_pa(inpages[0]);
> + encrypt->dst_addr += d_off;
> + }
> +
> + encrypt->handle = sev_get_handle(kvm);
> + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_ENCRYPT, encrypt, &argp->error);


Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:18, Brijesh Singh wrote:
> +static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src,
> + void *dst, int *error)
> +{
> + inpages = sev_pin_memory(src, PAGE_SIZE, &npages);
> + if (!inpages) {
> + ret = -ENOMEM;
> + goto err_1;
> + }
> +
> + data->handle = sev_get_handle(kvm);
> + data->dst_addr = __psp_pa(dst);
> + data->src_addr = __sev_page_pa(inpages[0]);
> + data->length = PAGE_SIZE;
> +
> + ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error);
> + if (ret)
> + printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n",
> + ret, *error);
> + sev_unpin_memory(inpages, npages);
> +err_1:
> + kfree(data);
> + return ret;
> +}
> +
> +static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
> +{
> + void *data;
> + int ret, offset, len;
> + struct kvm_sev_dbg debug;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&debug, (void *)argp->data,
> + sizeof(struct kvm_sev_dbg)))
> + return -EFAULT;
> + /*
> +  * TODO: add support for decrypting length which crosses the
> +  * page boundary.
> +  */
> + offset = debug.src_addr & (PAGE_SIZE - 1);
> + if (offset + debug.length > PAGE_SIZE)
> + return -EINVAL;
> +

Please do add it, it doesn't seem very different from what you're doing
in LAUNCH_UPDATE_DATA.  There's no need for a separate
__sev_dbg_decrypt_page function, you can just pin/unpin here and do a
per-page loop as in LAUNCH_UPDATE_DATA.
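
Roughly (a sketch only, reusing the helpers from this series; "dst" is assumed
to be a physically contiguous kernel bounce buffer that is copied out to
userspace afterwards, "data" the firmware command buffer allocated as before,
and "done"/"ulen" are illustrative locals):

	inpages = sev_pin_memory(debug.src_addr, debug.length, &npages);
	if (!inpages)
		return -EFAULT;

	offset = debug.src_addr & (PAGE_SIZE - 1);
	ulen = debug.length;

	for (i = 0, done = 0; i < npages && ulen; i++) {
		len = min_t(size_t, PAGE_SIZE - offset, ulen);

		data->handle   = sev_get_handle(kvm);
		data->src_addr = __sev_page_pa(inpages[i]) + offset;
		data->dst_addr = __psp_pa(dst + done);
		data->length   = len;

		ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, &argp->error);
		if (ret)
			break;

		done += len;
		ulen -= len;
		offset = 0;	/* only the first chunk can be misaligned */
	}

	sev_unpin_memory(inpages, npages);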

Paolo


Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:17, Brijesh Singh wrote:
> +static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
> + unsigned long *n)
> +{
> + struct page **pages;
> + int first, last;
> + unsigned long npages, pinned;
> +
> + /* Get number of pages */
> + first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
> + last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
> + npages = (last - first + 1);
> +
> + pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
> + if (!pages)
> + return NULL;
> +
> + /* pin the user virtual address */
> + down_read(&current->mm->mmap_sem);
> + pinned = get_user_pages_fast(uaddr, npages, 1, pages);
> + up_read(&current->mm->mmap_sem);

get_user_pages_fast, like get_user_pages_unlocked, must be called
without mmap_sem held.
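
In other words, just drop the down_read()/up_read() pair around the call
(same arguments as in the quoted patch):

	/* get_user_pages_fast() takes mmap_sem itself when it needs to */
	pinned = get_user_pages_fast(uaddr, npages, 1, pages);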

> + if (pinned != npages) {
> + printk(KERN_ERR "SEV: failed to pin  %ld pages (got %ld)\n",
> + npages, pinned);
> + goto err;
> + }
> +
> + *n = npages;
> + return pages;
> +err:
> + if (pinned > 0)
> + release_pages(pages, pinned, 0);
> + kfree(pages);
> +
> + return NULL;
> +}
>
> + /* the array of pages returned by get_user_pages() is a page-aligned
> +  * memory. Since the user buffer is probably not page-aligned, we need
> +  * to calculate the offset within a page for first update entry.
> +  */
> + offset = uaddr & (PAGE_SIZE - 1);
> + len = min_t(size_t, (PAGE_SIZE - offset), ulen);
> + ulen -= len;
> +
> + /* update first page -
> +  * special care need to be taken for the first page because we might
> +  * be dealing with offset within the page
> +  */

No need to special case the first page; just set "offset = 0" inside the
loop after the first iteration.
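
Something like this (a sketch using the names from the quoted patch, not the
actual code):

	offset = uaddr & (PAGE_SIZE - 1);
	for (i = 0; i < nr_pages && ulen; i++) {
		len = min_t(size_t, PAGE_SIZE - offset, ulen);

		data->handle  = sev_get_handle(kvm);
		data->length  = len;
		data->address = __sev_page_pa(inpages[i]) + offset;

		ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA,
				    data, &argp->error);
		if (ret)
			goto err_3;

		ulen -= len;
		offset = 0;	/* only the first page can be misaligned */
	}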

Paolo

> + data->handle = sev_get_handle(kvm);
> + data->length = len;
> + data->address = __sev_page_pa(inpages[0]) + offset;
> + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA,
> + data, &argp->error);
> + if (ret)
> + goto err_3;
> +
> + /* update remaining pages */
> + for (i = 1; i < nr_pages; i++) {
> +
> + len = min_t(size_t, PAGE_SIZE, ulen);
> + ulen -= len;
> + data->length = len;
> + data->address = __sev_page_pa(inpages[i]);
> + ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_UPDATE_DATA,
> + data, &argp->error);
> + if (ret)
> + goto err_3;
> + }



Re: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread Michael Ellerman
Daniel Axtens  writes:

> The core nuts and bolts of the crc32c vpmsum algorithm will
> also work for a number of other CRC algorithms with different
> polynomials. Factor out the function into a new asm file.
>
> To handle multiple users of the function, a user simply
> provides constants, defines the name of their CRC function,
> and then #includes the core algorithm file.
>
> Cc: Anton Blanchard 
> Signed-off-by: Daniel Axtens 
>
> --
>
> It's possible at this point to argue that the address
> of the constant tables should be passed in to the function,
> rather than doing this somewhat unconventional #include.
>
> However, we're about to add further #ifdef's back into the core
> that will be provided by the encapsulaing code, and which couldn't
> be done as a variable without performance loss.
> ---
>  arch/powerpc/crypto/crc32-vpmsum_core.S | 726 
> 
>  arch/powerpc/crypto/crc32c-vpmsum_asm.S | 714 +--
>  2 files changed, 729 insertions(+), 711 deletions(-)
>  create mode 100644 arch/powerpc/crypto/crc32-vpmsum_core.S
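
For illustration, a per-algorithm wrapper on top of the core file boils down to
something like this (the macro names and the REFLECT knob are assumptions, not
necessarily the exact spelling the patch uses):

	/* crc32c-vpmsum_asm.S - hypothetical shape of a user of the core */
	#define CRC_FUNCTION_NAME	__crc32c_vpmsum
	#define REFLECT				/* CRC32c is a reflected CRC */
	/* constant tables for the CRC32c polynomial defined here ... */
	#include "crc32-vpmsum_core.S"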

So although this sits in arch/powerpc, it's heavy on the crypto which is
not my area of expertise (to say the least!), so I think it should
probably go via Herbert and the crypto tree?

cheers


Re: [PATCH 2/4] crypto: powerpc - Re-enable non-REFLECTed CRCs

2017-03-16 Thread Michael Ellerman
Daniel Axtens  writes:

> When CRC32c was included in the kernel, Anton ripped out
> the #ifdefs around reflected polynomials, because CRC32c
> is always reflected. However, not all CRCs use reflection
> so we'd like to make it optional.
>
> Restore the REFLECT parts from Anton's original CRC32
> implementation (https://github.com/antonblanchard/crc32-vpmsum)
>
> That implementation is available under GPLv2+, so we're OK
> from a licensing point of view:
> https://github.com/antonblanchard/crc32-vpmsum/blob/master/LICENSE.TXT

It's also written by Anton and copyright IBM, so you (we (IBM)) could
always just relicense it anyway.

So doubly OK IMO.

cheers


Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:18, Brijesh Singh wrote:
> The SEV memory encryption engine uses a tweak such that two identical
> plaintexts at different locations will have different ciphertexts.
> So swapping or moving the ciphertexts of two pages will not result in
> the plaintexts being swapped. Relocating (or migrating) the physical backing
> pages of an SEV guest will therefore require some additional steps. The
> current SEV key management spec [1] does not provide commands to swap or
> migrate (move) ciphertexts. For now we pin the memory allocated for the SEV
> guest. In the future, when the SEV key management spec provides the commands
> to support page migration, we can update the KVM code to remove the pinning
> logic without making any changes to userspace (qemu).
> 
> The patch pins userspace memory when a new slot is created and unpins the
> memory when the slot is removed.
> 
> [1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf

This is not enough, because memory can be hidden temporarily from the
guest and remapped later.  Think of a PCI BAR that is backed by RAM, or
also SMRAM.  The pinning must be kept even in that case.

You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM.  In
QEMU you can use a RAMBlockNotifier to invoke the ioctls.
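
In QEMU that would look roughly like the following (RAMBlockNotifier is the
existing notifier API; the ioctl names and struct kvm_enc_region are
placeholders for the proposed ABI, not something that exists today):

static void sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
{
    struct kvm_enc_region range = { .addr = (uintptr_t)host, .size = size };

    kvm_vm_ioctl(kvm_state, KVM_REGISTER_ENCRYPTED_RAM, &range);
}

static void sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
{
    struct kvm_enc_region range = { .addr = (uintptr_t)host, .size = size };

    kvm_vm_ioctl(kvm_state, KVM_UNREGISTER_ENCRYPTED_RAM, &range);
}

static RAMBlockNotifier sev_ram_notifier = {
    .ram_block_added   = sev_ram_block_added,
    .ram_block_removed = sev_ram_block_removed,
};

/* during SEV setup: ram_block_notifier_add(&sev_ram_notifier); */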

Paolo

> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/include/asm/kvm_host.h |6 +++
>  arch/x86/kvm/svm.c  |   93 
> +++
>  arch/x86/kvm/x86.c  |3 +
>  3 files changed, 102 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index fcc4710..9dc59f0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -723,6 +723,7 @@ struct kvm_sev_info {
>   unsigned int handle;/* firmware handle */
>   unsigned int asid;  /* asid for this guest */
>   int sev_fd; /* SEV device fd */
> + struct list_head pinned_memory_slot;
>  };
>  
>  struct kvm_arch {
> @@ -1043,6 +1044,11 @@ struct kvm_x86_ops {
>   void (*setup_mce)(struct kvm_vcpu *vcpu);
>  
>   int (*memory_encryption_op)(struct kvm *kvm, void __user *argp);
> +
> + void (*prepare_memory_region)(struct kvm *kvm,
> + struct kvm_memory_slot *memslot,
> + const struct kvm_userspace_memory_region *mem,
> + enum kvm_mr_change change);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 13996d6..ab973f9 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -498,12 +498,21 @@ static inline bool gif_set(struct vcpu_svm *svm)
>  }
>  
>  /* Secure Encrypted Virtualization */
> +struct kvm_sev_pinned_memory_slot {
> + struct list_head list;
> + unsigned long npages;
> + struct page **pages;
> + unsigned long userspace_addr;
> + short id;
> +};
> +
>  static unsigned int max_sev_asid;
>  static unsigned long *sev_asid_bitmap;
>  static void sev_deactivate_handle(struct kvm *kvm);
>  static void sev_decommission_handle(struct kvm *kvm);
>  static int sev_asid_new(void);
>  static void sev_asid_free(int asid);
> +static void sev_unpin_memory(struct page **pages, unsigned long npages);
>  #define __sev_page_pa(x) ((page_to_pfn(x) << PAGE_SHIFT) | sme_me_mask)
>  
>  static bool kvm_sev_enabled(void)
> @@ -1544,9 +1553,25 @@ static inline int avic_free_vm_id(int id)
>  
>  static void sev_vm_destroy(struct kvm *kvm)
>  {
> + struct list_head *pos, *q;
> + struct kvm_sev_pinned_memory_slot *pinned_slot;
> + struct list_head *head = &kvm->arch.sev_info.pinned_memory_slot;
> +
>   if (!sev_guest(kvm))
>   return;
>  
> + /* if guest memory is pinned then unpin it now */
> + if (!list_empty(head)) {
> + list_for_each_safe(pos, q, head) {
> + pinned_slot = list_entry(pos,
> + struct kvm_sev_pinned_memory_slot, list);
> + sev_unpin_memory(pinned_slot->pages,
> + pinned_slot->npages);
> + list_del(pos);
> + kfree(pinned_slot);
> + }
> + }
> +
>   /* release the firmware resources */
>   sev_deactivate_handle(kvm);
>   sev_decommission_handle(kvm);
> @@ -5663,6 +5688,8 @@ static int sev_pre_start(struct kvm *kvm, int *asid)
>   }
>   *asid = ret;
>   ret = 0;
> +
> + INIT_LIST_HEAD(&kvm->arch.sev_info.pinned_memory_slot);
>   }
>  
>   return ret;
> @@ -6189,6 +6216,71 @@ static int sev_launch_measure(struct kvm *kvm, struct 
> kvm_sev_cmd *argp)
>   return ret;
>  }
>  
> +static struct kvm_sev_pinned_memory_slot *sev_find_pinned_memory_slot(
> + struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> + struct 

Re: [RFC PATCH v2 24/32] kvm: x86: prepare for SEV guest management API support

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:17, Brijesh Singh wrote:
> ASID management:
>  - Reserve asid range for SEV guest, SEV asid range is obtained through
>    CPUID Fn8000_001f[ECX]. A non-SEV guest can use any asid outside the SEV
>    asid range.

How is backwards compatibility handled?

>  - SEV guest must have asid value within asid range obtained through CPUID.
>  - SEV guest must have the same asid for all vcpus. A TLB flush is required
>    if a different vcpu for the same ASID is to be run on the same host CPU.

[...]

> +
> + /* which host cpu was used for running this vcpu */
> + bool last_cpuid;

Should be unsigned int.

> 
> + /* Assign the asid allocated for this SEV guest */
> + svm->vmcb->control.asid = asid;
> +
> + /* Flush guest TLB:
> +  * - when different VMCB for the same ASID is to be run on the
> +  *   same host CPU
> +  *   or
> +  * - this VMCB was executed on different host cpu in previous VMRUNs.
> +  */
> + if (sd->sev_vmcbs[asid] != (void *)svm->vmcb ||

Why the cast?

> + svm->last_cpuid != cpu)
> + svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;

If there is a match, you don't need to do anything else (neither reset
the asid, nor mark it as dirty, nor update the fields), so:

if (sd->sev_vmcbs[asid] == svm->vmcb &&
svm->last_cpuid == cpu)
return;

svm->last_cpuid = cpu;
sd->sev_vmcbs[asid] = svm->vmcb;
svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
svm->vmcb->control.asid = asid;
mark_dirty(svm->vmcb, VMCB_ASID);

(plus comments ;)).

Also, why not TLB_CONTROL_FLUSH_ASID if possible?

> + svm->last_cpuid = cpu;
> + sd->sev_vmcbs[asid] = (void *)svm->vmcb;
> +
> + mark_dirty(svm->vmcb, VMCB_ASID);

[...]

> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index fef7d83..9df37a2 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1284,6 +1284,104 @@ struct kvm_s390_ucas_mapping {
>  /* Memory Encryption Commands */
>  #define KVM_MEMORY_ENCRYPT_OP  _IOWR(KVMIO, 0xb8, unsigned long)
>  
> +/* Secure Encrypted Virtualization mode */
> +enum sev_cmd_id {

Please add documentation in Documentation/virtual/kvm/memory_encrypt.txt.

Paolo


Re: [RFC PATCH v2 23/32] kvm: introduce KVM_MEMORY_ENCRYPT_OP ioctl

2017-03-16 Thread Paolo Bonzini


On 02/03/2017 16:17, Brijesh Singh wrote:
> If hardware supports encrypting then KVM_MEMORY_ENCRYPT_OP ioctl can
> be used by qemu to issue platform specific memory encryption commands.
> 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/include/asm/kvm_host.h |2 ++
>  arch/x86/kvm/x86.c  |   12 
>  include/uapi/linux/kvm.h|2 ++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index bff1f15..62651ad 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1033,6 +1033,8 @@ struct kvm_x86_ops {
>   void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
>  
>   void (*setup_mce)(struct kvm_vcpu *vcpu);
> +
> + int (*memory_encryption_op)(struct kvm *kvm, void __user *argp);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2099df8..6a737e9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3926,6 +3926,14 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   return r;
>  }
>  
> +static int kvm_vm_ioctl_memory_encryption_op(struct kvm *kvm, void __user 
> *argp)
> +{
> + if (kvm_x86_ops->memory_encryption_op)
> + return kvm_x86_ops->memory_encryption_op(kvm, argp);
> +
> + return -ENOTTY;
> +}
> +
>  long kvm_arch_vm_ioctl(struct file *filp,
>  unsigned int ioctl, unsigned long arg)
>  {
> @@ -4189,6 +4197,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
>   r = kvm_vm_ioctl_enable_cap(kvm, );
>   break;
>   }
> + case KVM_MEMORY_ENCRYPT_OP: {
> + r = kvm_vm_ioctl_memory_encryption_op(kvm, argp);
> + break;
> + }
>   default:
>   r = kvm_vm_ioctl_assigned_device(kvm, ioctl, arg);
>   }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index cac48ed..fef7d83 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1281,6 +1281,8 @@ struct kvm_s390_ucas_mapping {
>  #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct 
> kvm_s390_irq_state)
>  /* Available with KVM_CAP_X86_SMM */
>  #define KVM_SMI   _IO(KVMIO,   0xb7)
> +/* Memory Encryption Commands */
> +#define KVM_MEMORY_ENCRYPT_OP  _IOWR(KVMIO, 0xb8, unsigned long)
>  
>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU  (1 << 0)
>  #define KVM_DEV_ASSIGN_PCI_2_3   (1 << 1)
> 

Reviewed-by: Paolo Bonzini 


RE: [PATCH 1/4] crypto: powerpc - Factor out the core CRC vpmsum algorithm

2017-03-16 Thread David Laight
From: Daniel Axtens
> Sent: 15 March 2017 22:30
> Hi David,
> 
> > While not part of this change, the unrolled loops look as though
> > they just destroy the cpu cache.
> > I'd like to be convinced that anything does CRC over long enough buffers
> > to make it a gain at all.
> >
> > With modern (not that modern now) superscalar cpus you can often
> > get the loop instructions 'for free'.
> > Sometimes pipelining the loop is needed to get full throughput.
> > Unlike the IP checksum, you don't even have to 'loop carry' the
> > cpu carry flag.
> 
> Internal testing on a NVMe device with T10DIF enabled on 4k blocks
> shows a 20x - 30x improvement. Without these patches, crc_t10dif_generic
> uses over 60% of CPU time - with these patches CRC drops to single
> digits.
> 
> I should probably have lead with that, sorry.

I'm not doubting that using the cpu instruction for crcs gives a
massive performance boost.

Just that the heavily unrolled loop is unlikely to help overall.
Some 'cold cache' tests on shorter buffers might be illuminating.
 
> FWIW, the original patch showed a 3.7x gain on btrfs as well -
> 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
> 
> When Anton wrote the original code he had access to IBM's internal
> tooling for looking at how instructions flow through the various stages
> of the CPU, so I trust it's pretty much optimal from that point of view.

Doesn't mean he used it :-)

David




Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Stephan Müller
Am Donnerstag, 16. März 2017, 10:52:48 CET schrieb Herbert Xu:

Hi Herbert,

> First of all you're only limiting the amount of memory occupied
> by the SG list which is not the same thing as the memory pinned
> down by the actual recvmsg.

I am fully aware of that. As this was present in the code, I thought I could 
reuse that approach.

Are you saying that you want to stop this approach? 
> 
> More importantly, with the current code, a very large recvmsg
> would still work by doing it piecemeal.

That piecemeal handling would only happen in the sync case; the current AIO
code path does not achieve it.

> With your patch, won't
> it fail because sock_kmalloc could fail to allocate memory for
> the whole thing?

In this case yes. To handle AIO and sync in a common code path (and thus make 
both behave identically), this would be the result.

If very large recvmsg requests are to be handled, the socket option could be 
increased or the caller could split the request into smaller pieces.

Otherwise, if we do piecemeal handling as it is done now, I see the following 
drawbacks:

- no common sync/AIO code and no common behavior

- no common approach between skcipher and other implementations (algif_aead 
and the algif_akcipher that I have in the pipe, which looks identical to the 
proposed patchset)

Ciao
Stephan


Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-16 Thread Borislav Petkov
On Fri, Mar 10, 2017 at 10:35:30AM -0600, Brijesh Singh wrote:
> We could update this patch to use the below logic:
> 
>  * CPUID(0) - Check for AuthenticAMD
>  * CPUID(1) - Check if under hypervisor
>  * CPUID(0x8000_0000) - Check for highest supported leaf
>  * CPUID(0x8000_001F).EAX - Check for SME and SEV support
>  * rdmsr (MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set
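
As a C sketch of that sequence (leaf numbers as in the public AMD
documentation; the function name and exact bit positions are assumptions that
would need to be checked against the APM):

static bool sev_guest_active(void)	/* made-up name */
{
	u64 syscfg;

	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
		return false;
	if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))	/* CPUID(1).ECX[31] */
		return false;
	if (cpuid_eax(0x80000000) < 0x8000001f)		/* leaf must exist */
		return false;
	if (!(cpuid_eax(0x8000001f) & BIT(1)))		/* EAX bit 0: SME, bit 1: SEV */
		return false;

	rdmsrl(MSR_K8_SYSCFG, syscfg);
	return !!(syscfg & BIT_ULL(23));		/* SYSCFG[MemEncryptionModEn] */
}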

Actually, it is still not clear to me *why* we need to do anything
special wrt SEV in the guest.

Lemme clarify: why can't the guest boot just like a normal Linux on
baremetal and use the SME(!) detection code to set sme_enable and so
on? IOW, I'd like to avoid all those checks whether we're running under
hypervisor and handle all that like we're running on baremetal.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH] crypto: doc - fix typo (struct sdesc)

2017-03-16 Thread Herbert Xu
Fabien Dessenne  wrote:
> Add missing " " in api-samples.rst
> 
> Signed-off-by: Fabien Dessenne 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2] MAINTAINERS: Add maintianer entry for crypto/s5p-sss

2017-03-16 Thread Herbert Xu
On Sat, Mar 11, 2017 at 08:11:00AM +0200, Krzysztof Kozlowski wrote:
> Add Krzysztof Kozlowski and Vladimir Zapolskiy as maintainers of s5p-sss
> driver for handling reviews, testing and getting bug reports from the
> users.
> 
> Cc: Vladimir Zapolskiy 
> Cc: Herbert Xu 
> Signed-off-by: Krzysztof Kozlowski 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v1 0/8] improve performances on mediatek crypto driver

2017-03-16 Thread Herbert Xu
On Thu, Mar 09, 2017 at 10:11:11AM +0800, Ryder Lee wrote:
> Hi all,
> 
> Some patches of this series improve the performances whereas others
> clean up code and refine data structure to make it more efficient
> 
> Changes since v1:
> - drop OFB and CFB patch
> 
> Ryder Lee (8):
>   crypto: mediatek - rework interrupt handler
>   crypto: mediatek - add MTK_* prefix and correct annotations.
>   crypto: mediatek - make mtk_sha_xmit() more generic
>   crypto: mediatek - simplify descriptor ring management
>   crypto: mediatek - add queue_task tasklet
>   crypto: mediatek - fix error handling in mtk_aes_complete()
>   crypto: mediatek - add mtk_aes_gcm_tag_verify()
>   crypto: mediatek - make hardware operation flow more efficient

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/4] hwrng: omap - fixes and improvements

2017-03-16 Thread Herbert Xu
On Tue, Mar 07, 2017 at 03:14:45PM +0100, Thomas Petazzoni wrote:
> Hello,
> 
> This small patch series brings a few fixes and improvements to the
> omap_rng driver. The first fix is particularly important, as it fixes
> using the driver built as a module on SoCs that require a clock for
> the IP to work properly.
> 
> Thanks,
> 
> Thomas
> 
> Thomas Petazzoni (4):
>   hwrng: omap - write registers after enabling the clock
>   hwrng: omap - use devm_clk_get() instead of of_clk_get()
>   hwrng: omap - Do not access INTMASK_REG on EIP76
>   hwrng: omap - move clock related code to omap_rng_probe()

Patches 1-3 applied to crypto and patch 4 applied to cryptodev.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 1/3] clk: meson-gxbb: expose clock CLKID_RNG0

2017-03-16 Thread Herbert Xu
On Wed, Feb 22, 2017 at 07:55:24AM +0100, Heiner Kallweit wrote:
> Expose clock CLKID_RNG0 which is needed for the HW random number generator.
> 
> Signed-off-by: Heiner Kallweit 

All patches applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/2] hwrng: revert managed API changes for amd and geode

2017-03-16 Thread Herbert Xu
On Tue, Mar 14, 2017 at 07:36:00AM -0400, Prarit Bhargava wrote:
> When booting top-of-tree the following WARN_ON triggers in the kernel on
> a 15h AMD system.
> 
> WARNING: CPU: 2 PID: 621 at drivers/base/dd.c:349 driver_probe_device+0x38c
> Modules linked in: i2c_amd756(+) amd_rng sg pcspkr parport_pc(+) parport k8
> CPU: 2 PID: 621 Comm: systemd-udevd Not tainted 4.11.0-0.rc1.git0.1.el7_UNS
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TYAN High-End 
> Call Trace:
> dump_stack+0x63/0x8e
> __warn+0xd1/0xf0
> warn_slowpath_null+0x1d/0x20
> driver_probe_device+0x38c/0x470
> __driver_attach+0xc9/0xf0
> ? driver_probe_device+0x470/0x470
> bus_for_each_dev+0x5d/0x90
> driver_attach+0x1e/0x20
> bus_add_driver+0x1d0/0x290
> driver_register+0x60/0xe0
> ? 0xa0037000
> __pci_register_driver+0x4c/0x50
> amd756_driver_init+0x1e/0x1000 [i2c_amd756]
> do_one_initcall+0x51/0x1b0
> ? __vunmap+0x85/0xd0
> ? do_init_module+0x27/0x1fa
> do_init_module+0x60/0x1fa
> load_module+0x15d1/0x1ad0
> ? m_show+0x1c0/0x1c0
> SYSC_finit_module+0xa9/0xd0
> 
> There are PCI devices that contain both a RNG and SMBUS device.  The
> RNG device is initialized by the amd-rng driver but the driver does not
> register against the device.  The SMBUS device is initialized by the
> i2c-amd756 driver and registers against the device and hits the WARN_ON()
> because the amd-rng driver has already allocated resources against the
> device.
> 
> The amd-rng driver was incorrectly migrated to the device resource model
> (devres), and after code inspection I found that the geode-rng driver was also
> incorrectly migrated.  These drivers are using devres but do not register a
> driver against the device, and both drivers are expecting a memory cleanup on
> a driver detach that will never happen.  This results in a memory leak when 
> the
> driver is unloaded and the inability to reload the driver.
> 
> Revert 31b2a73c9c5f ("hwrng: amd - Migrate to managed API"), and 6e9b5e76882c
> ("hwrng: geode - Migrate to managed API").
> 
> Signed-off-by: Prarit Bhargava 
> Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API").
> Fixes: 6e9b5e76882c ("hwrng: geode - Migrate to managed API")
> Cc: Matt Mackall 
> Cc: Herbert Xu 
> Cc: Corentin LABBE 
> Cc: PrasannaKumar Muralidharan 
> Cc: Wei Yongjun 
> Cc: linux-crypto@vger.kernel.org
> Cc: linux-ge...@lists.infradead.org

Both patches applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: ccp - Assign DMA commands to the channel's CCP

2017-03-16 Thread Herbert Xu
On Fri, Mar 10, 2017 at 12:28:18PM -0600, Gary R Hook wrote:
> From: Gary R Hook 
> 
> The CCP driver generally uses a round-robin approach when
> assigning operations to available CCPs. For the DMA engine,
> however, the DMA mappings of the SGs are associated with a
> specific CCP. When an IOMMU is enabled, the IOMMU is
> programmed based on this specific device.
> 
> If the DMA operations are not performed by that specific
> CCP then addressing errors and I/O page faults will occur.
> 
> Update the CCP driver to allow a specific CCP device to be
> requested for an operation and use this in the DMA engine
> support.
> 
> Cc:  # 4.9.x-
> Signed-off-by: Gary R Hook 

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Herbert Xu
On Thu, Mar 16, 2017 at 10:23:49AM +0100, Stephan Müller wrote:
> Am Donnerstag, 16. März 2017, 10:08:23 CET schrieb Herbert Xu:
> 
> Hi Herbert,
> 
> > On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
> > > With this approach I thought that the while loop could be a thing of the
> > > past, considering that this is also the approach taken in
> > > skcipher_recvmsg_async that is present in the current upstream code base.
> > 
> > The reason there is a limit is so that user-space doesn't pin down
> > unlimited amounts of memory.  How is this addressed under your
> > scheme?
> 
> sock_kmalloc limits the number of SG tables that we can manage. On my system, 
> sock_kmalloc has 20480 bytes at its disposal as the limiting factor.
> 
> As this concept for limiting the impact of user space on kernel memory is 
> used 
> in the current upstream skcipher_recvmsg_async, I simply re-used that 
> approach:
> 
> while (iov_iter_count(&msg->msg_iter)) {
> struct skcipher_async_rsgl *rsgl;
> ...
> rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
> ...
> 
> Note I use sock_kmalloc instead of the currently existing kmalloc to ensure 
> such limitation.

First of all you're only limiting the amount of memory occupied
by the SG list which is not the same thing as the memory pinned
down by the actual recvmsg.

More importantly, with the current code, a very large recvmsg
would still work by doing it piecemeal.  With your patch, won't
it fail because sock_kmalloc could fail to allocate memory for
the whole thing?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Stephan Müller
Am Donnerstag, 16. März 2017, 10:08:23 CET schrieb Herbert Xu:

Hi Herbert,

> On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
> > With this approach I thought that the while loop could be a thing of the
> > past, considering that this is also the approach taken in
> > skcipher_recvmsg_async that is present in the current upstream code base.
> 
> The reason there is a limit is so that user-space doesn't pin down
> unlimited amounts of memory.  How is this addressed under your
> scheme?

sock_kmalloc limits the number of SG tables that we can manage. On my system, 
sock_kmalloc has 20480 bytes at its disposal as the limiting factor.

As this concept for limiting the impact of user space on kernel memory is used 
in the current upstream skcipher_recvmsg_async, I simply re-used that 
approach:

while (iov_iter_count(&msg->msg_iter)) {
struct skcipher_async_rsgl *rsgl;
...
rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
...

Note I use sock_kmalloc instead of the currently existing kmalloc to ensure 
such limitation.

Also, please note that this approach is identical to the currently used code 
in algif_aead:aead_recvmsg_sync. This implementation led me to believe that 
my new code would be appropriate in this case, too.

Ciao
Stephan


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Herbert Xu
On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
>
> With this approach I thought that the while loop could be a thing of the 
> past, 
> considering that this is also the approach taken in skcipher_recvmsg_async 
> that is present in the current upstream code base.

The reason there is a limit is so that user-space doesn't pin down
unlimited amounts of memory.  How is this addressed under your
scheme?

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Stephan Müller
Am Donnerstag, 16. März 2017, 09:39:23 CET schrieb Herbert Xu:

Hi Herbert,

> On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote:
> > +   } else {
> > +   /* Synchronous operation */
> > +   skcipher_request_set_callback(>req,
> > + CRYPTO_TFM_REQ_MAY_SLEEP |
> > + CRYPTO_TFM_REQ_MAY_BACKLOG,
> > + af_alg_complete,
> > + >completion);
> > +   err = af_alg_wait_for_completion(ctx->enc ?
> > +   crypto_skcipher_encrypt(>req) :
> > +   crypto_skcipher_decrypt(>req),
> > +>completion);
> > +   }
> 
> This is now outside of the loop for the sync case.  The purpose
> of the loop in the sync case was to segment the data when we get
> a very large SG list that does not fit inside a single call.
> 
> Or did I miss something?

The while loop present in the skcipher_recvmsg_sync present in the current 
upstream kernel operates on ctx->rsgl. That data structure is defined as:

struct af_alg_sgl {
struct scatterlist sg[ALG_MAX_PAGES + 1];
struct page *pages[ALG_MAX_PAGES];
unsigned int npages;
};

I.e. recvmsg operates on at most 16 SGs, each of which can take one page of 
data (64 KiB in total with 4 KiB pages). Thus, if a user provides more than 
16 pages of data, such a while loop makes sense.

The patch proposed here is not limited to a fixed set of 16 SGs in the RX-SGL. 
The recvmsg code path allocates the required amount of SGL space:

/* convert iovecs of output buffers into RX SGL */
while (len < ctx->used && iov_iter_count(&msg->msg_iter)) {
struct skcipher_rsgl *rsgl;
...
rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
...

struct skcipher_rsgl uses the af_alg_sgl data structure with its 16 SG 
entries. But now you can have an arbitrary number of af_alg_sgl instances, 
bounded only by the memory limit enforced by sock_kmalloc. I.e. recvmsg can 
now operate on multiple af_alg_sgl instances in one go.

With this approach I thought that the while loop could be a thing of the past, 
considering that this is also the approach taken in skcipher_recvmsg_async 
that is present in the current upstream code base.

> 
> Overall I like this patch.

Thank you very much.
> 
> Thanks,


Ciao
Stephan


Re: [RFC 0/7] crypto: caam - add Queue Interface (QI) support

2017-03-16 Thread Herbert Xu
On Fri, Mar 03, 2017 at 04:52:06PM +0200, Horia Geantă wrote:
> The patchset adds support for CAAM Queue Interface (QI), the additional
> interface (besides job ring) available for submitting jobs to the engine
> on platforms having DPAA (Datapath Acceleration Architecture).
> 
> Patches 1-4 are QMan dependencies.
> I would prefer to take them through the crypto tree,
> but I am open to suggestions.
> 
> Patch 5 adds a missing double inclusion guard in desc_constr.h.
> 
> Patch 6 adds the caam/qi job submission backend.
> 
> Patch 7 adds algorithms (ablkcipher and authenc) that run on top
> of caam/qi. For now, their priority is set lower than caam/jr.

I'm fine with the crypto bits.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v5 1/2] crypto: skcipher AF_ALG - overhaul memory management

2017-03-16 Thread Herbert Xu
On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote:
>
> + } else {
> + /* Synchronous operation */
> + skcipher_request_set_callback(>req,
> +   CRYPTO_TFM_REQ_MAY_SLEEP |
> +   CRYPTO_TFM_REQ_MAY_BACKLOG,
> +   af_alg_complete,
> +   >completion);
> + err = af_alg_wait_for_completion(ctx->enc ?
> + crypto_skcipher_encrypt(>req) :
> + crypto_skcipher_decrypt(>req),
> +  >completion);
> + }

This is now outside of the loop for the sync case.  The purpose
of the loop in the sync case was to segment the data when we get
a very large SG list that does not fit inside a single call.

Or did I miss something?

Overall I like this patch.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt