RE: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 11:05 PM
> To: Vakul Garg 
> Cc: linux-crypto@vger.kernel.org; il...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net; net...@vger.kernel.org;
> Gilad Ben-Yossef 
> Subject: Re: [PATCHv2] tls: Add support for encryption using async offload
> accelerator
> 
> On 01/31/18 05:22 PM, Vakul Garg wrote:
> > > > On second thought in stable we should probably just disable async
> > > > tfm allocations.
> > > > It's simpler. But this approach is still good for -next
> > > >
> > > >
> > > > Gilad
> > >
> > > I agree with Gilad, just disable async for now.
> > >
> >
> > How to do it? Can you help with the api name?
> 
> *aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
> 
> https://github.com/ktls/net_next_ktls/commit/f3b9b402e755e4b0623fa83f88137173fc249f2d
> 
 
The above patch you authored works fine; I tested it on my test platform.
I have nothing further to contribute here.
Will you send it to the stable kernel yourself, or do you still want me to do it?
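
For context, a minimal sketch of the synchronous-only allocation Dave points
to above. The wrapper function is illustrative; the point is passing
CRYPTO_ALG_ASYNC in the mask (with type 0) so that only synchronous gcm(aes)
implementations can be selected:

#include <crypto/aead.h>
#include <linux/err.h>

static struct crypto_aead *alloc_sync_gcm_aes(void)
{
	struct crypto_aead *aead;

	/* type = 0, mask = CRYPTO_ALG_ASYNC: async-only tfms never match */
	aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(aead))
		return NULL;

	return aead;
}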


> > > If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS
> > > and not wait for a response.  I had started working on a patch for
> > > that, but it's pretty tricky to get right.
> >
> > Can you point me to your WIP code branch for this?
> 
> https://github.com/ktls/net_next_ktls/commit/9cc839aa551ed972d148ecebf353b25ee93543b9
> 
> > If MSG_DONTWAIT is not used, would it be sane to enqueue the crypto
> > request to the accelerator and return to user space, so that user space
> > can send more plaintext data while the crypto accelerator works in
> > parallel?
> 
> Right, that's roughly what the above does.  I believe the tricky unfinished 
> part
> was getting poll() to work correctly if there is an async crypto request
> outstanding.  Currently the tls poll() just relies on backpressure from
> do_tcp_sendpages.
 
I wanted to add async accelerator support for ktls so that multiple crypto
requests can be pipelined.
But since you are already doing it, I won't duplicate the effort.
I will leverage your work on my platform, test it, and try to contribute from
there.



[PATCH v3 3/4] crypto: aesni - Directly use kmap_atomic instead of scatter_walk object in gcm(aes)

2018-01-31 Thread Junaid Shahid
gcmaes_crypt uses a scatter_walk object to map and unmap the crypto
request sglists. But the only purpose that appears to serve here is to allow
the D-Cache to be flushed at the end for pages that were used as output.
However, that is not applicable on x86, so we can avoid using the scatter_walk
object for simplicity.

Signed-off-by: Junaid Shahid 
---
 arch/x86/crypto/aesni-intel_glue.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index c11e531d21dd..9e69e02076d2 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -750,6 +750,11 @@ static bool is_mappable(struct scatterlist *sgl, unsigned long len)
   (!PageHighMem(sg_page(sgl)) || sgl->offset + len <= PAGE_SIZE);
 }
 
+static u8 *map_buffer(struct scatterlist *sgl)
+{
+   return kmap_atomic(sg_page(sgl)) + sgl->offset;
+}
+
 /*
  * Maps the sglist buffer and returns a pointer to the mapped buffer in
  * data_buf.
@@ -762,14 +767,12 @@ static bool is_mappable(struct scatterlist *sgl, unsigned long len)
  * the data_buf and the bounce_buf should be freed using kfree().
  */
 static int get_request_buffer(struct scatterlist *sgl,
- struct scatter_walk *sg_walk,
  unsigned long bounce_buf_size,
  u8 **data_buf, u8 **bounce_buf, bool *mapped)
 {
if (sg_is_last(sgl) && is_mappable(sgl, sgl->length)) {
*mapped = true;
-   scatterwalk_start(sg_walk, sgl);
-   *data_buf = scatterwalk_map(sg_walk);
+   *data_buf = map_buffer(sgl);
return 0;
}
 
@@ -785,14 +788,10 @@ static int get_request_buffer(struct scatterlist *sgl,
return 0;
 }
 
-static void put_request_buffer(u8 *data_buf, unsigned long len, bool mapped,
-  struct scatter_walk *sg_walk, bool output)
+static void put_request_buffer(u8 *data_buf, bool mapped)
 {
-   if (mapped) {
-   scatterwalk_unmap(data_buf);
-   scatterwalk_advance(sg_walk, len);
-   scatterwalk_done(sg_walk, output, 0);
-   }
+   if (mapped)
+   kunmap_atomic(data_buf);
 }
 
 /*
@@ -809,16 +808,14 @@ static int gcmaes_crypt(struct aead_request *req, unsigned int assoclen,
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
unsigned long data_len = req->cryptlen - (decrypt ? auth_tag_len : 0);
-   struct scatter_walk src_sg_walk;
-   struct scatter_walk dst_sg_walk = {};
int retval = 0;
unsigned long bounce_buf_size = data_len + auth_tag_len + req->assoclen;
 
if (auth_tag_len > 16)
return -EINVAL;
 
-   retval = get_request_buffer(req->src, &src_sg_walk, bounce_buf_size,
-   &assoc, &bounce_buf, &src_mapped);
+   retval = get_request_buffer(req->src, bounce_buf_size, &assoc,
+   &bounce_buf, &src_mapped);
if (retval)
goto exit;
 
@@ -828,9 +825,8 @@ static int gcmaes_crypt(struct aead_request *req, unsigned int assoclen,
dst = src;
dst_mapped = src_mapped;
} else {
-   retval = get_request_buffer(req->dst, &dst_sg_walk,
-   bounce_buf_size, &dst, &bounce_buf,
-   &dst_mapped);
+   retval = get_request_buffer(req->dst, bounce_buf_size, &dst,
+   &bounce_buf, &dst_mapped);
if (retval)
goto exit;
 
@@ -866,11 +862,9 @@ static int gcmaes_crypt(struct aead_request *req, unsigned int assoclen,
 1);
 exit:
if (req->dst != req->src)
-   put_request_buffer(dst - req->assoclen, req->dst->length,
-  dst_mapped, &dst_sg_walk, true);
+   put_request_buffer(dst - req->assoclen, dst_mapped);
 
-   put_request_buffer(assoc, req->src->length, src_mapped, &src_sg_walk,
-  false);
+   put_request_buffer(assoc, src_mapped);
 
kfree(bounce_buf);
return retval;
-- 
2.16.0.rc1.238.g530d649a79-goog



[PATCH v3 4/4] crypto: aesni - Use zero-copy for gcm(aes) even if the AAD/Data/AuthTag are separate

2018-01-31 Thread Junaid Shahid
Enable the use of zero-copy even if the AAD and/or Auth Tag are in different
buffers than the actual data, as long as each of them individually satisfies
the zero-copy conditions (i.e. the entire buffer is either in low-mem or
within a single high-mem page).

Signed-off-by: Junaid Shahid 
---
 arch/x86/crypto/aesni-intel_glue.c | 121 +++--
 1 file changed, 89 insertions(+), 32 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 9e69e02076d2..7cebc99a0405 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -756,42 +756,91 @@ static u8 *map_buffer(struct scatterlist *sgl)
 }
 
 /*
- * Maps the sglist buffer and returns a pointer to the mapped buffer in
- * data_buf.
+ * Maps the sglist buffer and returns pointers to the mapped buffers in assoc,
+ * data and (optionally) auth_tag.
  *
  * If direct mapping is not feasible, then allocates a bounce buffer if one
- * isn't already available in bounce_buf, and returns a pointer to the bounce
- * buffer in data_buf.
+ * isn't already available in bounce_buf, and returns pointers within the bounce
+ * buffer in assoc, data and auth_tag.
  *
- * When the buffer is no longer needed, put_request_buffer() should be called on
- * the data_buf and the bounce_buf should be freed using kfree().
+ * When the buffers are no longer needed, put_request_buffers() should be called
+ * and the bounce_buf should be freed using kfree().
  */
-static int get_request_buffer(struct scatterlist *sgl,
- unsigned long bounce_buf_size,
- u8 **data_buf, u8 **bounce_buf, bool *mapped)
+static int get_request_buffers(struct scatterlist *sgl,
+  unsigned long assoc_len, unsigned long data_len,
+  unsigned long auth_tag_len,
+  u8 **assoc, u8 **data, u8 **auth_tag,
+  u8 **bounce_buf, bool *mapped)
 {
-   if (sg_is_last(sgl) && is_mappable(sgl, sgl->length)) {
+   struct scatterlist sgl_data_chain[2], sgl_auth_tag_chain[2];
+   struct scatterlist *sgl_data, *sgl_auth_tag;
+
+   sgl_data = scatterwalk_ffwd(sgl_data_chain, sgl, assoc_len);
+   sgl_auth_tag = scatterwalk_ffwd(sgl_auth_tag_chain, sgl,
+   assoc_len + data_len);
+
+   if (is_mappable(sgl, assoc_len) && is_mappable(sgl_data, data_len) &&
+   (auth_tag == NULL || is_mappable(sgl_auth_tag, auth_tag_len))) {
*mapped = true;
-   *data_buf = map_buffer(sgl);
+
+   *assoc = map_buffer(sgl);
+
+   if (sgl->length >= assoc_len + data_len)
+   *data = *assoc + assoc_len;
+   else
+   *data = map_buffer(sgl_data);
+
+   if (auth_tag != NULL) {
+   if (sgl_data->length >= data_len + auth_tag_len)
+   *auth_tag = *data + data_len;
+   else
+   *auth_tag = map_buffer(sgl_auth_tag);
+   }
+
return 0;
}
 
*mapped = false;
 
if (*bounce_buf == NULL) {
-   *bounce_buf = kmalloc(bounce_buf_size, GFP_ATOMIC);
+   *bounce_buf = kmalloc(assoc_len + data_len + auth_tag_len,
+ GFP_ATOMIC);
if (unlikely(*bounce_buf == NULL))
return -ENOMEM;
}
 
-   *data_buf = *bounce_buf;
+   *assoc = *bounce_buf;
+   *data = *assoc + assoc_len;
+
+   if (auth_tag != NULL)
+   *auth_tag = *data + data_len;
+
return 0;
 }
 
-static void put_request_buffer(u8 *data_buf, bool mapped)
+static void put_request_buffers(struct scatterlist *sgl, bool mapped,
+   u8 *assoc, u8 *data, u8 *auth_tag,
+   unsigned long assoc_len,
+   unsigned long data_len,
+   unsigned long auth_tag_len)
 {
-   if (mapped)
-   kunmap_atomic(data_buf);
+   struct scatterlist sgl_data_chain[2];
+   struct scatterlist *sgl_data;
+
+   if (!mapped)
+   return;
+
+   sgl_data = scatterwalk_ffwd(sgl_data_chain, sgl, assoc_len);
+
+   /* The unmaps need to be done in reverse order of the maps. */
+
+   if (auth_tag != NULL && sgl_data->length < data_len + auth_tag_len)
+   kunmap_atomic(auth_tag);
+
+   if (sgl->length < assoc_len + data_len)
+   kunmap_atomic(data);
+
+   kunmap_atomic(assoc);
 }
 
 /*
@@ -803,34 +852,38 @@ static void put_request_buffer(u8 *data_buf, bool mapped)
 static int gcmaes_crypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx, bool decrypt)
 {
-  

[PATCH v3 1/4] crypto: aesni - Fix out-of-bounds access of the AAD buffer in AVX gcm-aesni

2018-01-31 Thread Junaid Shahid
The AVX/AVX2 versions of gcm-aes encryption/decryption functions can
access memory after the end of the AAD buffer if the AAD length is
not a multiple of 4 bytes. This didn't matter as long as the AAD and
data buffers were always contiguous, since the AVX versions are not used
for small data sizes and hence enough data bytes were always present to
cover the overrun. However, now that we support non-contiguous AAD and
data buffers, that is no longer the case. This can potentially result in
accessing a page that is not mapped and thus crashing the machine. This
patch fixes that by reading the last (<16-byte) block of the AAD byte by
byte, with an initial 8-byte load when the block is at least 8 bytes
long.
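
As a rough illustration of the read strategy described above, here is a
host-side C model (not kernel code); it assumes a little-endian machine,
as on x86:

#include <stdint.h>
#include <string.h>

/* Read 'len' bytes (0 < len < 16) into a zero-padded 16-byte block
 * without ever touching memory past src + len. */
static void read_partial_block(const uint8_t *src, unsigned int len,
			       uint8_t out[16])
{
	uint64_t acc = 0;
	unsigned int i, done = 0;

	memset(out, 0, 16);

	if (len >= 8) {			/* one safe 8-byte load */
		memcpy(out, src, 8);
		done = 8;
	}
	for (i = len; i > done; i--)	/* remaining bytes, last to first */
		acc = (acc << 8) | src[i - 1];
	memcpy(out + done, &acc, len - done);
}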

Signed-off-by: Junaid Shahid 
---
 arch/x86/crypto/aesni-intel_avx-x86_64.S | 154 +--
 1 file changed, 42 insertions(+), 112 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index faecb1518bf8..97029059dc1a 100644
--- a/arch/x86/crypto/aesni-intel_avx-x86_64.S
+++ b/arch/x86/crypto/aesni-intel_avx-x86_64.S
@@ -106,14 +106,6 @@
 ##
 ##AAD Format with 64-bit Extended Sequence Number
 ##
-##
-## aadLen:
-##   from the definition of the spec, aadLen can only be 8 or 12 bytes.
-##  The code additionally supports aadLen of length 16 bytes.
-##
-## TLen:
-##   from the definition of the spec, TLen can only be 8, 12 or 16 bytes.
-##
 ## poly = x^128 + x^127 + x^126 + x^121 + 1
 ## throughout the code, one tab and two tab indentations are used. one tab is
 ## for GHASH part, two tabs is for AES part.
@@ -155,30 +147,6 @@ SHIFT_MASK:  .octa 0x0f0e0d0c0b0a09080706050403020100
 ALL_F:   .octa 0x
  .octa 0x
 
-.section .rodata
-.align 16
-.type aad_shift_arr, @object
-.size aad_shift_arr, 272
-aad_shift_arr:
-.octa 0x
-.octa 0xff0C
-.octa 0x0D0C
-.octa 0xff0E0D0C
-.octa 0x0F0E0D0C
-.octa 0xff0C0B0A0908
-.octa 0x0D0C0B0A0908
-.octa 0xff0E0D0C0B0A0908
-.octa 0x0F0E0D0C0B0A0908
-.octa 0xff0C0B0A090807060504
-.octa 0x0D0C0B0A090807060504
-.octa 0xff0E0D0C0B0A090807060504
-.octa 0x0F0E0D0C0B0A090807060504
-.octa 0xff0C0B0A09080706050403020100
-.octa 0x0D0C0B0A09080706050403020100
-.octa 0xff0E0D0C0B0A09080706050403020100
-.octa 0x0F0E0D0C0B0A09080706050403020100
-
-
 .text
 
 
@@ -280,6 +248,36 @@ VARIABLE_OFFSET = 16*8
 vaesenclast 16*10(arg1), \XMM0, \XMM0
 .endm
 
+# Reads DLEN bytes starting at DPTR and stores in XMMDst
+# where 0 < DLEN < 16
+# Clobbers %rax, DLEN and XMM1
+.macro READ_PARTIAL_BLOCK DPTR DLEN XMM1 XMMDst
+cmp $8, \DLEN
+jl _read_lt8_\@
+movq (\DPTR), \XMMDst
+sub $8, \DLEN
+jz _done_read_partial_block_\@
+   xor %eax, %eax
+_read_next_byte_\@:
+shl $8, %rax
+mov 7(\DPTR, \DLEN, 1), %al
+dec \DLEN
+jnz _read_next_byte_\@
+movq %rax, \XMM1
+   pslldq $8, \XMM1
+por \XMM1, \XMMDst
+   jmp _done_read_partial_block_\@
+_read_lt8_\@:
+   xor %eax, %eax
+_read_next_byte_lt8_\@:
+shl $8, %rax
+mov -1(\DPTR, \DLEN, 1), %al
+dec \DLEN
+jnz _read_next_byte_lt8_\@
+movq %rax, \XMMDst
+_done_read_partial_block_\@:
+.endm
+
 #ifdef CONFIG_AS_AVX
 ###
 # GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
@@ -400,63 +398,29 @@ VARIABLE_OFFSET = 16*8
setreg
 
mov arg6, %r10  # r10 = AAD
-   mov arg7, %r12  # r12 = aadLen
-
-
-   mov %r12, %r11
+   mov arg7, %r11  # r11 = aadLen
 
vpxor   reg_j, reg_j, reg_j
vpxor   reg_i, reg_i, reg_i
cmp $16, %r11
-   jl  _get_AAD_rest8\@
+   jl  _get_AAD_rest\@
 _get_AAD_blocks\@:
vmovdqu (%r10), reg_i
vpshufb SHUF_MASK(%rip), reg_i, reg_i
vpxor   reg_i, reg_j, reg_j
GHASH_MUL_AVX   reg_j, \T2, \T1, \T3, \T4, \T5, \T6
add $16, %r10
-   sub $16, %r12
sub $16, %r11
cmp $16, %r11
jge _get_AAD_blocks\@
vmovdqu reg_j, reg_i
+
+   /* read the last <16B of AAD. */
+_get_AAD_rest\@:
cmp $0, %r11
je  _get_AAD_done\@
 
-   vpxor   reg_i, reg_i, reg_i
-
-   /* read the 

[PATCH v3 2/4] crypto: aesni - Enable one-sided zero copy for gcm(aes) request buffers

2018-01-31 Thread Junaid Shahid
gcmaes_encrypt/decrypt perform zero-copy crypto if both the source and
destination satisfy certain conditions (single sglist entry located in
low-mem or within a single high-mem page). But two copies are done
otherwise, even if one of source or destination still satisfies the
zero-copy conditions. This optimization is now extended to avoid the
copy on the side that does satisfy the zero-copy conditions.

Signed-off-by: Junaid Shahid 
---
 arch/x86/crypto/aesni-intel_glue.c | 256 +++--
 1 file changed, 134 insertions(+), 122 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 34cf1c1f8c98..c11e531d21dd 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -744,136 +744,148 @@ static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
return 0;
 }
 
+static bool is_mappable(struct scatterlist *sgl, unsigned long len)
+{
+   return sgl->length > 0 && len <= sgl->length &&
+  (!PageHighMem(sg_page(sgl)) || sgl->offset + len <= PAGE_SIZE);
+}
+
+/*
+ * Maps the sglist buffer and returns a pointer to the mapped buffer in
+ * data_buf.
+ *
+ * If direct mapping is not feasible, then allocates a bounce buffer if one
+ * isn't already available in bounce_buf, and returns a pointer to the bounce
+ * buffer in data_buf.
+ *
+ * When the buffer is no longer needed, put_request_buffer() should be called on
+ * the data_buf and the bounce_buf should be freed using kfree().
+ */
+static int get_request_buffer(struct scatterlist *sgl,
+ struct scatter_walk *sg_walk,
+ unsigned long bounce_buf_size,
+ u8 **data_buf, u8 **bounce_buf, bool *mapped)
+{
+   if (sg_is_last(sgl) && is_mappable(sgl, sgl->length)) {
+   *mapped = true;
+   scatterwalk_start(sg_walk, sgl);
+   *data_buf = scatterwalk_map(sg_walk);
+   return 0;
+   }
+
+   *mapped = false;
+
+   if (*bounce_buf == NULL) {
+   *bounce_buf = kmalloc(bounce_buf_size, GFP_ATOMIC);
+   if (unlikely(*bounce_buf == NULL))
+   return -ENOMEM;
+   }
+
+   *data_buf = *bounce_buf;
+   return 0;
+}
+
+static void put_request_buffer(u8 *data_buf, unsigned long len, bool mapped,
+  struct scatter_walk *sg_walk, bool output)
+{
+   if (mapped) {
+   scatterwalk_unmap(data_buf);
+   scatterwalk_advance(sg_walk, len);
+   scatterwalk_done(sg_walk, output, 0);
+   }
+}
+
+/*
+ * Performs the encryption/decryption operation for the given request. The src
+ * and dst sglists in the request are directly mapped if possible. Otherwise, a
+ * bounce buffer is allocated and used to copy the data from the src or to the
+ * dst, or both.
+ */
+static int gcmaes_crypt(struct aead_request *req, unsigned int assoclen,
+   u8 *hash_subkey, u8 *iv, void *aes_ctx, bool decrypt)
+{
+   u8 *src, *dst, *assoc, *bounce_buf = NULL;
+   bool src_mapped = false, dst_mapped = false;
+   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+   unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+   unsigned long data_len = req->cryptlen - (decrypt ? auth_tag_len : 0);
+   struct scatter_walk src_sg_walk;
+   struct scatter_walk dst_sg_walk = {};
+   int retval = 0;
+   unsigned long bounce_buf_size = data_len + auth_tag_len + req->assoclen;
+
+   if (auth_tag_len > 16)
+   return -EINVAL;
+
+   retval = get_request_buffer(req->src, &src_sg_walk, bounce_buf_size,
+   &assoc, &bounce_buf, &src_mapped);
+   if (retval)
+   goto exit;
+
+   src = assoc + req->assoclen;
+
+   if (req->src == req->dst) {
+   dst = src;
+   dst_mapped = src_mapped;
+   } else {
+   retval = get_request_buffer(req->dst, &dst_sg_walk,
+   bounce_buf_size, &dst, &bounce_buf,
+   &dst_mapped);
+   if (retval)
+   goto exit;
+
+   dst += req->assoclen;
+   }
+
+   if (!src_mapped)
+   scatterwalk_map_and_copy(bounce_buf, req->src, 0,
+req->assoclen + req->cryptlen, 0);
+
+   kernel_fpu_begin();
+
+   if (decrypt) {
+   u8 gen_auth_tag[16];
+
+   aesni_gcm_dec_tfm(aes_ctx, dst, src, data_len, iv,
+ hash_subkey, assoc, assoclen,
+ gen_auth_tag, auth_tag_len);
+   /* Compare generated tag with passed in tag. */
+   if (crypto_memneq(src + data_len, gen_auth_tag, auth_tag_len))
+   retval = -EBADMSG;
+
+   } else
+   

[PATCH v3 0/4] crypto: aesni - Use zero-copy for gcm(aes) buffers that are partially contiguous

2018-01-31 Thread Junaid Shahid
Changes in v3:
- Rebased on top of the latest linux-next
Changes in v2:
- Integrated https://patchwork.kernel.org/patch/10173981

Currently, the AESNI gcm(aes) implementation uses zero-copy only when the
entire src and dest request buffers, including the AAD, the data and the
Auth Tag are contiguous. This series enables the use of zero-copy even if the
AAD and/or Auth Tag are in different buffers than the actual data, as long as
each of them individually satisfies the zero-copy conditions (i.e. the entire
buffer is either in low-mem or within a single high-mem page). Furthermore,
it also enables the use of zero-copy even if only one of src and dest satisfies
these conditions rather than only when both of them do.
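
For reference, this is the per-buffer condition the series relies on, as
introduced in patch 2/4 (either the page is in low memory, or the whole
buffer fits inside one high-mem page):

#include <linux/highmem.h>
#include <linux/scatterlist.h>

static bool is_mappable(struct scatterlist *sgl, unsigned long len)
{
	return sgl->length > 0 && len <= sgl->length &&
	       (!PageHighMem(sg_page(sgl)) || sgl->offset + len <= PAGE_SIZE);
}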

Junaid Shahid (4):
  crypto: aesni - Fix out-of-bounds access of the AAD buffer in AVX
gcm-aesni
  crypto: aesni - Enable one-sided zero copy for gcm(aes) request
buffers
  crypto: aesni - Directly use kmap_atomic instead of scatter_walk
object in gcm(aes)
  crypto: aesni - Use zero-copy for gcm(aes) even if the
AAD/Data/AuthTag are separate

 arch/x86/crypto/aesni-intel_avx-x86_64.S | 154 +---
 arch/x86/crypto/aesni-intel_glue.c   | 307 +++
 2 files changed, 227 insertions(+), 234 deletions(-)

-- 
2.16.0.rc1.238.g530d649a79-goog



Re: [PATCH v2 0/4] crypto: aesni - Use zero-copy for gcm(aes) buffers that are partially contiguous

2018-01-31 Thread Junaid Shahid
Hi Steffen,

On Wed, Jan 31, 2018 at 12:13 AM, Steffen Klassert
 wrote:
>
> I wonder which special usecase you have in mind that will be improved
> by your patches.
>

This is not related to IPsec. We have an internal use case where the
data buffer itself is a single memory page but the authentication tag
needs to be in a separate buffer. This patch set allows us to have
zero copy in that case.

>
> Maybe it would be better to investigate in the direction to
> support SG operations with gcm-aesni instead of trying to
> find all corner cases where the existing implementation can
> do zero-copy.
>

Yes, if gcm-aesni could be implemented to handle arbitrarily
fragmented buffers, that would be better, but it would likely be a
much bigger and more complicated change. Just handling the case where
the AAD/data/authtag are separate contiguous buffers is relatively
simpler but still useful.

Thanks,
Junaid


Re: [PATCH v2 2/4] crypto: aesni - Enable one-sided zero copy for gcm(aes) request buffers

2018-01-31 Thread Junaid Shahid
Hi Stephan,

Sure, I'll rebase and send another revision.

Thanks,
Junaid


RE: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-31 Thread Atul Gupta


-Original Message-
From: Dave Watson [mailto:davejwat...@fb.com] 
Sent: Wednesday, January 31, 2018 10:14 PM
To: Atul Gupta 
Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; 
linux-crypto@vger.kernel.org; ganes...@chelsio.co; net...@vger.kernel.org; 
da...@davemloft.net; Boris Pismenny ; Ilya Lesokhin 

Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP

On 01/31/18 04:14 PM, Atul Gupta wrote:
> 
> 
> On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > 
> > > What I was referring is that passing "tls" ulp type in setsockopt 
> > > may be insufficient to make the decision when multi HW assist 
> > > Inline TLS solution exists.
> > Setting the ULP doesn't choose HW or SW implementation, I think that 
> > should be done later when setting up crypto with
> > 
> > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> setsockopt [mentioned above] is quite late for the driver to enable a HW 
> implementation; we require something as early as tls_init 
> [setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"))] for the driver 
> to set the HW prot and offload the connection besides Inline Tx/Rx.
> > 
> > Any reason we can't use ethtool to choose HW vs SW implementation, 
> > if available on the device?
> I thought about it, but the interface index is not available to fetch the 
> netdev and check its caps in order to set the HW prot, e.g. bind 
> [prot.hash] --> tls_hash to program the HW.

Perhaps this is the part I don't follow - why do you need to override hash and 
check for LISTEN?  I briefly looked through the patch named "CPL handler 
definition", this looks like it is a full TCP offload?

Yes, this is connection and record layer offload, and that is the reason I used 
a different ulp type; I need to see what additional info or check can help set 
up the required sk prot.


Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Gilad Ben-Yossef
On Wed, Jan 31, 2018 at 7:34 PM, Dave Watson  wrote:
> On 01/31/18 05:22 PM, Vakul Garg wrote:
>> > > On second thought in stable we should probably just disable async tfm
>> > > allocations.
>> > > It's simpler. But this approach is still good for -next
>> > >
>> > >
>> > > Gilad
>> >
>> > I agree with Gilad, just disable async for now.
>> >
>>
>> How to do it? Can you help with the api name?
>
> *aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);
>
> https://github.com/ktls/net_next_ktls/commit/f3b9b402e755e4b0623fa83f88137173fc249f2d

I said disabling async tfms is the right way to go for -stable since
it's the simplest and least risky way of plugging this bug.

I don't think this is the way to go for -next (and it seems davem
agrees with me). Vakul's
patch looks good to me for now.

>
>> > If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS and
>> > not wait for a response.  I had started working on a patch for that, but 
>> > it's
>> > pretty tricky to get right.
>>

That would be a great addition, but I don't think we need to wait for
that. It can come later.

Gilad
-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Dave Watson
On 01/31/18 05:22 PM, Vakul Garg wrote:
> > > On second thought in stable we should probably just disable async tfm
> > > allocations.
> > > It's simpler. But this approach is still good for -next
> > >
> > >
> > > Gilad
> > 
> > I agree with Gilad, just disable async for now.
> > 
> 
> How to do it? Can you help with the api name?

*aead = crypto_alloc_aead("gcm(aes)", 0, CRYPTO_ALG_ASYNC);

https://github.com/ktls/net_next_ktls/commit/f3b9b402e755e4b0623fa83f88137173fc249f2d

> > If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS and
> > not wait for a response.  I had started working on a patch for that, but 
> > it's
> > pretty tricky to get right.
> 
> Can you point me to your WIP code branch for this?

https://github.com/ktls/net_next_ktls/commit/9cc839aa551ed972d148ecebf353b25ee93543b9

> If MSG_DONTWAIT is not used, would it be sane to enqueue the crypto request
> to the accelerator and return to user space, so that user space can send
> more plaintext data while the crypto accelerator works in parallel?

Right, that's roughly what the above does.  I believe the tricky
unfinished part was getting poll() to work correctly if there is an
async crypto request outstanding.  Currently the tls poll() just
relies on backpressure from do_tcp_sendpages.


RE: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg


> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 8:52 PM
> To: Vakul Garg 
> Cc: linux-crypto@vger.kernel.org; il...@mellanox.com;
> avia...@mellanox.com; da...@davemloft.net; net...@vger.kernel.org;
> Gilad Ben-Yossef 
> Subject: Re: [PATCHv2] tls: Add support for encryption using async offload
> accelerator
> 
> On 01/31/18 09:34 PM, Vakul Garg wrote:
> > Async crypto accelerators (e.g. drivers/crypto/caam) support
> > offloading GCM operation. If they are enabled, crypto_aead_encrypt()
> > return error code -EINPROGRESS. In this case tls_do_encryption() needs
> > to wait on a completion till the time the response for crypto offload
> > request is received.
> 
> Comments from V1
> > On Wed, Jan 31, 2018 at 8:10 AM, Gilad Ben-Yossef
>  wrote:
> >> Hi Vakul,
> >>
> >> On Wed, Jan 31, 2018 at 12:36 PM, Vakul Garg 
> wrote:
> >>> Async crypto accelerators (e.g. drivers/crypto/caam) support
> >>> offloading GCM operation. If they are enabled, crypto_aead_encrypt()
> >>> return error code -EINPROGRESS. In this case tls_do_encryption()
> >>> needs to wait on a completion till the time the response for crypto
> >>> offload request is received.
> >>>
> >>
> >> Thank you for this patch. I think it is actually a bug fix and should
> >> probably go into stable
> >
> > On second thought in stable we should probably just disable async tfm
> > allocations.
> > It's simpler. But this approach is still good for -next
> >
> >
> > Gilad
> 
> I agree with Gilad, just disable async for now.
> 

How to do it? Can you help with the api name?

> If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS and
> not wait for a response.  I had started working on a patch for that, but it's
> pretty tricky to get right.

Can you point me to your WIP code branch for this?

If MSG_DONTWAIT is not used, would it be sane to enqueue the crypto request to
the accelerator and return to user space, so that user space can send more
plaintext data while the crypto accelerator works in parallel?
 


Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-31 Thread Dave Watson
On 01/31/18 04:14 PM, Atul Gupta wrote:
> 
> 
> On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > 
> > > What I was referring is that passing "tls" ulp type in setsockopt
> > > may be insufficient to make the decision when multi HW assist Inline
> > > TLS solution exists.
> > Setting the ULP doesn't choose HW or SW implementation, I think that
> > should be done later when setting up crypto with
> > 
> > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> setsockopt [mentioned above] is quite late for the driver to enable a HW
> implementation; we require something as early as tls_init [setsockopt(sock,
> SOL_TCP, TCP_ULP, "tls", sizeof("tls"))] for the driver to set the HW prot and
> offload the connection besides Inline Tx/Rx.
> > 
> > Any reason we can't use ethtool to choose HW vs SW implementation, if
> > available on the device?
> I thought about it, but the interface index is not available to fetch the
> netdev and check its caps in order to set the HW prot, e.g. bind [prot.hash]
> --> tls_hash to program the HW.

Perhaps this is the part I don't follow - why do you need to override
hash and check for LISTEN?  I briefly looked through the patch named
"CPL handler definition", this looks like it is a full TCP offload?


Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread David Miller
From: Vakul Garg 
Date: Wed, 31 Jan 2018 21:34:37 +0530

> Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
> GCM operation. If they are enabled, crypto_aead_encrypt() return error
> code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
> completion till the time the response for crypto offload request is
> received.
> 
> Signed-off-by: Vakul Garg 
> ---
> v1-v2:
>  - Used crypto_wait_req() to wait for async operation completion
>  - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

Applied.


Re: [PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Dave Watson
On 01/31/18 09:34 PM, Vakul Garg wrote:
> Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
> GCM operation. If they are enabled, crypto_aead_encrypt() return error
> code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
> completion till the time the response for crypto offload request is
> received.

Comments from V1
> On Wed, Jan 31, 2018 at 8:10 AM, Gilad Ben-Yossef  wrote:
>> Hi Vakul,
>>
>> On Wed, Jan 31, 2018 at 12:36 PM, Vakul Garg  wrote:
>>> Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
>>> GCM operation. If they are enabled, crypto_aead_encrypt() return error
>>> code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
>>> completion till the time the response for crypto offload request is
>>> received.
>>>
>>
>> Thank you for this patch. I think it is actually a bug fix and should
>> probably go into stable
>
> On second thought in stable we should probably just disable async tfm
> allocations.
> It's simpler. But this approach is still good for -next
>
>
> Gilad

I agree with Gilad, just disable async for now. 

If the flag MSG_DONTWAIT is set, we should be returning -EINPROGRESS
and not wait for a response.  I had started working on a patch for
that, but it's pretty tricky to get right.
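
For illustration, a hedged sketch of that idea on top of the async_wait
plumbing from Vakul's patch elsewhere in this thread; the helper name is
made up and this is not the WIP branch referenced above, just the shape of
the check:

#include <crypto/aead.h>
#include <linux/crypto.h>
#include <linux/socket.h>
#include <net/tls.h>

/* Hypothetical helper: block on the completion only if the caller may wait. */
static int tls_encrypt_maybe_async(struct tls_sw_context *ctx,
				   struct aead_request *aead_req,
				   int msg_flags)
{
	int rc;

	aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				  crypto_req_done, &ctx->async_wait);

	rc = crypto_aead_encrypt(aead_req);
	if (rc == -EINPROGRESS && (msg_flags & MSG_DONTWAIT))
		return -EINPROGRESS;	/* completion handled later by the caller */

	return crypto_wait_req(rc, &ctx->async_wait);
}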


Re: [PATCH] crypto: AF_ALG AIO - lock context IV

2018-01-31 Thread Jonathan Cameron
On Tue, 30 Jan 2018 15:51:40 +
Jonathan Cameron  wrote:

> On Tue, 30 Jan 2018 09:27:04 +0100
> Stephan Müller  wrote:

A few clarifications from me after discussions with Stephan this morning.

> 
> > Hi Harsh,
> > 
> > may I ask you to try the attached patch on your Chelsio driver? Please 
> > invoke 
> > the following command from libkcapi:
> > 
> > kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k 
> > 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i 
> > 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
> > 
> > The result should now be a fully block-chained result.
> > 
> > Note, the patch goes on top of the inline IV patch.
> > 
> > Thanks
> > 
> > ---8<---
> > 
> > In case of AIO where multiple IOCBs are sent to the kernel and inline IV
> > is not used, the ctx->iv is used for each IOCB. The ctx->iv is read for
> > the crypto operation and written back after the crypto operation. Thus,
> > processing of multiple IOCBs must be serialized.
> > 
> > As the AF_ALG interface is used by user space, a mutex provides the
> > serialization which puts the caller to sleep in case a previous IOCB
> > processing is not yet finished.
> > 
> > If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation
> > of processing arriving requests ensures that the blocked IOCBs are
> > unblocked in the right order.
> > 
> > Signed-off-by: Stephan Mueller  
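
A hedged sketch of the serialization described in the changelog above (not
the actual patch; the context structure and lock name are illustrative):

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/mutex.h>
#include <linux/types.h>

struct aio_iv_ctx {
	struct mutex ivlock;	/* serializes IOCBs that share the IV */
	struct crypto_wait wait;
	u8 iv[16];		/* read before and written back after each op */
};

/* The request is assumed to have been set up with ctx->iv beforehand. */
static int do_one_iocb(struct aio_iv_ctx *ctx, struct skcipher_request *req)
{
	int rc;

	/* A later IOCB sleeps here until the previous one has updated ctx->iv;
	 * as the changelog notes, waiters are released in arrival order. */
	mutex_lock(&ctx->ivlock);
	rc = crypto_wait_req(crypto_skcipher_encrypt(req), &ctx->wait);
	mutex_unlock(&ctx->ivlock);

	return rc;
}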

Firstly, as far as I can tell (I have not tested it yet), the patch does the
job and is about the best we can easily do in the AF_ALG code.

I'd suggest that this (or v2 anyway) is stable material as well
(which, as Stephan observed, will require reordering of the two patches).

> 
> As a reference point, holding outside the kernel (in at least
> the case of our hardware with 8K CBC) drops us down to around 30%
> of performance with separate IVs.  Putting a queue in kernel
> (and hence allowing most setup of descriptors / DMA etc) gets us
> up to 50% of raw non chained performance.
> 
> So whilst this will work in principle I suspect anyone who
> cares about the cost will be looking at adding their own
> software queuing and dependency handling in driver.  How
> much that matters will depend on just how quick the hardware
> is vs overheads.
> 
> Anyhow, just posted the driver as an RFC. There are a few corners
> that need resolving and the outcome of this thread effects whether
> we need to play games with IVs or not.
> 
> https://marc.info/?l=linux-crypto-vger&m=151732626428206&w=2
> grep for softqueue.
> 
> We have a number of processing units that will grab requests
> from the HW queues in parallel.  So 'lots' of requests
> from a given HW queue may be in flight at the same time (the problem
> case).
> 
> Logic is fairly simple.
> 1. Only create a softqueue if the encryption mode requires chaining.
>Currently in this driver that is CBC and CTR modes only.
>Otherwise put everything straight in the hardware monitored queues.
> 2. If we have an IV embedded within the request it will have a different
>address to the one used previously (guaranteed if that one is still
>in flight and it doesn't matter if it isn't).  If that occurs we
>can safely add it to the hardware queue. To avoid DoS issues against
>an earlier set of messages using a chained IV we actually make
>sure nothing is in the software queue (not needed for correctness)
> 3. If IV matches previous IV and there are existing elements in SW or HW
>queues we need to hold it in the SW queue.
> 4. On receiving a response from the hardware and if the hardware queue
>is empty, we can release an element from the software queue into the
>hardware queue.
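
A hedged, self-contained sketch of the four steps above (names and data
structures are illustrative, not taken from the posted driver):

#include <stdbool.h>
#include <stddef.h>

struct drv_req { const void *iv; struct drv_req *next; };

struct drv_queue {
	const void *last_iv;		/* IV address of the last HW-queued request */
	struct drv_req *sw_head, *sw_tail;
	int hw_inflight;
};

static void hw_enqueue(struct drv_queue *q, struct drv_req *r)
{
	q->last_iv = r->iv;
	q->hw_inflight++;		/* hand the descriptor to hardware here */
}

static void swq_add(struct drv_queue *q, struct drv_req *r)
{
	r->next = NULL;
	if (q->sw_tail)
		q->sw_tail->next = r;
	else
		q->sw_head = r;
	q->sw_tail = r;
}

static struct drv_req *swq_pop(struct drv_queue *q)
{
	struct drv_req *r = q->sw_head;

	if (r && !(q->sw_head = r->next))
		q->sw_tail = NULL;
	return r;
}

static void submit_request(struct drv_queue *q, struct drv_req *r, bool chained)
{
	if (!chained)					/* 1: e.g. not CBC/CTR */
		hw_enqueue(q, r);
	else if (r->iv != q->last_iv && !q->sw_head)	/* 2: no IV dependency */
		hw_enqueue(q, r);
	else if (!q->hw_inflight && !q->sw_head)	/* nothing to chain after */
		hw_enqueue(q, r);
	else						/* 3: hold until safe */
		swq_add(q, r);
}

static void on_completion(struct drv_queue *q)
{
	struct drv_req *r;

	/* 4: HW queue drained -> release one held request */
	if (--q->hw_inflight == 0 && (r = swq_pop(q)) != NULL)
		hw_enqueue(q, r);
}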

The differences are better shown with pictures, so let's compare the two
approaches. If we break up the data flow, the alternatives are:

1) Mutex causes queuing in AF ALG code

The key thing is that the [Build / Map HW Descs] step for small packets,
up to perhaps 32K, is a significant task we can't avoid.
At 8k it looks like it takes perhaps 20-30% of the time
(though I only have end-to-end performance numbers so far).


[REQUEST 1 from userspace]
           |                     [REQUEST 2 from userspace]
    [AF_ALG/SOCKET]                         |
           |                        [AF_ALG/SOCKET]
NOTHING BLOCKING (lock mut)                 |
           |                         Queued on Mutex
 [BUILD / MAP HW DESCS]                     |
           |                                |
   [Place in HW Queue]                      |
           |                                |
       [Process]                            |
           |                                |
      [Interrupt]                           |
           |                                |
 [Respond and update IV]        Nothing Blocking (lock mut)
           |                    [BUILD / MAP HW DESCS] * AFTER SERIALIZATION *
 Don't care beyond here.                    |
  

Re: [RFC crypto v3 8/9] chtls: Register the ULP

2018-01-31 Thread Atul Gupta



On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:

On 01/30/18 06:51 AM, Atul Gupta wrote:


What I was referring is that passing "tls" ulp type in setsockopt
may be insufficient to make the decision when multi HW assist Inline
TLS solution exists.

Setting the ULP doesn't choose HW or SW implementation, I think that
should be done later when setting up crypto with

setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
setsockopt [mentioned above] is quite late for the driver to enable a HW 
implementation; we require something as early as tls_init 
[setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"))] for the driver 
to set the HW prot and offload the connection besides Inline Tx/Rx.


Any reason we can't use ethtool to choose HW vs SW implementation, if
available on the device?
I thought about it, but the interface index is not available to fetch the 
netdev and check its caps in order to set the HW prot, e.g. bind [prot.hash] 
--> tls_hash to program the HW.



Some HW may go beyond defining sendmsg/sendpage of the prot and
require additional info to set up the env? Also, we need to keep
vendor-specific code out of tls_main.c, i.e. anything other than the
base/sw_tx prot should perhaps go to the hw driver.

Sure, but I think we can add hooks to tls_main to do this without a
new ULP.
The current code calls update_sk_prot for TLS_BASE_TX and TLS_SW_TX; future 
Inline TLS assist HWs will add TLS_HW_TX, TLS_OFLD, etc. Won't additional 
hooks to update sk prots make the code confusing?




[PATCHv2] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
GCM operation. If they are enabled, crypto_aead_encrypt() returns error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is
received.

Signed-off-by: Vakul Garg 
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
-- 
2.13.6



RE: [PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Forgot to add 'v2' in subject line.
I will be resending.

-Original Message-
From: Vakul Garg 
Sent: Wednesday, January 31, 2018 9:29 PM
To: linux-crypto@vger.kernel.org
Cc: il...@mellanox.com; avia...@mellanox.com; davejwat...@fb.com; 
da...@davemloft.net; net...@vger.kernel.org; Vakul Garg 
Subject: [PATCH] tls: Add support for encryption using async offload accelerator

Async crypto accelerators (e.g. drivers/crypto/caam) support offloading GCM 
operation. If they are enabled, crypto_aead_encrypt() returns error code 
-EINPROGRESS. In this case tls_do_encryption() needs to wait on a completion 
until the response for the crypto offload request is received.

Signed-off-by: Vakul Garg 
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h index 
936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 
73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
--
2.13.6



[PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
GCM operation. If they are enabled, crypto_aead_encrypt() returns error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is
received.

Signed-off-by: Vakul Garg 
---
v1-v2:
 - Used crypto_wait_req() to wait for async operation completion
 - Passed CRYPTO_TFM_REQ_MAY_BACKLOG to crypto_aead_encrypt

 include/net/tls.h | 2 ++
 net/tls/tls_sw.c  | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5cab7d..8a8e1256caee 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@
 
 struct tls_sw_context {
struct crypto_aead *aead_send;
+   struct crypto_wait async_wait;
 
/* Sending context */
char aad_space[TLS_AAD_SPACE_SIZE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..31a9f0ef8af6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -214,7 +214,11 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
-   rc = crypto_aead_encrypt(aead_req);
+
+   aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+
+   rc = crypto_wait_req(crypto_aead_encrypt(aead_req), &ctx->async_wait);
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
@@ -663,6 +667,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx)
goto out;
}
 
+   crypto_init_wait(&sw_ctx->async_wait);
+
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx;
 
crypto_info = &ctx->crypto_send;
-- 
2.13.6



Re: Subject: [PATCH] crypto: add zbewalgo compression algorithm for zram

2018-01-31 Thread Benjamin Warnke
Hi,

I am working on a new Version for this patch addressing all comments, and 
following all guidelines.

Best Regards,
Benjamin Warnke


[PATCH] tls: Add support for encryption using async offload accelerator

2018-01-31 Thread Vakul Garg
Async crypto accelerators (e.g. drivers/crypto/caam) support offloading
GCM operation. If they are enabled, crypto_aead_encrypt() returns error
code -EINPROGRESS. In this case tls_do_encryption() needs to wait on a
completion until the response for the crypto offload request is
received.

Signed-off-by: Vakul Garg 
---
 net/tls/tls_sw.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 73d19210dd49..390e6dc7b135 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -39,6 +39,11 @@
 
 #include 
 
+struct crypto_completion {
+   struct completion comp;
+   int result;
+};
+
 static void trim_sg(struct sock *sk, struct scatterlist *sg,
int *sg_num_elem, unsigned int *sg_size, int target_size)
 {
@@ -194,6 +199,14 @@ static void tls_free_both_sg(struct sock *sk)
&ctx->sg_plaintext_size);
 }
 
+static void crypto_complete(struct crypto_async_request *req, int err)
+{
+   struct crypto_completion *x = req->data;
+
+   x->result = err;
+   complete(&x->comp);
+}
+
 static int tls_do_encryption(struct tls_context *tls_ctx,
 struct tls_sw_context *ctx, size_t data_len,
 gfp_t flags)
@@ -202,6 +215,7 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
crypto_aead_reqsize(ctx->aead_send);
struct aead_request *aead_req;
int rc;
+   struct crypto_completion crypto_done;
 
aead_req = kzalloc(req_size, flags);
if (!aead_req)
@@ -214,7 +228,15 @@ static int tls_do_encryption(struct tls_context *tls_ctx,
aead_request_set_ad(aead_req, TLS_AAD_SPACE_SIZE);
aead_request_set_crypt(aead_req, ctx->sg_aead_in, ctx->sg_aead_out,
   data_len, tls_ctx->iv);
+   aead_request_set_callback(aead_req, 0, crypto_complete, &crypto_done);
+
+   init_completion(&crypto_done.comp);
+
rc = crypto_aead_encrypt(aead_req);
+   if (rc == -EINPROGRESS) {
+   wait_for_completion(&crypto_done.comp);
+   rc = crypto_done.result;
+   }
 
ctx->sg_encrypted_data[0].offset -= tls_ctx->prepend_size;
ctx->sg_encrypted_data[0].length += tls_ctx->prepend_size;
-- 
2.13.6



Re: [PATCH 1/2] crypto: Implement a generic crypto statistics

2018-01-31 Thread Steffen Klassert
On Fri, Jan 26, 2018 at 11:43:28PM +0800, Herbert Xu wrote:
> On Thu, Jan 18, 2018 at 09:58:13AM +0100, LABBE Corentin wrote:
> >
> > I have two way of adding a new netlink request
> > - keep the current patch and simply add a new CRYPTO_MSG_GETSTAT which use 
> > the same function than CRYPTO_MSG_GETALG
> > => minimal changes, in fact CRYPTO_MSG_GETSTAT and CRYPTO_MSG_GETALG 
> > would be the same, but it is easy for userspace to test presence of stat.
> > - Create a new CRYPTO_MSG_GETSTAT which imply lot of code and add a new 
> > crypto_user_stat.c
> > => this imply also to change makefile (rename crypto_user.c to 
> > crypto_user_base.c) since crypto_user.ko is made of two files.
> > 
> > Which one do you prefer ?
> 
> I think you should check with the crconf author, Steffen Klassert
> since that will be affected by this change.

I still haven't had time to look into the details, but I'm
fine with everything that does not break existing userspace.
Everything else would be a 'no go'.


Re: [PATCH v2 0/4] crypto: aesni - Use zero-copy for gcm(aes) buffers that are partially contiguous

2018-01-31 Thread Steffen Klassert
Hi Junaid.

On Tue, Jan 23, 2018 at 12:19:12PM -0800, Junaid Shahid wrote:
> Changes in v2:
> - Integrated https://patchwork.kernel.org/patch/10173981
> 
> Currently, the AESNI gcm(aes) implementation uses zero-copy only when the
> entire src and dest request buffers, including the AAD, the data and the
> Auth Tag are contiguous. This series enables the use of zero-copy even if the
> AAD and/or Auth Tag are in different buffers than the actual data, as long as
> each of them individually satisfies the zero-copy conditions (i.e. the entire
> buffer is either in low-mem or within a single high-mem page). Furthermore,
> it also enables the use of zero-copy even if only one of src and dest 
> satisfies
> these conditions rather than only when both of them do.

I wonder which special usecase you have in mind that will be improved
by your patches.

For IPsec, the crypto layer either gets a contiguous buffer because
networking already linearized the packet data, or the packet data
and IPsec trailer are scattered over multiple page fragments.
So in the first case, it is OK already without your patches. In
the second case, your zero-copy conditions are not satisfied.

This looks a bit like a workaround for the real problem, which is
that the gcm-aesni implementation does not support SG operations.

Maybe it would be better to investigate in the direction to
support SG operations with gcm-aesni instead of trying to
find all corner cases where the existing implementation can
do zero-copy.