[PATCH 2/2] xfrm: add rfc4494 AES-CMAC-96 support

2013-04-08 Thread Jussi Kivilinna
Now that CryptoAPI has support for CMAC, we can add support for AES-CMAC-96
(rfc4494).

Cc: Tom St Denis tstde...@elliptictech.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 net/xfrm/xfrm_algo.c |   13 +
 1 file changed, 13 insertions(+)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 6fb9d00..ab4ef72 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -311,6 +311,19 @@ static struct xfrm_algo_desc aalg_list[] = {
.sadb_alg_maxbits = 128
}
 },
+{
+   /* rfc4494 */
+   .name = "cmac(aes)",
+
+   .uinfo = {
+   .auth = {
+   .icv_truncbits = 96,
+   .icv_fullbits = 128,
+   }
+   },
+
+   .pfkey_supported = 0,
+},
 };
 
 static struct xfrm_algo_desc ealg_list[] = {
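For completeness, here is a minimal sketch of exercising the new "cmac(aes)"
transform through the shash interface, with the caller truncating to the
96-bit ICV as rfc4494 specifies. This is illustrative only and not part of the
patch; error handling is trimmed and the function name is invented:

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>

static int cmac_aes_96_example(const u8 *key, unsigned int keylen,
                               const u8 *data, unsigned int len, u8 icv[12])
{
        struct crypto_shash *tfm;
        struct shash_desc *desc;
        u8 mac[16];             /* full 128-bit CMAC before truncation */
        int err;

        tfm = crypto_alloc_shash("cmac(aes)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_shash_setkey(tfm, key, keylen);
        if (err)
                goto out_free_tfm;

        desc = kzalloc(sizeof(*desc) + crypto_shash_descsize(tfm), GFP_KERNEL);
        if (!desc) {
                err = -ENOMEM;
                goto out_free_tfm;
        }
        desc->tfm = tfm;

        err = crypto_shash_digest(desc, data, len, mac);
        if (!err)
                memcpy(icv, mac, 12);   /* AES-CMAC-96 keeps the leftmost 96 bits */

        kfree(desc);
out_free_tfm:
        crypto_free_shash(tfm);
        return err;
}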



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Steffen Klassert
On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
 Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
 
 This work is based on Tom St Denis' earlier patch,
 http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
 
 Cc: Tom St Denis tstde...@elliptictech.com
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi

This patch does not apply clean to the ipsec-next tree
because of some crypto changes I don't have in ipsec-next.
The IPsec part should apply to the cryptodev tree,
so it's probably best if we route this patchset
through the cryptodev tree.

Herbert,

are you going to take these patches?


Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Jussi Kivilinna
On 08.04.2013 11:24, Steffen Klassert wrote:
 On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
 Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.

 This work is based on Tom St Denis' earlier patch,
  http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2

 Cc: Tom St Denis tstde...@elliptictech.com
 Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 
 This patch does not apply clean to the ipsec-next tree
 because of some crypto changes I don't have in ipsec-next.
 The IPsec part should apply to the cryptodev tree,
 so it's probably best if we route this patchset
 through the cryptodev tree.

I should have mentioned that the patchset is on top of the cryptodev tree and
the previous crypto patches that I sent yesterday, which are likely to cause
problems at least in tcrypt.c:

http://marc.info/?l=linux-crypto-vger&m=136534223503368&w=2

-Jussi

 
 Herbert,
 
 are you going to take these patches?
 



Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread Herbert Xu
On Mon, Apr 08, 2013 at 10:24:16AM +0200, Steffen Klassert wrote:
 On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
  Patch adds support for NIST recommended block cipher mode CMAC to CryptoAPI.
  
  This work is based on Tom St Denis' earlier patch,
   http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
  
  Cc: Tom St Denis tstde...@elliptictech.com
  Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 
 This patch does not apply clean to the ipsec-next tree
 because of some crypto changes I don't have in ipsec-next.
 The IPsec part should apply to the cryptodev tree,
 so it's probably best if we route this patchset
 through the cryptodev tree.
 
 Herbert,
 
 are you going to take these patches?

Sure I can do that.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


RE: questions of crypto async api

2013-04-08 Thread Hsieh, Che-Min
Thanks for the answer.

I have further question on the same subject.
With regard to the commit in talitos.c (attached at the end of this mail),
the driver submits requests of the same tfm to the same channel to ensure
ordering.

Is it because the tfm context needs to be maintained from one operation to the
next? E.g., aead_givencrypt() generates the IV based on the previous IV result
stored in the tfm.

If requests are sent to different channels dynamically, and the driver reorders
the completion callbacks as requests complete in the HW, what would happen?

Thanks in advance.

Chemin



commit 5228f0f79e983c2b39c202c75af901ceb0003fc1
Author: Kim Phillips kim.phill...@freescale.com
Date:   Fri Jul 15 11:21:38 2011 +0800

crypto: talitos - ensure request ordering within a single tfm

Assign single target channel per tfm in talitos_cra_init instead of
performing channel scheduling dynamically during the encryption request.
This changes the talitos_submit interface to accept a new channel
number argument.  Without this, rapid bursts of misc. sized requests
could make it possible for IPsec packets to be encrypted out-of-order,
which would result in packet drops due to sequence numbers falling
outside the anti-replay window on a peer gateway.

Signed-off-by: Kim Phillips kim.phill...@freescale.com
Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
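For reference, the scheme that commit describes amounts to picking a channel
once at tfm init time and then submitting every request of that tfm to it, so
per-tfm ordering follows from per-channel ordering. A rough sketch of the idea
only (not the talitos code; the context struct, channel count and function
names below are made up):

#include <linux/atomic.h>
#include <linux/crypto.h>

#define NR_CHANNELS 4                   /* hypothetical number of HW channels */

struct my_tfm_ctx {
        unsigned int chan;              /* channel this tfm is pinned to */
};

static atomic_t last_chan = ATOMIC_INIT(-1);

static int my_cra_init(struct crypto_tfm *tfm)
{
        struct my_tfm_ctx *ctx = crypto_tfm_ctx(tfm);

        /* Round-robin across channels, but fixed for the tfm's lifetime.
         * Every request on this tfm is later submitted to ctx->chan, so
         * its completions cannot be reordered relative to each other. */
        ctx->chan = (unsigned int)atomic_inc_return(&last_chan) % NR_CHANNELS;
        return 0;
}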

-Original Message-
From: Kim Phillips [mailto:kim.phill...@freescale.com] 
Sent: Friday, April 05, 2013 6:33 PM
To: Hsieh, Che-Min
Cc: linux-crypto@vger.kernel.org
Subject: Re: questions of crypto async api

On Thu, 4 Apr 2013 14:38:41 +
Hsieh, Che-Min chem...@qti.qualcomm.com wrote:

 If a driver supports multiple instances of HW crypto engines, the order of 
 the request completion from HW can be different from the order of requests 
 submitted to different HW.  The 2nd request sent out to the 2nd HW instance 
 may take shorter time to complete than the first request for different HW 
 instance.  Is the driver responsible for re-ordering the completion callout? 
 Or the agents (such as IP protocol stack) are responsible for reordering? How 
 does pcrypt do it?
 
  Does it make sense for a transform to send multiple requests outstanding to 
 async crypto api?

see:

http://comments.gmane.org/gmane.linux.kernel.cryptoapi/5350

  Is scatterwalk_sg_next() preferred method over sg_next()?  Why?

scatterwalk_* is the crypto subsystem's version of the function, so yes.
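As an illustration, walking a (possibly chained) scatterlist with the
crypto-layer helper looks like this; a generic sketch, not taken from any
driver:

#include <crypto/scatterwalk.h>
#include <linux/scatterlist.h>

/* Sketch: total up the bytes in a scatterlist, following chained
 * entries via the crypto layer's scatterwalk_sg_next() helper. */
static unsigned int sg_total_bytes(struct scatterlist *sg)
{
        unsigned int bytes = 0;

        for (; sg; sg = scatterwalk_sg_next(sg))
                bytes += sg->length;

        return bytes;
}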

  sg_copy_to_buffer() and sg_copy_from_buffer() -> sg_copy_buffer() ->
 sg_miter_next() -> sg_next(). Sometimes sg_copy_to_buffer() and
 sg_copy_from_buffer() in our driver do not copy the whole list. We have to
 rewrite those functions by using scatterwalk_sg_next() to walk down the list.
 Is this the correct behavior?

sounds like you're on the right track, although buffers shouldn't be being 
copied that often, if at all.

Kim



[PATCH 1/5] crypto: x86 - add more optimized XTS-mode for serpent-avx

2013-04-08 Thread Jussi Kivilinna
This patch adds AVX optimized XTS-mode helper functions/macros and converts
serpent-avx to use the new facilities. Benefits are slightly improved speed
and reduced stack usage, as the use of a temporary IV array is avoided.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.00x   1.00x
64B     1.00x   1.00x
256B    1.04x   1.06x
1024B   1.09x   1.09x
8192B   1.10x   1.09x
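The heart of the new helpers is the XTS tweak update: multiplication by x in
GF(2^128) on a little-endian 128-bit value, done in assembler by the
gf128mul_x_ble macro added below. A plain-C sketch of the same operation, for
reference only (not the kernel's actual helper):

/* Double the 128-bit XTS tweak: shift left by one bit in little-endian
 * byte order and, on carry out of the top bit, fold in the reduction
 * constant 0x87 for x^128 + x^7 + x^2 + x + 1. Reference sketch only. */
static void xts_tweak_double(unsigned char t[16])
{
        unsigned int i, carry = 0;

        for (i = 0; i < 16; i++) {
                unsigned int msb = t[i] >> 7;

                t[i] = (unsigned char)((t[i] << 1) | carry);
                carry = msb;
        }

        if (carry)
                t[0] ^= 0x87;
}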

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/glue_helper-asm-avx.S   |   61 +
 arch/x86/crypto/glue_helper.c   |   97 +++
 arch/x86/crypto/serpent-avx-x86_64-asm_64.S |   45 -
 arch/x86/crypto/serpent_avx_glue.c  |   87 +---
 arch/x86/include/asm/crypto/glue_helper.h   |   24 +++
 arch/x86/include/asm/crypto/serpent-avx.h   |5 +
 6 files changed, 273 insertions(+), 46 deletions(-)

diff --git a/arch/x86/crypto/glue_helper-asm-avx.S 
b/arch/x86/crypto/glue_helper-asm-avx.S
index f7b6ea2..02ee230 100644
--- a/arch/x86/crypto/glue_helper-asm-avx.S
+++ b/arch/x86/crypto/glue_helper-asm-avx.S
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers, AVX assembler macros
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -89,3 +89,62 @@
vpxor (6*16)(src), x6, x6; \
vpxor (7*16)(src), x7, x7; \
store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+#define load_xts_8way(iv, src, dst, x0, x1, x2, x3, x4, x5, x6, x7, tiv, t0, \
+ t1, xts_gf128mul_and_shl1_mask) \
+   vmovdqa xts_gf128mul_and_shl1_mask, t0; \
+   \
+   /* load IV */ \
+   vmovdqu (iv), tiv; \
+   vpxor (0*16)(src), tiv, x0; \
+   vmovdqu tiv, (0*16)(dst); \
+   \
+   /* construct and store IVs, also xor with source */ \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (1*16)(src), tiv, x1; \
+   vmovdqu tiv, (1*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (2*16)(src), tiv, x2; \
+   vmovdqu tiv, (2*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (3*16)(src), tiv, x3; \
+   vmovdqu tiv, (3*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (4*16)(src), tiv, x4; \
+   vmovdqu tiv, (4*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (5*16)(src), tiv, x5; \
+   vmovdqu tiv, (5*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (6*16)(src), tiv, x6; \
+   vmovdqu tiv, (6*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vpxor (7*16)(src), tiv, x7; \
+   vmovdqu tiv, (7*16)(dst); \
+   \
+   gf128mul_x_ble(tiv, t0, t1); \
+   vmovdqu tiv, (iv);
+
+#define store_xts_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7) \
+   vpxor (0*16)(dst), x0, x0; \
+   vpxor (1*16)(dst), x1, x1; \
+   vpxor (2*16)(dst), x2, x2; \
+   vpxor (3*16)(dst), x3, x3; \
+   vpxor (4*16)(dst), x4, x4; \
+   vpxor (5*16)(dst), x5, x5; \
+   vpxor (6*16)(dst), x6, x6; \
+   vpxor (7*16)(dst), x7, x7; \
+   store_8way(dst, x0, x1, x2, x3, x4, x5, x6, x7);
diff --git a/arch/x86/crypto/glue_helper.c b/arch/x86/crypto/glue_helper.c
index 22ce4f6..432f1d76 100644
--- a/arch/x86/crypto/glue_helper.c
+++ b/arch/x86/crypto/glue_helper.c
@@ -1,7 +1,7 @@
 /*
  * Shared glue code for 128bit block ciphers
  *
- * Copyright (c) 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * CBC & ECB parts based on code (crypto/cbc.c,ecb.c) by:
  *   Copyright (c) 2006 Herbert Xu herb...@gondor.apana.org.au
@@ -304,4 +304,99 @@ int glue_ctr_crypt_128bit(const struct common_glue_ctx 
*gctx,
 }
 EXPORT_SYMBOL_GPL(glue_ctr_crypt_128bit);
 
+static unsigned int __glue_xts_crypt_128bit(const struct common_glue_ctx *gctx,
+   void *ctx,
+   struct blkcipher_desc *desc,
+   struct blkcipher_walk *walk)
+{
+   const unsigned int bsize = 128 / 8;
+   unsigned int nbytes = walk->nbytes;
+   u128 *src = (u128 *)walk->src.virt.addr;
+   u128 *dst = (u128 *)walk->dst.virt.addr;
+   unsigned int num_blocks, func_bytes;
+   unsigned int i;
+
+   /* Process multi-block batch */
+   for (i = 0; i < gctx->num_funcs; i++) {
+   num_blocks = gctx->funcs[i].num_blocks;
+   func_bytes = bsize * num_blocks;
+
+   if (nbytes >= func_bytes) {
+  

[PATCH 3/5] crypto: cast6-avx: use new optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change cast6-avx to use the new XTS code, for smaller stack usage and a small
boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.01x   1.01x
64B     1.01x   1.00x
256B    1.09x   1.02x
1024B   1.08x   1.06x
8192B   1.08x   1.07x

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/cast6-avx-x86_64-asm_64.S |   48 +++
 arch/x86/crypto/cast6_avx_glue.c  |   91 -
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S 
b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
index f93b610..e3531f8 100644
--- a/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/cast6-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -227,6 +227,8 @@
 .data
 
 .align 16
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 .Lbswap_mask:
.byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
 .Lbswap128_mask:
@@ -424,3 +426,47 @@ ENTRY(cast6_ctr_8way)
 
ret;
 ENDPROC(cast6_ctr_8way)
+
+ENTRY(cast6_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_enc_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_enc_8way)
+
+ENTRY(cast6_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX, RKR, RKM, .Lxts_gf128mul_and_shl1_mask);
+
+   call __cast6_dec_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(cast6_xts_dec_8way)
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 92f7ca2..8d0dfb8 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
+ * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -50,6 +52,23 @@ asmlinkage void cast6_cbc_dec_8way(struct cast6_ctx *ctx, u8 
*dst,
 asmlinkage void cast6_ctr_8way(struct cast6_ctx *ctx, u8 *dst, const u8 *src,
   le128 *iv);
 
+asmlinkage void cast6_xts_enc_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+asmlinkage void cast6_xts_dec_8way(struct cast6_ctx *ctx, u8 *dst,
+  const u8 *src, le128 *iv);
+
+static void cast6_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_encrypt));
+}
+
+static void cast6_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(__cast6_decrypt));
+}
+
 static void cast6_crypt_ctr(void *ctx, u128 *dst, const u128 *src, le128 *iv)
 {
be128 ctrblk;
@@ -87,6 +106,19 @@ static const struct common_glue_ctx cast6_ctr = {
} }
 };
 
+static const struct common_glue_ctx cast6_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = CAST6_PARALLEL_BLOCKS,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc_8way) }
+   }, {
+   .num_blocks = 1,
+   .fn_u = { .xts = GLUE_XTS_FUNC_CAST(cast6_xts_enc) }
+   } }
+};
+
 static const struct common_glue_ctx cast6_dec = {
.num_funcs = 2,
.fpu_blocks_limit = CAST6_PARALLEL_BLOCKS,
@@ -113,6 +145,19 @@ static const struct common_glue_ctx cast6_dec_cbc = {
} }
 };
 
+static 

[PATCH 2/5] crypto: x86/twofish-avx - use optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Change twofish-avx to use the new XTS code, for smaller stack usage and a
small boost to performance.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.03x   1.02x
64B     0.91x   0.91x
256B    1.10x   1.09x
1024B   1.12x   1.11x
8192B   1.12x   1.11x

Since XTS is practically always used with data blocks of 512 bytes or more, I
chose not to use twofish-3way for blocks smaller than 128 bytes. This causes a
slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/twofish-avx-x86_64-asm_64.S |   48 ++
 arch/x86/crypto/twofish_avx_glue.c  |   91 +++
 2 files changed, 98 insertions(+), 41 deletions(-)

diff --git a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S 
b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
index 8d3e113..0505813 100644
--- a/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
+++ b/arch/x86/crypto/twofish-avx-x86_64-asm_64.S
@@ -4,7 +4,7 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -33,6 +33,8 @@
 
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
 
 .text
 
@@ -408,3 +410,47 @@ ENTRY(twofish_ctr_8way)
 
ret;
 ENDPROC(twofish_ctr_8way)
+
+ENTRY(twofish_xts_enc_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_enc_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2);
+
+   ret;
+ENDPROC(twofish_xts_enc_8way)
+
+ENTRY(twofish_xts_dec_8way)
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst
+*  %rdx: src
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*/
+
+   movq %rsi, %r11;
+
+   /* regs = src, dst = IVs, regs = regs xor IVs */
+   load_xts_8way(%rcx, %rdx, %rsi, RC1, RD1, RA1, RB1, RC2, RD2, RA2, RB2,
+ RX0, RX1, RY0, .Lxts_gf128mul_and_shl1_mask);
+
+   call __twofish_dec_blk8;
+
+   /* dst = regs xor IVs(in dst) */
+   store_xts_8way(%r11, RA1, RB1, RC1, RD1, RA2, RB2, RC2, RD2);
+
+   ret;
+ENDPROC(twofish_xts_dec_8way)
diff --git a/arch/x86/crypto/twofish_avx_glue.c 
b/arch/x86/crypto/twofish_avx_glue.c
index 94ac91d..a62ba54 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -4,6 +4,8 @@
  * Copyright (C) 2012 Johannes Goetzfried
  * johannes.goetzfr...@informatik.stud.uni-erlangen.de
  *
+ * Copyright © 2013 Jussi Kivilinna jussi.kivili...@iki.fi
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -56,12 +58,29 @@ asmlinkage void twofish_cbc_dec_8way(struct twofish_ctx 
*ctx, u8 *dst,
 asmlinkage void twofish_ctr_8way(struct twofish_ctx *ctx, u8 *dst,
 const u8 *src, le128 *iv);
 
+asmlinkage void twofish_xts_enc_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+asmlinkage void twofish_xts_dec_8way(struct twofish_ctx *ctx, u8 *dst,
+const u8 *src, le128 *iv);
+
 static inline void twofish_enc_blk_3way(struct twofish_ctx *ctx, u8 *dst,
const u8 *src)
 {
__twofish_enc_blk_3way(ctx, dst, src, false);
 }
 
+static void twofish_xts_enc(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_enc_blk));
+}
+
+static void twofish_xts_dec(void *ctx, u128 *dst, const u128 *src, le128 *iv)
+{
+   glue_xts_crypt_128bit_one(ctx, dst, src, iv,
+ GLUE_FUNC_CAST(twofish_dec_blk));
+}
+
 
 static const struct common_glue_ctx twofish_enc = {
.num_funcs = 3,
@@ -95,6 +114,19 @@ static const struct common_glue_ctx twofish_ctr = {
} }
 };
 
+static const struct common_glue_ctx twofish_enc_xts = {
+   .num_funcs = 2,
+   .fpu_blocks_limit = TWOFISH_PARALLEL_BLOCKS,
+
+   .funcs = { {
+   .num_blocks = TWOFISH_PARALLEL_BLOCKS,
+   

[PATCH 4/5] crypto: x86/camellia-aesni-avx - add more optimized XTS code

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for camellia-aesni-avx, for smaller stack usage
and a small boost in speed.

tcrypt results, with Intel i5-2450M:
        enc     dec
16B     1.10x   1.01x
64B     0.82x   0.77x
256B    1.14x   1.10x
1024B   1.17x   1.16x
8192B   1.10x   1.11x

Since XTS is practically always used with data blocks of 512 bytes or more, I
chose not to use camellia-2way for blocks smaller than 256 bytes. This causes
a slower result in tcrypt for 64 bytes.

Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/camellia-aesni-avx-asm_64.S |  180 +++
 arch/x86/crypto/camellia_aesni_avx_glue.c   |   91 --
 2 files changed, 229 insertions(+), 42 deletions(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index cfc1634..ce71f92 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -1,7 +1,7 @@
 /*
  * x86_64/AVX/AES-NI assembler implementation of Camellia
  *
- * Copyright © 2012 Jussi Kivilinna jussi.kivili...@mbnet.fi
+ * Copyright © 2012-2013 Jussi Kivilinna jussi.kivili...@iki.fi
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -589,6 +589,10 @@ 
ENDPROC(roundsm16_x4_x5_x6_x7_x0_x1_x2_x3_y4_y5_y6_y7_y0_y1_y2_y3_ab)
 .Lbswap128_mask:
.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
 
+/* For XTS mode IV generation */
+.Lxts_gf128mul_and_shl1_mask:
+   .byte 0x87, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
+
 /*
  * pre-SubByte transform
  *
@@ -1090,3 +1094,177 @@ ENTRY(camellia_ctr_16way)
 
ret;
 ENDPROC(camellia_ctr_16way)
+
+#define gf128mul_x_ble(iv, mask, tmp) \
+   vpsrad $31, iv, tmp; \
+   vpaddq iv, iv, iv; \
+   vpshufd $0x13, tmp, tmp; \
+   vpand mask, tmp, tmp; \
+   vpxor tmp, iv, iv;
+
+.align 8
+camellia_xts_crypt_16way:
+   /* input:
+*  %rdi: ctx, CTX
+*  %rsi: dst (16 blocks)
+*  %rdx: src (16 blocks)
+*  %rcx: iv (t ⊕ αⁿ ∈ GF(2¹²⁸))
+*  %r8: index for input whitening key
+*  %r9: pointer to  __camellia_enc_blk16 or __camellia_dec_blk16
+*/
+
+   subq $(16 * 16), %rsp;
+   movq %rsp, %rax;
+
+   vmovdqa .Lxts_gf128mul_and_shl1_mask, %xmm14;
+
+   /* load IV */
+   vmovdqu (%rcx), %xmm0;
+   vpxor 0 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 15 * 16(%rax);
+   vmovdqu %xmm0, 0 * 16(%rsi);
+
+   /* construct IVs */
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 1 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 14 * 16(%rax);
+   vmovdqu %xmm0, 1 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 2 * 16(%rdx), %xmm0, %xmm13;
+   vmovdqu %xmm0, 2 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 3 * 16(%rdx), %xmm0, %xmm12;
+   vmovdqu %xmm0, 3 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 4 * 16(%rdx), %xmm0, %xmm11;
+   vmovdqu %xmm0, 4 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 5 * 16(%rdx), %xmm0, %xmm10;
+   vmovdqu %xmm0, 5 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 6 * 16(%rdx), %xmm0, %xmm9;
+   vmovdqu %xmm0, 6 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 7 * 16(%rdx), %xmm0, %xmm8;
+   vmovdqu %xmm0, 7 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 8 * 16(%rdx), %xmm0, %xmm7;
+   vmovdqu %xmm0, 8 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 9 * 16(%rdx), %xmm0, %xmm6;
+   vmovdqu %xmm0, 9 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 10 * 16(%rdx), %xmm0, %xmm5;
+   vmovdqu %xmm0, 10 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 11 * 16(%rdx), %xmm0, %xmm4;
+   vmovdqu %xmm0, 11 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 12 * 16(%rdx), %xmm0, %xmm3;
+   vmovdqu %xmm0, 12 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 13 * 16(%rdx), %xmm0, %xmm2;
+   vmovdqu %xmm0, 13 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 14 * 16(%rdx), %xmm0, %xmm1;
+   vmovdqu %xmm0, 14 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vpxor 15 * 16(%rdx), %xmm0, %xmm15;
+   vmovdqu %xmm15, 0 * 16(%rax);
+   vmovdqu %xmm0, 15 * 16(%rsi);
+
+   gf128mul_x_ble(%xmm0, %xmm14, %xmm15);
+   vmovdqu %xmm0, (%rcx);
+
+   /* inpack16_pre: */
+   vmovq (key_table)(CTX, %r8, 8), %xmm15;
+   vpshufb .Lpack_bswap, %xmm15, %xmm15;
+   vpxor 0 * 16(%rax), %xmm15, %xmm0;
+   vpxor %xmm1, %xmm15, %xmm1;
+   vpxor %xmm2, %xmm15, %xmm2;
+   vpxor %xmm3, 

[PATCH 5/5] crypto: aesni_intel - add more optimized XTS mode for x86-64

2013-04-08 Thread Jussi Kivilinna
Add more optimized XTS code for aesni_intel in 64-bit mode, for smaller stack
usage and a boost in speed.

tcrypt results, with Intel i5-2450M:
256-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.64x   0.63x
256B    1.29x   1.32x
1024B   1.54x   1.58x
8192B   1.57x   1.60x

512-bit key
        enc     dec
16B     0.98x   0.99x
64B     0.60x   0.59x
256B    1.24x   1.25x
1024B   1.39x   1.42x
8192B   1.38x   1.42x

I chose not to optimize for blocks smaller than 256 bytes, since XTS is
practically always used with data blocks of 512 bytes or more. This is why
performance is reduced in tcrypt for 64-byte blocks.
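For orientation, each block handled by aesni_xts_crypt8 goes through the usual
XTS xor-encrypt-xor pattern, with the tweak advanced once per block. A
single-block C sketch of that pattern; aes_encrypt_block() and
xts_tweak_double() are placeholder helpers, not functions from this patch:

static void xts_encrypt_block(const void *aes_key, unsigned char tweak[16],
                              unsigned char out[16], const unsigned char in[16])
{
        unsigned char buf[16];
        int i;

        for (i = 0; i < 16; i++)                /* P xor T */
                buf[i] = in[i] ^ tweak[i];

        aes_encrypt_block(aes_key, buf, buf);   /* placeholder ECB primitive */

        for (i = 0; i < 16; i++)                /* C = E_K1(P xor T) xor T */
                out[i] = buf[i] ^ tweak[i];

        xts_tweak_double(tweak);                /* placeholder: advance T to T*x */
}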

Cc: Huang Ying ying.hu...@intel.com
Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
---
 arch/x86/crypto/aesni-intel_asm.S  |  117 
 arch/x86/crypto/aesni-intel_glue.c |   80 +
 crypto/Kconfig |1 
 3 files changed, 198 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 04b7977..62fe22c 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -34,6 +34,10 @@
 
 #ifdef __x86_64__
 .data
+.align 16
+.Lgf128mul_x_ble_mask:
+   .octa 0x00000000000000010000000000000087
+
 POLY:   .octa 0xC2000000000000000000000000000001
 TWOONE: .octa 0x00000001000000000000000000000001
 
@@ -105,6 +109,8 @@ enc:.octa 0x2
 #define CTR    %xmm11
 #define INC    %xmm12
 
+#define GF128MUL_MASK %xmm10
+
 #ifdef __x86_64__
 #define AREG   %rax
 #define KEYP   %rdi
@@ -2636,4 +2642,115 @@ ENTRY(aesni_ctr_enc)
 .Lctr_enc_just_ret:
ret
 ENDPROC(aesni_ctr_enc)
+
+/*
+ * _aesni_gf128mul_x_ble:  internal ABI
+ * Multiply in GF(2^128) for XTS IVs
+ * input:
+ * IV: current IV
+ * GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ * IV: next IV
+ * changed:
+ * CTR:== temporary value
+ */
+#define _aesni_gf128mul_x_ble() \
+   pshufd $0x13, IV, CTR; \
+   paddq IV, IV; \
+   psrad $31, CTR; \
+   pand GF128MUL_MASK, CTR; \
+   pxor CTR, IV;
+
+/*
+ * void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, const u8 *dst, u8 *src,
+ *  bool enc, u8 *iv)
+ */
+ENTRY(aesni_xts_crypt8)
+   cmpb $0, %cl
+   movl $0, %ecx
+   movl $240, %r10d
+   leaq _aesni_enc4, %r11
+   leaq _aesni_dec4, %rax
+   cmovel %r10d, %ecx
+   cmoveq %rax, %r11
+
+   movdqa .Lgf128mul_x_ble_mask, GF128MUL_MASK
+   movups (IVP), IV
+
+   mov 480(KEYP), KLEN
+   addq %rcx, KEYP
+
+   movdqa IV, STATE1
+   pxor 0x00(INP), STATE1
+   movdqu IV, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x10(INP), STATE2
+   movdqu IV, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x20(INP), STATE3
+   movdqu IV, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x30(INP), STATE4
+   movdqu IV, 0x30(OUTP)
+
+   call *%r11
+
+   pxor 0x00(OUTP), STATE1
+   movdqu STATE1, 0x00(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE1
+   pxor 0x40(INP), STATE1
+   movdqu IV, 0x40(OUTP)
+
+   pxor 0x10(OUTP), STATE2
+   movdqu STATE2, 0x10(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE2
+   pxor 0x50(INP), STATE2
+   movdqu IV, 0x50(OUTP)
+
+   pxor 0x20(OUTP), STATE3
+   movdqu STATE3, 0x20(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE3
+   pxor 0x60(INP), STATE3
+   movdqu IV, 0x60(OUTP)
+
+   pxor 0x30(OUTP), STATE4
+   movdqu STATE4, 0x30(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movdqa IV, STATE4
+   pxor 0x70(INP), STATE4
+   movdqu IV, 0x70(OUTP)
+
+   _aesni_gf128mul_x_ble()
+   movups IV, (IVP)
+
+   call *%r11
+
+   pxor 0x40(OUTP), STATE1
+   movdqu STATE1, 0x40(OUTP)
+
+   pxor 0x50(OUTP), STATE2
+   movdqu STATE2, 0x50(OUTP)
+
+   pxor 0x60(OUTP), STATE3
+   movdqu STATE3, 0x60(OUTP)
+
+   pxor 0x70(OUTP), STATE4
+   movdqu STATE4, 0x70(OUTP)
+
+   ret
+ENDPROC(aesni_xts_crypt8)
+
 #endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c 
b/arch/x86/crypto/aesni-intel_glue.c
index a0795da..f80e668 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -39,6 +39,9 @@
 #include crypto/internal/aead.h
 #include linux/workqueue.h
 #include linux/spinlock.h
+#ifdef CONFIG_X86_64
+#include asm/crypto/glue_helper.h
+#endif
 
 #if defined(CONFIG_CRYPTO_PCBC) || defined(CONFIG_CRYPTO_PCBC_MODULE)
 #define HAS_PCBC
@@ -102,6 +105,9 @@ void crypto_fpu_exit(void);
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
  const u8 *in, unsigned int len, u8 *iv);
 
+asmlinkage void aesni_xts_crypt8(struct crypto_aes_ctx *ctx, u8 *out,
+const u8 *in, 

Re: [PATCH 1/2] crypto: add CMAC support to CryptoAPI

2013-04-08 Thread David Miller
From: Herbert Xu herb...@gondor.apana.org.au
Date: Mon, 8 Apr 2013 17:33:40 +0800

 On Mon, Apr 08, 2013 at 10:24:16AM +0200, Steffen Klassert wrote:
 On Mon, Apr 08, 2013 at 10:48:44AM +0300, Jussi Kivilinna wrote:
  Patch adds support for NIST recommended block cipher mode CMAC to 
  CryptoAPI.
  
  This work is based on Tom St Denis' earlier patch,
   http://marc.info/?l=linux-crypto-vger&m=135877306305466&w=2
  
  Cc: Tom St Denis tstde...@elliptictech.com
  Signed-off-by: Jussi Kivilinna jussi.kivili...@iki.fi
 
 This patch does not apply clean to the ipsec-next tree
 because of some crypto changes I don't have in ipsec-next.
 The IPsec part should apply to the cryptodev tree,
 so it's probably best if we route this patchset
 through the cryptodev tree.
 
 Herbert,
 
 are you going to take these patches?
 
 Sure I can do that.

I'm fine with this:

Acked-by: David S. Miller da...@davemloft.net


Re: questions of crypto async api

2013-04-08 Thread Kim Phillips
On Mon, 8 Apr 2013 13:49:58 +
Hsieh, Che-Min chem...@qti.qualcomm.com wrote:

 Thanks for the answer.
 
 I have further question on the same subject.
 With regard to the commit in talitos.c (attached at the end of this mail),
 the driver submits requests of the same tfm to the same channel to ensure
 ordering.
 
 Is it because the tfm context needs to be maintained from one operation to
 the next? E.g., aead_givencrypt() generates the IV based on the previous IV
 result stored in the tfm.

is that what the commit text says?

 If requests are sent to different channels dynamically, and the driver
 reorders the completion callbacks as requests complete in the HW, what would
 happen?

about the same thing as wrapping the driver with pcrypt?  why not
use the h/w to maintain ordering?

Kim

 Thanks in advance.
 
 Chemin
 
 
 
 commit 5228f0f79e983c2b39c202c75af901ceb0003fc1
 Author: Kim Phillips kim.phill...@freescale.com
 Date:   Fri Jul 15 11:21:38 2011 +0800
 
 crypto: talitos - ensure request ordering within a single tfm
 
 Assign single target channel per tfm in talitos_cra_init instead of
 performing channel scheduling dynamically during the encryption request.
 This changes the talitos_submit interface to accept a new channel
 number argument.  Without this, rapid bursts of misc. sized requests
 could make it possible for IPsec packets to be encrypted out-of-order,
 which would result in packet drops due to sequence numbers falling
 outside the anti-replay window on a peer gateway.
 
 Signed-off-by: Kim Phillips kim.phill...@freescale.com
 Signed-off-by: Herbert Xu herb...@gondor.apana.org.au
 
 -Original Message-
 From: Kim Phillips [mailto:kim.phill...@freescale.com] 
 Sent: Friday, April 05, 2013 6:33 PM
 To: Hsieh, Che-Min
 Cc: linux-crypto@vger.kernel.org
 Subject: Re: questions of crypto async api
 
 On Thu, 4 Apr 2013 14:38:41 +
 Hsieh, Che-Min chem...@qti.qualcomm.com wrote:
 
  If a driver supports multiple instances of HW crypto engines, the order of 
  the request completion from HW can be different from the order of requests 
  submitted to different HW.  The 2nd request sent out to the 2nd HW instance 
  may take shorter time to complete than the first request for different HW 
  instance.  Is the driver responsible for re-ordering the completion 
  callout? Or the agents (such as IP protocol stack) are responsible for 
  reordering? How does pcrypt do it?
  
   Does it make sense for a transform to send multiple requests outstanding 
  to async crypto api?
 
 see:
 
 http://comments.gmane.org/gmane.linux.kernel.cryptoapi/5350
 
   Is scatterwalk_sg_next() preferred method over sg_next()?  Why?
 
 scatterwalk_* is the crypto subsystem's version of the function, so yes.
 
   sg_copy_to_buffer() and sg_copy_from_buffer() -> sg_copy_buffer() ->
  sg_miter_next() -> sg_next(). Sometimes sg_copy_to_buffer() and
  sg_copy_from_buffer() in our driver do not copy the whole list. We have to
  rewrite those functions by using scatterwalk_sg_next() to walk down the
  list. Is this the correct behavior?
 
 sounds like you're on the right track, although buffers shouldn't be being 
 copied that often, if at all.
 
 Kim
 
 
