[PATCH v2 0/5] crypto: Speck support

2018-02-12 Thread Eric Biggers
Hello,

This series adds Speck support to the crypto API, including the Speck128
and Speck64 variants.  Speck is a lightweight block cipher that can be
much faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

Thus, patch 1 adds a generic implementation of Speck, and the following
patches add a 32-bit ARM NEON implementation of Speck-XTS.  The
NEON-accelerated implementation is much faster than the generic
implementation and therefore is the implementation that would primarily
be used in practice on the devices we are targeting.

There is no AArch64 implementation added, since such CPUs are likely to
have the Cryptography Extensions, allowing the use of AES.

Changed since v1:

  - Use the word order recommended by the Speck authors.  All test
vectors were updated.

Eric Biggers (5):
  crypto: add support for the Speck block cipher
  crypto: speck - export common helpers
  crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS
  crypto: speck - add test vectors for Speck128-XTS
  crypto: speck - add test vectors for Speck64-XTS

 arch/arm/crypto/Kconfig   |6 +
 arch/arm/crypto/Makefile  |2 +
 arch/arm/crypto/speck-neon-core.S |  432 +++
 arch/arm/crypto/speck-neon-glue.c |  290 
 crypto/Kconfig|   14 +
 crypto/Makefile   |1 +
 crypto/speck.c|  307 
 crypto/testmgr.c  |   36 +
 crypto/testmgr.h  | 1486 +
 include/crypto/speck.h|   62 ++
 10 files changed, 2636 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c
 create mode 100644 crypto/speck.c
 create mode 100644 include/crypto/speck.h

-- 
2.16.0.rc1.238.g530d649a79-goog



[PATCH v2 1/5] crypto: add support for the Speck block cipher

2018-02-12 Thread Eric Biggers
Add a generic implementation of Speck, including the Speck128 and
Speck64 variants.  Speck is a lightweight block cipher that can be much
faster than AES on processors that don't have AES instructions.

We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an
option for dm-crypt and fscrypt on Android, for low-end mobile devices
with older CPUs such as ARMv7 which don't have the Cryptography
Extensions.  Currently, such devices are unencrypted because AES is not
fast enough, even when the NEON bit-sliced implementation of AES is
used.  Other AES alternatives such as Twofish, Threefish, Camellia,
CAST6, and Serpent aren't fast enough either; it seems that only a
modern ARX cipher can provide sufficient performance on these devices.

This is a replacement for our original proposal
(https://patchwork.kernel.org/patch/10101451/) which was to offer
ChaCha20 for these devices.  However, the use of a stream cipher for
disk/file encryption with no space to store nonces would have been much
more insecure than we thought initially, given that it would be used on
top of flash storage as well as potentially on top of F2FS, neither of
which is guaranteed to overwrite data in-place.

Speck has been somewhat controversial due to its origin.  Nevertheless,
it has a straightforward design (it's an ARX cipher), and it appears to
be the leading software-optimized lightweight block cipher currently,
with the most cryptanalysis.  It's also easy to implement without side
channels, unlike AES.  Moreover, we only intend Speck to be used when
the status quo is no encryption, due to AES not being fast enough.

We've also considered a novel length-preserving encryption mode based on
ChaCha20 and Poly1305.  While theoretically attractive, such a mode
would be a brand new crypto construction and would be more complicated
and difficult to implement efficiently in comparison to Speck-XTS.

There is confusion about the byte and word orders of Speck, since the
original paper doesn't specify them.  We have implemented it using the
byte and word orders the authors recommended in correspondence with
them.  The test vectors are taken from the original paper but were
mapped to byte arrays using those recommended byte and word orders.
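
For illustration (not part of the patch), the following userspace C
sketch shows that mapping: words are little endian and stored in (y, x)
order, so the vectors printed in the paper read as byte arrays from
right to left.

#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only: convert a Speck128 test vector printed as the two
 * words (x, y) in the paper into the byte array this implementation
 * operates on -- little-endian words, stored in (y, x) order.
 */
static void speck128_words_to_bytes(uint64_t x, uint64_t y, uint8_t out[16])
{
	int i;

	for (i = 0; i < 8; i++) {
		out[i] = (uint8_t)(y >> (8 * i));	/* y at the lower addresses */
		out[8 + i] = (uint8_t)(x >> (8 * i));	/* x at the higher addresses */
	}
}

int main(void)
{
	uint8_t pt[16];
	int i;

	/* Speck128/128 plaintext from Appendix C: 6c61766975716520 7469206564616d20 */
	speck128_words_to_bytes(0x6c61766975716520ULL, 0x7469206564616d20ULL, pt);

	/* prints: 20 6d 61 64 65 20 69 74 20 65 71 75 69 76 61 6c */
	for (i = 0; i < 16; i++)
		printf("%02x ", pt[i]);
	printf("\n");
	return 0;
}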

Signed-off-by: Eric Biggers 
---
 crypto/Kconfig   |  14 +++
 crypto/Makefile  |   1 +
 crypto/speck.c   | 299 +++
 crypto/testmgr.c |  18 
 crypto/testmgr.h | 128 
 5 files changed, 460 insertions(+)
 create mode 100644 crypto/speck.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index b75264b09a46..558eff07b799 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1508,6 +1508,20 @@ config CRYPTO_SERPENT_AVX2_X86_64
  See also:
  
 
+config CRYPTO_SPECK
+   tristate "Speck cipher algorithm"
+   select CRYPTO_ALGAPI
+   help
+ Speck is a lightweight block cipher that is tuned for optimal
+ performance in software (rather than hardware).
+
+ Speck may not be as secure as AES, and should only be used on systems
+ where AES is not fast enough.
+
+ See also: 
+
+ If unsure, say N.
+
 config CRYPTO_TEA
tristate "TEA, XTEA and XETA cipher algorithms"
select CRYPTO_ALGAPI
diff --git a/crypto/Makefile b/crypto/Makefile
index cdbc03b35510..ba6019471447 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -110,6 +110,7 @@ obj-$(CONFIG_CRYPTO_TEA) += tea.o
 obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
 obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
+obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
diff --git a/crypto/speck.c b/crypto/speck.c
new file mode 100644
index ..4e80ad76bcd7
--- /dev/null
+++ b/crypto/speck.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Speck: a lightweight block cipher
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Speck has 10 variants, including 5 block sizes.  For now we only implement
+ * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and
+ * Speck64/128.   Speck${B}/${K} denotes the variant with a block size of B bits
+ * and a key size of K bits.  The Speck128 variants are believed to be the most
+ * secure variants, and they use the same block size and key sizes as AES.  The
+ * Speck64 variants are less secure, but on 32-bit processors are usually
+ * faster.  The remaining variants (Speck32, Speck48, and Speck96) are even less
+ * secure and/or not as well suited for implementation on either 32-bit or
+ * 64-bit processors, so are omitted.
+ *
+ * Reference: "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * In a correspondence, the Speck 

[PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-02-12 Thread Eric Biggers
Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
encrypted/decrypted (doing one cipher round for all the blocks, then the
next round, etc.), then goes through XTS postprocessing.
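
As a plain-C model of that structure (illustration only -- the real code
is the NEON assembly in speck-neon-core.S, and speck128_round() stands
in for the round function added by the generic implementation), one
128-byte chunk of Speck128-XTS encryption proceeds as:

/*
 * Illustration only: per-chunk processing order used by the NEON code,
 * modeled in C for one 128-byte chunk of Speck128-XTS encryption
 * (8 blocks of two 64-bit words each).  The per-block tweaks
 * (tx[i], ty[i]) are assumed to have been computed by the caller.
 */
static void speck128_xts_encrypt_chunk(const u64 *round_keys, int nrounds,
					u64 x[8], u64 y[8],
					const u64 tx[8], const u64 ty[8])
{
	int i, r;

	/* XTS preprocessing: XOR each block with its tweak */
	for (i = 0; i < 8; i++) {
		x[i] ^= tx[i];
		y[i] ^= ty[i];
	}

	/* one cipher round at a time, applied across all 8 blocks */
	for (r = 0; r < nrounds; r++)
		for (i = 0; i < 8; i++)
			speck128_round(&x[i], &y[i], round_keys[r]);

	/* XTS postprocessing: XOR each block with its tweak again */
	for (i = 0; i < 8; i++) {
		x[i] ^= tx[i];
		y[i] ^= ty[i];
	}
}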

The performance depends on the processor but can be about 3 times faster
than the generic code.  For example, on an ARMv7 processor we observe
the following performance with Speck128/256-XTS:

xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s

In comparison to AES-256-XTS without the Cryptography Extensions:

xts-aes-neonbs:Encryption  41.2 MB/s, Decryption  36.7 MB/s
xts(aes-asm):  Encryption  31.7 MB/s, Decryption  30.8 MB/s
xts(aes-generic):  Encryption  21.2 MB/s, Decryption  20.9 MB/s

Speck64/128-XTS is even faster:

xts-speck64-neon:  Encryption 138.6 MB/s, Decryption 139.1 MB/s

Note that as with the generic code, only the Speck128 and Speck64
variants are supported.  Also, for now only the XTS mode of operation is
supported, to target the disk and file encryption use cases.  The NEON
code also only handles the portion of the data that is evenly divisible
into 128-byte chunks, with any remainder handled by a C fallback.  Of
course, other modes of operation could be added later if needed, and/or
the NEON code could be updated to handle other buffer sizes.

The XTS specification is only defined for AES which has a 128-bit block
size, so for the GF(2^64) math needed for Speck64-XTS we use the
reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
paper.  Of course, when possible users should use Speck128-XTS, but even
that may be too slow on some processors; Speck64-XTS can be faster.
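
For example, multiplying the 64-bit tweak by x with that polynomial
reduces to the sketch below (the helper name is assumed for this
illustration; the actual implementation is in the NEON assembly):

/*
 * Sketch only: multiply a 64-bit XTS tweak by x in GF(2^64) with the
 * reducing polynomial x^64 + x^4 + x^3 + x + 1.  The low terms
 * x^4 + x^3 + x + 1 correspond to the constant 0x1B, which is folded
 * back in whenever the top bit is shifted out.
 */
static inline u64 speck64_xts_gf_mul_x(u64 tweak)
{
	return (tweak << 1) ^ ((tweak >> 63) ? 0x1B : 0);
}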

Signed-off-by: Eric Biggers 
---
 arch/arm/crypto/Kconfig   |   6 +
 arch/arm/crypto/Makefile  |   2 +
 arch/arm/crypto/speck-neon-core.S | 432 ++
 arch/arm/crypto/speck-neon-glue.c | 290 +
 4 files changed, 730 insertions(+)
 create mode 100644 arch/arm/crypto/speck-neon-core.S
 create mode 100644 arch/arm/crypto/speck-neon-glue.c

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b8e69fe282b8..925d1364727a 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
 
+config CRYPTO_SPECK_NEON
+   tristate "NEON accelerated Speck cipher algorithms"
+   depends on KERNEL_MODE_NEON
+   select CRYPTO_BLKCIPHER
+   select CRYPTO_SPECK
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 30ef8e291271..a758107c5525 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
 obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -53,6 +54,7 @@ ghash-arm-ce-y:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
 quiet_cmd_perl = PERL$@
   cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
new file mode 100644
index ..3c1e203e53b9
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers 
+ */
+
+#include 
+
+   .text
+   .fpu    neon
+
+   // arguments
+   ROUND_KEYS  .req    r0  // const {u64,u32} *round_keys
+   NROUNDS     .req    r1  // int nrounds
+   DST         .req    r2  // void *dst
+   SRC         .req    r3  // const void *src
+   NBYTES      .req    r4  // unsigned int nbytes
+   TWEAK       .req    r5  // void *tweak
+
+   // registers which hold the data being encrypted/decrypted
+   X0      .req    q0
+   X0_L    .req    d0
+   X0_H    .req    d1
+   Y0      .req    q1
+   Y0_H    .req    d3
+   X1      .req    q2
+   X1_L    .req    d4
+   X1_H    .req    d5
+   Y1      .req    q3
+   Y1_H

[PATCH v2 4/5] crypto: speck - add test vectors for Speck128-XTS

2018-02-12 Thread Eric Biggers
Add test vectors for Speck128-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors.

Both xts(speck128-generic) and xts-speck128-neon pass these tests.

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 687 +++
 2 files changed, 696 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 058ed5eb6620..e011a347d51b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3575,6 +3575,15 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(serpent_xts_dec_tv_template)
}
}
+   }, {
+   .alg = "xts(speck128)",
+   .test = alg_test_skcipher,
+   .suite = {
+   .cipher = {
+   .enc = __VECS(speck128_xts_enc_tv_template),
+   .dec = __VECS(speck128_xts_dec_tv_template)
+   }
+   }
}, {
.alg = "xts(twofish)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 3818210f77cf..0212e0ebcd0c 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -14411,6 +14411,693 @@ static const struct cipher_testvec speck128_dec_tv_template[] = {
},
 };
 
+/*
+ * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the
+ * result recomputed with Speck128 as the cipher
+ */
+
+static const struct cipher_testvec speck128_xts_enc_tv_template[] = {
+   {
+   .key= "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .klen   = 32,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .ilen   = 32,
+   .result = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+ "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+ "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+ "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+   .rlen   = 32,
+   }, {
+   .key= "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 32,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+ "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+ "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+ "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+   .rlen   = 32,
+   }, {
+   .key= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 32,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+ "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+ "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+ "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+   .rlen   = 32,
+   }, {
+   .key= "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95",
+   .klen   = 32,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x01\x02\x03\x04\x05\x06\x07"
+

[PATCH v2 5/5] crypto: speck - add test vectors for Speck64-XTS

2018-02-12 Thread Eric Biggers
Add test vectors for Speck64-XTS, generated in userspace using C code.
The inputs were borrowed from the AES-XTS test vectors, with key lengths
adjusted.

xts-speck64-neon passes these tests.  However, they aren't currently
applicable for the generic XTS template, as that only supports a 128-bit
block size.

Signed-off-by: Eric Biggers 
---
 crypto/testmgr.c |   9 +
 crypto/testmgr.h | 671 +++
 2 files changed, 680 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index e011a347d51b..9f82e7bc9c56 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3584,6 +3584,15 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(speck128_xts_dec_tv_template)
}
}
+   }, {
+   .alg = "xts(speck64)",
+   .test = alg_test_skcipher,
+   .suite = {
+   .cipher = {
+   .enc = __VECS(speck64_xts_enc_tv_template),
+   .dec = __VECS(speck64_xts_dec_tv_template)
+   }
+   }
}, {
.alg = "xts(twofish)",
.test = alg_test_skcipher,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 0212e0ebcd0c..da72fd394f35 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -15138,6 +15138,677 @@ static const struct cipher_testvec speck64_dec_tv_template[] = {
},
 };
 
+/*
+ * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the result
+ * recomputed with Speck64 as the cipher, and key lengths adjusted
+ */
+
+static const struct cipher_testvec speck64_xts_enc_tv_template[] = {
+   {
+   .key= "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .klen   = 24,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .ilen   = 32,
+   .result = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+ "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+ "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+ "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+   .rlen   = 32,
+   }, {
+   .key= "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 24,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+ "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+ "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+ "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+   .rlen   = 32,
+   }, {
+   .key= "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+   .klen   = 24,
+   .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+   .ilen   = 32,
+   .result = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+ "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+ "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+ "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+   .rlen   = 32,
+   }, {
+   .key= "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93",
+   .klen   = 24,
+   .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+   .input  = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+

[PATCH v2 2/5] crypto: speck - export common helpers

2018-02-12 Thread Eric Biggers
Export the Speck constants and transform context and the ->setkey(),
->encrypt(), and ->decrypt() functions so that they can be reused by the
ARM NEON implementation of Speck-XTS.  The generic key expansion code
will be reused because it is not performance-critical and is not
vectorizable, while the generic encryption and decryption functions are
needed as fallbacks and for the XTS tweak encryption.
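
As a rough sketch of how the NEON glue can use them (simplified, with
assumed names for the glue context and the assembly entry point -- this
is not the actual speck-neon-glue.c), the exported
crypto_speck128_encrypt() encrypts the initial XTS tweak and handles
any blocks left over after the NEON code has consumed whole 128-byte
chunks:

/*
 * Simplified sketch only: speck128_xts_encrypt_neon() and
 * struct speck128_xts_ctx are assumed names for the assembly routine
 * and the glue tfm context; the walk is assumed to have been started
 * by the caller with skcipher_walk_virt().
 */
static int speck128_xts_encrypt_walk(const struct speck128_xts_ctx *ctx,
				     struct skcipher_walk *walk)
{
	le128 tweak;
	int err = 0;

	/* XTS tweak encryption uses the exported generic helper */
	crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk->iv);

	while (walk->nbytes > 0) {
		unsigned int nbytes = walk->nbytes;
		u8 *dst = walk->dst.virt.addr;
		const u8 *src = walk->src.virt.addr;
		unsigned int count = round_down(nbytes, 128);

		if (count && may_use_simd()) {
			kernel_neon_begin();
			speck128_xts_encrypt_neon(ctx->main_key.round_keys,
						  ctx->main_key.nrounds,
						  dst, src, count, &tweak);
			kernel_neon_end();
			dst += count;
			src += count;
			nbytes -= count;
		}

		/* generic C fallback for the remaining (< 128-byte) part */
		while (nbytes >= SPECK128_BLOCK_SIZE) {
			le128_xor((le128 *)dst, (const le128 *)src, &tweak);
			crypto_speck128_encrypt(&ctx->main_key, dst, dst);
			le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
			gf128mul_x_ble(&tweak, &tweak);
			dst += SPECK128_BLOCK_SIZE;
			src += SPECK128_BLOCK_SIZE;
			nbytes -= SPECK128_BLOCK_SIZE;
		}
		err = skcipher_walk_done(walk, nbytes);
	}
	return err;
}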

Signed-off-by: Eric Biggers 
---
 crypto/speck.c | 90 +++---
 include/crypto/speck.h | 62 ++
 2 files changed, 111 insertions(+), 41 deletions(-)
 create mode 100644 include/crypto/speck.h

diff --git a/crypto/speck.c b/crypto/speck.c
index 4e80ad76bcd7..58aa9f7f91f7 100644
--- a/crypto/speck.c
+++ b/crypto/speck.c
@@ -24,6 +24,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,22 +32,6 @@
 
 /* Speck128 */
 
-#define SPECK128_BLOCK_SIZE16
-
-#define SPECK128_128_KEY_SIZE  16
-#define SPECK128_128_NROUNDS   32
-
-#define SPECK128_192_KEY_SIZE  24
-#define SPECK128_192_NROUNDS   33
-
-#define SPECK128_256_KEY_SIZE  32
-#define SPECK128_256_NROUNDS   34
-
-struct speck128_tfm_ctx {
-   u64 round_keys[SPECK128_256_NROUNDS];
-   int nrounds;
-};
-
 static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
 {
*x = ror64(*x, 8);
@@ -65,9 +50,9 @@ static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
*x = rol64(*x, 8);
 }
 
-static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+u8 *out, const u8 *in)
 {
-   const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 y = get_unaligned_le64(in);
u64 x = get_unaligned_le64(in + 8);
int i;
@@ -78,10 +63,16 @@ static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
put_unaligned_le64(y, out);
put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_encrypt);
 
-static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+u8 *out, const u8 *in)
 {
-   const struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 y = get_unaligned_le64(in);
u64 x = get_unaligned_le64(in + 8);
int i;
@@ -92,11 +83,16 @@ static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
put_unaligned_le64(y, out);
put_unaligned_le64(x, out + 8);
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_decrypt);
 
-static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
   unsigned int keylen)
 {
-   struct speck128_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u64 l[3];
u64 k;
int i;
@@ -138,21 +134,15 @@ static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(crypto_speck128_setkey);
 
-/* Speck64 */
-
-#define SPECK64_BLOCK_SIZE 8
-
-#define SPECK64_96_KEY_SIZE12
-#define SPECK64_96_NROUNDS 26
-
-#define SPECK64_128_KEY_SIZE   16
-#define SPECK64_128_NROUNDS27
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+  unsigned int keylen)
+{
+   return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
 
-struct speck64_tfm_ctx {
-   u32 round_keys[SPECK64_128_NROUNDS];
-   int nrounds;
-};
+/* Speck64 */
 
 static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
 {
@@ -172,9 +162,9 @@ static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
*x = rol32(*x, 8);
 }
 
-static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+   u8 *out, const u8 *in)
 {
-   const struct speck64_tfm_ctx *ctx = crypto_tfm_ctx(tfm);
u32 y = get_unaligned_le32(in);
u32 x = get_unaligned_le32(in + 4);
int i;
@@ -185,10 +175,16 @@ static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
put_unaligned_le32(y, out);
put_unaligned_le32(x, out + 4);
 }
+EXPORT_SYMBOL_GPL(crypto_speck64_encrypt);
 
-static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+   crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,

Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-12 Thread Junaid Shahid
Hi Dave,


On 02/12/2018 11:51 AM, Dave Watson wrote:

> +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int assoclen,
> + u8 *hash_subkey, u8 *iv, void *aes_ctx)
>  
> +static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int assoclen,
> + u8 *hash_subkey, u8 *iv, void *aes_ctx)

These two functions are almost identical. Wouldn't it be better to combine them 
into a single encrypt/decrypt function, similar to what you have done for the 
assembly macros?

> + if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
> + aesni_gcm_enc_tfm == aesni_gcm_enc) {

Shouldn't we also include a check for the buffer length being less than 
AVX_GEN2_OPTSIZE? AVX will not be used in that case either.


Thanks,
Junaid



Re: [PATCH] hwrng: bcm2835: Handle deferred clock properly

2018-02-12 Thread Florian Fainelli
On 02/12/2018 12:11 PM, Stefan Wahren wrote:
> In case the probe of the clock is deferred, we would assume it is
> optional. This is wrong, so defer the probe of this driver until
> the clock is available.
> 
> Fixes: 791af4f4907a ("hwrng: bcm2835 - Manage an optional clock")
> Signed-off-by: Stefan Wahren 

Acked-by: Florian Fainelli 

Thanks Stefan!
-- 
Florian


Re: [PATCH 0/5] crypto: Speck support

2018-02-12 Thread Eric Biggers
Hi Jeff,

On Mon, Feb 12, 2018 at 02:57:06PM -0500, Jeffrey Walton wrote:
> On Mon, Feb 12, 2018 at 2:19 PM, Eric Biggers  wrote:
> > Hi all,
> >
> > On Fri, Feb 09, 2018 at 07:07:01PM -0500, Jeffrey Walton wrote:
> >> > Hi Jeffrey,
> >> >
> >> > I see you wrote the SPECK implementation in Crypto++, and you are 
> >> > treating the
> >> > words as big endian.
> >> >
> >> > Do you have a reference for this being the "correct" order?  
> >> > Unfortunately the
> >> > authors of the cipher failed to mention the byte order in their paper.  
> >> > And they
> >> > gave the test vectors as words, so the test vectors don't clarify it 
> >> > either.
> >> >
> >> > I had assumed little endian words, but now I am having second 
> >> > thoughts...  And
> >> > to confuse things further, it seems that some implementations (including 
> >> > the
> >> > authors own implementation for the SUPERCOP benchmark toolkit [1]) even 
> >> > consider
> >> > the words themselves in the order (y, x) rather than the more intuitive 
> >> > (x, y).
> >> >
> >> > [1] 
> >> > https://github.com/iadgov/simon-speck-supercop/blob/master/crypto_stream/speck128128ctr/ref/stream.c
> >> >
> >> > In fact, even the reference code from the paper treats pt[0] as y and 
> >> > pt[1] as
> >> > x, where 'pt' is a u64 array -- although that being said, it's not shown 
> >> > how the
> >> > actual bytes should be translated to/from those u64 arrays.
> >> >
> >> > I'd really like to avoid people having to add additional versions of 
> >> > SPECK later
> >> > for the different byte and word orders...
> >>
> >> Hi Eric,
> >>
> >> Yeah, this was a point of confusion for us as well. After the sidebar
> >> conversations I am wondering about the correctness of Crypto++
> >> implementation.
> >>
> >
> > We've received another response from one of the Speck creators (Louis 
> > Wingers)
> > that (to summarize) the intended byte order is little endian, and the 
> > intended
> > word order is (y, x), i.e. 'y' is at a lower memory address than 'x'.  Or
> > equivalently: the test vectors given in the original paper need to be read 
> > as
> > byte arrays from *right-to-left*.
> >
> > (y, x) is not the intuitive order, but it's not a huge deal.  The more 
> > important
> > thing is that we don't end up with multiple implementations with different 
> > byte
> > and/or word orders.
> >
> > So, barring any additional confusion, I'll send a revised version of this
> > patchset that flips the word order.  Jeff would need to flip both the byte 
> > and
> > word orders in his implementation in Crypto++ as well.
> 
> Thanks Eric.
> 
> Yeah, the (y,x) explains a lot of the confusion, and explains the
> modification I needed in my GitHub clone of the IAD Team's SUPERCOP to
> arrive at test vector results. My clone is available at
> https://github.com/noloader/simon-speck-supercop.
> 
> So let me ask you... Given the Speck-128(128) test vector from Appendix C:
> 
> Key: 0f0e0d0c0b0a0908 0706050403020100
> Plaintext: 6c61766975716520 7469206564616d20
> Ciphertext: a65d985179783265 7860fedf5c570d18
> 
> Will the Linux implementation arrive at the published result, or will
> it arrive at a different result? I guess what I am asking, where is
> the presentation detail going to be handled?
> 
> A related question is, will the kernel be parsing just the key as
> (y,x), or will all parameters be handled as (y,x)? At this point I
> believe it only needs to apply to the key but I did not investigate
> the word swapping in detail because I was chasing the test vector.
> 

The kernel implementation has to operate on byte arrays.  But the test vectors
in the original paper are given as *words* in the order (x, y) and likewise for
the key (i.e. the rightmost word shown becomes the first round key).  But based
on the clarifications from the Speck team, the actual byte arrays that
correspond to the Speck-128/128 test vector would be:

const uint8_t key[16] = 
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f";
const uint8_t plaintext[16] = 
"\x20\x6d\x61\x64\x65\x20\x69\x74\x20\x65\x71\x75\x69\x76\x61\x6c";
const uint8_t ciphertext[16] = 
"\x18\x0d\x57\x5c\xdf\xfe\x60\x78\x65\x32\x78\x79\x51\x98\x5d\xa6";

So equivalently, if we consider the printed test vectors as just listing the
bytes (ignoring the whitespace between the words), then they are backwards.
That applies to all 3 parts (Key, Plaintext, and Ciphertext).

Note that my patch 1/5 adds the Speck test vectors to testmgr.h so that they are
hooked into the Linux kernel's crypto self-tests, so on appropriately-configured
kernels it will be automatically verified that the implementation matches the
test vectors.  The ones in the current version of the patchset have the "wrong"
word order though, so I will need to send out a new version with the correct
implementation and test vectors.

Thanks,

Eric


[PATCH] hwrng: bcm2835: Handle deferred clock properly

2018-02-12 Thread Stefan Wahren
In case the probe of the clock is deferred, we would assume it is
optional. This is wrong, so defer the probe of this driver until
the clock is available.

Fixes: 791af4f4907a ("hwrng: bcm2835 - Manage an optional clock")
Signed-off-by: Stefan Wahren 
---
 drivers/char/hw_random/bcm2835-rng.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/char/hw_random/bcm2835-rng.c b/drivers/char/hw_random/bcm2835-rng.c
index 7a84cec..6767d96 100644
--- a/drivers/char/hw_random/bcm2835-rng.c
+++ b/drivers/char/hw_random/bcm2835-rng.c
@@ -163,6 +163,8 @@ static int bcm2835_rng_probe(struct platform_device *pdev)
 
/* Clock is optional on most platforms */
priv->clk = devm_clk_get(dev, NULL);
+   if (IS_ERR(priv->clk) && PTR_ERR(priv->clk) == -EPROBE_DEFER)
+   return -EPROBE_DEFER;
 
priv->rng.name = pdev->name;
priv->rng.init = bcm2835_rng_init;
-- 
2.7.4



Re: [PATCH 0/5] crypto: Speck support

2018-02-12 Thread Jeffrey Walton
On Mon, Feb 12, 2018 at 2:19 PM, Eric Biggers  wrote:
> Hi all,
>
> On Fri, Feb 09, 2018 at 07:07:01PM -0500, Jeffrey Walton wrote:
>> > Hi Jeffrey,
>> >
>> > I see you wrote the SPECK implementation in Crypto++, and you are treating 
>> > the
>> > words as big endian.
>> >
>> > Do you have a reference for this being the "correct" order?  Unfortunately 
>> > the
>> > authors of the cipher failed to mention the byte order in their paper.  
>> > And they
>> > gave the test vectors as words, so the test vectors don't clarify it 
>> > either.
>> >
>> > I had assumed little endian words, but now I am having second thoughts...  
>> > And
>> > to confuse things further, it seems that some implementations (including 
>> > the
>> > authors own implementation for the SUPERCOP benchmark toolkit [1]) even 
>> > consider
>> > the words themselves in the order (y, x) rather than the more intuitive 
>> > (x, y).
>> >
>> > [1] 
>> > https://github.com/iadgov/simon-speck-supercop/blob/master/crypto_stream/speck128128ctr/ref/stream.c
>> >
>> > In fact, even the reference code from the paper treats pt[0] as y and 
>> > pt[1] as
>> > x, where 'pt' is a u64 array -- although that being said, it's not shown 
>> > how the
>> > actual bytes should be translated to/from those u64 arrays.
>> >
>> > I'd really like to avoid people having to add additional versions of SPECK 
>> > later
>> > for the different byte and word orders...
>>
>> Hi Eric,
>>
>> Yeah, this was a point of confusion for us as well. After the sidebar
>> conversations I am wondering about the correctness of Crypto++
>> implementation.
>>
>
> We've received another response from one of the Speck creators (Louis Wingers)
> that (to summarize) the intended byte order is little endian, and the intended
> word order is (y, x), i.e. 'y' is at a lower memory address than 'x'.  Or
> equivalently: the test vectors given in the original paper need to be read as
> byte arrays from *right-to-left*.
>
> (y, x) is not the intuitive order, but it's not a huge deal.  The more 
> important
> thing is that we don't end up with multiple implementations with different 
> byte
> and/or word orders.
>
> So, barring any additional confusion, I'll send a revised version of this
> patchset that flips the word order.  Jeff would need to flip both the byte and
> word orders in his implementation in Crypto++ as well.

Thanks Eric.

Yeah, the (y,x) explains a lot of the confusion, and explains the
modification I needed in my GitHub clone of the IAD Team's SUPERCOP to
arrive at test vector results. My clone is available at
https://github.com/noloader/simon-speck-supercop.

So let me ask you... Given the Speck-128(128) test vector from Appendix C:

Key: 0f0e0d0c0b0a0908 0706050403020100
Plaintext: 6c61766975716520 7469206564616d20
Ciphertext: a65d985179783265 7860fedf5c570d18

Will the Linux implementation arrive at the published result, or will
it arrive at a different result? I guess what I am asking, where is
the presentation detail going to be handled?

A related question is, will the kernel be parsing just the key as
(y,x), or will all parameters be handled as (y,x)? At this point I
believe it only needs to apply to the key but I did not investigate
the word swapping in detail because I was chasing the test vector.

Jeff


[PATCH 06/14] x86/crypto: aesni: Introduce gcm_context_data

2018-02-12 Thread Dave Watson
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls.  It is passed
as the second argument (after the crypto keys); the other arguments
are renumbered accordingly.
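
For reference, a C-side sketch of the struct is below; the field names
here are assumptions, while the byte offsets are the ones the assembly
defines (AadHash, AadLen, InLen, PBlockEncKey, OrigIV, CurCount,
PBlockLen).

/*
 * Sketch only (assumed field names): the layout implied by the offsets
 * used in aesni-intel_asm.S.
 */
struct gcm_context_data {
	u8 aad_hash[16];                /* AadHash:      16*0 */
	u64 aad_length;                 /* AadLen:       16*1 */
	u64 in_length;                  /* InLen:        16*1 + 8 */
	u8 partial_block_enc_key[16];   /* PBlockEncKey: 16*2 */
	u8 orig_IV[16];                 /* OrigIV:       16*3 */
	u8 current_counter[16];         /* CurCount:     16*4 */
	u64 partial_block_length;       /* PBlockLen:    16*5 */
	/* expanded hash keys used by the assembly follow */
};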

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S  | 115 +
 arch/x86/crypto/aesni-intel_glue.c |  81 ++
 2 files changed, 121 insertions(+), 75 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 8021fd1..6c5a80d 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -111,6 +111,14 @@ ALL_F:  .octa 0x
// (for Karatsuba purposes)
 #defineVARIABLE_OFFSET 16*8
 
+#define AadHash 16*0
+#define AadLen 16*1
+#define InLen (16*1)+8
+#define PBlockEncKey 16*2
+#define OrigIV 16*3
+#define CurCount 16*4
+#define PBlockLen 16*5
+
 #define arg1 rdi
 #define arg2 rsi
 #define arg3 rdx
@@ -121,6 +129,7 @@ ALL_F:  .octa 0x
 #define arg8 STACK_OFFSET+16(%r14)
 #define arg9 STACK_OFFSET+24(%r14)
 #define arg10 STACK_OFFSET+32(%r14)
+#define arg11 STACK_OFFSET+40(%r14)
 #define keysize 2*15*16(%arg1)
 #endif
 
@@ -195,9 +204,9 @@ ALL_F:  .octa 0x
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
 .macro GCM_INIT
-   mov %arg6, %r12
+   mov arg7, %r12
movdqu  (%r12), %xmm13
-   movdqa  SHUF_MASK(%rip), %xmm2
+   movdqa  SHUF_MASK(%rip), %xmm2
PSHUFB_XMM %xmm2, %xmm13
 
# precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
@@ -217,7 +226,7 @@ ALL_F:  .octa 0x
pandPOLY(%rip), %xmm2
pxor%xmm2, %xmm13
movdqa  %xmm13, HashKey(%rsp)
-   mov %arg4, %r13 # %xmm13 holds HashKey<<1 (mod poly)
+   mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
and $-16, %r13
mov %r13, %r12
 .endm
@@ -271,18 +280,18 @@ _four_cipher_left_\@:
GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
 %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
 _zero_cipher_left_\@:
-   mov %arg4, %r13
-   and $15, %r13   # %r13 = arg4 (mod 16)
+   mov %arg5, %r13
+   and $15, %r13   # %r13 = arg5 (mod 16)
je  _multiple_of_16_bytes_\@
 
# Handle the last <16 Byte block separately
paddd ONE(%rip), %xmm0# INCR CNT to get Yn
-movdqa SHUF_MASK(%rip), %xmm10
+   movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm0
 
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
 
-   lea (%arg3,%r11,1), %r10
+   lea (%arg4,%r11,1), %r10
mov %r13, %r12
READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
 
@@ -320,13 +329,13 @@ _zero_cipher_left_\@:
MOVQ_R64_XMM %xmm0, %rax
cmp $8, %r13
jle _less_than_8_bytes_left_\@
-   mov %rax, (%arg2 , %r11, 1)
+   mov %rax, (%arg3 , %r11, 1)
add $8, %r11
psrldq $8, %xmm0
MOVQ_R64_XMM %xmm0, %rax
sub $8, %r13
 _less_than_8_bytes_left_\@:
-   mov %al,  (%arg2, %r11, 1)
+   mov %al,  (%arg3, %r11, 1)
add $1, %r11
shr $8, %rax
sub $1, %r13
@@ -338,11 +347,11 @@ _multiple_of_16_bytes_\@:
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
 .macro GCM_COMPLETE
-   mov arg8, %r12# %r13 = aadLen (number of bytes)
+   mov arg9, %r12# %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg4 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg4, %xmm1
+   shl $3, %arg5 # len(C) in bits (*128)
+   MOVQ_R64_XMM%arg5, %xmm1
pslldq  $8, %xmm15# %xmm15 = len(A)||0x
pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
pxor%xmm15, %xmm8
@@ -351,13 +360,13 @@ _multiple_of_16_bytes_\@:
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm8
 
-   mov %arg5, %rax   # %rax = *Y0
+   mov %arg6, %rax   # %rax = *Y0
movdqu  (%rax), %xmm0 # %xmm0 = Y0
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
-   mov arg9, %r10 # %r10 = authTag
-   mov arg10, %r11# %r11 = auth_tag_len
+   mov arg10, %r10 # %r10 = authTag
+   mov arg11, %r11

[PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-12 Thread Dave Watson
Add gcmaes_encrypt_sg/gcmaes_decrypt_sg routines that do scatter/gather
by sg.  Either src or dst may contain multiple buffers, so
iterate over both at the same time if they are different.
If the input is the same as the output, iterate only over one.

Currently both the AAD and TAG must be linear, so copy them out
with scatterwalk_map_and_copy().

Only the SSE routines are updated so far, so leave the previous
gcmaes_encrypt/gcmaes_decrypt routines in place, and branch to the sg
ones if the key size is inappropriate for AVX, or we are SSE only.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_glue.c | 166 +
 1 file changed, 166 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index de986f9..1e32fbe 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -791,6 +791,82 @@ static int generic_gcmaes_set_authsize(struct crypto_aead *tfm,
return 0;
 }
 
+static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int assoclen,
+   u8 *hash_subkey, u8 *iv, void *aes_ctx)
+{
+   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
+   unsigned long auth_tag_len = crypto_aead_authsize(tfm);
+   struct gcm_context_data data AESNI_ALIGN_ATTR;
+   struct scatter_walk dst_sg_walk = {};
+   unsigned long left = req->cryptlen;
+   unsigned long len, srclen, dstlen;
+   struct scatter_walk src_sg_walk;
+   struct scatterlist src_start[2];
+   struct scatterlist dst_start[2];
+   struct scatterlist *src_sg;
+   struct scatterlist *dst_sg;
+   u8 *src, *dst, *assoc;
+   u8 authTag[16];
+
+   assoc = kmalloc(assoclen, GFP_ATOMIC);
+   if (unlikely(!assoc))
+   return -ENOMEM;
+   scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
+
+   src_sg = scatterwalk_ffwd(src_start, req->src, req->assoclen);
+   scatterwalk_start(&src_sg_walk, src_sg);
+   if (req->src != req->dst) {
+   dst_sg = scatterwalk_ffwd(dst_start, req->dst, req->assoclen);
+   scatterwalk_start(&dst_sg_walk, dst_sg);
+   }
+
+   kernel_fpu_begin();
+   aesni_gcm_init(aes_ctx, &data, iv,
+   hash_subkey, assoc, assoclen);
+   if (req->src != req->dst) {
+   while (left) {
+   src = scatterwalk_map(&src_sg_walk);
+   dst = scatterwalk_map(&dst_sg_walk);
+   srclen = scatterwalk_clamp(&src_sg_walk, left);
+   dstlen = scatterwalk_clamp(&dst_sg_walk, left);
+   len = min(srclen, dstlen);
+   if (len)
+   aesni_gcm_enc_update(aes_ctx, &data,
+dst, src, len);
+   left -= len;
+
+   scatterwalk_unmap(src);
+   scatterwalk_unmap(dst);
+   scatterwalk_advance(&src_sg_walk, len);
+   scatterwalk_advance(&dst_sg_walk, len);
+   scatterwalk_done(&src_sg_walk, 0, left);
+   scatterwalk_done(&dst_sg_walk, 1, left);
+   }
+   } else {
+   while (left) {
+   dst = src = scatterwalk_map(&src_sg_walk);
+   len = scatterwalk_clamp(&src_sg_walk, left);
+   if (len)
+   aesni_gcm_enc_update(aes_ctx, &data,
+src, src, len);
+   left -= len;
+   scatterwalk_unmap(src);
+   scatterwalk_advance(&src_sg_walk, len);
+   scatterwalk_done(&src_sg_walk, 1, left);
+   }
+   }
+   aesni_gcm_finalize(aes_ctx, &data, authTag, auth_tag_len);
+   kernel_fpu_end();
+
+   kfree(assoc);
+
+   /* Copy in the authTag */
+   scatterwalk_map_and_copy(authTag, req->dst,
+   req->assoclen + req->cryptlen,
+   auth_tag_len, 1);
+   return 0;
+}
+
 static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
  u8 *hash_subkey, u8 *iv, void *aes_ctx)
 {
@@ -802,6 +878,11 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
struct scatter_walk dst_sg_walk = {};
struct gcm_context_data data AESNI_ALIGN_ATTR;
 
+   if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
+   aesni_gcm_enc_tfm == aesni_gcm_enc) {
+   return gcmaes_encrypt_sg(req, assoclen, hash_subkey, iv,
+   aes_ctx);
+   }
if (sg_is_last(req->src) &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length <= PAGE_SIZE) &&
@@ -854,6 +935,86 @@ static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,

[PATCH 13/14] x86/crypto: aesni: Introduce scatter/gather asm function stubs

2018-02-12 Thread Dave Watson
The asm macros are all set up now, introduce entry points.

GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S  | 116 -
 arch/x86/crypto/aesni-intel_glue.c |  16 +
 2 files changed, 106 insertions(+), 26 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index b941952..311b2de 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -200,8 +200,8 @@ ALL_F:  .octa 0x
 # Output: HashKeys stored in gcm_context_data.  Only needs to be called
 # once per key.
 # clobbers r12, and tmp xmm registers.
-.macro PRECOMPUTE TMP1 TMP2 TMP3 TMP4 TMP5 TMP6 TMP7
-   mov arg7, %r12
+.macro PRECOMPUTE SUBKEY TMP1 TMP2 TMP3 TMP4 TMP5 TMP6 TMP7
+   mov \SUBKEY, %r12
movdqu  (%r12), \TMP3
movdqa  SHUF_MASK(%rip), \TMP2
PSHUFB_XMM \TMP2, \TMP3
@@ -254,14 +254,14 @@ ALL_F:  .octa 0x
 
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
-.macro GCM_INIT
-   mov arg9, %r11
+.macro GCM_INIT Iv SUBKEY AAD AADLEN
+   mov \AADLEN, %r11
mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
xor %r11, %r11
mov %r11, InLen(%arg2) # ctx_data.in_length = 0
mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
-   mov %arg6, %rax
+   mov \Iv, %rax
movdqu (%rax), %xmm0
movdqu %xmm0, OrigIV(%arg2) # ctx_data.orig_IV = iv
 
@@ -269,11 +269,11 @@ ALL_F:  .octa 0x
PSHUFB_XMM %xmm2, %xmm0
movdqu %xmm0, CurCount(%arg2) # ctx_data.current_counter = iv
 
-   PRECOMPUTE %xmm1 %xmm2 %xmm3 %xmm4 %xmm5 %xmm6 %xmm7
+   PRECOMPUTE \SUBKEY, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
movdqa HashKey(%arg2), %xmm13
 
-   CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
-   %xmm5 %xmm6
+   CALC_AAD_HASH %xmm13, \AAD, \AADLEN, %xmm0, %xmm1, %xmm2, %xmm3, \
+   %xmm4, %xmm5, %xmm6
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -435,7 +435,7 @@ _multiple_of_16_bytes_\@:
 # GCM_COMPLETE Finishes update of tag of last partial block
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
-.macro GCM_COMPLETE
+.macro GCM_COMPLETE AUTHTAG AUTHTAGLEN
movdqu AadHash(%arg2), %xmm8
movdqu HashKey(%arg2), %xmm13
 
@@ -466,8 +466,8 @@ _partial_done\@:
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
-   mov arg10, %r10 # %r10 = authTag
-   mov arg11, %r11# %r11 = auth_tag_len
+   mov \AUTHTAG, %r10 # %r10 = authTag
+   mov \AUTHTAGLEN, %r11# %r11 = auth_tag_len
cmp $16, %r11
je  _T_16_\@
cmp $8, %r11
@@ -599,11 +599,11 @@ _done_read_partial_block_\@:
 
 # CALC_AAD_HASH: Calculates the hash of the data which will not be encrypted.
 # clobbers r10-11, xmm14
-.macro CALC_AAD_HASH HASHKEY TMP1 TMP2 TMP3 TMP4 TMP5 \
+.macro CALC_AAD_HASH HASHKEY AAD AADLEN TMP1 TMP2 TMP3 TMP4 TMP5 \
TMP6 TMP7
MOVADQ SHUF_MASK(%rip), %xmm14
-   movarg8, %r10   # %r10 = AAD
-   movarg9, %r11   # %r11 = aadLen
+   mov\AAD, %r10   # %r10 = AAD
+   mov\AADLEN, %r11# %r11 = aadLen
pxor   \TMP7, \TMP7
pxor   \TMP6, \TMP6
 
@@ -1103,18 +1103,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 operation
mov   keysize,%eax
shr   $2,%eax   # 128->4, 192->6, 256->8
sub   $4,%eax   # 128->0, 192->2, 256->4
-   jzaes_loop_par_enc_done
+   jzaes_loop_par_enc_done\@
 
-aes_loop_par_enc:
+aes_loop_par_enc\@:
MOVADQ(%r10),\TMP3
 .irpc  index, 1234
AESENC\TMP3, %xmm\index
 .endr
add   $16,%r10
sub   $1,%eax
-   jnz   aes_loop_par_enc
+   jnz   aes_loop_par_enc\@
 
-aes_loop_par_enc_done:
+aes_loop_par_enc_done\@:
MOVADQ(%r10), \TMP3
AESENCLAST \TMP3, \XMM1   # Round 10
AESENCLAST \TMP3, \XMM2
@@ -1311,18 +1311,18 @@ TMP6 XMM0 XMM1 XMM2 XMM3 XMM4 XMM5 XMM6 XMM7 XMM8 operation
mov   keysize,%eax
shr   $2,%eax   # 128->4, 192->6, 256->8
sub   $4,%eax   # 128->0, 192->2, 256->4
- 

[PATCH 05/14] x86/crypto: aesni: Merge encode and decode to GCM_ENC_DEC macro

2018-02-12 Thread Dave Watson
Make a macro for the main encode/decode routine.  Only a small handful
of lines differ for enc and dec.   This will also become the main
scatter/gather update routine.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 293 +++---
 1 file changed, 114 insertions(+), 179 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 529c542..8021fd1 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -222,6 +222,118 @@ ALL_F:  .octa 0x
mov %r13, %r12
 .endm
 
+# GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
+# struct has been initialized by GCM_INIT.
+# Requires the input data be at least 1 byte long because of READ_PARTIAL_BLOCK
+# Clobbers rax, r10-r13, and xmm0-xmm15
+.macro GCM_ENC_DEC operation
+   # Encrypt/Decrypt first few blocks
+
+   and $(3<<4), %r12
+   jz  _initial_num_blocks_is_0_\@
+   cmp $(2<<4), %r12
+   jb  _initial_num_blocks_is_1_\@
+   je  _initial_num_blocks_is_2_\@
+_initial_num_blocks_is_3_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 5, 678, \operation
+   sub $48, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_2_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 6, 78, \operation
+   sub $32, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_1_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 7, 8, \operation
+   sub $16, %r13
+   jmp _initial_blocks_\@
+_initial_num_blocks_is_0_\@:
+   INITIAL_BLOCKS_ENC_DEC  %xmm9, %xmm10, %xmm13, %xmm11, %xmm12, %xmm0, \
+%xmm1, %xmm2, %xmm3, %xmm4, %xmm8, %xmm5, %xmm6, 8, 0, \operation
+_initial_blocks_\@:
+
+   # Main loop - Encrypt/Decrypt remaining blocks
+
+   cmp $0, %r13
+   je  _zero_cipher_left_\@
+   sub $64, %r13
+   je  _four_cipher_left_\@
+_crypt_by_4_\@:
+   GHASH_4_ENCRYPT_4_PARALLEL_\operation   %xmm9, %xmm10, %xmm11, %xmm12, \
+   %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, \
+   %xmm7, %xmm8, enc
+   add $64, %r11
+   sub $64, %r13
+   jne _crypt_by_4_\@
+_four_cipher_left_\@:
+   GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
+%xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
+_zero_cipher_left_\@:
+   mov %arg4, %r13
+   and $15, %r13   # %r13 = arg4 (mod 16)
+   je  _multiple_of_16_bytes_\@
+
+   # Handle the last <16 Byte block separately
+   paddd ONE(%rip), %xmm0# INCR CNT to get Yn
+movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10, %xmm0
+
+   ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
+
+   lea (%arg3,%r11,1), %r10
+   mov %r13, %r12
+   READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
+
+   lea ALL_F+16(%rip), %r12
+   sub %r13, %r12
+.ifc \operation, dec
+   movdqa  %xmm1, %xmm2
+.endif
+   pxor%xmm1, %xmm0# XOR Encrypt(K, Yn)
+   movdqu  (%r12), %xmm1
+   # get the appropriate mask to mask out top 16-r13 bytes of xmm0
+   pand%xmm1, %xmm0# mask out top 16-r13 bytes of xmm0
+.ifc \operation, dec
+   pand%xmm1, %xmm2
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10 ,%xmm2
+
+   pxor %xmm2, %xmm8
+.else
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10,%xmm0
+
+   pxor%xmm0, %xmm8
+.endif
+
+   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+.ifc \operation, enc
+   # GHASH computation for the last <16 byte block
+   movdqa SHUF_MASK(%rip), %xmm10
+   # shuffle xmm0 back to output as ciphertext
+   PSHUFB_XMM %xmm10, %xmm0
+.endif
+
+   # Output %r13 bytes
+   MOVQ_R64_XMM %xmm0, %rax
+   cmp $8, %r13
+   jle _less_than_8_bytes_left_\@
+   mov %rax, (%arg2 , %r11, 1)
+   add $8, %r11
+   psrldq $8, %xmm0
+   MOVQ_R64_XMM %xmm0, %rax
+   sub $8, %r13
+_less_than_8_bytes_left_\@:
+   mov %al,  (%arg2, %r11, 1)
+   add $1, %r11
+   shr $8, %rax
+   sub $1, %r13
+   jne _less_than_8_bytes_left_\@
+_multiple_of_16_bytes_\@:
+.endm
+
 # GCM_COMPLETE Finishes update of tag of last partial block
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
@@ -1245,93 +1357,7 @@ ENTRY(aesni_gcm_dec)
FUNC_SAVE
 
GCM_INIT
-
-# Decrypt first few blocks
-
-   and $(3<<4), %r12
-   jz _initial_num_blocks_is_0_decrypt
-   cmp $(2<<4), %r12
-   jb 

[PATCH 12/14] x86/crypto: aesni: Add fast path for > 16 byte update

2018-02-12 Thread Dave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount.  Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL_BLOCK
macro introduced in b20209c91e2 for the average case.
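
In C terms the idea is roughly the sketch below (hypothetical helper,
for illustration only; the assembly does the same thing with one
unaligned 16-byte load followed by a PSHUFB through the SHIFT_MASK
table).

/*
 * Illustration only: read the final partial block of 'rem' bytes
 * (0 < rem < 16) from a message of 'total_len' >= 16 bytes by loading
 * the 16 bytes that end exactly at the end of the message and shifting
 * the result down by 16 - rem bytes, instead of assembling the block
 * byte by byte.
 */
static void read_partial_block_fast(const u8 *data, size_t total_len,
				    size_t rem, u8 block[16])
{
	u8 tmp[16];

	memcpy(tmp, data + total_len - 16, 16);  /* one 16-byte load */
	memset(block, 0, 16);
	memcpy(block, tmp + 16 - rem, rem);      /* "shift right" by 16 - rem */
}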

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 25 +
 1 file changed, 25 insertions(+)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 398bd2237f..b941952 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -355,12 +355,37 @@ _zero_cipher_left_\@:
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
movdqu %xmm0, PBlockEncKey(%arg2)
 
+   cmp $16, %arg5
+   jge _large_enough_update_\@
+
lea (%arg4,%r11,1), %r10
mov %r13, %r12
READ_PARTIAL_BLOCK %r10 %r12 %xmm2 %xmm1
+   jmp _data_read_\@
+
+_large_enough_update_\@:
+   sub $16, %r11
+   add %r13, %r11
+
+   # receive the last <16 Byte block
+   movdqu  (%arg4, %r11, 1), %xmm1
 
+   sub %r13, %r11
+   add $16, %r11
+
+   lea SHIFT_MASK+16(%rip), %r12
+   # adjust the shuffle mask pointer to be able to shift 16-r13 bytes
+   # (r13 is the number of bytes in plaintext mod 16)
+   sub %r13, %r12
+   # get the appropriate shuffle mask
+   movdqu  (%r12), %xmm2
+   # shift right 16-r13 bytes
+   PSHUFB_XMM  %xmm2, %xmm1
+
+_data_read_\@:
lea ALL_F+16(%rip), %r12
sub %r13, %r12
+
 .ifc \operation, dec
movdqa  %xmm1, %xmm2
 .endif
-- 
2.9.5



[PATCH 07/14] x86/crypto: aesni: Split AAD hash calculation to separate macro

2018-02-12 Thread Dave Watson
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 71 ---
 1 file changed, 43 insertions(+), 28 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 6c5a80d..58bbfac 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -229,6 +229,10 @@ ALL_F:  .octa 0x
mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
and $-16, %r13
mov %r13, %r12
+
+   CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
+   %xmm5 %xmm6
+   mov %r13, %r12
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -496,51 +500,62 @@ _read_next_byte_lt8_\@:
 _done_read_partial_block_\@:
 .endm
 
-/*
-* if a = number of total plaintext bytes
-* b = floor(a/16)
-* num_initial_blocks = b mod 4
-* encrypt the initial num_initial_blocks blocks and apply ghash on
-* the ciphertext
-* %r10, %r11, %r12, %rax, %xmm5, %xmm6, %xmm7, %xmm8, %xmm9 registers
-* are clobbered
-* arg1, %arg3, %arg4, %r14 are used as a pointer only, not modified
-*/
-
-
-.macro INITIAL_BLOCKS_ENC_DEC TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 XMM1 \
-XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
-MOVADQ SHUF_MASK(%rip), %xmm14
-   movarg8, %r10   # %r10 = AAD
-   movarg9, %r11   # %r11 = aadLen
-   pxor   %xmm\i, %xmm\i
-   pxor   \XMM2, \XMM2
+# CALC_AAD_HASH: Calculates the hash of the data which will not be encrypted.
+# clobbers r10-11, xmm14
+.macro CALC_AAD_HASH HASHKEY TMP1 TMP2 TMP3 TMP4 TMP5 \
+   TMP6 TMP7
+   MOVADQ SHUF_MASK(%rip), %xmm14
+   movarg8, %r10   # %r10 = AAD
+   movarg9, %r11   # %r11 = aadLen
+   pxor   \TMP7, \TMP7
+   pxor   \TMP6, \TMP6
 
cmp$16, %r11
jl _get_AAD_rest\@
 _get_AAD_blocks\@:
-   movdqu (%r10), %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   %xmm\i, \XMM2
-   GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+   movdqu (%r10), \TMP7
+   PSHUFB_XMM   %xmm14, \TMP7 # byte-reflect the AAD data
+   pxor   \TMP7, \TMP6
+   GHASH_MUL  \TMP6, \HASHKEY, \TMP1, \TMP2, \TMP3, \TMP4, \TMP5
add$16, %r10
sub$16, %r11
cmp$16, %r11
jge_get_AAD_blocks\@
 
-   movdqu \XMM2, %xmm\i
+   movdqu \TMP6, \TMP7
 
/* read the last <16B of AAD */
 _get_AAD_rest\@:
cmp$0, %r11
je _get_AAD_done\@
 
-   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   \XMM2, %xmm\i
-   GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
+   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, \TMP7
+   PSHUFB_XMM   %xmm14, \TMP7 # byte-reflect the AAD data
+   pxor   \TMP6, \TMP7
+   GHASH_MUL  \TMP7, \HASHKEY, \TMP1, \TMP2, \TMP3, \TMP4, \TMP5
+   movdqu \TMP7, \TMP6
 
 _get_AAD_done\@:
+   movdqu \TMP6, AadHash(%arg2)
+.endm
+
+/*
+* if a = number of total plaintext bytes
+* b = floor(a/16)
+* num_initial_blocks = b mod 4
+* encrypt the initial num_initial_blocks blocks and apply ghash on
+* the ciphertext
+* %r10, %r11, %r12, %rax, %xmm5, %xmm6, %xmm7, %xmm8, %xmm9 registers
+* are clobbered
+* arg1, %arg2, %arg3, %r14 are used as a pointer only, not modified
+*/
+
+
+.macro INITIAL_BLOCKS_ENC_DEC TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 XMM1 \
+   XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
+
+   movdqu AadHash(%arg2), %xmm\i   # XMM0 = Y0
+
xor    %r11, %r11 # initialise the data pointer offset as zero
# start AES for num_initial_blocks blocks
 
-- 
2.9.5



[PATCH 08/14] x86/crypto: aesni: Fill in new context data structures

2018-02-12 Thread Dave Watson
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
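
For orientation, the state carried between calls corresponds roughly to the
per-request context implied by the offsets used below (AadHash, AadLen, InLen,
PBlockLen, PBlockEncKey, OrigIV, CurCount).  The layout shown here is an
illustrative sketch only; the real structure is the gcm_context_data
introduced elsewhere in this series and may differ:

struct gcm_context_data_sketch {                /* illustrative only */
        uint8_t  aad_hash[16];                  /* AadHash: running GHASH value */
        uint64_t aad_length;                    /* AadLen: total AAD bytes, set at init */
        uint64_t in_length;                     /* InLen: plaintext/ciphertext bytes so far */
        uint64_t partial_block_length;          /* PBlockLen: bytes of an unfinished block */
        uint8_t  partial_block_enc_key[16];     /* PBlockEncKey: E(K, Yn) for that block */
        uint8_t  orig_IV[16];                   /* OrigIV: Y0, needed again for the tag */
        uint8_t  current_counter[16];           /* CurCount: Yi, the running CTR counter */
};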

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 51 ++-
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 58bbfac..aa82493 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -204,6 +204,21 @@ ALL_F:  .octa 0x
 # GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
 # Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
 .macro GCM_INIT
+
+   mov arg9, %r11
+   mov %r11, AadLen(%arg2) # ctx_data.aad_length = aad_length
+   xor %r11, %r11
+   mov %r11, InLen(%arg2) # ctx_data.in_length = 0
+   mov %r11, PBlockLen(%arg2) # ctx_data.partial_block_length = 0
+   mov %r11, PBlockEncKey(%arg2) # ctx_data.partial_block_enc_key = 0
+   mov %arg6, %rax
+   movdqu (%rax), %xmm0
+   movdqu %xmm0, OrigIV(%arg2) # ctx_data.orig_IV = iv
+
+   movdqa  SHUF_MASK(%rip), %xmm2
+   PSHUFB_XMM %xmm2, %xmm0
+   movdqu %xmm0, CurCount(%arg2) # ctx_data.current_counter = iv
+
mov arg7, %r12
movdqu  (%r12), %xmm13
movdqa  SHUF_MASK(%rip), %xmm2
@@ -226,13 +241,9 @@ ALL_F:  .octa 0x
pandPOLY(%rip), %xmm2
pxor%xmm2, %xmm13
movdqa  %xmm13, HashKey(%rsp)
-   mov %arg5, %r13 # %xmm13 holds HashKey<<1 (mod poly)
-   and $-16, %r13
-   mov %r13, %r12
 
CALC_AAD_HASH %xmm13 %xmm0 %xmm1 %xmm2 %xmm3 %xmm4 \
%xmm5 %xmm6
-   mov %r13, %r12
 .endm
 
 # GCM_ENC_DEC Encodes/Decodes given data. Assumes that the passed gcm_context
@@ -240,6 +251,12 @@ ALL_F:  .octa 0x
 # Requires the input data be at least 1 byte long because of READ_PARTIAL_BLOCK
 # Clobbers rax, r10-r13, and xmm0-xmm15
 .macro GCM_ENC_DEC operation
+   movdqu AadHash(%arg2), %xmm8
+   movdqu HashKey(%rsp), %xmm13
+   add %arg5, InLen(%arg2)
+   mov %arg5, %r13 # save the number of bytes
+   and $-16, %r13  # %r13 = %r13 - (%r13 mod 16)
+   mov %r13, %r12
# Encrypt/Decrypt first few blocks
 
and $(3<<4), %r12
@@ -284,16 +301,23 @@ _four_cipher_left_\@:
GHASH_LAST_4%xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, \
 %xmm15, %xmm1, %xmm2, %xmm3, %xmm4, %xmm8
 _zero_cipher_left_\@:
+   movdqu %xmm8, AadHash(%arg2)
+   movdqu %xmm0, CurCount(%arg2)
+
mov %arg5, %r13
and $15, %r13   # %r13 = arg5 (mod 16)
je  _multiple_of_16_bytes_\@
 
+   mov %r13, PBlockLen(%arg2)
+
# Handle the last <16 Byte block separately
paddd ONE(%rip), %xmm0# INCR CNT to get Yn
+   movdqu %xmm0, CurCount(%arg2)
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm0
 
ENCRYPT_SINGLE_BLOCK%xmm0, %xmm1# Encrypt(K, Yn)
+   movdqu %xmm0, PBlockEncKey(%arg2)
 
lea (%arg4,%r11,1), %r10
mov %r13, %r12
@@ -322,6 +346,7 @@ _zero_cipher_left_\@:
 .endif
 
GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+   movdqu %xmm8, AadHash(%arg2)
 .ifc \operation, enc
# GHASH computation for the last <16 byte block
movdqa SHUF_MASK(%rip), %xmm10
@@ -351,11 +376,15 @@ _multiple_of_16_bytes_\@:
 # Output: Authorization Tag (AUTH_TAG)
 # Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
 .macro GCM_COMPLETE
-   mov arg9, %r12# %r13 = aadLen (number of bytes)
+   movdqu AadHash(%arg2), %xmm8
+   movdqu HashKey(%rsp), %xmm13
+   mov AadLen(%arg2), %r12  # %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg5 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg5, %xmm1
+   mov InLen(%arg2), %r12
+   shl $3, %r12  # len(C) in bits (*128)
+   MOVQ_R64_XMM%r12, %xmm1
+
pslldq  $8, %xmm15# %xmm15 = len(A)||0x
pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
pxor%xmm15, %xmm8
@@ -364,8 +393,7 @@ _multiple_of_16_bytes_\@:
movdqa SHUF_MASK(%rip), %xmm10
PSHUFB_XMM %xmm10, %xmm8
 
-   mov %arg6, %rax   # %rax = *Y0
-   movdqu  (%rax), %xmm0 # %xmm0 = Y0
+   movdqu OrigIV(%arg2), %xmm0   # %xmm0 = Y0
ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
pxor%xmm8, %xmm0
 _return_T_\@:
@@ -553,15 +581,14 @@ 

[PATCH 09/14] x86/crypto: aesni: Move ghash_mul to GCM_COMPLETE

2018-02-12 Thread Dave Watson
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
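
In rough C terms (the types and helpers below are illustrative stand-ins, not
the kernel's actual APIs), the step that moves into GCM_COMPLETE is:

#include <stdint.h>

struct u128 { uint64_t lo, hi; };                       /* stand-in GHASH state */
void ghash_mul(struct u128 *x, const struct u128 *h);   /* GF(2^128) multiply, provided elsewhere */
struct gcm_ctx { struct u128 aad_hash; uint64_t partial_block_length; }; /* stand-in */

static void gcm_complete_partial_sketch(struct gcm_ctx *ctx, const struct u128 *hash_key)
{
        if (ctx->partial_block_length)                  /* last update ended mid-block */
                ghash_mul(&ctx->aad_hash, hash_key);    /* finish folding it into the hash */
        /* ...then len(A)||len(C) is hashed and XORed with E(K, Y0) as before */
}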

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index aa82493..37b1cee 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -345,7 +345,6 @@ _zero_cipher_left_\@:
pxor%xmm0, %xmm8
 .endif
 
-   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
movdqu %xmm8, AadHash(%arg2)
 .ifc \operation, enc
# GHASH computation for the last <16 byte block
@@ -378,6 +377,15 @@ _multiple_of_16_bytes_\@:
 .macro GCM_COMPLETE
movdqu AadHash(%arg2), %xmm8
movdqu HashKey(%rsp), %xmm13
+
+   mov PBlockLen(%arg2), %r12
+
+   cmp $0, %r12
+   je _partial_done\@
+
+   GHASH_MUL %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+
+_partial_done\@:
mov AadLen(%arg2), %r12  # %r13 = aadLen (number of bytes)
shl $3, %r12  # convert into number of bits
movd%r12d, %xmm15 # len(A) in %xmm15
-- 
2.9.5



[PATCH 00/14] x86/crypto gcmaes SSE scatter/gather support

2018-02-12 Thread Dave Watson
This patch set refactors the x86 aes/gcm SSE crypto routines to
support true scatter/gather by adding gcm_enc/dec_update methods.

The layout is:

* First 5 patches refactor the code to use macros, so changes only
  need to be applied once for encode and decode.  There should be no
  functional changes.

* The next 6 patches introduce a gcm_context structure to be passed
  between scatter/gather calls to maintain state.  The struct is also
  used as scratch space for the existing enc/dec routines.

* The last 2 set up the asm function entry points for scatter/gather
  support, and then call the new routines per buffer in the passed-in
  sglist in aesni-intel_glue (a rough sketch of the call pattern follows).
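
As a rough illustration only, the intended per-request call pattern looks
something like the following kernel-flavored sketch (the function names,
prototypes and the sg_buf type are placeholders, not the exact entry points
added by the later patches):

struct sg_buf { u8 *dst; const u8 *src; unsigned long len; };  /* placeholder */

static void gcmaes_encrypt_sg_sketch(void *aes_ctx, struct gcm_context_data *data,
                                     u8 *iv, u8 *hash_subkey,
                                     const u8 *aad, unsigned long aad_len,
                                     struct sg_buf *bufs, int nbufs,
                                     u8 *auth_tag, unsigned long auth_tag_len)
{
        int i;

        gcm_init(aes_ctx, data, iv, hash_subkey, aad, aad_len);  /* once per request */
        for (i = 0; i < nbufs; i++)                              /* once per sg buffer */
                gcm_enc_update(aes_ctx, data, bufs[i].dst, bufs[i].src, bufs[i].len);
        gcm_finalize(aes_ctx, data, auth_tag, auth_tag_len);     /* emits the auth tag */
}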

Testing: 
asm itself fuzz tested vs. existing code and isa-l asm.
Ran libkcapi test suite, passes.
Passes my TLS tests.
IPSec or testing of other aesni users would be appreciated.

perf of large (16 KB message) TLS sends, sg vs. no sg:

no-sg

33287255597  cycles  
53702871176  instructions

43.47%   _crypt_by_4
17.83%   memcpy
16.36%   aes_loop_par_enc_done

sg

27568944591  cycles 
54580446678  instructions

49.87%   _crypt_by_4
17.40%   aes_loop_par_enc_done
1.79%aes_loop_initial_5416
1.52%aes_loop_initial_4974
1.27%gcmaes_encrypt_sg.constprop.15


Dave Watson (14):
  x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC
  x86/crypto: aesni: Macro-ify func save/restore
  x86/crypto: aesni: Add GCM_INIT macro
  x86/crypto: aesni: Add GCM_COMPLETE macro
  x86/crypto: aesni: Merge encode and decode to GCM_ENC_DEC macro
  x86/crypto: aesni: Introduce gcm_context_data
  x86/crypto: aesni: Split AAD hash calculation to separate macro
  x86/crypto: aesni: Fill in new context data structures
  x86/crypto: aesni: Move ghash_mul to GCM_COMPLETE
  x86/crypto: aesni: Move HashKey computation from stack to gcm_context
  x86/crypto: aesni: Introduce partial block macro
  x86/crypto: aesni: Add fast path for > 16 byte update
  x86/crypto: aesni: Introduce scatter/gather asm function stubs
  x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

 arch/x86/crypto/aesni-intel_asm.S  | 1414 ++--
 arch/x86/crypto/aesni-intel_glue.c |  263 ++-
 2 files changed, 932 insertions(+), 745 deletions(-)

-- 
2.9.5



[PATCH 03/14] x86/crypto: aesni: Add GCM_INIT macro

2018-02-12 Thread Dave Watson
Reduce code duplication by introducing the GCM_INIT macro.  This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 84 +++
 1 file changed, 33 insertions(+), 51 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 39b42b1..b9fe2ab 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -191,6 +191,37 @@ ALL_F:  .octa 0x
pop %r12
 .endm
 
+
+# GCM_INIT initializes a gcm_context struct to prepare for encoding/decoding.
+# Clobbers rax, r10-r13 and xmm0-xmm6, %xmm13
+.macro GCM_INIT
+   mov %arg6, %r12
+   movdqu  (%r12), %xmm13
+   movdqa  SHUF_MASK(%rip), %xmm2
+   PSHUFB_XMM %xmm2, %xmm13
+
+   # precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
+
+   movdqa  %xmm13, %xmm2
+   psllq   $1, %xmm13
+   psrlq   $63, %xmm2
+   movdqa  %xmm2, %xmm1
+   pslldq  $8, %xmm2
+   psrldq  $8, %xmm1
+   por %xmm2, %xmm13
+
+   # reduce HashKey<<1
+
+   pshufd  $0x24, %xmm1, %xmm2
+   pcmpeqd TWOONE(%rip), %xmm2
+   pandPOLY(%rip), %xmm2
+   pxor%xmm2, %xmm13
+   movdqa  %xmm13, HashKey(%rsp)
+   mov %arg4, %r13 # %xmm13 holds HashKey<<1 (mod 
poly)
+   and $-16, %r13
+   mov %r13, %r12
+.endm
+
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
@@ -1151,36 +1182,11 @@ _esb_loop_\@:
 */
 ENTRY(aesni_gcm_dec)
FUNC_SAVE
-   mov %arg6, %r12
-   movdqu  (%r12), %xmm13# %xmm13 = HashKey
-movdqa  SHUF_MASK(%rip), %xmm2
-   PSHUFB_XMM %xmm2, %xmm13
-
-
-# Precompute HashKey<<1 (mod poly) from the hash key (required for GHASH)
-
-   movdqa  %xmm13, %xmm2
-   psllq   $1, %xmm13
-   psrlq   $63, %xmm2
-   movdqa  %xmm2, %xmm1
-   pslldq  $8, %xmm2
-   psrldq  $8, %xmm1
-   por %xmm2, %xmm13
-
-# Reduction
-
-   pshufd  $0x24, %xmm1, %xmm2
-   pcmpeqd TWOONE(%rip), %xmm2
-   pandPOLY(%rip), %xmm2
-   pxor%xmm2, %xmm13 # %xmm13 holds the HashKey<<1 (mod poly)
 
+   GCM_INIT
 
 # Decrypt first few blocks
 
-   movdqa %xmm13, HashKey(%rsp)   # store HashKey<<1 (mod poly)
-   mov %arg4, %r13# save the number of bytes of plaintext/ciphertext
-   and $-16, %r13  # %r13 = %r13 - (%r13 mod 16)
-   mov %r13, %r12
and $(3<<4), %r12
jz _initial_num_blocks_is_0_decrypt
cmp $(2<<4), %r12
@@ -1402,32 +1408,8 @@ ENDPROC(aesni_gcm_dec)
 ***/
 ENTRY(aesni_gcm_enc)
FUNC_SAVE
-   mov %arg6, %r12
-   movdqu  (%r12), %xmm13
-movdqa  SHUF_MASK(%rip), %xmm2
-   PSHUFB_XMM %xmm2, %xmm13
-
-# precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
-
-   movdqa  %xmm13, %xmm2
-   psllq   $1, %xmm13
-   psrlq   $63, %xmm2
-   movdqa  %xmm2, %xmm1
-   pslldq  $8, %xmm2
-   psrldq  $8, %xmm1
-   por %xmm2, %xmm13
-
-# reduce HashKey<<1
-
-   pshufd  $0x24, %xmm1, %xmm2
-   pcmpeqd TWOONE(%rip), %xmm2
-   pandPOLY(%rip), %xmm2
-   pxor%xmm2, %xmm13
-   movdqa  %xmm13, HashKey(%rsp)
-   mov %arg4, %r13# %xmm13 holds HashKey<<1 (mod poly)
-   and $-16, %r13
-   mov %r13, %r12
 
+   GCM_INIT
 # Encrypt first few blocks
 
and $(3<<4), %r12
-- 
2.9.5



[PATCH 04/14] x86/crypto: aesni: Add GCM_COMPLETE macro

2018-02-12 Thread Dave Watson
Merge the encrypt and decrypt tag calculations into the GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
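
For reference, the tag computation that GCM_COMPLETE performs is roughly the
following (a C sketch only; struct u128, ghash_mul() and aes_encrypt_block()
are illustrative stand-ins, and the PSHUFB byte-reflection and exact lane
placement are glossed over):

#include <stdint.h>
#include <string.h>

struct u128 { uint64_t lo, hi; };                       /* stand-in GHASH state */
void ghash_mul(struct u128 *x, const struct u128 *h);   /* GF(2^128) multiply, provided elsewhere */
void aes_encrypt_block(const void *key, const uint8_t *in, uint8_t *out);  /* stand-in */

static void gcm_complete_sketch(const struct u128 *hash_key, struct u128 *ghash,
                                uint64_t aad_len, uint64_t text_len,
                                const void *aes_key, const uint8_t *y0,
                                uint8_t *auth_tag, unsigned long auth_tag_len)
{
        struct u128 len_block, tag;

        /* fold len(A) || len(C), both in bits, into the running GHASH */
        len_block.hi = aad_len * 8;
        len_block.lo = text_len * 8;
        ghash->lo ^= len_block.lo;
        ghash->hi ^= len_block.hi;
        ghash_mul(ghash, hash_key);

        /* tag = E(K, Y0) XOR final GHASH, truncated to the requested length */
        aes_encrypt_block(aes_key, y0, (uint8_t *)&tag);
        tag.lo ^= ghash->lo;
        tag.hi ^= ghash->hi;
        memcpy(auth_tag, &tag, auth_tag_len > 16 ? 16 : auth_tag_len);
}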

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 172 ++
 1 file changed, 63 insertions(+), 109 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index b9fe2ab..529c542 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -222,6 +222,67 @@ ALL_F:  .octa 0x
mov %r13, %r12
 .endm
 
+# GCM_COMPLETE Finishes update of tag of last partial block
+# Output: Authorization Tag (AUTH_TAG)
+# Clobbers rax, r10-r12, and xmm0, xmm1, xmm5-xmm15
+.macro GCM_COMPLETE
+   mov arg8, %r12# %r13 = aadLen (number of bytes)
+   shl $3, %r12  # convert into number of bits
+   movd%r12d, %xmm15 # len(A) in %xmm15
+   shl $3, %arg4 # len(C) in bits (*128)
+   MOVQ_R64_XMM%arg4, %xmm1
+   pslldq  $8, %xmm15# %xmm15 = len(A)||0x
+   pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
+   pxor%xmm15, %xmm8
+   GHASH_MUL   %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
+   # final GHASH computation
+   movdqa SHUF_MASK(%rip), %xmm10
+   PSHUFB_XMM %xmm10, %xmm8
+
+   mov %arg5, %rax   # %rax = *Y0
+   movdqu  (%rax), %xmm0 # %xmm0 = Y0
+   ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
+   pxor%xmm8, %xmm0
+_return_T_\@:
+   mov arg9, %r10 # %r10 = authTag
+   mov arg10, %r11# %r11 = auth_tag_len
+   cmp $16, %r11
+   je  _T_16_\@
+   cmp $8, %r11
+   jl  _T_4_\@
+_T_8_\@:
+   MOVQ_R64_XMM%xmm0, %rax
+   mov %rax, (%r10)
+   add $8, %r10
+   sub $8, %r11
+   psrldq  $8, %xmm0
+   cmp $0, %r11
+   je  _return_T_done_\@
+_T_4_\@:
+   movd%xmm0, %eax
+   mov %eax, (%r10)
+   add $4, %r10
+   sub $4, %r11
+   psrldq  $4, %xmm0
+   cmp $0, %r11
+   je  _return_T_done_\@
+_T_123_\@:
+   movd%xmm0, %eax
+   cmp $2, %r11
+   jl  _T_1_\@
+   mov %ax, (%r10)
+   cmp $2, %r11
+   je  _return_T_done_\@
+   add $2, %r10
+   sar $16, %eax
+_T_1_\@:
+   mov %al, (%r10)
+   jmp _return_T_done_\@
+_T_16_\@:
+   movdqu  %xmm0, (%r10)
+_return_T_done_\@:
+.endm
+
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
 *
@@ -1271,61 +1332,7 @@ _less_than_8_bytes_left_decrypt:
sub $1, %r13
jne _less_than_8_bytes_left_decrypt
 _multiple_of_16_bytes_decrypt:
-   mov arg8, %r12# %r13 = aadLen (number of bytes)
-   shl $3, %r12  # convert into number of bits
-   movd%r12d, %xmm15 # len(A) in %xmm15
-   shl $3, %arg4 # len(C) in bits (*128)
-   MOVQ_R64_XMM%arg4, %xmm1
-   pslldq  $8, %xmm15# %xmm15 = len(A)||0x
-   pxor%xmm1, %xmm15 # %xmm15 = len(A)||len(C)
-   pxor%xmm15, %xmm8
-   GHASH_MUL   %xmm8, %xmm13, %xmm9, %xmm10, %xmm11, %xmm5, %xmm6
-# final GHASH computation
-movdqa SHUF_MASK(%rip), %xmm10
-   PSHUFB_XMM %xmm10, %xmm8
-
-   mov %arg5, %rax   # %rax = *Y0
-   movdqu  (%rax), %xmm0 # %xmm0 = Y0
-   ENCRYPT_SINGLE_BLOCK%xmm0,  %xmm1 # E(K, Y0)
-   pxor%xmm8, %xmm0
-_return_T_decrypt:
-   mov arg9, %r10# %r10 = authTag
-   mov arg10, %r11   # %r11 = auth_tag_len
-   cmp $16, %r11
-   je  _T_16_decrypt
-   cmp $8, %r11
-   jl  _T_4_decrypt
-_T_8_decrypt:
-   MOVQ_R64_XMM%xmm0, %rax
-   mov %rax, (%r10)
-   add $8, %r10
-   sub $8, %r11
-   psrldq  $8, %xmm0
-   cmp $0, %r11
-   je  _return_T_done_decrypt
-_T_4_decrypt:
-   movd%xmm0, %eax
-   mov %eax, (%r10)
-   add $4, %r10
-   sub $4, %r11
-   psrldq  $4, %xmm0
-   cmp $0, %r11
-   je  _return_T_done_decrypt
-_T_123_decrypt:
-   movd%xmm0, %eax
-   cmp $2, %r11
-   jl  _T_1_decrypt
-   mov %ax, (%r10)
-   cmp $2, %r11
-   je  _return_T_done_decrypt
-   add $2, %r10
-   sar $16, %eax
-_T_1_decrypt:
-   mov %al, (%r10)
-   jmp _return_T_done_decrypt
-_T_16_decrypt:
-   movdqu  %xmm0, (%r10)
-_return_T_done_decrypt:
+   GCM_COMPLETE
FUNC_RESTORE
ret
 

[PATCH 02/14] x86/crypto: aesni: Macro-ify func save/restore

2018-02-12 Thread Dave Watson
Macro-ify function save and restore.  These will be used in new functions
added for scatter/gather update operations.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 53 ++-
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 48911fe..39b42b1 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -170,6 +170,26 @@ ALL_F:  .octa 0x
 #define TKEYP  T1
 #endif
 
+.macro FUNC_SAVE
+   push%r12
+   push%r13
+   push%r14
+   mov %rsp, %r14
+#
+# states of %xmm registers %xmm6:%xmm15 not saved
+# all %xmm registers are clobbered
+#
+   sub $VARIABLE_OFFSET, %rsp
+   and $~63, %rsp
+.endm
+
+
+.macro FUNC_RESTORE
+   mov %r14, %rsp
+   pop %r14
+   pop %r13
+   pop %r12
+.endm
 
 #ifdef __x86_64__
 /* GHASH_MUL MACRO to implement: Data*HashKey mod (128,127,126,121,0)
@@ -1130,16 +1150,7 @@ _esb_loop_\@:
 *
 */
 ENTRY(aesni_gcm_dec)
-   push%r12
-   push%r13
-   push%r14
-   mov %rsp, %r14
-/*
-* states of %xmm registers %xmm6:%xmm15 not saved
-* all %xmm registers are clobbered
-*/
-   sub $VARIABLE_OFFSET, %rsp
-   and $~63, %rsp# align rsp to 64 bytes
+   FUNC_SAVE
mov %arg6, %r12
movdqu  (%r12), %xmm13# %xmm13 = HashKey
 movdqa  SHUF_MASK(%rip), %xmm2
@@ -1309,10 +1320,7 @@ _T_1_decrypt:
 _T_16_decrypt:
movdqu  %xmm0, (%r10)
 _return_T_done_decrypt:
-   mov %r14, %rsp
-   pop %r14
-   pop %r13
-   pop %r12
+   FUNC_RESTORE
ret
 ENDPROC(aesni_gcm_dec)
 
@@ -1393,22 +1401,12 @@ ENDPROC(aesni_gcm_dec)
 * poly = x^128 + x^127 + x^126 + x^121 + 1
 ***/
 ENTRY(aesni_gcm_enc)
-   push%r12
-   push%r13
-   push%r14
-   mov %rsp, %r14
-#
-# states of %xmm registers %xmm6:%xmm15 not saved
-# all %xmm registers are clobbered
-#
-   sub $VARIABLE_OFFSET, %rsp
-   and $~63, %rsp
+   FUNC_SAVE
mov %arg6, %r12
movdqu  (%r12), %xmm13
 movdqa  SHUF_MASK(%rip), %xmm2
PSHUFB_XMM %xmm2, %xmm13
 
-
 # precompute HashKey<<1 mod poly from the HashKey (required for GHASH)
 
movdqa  %xmm13, %xmm2
@@ -1576,10 +1574,7 @@ _T_1_encrypt:
 _T_16_encrypt:
movdqu  %xmm0, (%r10)
 _return_T_done_encrypt:
-   mov %r14, %rsp
-   pop %r14
-   pop %r13
-   pop %r12
+   FUNC_RESTORE
ret
 ENDPROC(aesni_gcm_enc)
 
-- 
2.9.5



[PATCH 01/14] x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC

2018-02-12 Thread Dave Watson
Use macro operations to merge implementations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.

Use macro counter \@ to simplify implementation.

Signed-off-by: Dave Watson 
---
 arch/x86/crypto/aesni-intel_asm.S | 298 ++
 1 file changed, 48 insertions(+), 250 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 76d8cd4..48911fe 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -275,234 +275,7 @@ _done_read_partial_block_\@:
 */
 
 
-.macro INITIAL_BLOCKS_DEC num_initial_blocks TMP1 TMP2 TMP3 TMP4 TMP5 XMM0 
XMM1 \
-XMM2 XMM3 XMM4 XMMDst TMP6 TMP7 i i_seq operation
-MOVADQ SHUF_MASK(%rip), %xmm14
-   mov    arg7, %r10   # %r10 = AAD
-   mov    arg8, %r11   # %r11 = aadLen
-   pxor   %xmm\i, %xmm\i
-   pxor   \XMM2, \XMM2
-
-   cmp    $16, %r11
-   jl _get_AAD_rest\num_initial_blocks\operation
-_get_AAD_blocks\num_initial_blocks\operation:
-   movdqu (%r10), %xmm\i
-   PSHUFB_XMM %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   %xmm\i, \XMM2
-   GHASH_MUL  \XMM2, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-   add    $16, %r10
-   sub    $16, %r11
-   cmp    $16, %r11
-   jge    _get_AAD_blocks\num_initial_blocks\operation
-
-   movdqu \XMM2, %xmm\i
-
-   /* read the last <16B of AAD */
-_get_AAD_rest\num_initial_blocks\operation:
-   cmp    $0, %r11
-   je _get_AAD_done\num_initial_blocks\operation
-
-   READ_PARTIAL_BLOCK %r10, %r11, \TMP1, %xmm\i
-   PSHUFB_XMM   %xmm14, %xmm\i # byte-reflect the AAD data
-   pxor   \XMM2, %xmm\i
-   GHASH_MUL  %xmm\i, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-
-_get_AAD_done\num_initial_blocks\operation:
-   xor    %r11, %r11 # initialise the data pointer offset as zero
-   # start AES for num_initial_blocks blocks
-
-   mov    %arg5, %rax  # %rax = *Y0
-   movdqu (%rax), \XMM0# XMM0 = Y0
-   PSHUFB_XMM   %xmm14, \XMM0
-
-.if (\i == 5) || (\i == 6) || (\i == 7)
-   MOVADQ  ONE(%RIP),\TMP1
-   MOVADQ  (%arg1),\TMP2
-.irpc index, \i_seq
-   paddd  \TMP1, \XMM0 # INCR Y0
-   movdqa \XMM0, %xmm\index
-   PSHUFB_XMM   %xmm14, %xmm\index  # perform a 16 byte swap
-   pxor   \TMP2, %xmm\index
-.endr
-   lea 0x10(%arg1),%r10
-   mov keysize,%eax
-   shr $2,%eax # 128->4, 192->6, 256->8
-   add $5,%eax   # 128->9, 192->11, 256->13
-
-aes_loop_initial_dec\num_initial_blocks:
-   MOVADQ  (%r10),\TMP1
-.irpc  index, \i_seq
-   AESENC  \TMP1, %xmm\index
-.endr
-   add $16,%r10
-   sub $1,%eax
-   jnz aes_loop_initial_dec\num_initial_blocks
-
-   MOVADQ  (%r10), \TMP1
-.irpc index, \i_seq
-   AESENCLAST \TMP1, %xmm\index # Last Round
-.endr
-.irpc index, \i_seq
-   movdqu (%arg3 , %r11, 1), \TMP1
-   pxor   \TMP1, %xmm\index
-   movdqu %xmm\index, (%arg2 , %r11, 1)
-   # write back plaintext/ciphertext for num_initial_blocks
-   add    $16, %r11
-
-   movdqa \TMP1, %xmm\index
-   PSHUFB_XMM %xmm14, %xmm\index
-# prepare plaintext/ciphertext for GHASH computation
-.endr
-.endif
-
-# apply GHASH on num_initial_blocks blocks
-
-.if \i == 5
-pxor   %xmm5, %xmm6
-   GHASH_MUL  %xmm6, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm6, %xmm7
-   GHASH_MUL  %xmm7, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.elseif \i == 6
-pxor   %xmm6, %xmm7
-   GHASH_MUL  %xmm7, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.elseif \i == 7
-pxor   %xmm7, %xmm8
-   GHASH_MUL  %xmm8, \TMP3, \TMP1, \TMP2, \TMP4, \TMP5, \XMM1
-.endif
-   cmp    $64, %r13
-   jl  _initial_blocks_done\num_initial_blocks\operation
-   # no need for precomputed values
-/*
-*
-* Precomputations for HashKey parallel with encryption of first 4 blocks.
-* Haskey_i_k holds XORed values of the low and high parts of the Haskey_i
-*/
-   MOVADQ ONE(%rip), \TMP1
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM1
-   PSHUFB_XMM  %xmm14, \XMM1# perform a 16 byte swap
-
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM2
-   PSHUFB_XMM  %xmm14, \XMM2# perform a 16 byte swap
-
-   paddd  \TMP1, \XMM0  # INCR Y0
-   MOVADQ \XMM0, \XMM3
-   

Re: [Crypto v4 03/12] support for inline tls

2018-02-12 Thread David Miller
From: Atul Gupta 
Date: Mon, 12 Feb 2018 17:34:28 +0530

> +static int get_tls_prot(struct sock *sk)
> +{
> + struct tls_context *ctx = tls_get_ctx(sk);
> + struct net_device *netdev;
> + struct tls_device *dev;
> +
> + /* Device bound to specific IP */
> + if (inet_sk(sk)->inet_rcv_saddr) {
> + netdev = find_netdev(sk);
> + if (!netdev)
> + goto out;
> +
> + /* Device supports Inline record processing */
> + if (!(netdev->features & NETIF_F_HW_TLS_INLINE))
> + goto out;
> +
> + mutex_lock(_mutex);
> + list_for_each_entry(dev, _list, dev_list) {
> + if (dev->netdev && dev->netdev(dev, netdev))
> + break;
> + }
> + mutex_unlock(_mutex);
> +
> + ctx->tx_conf = TLS_FULL_HW;
> + if (dev->prot)
> + dev->prot(dev, sk);

What if the same IP address is configured on multiple interfaces?

> + } else { /* src address not known or INADDR_ANY */
> + mutex_lock(_mutex);
> + list_for_each_entry(dev, _list, dev_list) {
> + if (dev->feature && dev->feature(dev)) {
> + ctx->tx_conf = TLS_FULL_HW;
> + break;
> + }
> + }
> + mutex_unlock(_mutex);
> + update_sk_prot(sk, ctx);

And I think this is even more of a stretch.  Just because you find
an inline TLS device on the global list doesn't mean traffic will
necessarily flow through it once the connection is fully established
and therefore be able to provide inline TLS offloading.


Re: [Crypto v4 01/12] tls: tls_device struct to register TLS drivers

2018-02-12 Thread David Miller
From: Atul Gupta 
Date: Mon, 12 Feb 2018 17:33:48 +0530

> + /* When calling get_netdev, the HW vendor's driver should return the
> +  * net device of device @device at port @port_num or NULL if such
> +  * a net device doesn't exist
> +  */
> + struct net_device *(*netdev)(struct tls_device *device,
> +  struct net_device *netdev);

I do not see a "port_num" parameter or structure member anywhere.

Please update this comment to match reality.

Thank you.


Re: [PATCH 0/5] crypto: Speck support

2018-02-12 Thread Eric Biggers
Hi all,

On Fri, Feb 09, 2018 at 07:07:01PM -0500, Jeffrey Walton wrote:
> > Hi Jeffrey,
> >
> > I see you wrote the SPECK implementation in Crypto++, and you are treating 
> > the
> > words as big endian.
> >
> > Do you have a reference for this being the "correct" order?  Unfortunately 
> > the
> > authors of the cipher failed to mention the byte order in their paper.  And 
> > they
> > gave the test vectors as words, so the test vectors don't clarify it either.
> >
> > I had assumed little endian words, but now I am having second thoughts...  
> > And
> > to confuse things further, it seems that some implementations (including the
> > authors own implementation for the SUPERCOP benchmark toolkit [1]) even 
> > consider
> > the words themselves in the order (y, x) rather than the more intuitive (x, 
> > y).
> >
> > [1] 
> > https://github.com/iadgov/simon-speck-supercop/blob/master/crypto_stream/speck128128ctr/ref/stream.c
> >
> > In fact, even the reference code from the paper treats pt[0] as y and pt[1] 
> > as
> > x, where 'pt' is a u64 array -- although that being said, it's not shown 
> > how the
> > actual bytes should be translated to/from those u64 arrays.
> >
> > I'd really like to avoid people having to add additional versions of SPECK 
> > later
> > for the different byte and word orders...
> 
> Hi Eric,
> 
> Yeah, this was a point of confusion for us as well. After the sidebar
> conversations I am wondering about the correctness of Crypto++
> implementation.
> 

We've received another response from one of the Speck creators (Louis Wingers)
that (to summarize) the intended byte order is little endian, and the intended
word order is (y, x), i.e. 'y' is at a lower memory address than 'x'.  Or
equivalently: the test vectors given in the original paper need to be read as
byte arrays from *right-to-left*.

(y, x) is not the intuitive order, but it's not a huge deal.  The more important
thing is that we don't end up with multiple implementations with different byte
and/or word orders.
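
To make the convention concrete, here is a small, self-contained C sketch of
how the published Speck128/128 test vector bytes map onto words under the
little-endian, (y, x) ordering described above (get_le64() is a local helper,
not a kernel or Crypto++ API, and a little-endian host is assumed):

#include <stdint.h>
#include <string.h>

static uint64_t get_le64(const uint8_t *p)
{
        uint64_t v;

        memcpy(&v, p, 8);       /* little-endian host assumed, for brevity */
        return v;
}

int main(void)
{
        /* key bytes 00 01 ... 0f and plaintext " made it equival" from the paper */
        static const uint8_t key[16] = {
                0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
        };
        static const uint8_t pt[16] = " made it equival";

        uint64_t k0 = get_le64(key);            /* 0x0706050403020100 */
        uint64_t k1 = get_le64(key + 8);        /* 0x0f0e0d0c0b0a0908 */
        uint64_t y  = get_le64(pt);             /* 0x7469206564616d20, the right word */
        uint64_t x  = get_le64(pt + 8);         /* 0x6c61766975716520, the left word  */

        /* Encrypting (x, y) under the key words (k1, k0) then yields the paper's
         * ciphertext words a65d985179783265 (x) and 7860fedf5c570d18 (y). */
        (void)k0; (void)k1; (void)x; (void)y;
        return 0;
}

In other words, reading the paper's hex the "obvious" way, as big-endian byte
arrays from left to right, is exactly what produced the mismatch discussed
above.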

So, barring any additional confusion, I'll send a revised version of this
patchset that flips the word order.  Jeff would need to flip both the byte and
word orders in his implementation in Crypto++ as well.

- Eric

> As a first step here is the official test vector for Speck-128(128)
> from Appendix C, p. 42 (https://eprint.iacr.org/2013/404.pdf):
> 
> Speck128/128
> Key: 0f0e0d0c0b0a0908 0706050403020100
> Plaintext: 6c61766975716520 7469206564616d20
> Ciphertext: a65d985179783265 7860fedf5c570d18
> 
> We had some confusion over the presentation. Here is what the Simon
> and Speck team sent when I asked about it, what gets plugged into the
> algorithm, and how it gets plugged in:
> 
> 
> 
> On Mon, Nov 20, 2017 at 10:50 AM,  wrote:
> > ...
> > I'll explain the problem you have been having with our test vectors.
> >
> > The key is:  0x0f0e0d0c0b0a0908 0x0706050403020100
> > The plaintext is:  6c61766975716520 7469206564616d20
> > The ciphertext is:  a65d985179783265 7860fedf5c570d18
> >
> > The problem is essentially one of what goes where and we probably could
> > have done a better job explaining things.
> >
> > For the key, with two words, K=(K[1],K[0]).  With three words 
> > K=(K[2],K[1],K[0]),
> > with four words K=(K[3],K[2],K[1],K[0]).
> >
> > So for the test vector you should have K[0]= 0x0706050403020100, K[1]= 
> > 0x0f0e0d0c0b0a0908
> > which is the opposite of what you have done.
> >
> > If we put this K into ExpandKey(K,sk) then the first few round keys
> > are:
> >
> > 0706050403020100
> > 37253b31171d0309
> > f91d89cc90c4085c
> > c6b1f07852cc7689
> > ...
> >
> > For the plaintext, P=(P[1],P[0]), i.e., P[1] goes into the left word of the 
> > block cipher
> > and P[0] goes into the right word of the block cipher.  So you should have
> > m[0]= 7469206564616d20 and m[1]= 6c61766975716520, which is again opposite 
> > of what you
> > have. If c[0]=m[0] and c[1]=m[1], then the encrypt function should be 
> > called as
> > Encrypt(c+1,c+0,sk).   The resulting ciphertext is (c+1,c+0).
> >
> > In general, everything goes in as a byte stream (not 64-bit words).  In 
> > this case,
> > if the 16 bytes of key are 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 
> > then
> > we first need to turn these into two 64-bit words.  The first word, K[0] is
> > 0706050403020100 and the second word is K[1]=0f0e0d0c0b0a0908.  On x86 
> > processors,
> > if the key bytes are in the array k[] of type uint8_t, then a simple
> > casting should get you K[1] and K[0].  That is, K[0]=((uint64_t *)k)[0] and
> > K[1]=((uint64_t *)k)[1].  The key expansion is run with ExpandKey(K,sk).
> > So that was what we had in mind.
> >
> > Similarly, if the plaintext "bytes" were:  20 6d 61 64 65 20 69 74 20 65 71 
> > 75 69 76 61 6c
> > (which is ascii for " made it equival")
> > then we first need to turn these into two 64-bit words.  The first word is
> > pt[0]=7469206564616d20 and the second word is 

Re: BUG: unable to handle kernel NULL pointer dereference in sha512_mb_mgr_get_comp_job_avx2

2018-02-12 Thread Eric Biggers
On Sun, Dec 03, 2017 at 12:31:01PM -0800, syzbot wrote:
> syzkaller has found reproducer for the following crash on
> 4131d5166185d0d75b5f1d4bf362a9e0bac05598
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
> 
> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
> for information about syzkaller reproducers
> 
> 
> BUG: unable to handle kernel NULL pointer dereference at c58b0b19
> IP: sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> PGD 1cb562067 P4D 1cb562067 PUD 1cb563067 PMD 0
> Oops: 0002 [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 24 Comm: kworker/1:1 Not tainted 4.15.0-rc1-mm1+ #29
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: crypto mcryptd_queue_worker
> task: 18ff7174 task.stack: 4c6e7fb4
> RIP: 0010:sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> RSP: 0018:8801d9d171b8 EFLAGS: 00010002
> RAX:  RBX: 8801d5aa38d0 RCX: 
> RDX: 0001 RSI:  RDI: 8801d5aa3780
> RBP: 8801d9d171e0 R08: 0001 R09: 0001
> R10: 0002 R11: 0003 R12: 8801d5aa3780
> R13: 0282 R14: 8801cc115760 R15: e8d10630
> FS:  () GS:8801db50() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0060 CR3: 0001cb55f000 CR4: 001406e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  sha_complete_job+0x276/0x830 arch/x86/crypto/sha512-mb/sha512_mb.c:510
>  sha512_mb_update+0x2f6/0x530 arch/x86/crypto/sha512-mb/sha512_mb.c:610
>  crypto_ahash_update include/crypto/hash.h:522 [inline]
>  ahash_mcryptd_update crypto/mcryptd.c:628 [inline]
>  mcryptd_hash_update+0xcd/0x1c0 crypto/mcryptd.c:374
>  mcryptd_queue_worker+0xfe/0x660 crypto/mcryptd.c:182
>  process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
>  worker_thread+0x223/0x1990 kernel/workqueue.c:2247
>  kthread+0x37a/0x440 kernel/kthread.c:238
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:517
> Code: 49 0f 42 d3 48 f7 c2 f0 ff ff ff 0f 85 9a 00 00 00 48 83 e2 0f 48 6b
> da 08 48 8d 9c 1f 48 01 00 00 48 8b 03 48 c7 03 00 00 00 00  40 60 02 00
> 00 00 48 8b 9f 40 01 00 00 48 c1 e3 08 48 09 d3
> RIP: sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251 RSP:
> 8801d9d171b8
> CR2: 0060
> ---[ end trace 2003a6fbb2bb168e ]---
> Kernel panic - not syncing: Fatal exception
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..

Fixed by commit eff84b379089cd, so marking it fixed for syzbot:

#syz fix: crypto: sha512-mb - initialize pending lengths correctly

- Eric


[PATCH] crypto: arm/aes-cipher - move S-box to .rodata section

2018-02-12 Thread Jinbum Park
Move the AES inverse S-box to the .rodata section
where it is safe from abuse by speculation.

Signed-off-by: Jinbum Park 
---
 arch/arm/crypto/aes-cipher-core.S | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/arm/crypto/aes-cipher-core.S 
b/arch/arm/crypto/aes-cipher-core.S
index 54b3840..184d6c2 100644
--- a/arch/arm/crypto/aes-cipher-core.S
+++ b/arch/arm/crypto/aes-cipher-core.S
@@ -174,6 +174,16 @@
.ltorg
.endm
 
+ENTRY(__aes_arm_encrypt)
+   do_cryptfround, crypto_ft_tab, crypto_ft_tab + 1, 2
+ENDPROC(__aes_arm_encrypt)
+
+   .align  5
+ENTRY(__aes_arm_decrypt)
+   do_cryptiround, crypto_it_tab, __aes_arm_inverse_sbox, 0
+ENDPROC(__aes_arm_decrypt)
+
+   .section".rodata", "a"
.align  L1_CACHE_SHIFT
.type   __aes_arm_inverse_sbox, %object
 __aes_arm_inverse_sbox:
@@ -210,12 +220,3 @@ __aes_arm_inverse_sbox:
.byte   0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26
.byte   0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
.size   __aes_arm_inverse_sbox, . - __aes_arm_inverse_sbox
-
-ENTRY(__aes_arm_encrypt)
-   do_cryptfround, crypto_ft_tab, crypto_ft_tab + 1, 2
-ENDPROC(__aes_arm_encrypt)
-
-   .align  5
-ENTRY(__aes_arm_decrypt)
-   do_cryptiround, crypto_it_tab, __aes_arm_inverse_sbox, 0
-ENDPROC(__aes_arm_decrypt)
-- 
1.9.1



[Crypto v4 12/12] Makefile Kconfig

2018-02-12 Thread Atul Gupta
Add Kconfig and Makefile entries for Inline TLS as another driver dependent on
cxgb4 and chcr.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/Kconfig| 11 +++
 drivers/crypto/chelsio/Makefile   |  1 +
 drivers/crypto/chelsio/chtls/Makefile |  4 
 3 files changed, 16 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/Makefile

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
index 5ae9f87..930d82d 100644
--- a/drivers/crypto/chelsio/Kconfig
+++ b/drivers/crypto/chelsio/Kconfig
@@ -29,3 +29,14 @@ config CHELSIO_IPSEC_INLINE
 default n
 ---help---
   Enable support for IPSec Tx Inline.
+
+config CRYPTO_DEV_CHELSIO_TLS
+tristate "Chelsio Crypto Inline TLS Driver"
+depends on CHELSIO_T4
+depends on TLS
+select CRYPTO_DEV_CHELSIO
+---help---
+  Support Chelsio Inline TLS with Chelsio crypto accelerator.
+
+  To compile this driver as a module, choose M here: the module
+  will be called chtls.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
index eaecaf1..639e571 100644
--- a/drivers/crypto/chelsio/Makefile
+++ b/drivers/crypto/chelsio/Makefile
@@ -3,3 +3,4 @@ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
 obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
 chcr-objs :=  chcr_core.o chcr_algo.o
 chcr-$(CONFIG_CHELSIO_IPSEC_INLINE) += chcr_ipsec.o
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls/
diff --git a/drivers/crypto/chelsio/chtls/Makefile 
b/drivers/crypto/chelsio/chtls/Makefile
new file mode 100644
index 000..df13795
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/Makefile
@@ -0,0 +1,4 @@
+ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4 -Idrivers/crypto/chelsio/
+
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO_TLS) += chtls.o
+chtls-objs := chtls_main.o chtls_cm.o chtls_io.o chtls_hw.o
-- 
1.8.3.1



[Crypto v4 11/12] chtls: Register the chtls Inline TLS with net tls

2018-02-12 Thread Atul Gupta
Add a new ULD driver for Inline TLS support and register the ULP for chtls.
Use setsockopt() to program the key on the chip; AES-GCM with a 128-bit key
is supported.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_main.c | 619 ++
 include/uapi/linux/tls.h  |   1 +
 2 files changed, 620 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_main.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_main.c 
b/drivers/crypto/chelsio/chtls/chtls_main.c
new file mode 100644
index 000..58efb4a
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_main.c
@@ -0,0 +1,619 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+#define DRV_NAME "chtls"
+
+/*
+ * chtls device management
+ * maintains a list of the chtls devices
+ */
+static LIST_HEAD(cdev_list);
+static DEFINE_MUTEX(cdev_mutex);
+static DEFINE_MUTEX(cdev_list_lock);
+
+static struct proto chtls_base_prot;
+static struct proto chtls_cpl_prot;
+static DEFINE_MUTEX(notify_mutex);
+static RAW_NOTIFIER_HEAD(listen_notify_list);
+struct request_sock_ops chtls_rsk_ops;
+static uint send_page_order = (14 - PAGE_SHIFT < 0) ? 0 : 14 - PAGE_SHIFT;
+
+static int register_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(_mutex);
+   err = raw_notifier_chain_register(_notify_list, nb);
+   mutex_unlock(_mutex);
+   return err;
+}
+
+static int unregister_listen_notifier(struct notifier_block *nb)
+{
+   int err;
+
+   mutex_lock(_mutex);
+   err = raw_notifier_chain_unregister(_notify_list, nb);
+   mutex_unlock(_mutex);
+   return err;
+}
+
+static int listen_notify_handler(struct notifier_block *this,
+unsigned long event, void *data)
+{
+   struct sock *sk = data;
+   struct chtls_dev *cdev;
+   int ret =  NOTIFY_DONE;
+
+   switch (event) {
+   case CHTLS_LISTEN_START:
+   case CHTLS_LISTEN_STOP:
+   mutex_lock(_list_lock);
+   list_for_each_entry(cdev, _list, list) {
+   if (event == CHTLS_LISTEN_START)
+   ret = chtls_listen_start(cdev, sk);
+   else
+   chtls_listen_stop(cdev, sk);
+   }
+   mutex_unlock(_list_lock);
+   break;
+   }
+   return ret;
+}
+
+static struct notifier_block listen_notifier = {
+   .notifier_call = listen_notify_handler
+};
+
+static int listen_backlog_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   if (likely(skb_transport_header(skb) != skb_network_header(skb)))
+   return tcp_v4_do_rcv(sk, skb);
+   BLOG_SKB_CB(skb)->backlog_rcv(sk, skb);
+   return 0;
+}
+
+static int chtls_start_listen(struct sock *sk)
+{
+   int err;
+
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   if (sk->sk_family == PF_INET &&
+   LOOPBACK(inet_sk(sk)->inet_rcv_saddr))
+   return -EADDRNOTAVAIL;
+
+   sk->sk_backlog_rcv = listen_backlog_rcv;
+   mutex_lock(_mutex);
+   err = raw_notifier_call_chain(_notify_list, 0, sk);
+   mutex_unlock(_mutex);
+   return err;
+}
+
+static int chtls_hash(struct sock *sk)
+{
+   int err;
+
+   err = tcp_prot.hash(sk);
+   if (sk->sk_state == TCP_LISTEN)
+   err |= chtls_start_listen(sk);
+
+   if (err)
+   tcp_prot.unhash(sk);
+   return err;
+}
+
+static int chtls_stop_listen(struct sock *sk)
+{
+   if (sk->sk_protocol != IPPROTO_TCP)
+   return -EPROTONOSUPPORT;
+
+   mutex_lock(_mutex);
+   raw_notifier_call_chain(_notify_list, 1, sk);
+   mutex_unlock(_mutex);
+   return 0;
+}
+
+struct net_device *chtls_netdev(struct tls_device *dev,
+   struct net_device *netdev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   int i;
+
+   for (i = 0; i < cdev->lldi->nports; i++)
+   if (cdev->ports[i] == netdev)
+   return netdev;
+
+   return cdev->ports[i];
+}
+
+void chtls_update_prot(struct tls_device *dev, struct sock *sk)
+{
+   sk->sk_prot = _base_prot;
+}
+
+int chtls_inline_feature(struct tls_device *dev)
+{
+   struct chtls_dev *cdev = to_chtls_dev(dev);
+   struct net_device *netdev;
+   int i;
+
+   for (i = 0; i < cdev->lldi->nports; i++) {
+   netdev = cdev->ports[i];
+   if (netdev->features & NETIF_F_HW_TLS_INLINE)
+   

[Crypto v4 10/12] chtls: Inline crypto request Tx/Rx

2018-02-12 Thread Atul Gupta
Add the TLS handlers for record transmit and receive.
Create the Inline TLS work request and post it to the firmware.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_io.c | 1867 +++
 1 file changed, 1867 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_io.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_io.c 
b/drivers/crypto/chelsio/chtls/chtls_io.c
new file mode 100644
index 000..a0f03fb
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_io.c
@@ -0,0 +1,1867 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static bool is_tls_hw(struct chtls_sock *csk)
+{
+   return csk->tlshws.ofld;
+}
+
+static bool is_tls_rx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.rxkey >= 0);
+}
+
+static bool is_tls_tx(struct chtls_sock *csk)
+{
+   return (csk->tlshws.txkey >= 0);
+}
+
+static bool is_tls_skb(struct chtls_sock *csk, const struct sk_buff *skb)
+{
+   return (is_tls_hw(csk) && skb_ulp_tls_skb_flags(skb));
+}
+
+static int key_size(void *sk)
+{
+   return 16; /* Key on DDR */
+}
+
+#define ceil(x, y) \
+   ({ unsigned long __x = (x), __y = (y); (__x + __y - 1) / __y; })
+
+static int data_sgl_len(const struct sk_buff *skb)
+{
+   unsigned int cnt;
+
+   cnt = skb_shinfo(skb)->nr_frags;
+   return (sgl_len(cnt) * 8);
+}
+
+static int nos_ivs(struct sock *sk, unsigned int size)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+
+   return ceil(size, csk->tlshws.mfs);
+}
+
+#define TLS_WR_CPL_LEN \
+   (sizeof(struct fw_tlstx_data_wr) + \
+   sizeof(struct cpl_tx_tls_sfo))
+
+static int is_ivs_imm(struct sock *sk, const struct sk_buff *skb)
+{
+   int ivs_size = nos_ivs(sk, skb->len) * CIPHER_BLOCK_SIZE;
+   int hlen = TLS_WR_CPL_LEN + data_sgl_len(skb);
+
+   if ((hlen + key_size(sk) + ivs_size) <
+   MAX_IMM_OFLD_TX_DATA_WR_LEN) {
+   ULP_SKB_CB(skb)->ulp.tls.iv = 1;
+   return 1;
+   }
+   ULP_SKB_CB(skb)->ulp.tls.iv = 0;
+   return 0;
+}
+
+static int max_ivs_size(struct sock *sk, int size)
+{
+   return (nos_ivs(sk, size) * CIPHER_BLOCK_SIZE);
+}
+
+static int ivs_size(struct sock *sk, const struct sk_buff *skb)
+{
+   return (is_ivs_imm(sk, skb) ? (nos_ivs(sk, skb->len) *
+CIPHER_BLOCK_SIZE) : 0);
+}
+
+static int flowc_wr_credits(int nparams, int *flowclenp)
+{
+   int flowclen16, flowclen;
+
+   flowclen = offsetof(struct fw_flowc_wr, mnemval[nparams]);
+   flowclen16 = DIV_ROUND_UP(flowclen, 16);
+   flowclen = flowclen16 * 16;
+
+   if (flowclenp)
+   *flowclenp = flowclen;
+
+   return flowclen16;
+}
+
+static struct sk_buff *create_flowc_wr_skb(struct sock *sk,
+  struct fw_flowc_wr *flowc,
+  int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   skb = alloc_skb(flowclen, GFP_ATOMIC);
+   if (!skb)
+   return NULL;
+
+   memcpy(__skb_put(skb, flowclen), flowc, flowclen);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+
+   return skb;
+}
+
+static int send_flowc_wr(struct sock *sk, struct fw_flowc_wr *flowc,
+int flowclen)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   bool syn_sent = (sk->sk_state == TCP_SYN_SENT);
+   struct tcp_sock *tp = tcp_sk(sk);
+   int flowclen16 = flowclen / 16;
+   struct sk_buff *skb;
+
+   if (csk_flag(sk, CSK_TX_DATA_SENT)) {
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+
+   if (syn_sent)
+   __skb_queue_tail(>ooo_queue, skb);
+   else
+   skb_entail(sk, skb,
+  ULPCB_FLAG_NO_HDR | ULPCB_FLAG_NO_APPEND);
+   return 0;
+   }
+
+   if (!syn_sent) {
+   int ret;
+
+   ret = cxgb4_immdata_send(csk->egress_dev,
+csk->txq_idx,
+flowc, flowclen);
+   if (!ret)
+   return flowclen16;
+   }
+   skb = create_flowc_wr_skb(sk, flowc, flowclen);
+   if (!skb)
+   return -ENOMEM;
+   send_or_defer(sk, tp, skb, 0);
+   return flowclen16;
+}
+
+static u8 

[Crypto v4 09/12] chtls: CPL handler definition

2018-02-12 Thread Atul Gupta
Add CPL handlers for TLS session setup, record transmit and receive.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_cm.c | 2045 +++
 net/ipv4/tcp_minisocks.c|1 +
 2 files changed, 2046 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_cm.c 
b/drivers/crypto/chelsio/chtls/chtls_cm.c
new file mode 100644
index 000..dac3c3a
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_cm.c
@@ -0,0 +1,2045 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+extern struct request_sock_ops chtls_rsk_ops;
+
+/*
+ * State transitions and actions for close.  Note that if we are in SYN_SENT
+ * we remain in that state as we cannot control a connection while it's in
+ * SYN_SENT; such connections are allowed to establish and are then aborted.
+ */
+static unsigned char new_state[16] = {
+   /* current state: new state:  action: */
+   /* (Invalid)   */ TCP_CLOSE,
+   /* TCP_ESTABLISHED */ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_SYN_SENT*/ TCP_SYN_SENT,
+   /* TCP_SYN_RECV*/ TCP_FIN_WAIT1 | TCP_ACTION_FIN,
+   /* TCP_FIN_WAIT1   */ TCP_FIN_WAIT1,
+   /* TCP_FIN_WAIT2   */ TCP_FIN_WAIT2,
+   /* TCP_TIME_WAIT   */ TCP_CLOSE,
+   /* TCP_CLOSE   */ TCP_CLOSE,
+   /* TCP_CLOSE_WAIT  */ TCP_LAST_ACK | TCP_ACTION_FIN,
+   /* TCP_LAST_ACK*/ TCP_LAST_ACK,
+   /* TCP_LISTEN  */ TCP_CLOSE,
+   /* TCP_CLOSING */ TCP_CLOSING,
+};
+
+static struct chtls_sock *chtls_sock_create(struct chtls_dev *cdev)
+{
+   struct chtls_sock *csk = kzalloc(sizeof(*csk), GFP_ATOMIC);
+
+   if (!csk)
+   return NULL;
+
+   csk->txdata_skb_cache = alloc_skb(TXDATA_SKB_LEN, GFP_ATOMIC);
+   if (!csk->txdata_skb_cache) {
+   kfree(csk);
+   return NULL;
+   }
+
+   kref_init(>kref);
+   csk->cdev = cdev;
+   skb_queue_head_init(>txq);
+   csk->wr_skb_head = NULL;
+   csk->wr_skb_tail = NULL;
+   csk->mss = MAX_MSS;
+   csk->tlshws.ofld = 1;
+   csk->tlshws.txkey = -1;
+   csk->tlshws.rxkey = -1;
+   csk->tlshws.mfs = TLS_MFS;
+   skb_queue_head_init(>tlshws.sk_recv_queue);
+   return csk;
+}
+
+static void chtls_sock_release(struct kref *ref)
+{
+   struct chtls_sock *csk =
+   container_of(ref, struct chtls_sock, kref);
+
+   kfree(csk);
+}
+
+static struct net_device *chtls_ipv4_netdev(struct chtls_dev *cdev,
+   struct sock *sk)
+{
+   struct net_device *ndev = cdev->ports[0];
+
+   if (likely(!inet_sk(sk)->inet_rcv_saddr))
+   return ndev;
+
+   ndev = ip_dev_find(_net, inet_sk(sk)->inet_rcv_saddr);
+   if (!ndev)
+   return NULL;
+
+   if (is_vlan_dev(ndev))
+   return vlan_dev_real_dev(ndev);
+   return ndev;
+}
+
+static void assign_rxopt(struct sock *sk, unsigned int opt)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct tcp_sock *tp = tcp_sk(sk);
+   const struct chtls_dev *cdev;
+
+   cdev = csk->cdev;
+   tp->tcp_header_len   = sizeof(struct tcphdr);
+   tp->rx_opt.mss_clamp = cdev->mtus[TCPOPT_MSS_G(opt)] - 40;
+   tp->mss_cache= tp->rx_opt.mss_clamp;
+   tp->rx_opt.tstamp_ok = TCPOPT_TSTAMP_G(opt);
+   tp->rx_opt.snd_wscale= TCPOPT_SACK_G(opt);
+   tp->rx_opt.wscale_ok = TCPOPT_WSCALE_OK_G(opt);
+   SND_WSCALE(tp)   = TCPOPT_SND_WSCALE_G(opt);
+   if (!tp->rx_opt.wscale_ok)
+   tp->rx_opt.rcv_wscale = 0;
+   if (tp->rx_opt.tstamp_ok) {
+   tp->tcp_header_len += TCPOLEN_TSTAMP_ALIGNED;
+   tp->rx_opt.mss_clamp -= TCPOLEN_TSTAMP_ALIGNED;
+   } else if (csk->opt2 & TSTAMPS_EN_F) {
+   csk->opt2 &= ~TSTAMPS_EN_F;
+   csk->mtu_idx = TCPOPT_MSS_G(opt);
+   }
+}
+
+static void chtls_purge_rcv_queue(struct sock *sk)
+{
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(>sk_receive_queue)) != NULL) {
+   skb_dst_set(skb, (void *)NULL);
+   kfree_skb(skb);
+   }
+}
+
+static void chtls_purge_write_queue(struct sock *sk)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+
+   while ((skb = __skb_dequeue(>txq))) {
+

[Crypto v4 06/12] cxgb4: LLD driver changes to enable TLS

2018-02-12 Thread Atul Gupta
Read the FW capability and the key area size, and dump the TLS record counts.

Signed-off-by: Atul Gupta 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c | 18 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 32 +--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h |  7 ++
 drivers/net/ethernet/chelsio/cxgb4/sge.c   | 98 +-
 4 files changed, 142 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index cf47183..cfc9210 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2826,8 +2826,8 @@ static int meminfo_show(struct seq_file *seq, void *v)
"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
"RQUDP region:", "PBL region:", "TXPBL region:",
-   "DBVFIFO region:", "ULPRX state:", "ULPTX state:",
-   "On-chip queues:"
+   "TLSKey region:", "DBVFIFO region:", "ULPRX state:",
+   "ULPTX state:", "On-chip queues:"
};
 
int i, n;
@@ -2943,6 +2943,12 @@ static int meminfo_show(struct seq_file *seq, void *v)
ulp_region(RX_RQUDP);
ulp_region(RX_PBL);
ulp_region(TX_PBL);
+   if (adap->params.crypto & FW_CAPS_CONFIG_TLS_INLINE) {
+   ulp_region(RX_TLS_KEY);
+   } else {
+   md->base = 0;
+   md->idx = ARRAY_SIZE(region);
+   }
 #undef ulp_region
md->base = 0;
md->idx = ARRAY_SIZE(region);
@@ -3098,6 +3104,14 @@ static int chcr_show(struct seq_file *seq, void *v)
   atomic_read(>chcr_stats.fallback));
seq_printf(seq, "IPSec PDU: %10u\n",
   atomic_read(>chcr_stats.ipsec_cnt));
+
+   seq_puts(seq, "\nChelsio Inline TLS Stats\n");
+   seq_printf(seq, "TLS PDU Tx: %u\n",
+  atomic_read(>chcr_stats.tls_pdu_tx));
+   seq_printf(seq, "TLS PDU Rx: %u\n",
+  atomic_read(>chcr_stats.tls_pdu_rx));
+   seq_printf(seq, "TLS Keys (DDR) Count: %u\n",
+  atomic_read(>chcr_stats.tls_key));
return 0;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 05a4abf..60eb18b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4086,18 +4086,32 @@ static int adap_init0(struct adapter *adap)
adap->num_ofld_uld += 2;
}
if (caps_cmd.cryptocaps) {
-   /* Should query params here...TODO */
-   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
-   ret = t4_query_params(adap, adap->mbox, adap->pf, 0, 2,
- params, val);
-   if (ret < 0) {
-   if (ret != -EINVAL)
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_CRYPTO_LOOKASIDE) {
+   params[0] = FW_PARAM_PFVF(NCRYPTO_LOOKASIDE);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0) {
+   if (ret != -EINVAL)
+   goto bye;
+   } else {
+   adap->vres.ncrypto_fc = val[0];
+   }
+   adap->num_ofld_uld += 1;
+   }
+   if (ntohs(caps_cmd.cryptocaps) &
+   FW_CAPS_CONFIG_TLS_INLINE) {
+   params[0] = FW_PARAM_PFVF(TLS_START);
+   params[1] = FW_PARAM_PFVF(TLS_END);
+   ret = t4_query_params(adap, adap->mbox, adap->pf, 0,
+ 2, params, val);
+   if (ret < 0)
goto bye;
-   } else {
-   adap->vres.ncrypto_fc = val[0];
+   adap->vres.key.start = val[0];
+   adap->vres.key.size = val[1] - val[0] + 1;
+   adap->num_uld += 1;
}
adap->params.crypto = ntohs(caps_cmd.cryptocaps);
-   adap->num_uld += 1;
}
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 1d37672..55863f6 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -237,6 +237,7 @@ enum cxgb4_uld {
CXGB4_ULD_ISCSI,
CXGB4_ULD_ISCSIT,
CXGB4_ULD_CRYPTO,
+   CXGB4_ULD_TLS,
CXGB4_ULD_MAX
 };
 
@@ -287,6 +288,7 @@ struct cxgb4_virt_res { 

[Crypto v4 08/12] chtls: Key program

2018-02-12 Thread Atul Gupta
Program the Tx and Rx keys on the chip.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls_hw.c | 394 
 1 file changed, 394 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_hw.c

diff --git a/drivers/crypto/chelsio/chtls/chtls_hw.c 
b/drivers/crypto/chelsio/chtls/chtls_hw.c
new file mode 100644
index 000..c3e17159
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls_hw.c
@@ -0,0 +1,394 @@
+/*
+ * Copyright (c) 2017 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Written by: Atul Gupta (atul.gu...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "chtls.h"
+#include "chtls_cm.h"
+
+static void __set_tcb_field_direct(struct chtls_sock *csk,
+  struct cpl_set_tcb_field *req, u16 word,
+  u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct ulptx_idata *sc;
+
+   INIT_TP_WR_CPL(req, CPL_SET_TCB_FIELD, csk->tid);
+   req->wr.wr_mid |= htonl(FW_WR_FLOWID_V(csk->tid));
+   req->reply_ctrl = htons(NO_REPLY_V(no_reply) |
+   QUEUENO_V(csk->rss_qid));
+   req->word_cookie = htons(TCB_WORD_V(word) | TCB_COOKIE_V(cookie));
+   req->mask = cpu_to_be64(mask);
+   req->val = cpu_to_be64(val);
+   sc = (struct ulptx_idata *)(req + 1);
+   sc->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_NOOP));
+   sc->len = htonl(0);
+}
+
+static void __set_tcb_field(struct sock *sk, struct sk_buff *skb, u16 word,
+   u64 mask, u64 val, u8 cookie, int no_reply)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+
+   req = (struct cpl_set_tcb_field *)__skb_put(skb, wrlen);
+   __set_tcb_field_direct(csk, req, word, mask, val, cookie, no_reply);
+   set_wr_txq(skb, CPL_PRIORITY_CONTROL, csk->port_id);
+}
+
+static int chtls_set_tcb_field(struct sock *sk, u16 word, u64 mask, u64 val)
+{
+   struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
+   struct sk_buff *skb;
+   struct cpl_set_tcb_field *req;
+   struct ulptx_idata *sc;
+   unsigned int wrlen = roundup(sizeof(*req) + sizeof(*sc), 16);
+   unsigned int credits_needed = DIV_ROUND_UP(wrlen, 16);
+
+   skb = alloc_skb(wrlen, GFP_ATOMIC);
+   if (!skb)
+   return -ENOMEM;
+
+   __set_tcb_field(sk, skb, word, mask, val, 0, 1);
+   set_queue(skb, (csk->txq_idx << 1) | CPL_PRIORITY_DATA, sk);
+   csk->wr_credits -= credits_needed;
+   csk->wr_unacked += credits_needed;
+   enqueue_wr(csk, skb);
+   cxgb4_ofld_send(csk->egress_dev, skb);
+   return 0;
+}
+
+/*
+ * Set one of the t_flags bits in the TCB.
+ */
+int chtls_set_tcb_tflag(struct sock *sk, unsigned int bit_pos, int val)
+{
+   return chtls_set_tcb_field(sk, 1, 1ULL << bit_pos,
+   val << bit_pos);
+}
+
+static int chtls_set_tcb_keyid(struct sock *sk, int keyid)
+{
+   return chtls_set_tcb_field(sk, 31, 0xULL, keyid);
+}
+
+static int chtls_set_tcb_seqno(struct sock *sk)
+{
+   return chtls_set_tcb_field(sk, 28, ~0ULL, 0);
+}
+
+static int chtls_set_tcb_quiesce(struct sock *sk, int val)
+{
+   return chtls_set_tcb_field(sk, 1, (1ULL << TF_RX_QUIESCE_S),
+  TF_RX_QUIESCE_V(val));
+}
+
+static void *chtls_alloc_mem(unsigned long size)
+{
+   void *p = kmalloc(size, GFP_KERNEL);
+
+   if (!p)
+   p = vmalloc(size);
+   if (p)
+   memset(p, 0, size);
+   return p;
+}
+
+static void chtls_free_mem(void *addr)
+{
+   unsigned long p = (unsigned long)addr;
+
+   if (p >= VMALLOC_START && p < VMALLOC_END)
+   vfree(addr);
+   else
+   kfree(addr);
+}
+
+/* TLS Key bitmap processing */
+int chtls_init_kmap(struct chtls_dev *cdev, struct cxgb4_lld_info *lldi)
+{
+   unsigned int num_key_ctx, bsize;
+
+   num_key_ctx = (lldi->vr->key.size / TLS_KEY_CONTEXT_SZ);
+   bsize = BITS_TO_LONGS(num_key_ctx);
+
+   cdev->kmap.size = num_key_ctx;
+   cdev->kmap.available = bsize;
+   cdev->kmap.addr = chtls_alloc_mem(sizeof(*cdev->kmap.addr) *
+ bsize);
+   if (!cdev->kmap.addr)
+   return -1;
+
+   cdev->kmap.start = lldi->vr->key.start;
+   spin_lock_init(>kmap.lock);
+   return 0;
+}
+
+void chtls_free_kmap(struct chtls_dev *cdev)
+{
+   if (cdev->kmap.addr)
+   chtls_free_mem(cdev->kmap.addr);
+}
+
+static int 

[Crypto v4 07/12] chcr: Key Macro

2018-02-12 Thread Atul Gupta
Define macros for the TLS key context.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chcr_algo.h | 42 +
 drivers/crypto/chelsio/chcr_core.h | 55 +-
 2 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/chelsio/chcr_algo.h 
b/drivers/crypto/chelsio/chcr_algo.h
index d1673a5..f263cd4 100644
--- a/drivers/crypto/chelsio/chcr_algo.h
+++ b/drivers/crypto/chelsio/chcr_algo.h
@@ -86,6 +86,39 @@
 KEY_CONTEXT_OPAD_PRESENT_M)
 #define KEY_CONTEXT_OPAD_PRESENT_F  KEY_CONTEXT_OPAD_PRESENT_V(1U)
 
+#define TLS_KEYCTX_RXFLIT_CNT_S 24
+#define TLS_KEYCTX_RXFLIT_CNT_V(x) ((x) << TLS_KEYCTX_RXFLIT_CNT_S)
+
+#define TLS_KEYCTX_RXPROT_VER_S 20
+#define TLS_KEYCTX_RXPROT_VER_M 0xf
+#define TLS_KEYCTX_RXPROT_VER_V(x) ((x) << TLS_KEYCTX_RXPROT_VER_S)
+
+#define TLS_KEYCTX_RXCIPH_MODE_S 16
+#define TLS_KEYCTX_RXCIPH_MODE_M 0xf
+#define TLS_KEYCTX_RXCIPH_MODE_V(x) ((x) << TLS_KEYCTX_RXCIPH_MODE_S)
+
+#define TLS_KEYCTX_RXAUTH_MODE_S 12
+#define TLS_KEYCTX_RXAUTH_MODE_M 0xf
+#define TLS_KEYCTX_RXAUTH_MODE_V(x) ((x) << TLS_KEYCTX_RXAUTH_MODE_S)
+
+#define TLS_KEYCTX_RXCIAU_CTRL_S 11
+#define TLS_KEYCTX_RXCIAU_CTRL_V(x) ((x) << TLS_KEYCTX_RXCIAU_CTRL_S)
+
+#define TLS_KEYCTX_RX_SEQCTR_S 9
+#define TLS_KEYCTX_RX_SEQCTR_M 0x3
+#define TLS_KEYCTX_RX_SEQCTR_V(x) ((x) << TLS_KEYCTX_RX_SEQCTR_S)
+
+#define TLS_KEYCTX_RX_VALID_S 8
+#define TLS_KEYCTX_RX_VALID_V(x) ((x) << TLS_KEYCTX_RX_VALID_S)
+
+#define TLS_KEYCTX_RXCK_SIZE_S 3
+#define TLS_KEYCTX_RXCK_SIZE_M 0x7
+#define TLS_KEYCTX_RXCK_SIZE_V(x) ((x) << TLS_KEYCTX_RXCK_SIZE_S)
+
+#define TLS_KEYCTX_RXMK_SIZE_S 0
+#define TLS_KEYCTX_RXMK_SIZE_M 0x7
+#define TLS_KEYCTX_RXMK_SIZE_V(x) ((x) << TLS_KEYCTX_RXMK_SIZE_S)
+
 #define CHCR_HASH_MAX_DIGEST_SIZE 64
 #define CHCR_MAX_SHA_DIGEST_SIZE 64
 
@@ -176,6 +209,15 @@
  KEY_CONTEXT_SALT_PRESENT_V(1) | \
  KEY_CONTEXT_CTX_LEN_V((ctx_len)))
 
+#define  FILL_KEY_CRX_HDR(ck_size, mk_size, d_ck, opad, ctx_len) \
+   htonl(TLS_KEYCTX_RXMK_SIZE_V(mk_size) | \
+ TLS_KEYCTX_RXCK_SIZE_V(ck_size) | \
+ TLS_KEYCTX_RX_VALID_V(1) | \
+ TLS_KEYCTX_RX_SEQCTR_V(3) | \
+ TLS_KEYCTX_RXAUTH_MODE_V(4) | \
+ TLS_KEYCTX_RXCIPH_MODE_V(2) | \
+ TLS_KEYCTX_RXFLIT_CNT_V((ctx_len)))
+
 #define FILL_WR_OP_CCTX_SIZE \
htonl( \
FW_CRYPTO_LOOKASIDE_WR_OPCODE_V( \
diff --git a/drivers/crypto/chelsio/chcr_core.h b/drivers/crypto/chelsio/chcr_core.h
index 3c29ee0..77056a9 100644
--- a/drivers/crypto/chelsio/chcr_core.h
+++ b/drivers/crypto/chelsio/chcr_core.h
@@ -65,10 +65,58 @@
 struct _key_ctx {
__be32 ctx_hdr;
u8 salt[MAX_SALT];
-   __be64 reserverd;
+   __be64 iv_to_auth;
unsigned char key[0];
 };
 
+#define KEYCTX_TX_WR_IV_S  55
+#define KEYCTX_TX_WR_IV_M  0x1ffULL
+#define KEYCTX_TX_WR_IV_V(x) ((x) << KEYCTX_TX_WR_IV_S)
+#define KEYCTX_TX_WR_IV_G(x) \
+   (((x) >> KEYCTX_TX_WR_IV_S) & KEYCTX_TX_WR_IV_M)
+
+#define KEYCTX_TX_WR_AAD_S 47
+#define KEYCTX_TX_WR_AAD_M 0xffULL
+#define KEYCTX_TX_WR_AAD_V(x) ((x) << KEYCTX_TX_WR_AAD_S)
+#define KEYCTX_TX_WR_AAD_G(x) (((x) >> KEYCTX_TX_WR_AAD_S) & \
+   KEYCTX_TX_WR_AAD_M)
+
+#define KEYCTX_TX_WR_AADST_S 39
+#define KEYCTX_TX_WR_AADST_M 0xffULL
+#define KEYCTX_TX_WR_AADST_V(x) ((x) << KEYCTX_TX_WR_AADST_S)
+#define KEYCTX_TX_WR_AADST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AADST_S) & KEYCTX_TX_WR_AADST_M)
+
+#define KEYCTX_TX_WR_CIPHER_S 30
+#define KEYCTX_TX_WR_CIPHER_M 0x1ffULL
+#define KEYCTX_TX_WR_CIPHER_V(x) ((x) << KEYCTX_TX_WR_CIPHER_S)
+#define KEYCTX_TX_WR_CIPHER_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHER_S) & KEYCTX_TX_WR_CIPHER_M)
+
+#define KEYCTX_TX_WR_CIPHERST_S 23
+#define KEYCTX_TX_WR_CIPHERST_M 0x7f
+#define KEYCTX_TX_WR_CIPHERST_V(x) ((x) << KEYCTX_TX_WR_CIPHERST_S)
+#define KEYCTX_TX_WR_CIPHERST_G(x) \
+   (((x) >> KEYCTX_TX_WR_CIPHERST_S) & KEYCTX_TX_WR_CIPHERST_M)
+
+#define KEYCTX_TX_WR_AUTH_S 14
+#define KEYCTX_TX_WR_AUTH_M 0x1ff
+#define KEYCTX_TX_WR_AUTH_V(x) ((x) << KEYCTX_TX_WR_AUTH_S)
+#define KEYCTX_TX_WR_AUTH_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTH_S) & KEYCTX_TX_WR_AUTH_M)
+
+#define KEYCTX_TX_WR_AUTHST_S 7
+#define KEYCTX_TX_WR_AUTHST_M 0x7f
+#define KEYCTX_TX_WR_AUTHST_V(x) ((x) << KEYCTX_TX_WR_AUTHST_S)
+#define KEYCTX_TX_WR_AUTHST_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHST_S) & KEYCTX_TX_WR_AUTHST_M)
+
+#define KEYCTX_TX_WR_AUTHIN_S 0
+#define KEYCTX_TX_WR_AUTHIN_M 0x7f
+#define KEYCTX_TX_WR_AUTHIN_V(x) ((x) << KEYCTX_TX_WR_AUTHIN_S)
+#define KEYCTX_TX_WR_AUTHIN_G(x) \
+   (((x) >> KEYCTX_TX_WR_AUTHIN_S) & KEYCTX_TX_WR_AUTHIN_M)
+
 struct chcr_wr {
struct fw_crypto_lookaside_wr wreq;
struct ulp_txpkt ulptx;
@@ -90,6 +138,11 @@ struct uld_ctx {
struct chcr_dev 
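As a side note, the _S/_M/_V/_G suffixes added above follow the usual cxgb4 convention: shift, mask, value-insert and get.  A minimal sketch of composing and decoding a key-context control word with them; the numeric field values are placeholders, not real key-context parameters:

static __be64 keyctx_ctrl_word_example(void)
{
        /* pack several fields into one 64-bit control word */
        u64 ctrl = KEYCTX_TX_WR_IV_V(6ULL) |
                   KEYCTX_TX_WR_AAD_V(1ULL) |
                   KEYCTX_TX_WR_AADST_V(5ULL) |
                   KEYCTX_TX_WR_CIPHER_V(14ULL) |
                   KEYCTX_TX_WR_AUTH_V(14ULL) |
                   KEYCTX_TX_WR_AUTHIN_V(16ULL);

        /* any field can be read back with the matching _G() helper */
        WARN_ON(KEYCTX_TX_WR_CIPHER_G(ctrl) != 14);

        /* struct _key_ctx stores this word big-endian in iv_to_auth */
        return cpu_to_be64(ctrl);
}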

[Crypto v4 04/12] chtls: structure and macro definition

2018-02-12 Thread Atul Gupta
Add the Inline TLS state and connection-management structures, along with the supporting macro definitions.

Signed-off-by: Atul Gupta 
---
 drivers/crypto/chelsio/chtls/chtls.h| 487 
 drivers/crypto/chelsio/chtls/chtls_cm.h | 203 +
 2 files changed, 690 insertions(+)
 create mode 100644 drivers/crypto/chelsio/chtls/chtls.h
 create mode 100644 drivers/crypto/chelsio/chtls/chtls_cm.h

diff --git a/drivers/crypto/chelsio/chtls/chtls.h b/drivers/crypto/chelsio/chtls/chtls.h
new file mode 100644
index 000..c7b8d59
--- /dev/null
+++ b/drivers/crypto/chelsio/chtls/chtls.h
@@ -0,0 +1,487 @@
+/*
+ * Copyright (c) 2016 Chelsio Communications, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __CHTLS_H__
+#define __CHTLS_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "t4fw_api.h"
+#include "t4_msg.h"
+#include "cxgb4.h"
+#include "cxgb4_uld.h"
+#include "l2t.h"
+#include "chcr_algo.h"
+#include "chcr_core.h"
+#include "chcr_crypto.h"
+
+#define CIPHER_BLOCK_SIZE   16
+#define MAX_IVS_PAGE            256
+#define TLS_KEY_CONTEXT_SZ 64
+#define TLS_HEADER_LENGTH  5
+#define SCMD_CIPH_MODE_AES_GCM  2
+#define GCM_TAG_SIZE            16
+#define AEAD_EXPLICIT_DATA_SIZE 8
+/* Any MFS size should work and come from openssl */
+#define TLS_MFS                 16384
+
+#define SOCK_INLINE (31)
+#define RSS_HDR sizeof(struct rss_header)
+
+enum {
+   CHTLS_KEY_CONTEXT_DSGL,
+   CHTLS_KEY_CONTEXT_IMM,
+   CHTLS_KEY_CONTEXT_DDR,
+};
+
+enum {
+   CHTLS_LISTEN_START,
+   CHTLS_LISTEN_STOP,
+};
+
+/* Flags for return value of CPL message handlers */
+enum {
+   CPL_RET_BUF_DONE = 1,   /* buffer processing done */
+   CPL_RET_BAD_MSG = 2,/* bad CPL message */
+   CPL_RET_UNKNOWN_TID = 4 /* unexpected unknown TID */
+};
+
+#define TLS_RCV_ST_READ_HEADER  0xF0
+#define TLS_RCV_ST_READ_BODY0xF1
+#define TLS_RCV_ST_READ_DONE0xF2
+#define TLS_RCV_ST_READ_NB  0xF3
+
+#define RSPQ_HASH_BITS 5
+#define LISTEN_INFO_HASH_SIZE 32
+struct listen_info {
+   struct listen_info *next;  /* Link to next entry */
+   struct sock *sk;   /* The listening socket */
+   unsigned int stid; /* The server TID */
+};
+
+enum {
+   T4_LISTEN_START_PENDING,
+   T4_LISTEN_STARTED
+};
+
+enum csk_flags {
+   CSK_CALLBACKS_CHKD, /* socket callbacks have been sanitized */
+   CSK_ABORT_REQ_RCVD, /* received one ABORT_REQ_RSS message */
+   CSK_TX_MORE_DATA,   /* sending ULP data; don't set SHOVE bit */
+   CSK_TX_WAIT_IDLE,   /* suspend Tx until in-flight data is ACKed */
+   CSK_ABORT_SHUTDOWN, /* shouldn't send more abort requests */
+   CSK_ABORT_RPL_PENDING,  /* expecting an abort reply */
+   CSK_CLOSE_CON_REQUESTED,/* we've sent a close_conn_req */
+   CSK_TX_DATA_SENT,   /* sent a TX_DATA WR on this connection */
+   CSK_TX_FAILOVER,/* Tx traffic failing over */
+   CSK_UPDATE_RCV_WND, /* Need to update rcv window */
+   CSK_RST_ABORTED,/* outgoing RST was aborted */
+   CSK_TLS_HANDSHK,/* TLS Handshake */
+};
+
+struct listen_ctx {
+   struct sock *lsk;
+   struct chtls_dev *cdev;
+   u32 state;
+};
+
+struct key_map {
+   unsigned long *addr;
+   unsigned int start;
+   unsigned int available;
+   unsigned int size;
+   spinlock_t lock; /* lock for key id request from map */
+} __packed;
+
+struct tls_scmd {
+   __be32 seqno_numivs;
+   __be32 ivgen_hdrlen;
+};
+
+struct chtls_dev {
+   struct tls_device tlsdev;
+   struct list_head list;
+   struct cxgb4_lld_info *lldi;
+   struct pci_dev *pdev;
+   struct listen_info *listen_hash_tab[LISTEN_INFO_HASH_SIZE];
+   spinlock_t listen_lock; /* lock for listen list */
+   struct net_device **ports;
+   struct tid_info *tids;
+   unsigned int pfvf;
+   const unsigned short *mtus;
+
+   spinlock_t aidr_lock ____cacheline_aligned_in_smp;
+   struct idr aidr; /* ATID id space */
+   struct idr hwtid_idr;
+   struct idr stid_idr;
+
+   spinlock_t idr_lock ____cacheline_aligned_in_smp;
+
+   struct net_device *egr_dev[NCHAN * 2];
+   struct sk_buff *rspq_skb_cache[1 << RSPQ_HASH_BITS];
+   struct sk_buff *askb;
+
+   struct sk_buff_head deferq;
+   struct work_struct deferq_task;
+
+   struct list_head list_node;
+   struct list_head rcu_node;
+   struct list_head na_node;
+   unsigned int send_page_order;
+   struct key_map kmap;
+};
+
+struct 
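To illustrate the listen_info hash table declared above, a minimal sketch of inserting a listening socket keyed by its server TID.  The bucket function (stid modulo LISTEN_INFO_HASH_SIZE) and the helper name are assumptions made for the example:

static int listen_hash_add_example(struct chtls_dev *cdev,
                                   struct sock *sk, unsigned int stid)
{
        unsigned int bucket = stid % LISTEN_INFO_HASH_SIZE;
        struct listen_info *p = kmalloc(sizeof(*p), GFP_KERNEL);

        if (!p)
                return -ENOMEM;

        p->sk = sk;
        p->stid = stid;
        spin_lock(&cdev->listen_lock);
        p->next = cdev->listen_hash_tab[bucket];
        cdev->listen_hash_tab[bucket] = p;
        spin_unlock(&cdev->listen_lock);
        return 0;
}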

[Crypto v4 03/12] support for inline tls

2018-02-12 Thread Atul Gupta
Register Inline TLS drivers with net/tls: add a TLS_FULL_HW
configuration and let a capable device take over record processing
for a socket.

Signed-off-by: Atul Gupta 
---
 net/tls/tls_main.c | 113 +
 1 file changed, 113 insertions(+)

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e07ee3a..10a6d5d 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -48,9 +49,12 @@
 enum {
TLS_BASE_TX,
TLS_SW_TX,
+   TLS_FULL_HW, /* TLS record processed Inline */
TLS_NUM_CONFIG,
 };
 
+static LIST_HEAD(device_list);
+static DEFINE_MUTEX(device_mutex);
 static struct proto tls_prots[TLS_NUM_CONFIG];
 
 static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
@@ -448,6 +452,92 @@ static int tls_setsockopt(struct sock *sk, int level, int optname,
return do_tls_setsockopt(sk, optname, optval, optlen);
 }
 
+static struct net_device *find_netdev(struct sock *sk)
+{
+   struct net_device *netdev = NULL;
+
+   netdev = __ip_dev_find(&init_net, inet_sk(sk)->inet_rcv_saddr, false);
+   return netdev;
+}
+
+static int get_tls_prot(struct sock *sk)
+{
+   struct tls_context *ctx = tls_get_ctx(sk);
+   struct net_device *netdev;
+   struct tls_device *dev;
+
+   /* Device bound to specific IP */
+   if (inet_sk(sk)->inet_rcv_saddr) {
+   netdev = find_netdev(sk);
+   if (!netdev)
+   goto out;
+
+   /* Device supports Inline record processing */
+   if (!(netdev->features & NETIF_F_HW_TLS_INLINE))
+   goto out;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+      /* does this Inline TLS driver own the net device? */
+      if (dev->netdev && dev->netdev(dev, netdev)) {
+         ctx->tx_conf = TLS_FULL_HW;
+         if (dev->prot)
+            dev->prot(dev, sk);
+         break;
+      }
+   }
+   mutex_unlock(&device_mutex);
+   } else { /* src address not known or INADDR_ANY */
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->feature && dev->feature(dev)) {
+   ctx->tx_conf = TLS_FULL_HW;
+   break;
+   }
+   }
+   mutex_unlock(&device_mutex);
+   update_sk_prot(sk, ctx);
+   }
+out:
+   return ctx->tx_conf;
+}
+
+static int tls_hw_prot(struct sock *sk)
+{
+   /* search registered tls device for netdev */
+   return get_tls_prot(sk);
+}
+
+static void tls_hw_unhash(struct sock *sk)
+{
+   struct tls_device *dev;
+
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->unhash)
+   dev->unhash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+   tcp_prot.unhash(sk);
+}
+
+static int tls_hw_hash(struct sock *sk)
+{
+   struct tls_device *dev;
+   int err;
+
+   err = tcp_prot.hash(sk);
+   mutex_lock(&device_mutex);
+   list_for_each_entry(dev, &device_list, dev_list) {
+   if (dev->hash)
+   err |= dev->hash(dev, sk);
+   }
+   mutex_unlock(&device_mutex);
+
+   if (err)
+   tls_hw_unhash(sk);
+   return err;
+}
+
 static int tls_init(struct sock *sk)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -466,6 +556,9 @@ static int tls_init(struct sock *sk)
ctx->sk_proto_close = sk->sk_prot->close;
 
ctx->tx_conf = TLS_BASE_TX;
+   if (tls_hw_prot(sk) == TLS_FULL_HW)
+   goto out;
+
update_sk_prot(sk, ctx);
 out:
return rc;
@@ -487,7 +580,27 @@ static void build_protos(struct proto *prot, struct proto *base)
prot[TLS_SW_TX] = prot[TLS_BASE_TX];
prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
prot[TLS_SW_TX].sendpage= tls_sw_sendpage;
+
+   prot[TLS_FULL_HW]   = prot[TLS_BASE_TX];
+   prot[TLS_FULL_HW].hash  = tls_hw_hash;
+   prot[TLS_FULL_HW].unhash= tls_hw_unhash;
+}
+
+void tls_register_device(struct tls_device *device)
+{
+   mutex_lock(&device_mutex);
+   list_add_tail(&device->dev_list, &device_list);
+   mutex_unlock(&device_mutex);
+}
+EXPORT_SYMBOL(tls_register_device);
+
+void tls_unregister_device(struct tls_device *device)
+{
+   mutex_lock(&device_mutex);
+   list_del(&device->dev_list);
+   mutex_unlock(&device_mutex);
 }
+EXPORT_SYMBOL(tls_unregister_device);
 
 static int __init tls_register(void)
 {
-- 
1.8.3.1
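
When get_tls_prot() above finds a matching Inline TLS device it calls
dev->prot(dev, sk).  A minimal sketch of what such a callback might do,
assuming the driver keeps its own proto ops (chtls_hw_prot_ops is a
hypothetical name):

extern struct proto chtls_hw_prot_ops;  /* hypothetical driver proto ops */

static void chtls_dev_prot_example(struct tls_device *dev, struct sock *sk)
{
        /* route this socket's sendmsg/recvmsg through the Inline TLS path */
        sk->sk_prot = &chtls_hw_prot_ops;
}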



[Crypto v4 01/12] tls: tls_device struct to register TLS drivers

2018-02-12 Thread Atul Gupta
Add a tls_device structure so that Inline TLS capable drivers can
register themselves with net/tls.

Signed-off-by: Atul Gupta 
---
 include/net/tls.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/include/net/tls.h b/include/net/tls.h
index 936cfc5..2a9f392 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -55,6 +55,25 @@
 
 #define TLS_AAD_SPACE_SIZE 13
 
+#define TLS_DEVICE_NAME_MAX     64
+
+struct tls_device {
+   char name[TLS_DEVICE_NAME_MAX];
+   struct list_head dev_list;
+
+   /* When the netdev callback is invoked, the HW vendor's driver
+* should return @netdev if device @device owns it, or NULL
+* otherwise.
+*/
+   struct net_device *(*netdev)(struct tls_device *device,
+struct net_device *netdev);
+   int (*feature)(struct tls_device *device);
+   int (*hash)(struct tls_device *device, struct sock *sk);
+   void (*unhash)(struct tls_device *device, struct sock *sk);
+   void (*prot)(struct tls_device *device,
+struct sock *sk);
+};
+
 struct tls_sw_context {
struct crypto_aead *aead_send;
 
@@ -254,5 +273,7 @@ static inline struct tls_offload_context *tls_offload_ctx(
 
 int tls_proccess_cmsg(struct sock *sk, struct msghdr *msg,
  unsigned char *record_type);
+void tls_register_device(struct tls_device *device);
+void tls_unregister_device(struct tls_device *device);
 
 #endif /* _TLS_OFFLOAD_H */
-- 
1.8.3.1
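
To make the interface concrete, a minimal sketch of how an Inline TLS
driver might fill in tls_device and register it; the names below are
placeholders, not the actual chtls code:

static struct net_device *example_get_netdev(struct tls_device *dev,
                                             struct net_device *netdev)
{
        /* return @netdev if this driver owns it, otherwise NULL */
        return NULL;
}

static int example_feature(struct tls_device *dev)
{
        return 1;       /* Inline TLS record processing supported */
}

static struct tls_device example_tlsdev = {
        .name    = "example-inline-tls",
        .netdev  = example_get_netdev,
        .feature = example_feature,
};

static int __init example_register(void)
{
        tls_register_device(&example_tlsdev);
        return 0;
}

static void __exit example_unregister(void)
{
        tls_unregister_device(&example_tlsdev);
}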



[Crypto v4 02/12] ethtool: feature for Inline TLS in HW

2018-02-12 Thread Atul Gupta
Add a netdev feature flag ("tls-inline") so drivers can advertise
Inline TLS record offload through ethtool.

Signed-off-by: Atul Gupta 
---
 include/linux/netdev_features.h | 2 ++
 net/core/ethtool.c  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b1b0ca7..e1a33b7 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -77,6 +77,7 @@ enum {
NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */
NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */
NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
+   NETIF_F_HW_TLS_INLINE_BIT,  /* Offload TLS record */
 
/*
 * Add your fresh new feature above and remember to update
@@ -142,6 +143,7 @@ enum {
 #define NETIF_F_HW_ESP __NETIF_F(HW_ESP)
 #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM)
 #define NETIF_F_RX_UDP_TUNNEL_PORT  __NETIF_F(RX_UDP_TUNNEL_PORT)
+#define NETIF_F_HW_TLS_INLINE   __NETIF_F(HW_TLS_INLINE)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf45..cac1c77 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -106,6 +106,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
[NETIF_F_HW_ESP_BIT] =   "esp-hw-offload",
[NETIF_F_HW_ESP_TX_CSUM_BIT] =   "esp-tx-csum-hw-offload",
[NETIF_F_RX_UDP_TUNNEL_PORT_BIT] =   "rx-udp_tunnel-port-offload",
+   [NETIF_F_HW_TLS_INLINE_BIT] =   "tls-inline",
 };
 
 static const char
-- 
1.8.3.1
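
net/tls only considers handing a socket to an Inline TLS driver when the
bound net_device advertises this capability (see the NETIF_F_HW_TLS_INLINE
check in get_tls_prot() in patch 03).  A minimal sketch of a NIC driver
advertising the bit during netdev setup; the function name is hypothetical:

static void example_set_tls_features(struct net_device *netdev)
{
        netdev->hw_features |= NETIF_F_HW_TLS_INLINE;
        netdev->features    |= NETIF_F_HW_TLS_INLINE;
}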