Re: [PATCH v3 4/4] crypto: add CRYPTO_TFM_REQ_IV_SERIALIZE flag

2018-02-13 Thread Harsh Jain


On 10-02-2018 03:34, Stephan Müller wrote:
> Crypto drivers may implement a streamlined serialization support for AIO
> requests that is reported by the CRYPTO_ALG_SERIALIZES_IV_ACCESS flag to
> the crypto user. When the user decides that he wants to send multiple
> AIO requests concurrently and wants the crypto driver to handle the
> serialization, the caller has to set CRYPTO_TFM_REQ_IV_SERIALIZE to notify
> the crypto driver.
Will the crypto_alloc_* API take care of this flag? For a kernel crypto user, the
IV synchronization logic depends on whether the allocated tfm supports IV
serialization or not.
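
In other words, something like this on the caller side (rough, untested sketch,
assuming the CRYPTO_ALG_SERIALIZES_IV_ACCESS and CRYPTO_TFM_REQ_IV_SERIALIZE
flags introduced by this series):

#include <crypto/skcipher.h>

/* Probe whether the driver offers IV serialization for AIO and, if so,
 * ask for it; otherwise the caller has to serialize IV access itself.
 */
static bool request_driver_iv_serialization(struct crypto_skcipher *tfm)
{
	u32 alg_flags = crypto_skcipher_tfm(tfm)->__crt_alg->cra_flags;

	if (!(alg_flags & CRYPTO_ALG_SERIALIZES_IV_ACCESS))
		return false;

	crypto_skcipher_set_flags(tfm, CRYPTO_TFM_REQ_IV_SERIALIZE);
	return true;
}
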
>
> Only when this flag is enabled, the crypto driver shall apply its
> serialization logic for handling IV updates between requests. If this
> flag is not provided, the serialization logic shall not be applied by
> the driver as the caller decides that it does not need it (because no
> parallel AIO requests are sent) or that it performs its own
> serialization.
>
> Signed-off-by: Stephan Mueller 
> ---
>  crypto/algif_aead.c | 15 ---
>  crypto/algif_skcipher.c | 15 ---
>  include/linux/crypto.h  |  1 +
>  3 files changed, 25 insertions(+), 6 deletions(-)
>
> diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
> index 619147792cc9..5ec4dec6e6a1 100644
> --- a/crypto/algif_aead.c
> +++ b/crypto/algif_aead.c
> @@ -66,13 +66,22 @@ static int aead_sendmsg(struct socket *sock, struct 
> msghdr *msg, size_t size)
>  {
>   struct sock *sk = sock->sk;
>   struct alg_sock *ask = alg_sk(sk);
> + struct af_alg_ctx *ctx = ask->private;
>   struct sock *psk = ask->parent;
>   struct alg_sock *pask = alg_sk(psk);
>   struct aead_tfm *aeadc = pask->private;
> - struct crypto_aead *tfm = aeadc->aead;
> - unsigned int ivsize = crypto_aead_ivsize(tfm);
> + struct crypto_aead *aead = aeadc->aead;
> + struct crypto_tfm *tfm = crypto_aead_tfm(aead);
> + unsigned int ivsize = crypto_aead_ivsize(aead);
> + int ret = af_alg_sendmsg(sock, msg, size, ivsize);
> +
> + if (ret < 0)
> + return ret;
>  
> - return af_alg_sendmsg(sock, msg, size, ivsize);
> + if (ctx->iiv == ALG_IV_SERIAL_PROCESSING)
> + tfm->crt_flags |= CRYPTO_TFM_REQ_IV_SERIALIZE;
> +
> + return ret;
>  }
>  
>  static int crypto_aead_copy_sgl(struct crypto_skcipher *null_tfm,
> diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
> index cf27dda6a181..fd2a0ba32feb 100644
> --- a/crypto/algif_skcipher.c
> +++ b/crypto/algif_skcipher.c
> @@ -43,12 +43,21 @@ static int skcipher_sendmsg(struct socket *sock, struct 
> msghdr *msg,
>  {
>   struct sock *sk = sock->sk;
>   struct alg_sock *ask = alg_sk(sk);
> + struct af_alg_ctx *ctx = ask->private;
>   struct sock *psk = ask->parent;
>   struct alg_sock *pask = alg_sk(psk);
> - struct crypto_skcipher *tfm = pask->private;
> - unsigned ivsize = crypto_skcipher_ivsize(tfm);
> + struct crypto_skcipher *skc = pask->private;
> + struct crypto_tfm *tfm = crypto_skcipher_tfm(skc);
> + unsigned int ivsize = crypto_skcipher_ivsize(skc);
> + int ret = af_alg_sendmsg(sock, msg, size, ivsize);
> +
> + if (ret < 0)
> + return ret;
>  
> - return af_alg_sendmsg(sock, msg, size, ivsize);
> + if (ctx->iiv == ALG_IV_SERIAL_PROCESSING)
> + tfm->crt_flags |= CRYPTO_TFM_REQ_IV_SERIALIZE;
> +
> + return ret;
>  }
>  
>  static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
> index 4860aa2c9be4..4d54f2b30692 100644
> --- a/include/linux/crypto.h
> +++ b/include/linux/crypto.h
> @@ -133,6 +133,7 @@
>  #define CRYPTO_TFM_REQ_WEAK_KEY  0x0100
>  #define CRYPTO_TFM_REQ_MAY_SLEEP 0x0200
>  #define CRYPTO_TFM_REQ_MAY_BACKLOG   0x0400
> +#define CRYPTO_TFM_REQ_IV_SERIALIZE  0x0800
>  #define CRYPTO_TFM_RES_WEAK_KEY  0x0010
>  #define CRYPTO_TFM_RES_BAD_KEY_LEN   0x0020
>  #define CRYPTO_TFM_RES_BAD_KEY_SCHED 0x0040



Re: [PATCH v3 1/4] crypto: AF_ALG AIO - lock context IV

2018-02-13 Thread Harsh Jain


On 10-02-2018 03:33, Stephan Müller wrote:
> The kernel crypto API requires the caller to set an IV in the request data
> structure. That request data structure shall define one particular cipher
> operation. During the cipher operation, the IV is read by the cipher
> implementation and eventually the potentially updated IV (e.g. in case of
> CBC) is written back to the memory location the request data structure
> points to.
>
> AF_ALG allows setting the IV with a sendmsg request, where the IV is stored
> in the AF_ALG context that is unique to one particular AF_ALG socket. Note
> the analogy: an AF_ALG socket is like a TFM where one recvmsg operation
> uses one request with the TFM from the socket.
>
> AF_ALG these days supports AIO operations with multiple IOCBs. I.e. with
> one recvmsg call, multiple IOVECs can be specified. Each individual IOCB
> (derived from one IOVEC) implies that one request data structure is
> created with the data to be processed by the cipher implementation. The
> IV that was set with the sendmsg call is registered with the request data
> structure before the cipher operation.
>
> In case of an AIO operation, the cipher operation invocation returns
> immediately, queuing the request to the hardware. While the AIO request is
> processed by the hardware, recvmsg processes the next IOVEC for which
> another request is created. Again, the IV buffer from the AF_ALG socket
> context is registered with the new request and the cipher operation is
> invoked.
>
> You may now see that there is a potential race condition regarding the IV
> handling, because there is *no* separate IV buffer for the different
> requests. This is nicely demonstrated with libkcapi using the following
> command which creates an AIO request with two IOCBs each encrypting one
> AES block in CBC mode:
>
> kcapi  -d 2 -x 9  -e -c "cbc(aes)" -k
> 8d7dd9b0170ce0b5f2f8e1aa768e01e91da8bfc67fd486d081b28254c99eb423 -i
> 7fbc02ebf5b93322329df9bfccb635af -p 48981da18e4bb9ef7e2e3162d16b1910
>
> When the first AIO request finishes before the 2nd AIO request is
> processed, the returned value is:
>
> 8b19050f66582cb7f7e4b6c873819b7108afa0eaa7de29bac7d903576b674c32
>
> I.e. two blocks where the IV output from the first request is the IV input
> to the 2nd block.
>
> In case the first AIO request is not completed before the 2nd request
> commences, the result is two identical AES blocks (i.e. both use the same
> IV):
>
> 8b19050f66582cb7f7e4b6c873819b718b19050f66582cb7f7e4b6c873819b71
>
> This inconsistent result may even lead to the conclusion that there can be
> a memory corruption in the IV buffer if both AIO requests write to the IV
> buffer at the same time.
>
> As the AF_ALG interface is used by user space, a mutex provides the
> serialization which puts the caller to sleep in case a previous IOCB
> processing is not yet finished.
>
> If multiple IOCBs arrive that all are blocked, the mutex' FIFO operation
> of processing arriving requests ensures that the blocked IOCBs are
> unblocked in the right order.
Hi Stephan,

The patch set is working fine with the Chelsio driver.
Do we really need the IV locking mechanism for AEAD algorithms? AEAD algorithms
don't support partial-mode operation, and drivers (at least Chelsio) do not
update the IV on AEAD request completion.
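
For skcipher AIO, at least, my understanding of the serialization this patch
provides is roughly the following pattern around each request (illustrative
sketch only, not the actual patch code):

#include <linux/mutex.h>
#include <crypto/skcipher.h>
#include <crypto/if_alg.h>

/* One AIO request using the shared per-socket IV: take ctx->ivlock before
 * the cipher call; drop it right away on synchronous completion, or later
 * from the async completion callback (af_alg_async_cb() -> af_alg_put_iv())
 * when -EINPROGRESS is returned. The request is assumed to be otherwise
 * fully set up by the caller.
 */
static int crypt_one_iocb(struct af_alg_ctx *ctx, struct skcipher_request *req,
			  struct scatterlist *src, struct scatterlist *dst,
			  unsigned int len)
{
	int err = mutex_lock_interruptible(&ctx->ivlock);	/* af_alg_get_iv() */

	if (err)
		return err;

	skcipher_request_set_crypt(req, src, dst, len, ctx->iv);

	err = crypto_skcipher_encrypt(req);
	if (err != -EINPROGRESS && err != -EBUSY)
		mutex_unlock(&ctx->ivlock);			/* af_alg_put_iv() */

	return err;
}
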
>
> CC:  #4.14
> Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management")
> Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management")
> Signed-off-by: Stephan Mueller 
> ---
>  crypto/af_alg.c | 31 +++
>  crypto/algif_aead.c | 20 +---
>  crypto/algif_skcipher.c | 12 +---
>  include/crypto/if_alg.h |  5 +
>  4 files changed, 58 insertions(+), 10 deletions(-)
>
> diff --git a/crypto/af_alg.c b/crypto/af_alg.c
> index 5231f421ad00..e7887621aa44 100644
> --- a/crypto/af_alg.c
> +++ b/crypto/af_alg.c
> @@ -1051,6 +1051,8 @@ void af_alg_async_cb(struct crypto_async_request *_req, 
> int err)
>   struct kiocb *iocb = areq->iocb;
>   unsigned int resultlen;
>  
> + af_alg_put_iv(sk);
> +
>   /* Buffer size written by crypto operation. */
>   resultlen = areq->outlen;
>  
> @@ -1175,6 +1177,35 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr 
> *msg, int flags,
>  }
>  EXPORT_SYMBOL_GPL(af_alg_get_rsgl);
>  
> +/**
> + * af_alg_get_iv
> + *
> + * @sk [in] AF_ALG socket
> + * @return 0 on success, < 0 on error
> + */
> +int af_alg_get_iv(struct sock *sk)
> +{
> + struct alg_sock *ask = alg_sk(sk);
> + struct af_alg_ctx *ctx = ask->private;
> +
> + return mutex_lock_interruptible(&ctx->ivlock);
> +}
> +EXPORT_SYMBOL_GPL(af_alg_get_iv);
> +
> +/**
> + * af_alg_put_iv - release lock on IV in case CTX IV is used
> + *
> + * @sk [in] AF_ALG socket
> + */
> +void af_alg_put_iv(struct sock *sk)
> +{
> + struct alg_sock *ask = alg_sk(sk);
> + struct af_alg_ctx *ctx = ask->private;
> +
> + 

Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-13 Thread Junaid Shahid
[Resending after delivery failure]

Hi Dave,

On 02/13/2018 10:22 AM, Dave Watson wrote:
>
> Yes, these both sound reasonable.  I will send a V2.
>
> Thanks!

Another minor suggestion for v2:

It might be a good idea to check if the first assoclen bytes are already 
contiguous and only do the kmalloc if that isn't the case.
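
Something along these lines, perhaps (rough, untested sketch of the idea as a
fragment of the quoted gcmaes_encrypt_sg(); the lowmem check is my own
addition):

	struct scatterlist *sg = req->src;
	bool linear = sg->length >= assoclen && !PageHighMem(sg_page(sg));

	if (linear) {
		/* first assoclen bytes already contiguous, use them in place */
		assoc = sg_virt(sg);
	} else {
		assoc = kmalloc(assoclen, GFP_ATOMIC);
		if (unlikely(!assoc))
			return -ENOMEM;
		scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
	}

	/* ... and later only kfree(assoc) when !linear */
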

Thanks,
Junaid


Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-02-13 Thread Ard Biesheuvel
On 13 February 2018 at 18:57, Eric Biggers  wrote:
> Hi Ard,
>
> On Tue, Feb 13, 2018 at 11:34:36AM +, Ard Biesheuvel wrote:
>> Hi Eric,
>>
>> On 12 February 2018 at 23:52, Eric Biggers  wrote:
>> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
>> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
>> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
>> > encrypted/decrypted (doing one cipher round for all the blocks, then the
>> > next round, etc.), then goes through XTS postprocessing.
>> >
>> > The performance depends on the processor but can be about 3 times faster
>> > than the generic code.  For example, on an ARMv7 processor we observe
>> > the following performance with Speck128/256-XTS:
>> >
>> > xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
>> > xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>> >
>> > In comparison to AES-256-XTS without the Cryptography Extensions:
>> >
>> > xts-aes-neonbs:    Encryption  41.2 MB/s, Decryption  36.7 MB/s
>> > xts(aes-asm):  Encryption  31.7 MB/s, Decryption  30.8 MB/s
>> > xts(aes-generic):  Encryption  21.2 MB/s, Decryption  20.9 MB/s
>> >
>> > Speck64/128-XTS is even faster:
>> >
>> > xts-speck64-neon:  Encryption 138.6 MB/s, Decryption 139.1 MB/s
>> >
>> > Note that as with the generic code, only the Speck128 and Speck64
>> > variants are supported.  Also, for now only the XTS mode of operation is
>> > supported, to target the disk and file encryption use cases.  The NEON
>> > code also only handles the portion of the data that is evenly divisible
>> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
>> > course, other modes of operation could be added later if needed, and/or
>> > the NEON code could be updated to handle other buffer sizes.
>> >
>> > The XTS specification is only defined for AES which has a 128-bit block
>> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
>> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
>> > paper.  Of course, when possible users should use Speck128-XTS, but even
>> > that may be too slow on some processors; Speck64-XTS can be faster.
>> >
>>
>> I think this is excellent work. Speck seems an appropriate solution to
>> this problem, and I'm glad we are not ending up with a stream cipher
>> for block encryption.
>>
>> Also, I think an arm64 port would be nice. I may take a stab at this
>> if nobody else beats me to it.
>
> We don't really want to encourage people to use Speck over AES with the
> Cryptography Extensions, so that's why I didn't include an arm64 port.  That
> being said, I suppose we can't stop people from adding an arm64 port if they
> really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
> Cryptography Extensions (though I thought that almost all do).
>

Many do, but not all of them. A notable exception is the Raspberry Pi 3.

>>
>> I did run into an issue with this code though: On big-endian, I get
>>
>> [ 0.272381] alg: skcipher: Test 1 failed (invalid result) on
>> encryption for xts-speck64-neon
>> [0.276151] : 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
>> [0.278541] 0010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
>>
>> so there may be a byte order corner case you missed in the rewrite (or
>> the issue existed before, as I did not test your v1)
>>
>
> To be honest I haven't tested either version on a big endian ARM CPU yet.  I
> don't really know how to do that currently; maybe it's possible with QEMU.
>

I tested this on a big-endian 32-bit VM running under KVM on a 64-bit host.

> But assuming I haven't missed anything, in the assembly code everything is
> treated as byte arrays with the exception of the round keys which are 32-bit 
> or
> 64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
> vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
> the assembly code *should* work correctly on a big endian CPU.
>

Indeed.

> However, looking over it now, I think there is a bug in the glue code for
> Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
> Namely, the tweak is treated as CPU endian when it should be little endian.
> Could you try the following patch?
>
> diff --git a/arch/arm/crypto/speck-neon-glue.c 
> b/arch/arm/crypto/speck-neon-glue.c
> index 3987dd6e063e..960cc634b36f 100644
> --- a/arch/arm/crypto/speck-neon-glue.c
> +++ b/arch/arm/crypto/speck-neon-glue.c
> @@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, 
> speck64_crypt_one_t crypt_one,
> struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> struct skcipher_walk walk;
> -   u64 tweak;
> +   __le64 tweak;
> int err;

Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-02-13 Thread Eric Biggers
Hi Ard,

On Tue, Feb 13, 2018 at 11:34:36AM +, Ard Biesheuvel wrote:
> Hi Eric,
> 
> On 12 February 2018 at 23:52, Eric Biggers  wrote:
> > Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> > 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> > Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> > encrypted/decrypted (doing one cipher round for all the blocks, then the
> > next round, etc.), then goes through XTS postprocessing.
> >
> > The performance depends on the processor but can be about 3 times faster
> > than the generic code.  For example, on an ARMv7 processor we observe
> > the following performance with Speck128/256-XTS:
> >
> > xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
> > xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
> >
> > In comparison to AES-256-XTS without the Cryptography Extensions:
> >
> > xts-aes-neonbs:    Encryption  41.2 MB/s, Decryption  36.7 MB/s
> > xts(aes-asm):  Encryption  31.7 MB/s, Decryption  30.8 MB/s
> > xts(aes-generic):  Encryption  21.2 MB/s, Decryption  20.9 MB/s
> >
> > Speck64/128-XTS is even faster:
> >
> > xts-speck64-neon:  Encryption 138.6 MB/s, Decryption 139.1 MB/s
> >
> > Note that as with the generic code, only the Speck128 and Speck64
> > variants are supported.  Also, for now only the XTS mode of operation is
> > supported, to target the disk and file encryption use cases.  The NEON
> > code also only handles the portion of the data that is evenly divisible
> > into 128-byte chunks, with any remainder handled by a C fallback.  Of
> > course, other modes of operation could be added later if needed, and/or
> > the NEON code could be updated to handle other buffer sizes.
> >
> > The XTS specification is only defined for AES which has a 128-bit block
> > size, so for the GF(2^64) math needed for Speck64-XTS we use the
> > reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> > paper.  Of course, when possible users should use Speck128-XTS, but even
> > that may be too slow on some processors; Speck64-XTS can be faster.
> >
> 
> I think this is excellent work. Speck seems an appropriate solution to
> this problem, and I'm glad we are not ending up with a stream cipher
> for block encryption.
> 
> Also, I think an arm64 port would be nice. I may take a stab at this
> if nobody else beats me to it.

We don't really want to encourage people to use Speck over AES with the
Cryptography Extensions, so that's why I didn't include an arm64 port.  That
being said, I suppose we can't stop people from adding an arm64 port if they
really do prefer Speck, or maybe for use on arm64 CPUs that don't have the
Cryptography Extensions (though I thought that almost all do).

> 
> I did run into an issue with this code though: On big-endian, I get
> 
> [0.272381] alg: skcipher: Test 1 failed (invalid result) on
> encryption for xts-speck64-neon
> [0.276151] : 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
> [0.278541] 0010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b
> 
> so there may be a byte order corner case you missed in the rewrite (or
> the issue existed before, as I did not test your v1)
> 

To be honest I haven't tested either version on a big endian ARM CPU yet.  I
don't really know how to do that currently; maybe it's possible with QEMU.

But assuming I haven't missed anything, in the assembly code everything is
treated as byte arrays with the exception of the round keys which are 32-bit or
64-bit numbers in CPU endianness.  The byte arrays are loaded and stored with
vld1.8 and vst1.8 while the round keys are loaded with vld1.32 or vld1.64, so
the assembly code *should* work correctly on a big endian CPU.

However, looking over it now, I think there is a bug in the glue code for
Speck64-XTS when it handles buffers not evenly divisible into 128 bytes.
Namely, the tweak is treated as CPU endian when it should be little endian.
Could you try the following patch?

diff --git a/arch/arm/crypto/speck-neon-glue.c 
b/arch/arm/crypto/speck-neon-glue.c
index 3987dd6e063e..960cc634b36f 100644
--- a/arch/arm/crypto/speck-neon-glue.c
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -157,7 +157,7 @@ __speck64_xts_crypt(struct skcipher_request *req, 
speck64_crypt_one_t crypt_one,
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
-   u64 tweak;
+   __le64 tweak;
int err;
 
err = skcipher_walk_virt(, req, true);
@@ -184,16 +184,16 @@ __speck64_xts_crypt(struct skcipher_request *req, 
speck64_crypt_one_t crypt_one,
}
 
/* Handle any remainder with generic code */
-   while (nbytes >= sizeof(u64)) {
-   *(u64 *)dst = *(u64 *)src ^ tweak;
+  

Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-13 Thread Dave Watson
On 02/13/18 08:42 AM, Stephan Mueller wrote:
> > +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int
> > assoclen, + u8 *hash_subkey, u8 *iv, void *aes_ctx)
> > +{
> > +   struct crypto_aead *tfm = crypto_aead_reqtfm(req);
> > +   unsigned long auth_tag_len = crypto_aead_authsize(tfm);
> > +   struct gcm_context_data data AESNI_ALIGN_ATTR;
> > +   struct scatter_walk dst_sg_walk = {};
> > +   unsigned long left = req->cryptlen;
> > +   unsigned long len, srclen, dstlen;
> > +   struct scatter_walk src_sg_walk;
> > +   struct scatterlist src_start[2];
> > +   struct scatterlist dst_start[2];
> > +   struct scatterlist *src_sg;
> > +   struct scatterlist *dst_sg;
> > +   u8 *src, *dst, *assoc;
> > +   u8 authTag[16];
> > +
> > +   assoc = kmalloc(assoclen, GFP_ATOMIC);
> > +   if (unlikely(!assoc))
> > +   return -ENOMEM;
> > +   scatterwalk_map_and_copy(assoc, req->src, 0, assoclen, 0);
> 
> Have you tested that this code does not barf when assoclen is 0?
> 
> Maybe it is worthwhile to finally add a test vector to testmgr.h which
> validates such a scenario. If you would like, here is a vector you could add
> to testmgr:
> 
> https://github.com/smuellerDD/libkcapi/blob/master/test/test.sh#L315

I tested assoclen and cryptlen being 0 and it works, yes.  Both
kmalloc and scatterwalk_map_and_copy work correctly with 0 assoclen.

> This is a decryption of gcm(aes) with no message, no AAD and just a tag. The 
> result should be EBADMSG.
> > +
> > +   src_sg = scatterwalk_ffwd(src_start, req->src, req->assoclen);
> 
> Why do you use assoclen in the map_and_copy, and req->assoclen in the ffwd?

If I understand correctly, rfc4106 appends extra data after the assoc.
assoclen is the real assoc length, while req->assoclen is assoclen plus the
extra data length.  So we ffwd by req->assoclen in the scatterlist,
but use assoclen for the memcpy and the tests.
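
In other words (my own summary of the buffer layout, using the rfc4106 case):

/*
 *   req->src:  [ AAD: assoclen bytes ][ 8-byte explicit IV ][ crypto text ... ]
 *              |<------------- req->assoclen ------------->|
 *
 * so scatterwalk_ffwd() skips req->assoclen to reach the crypto text, while
 * copying and hashing the AAD itself uses assoclen = req->assoclen - 8.
 */
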

> > 
> > +static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int
> > assoclen, + u8 *hash_subkey, u8 *iv, void *aes_ctx)
> > +{
> 
> This is a lot of code duplication.

I will merge them and send a V2.

> Ciao
> Stephan
> 
> 

Thanks!


Re: [PATCH 14/14] x86/crypto: aesni: Update aesni-intel_glue to use scatter/gather

2018-02-13 Thread Dave Watson
On 02/12/18 03:12 PM, Junaid Shahid wrote:
> Hi Dave,
> 
> 
> On 02/12/2018 11:51 AM, Dave Watson wrote:
> 
> > +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int 
> > assoclen,
> > +   u8 *hash_subkey, u8 *iv, void *aes_ctx)
> >  
> > +static int gcmaes_decrypt_sg(struct aead_request *req, unsigned int 
> > assoclen,
> > +   u8 *hash_subkey, u8 *iv, void *aes_ctx)
> 
> These two functions are almost identical. Wouldn't it be better to combine 
> them into a single encrypt/decrypt function, similar to what you have done 
> for the assembly macros?
> 
> > +   if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
> > +   aesni_gcm_enc_tfm == aesni_gcm_enc) {
> 
> Shouldn't we also include a check for the buffer length being less than 
> AVX_GEN2_OPTSIZE? AVX will not be used in that case either.

Yes, these both sound reasonable.  I will send a V2.

Thanks!


RE: [Crypto v4 03/12] support for inline tls

2018-02-13 Thread Atul Gupta


-Original Message-
From: David Miller [mailto:da...@davemloft.net] 
Sent: Tuesday, February 13, 2018 1:19 AM
To: Atul Gupta 
Cc: davejwat...@fb.com; herb...@gondor.apana.org.au; s...@queasysnail.net; 
linux-crypto@vger.kernel.org; net...@vger.kernel.org; Ganesh GR 

Subject: Re: [Crypto v4 03/12] support for inline tls

From: Atul Gupta 
Date: Mon, 12 Feb 2018 17:34:28 +0530

> +static int get_tls_prot(struct sock *sk) {
> + struct tls_context *ctx = tls_get_ctx(sk);
> + struct net_device *netdev;
> + struct tls_device *dev;
> +
> + /* Device bound to specific IP */
> + if (inet_sk(sk)->inet_rcv_saddr) {
> + netdev = find_netdev(sk);
> + if (!netdev)
> + goto out;
> +
> + /* Device supports Inline record processing */
> + if (!(netdev->features & NETIF_F_HW_TLS_INLINE))
> + goto out;
> +
> + mutex_lock(_mutex);
> + list_for_each_entry(dev, _list, dev_list) {
> + if (dev->netdev && dev->netdev(dev, netdev))
> + break;
> + }
> + mutex_unlock(_mutex);
> +
> + ctx->tx_conf = TLS_FULL_HW;
> + if (dev->prot)
> + dev->prot(dev, sk);

What if the same IP address is configured on multiple interfaces?

Thanks, I overlooked this point.
The checks above were based on the premise that the chosen device is indeed
the one with Inline TLS enabled: the net_device corresponds to the specific IP
address, the feature is configured for the device via ethtool, and the
net_device corresponds to an Inline TLS driver registered with net tls.

The case of the same IP configured on multiple interfaces looks similar to the
INADDR_ANY case below.

TLS_FULL_HW and the modified hash routines handle devices with and without
Inline TLS support. The first Inline TLS capable device updates sk_prot for
TLS_FULL_HW. tls_hw_hash listens on all interfaces and runs the device-specific
routine; the listen, however, succeeds for the device on which the connect was
initiated, which may not have Inline TLS capability, and such a connection
continues in TLS_BASE_TX (non-TLS-offload) mode. On the other hand, if an
Inline TLS capable device establishes the connection, it updates the prot as
required for offload mode to continue.

> + } else { /* src address not known or INADDR_ANY */
> + mutex_lock(_mutex);
> + list_for_each_entry(dev, _list, dev_list) {
> + if (dev->feature && dev->feature(dev)) {
> + ctx->tx_conf = TLS_FULL_HW;
> + break;
> + }
> + }
> + mutex_unlock(_mutex);
> + update_sk_prot(sk, ctx);

And I think this is even more of a stretch.  Just because you find an inline 
TLS device on the global list doesn't mean traffic will necessarily flow 
through it once the connection is fully established and therefore be able to 
provide inline TLS offloading.


Re: [PATCH v2 3/5] crypto: arm/speck - add NEON-accelerated implementation of Speck-XTS

2018-02-13 Thread Ard Biesheuvel
Hi Eric,

On 12 February 2018 at 23:52, Eric Biggers  wrote:
> Add an ARM NEON-accelerated implementation of Speck-XTS.  It operates on
> 128-byte chunks at a time, i.e. 8 blocks for Speck128 or 16 blocks for
> Speck64.  Each 128-byte chunk goes through XTS preprocessing, then is
> encrypted/decrypted (doing one cipher round for all the blocks, then the
> next round, etc.), then goes through XTS postprocessing.
>
> The performance depends on the processor but can be about 3 times faster
> than the generic code.  For example, on an ARMv7 processor we observe
> the following performance with Speck128/256-XTS:
>
> xts-speck128-neon: Encryption 107.9 MB/s, Decryption 108.1 MB/s
> xts(speck128-generic): Encryption  32.1 MB/s, Decryption  36.6 MB/s
>
> In comparison to AES-256-XTS without the Cryptography Extensions:
>
> xts-aes-neonbs:    Encryption  41.2 MB/s, Decryption  36.7 MB/s
> xts(aes-asm):  Encryption  31.7 MB/s, Decryption  30.8 MB/s
> xts(aes-generic):  Encryption  21.2 MB/s, Decryption  20.9 MB/s
>
> Speck64/128-XTS is even faster:
>
> xts-speck64-neon:  Encryption 138.6 MB/s, Decryption 139.1 MB/s
>
> Note that as with the generic code, only the Speck128 and Speck64
> variants are supported.  Also, for now only the XTS mode of operation is
> supported, to target the disk and file encryption use cases.  The NEON
> code also only handles the portion of the data that is evenly divisible
> into 128-byte chunks, with any remainder handled by a C fallback.  Of
> course, other modes of operation could be added later if needed, and/or
> the NEON code could be updated to handle other buffer sizes.
>
> The XTS specification is only defined for AES which has a 128-bit block
> size, so for the GF(2^64) math needed for Speck64-XTS we use the
> reducing polynomial 'x^64 + x^4 + x^3 + x + 1' given by the original XEX
> paper.  Of course, when possible users should use Speck128-XTS, but even
> that may be too slow on some processors; Speck64-XTS can be faster.
>

I think this is excellent work. Speck seems an appropriate solution to
this problem, and I'm glad we are not ending up with a stream cipher
for block encryption.

Also, I think an arm64 port would be nice. I may take a stab at this
if nobody else beats me to it.

I did run into an issue with this code though: On big-endian, I get

[0.272381] alg: skcipher: Test 1 failed (invalid result) on
encryption for xts-speck64-neon
[0.276151] : 84 af 54 07 19 d4 7c a6 9c 8a ac f6 c2 14 04 d8
[0.278541] 0010: 7f 18 6c 43 56 ed 0b b3 92 21 a2 d9 17 59 e4 3b

so there may be a byte order corner case you missed in the rewrite (or
the issue existed before, as I did not test your v1)

-- 
Ard.


> Signed-off-by: Eric Biggers 
> ---
>  arch/arm/crypto/Kconfig   |   6 +
>  arch/arm/crypto/Makefile  |   2 +
>  arch/arm/crypto/speck-neon-core.S | 432 
> ++
>  arch/arm/crypto/speck-neon-glue.c | 290 +
>  4 files changed, 730 insertions(+)
>  create mode 100644 arch/arm/crypto/speck-neon-core.S
>  create mode 100644 arch/arm/crypto/speck-neon-glue.c
>
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index b8e69fe282b8..925d1364727a 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -121,4 +121,10 @@ config CRYPTO_CHACHA20_NEON
> select CRYPTO_BLKCIPHER
> select CRYPTO_CHACHA20
>
> +config CRYPTO_SPECK_NEON
> +   tristate "NEON accelerated Speck cipher algorithms"
> +   depends on KERNEL_MODE_NEON
> +   select CRYPTO_BLKCIPHER
> +   select CRYPTO_SPECK
> +
>  endif
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 30ef8e291271..a758107c5525 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
>  obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> +obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>
>  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
>  ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
> @@ -53,6 +54,7 @@ ghash-arm-ce-y:= ghash-ce-core.o ghash-ce-glue.o
>  crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
>  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
>  chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> +speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
>  quiet_cmd_perl = PERL$@
>cmd_perl = $(PERL) $(<) > $(@)
> diff --git a/arch/arm/crypto/speck-neon-core.S 
> b/arch/arm/crypto/speck-neon-core.S
> new file mode 100644
> index ..3c1e203e53b9
> --- /dev/null
> +++ b/arch/arm/crypto/speck-neon-core.S
> @@ -0,0 +1,432 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * NEON-accelerated implementation of 

Re: [PATCH] crypto: arm/aes-cipher - move S-box to .rodata section

2018-02-13 Thread Ard Biesheuvel
On 12 February 2018 at 13:52, Jinbum Park  wrote:
> Move the AES inverse S-box to the .rodata section
> where it is safe from abuse by speculation.
>
> Signed-off-by: Jinbum Park 

Acked-by: Ard Biesheuvel 

> ---
>  arch/arm/crypto/aes-cipher-core.S | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm/crypto/aes-cipher-core.S 
> b/arch/arm/crypto/aes-cipher-core.S
> index 54b3840..184d6c2 100644
> --- a/arch/arm/crypto/aes-cipher-core.S
> +++ b/arch/arm/crypto/aes-cipher-core.S
> @@ -174,6 +174,16 @@
> .ltorg
> .endm
>
> +ENTRY(__aes_arm_encrypt)
> +   do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
> +ENDPROC(__aes_arm_encrypt)
> +
> +   .align  5
> +ENTRY(__aes_arm_decrypt)
> +   do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0
> +ENDPROC(__aes_arm_decrypt)
> +
> +   .section ".rodata", "a"
> .align  L1_CACHE_SHIFT
> .type   __aes_arm_inverse_sbox, %object
>  __aes_arm_inverse_sbox:
> @@ -210,12 +220,3 @@ __aes_arm_inverse_sbox:
> .byte   0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26
> .byte   0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d
> .size   __aes_arm_inverse_sbox, . - __aes_arm_inverse_sbox
> -
> -ENTRY(__aes_arm_encrypt)
> -   do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
> -ENDPROC(__aes_arm_encrypt)
> -
> -   .align  5
> -ENTRY(__aes_arm_decrypt)
> -   do_crypt iround, crypto_it_tab, __aes_arm_inverse_sbox, 0
> -ENDPROC(__aes_arm_decrypt)
> --
> 1.9.1
>


Re: [PATCH] crypto: Fix incorrect values in PKCS#1 test vector

2018-02-13 Thread Stephan Mueller
On Tuesday, 13 February 2018 at 09:29:56 CET, Mcloughlin, Conor wrote:

Hi Conor,

> The RSA private key for the first form should have the
> version, prime1, prime2, exponent1, exponent2 and coefficient
> fields set to 0.
> With non-zero values for prime1/2, exponent1/2 and coefficient,
> the Intel QAT driver will assume that values are provided for the
> second form of the private key. This will result in signature
> verification failures for modules when a QAT device is present and
> the modules are signed with rsa,sha256.
> 
> Cc: 
> Signed-off-by: Giovanni Cabiddu 
> Signed-off-by: Conor McLoughlin 

Reviewed-by: Stephan Mueller 

Ciao
Stephan




[PATCH 7/8] crypto: inside-secure - keep the requests push/pop synced

2018-02-13 Thread Antoine Tenart
This patch updates the Inside Secure SafeXcel driver to avoid getting
out of sync between the number of requests sent and the number of
requests completed.

The number of requests acknowledged by the driver can differ from the
configured threshold if new requests were pushed to the h/w in the
meantime. The driver wasn't taking those into account, and the number
of remaining requests to handle (used to reconfigure the interrupt
threshold) could get out of sync.

This patch fixes it by not tracking the number of requests left, but
the total number of requests sent to the hardware, so that new
requests are taken into account.

Fixes: dc7e28a3286e ("crypto: inside-secure - dequeue all requests at once")
Suggested-by: Ofer Heifetz 
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel.c | 28 +---
 drivers/crypto/inside-secure/safexcel.h |  6 ++
 2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/inside-secure/safexcel.c 
b/drivers/crypto/inside-secure/safexcel.c
index 0642d7181c9e..956a37692e42 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -432,20 +432,18 @@ static int safexcel_hw_init(struct safexcel_crypto_priv 
*priv)
 }
 
 /* Called with ring's lock taken */
-static int safexcel_try_push_requests(struct safexcel_crypto_priv *priv,
- int ring, int reqs)
+static void safexcel_try_push_requests(struct safexcel_crypto_priv *priv,
+  int ring)
 {
-   int coal = min_t(int, reqs, EIP197_MAX_BATCH_SZ);
+   int coal = min_t(int, priv->ring[ring].requests, EIP197_MAX_BATCH_SZ);
 
if (!coal)
-   return 0;
+   return;
 
/* Configure when we want an interrupt */
writel(EIP197_HIA_RDR_THRESH_PKT_MODE |
   EIP197_HIA_RDR_THRESH_PROC_PKT(coal),
   EIP197_HIA_RDR(priv, ring) + EIP197_HIA_xDR_THRESH);
-
-   return coal;
 }
 
 void safexcel_dequeue(struct safexcel_crypto_priv *priv, int ring)
@@ -521,13 +519,13 @@ void safexcel_dequeue(struct safexcel_crypto_priv *priv, 
int ring)
 
spin_lock_bh(&priv->ring[ring].egress_lock);
 
+   priv->ring[ring].requests += nreq;
+
if (!priv->ring[ring].busy) {
-   nreq -= safexcel_try_push_requests(priv, ring, nreq);
+   safexcel_try_push_requests(priv, ring);
priv->ring[ring].busy = true;
}
 
-   priv->ring[ring].requests_left += nreq;
-
spin_unlock_bh(&priv->ring[ring].egress_lock);
 
/* let the RDR know we have pending descriptors */
@@ -631,7 +629,7 @@ static inline void safexcel_handle_result_descriptor(struct 
safexcel_crypto_priv
 {
struct safexcel_request *sreq;
struct safexcel_context *ctx;
-   int ret, i, nreq, ndesc, tot_descs, done;
+   int ret, i, nreq, ndesc, tot_descs, handled = 0;
bool should_complete;
 
 handle_results:
@@ -667,6 +665,7 @@ static inline void safexcel_handle_result_descriptor(struct 
safexcel_crypto_priv
 
kfree(sreq);
tot_descs += ndesc;
+   handled++;
}
 
 acknowledge:
@@ -685,11 +684,10 @@ static inline void 
safexcel_handle_result_descriptor(struct safexcel_crypto_priv
 requests_left:
spin_lock_bh(&priv->ring[ring].egress_lock);
 
-   done = safexcel_try_push_requests(priv, ring,
- priv->ring[ring].requests_left);
+   priv->ring[ring].requests -= handled;
+   safexcel_try_push_requests(priv, ring);
 
-   priv->ring[ring].requests_left -= done;
-   if (!done && !priv->ring[ring].requests_left)
+   if (!priv->ring[ring].requests)
priv->ring[ring].busy = false;
 
spin_unlock_bh(&priv->ring[ring].egress_lock);
@@ -970,7 +968,7 @@ static int safexcel_probe(struct platform_device *pdev)
goto err_clk;
}
 
-   priv->ring[i].requests_left = 0;
+   priv->ring[i].requests = 0;
priv->ring[i].busy = false;
 
crypto_init_queue(&priv->ring[i].queue,
diff --git a/drivers/crypto/inside-secure/safexcel.h 
b/drivers/crypto/inside-secure/safexcel.h
index 4e219c21608b..caaf6a81b162 100644
--- a/drivers/crypto/inside-secure/safexcel.h
+++ b/drivers/crypto/inside-secure/safexcel.h
@@ -551,10 +551,8 @@ struct safexcel_crypto_priv {
struct crypto_queue queue;
spinlock_t queue_lock;
 
-   /* Number of requests in the engine that needs the threshold
-* interrupt to be set up.
-*/
-   int requests_left;
+   /* Number of requests in the engine. */
+   int requests;
 
/* The ring is currently handling at least one request */
bool busy;

[PATCH] crypto: Fix incorrect values in PKCS#1 test vector

2018-02-13 Thread Mcloughlin, Conor
The RSA private key for the first form should have the
version, prime1, prime2, exponent1, exponent2 and coefficient
fields set to 0.
With non-zero values for prime1/2, exponent1/2 and coefficient,
the Intel QAT driver will assume that values are provided for the
second form of the private key. This will result in signature
verification failures for modules when a QAT device is present and
the modules are signed with rsa,sha256.

Cc: 
Signed-off-by: Giovanni Cabiddu 
Signed-off-by: Conor McLoughlin 
---
 crypto/testmgr.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 6044f69..69fb51e 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -548,7 +548,7 @@ struct kpp_testvec {
 static const struct akcipher_testvec pkcs1pad_rsa_tv_template[] = {
{
.key =
-   "\x30\x82\x03\x1f\x02\x01\x10\x02\x82\x01\x01\x00\xd7\x1e\x77\x82"
+   "\x30\x82\x03\x1f\x02\x01\x00\x02\x82\x01\x01\x00\xd7\x1e\x77\x82"
"\x8c\x92\x31\xe7\x69\x02\xa2\xd5\x5c\x78\xde\xa2\x0c\x8f\xfe\x28"
"\x59\x31\xdf\x40\x9c\x60\x61\x06\xb9\x2f\x62\x40\x80\x76\xcb\x67"
"\x4a\xb5\x59\x56\x69\x17\x07\xfa\xf9\x4c\xbd\x6c\x37\x7a\x46\x7d"
@@ -597,8 +597,8 @@ struct kpp_testvec {
"\xfe\xf8\x27\x1b\xd6\x55\x60\x5e\x48\xb7\x6d\x9a\xa8\x37\xf9\x7a"
"\xde\x1b\xcd\x5d\x1a\x30\xd4\xe9\x9e\x5b\x3c\x15\xf8\x9c\x1f\xda"
"\xd1\x86\x48\x55\xce\x83\xee\x8e\x51\xc7\xde\x32\x12\x47\x7d\x46"
-   "\xb8\x35\xdf\x41\x02\x01\x30\x02\x01\x30\x02\x01\x30\x02\x01\x30"
-   "\x02\x01\x30",
+   "\xb8\x35\xdf\x41\x02\x01\x00\x02\x01\x00\x02\x01\x00\x02\x01\x00"
+   "\x02\x01\x00",
.key_len = 804,
/*
 * m is SHA256 hash of following message:
-- 
1.9.1





[PATCH 8/8] crypto: inside-secure - unmap the result in the hash send error path

2018-02-13 Thread Antoine Tenart
This patch adds a label to unmap the result buffer in the hash send
function error path.

Fixes: 1b44c5a60c13 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine 
driver")
Suggested-by: Ofer Heifetz 
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel_hash.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/inside-secure/safexcel_hash.c 
b/drivers/crypto/inside-secure/safexcel_hash.c
index 43e94cd59c86..a4960a934eef 100644
--- a/drivers/crypto/inside-secure/safexcel_hash.c
+++ b/drivers/crypto/inside-secure/safexcel_hash.c
@@ -303,7 +303,7 @@ static int safexcel_ahash_send_req(struct 
crypto_async_request *async, int ring,
   req->state_sz);
if (IS_ERR(rdesc)) {
ret = PTR_ERR(rdesc);
-   goto cdesc_rollback;
+   goto unmap_result;
}
 
spin_unlock_bh(&priv->ring[ring].egress_lock);
@@ -315,6 +315,8 @@ static int safexcel_ahash_send_req(struct 
crypto_async_request *async, int ring,
*results = 1;
return 0;
 
+unmap_result:
+   dma_unmap_sg(priv->dev, areq->src, req->nents, DMA_TO_DEVICE);
 cdesc_rollback:
for (i = 0; i < n_cdesc; i++)
safexcel_ring_rollback_wptr(priv, &priv->ring[ring].cdr);
-- 
2.14.3



[PATCH 6/8] crypto: inside-secure - fix the invalidation step during cra_exit

2018-02-13 Thread Antoine Tenart
When exiting a transformation, the cra_exit() helper is called in each
driver providing one. The Inside Secure SafeXcel driver has one, which
is responsible for freeing some areas and for sending one invalidation
request to the crypto engine, to invalidate the context that was used
during the transformation.

We could see in some setups (when lots of transformations were being
used with a short lifetime, and hence lots of cra_exit() calls) NULL
pointer dereferences and other weird issues. All these issues were
coming from accessing the tfm context.

The issue is that the invalidation request completion is checked using
a wait_for_completion_interruptible() call in both the cipher and hash
cra_exit() helpers. In some cases this was interrupted while the
invalidation request wasn't processed yet. cra_exit() then returned and
its caller freed the tfm instance. Only then was the request handled by
the SafeXcel driver, which led to the said issues.

This patch fixes this by using wait_for_completion() calls in these
specific cases.

Fixes: 1b44c5a60c13 ("crypto: inside-secure - add SafeXcel EIP197 crypto engine 
driver")
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel_cipher.c | 2 +-
 drivers/crypto/inside-secure/safexcel_hash.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c 
b/drivers/crypto/inside-secure/safexcel_cipher.c
index 63a8768ed2ae..17a7725a6f6d 100644
--- a/drivers/crypto/inside-secure/safexcel_cipher.c
+++ b/drivers/crypto/inside-secure/safexcel_cipher.c
@@ -456,7 +456,7 @@ static int safexcel_cipher_exit_inv(struct crypto_tfm *tfm)
queue_work(priv->ring[ring].workqueue,
   &priv->ring[ring].work_data.work);
 
-   wait_for_completion_interruptible(&result.completion);
+   wait_for_completion(&result.completion);
 
if (result.error) {
dev_warn(priv->dev,
diff --git a/drivers/crypto/inside-secure/safexcel_hash.c 
b/drivers/crypto/inside-secure/safexcel_hash.c
index 2951101e1831..43e94cd59c86 100644
--- a/drivers/crypto/inside-secure/safexcel_hash.c
+++ b/drivers/crypto/inside-secure/safexcel_hash.c
@@ -493,7 +493,7 @@ static int safexcel_ahash_exit_inv(struct crypto_tfm *tfm)
queue_work(priv->ring[ring].workqueue,
   &priv->ring[ring].work_data.work);
 
-   wait_for_completion_interruptible(&result.completion);
+   wait_for_completion(&result.completion);
 
if (result.error) {
dev_warn(priv->dev, "hash: completion error (%d)\n",
-- 
2.14.3



[PATCH 0/8] crypto: inside-secure - stabilization and fixes

2018-02-13 Thread Antoine Tenart
Hi Herbert,

We spent a lot of time with Ofer to test various use cases of the Inside
Secure driver. We performed many tests in addition to the crypto
subsystem ones (IPsec, openssl speed, tcrypt...). As a result the driver
is a lot more stable in various configurations, with this series
applied.

The fixes in this series aren't all related, but were all found during
this stabilization and testing process.

Also as the company I'm working at changed its name from Free Electrons
to Bootlin, the first patch updates my email address in the MAINTAINERS
file.

The series is based on v4.16-rc1.

Thanks!
Antoine

Antoine Tenart (8):
  MAINTAINERS: update the Inside Secure maintainer email
  crypto: inside-secure - do not overwrite the threshold value
  crypto: inside-secure - fix the extra cache computation
  crypto: inside-secure - fix the cache_len computation
  crypto: inside-secure - do not process request if no command was
issued
  crypto: inside-secure - fix the invalidation step during cra_exit
  crypto: inside-secure - keep the requests push/pop synced
  crypto: inside-secure - unmap the result in the hash send error path

 MAINTAINERS|  2 +-
 drivers/crypto/inside-secure/safexcel.c| 40 +++---
 drivers/crypto/inside-secure/safexcel.h|  6 ++--
 drivers/crypto/inside-secure/safexcel_cipher.c |  2 +-
 drivers/crypto/inside-secure/safexcel_hash.c   | 10 ---
 5 files changed, 33 insertions(+), 27 deletions(-)

-- 
2.14.3



[PATCH 2/8] crypto: inside-secure - do not overwrite the threshold value

2018-02-13 Thread Antoine Tenart
This patch fixes the Inside Secure SafeXcel driver so that it does not
overwrite the interrupt threshold value. In certain cases the value of
this register, which controls when to fire an interrupt, was
overwritten. This led to packets not being processed or acked, as the
driver was never aware of their completion.

This patch fixes this behaviour by not setting the threshold when
requests are being processed by the engine.

Fixes: dc7e28a3286e ("crypto: inside-secure - dequeue all requests at once")
Suggested-by: Ofer Heifetz 
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/crypto/inside-secure/safexcel.c 
b/drivers/crypto/inside-secure/safexcel.c
index 225e74a7f724..5cb90bcd3f18 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -514,8 +514,7 @@ void safexcel_dequeue(struct safexcel_crypto_priv *priv, 
int ring)
 
if (!priv->ring[ring].busy) {
nreq -= safexcel_try_push_requests(priv, ring, nreq);
-   if (nreq)
-   priv->ring[ring].busy = true;
+   priv->ring[ring].busy = true;
}
 
priv->ring[ring].requests_left += nreq;
-- 
2.14.3



[PATCH 4/8] crypto: inside-secure - fix the cache_len computation

2018-02-13 Thread Antoine Tenart
This patch fixes the cache length computation, as cache_len could end
up being a negative value. The check between the queued size and the
block size is updated to reflect the caching mechanism, which can cache
up to a full block size (inclusive).

Fixes: 809778e02cd4 ("crypto: inside-secure - fix hash when length is a 
multiple of a block")
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel_hash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/inside-secure/safexcel_hash.c 
b/drivers/crypto/inside-secure/safexcel_hash.c
index 8fac23b78da5..2951101e1831 100644
--- a/drivers/crypto/inside-secure/safexcel_hash.c
+++ b/drivers/crypto/inside-secure/safexcel_hash.c
@@ -184,7 +184,7 @@ static int safexcel_ahash_send_req(struct 
crypto_async_request *async, int ring,
int i, queued, len, cache_len, extra, n_cdesc = 0, ret = 0;
 
queued = len = req->len - req->processed;
-   if (queued < crypto_ahash_blocksize(ahash))
+   if (queued <= crypto_ahash_blocksize(ahash))
cache_len = queued;
else
cache_len = queued - areq->nbytes;
-- 
2.14.3



[PATCH 1/8] MAINTAINERS: update the Inside Secure maintainer email

2018-02-13 Thread Antoine Tenart
Free Electrons became Bootlin. Update my email accordingly.

Signed-off-by: Antoine Tenart 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bdc260e36b7..d0f6811b9eed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6990,7 +6990,7 @@ F:drivers/input/input-mt.c
 K: \b(ABS|SYN)_MT_
 
 INSIDE SECURE CRYPTO DRIVER
-M: Antoine Tenart 
+M: Antoine Tenart 
 F: drivers/crypto/inside-secure/
 S: Maintained
 L: linux-crypto@vger.kernel.org
-- 
2.14.3



[PATCH 5/8] crypto: inside-secure - do not process request if no command was issued

2018-02-13 Thread Antoine Tenart
This patch adds a check in the SafeXcel dequeue function to avoid
processing a request further if no hardware command was issued. This
can happen in certain cases where the ->send() function caches all the
data that would otherwise have been sent.

Fixes: 809778e02cd4 ("crypto: inside-secure - fix hash when length is a 
multiple of a block")
Signed-off-by: Antoine Tenart 
---
 drivers/crypto/inside-secure/safexcel.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/crypto/inside-secure/safexcel.c 
b/drivers/crypto/inside-secure/safexcel.c
index 5cb90bcd3f18..0642d7181c9e 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -490,6 +490,15 @@ void safexcel_dequeue(struct safexcel_crypto_priv *priv, 
int ring)
if (backlog)
backlog->complete(backlog, -EINPROGRESS);
 
+   /* In case the send() helper did not issue any command to push
+* to the engine because the input data was cached, continue to
+* dequeue other requests as this is valid and not an error.
+*/
+   if (!commands && !results) {
+   kfree(request);
+   continue;
+   }
+
spin_lock_bh(&priv->ring[ring].egress_lock);
list_add_tail(&request->list, &priv->ring[ring].list);
spin_unlock_bh(&priv->ring[ring].egress_lock);
-- 
2.14.3