Re: FW: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Tue, 2017-07-25 at 10:17 +0800, Herbert Xu wrote: > On Mon, Jul 24, 2017 at 06:09:56PM -0700, Megha Dey wrote: > > > > Under the skcipher interface, if both the outer and inner alg are async, > > there should not be any problem right? Currently I do not see any > > existing algorithms have both algorithms async. > > That's because the purpose of cryptd is to turn a sync algorithm > into an async one. > > Your mcryptd is completely different. We already went through > this discussion for sha1-mb. This is no different. You should > choose the type that fits your circumstances. I have updated the inner algorithm to also be async in v7 patch series. > > Cheers,
Re: FW: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Mon, Jul 24, 2017 at 06:09:56PM -0700, Megha Dey wrote: > > Under the skcipher interface, if both the outer and inner alg are async, > there should not be any problem right? Currently I do not see any > existing algorithms have both algorithms async. That's because the purpose of cryptd is to turn a sync algorithm into an async one. Your mcryptd is completely different. We already went through this discussion for sha1-mb. This is no different. You should choose the type that fits your circumstances. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: FW: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Tue, 2017-07-25 at 00:53 +, Dey, Megha wrote: > > -Original Message- > From: Herbert Xu [mailto:herb...@gondor.apana.org.au] > Sent: Wednesday, July 19, 2017 12:02 AM > To: Dey, Megha <megha@intel.com> > Cc: Tim Chen <tim.c.c...@linux.intel.com>; da...@davemloft.net; > linux-crypto@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: Re: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code > > On Tue, Jul 18, 2017 at 06:18:59PM -0700, Megha Dey wrote: > > > > > >> +/* > > > >> + * CRYPTO_ALG_ASYNC flag is passed to indicate we have an ablk > > > >> + * scatter-gather walk. > > > >> + */ > > > >> +static struct skcipher_alg aes_cbc_mb_alg = { > > > >> + .base = { > > > >> + .cra_name = "cbc(aes)", > > > >> + .cra_driver_name= "cbc-aes-aesni-mb", > > > >> + .cra_priority = 500, > > > >> + .cra_flags = CRYPTO_ALG_INTERNAL, > > > >> + .cra_blocksize = AES_BLOCK_SIZE, > > > >> + .cra_ctxsize= CRYPTO_AES_CTX_SIZE, > > > >> + .cra_module = THIS_MODULE, > > > >> + }, > > > >> + .min_keysize= AES_MIN_KEY_SIZE, > > > >> + .max_keysize= AES_MAX_KEY_SIZE, > > > >> + .ivsize = AES_BLOCK_SIZE, > > > >> + .setkey = aes_set_key, > > > >> + .encrypt= mb_aes_cbc_encrypt, > > > >> + .decrypt= mb_aes_cbc_decrypt > > > >> +}; > > > > > > > > So this claims to be a sync algorithm. Is this really the case? > > > > yes, the inner algorithm is sync whereas the outer algorithm is async. > > Are you saying that once mb_aes_cbc_encrypt returns it will never touch the > input/output again? This doesn't seem to agree with the actual code: > > + if (!ret_rctx) { > + /* we submitted a job, but none completed */ > + /* just notify the caller */ > + notify_callback(rctx, cstate, -EINPROGRESS); > + return 0; > + } > > Just because an algorithm is internal doesn't mean that it can arbitrarily > violate the crypto API requirements. > > Cheers, Hi Herbert, Under the skcipher interface, if both the outer and inner alg are async, there should not be any problem right? Currently I do not see any existing algorithms have both algorithms async. > -- > Email: Herbert Xu <herb...@gondor.apana.org.au> Home Page: > http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Tue, Jul 18, 2017 at 06:18:59PM -0700, Megha Dey wrote: > > > >> +/* > > >> + * CRYPTO_ALG_ASYNC flag is passed to indicate we have an ablk > > >> + * scatter-gather walk. > > >> + */ > > >> +static struct skcipher_alg aes_cbc_mb_alg = { > > >> +.base = { > > >> +.cra_name = "cbc(aes)", > > >> +.cra_driver_name= "cbc-aes-aesni-mb", > > >> +.cra_priority = 500, > > >> +.cra_flags = CRYPTO_ALG_INTERNAL, > > >> +.cra_blocksize = AES_BLOCK_SIZE, > > >> +.cra_ctxsize= CRYPTO_AES_CTX_SIZE, > > >> +.cra_module = THIS_MODULE, > > >> +}, > > >> +.min_keysize= AES_MIN_KEY_SIZE, > > >> +.max_keysize= AES_MAX_KEY_SIZE, > > >> +.ivsize = AES_BLOCK_SIZE, > > >> +.setkey = aes_set_key, > > >> +.encrypt= mb_aes_cbc_encrypt, > > >> +.decrypt= mb_aes_cbc_decrypt > > >> +}; > > > > > > So this claims to be a sync algorithm. Is this really the case? > > yes, the inner algorithm is sync whereas the outer algorithm is async. Are you saying that once mb_aes_cbc_encrypt returns it will never touch the input/output again? This doesn't seem to agree with the actual code: + if (!ret_rctx) { + /* we submitted a job, but none completed */ + /* just notify the caller */ + notify_callback(rctx, cstate, -EINPROGRESS); + return 0; + } Just because an algorithm is internal doesn't mean that it can arbitrarily violate the crypto API requirements. Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Tue, 2017-07-18 at 17:52 -0700, Tim Chen wrote: > On 07/17/2017 10:41 PM, Herbert Xu wrote: > > On Tue, Jun 27, 2017 at 05:26:13PM -0700, Megha Dey wrote: > >> > >> +static void completion_callback(struct mcryptd_skcipher_request_ctx *rctx, > >> + struct mcryptd_alg_cstate *cstate, > >> + int err) > >> +{ > >> + struct skcipher_request *req = cast_mcryptd_ctx_to_req(rctx); > >> + > >> + /* remove from work list and invoke completion callback */ > >> + spin_lock(>work_lock); > >> + list_del(>waiter); > >> + spin_unlock(>work_lock); > >> + > >> + if (irqs_disabled()) > >> + rctx->complete(>base, err); > >> + else { > >> + local_bh_disable(); > >> + rctx->complete(>base, err); > >> + local_bh_enable(); > >> + } > >> +} > > > > The fact that you need to do this check means that this design is > > wrong. You should always know what context you are in. > > > > I think you are right. The irqs_disabled check is not necessary > as we only call this function in the context of the mcryptd thread. > When I wrote the original mb algorithms I was probably unsure > and put this check in as a precaution in other mb algorithms and > Megha did the same. I will make this change. > > >> +/* > >> + * CRYPTO_ALG_ASYNC flag is passed to indicate we have an ablk > >> + * scatter-gather walk. > >> + */ > >> +static struct skcipher_alg aes_cbc_mb_alg = { > >> + .base = { > >> + .cra_name = "cbc(aes)", > >> + .cra_driver_name= "cbc-aes-aesni-mb", > >> + .cra_priority = 500, > >> + .cra_flags = CRYPTO_ALG_INTERNAL, > >> + .cra_blocksize = AES_BLOCK_SIZE, > >> + .cra_ctxsize= CRYPTO_AES_CTX_SIZE, > >> + .cra_module = THIS_MODULE, > >> + }, > >> + .min_keysize= AES_MIN_KEY_SIZE, > >> + .max_keysize= AES_MAX_KEY_SIZE, > >> + .ivsize = AES_BLOCK_SIZE, > >> + .setkey = aes_set_key, > >> + .encrypt= mb_aes_cbc_encrypt, > >> + .decrypt= mb_aes_cbc_decrypt > >> +}; > > > > So this claims to be a sync algorithm. Is this really the case? yes, the inner algorithm is sync whereas the outer algorithm is async. > > > > Cheers, > > >
Re: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On 07/17/2017 10:41 PM, Herbert Xu wrote: > On Tue, Jun 27, 2017 at 05:26:13PM -0700, Megha Dey wrote: >> >> +static void completion_callback(struct mcryptd_skcipher_request_ctx *rctx, >> +struct mcryptd_alg_cstate *cstate, >> +int err) >> +{ >> +struct skcipher_request *req = cast_mcryptd_ctx_to_req(rctx); >> + >> + /* remove from work list and invoke completion callback */ >> +spin_lock(>work_lock); >> +list_del(>waiter); >> +spin_unlock(>work_lock); >> + >> +if (irqs_disabled()) >> +rctx->complete(>base, err); >> +else { >> +local_bh_disable(); >> +rctx->complete(>base, err); >> +local_bh_enable(); >> +} >> +} > > The fact that you need to do this check means that this design is > wrong. You should always know what context you are in. > I think you are right. The irqs_disabled check is not necessary as we only call this function in the context of the mcryptd thread. When I wrote the original mb algorithms I was probably unsure and put this check in as a precaution in other mb algorithms and Megha did the same. >> +/* >> + * CRYPTO_ALG_ASYNC flag is passed to indicate we have an ablk >> + * scatter-gather walk. >> + */ >> +static struct skcipher_alg aes_cbc_mb_alg = { >> +.base = { >> +.cra_name = "cbc(aes)", >> +.cra_driver_name= "cbc-aes-aesni-mb", >> +.cra_priority = 500, >> +.cra_flags = CRYPTO_ALG_INTERNAL, >> +.cra_blocksize = AES_BLOCK_SIZE, >> +.cra_ctxsize= CRYPTO_AES_CTX_SIZE, >> +.cra_module = THIS_MODULE, >> +}, >> +.min_keysize= AES_MIN_KEY_SIZE, >> +.max_keysize= AES_MAX_KEY_SIZE, >> +.ivsize = AES_BLOCK_SIZE, >> +.setkey = aes_set_key, >> +.encrypt= mb_aes_cbc_encrypt, >> +.decrypt= mb_aes_cbc_decrypt >> +}; > > So this claims to be a sync algorithm. Is this really the case? > > Cheers, >
Re: [PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
On Tue, Jun 27, 2017 at 05:26:13PM -0700, Megha Dey wrote: > > +static void completion_callback(struct mcryptd_skcipher_request_ctx *rctx, > + struct mcryptd_alg_cstate *cstate, > + int err) > +{ > + struct skcipher_request *req = cast_mcryptd_ctx_to_req(rctx); > + > + /* remove from work list and invoke completion callback */ > + spin_lock(>work_lock); > + list_del(>waiter); > + spin_unlock(>work_lock); > + > + if (irqs_disabled()) > + rctx->complete(>base, err); > + else { > + local_bh_disable(); > + rctx->complete(>base, err); > + local_bh_enable(); > + } > +} The fact that you need to do this check means that this design is wrong. You should always know what context you are in. > +/* > + * CRYPTO_ALG_ASYNC flag is passed to indicate we have an ablk > + * scatter-gather walk. > + */ > +static struct skcipher_alg aes_cbc_mb_alg = { > + .base = { > + .cra_name = "cbc(aes)", > + .cra_driver_name= "cbc-aes-aesni-mb", > + .cra_priority = 500, > + .cra_flags = CRYPTO_ALG_INTERNAL, > + .cra_blocksize = AES_BLOCK_SIZE, > + .cra_ctxsize= CRYPTO_AES_CTX_SIZE, > + .cra_module = THIS_MODULE, > + }, > + .min_keysize= AES_MIN_KEY_SIZE, > + .max_keysize= AES_MAX_KEY_SIZE, > + .ivsize = AES_BLOCK_SIZE, > + .setkey = aes_set_key, > + .encrypt= mb_aes_cbc_encrypt, > + .decrypt= mb_aes_cbc_decrypt > +}; So this claims to be a sync algorithm. Is this really the case? Cheers, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH V6 5/7] crypto: AES CBC multi-buffer glue code
This patch introduces the multi-buffer job manager which is responsible for submitting scatter-gather buffers from several AES CBC jobs to the multi-buffer algorithm. The glue code interfaces with the underlying algorithm that handles 8 data streams of AES CBC encryption in parallel. AES key expansion and CBC decryption requests are performed in a manner similar to the existing AESNI Intel glue driver. The outline of the algorithm for AES CBC encryption requests is sketched below: Any driver requesting the crypto service will place an async crypto request on the workqueue. The multi-buffer crypto daemon will pull an AES CBC encryption request from work queue and put each request in an empty data lane for multi-buffer crypto computation. When all the empty lanes are filled, computation will commence on the jobs in parallel and the job with the shortest remaining buffer will get completed and be returned. To prevent prolonged stall, when no new jobs arrive, we will flush workqueue of jobs after a maximum allowable delay has elapsed. To accommodate the fragmented nature of scatter-gather, we will keep submitting the next scatter-buffer fragment for a job for multi-buffer computation until a job is completed and no more buffer fragments remain. At that time we will pull a new job to fill the now empty data slot. We check with the multibuffer scheduler to see if there are other completed jobs to prevent extraneous delay in returning any completed jobs. This multi-buffer algorithm should be used for cases where we get at least 8 streams of crypto jobs submitted at a reasonably high rate. For low crypto job submission rate and low number of data streams, this algorithm will not be beneficial. The reason is at low rate, we do not fill out the data lanes before flushing the jobs instead of processing them with all the data lanes full. We will miss the benefit of parallel computation, and adding delay to the processing of the crypto job at the same time. Some tuning of the maximum latency parameter may be needed to get the best performance. Originally-by: Chandramouli NarayananSigned-off-by: Megha Dey Acked-by: Tim Chen --- arch/x86/crypto/Makefile| 1 + arch/x86/crypto/aes-cbc-mb/Makefile | 22 + arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c | 727 3 files changed, 750 insertions(+) create mode 100644 arch/x86/crypto/aes-cbc-mb/Makefile create mode 100644 arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile index 34b3fa2..cc556a7 100644 --- a/arch/x86/crypto/Makefile +++ b/arch/x86/crypto/Makefile @@ -33,6 +33,7 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o +obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb/ obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o # These modules require assembler to support AVX. diff --git a/arch/x86/crypto/aes-cbc-mb/Makefile b/arch/x86/crypto/aes-cbc-mb/Makefile new file mode 100644 index 000..b642bd8 --- /dev/null +++ b/arch/x86/crypto/aes-cbc-mb/Makefile @@ -0,0 +1,22 @@ +# +# Arch-specific CryptoAPI modules. +# + +avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no) + +# we need decryption and key expansion routine symbols +# if either AESNI_NI_INTEL or AES_CBC_MB is a module + +ifeq ($(CONFIG_CRYPTO_AES_NI_INTEL),m) + dec_support := ../aesni-intel_asm.o +endif +ifeq ($(CONFIG_CRYPTO_AES_CBC_MB),m) + dec_support := ../aesni-intel_asm.o +endif + +ifeq ($(avx_supported),yes) + obj-$(CONFIG_CRYPTO_AES_CBC_MB) += aes-cbc-mb.o + aes-cbc-mb-y := $(dec_support) aes_cbc_mb.o aes_mb_mgr_init.o \ + mb_mgr_inorder_x8_asm.o mb_mgr_ooo_x8_asm.o \ + aes_cbc_enc_x8.o +endif diff --git a/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c new file mode 100644 index 000..7d38041 --- /dev/null +++ b/arch/x86/crypto/aes-cbc-mb/aes_cbc_mb.c @@ -0,0 +1,727 @@ +/* + * Multi buffer AES CBC algorithm glue code + * + * + * This file is provided under a dual BSD/GPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * Contact Information: +