Re: [dm-devel] [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target

2020-06-24 Thread Damien Le Moal
On 2020/06/24 14:05, Eric Biggers wrote:
> On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote:
>> Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. This
>> is especially visible on busy systems with many processes/threads. Moreover,
>> most Crypto API implementations are async, that is they offload crypto
>> operations on their own, so this dm-crypt offloading is excessive.
> 
> This really should say "some Crypto API implementations are async" instead of
> "most Crypto API implementations are async".
> 
> Notably, the AES-NI implementation of AES-XTS is synchronous if you call it
> in a context where SIMD instructions are usable.  It's only asynchronous when
> SIMD is not usable.  (This seems to have been missed in your blog post.)
> 
>> This adds a new flag, which directs dm-crypt not to offload crypto operations
>> and to process everything inline. For cases where crypto operations cannot
>> happen inline (hard interrupt context, for example the read path of the NVMe
>> driver), we offload the work to a tasklet rather than a workqueue.
> 
> This patch both removes some dm-crypt specific queueing, and changes
> decryption to use softIRQ context instead of a workqueue.  It would be useful
> to know how much of a difference the workqueue => softIRQ change makes by
> itself.  Such a change could be useful for fscrypt as well.  (fscrypt uses a
> workqueue for decryption, but besides that doesn't use any other queueing.)
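
[Editor's note: the tasklet fallback being discussed could look roughly like the
sketch below. It is a non-compilable illustration, not the patch itself; the
`io->tasklet` field and the exact function names are assumptions inferred from
the surrounding discussion.]

```c
/*
 * Sketch: when the force_inline flag is set, process the bio in the
 * caller's context, except when we are in hard-IRQ context (e.g. the
 * NVMe read completion path), where crypto must not run -- there, defer
 * to softirq context via a tasklet instead of a workqueue.
 */
static void kcryptd_crypt_tasklet(unsigned long work)
{
	kcryptd_crypt((struct work_struct *)work);
}

static void kcryptd_queue_crypt(struct dm_crypt_io *io)
{
	struct crypt_config *cc = io->cc;

	if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags)) {
		if (in_irq()) {
			/* Hard-IRQ context: bounce to softirq. */
			tasklet_init(&io->tasklet, kcryptd_crypt_tasklet,
				     (unsigned long)&io->work);
			tasklet_schedule(&io->tasklet);
			return;
		}
		/* Otherwise run the crypto inline, no context switch. */
		kcryptd_crypt(&io->work);
		return;
	}

	/* Regular path: offload to the per-device workqueue. */
	INIT_WORK(&io->work, kcryptd_crypt);
	queue_work(cc->crypt_queue, &io->work);
}
```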
> 
>> @@ -127,7 +128,7 @@ struct iv_elephant_private {
>>   * and encrypts / decrypts at the same time.
>>   */
>>  enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
>> - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD };
>> + DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD, DM_CRYPT_FORCE_INLINE = (sizeof(unsigned long) * 8 - 1) };
> 
> Assigning a specific enum value isn't necessary.
> 
>> @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct crypt_config *cc,
>>  
>>  skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
>>  
>> -/*
>> - * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
>> - * requests if driver request queue is full.
>> - */
>> -skcipher_request_set_callback(ctx->r.req,
>> -CRYPTO_TFM_REQ_MAY_BACKLOG,
>> -kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
>> +if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags))
>> +/* make sure we zero important fields of the request */
>> +skcipher_request_set_callback(ctx->r.req,
>> +0, NULL, NULL);
>> +else
>> +/*
>> + * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
>> + * requests if driver request queue is full.
>> + */
>> +skcipher_request_set_callback(ctx->r.req,
>> +CRYPTO_TFM_REQ_MAY_BACKLOG,
>> +kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
>>  }
> 
> This looks wrong.  Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to
> crypto_alloc_skcipher(), the skcipher implementation can still be
> asynchronous, in which case providing a callback is required.

Another point: for a skcipher implementation that is asynchronous, in the
regular/not-inline case, can't we just issue the request directly without using
the workqueue? If yes, that would save one context switch, since queueing of
the request can be handled by the crypto API when the request callback is set
with CRYPTO_TFM_REQ_MAY_BACKLOG. At least that is how I understood the
documentation & comments. I may be wrong here...

> 
> Do you intend that the "force_inline" option forces the use of a synchronous
> skcipher (alongside the other things it does)?  Or should it still allow
> asynchronous ones?
> 
> We may not actually have a choice in that matter, since xts-aes-aesni has the
> CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in most
> cases; thus, the crypto API won't give you it if you ask for a synchronous
> cipher.  So I think you still need to allow async skciphers?  That means a
> callback is still always required.
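
[Editor's note: Eric's type/mask point maps onto the allocation call as in this
minimal sketch; the algorithm name "xts(aes)" is just an example, and this is
not code from the patch.]

```c
/*
 * Sketch: requesting a synchronous skcipher only.  Passing type=0 and
 * mask=CRYPTO_ALG_ASYNC tells the crypto API to reject implementations
 * that have the ASYNC flag set -- which, as noted above, would also
 * exclude xts-aes-aesni even though it is synchronous in most cases.
 */
struct crypto_skcipher *tfm;

tfm = crypto_alloc_skcipher("xts(aes)", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
	return PTR_ERR(tfm);	/* no synchronous implementation available */
```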
> 
> - Eric
> 
> --
> dm-devel mailing list
> dm-de...@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 


-- 
Damien Le Moal
Western Digital Research


Re: [dm-devel] [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target

2020-06-24 Thread Damien Le Moal
On 2020/06/24 14:27, Eric Biggers wrote:
> On Wed, Jun 24, 2020 at 05:21:24AM +, Damien Le Moal wrote:
 @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct crypt_config *cc,
  
skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
  
 -  /*
 -   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 -   * requests if driver request queue is full.
 -   */
 -  skcipher_request_set_callback(ctx->r.req,
 -  CRYPTO_TFM_REQ_MAY_BACKLOG,
 -  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
 +  if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags))
 +  /* make sure we zero important fields of the request */
 +  skcipher_request_set_callback(ctx->r.req,
 +  0, NULL, NULL);
 +  else
 +  /*
 +   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 +   * requests if driver request queue is full.
 +   */
 +  skcipher_request_set_callback(ctx->r.req,
 +  CRYPTO_TFM_REQ_MAY_BACKLOG,
 +  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
  }
>>>
>>> This looks wrong.  Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to
>>> crypto_alloc_skcipher(), the skcipher implementation can still be
>>> asynchronous, in which case providing a callback is required.

Digging the code further, in light of your hints, it looks like to fix this, all
that needs to be done is to change crypt_convert_block_skcipher() from doing:

if (bio_data_dir(ctx->bio_in) == WRITE)
r = crypto_skcipher_encrypt(req);
else
r = crypto_skcipher_decrypt(req);

to do something like:

struct crypto_wait wait;

...

if (bio_data_dir(ctx->bio_in) == WRITE) {
        if (test_bit(DM_CRYPT_FORCE_INLINE_WRITE, &cc->flags))
                r = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
        else
                r = crypto_skcipher_encrypt(req);
} else {
        if (test_bit(DM_CRYPT_FORCE_INLINE_READ, &cc->flags))
                r = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
        else
                r = crypto_skcipher_decrypt(req);
}

Doing so, crypt_convert_block_skcipher() cannot return -EBUSY nor -EINPROGRESS
for inline IOs, leading crypt_convert() to see synchronous completions, as
expected for the inline case. The above likely does not add much overhead at
all for synchronous skciphers/accelerators, and handles asynchronous ones as
if they were synchronous. Would this be correct?
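
[Editor's note: for the wait to actually complete, the request's callback must
also be wired to the wait object, which the snippet above leaves implicit. A
minimal sketch of the standard kernel pattern, using DECLARE_CRYPTO_WAIT and
crypto_req_done; the callback flags shown are a plausible choice, not taken
from the patch.]

```c
/*
 * Sketch: synchronous use of a possibly-asynchronous skcipher.
 * crypto_wait_req() only works if the request's completion callback is
 * crypto_req_done() with the wait object as its data; DECLARE_CRYPTO_WAIT
 * both declares and initializes the wait object.
 */
DECLARE_CRYPTO_WAIT(wait);

skcipher_request_set_callback(req,
			      CRYPTO_TFM_REQ_MAY_BACKLOG |
			      CRYPTO_TFM_REQ_MAY_SLEEP,
			      crypto_req_done, &wait);

/* Blocks until the driver signals completion; returns the final error. */
r = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
```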



>>>
>>> Do you intend that the "force_inline" option forces the use of a synchronous
>>> skcipher (alongside the other things it does)?  Or should it still allow
>>> asynchronous ones?
>>>
>>> We may not actually have a choice in that matter, since xts-aes-aesni has
>>> the CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in
>>> most cases; thus, the crypto API won't give you it if you ask for a
>>> synchronous cipher.  So I think you still need to allow async skciphers?
>>> That means a callback is still always required.
>>
>> Arg... So it means that some skciphers will not be OK at all for SMR writes.
>> I was not aware of these differences (tested with aes-xts-plain64 only). The
>> ugly way to support async ciphers would be to just wait inline for the
>> crypto API to complete using a completion for instance. But that is very
>> ugly. Back to brainstorming, and need to learn more about the crypto API...
>>
> 
> It's easy to wait for crypto API requests to complete if you need to --
> just use crypto_wait_req().
> 
> We do this in fs/crypto/, for example.  (Not many people are using fscrypt
> with crypto API based accelerators, so there hasn't yet been much need to
> support the complexity of issuing multiple async crypto requests like
> dm-crypt supports.)
> 
> - Eric
> 


-- 
Damien Le Moal
Western Digital Research


Re: [dm-devel] [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target

2020-06-24 Thread Damien Le Moal
On 2020/06/24 14:27, Eric Biggers wrote:
> On Wed, Jun 24, 2020 at 05:21:24AM +, Damien Le Moal wrote:
 @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct crypt_config *cc,
  
skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
  
 -  /*
 -   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 -   * requests if driver request queue is full.
 -   */
 -  skcipher_request_set_callback(ctx->r.req,
 -  CRYPTO_TFM_REQ_MAY_BACKLOG,
 -  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
 +  if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags))
 +  /* make sure we zero important fields of the request */
 +  skcipher_request_set_callback(ctx->r.req,
 +  0, NULL, NULL);
 +  else
 +  /*
 +   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
 +   * requests if driver request queue is full.
 +   */
 +  skcipher_request_set_callback(ctx->r.req,
 +  CRYPTO_TFM_REQ_MAY_BACKLOG,
 +  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
  }
>>>
>>> This looks wrong.  Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to
>>> crypto_alloc_skcipher(), the skcipher implementation can still be
>>> asynchronous, in which case providing a callback is required.
>>>
>>> Do you intend that the "force_inline" option forces the use of a synchronous
>>> skcipher (alongside the other things it does)?  Or should it still allow
>>> asynchronous ones?
>>>
>>> We may not actually have a choice in that matter, since xts-aes-aesni has
>>> the CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in
>>> most cases; thus, the crypto API won't give you it if you ask for a
>>> synchronous cipher.  So I think you still need to allow async skciphers?
>>> That means a callback is still always required.
>>
>> Arg... So it means that some skciphers will not be OK at all for SMR writes.
>> I was not aware of these differences (tested with aes-xts-plain64 only). The
>> ugly way to support async ciphers would be to just wait inline for the
>> crypto API to complete using a completion for instance. But that is very
>> ugly. Back to brainstorming, and need to learn more about the crypto API...
>>
> 
> It's easy to wait for crypto API requests to complete if you need to --
> just use crypto_wait_req().

OK. Thanks for the information. I will look into this and the performance
implications. A quick grep shows that a lot of different accelerators for
different architectures have CRYPTO_ALG_ASYNC set. So this is definitely
something that needs to be checked for SMR, and for Ignat's inline patch.

> We do this in fs/crypto/, for example.  (Not many people are using fscrypt
> with crypto API based accelerators, so there hasn't yet been much need to
> support the complexity of issuing multiple async crypto requests like
> dm-crypt supports.)

Zonefs fscrypt support is on my to do list too :)

Thanks !

>
> - Eric
> 


-- 
Damien Le Moal
Western Digital Research


Re: [dm-devel] [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target

2020-06-23 Thread Eric Biggers
On Wed, Jun 24, 2020 at 05:21:24AM +, Damien Le Moal wrote:
> >> @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct crypt_config *cc,
> >>  
> >>skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
> >>  
> >> -  /*
> >> -   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
> >> -   * requests if driver request queue is full.
> >> -   */
> >> -  skcipher_request_set_callback(ctx->r.req,
> >> -  CRYPTO_TFM_REQ_MAY_BACKLOG,
> >> -  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
> >> +  if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags))
> >> +  /* make sure we zero important fields of the request */
> >> +  skcipher_request_set_callback(ctx->r.req,
> >> +  0, NULL, NULL);
> >> +  else
> >> +  /*
> >> +   * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
> >> +   * requests if driver request queue is full.
> >> +   */
> >> +  skcipher_request_set_callback(ctx->r.req,
> >> +  CRYPTO_TFM_REQ_MAY_BACKLOG,
> >> +  kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
> >>  }
> > 
> > This looks wrong.  Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to
> > crypto_alloc_skcipher(), the skcipher implementation can still be
> > asynchronous, in which case providing a callback is required.
> > 
> > Do you intend that the "force_inline" option forces the use of a synchronous
> > skcipher (alongside the other things it does)?  Or should it still allow
> > asynchronous ones?
> > 
> > We may not actually have a choice in that matter, since xts-aes-aesni has
> > the CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in
> > most cases; thus, the crypto API won't give you it if you ask for a
> > synchronous cipher.  So I think you still need to allow async skciphers?
> > That means a callback is still always required.
> 
> Arg... So it means that some skciphers will not be OK at all for SMR writes.
> I was not aware of these differences (tested with aes-xts-plain64 only). The
> ugly way to support async ciphers would be to just wait inline for the
> crypto API to complete using a completion for instance. But that is very
> ugly. Back to brainstorming, and need to learn more about the crypto API...
> 

It's easy to wait for crypto API requests to complete if you need to --
just use crypto_wait_req().

We do this in fs/crypto/, for example.  (Not many people are using fscrypt
with crypto API based accelerators, so there hasn't yet been much need to
support the complexity of issuing multiple async crypto requests like
dm-crypt supports.)

- Eric


Re: [dm-devel] [dm-crypt] [RFC PATCH 1/1] Add DM_CRYPT_FORCE_INLINE flag to dm-crypt target

2020-06-23 Thread Damien Le Moal
On 2020/06/24 14:05, Eric Biggers wrote:
> On Fri, Jun 19, 2020 at 05:41:32PM +0100, Ignat Korchagin wrote:
>> Sometimes extra thread offloading imposed by dm-crypt hurts IO latency. This
>> is especially visible on busy systems with many processes/threads. Moreover,
>> most Crypto API implementations are async, that is they offload crypto
>> operations on their own, so this dm-crypt offloading is excessive.
> 
> This really should say "some Crypto API implementations are async" instead of
> "most Crypto API implementations are async".
> 
> Notably, the AES-NI implementation of AES-XTS is synchronous if you call it
> in a context where SIMD instructions are usable.  It's only asynchronous when
> SIMD is not usable.  (This seems to have been missed in your blog post.)
> 
>> This adds a new flag, which directs dm-crypt not to offload crypto operations
>> and to process everything inline. For cases where crypto operations cannot
>> happen inline (hard interrupt context, for example the read path of the NVMe
>> driver), we offload the work to a tasklet rather than a workqueue.
> 
> This patch both removes some dm-crypt specific queueing, and changes
> decryption to use softIRQ context instead of a workqueue.  It would be useful
> to know how much of a difference the workqueue => softIRQ change makes by
> itself.  Such a change could be useful for fscrypt as well.  (fscrypt uses a
> workqueue for decryption, but besides that doesn't use any other queueing.)
> 
>> @@ -127,7 +128,7 @@ struct iv_elephant_private {
>>   * and encrypts / decrypts at the same time.
>>   */
>>  enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
>> - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD };
>> + DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD, DM_CRYPT_FORCE_INLINE = (sizeof(unsigned long) * 8 - 1) };
> 
> Assigning a specific enum value isn't necessary.
> 
>> @@ -1458,13 +1459,18 @@ static void crypt_alloc_req_skcipher(struct crypt_config *cc,
>>  
>>  skcipher_request_set_tfm(ctx->r.req, cc->cipher_tfm.tfms[key_index]);
>>  
>> -/*
>> - * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
>> - * requests if driver request queue is full.
>> - */
>> -skcipher_request_set_callback(ctx->r.req,
>> -CRYPTO_TFM_REQ_MAY_BACKLOG,
>> -kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
>> +if (test_bit(DM_CRYPT_FORCE_INLINE, &cc->flags))
>> +/* make sure we zero important fields of the request */
>> +skcipher_request_set_callback(ctx->r.req,
>> +0, NULL, NULL);
>> +else
>> +/*
>> + * Use REQ_MAY_BACKLOG so a cipher driver internally backlogs
>> + * requests if driver request queue is full.
>> + */
>> +skcipher_request_set_callback(ctx->r.req,
>> +CRYPTO_TFM_REQ_MAY_BACKLOG,
>> +kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
>>  }
> 
> This looks wrong.  Unless type=0 and mask=CRYPTO_ALG_ASYNC are passed to
> crypto_alloc_skcipher(), the skcipher implementation can still be
> asynchronous, in which case providing a callback is required.
> 
> Do you intend that the "force_inline" option forces the use of a synchronous
> skcipher (alongside the other things it does)?  Or should it still allow
> asynchronous ones?
> 
> We may not actually have a choice in that matter, since xts-aes-aesni has the
> CRYPTO_ALG_ASYNC bit set (as I mentioned) despite being synchronous in most
> cases; thus, the crypto API won't give you it if you ask for a synchronous
> cipher.  So I think you still need to allow async skciphers?  That means a
> callback is still always required.

Arg... So it means that some skciphers will not be OK at all for SMR writes. I
was not aware of these differences (tested with aes-xts-plain64 only). The ugly
way to support async ciphers would be to just wait inline for the crypto API to
complete using a completion for instance. But that is very ugly. Back to
brainstorming, and need to learn more about the crypto API...

> 
> - Eric
> 
> 


-- 
Damien Le Moal
Western Digital Research