Re: [RFC v4 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-15 Thread Baolin Wang
On 15 June 2016 at 15:39, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Wed, Jun 15, 2016 at 03:38:02PM +0800, Baolin Wang wrote:
>>
>> But that means we would have to divide the bulk request into 512-byte
>> requests and break up the mapped sg table for each of them. On the other
>> hand, we would have to allocate memory for each request in the crypto
>> layer, whereas dm-crypt already supplies an efficient way to do that. I
>> think these are really top-level decisions about how to use the crypto
>> APIs; do they need to move into the crypto layer? Thanks.
>
> I have already explained to you how you can piggy-back off dm-crypt's
> allocation, so what's the problem?

Because the requests created in dm-crypt are tightly coupled to dm-crypt
itself, I am worried whether this can work, or whether it will introduce
other issues, if we move these top-level things into the crypto layer.
Anyway, I will try to do that. Thanks.

> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [RFC v4 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-15 Thread Baolin Wang
On 15 June 2016 at 14:49, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Wed, Jun 15, 2016 at 02:27:04PM +0800, Baolin Wang wrote:
>>
>> After some investigation, I still think we should divide the bulk
>> request from dm-crypt into small requests (each one 512 bytes) if the
>> algorithm does not support bulk mode (like CBC). We have discussed with
>> the dm-crypt maintainers why dm-crypt always uses 512 bytes as the
>> request size in the thread below; could you please check it?
>> http://www.kernelhub.org/?p=2=907022
>
> That link only points to an email about an oops.

Ah, sorry. Would you check this thread?
http://lkml.iu.edu/hypermail/linux/kernel/1601.1/03829.html

>
> Digging through that thread, the only objection I have seen is about
> the fact that you have to generate a fresh IV for each sector, which
> is precisely what I'm suggesting that you do.
>
> IOW, implement the IV generators in the crypto API, and then you can
> easily generate a new IV (if necessary) for each sector.
>
>> That means that even if we move the IV handling into the crypto API, we
>> still cannot use the bulk interface for every algorithm. For example, we
>> still need to read/write in 512-byte units for CBC; you can't use a 4K or
>> larger block with CBC (and most other encryption modes), because if only
>> part of a 4K block is written (and then a system crash happens), CBC
>> would corrupt the block completely. So if we map one whole bio with the
>> bulk interface in dm-crypt, we still need to split it into 512-byte
>> requests in the crypto layer. Therefore I don't think we can handle every
>> algorithm with the bulk interface just by moving the IV handling into the
>> crypto API. Thanks.
>
> Of course you would do CBC in 512-byte blocks, but my point is that
> you should do this in a crypto API algorithm, rather than dm-crypt
> as we do now.  Once you implement that then dm-crypt can treat
> every algorithm as if they supported bulk processing.

But that means we would have to divide the bulk request into 512-byte
requests and break up the mapped sg table for each of them. On the other
hand, we would have to allocate memory for each request in the crypto
layer, whereas dm-crypt already supplies an efficient way to do that. I
think these are really top-level decisions about how to use the crypto
APIs; do they need to move into the crypto layer? Thanks.
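
To make the per-sector work concrete, below is a minimal sketch, not taken
from the patch set, of what splitting a bulk buffer into 512-byte requests
implies: walk the data one sector at a time and refresh the IV before each
chunk. The plain64-style IV (little-endian sector number) mirrors a real
dm-crypt IV mode; the buffer walk and the cipher() callback are illustrative
assumptions only.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define IV_SIZE     16

/* plain64-style IV: the 64-bit sector number, little endian, zero padded. */
static void plain64_iv(uint8_t iv[IV_SIZE], uint64_t sector)
{
        size_t i;

        memset(iv, 0, IV_SIZE);
        for (i = 0; i < sizeof(sector); i++)
                iv[i] = (uint8_t)(sector >> (8 * i));
}

/*
 * Walk a bulk buffer one 512-byte sector at a time, generating a fresh IV
 * for each sector before handing it to a (hypothetical) cipher callback.
 */
static void crypt_each_sector(uint8_t *buf, size_t len, uint64_t first_sector,
                              void (*cipher)(uint8_t *data, size_t n,
                                             const uint8_t iv[IV_SIZE]))
{
        uint8_t iv[IV_SIZE];
        size_t off;

        for (off = 0; off + SECTOR_SIZE <= len; off += SECTOR_SIZE) {
                plain64_iv(iv, first_sector + off / SECTOR_SIZE);
                cipher(buf + off, SECTOR_SIZE, iv);
        }
}

The open question in the thread is where this loop lives (in dm-crypt as
today, or inside a crypto API IV generator as Herbert suggests) and who
allocates the per-sector requests it implies.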

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [RFC v4 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-15 Thread Baolin Wang
Hi Herbert,

On 8 June 2016 at 10:00, Baolin Wang <baolin.w...@linaro.org> wrote:
> Hi Herbert,
>
> On 7 June 2016 at 22:16, Herbert Xu <herb...@gondor.apana.org.au> wrote:
>> On Tue, Jun 07, 2016 at 08:17:05PM +0800, Baolin Wang wrote:
>>> Some cipher hardware engines prefer to handle a bulk block rather than the
>>> one-sector (512-byte) requests created by dm-crypt, because these engines
>>> can handle the initialization vectors (IVs) by themselves within one bulk
>>> block. This means we can increase the size of each request by merging
>>> requests, rather than always sending 512 bytes, and thus increase the
>>> hardware engine's processing speed.
>>>
>>> So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher supports
>>> bulk mode.
>>>
>>> Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
>>
>> Nack.  As I said before, please do it using explicit IV generators
>> like we do for IPsec.
>
> OK. I would like to try your suggestion. Thanks.

After some investigation, I still think we should divide the bulk
request from dm-crypt into small requests (each one 512 bytes) if the
algorithm does not support bulk mode (like CBC). We have discussed with
the dm-crypt maintainers why dm-crypt always uses 512 bytes as the
request size in the thread below; could you please check it?
http://www.kernelhub.org/?p=2=907022

That means that even if we move the IV handling into the crypto API, we
still cannot use the bulk interface for every algorithm. For example, we
still need to read/write in 512-byte units for CBC; you can't use a 4K or
larger block with CBC (and most other encryption modes), because if only
part of a 4K block is written (and then a system crash happens), CBC
would corrupt the block completely. So if we map one whole bio with the
bulk interface in dm-crypt, we still need to split it into 512-byte
requests in the crypto layer. Therefore I don't think we can handle every
algorithm with the bulk interface just by moving the IV handling into the
crypto API. Thanks.

>
>> --
>> Email: Herbert Xu <herb...@gondor.apana.org.au>
>> Home Page: http://gondor.apana.org.au/~herbert/
>> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
>
>
> --
> Baolin.wang
> Best Regards



-- 
Baolin.wang
Best Regards


Re: [RFC v4 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-07 Thread Baolin Wang
Hi Herbert,

On 7 June 2016 at 22:16, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Tue, Jun 07, 2016 at 08:17:05PM +0800, Baolin Wang wrote:
>> Some cipher hardware engines prefer to handle a bulk block rather than the
>> one-sector (512-byte) requests created by dm-crypt, because these engines
>> can handle the initialization vectors (IVs) by themselves within one bulk
>> block. This means we can increase the size of each request by merging
>> requests, rather than always sending 512 bytes, and thus increase the
>> hardware engine's processing speed.
>>
>> So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher supports
>> bulk mode.
>>
>> Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
>
> Nack.  As I said before, please do it using explicit IV generators
> like we do for IPsec.

OK. I would like to try your suggestion. Thanks.

> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


[RFC v4 1/4] block: Introduce blk_bio_map_sg() to map one bio

2016-06-07 Thread Baolin Wang
In dm-crypt, one bio needs to be mapped to a scatterlist to improve the
hardware engine's encryption efficiency. Thus this patch introduces the
blk_bio_map_sg() function to map one bio to scatterlists.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 block/blk-merge.c  |   19 +++
 include/linux/blkdev.h |2 ++
 2 files changed, 21 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..bca0609 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -417,6 +417,25 @@ single_segment:
 }
 
 /*
+ * Map a bio to scatterlist, return number of sg entries setup. Caller must
+ * make sure sg can hold bio segments entries and detach the bio from list.
+ */
+int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+  struct scatterlist *sglist)
+{
+   struct scatterlist *sg = NULL;
+   int nsegs;
+
+   nsegs = __blk_bios_map_sg(q, bio, sglist, &sg);
+
+   if (sg)
+   sg_mark_end(sg);
+
+   return nsegs;
+}
+EXPORT_SYMBOL(blk_bio_map_sg);
+
+/*
  * map a request to scatterlist, return number of sg entries setup. Caller
  * must make sure sg can hold rq->nr_phys_segments entries
  */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3d9cf32..cbe31f9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1024,6 +1024,8 @@ extern void blk_queue_write_cache(struct request_queue 
*q, bool enabled, bool fu
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device 
*bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+extern int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+ struct scatterlist *sglist);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
-- 
1.7.9.5
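
For context, a hedged sketch of the intended caller, modeled on how patch
3/4 of this series uses the new helper: map a whole bio into a preallocated
scatterlist and sanity-check the entry count. DM_MAX_SG_LIST and the
cc->sgt_in table come from the dm-crypt patch later in the series; the
wrapper function itself is only an illustration.

/*
 * Illustrative caller (mirrors the use in patch 3/4): map a whole bio into
 * the preallocated sg table and return the number of entries, or an error
 * if the bio has more segments than the table can hold.
 */
static int crypt_map_whole_bio(struct crypt_config *cc, struct bio *bio)
{
        int nents;

        nents = blk_bio_map_sg(bdev_get_queue(bio->bi_bdev),
                               bio, cc->sgt_in.sgl);
        if (nents <= 0 || nents > DM_MAX_SG_LIST)
                return -EINVAL;

        return nents;
}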



[RFC v4 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-07 Thread Baolin Wang
Some cipher hardware engines prefer to handle a bulk block rather than the
one-sector (512-byte) requests created by dm-crypt, because these engines can
handle the initialization vectors (IVs) by themselves within one bulk block.
This means we can increase the size of each request by merging requests,
rather than always sending 512 bytes, and thus increase the hardware engine's
processing speed.

So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher supports
bulk mode.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/crypto/skcipher.h |7 +++
 include/linux/crypto.h|6 ++
 2 files changed, 13 insertions(+)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 0f987f5..d89d29a 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -519,5 +519,12 @@ static inline void skcipher_request_set_crypt(
req->iv = iv;
 }
 
+static inline unsigned int skcipher_is_bulk_mode(struct crypto_skcipher 
*sk_tfm)
+{
+   struct crypto_tfm *tfm = crypto_skcipher_tfm(sk_tfm);
+
+   return crypto_tfm_alg_bulk(tfm);
+}
+
 #endif /* _CRYPTO_SKCIPHER_H */
 
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6e28c89..a315487 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -63,6 +63,7 @@
 #define CRYPTO_ALG_DEAD        0x0020
 #define CRYPTO_ALG_DYING       0x0040
 #define CRYPTO_ALG_ASYNC       0x0080
+#define CRYPTO_ALG_BULK        0x0100
 
 /*
  * Set this bit if and only if the algorithm requires another algorithm of
@@ -623,6 +624,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm 
*tfm)
return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline unsigned int crypto_tfm_alg_bulk(struct crypto_tfm *tfm)
+{
+   return tfm->__crt_alg->cra_flags & CRYPTO_ALG_BULK;
+}
+
 static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
 {
return tfm->__crt_alg->cra_blocksize;
-- 
1.7.9.5
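
As a usage note, here is a sketch (not part of the patch) of how a caller
such as dm-crypt is expected to test the flag and pick a conversion path.
any_tfm(), crypt_convert_block() and crypt_convert_bulk_block() refer to the
dm-crypt side of this series; the wrapper below is illustrative only.

/* Branch on the new flag: bulk (whole-bio) path vs. per-sector path. */
static int crypt_convert_one(struct crypt_config *cc,
                             struct convert_context *ctx,
                             struct skcipher_request *req)
{
        if (skcipher_is_bulk_mode(any_tfm(cc)))
                /* one request covering all sectors of the bio */
                return crypt_convert_bulk_block(cc, ctx, req);

        /* existing path: one 512-byte sector per request */
        return crypt_convert_block(cc, ctx, req);
}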



[RFC v4 3/4] md: dm-crypt: Introduce the bulk mode method when sending request

2016-06-07 Thread Baolin Wang
In the current dm-crypt code, it is inefficient to map only one segment
(always one sector) of a bio with a single scatterlist at a time for a
hardware crypto engine. In particular, some encryption modes (like ECB or
XTS) cooperating with the crypto engine need only one initial IV or a null
IV instead of a different IV for each sector. In this situation we can use
multiple scatterlists to map the whole bio and send all scatterlists of one
bio to the crypto engine to encrypt or decrypt, which improves the hardware
engine's efficiency.

With this optimization, on my test setup (BeagleBone Black board with the
ecb(aes) cipher and dd testing) using 64KB I/Os on an eMMC storage device,
I saw about a 127% improvement in throughput for encrypted writes and about
a 206% improvement for encrypted reads.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |  159 -
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4f3cb35..0b1d452 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -33,6 +33,7 @@
 #include 
 
 #define DM_MSG_PREFIX "crypt"
+#define DM_MAX_SG_LIST 512
 
 /*
  * context holding the current state of a multi-part conversion
@@ -142,6 +143,11 @@ struct crypt_config {
char *cipher;
char *cipher_string;
 
+   struct sg_table sgt_in;
+   struct sg_table sgt_out;
+   atomic_t sgt_init_done;
+   struct completion sgt_restart;
+
struct crypt_iv_operations *iv_gen_ops;
union {
struct iv_essiv_private essiv;
@@ -837,6 +843,141 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
crypto_skcipher_alignmask(any_tfm(cc)) + 1);
 }
 
+static void crypt_init_sg_table(struct scatterlist *sgl)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   if (i < DM_MAX_SG_LIST - 1 && sg_is_last(sg))
+   sg_unmark_end(sg);
+   else if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   memset(sg, 0, sizeof(struct scatterlist));
+
+   if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+}
+
+static void crypt_reinit_sg_table(struct crypt_config *cc)
+{
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return;
+
+   crypt_init_sg_table(cc->sgt_in.sgl);
+   crypt_init_sg_table(cc->sgt_out.sgl);
+
+   if (atomic_inc_and_test(&cc->sgt_init_done))
+   complete(&cc->sgt_restart);
+   atomic_set(&cc->sgt_init_done, 1);
+}
+
+static int crypt_alloc_sg_table(struct crypt_config *cc)
+{
+   unsigned int bulk_mode = skcipher_is_bulk_mode(any_tfm(cc));
+   int ret = 0;
+
+   if (!bulk_mode)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_in, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_out, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_free_table;
+
+   init_completion(&cc->sgt_restart);
+   atomic_set(&cc->sgt_init_done, 1);
+   return 0;
+
+out_free_table:
+   sg_free_table(&cc->sgt_in);
+out_skip_alloc:
+   cc->sgt_in.orig_nents = 0;
+   cc->sgt_out.orig_nents = 0;
+
+   return ret;
+}
+
+static int crypt_convert_bulk_block(struct crypt_config *cc,
+   struct convert_context *ctx,
+   struct skcipher_request *req)
+{
+   struct bio *bio_in = ctx->bio_in;
+   struct bio *bio_out = ctx->bio_out;
+   unsigned int total_bytes = bio_in->bi_iter.bi_size;
+   unsigned int total_sg_in, total_sg_out;
+   struct scatterlist *sg_in, *sg_out;
+   struct dm_crypt_request *dmreq;
+   u8 *iv;
+   int r;
+
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return -EINVAL;
+
+   if (!atomic_dec_and_test(&cc->sgt_init_done)) {
+   wait_for_completion(&cc->sgt_restart);
+   reinit_completion(&cc->sgt_restart);
+   }
+
+   dmreq = dmreq_of_req(cc, req);
+   iv = iv_of_dmreq(cc, dmreq);
+   dmreq->iv_sector = ctx->cc_sector;
+   dmreq->ctx = ctx;
+
+   total_sg_in = blk_bio_map_sg(bdev_get_queue(bio_in->bi_bdev),
+bio_in, cc->sgt_in.sgl);
+   if ((total_sg_in <= 0) || (total_sg_in > DM_MAX_SG_LIST)) {
+   DMERR("%s in sg map error %d, sg table nents[%d]\n",
+ __func__, total_sg_in, cc->sgt_in.orig_nents);
+   return -EINVAL;
+   }
+
+   ctx->iter_in.bi_size -= total_bytes;
+   sg_in = cc->sgt_in.sgl;
+   sg_out 

[RFC v4 4/4] crypto: Add the CRYPTO_ALG_BULK flag for ecb(aes) cipher

2016-06-07 Thread Baolin Wang
Since the ecb(aes) cipher does not need to handle IVs for encryption or
decryption, it can support bulk blocks when handling data. Thus this patch
adds the CRYPTO_ALG_BULK flag to the ecb(aes) cipher to improve the hardware
AES engine's efficiency.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/crypto/omap-aes.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index ce174d3..ab09429 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -804,7 +804,7 @@ static struct crypto_alg algs_ecb_cbc[] = {
.cra_priority   = 300,
.cra_flags  = CRYPTO_ALG_TYPE_ABLKCIPHER |
  CRYPTO_ALG_KERN_DRIVER_ONLY |
- CRYPTO_ALG_ASYNC,
+ CRYPTO_ALG_ASYNC | CRYPTO_ALG_BULK,
.cra_blocksize  = AES_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct omap_aes_ctx),
.cra_alignmask  = 0,
-- 
1.7.9.5



[RFC v4 0/4] Introduce the bulk mode method when sending request to crypto layer

2016-06-07 Thread Baolin Wang
This patchset checks whether the cipher supports bulk mode; dm-crypt then
chooses how to send requests to the crypto layer according to the cipher
mode. For bulk mode, we can use an sg table to map the whole bio and send
all scatterlists of one bio to the crypto engine to encrypt or decrypt,
which can improve the hardware engine's efficiency.

Changes since v3:
 - Some optimization for blk_bio_map_sg() function.

Changes since v2:
 - Add one cipher user with CRYPTO_ALG_BULK flag to support bulk mode.
 - Add one atomic variable to avoid the sg table race.

Changes since v1:
 - Refactor the blk_bio_map_sg() function to avoid duplicated code.
 - Move the sg table allocation to crypt_ctr_cipher() function to avoid memory
 allocation in the IO path.
 - Remove the crypt_sg_entry() function.
 - Other optimization.

Baolin Wang (4):
  block: Introduce blk_bio_map_sg() to map one bio
  crypto: Introduce CRYPTO_ALG_BULK flag
  md: dm-crypt: Introduce the bulk mode method when sending request
  crypto: Add the CRYPTO_ALG_BULK flag for ecb(aes) cipher

 block/blk-merge.c |   19 ++
 drivers/crypto/omap-aes.c |2 +-
 drivers/md/dm-crypt.c |  159 -
 include/crypto/skcipher.h |7 ++
 include/linux/blkdev.h|2 +
 include/linux/crypto.h|6 ++
 6 files changed, 193 insertions(+), 2 deletions(-)

-- 
1.7.9.5



Re: [RFC v2 1/3] block: Introduce blk_bio_map_sg() to map one bio

2016-06-05 Thread Baolin Wang
On 3 June 2016 at 22:35, Jens Axboe <ax...@kernel.dk> wrote:
> On 05/27/2016 05:11 AM, Baolin Wang wrote:
>>
>> In dm-crypt, one bio needs to be mapped to a scatterlist to improve the
>> hardware engine's encryption efficiency. Thus this patch introduces the
>> blk_bio_map_sg() function to map one bio to scatterlists.
>>
>> To avoid duplicating code in the __blk_bios_map_sg() function, add one
>> parameter to distinguish a bio map from a request map.
>
>
> Just detach the bio in blk_bio_map_sg() instead of adding a separate case
> (and argument) for it in __blk_bios_map_sg().

Makes sense.

>
> --
> Jens Axboe
>



-- 
Baolin.wang
Best Regards


Re: [RFC v2 1/3] block: Introduce blk_bio_map_sg() to map one bio

2016-06-05 Thread Baolin Wang
On 3 June 2016 at 22:38, Jens Axboe <ax...@kernel.dk> wrote:
> On 05/27/2016 05:11 AM, Baolin Wang wrote:
>>
>> +/*
>> + * Map a bio to scatterlist, return number of sg entries setup. Caller
>> must
>> + * make sure sg can hold bio segments entries.
>> + */
>> +int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
>> +  struct scatterlist *sglist)
>> +{
>> +   struct scatterlist *sg = NULL;
>> +   int nsegs = 0;
>> +
>> +   if (bio)
>> +   nsegs = __blk_bios_map_sg(q, bio, sglist, &sg, true);
>> +
>> +   if (sg)
>> +   sg_mark_end(sg);
>
>
> Put that if (sg) inside the if (bio) section, 'sg' isn't going to be
> non-NULL outside of that.
>
> Additionally, who would call this with a NULL bio? That seems odd, I'd
> get rid of that check completely.

OK. I'll fix these in the next version. Thanks for your comments.

-- 
Baolin.wang
Best Regards


Re: [RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-03 Thread Baolin Wang
On 3 June 2016 at 18:09, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, Jun 03, 2016 at 05:23:59PM +0800, Baolin Wang wrote:
>>
>> Assuming one 64K size bio coming, we can map the whole bio with one sg
>> table in crypt_convert_bulk_block() function. But if we send this bulk
>> request to crypto layer, we should divide the bulk request into small
>> requests, and each small request should be one sector size (512 bytes)
>> with assuming the correct IV, but we need to allocate small requests
>> memory for the division, which will not good for IO mapping, and how
>> each small request connect to dm-crypt (how to notify the request is
>> done?)?
>
> Why won't it be good? The actual AES block size is 16 and yet we

Like I said, we should avoid memory allocation in the IO path to keep it
efficient. The other question is how the divided small requests (whose
memory would be allocated in the crypto layer) connect back with dm-crypt,
since dm-crypt sends just one bulk request to the crypto layer but it gets
divided into small requests there.

> have no trouble when you feed it a block of 512 bytes.

That's right.
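
One possible answer to the "how is dm-crypt notified" question, sketched
under stated assumptions rather than taken from the thread: if the crypto
layer split a bulk request into per-sector sub-requests, it could count them
and complete the parent request only when the last one finishes, using the
standard asynchronous completion callback. The bulk_ctx structure and
subreq_done() are hypothetical names.

struct bulk_ctx {
        atomic_t pending;                       /* sub-requests in flight */
        struct skcipher_request *parent;        /* dm-crypt's bulk request */
};

/* Callback installed on each sub-request via skcipher_request_set_callback(). */
static void subreq_done(struct crypto_async_request *areq, int err)
{
        struct bulk_ctx *bctx = areq->data;

        if (err == -EINPROGRESS)        /* backlog notification, not final */
                return;

        /* complete the parent only after the last sub-request finishes */
        if (atomic_dec_and_test(&bctx->pending))
                bctx->parent->base.complete(&bctx->parent->base, err);
}

Error propagation (remembering the first failure) is omitted here for
brevity.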

-- 
Baolin.wang
Best Regards


Re: [RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-03 Thread Baolin Wang
On 3 June 2016 at 16:21, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, Jun 03, 2016 at 04:15:28PM +0800, Baolin Wang wrote:
>>
>> Suppose the cbc(aes) algorithm, which can not be handled through bulk
>> interface, it need to map the data sector by sector.
>> If we also handle the cbc(aes) algorithm with bulk interface, we need
>> to divide the sg table into sectors and need to allocate request
>> memory for each divided sectors (As Mike pointed out  this is in the
>> IO mapping
>> path and we try to avoid memory allocations at all costs -- due to the
>> risk of deadlock when issuing IO to stacked block devices (dm-crypt
>> could be part of a much more elaborate IO stack). ), that will
>> introduce more messy things I think.
>
> Perhaps I'm not making myself very clear.  If you move the IV
> generation into the crypto API, those crypto API algorithms will
> be operating at the sector level.

Yeah, IV generation is OK, but the difference is not only about the IV. For
example:

(1) The ecb(aes) algorithm, which does not need IV generation, so it can
support bulk mode:
Assuming a 64K bio comes in, we can map the whole bio with one sg table in
dm-crypt (say it uses 16 scatterlists from the sg table), then call
skcipher_request_set_crypt() to set up one request with the mapped sg
table, which is sent to the crypto driver to be handled.

(2) The cbc(aes) algorithm, which needs IV generation sector by sector, so
it cannot support bulk mode and cannot use the bulk interface:
Assuming a 64K bio comes in, we must map the bio sector by sector, one
scatterlist at a time. Each time we call skcipher_request_set_crypt() to
set up one request with just one mapped scatterlist, until the whole bio
has been handled.

(3) Following your suggestion, using the bulk interface for the cbc(aes)
algorithm too, assuming it did all the requisite magic to generate the
correct IV:
Assuming a 64K bio comes in, we can map the whole bio with one sg table in
the crypt_convert_bulk_block() function. But when we send this bulk request
to the crypto layer, we have to divide it into small requests, each one
sector (512 bytes) in size with the correct IV, and we need to allocate
memory for those small requests, which is not good in the IO mapping path.
And how would each small request connect back to dm-crypt (how do we get
notified that the request is done)?

Thus we should not handle every algorithm with the bulk interface.
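
A compact sketch of the two call patterns in cases (1) and (2) above, using
the real skcipher_request_set_crypt() and crypto_skcipher_encrypt() calls;
the scatterlist arrays, sector count and make_sector_iv() helper are
assumptions for illustration, and asynchronous completion handling is left
out.

/* Case (1): bulk mode, e.g. ecb(aes) -- one request for the whole bio. */
static int encrypt_bulk(struct skcipher_request *req,
                        struct scatterlist *sg_in, struct scatterlist *sg_out,
                        unsigned int total_bytes)
{
        skcipher_request_set_crypt(req, sg_in, sg_out, total_bytes, NULL);
        return crypto_skcipher_encrypt(req);
}

/* Case (2): sector mode, e.g. cbc(aes) -- one request per 512-byte sector. */
static int encrypt_per_sector(struct skcipher_request *req,
                              struct scatterlist *sg_in,
                              struct scatterlist *sg_out,
                              unsigned int nr_sectors, u64 first_sector,
                              u8 *iv)
{
        unsigned int i;
        int err;

        for (i = 0; i < nr_sectors; i++) {
                make_sector_iv(iv, first_sector + i);   /* assumed helper */
                skcipher_request_set_crypt(req, &sg_in[i], &sg_out[i],
                                           512, iv);
                err = crypto_skcipher_encrypt(req);
                if (err)
                        return err;
        }
        return 0;
}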

>
> For example, assuming you're doing lmk, then the algorithm would
> be called lmk(cbc(aes)) and it will take as its input one or more
> sectors and for each sector it should generate an IV and operate
> on it.



>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-03 Thread Baolin Wang
On 3 June 2016 at 15:54, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, Jun 03, 2016 at 03:10:31PM +0800, Baolin Wang wrote:
>> On 3 June 2016 at 14:51, Herbert Xu <herb...@gondor.apana.org.au> wrote:
>> > On Fri, Jun 03, 2016 at 02:48:34PM +0800, Baolin Wang wrote:
>> >>
>> >> If we move the IV generation into the crypto API, we still cannot
>> >> handle every algorithm with the bulk interface, because we also need
>> >> different methods to map either one whole bio or one sector, depending
>> >> on whether the algorithm supports bulk mode. Please correct me if I
>> >> misunderstand your points. Thanks.
>> >
>> > Which ones can't be handled this way?
>>
>> What I mean is that the difference between bulk mode and sector mode is
>> not only the IV handling but also the way the data is mapped with
>> scatterlists. That is why we have two paths in dm-crypt
>> (crypt_convert_block() and crypt_convert_bulk_block()) to handle the
>> data, so we cannot handle every algorithm with the bulk interface.
>
> As I asked, which algorithm can't you handle through the bulk
> interface, assuming it did all the requisite magic to generate
> the correct IV?

Take the cbc(aes) algorithm, which cannot be handled through the bulk
interface; it needs to map the data sector by sector. If we handled
cbc(aes) with the bulk interface as well, we would need to divide the sg
table into sectors and allocate request memory for each divided sector
(as Mike pointed out, this is in the IO mapping path and we try to avoid
memory allocations at all costs, due to the risk of deadlock when issuing
IO to stacked block devices -- dm-crypt could be part of a much more
elaborate IO stack). That would introduce more messy things, I think.

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-03 Thread Baolin Wang
On 3 June 2016 at 14:51, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, Jun 03, 2016 at 02:48:34PM +0800, Baolin Wang wrote:
>>
>> If we move the IV generation into the crypto API, we still cannot
>> handle every algorithm with the bulk interface, because we also need
>> different methods to map either one whole bio or one sector, depending
>> on whether the algorithm supports bulk mode. Please correct me if I
>> misunderstand your points. Thanks.
>
> Which ones can't be handled this way?

What I mean is that the difference between bulk mode and sector mode is
not only the IV handling but also the way the data is mapped with
scatterlists. That is why we have two paths in dm-crypt
(crypt_convert_block() and crypt_convert_bulk_block()) to handle the
data, so we cannot handle every algorithm with the bulk interface.

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-06-03 Thread Baolin Wang
Hi Herbert,

On 2 June 2016 at 16:26, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Fri, May 27, 2016 at 07:11:23PM +0800, Baolin Wang wrote:
>> Some cipher hardware engines prefer to handle a bulk block rather than the
>> one-sector (512-byte) requests created by dm-crypt, because these engines
>> can handle the initialization vectors (IVs) by themselves within one bulk
>> block. This means we can increase the size of each request by merging
>> requests, rather than always sending 512 bytes, and thus increase the
>> hardware engine's processing speed.
>>
>> So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher supports
>> bulk mode.
>>
>> Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
>
> I think a better aproach would be to explicitly move the IV generation
> into the crypto API, similar to how we handle IPsec.  Once you do
> that then every algorithm can be handled through the bulk interface.
>

Sorry for the late reply.
If we move the IV generation into the crypto API, we still cannot handle
every algorithm with the bulk interface, because we also need different
methods to map either one whole bio or one sector, depending on whether
the algorithm supports bulk mode. Please correct me if I misunderstand
your points. Thanks.


> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


[RFC v3 1/4] block: Introduce blk_bio_map_sg() to map one bio

2016-05-31 Thread Baolin Wang
In dm-crypt, one bio needs to be mapped to a scatterlist to improve the
hardware engine's encryption efficiency. Thus this patch introduces the
blk_bio_map_sg() function to map one bio to scatterlists.

To avoid duplicating code in the __blk_bios_map_sg() function, add one
parameter to distinguish a bio map from a request map.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 block/blk-merge.c  |   36 +++-
 include/linux/blkdev.h |2 ++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..badae44 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -376,7 +376,7 @@ new_segment:
 
 static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 struct scatterlist *sglist,
-struct scatterlist **sg)
+struct scatterlist **sg, bool single_bio)
 {
struct bio_vec bvec, bvprv = { NULL };
struct bvec_iter iter;
@@ -408,13 +408,39 @@ single_segment:
return 1;
}
 
-   for_each_bio(bio)
+   if (!single_bio) {
+   for_each_bio(bio)
+   bio_for_each_segment(bvec, bio, iter)
+   __blk_segment_map_sg(q, &bvec, sglist, &bvprv,
+sg, &nsegs, &cluster);
+   } else {
bio_for_each_segment(bvec, bio, iter)
-   __blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
-&nsegs, &cluster);
+   __blk_segment_map_sg(q, &bvec, sglist, &bvprv,
+sg, &nsegs, &cluster);
+   }
+
+   return nsegs;
+}
+
+/*
+ * Map a bio to scatterlist, return number of sg entries setup. Caller must
+ * make sure sg can hold bio segments entries.
+ */
+int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+  struct scatterlist *sglist)
+{
+   struct scatterlist *sg = NULL;
+   int nsegs = 0;
+
+   if (bio)
+   nsegs = __blk_bios_map_sg(q, bio, sglist, &sg, true);
+
+   if (sg)
+   sg_mark_end(sg);
 
return nsegs;
 }
+EXPORT_SYMBOL(blk_bio_map_sg);
 
 /*
  * map a request to scatterlist, return number of sg entries setup. Caller
@@ -427,7 +453,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request 
*rq,
int nsegs = 0;
 
if (rq->bio)
-   nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
+   nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg, false);
 
if (unlikely(rq->cmd_flags & REQ_COPY_USER) &&
(blk_rq_bytes(rq) & q->dma_pad_mask)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1fd8fdf..5868062 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1013,6 +1013,8 @@ extern void blk_queue_write_cache(struct request_queue 
*q, bool enabled, bool fu
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device 
*bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+extern int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+ struct scatterlist *sglist);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
-- 
1.7.9.5



[RFC v3 2/4] crypto: Introduce CRYPTO_ALG_BULK flag

2016-05-31 Thread Baolin Wang
Some cipher hardware engines prefer to handle a bulk block rather than the
one-sector (512-byte) requests created by dm-crypt, because these engines can
handle the initialization vectors (IVs) by themselves within one bulk block.
This means we can increase the size of each request by merging requests,
rather than always sending 512 bytes, and thus increase the hardware engine's
processing speed.

So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher supports
bulk mode.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/crypto/skcipher.h |7 +++
 include/linux/crypto.h|6 ++
 2 files changed, 13 insertions(+)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 0f987f5..d89d29a 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -519,5 +519,12 @@ static inline void skcipher_request_set_crypt(
req->iv = iv;
 }
 
+static inline unsigned int skcipher_is_bulk_mode(struct crypto_skcipher 
*sk_tfm)
+{
+   struct crypto_tfm *tfm = crypto_skcipher_tfm(sk_tfm);
+
+   return crypto_tfm_alg_bulk(tfm);
+}
+
 #endif /* _CRYPTO_SKCIPHER_H */
 
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6e28c89..a315487 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -63,6 +63,7 @@
 #define CRYPTO_ALG_DEAD        0x0020
 #define CRYPTO_ALG_DYING       0x0040
 #define CRYPTO_ALG_ASYNC       0x0080
+#define CRYPTO_ALG_BULK        0x0100
 
 /*
  * Set this bit if and only if the algorithm requires another algorithm of
@@ -623,6 +624,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm 
*tfm)
return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline unsigned int crypto_tfm_alg_bulk(struct crypto_tfm *tfm)
+{
+   return tfm->__crt_alg->cra_flags & CRYPTO_ALG_BULK;
+}
+
 static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
 {
return tfm->__crt_alg->cra_blocksize;
-- 
1.7.9.5



[RFC v3 0/4] Introduce the bulk mode method when sending request to crypto layer

2016-05-31 Thread Baolin Wang
This patchset checks whether the cipher supports bulk mode; dm-crypt then
chooses how to send requests to the crypto layer according to the cipher
mode. For bulk mode, we can use an sg table to map the whole bio and send
all scatterlists of one bio to the crypto engine to encrypt or decrypt,
which can improve the hardware engine's efficiency.

Changes since v2:
 - Add one cipher user with CRYPTO_ALG_BULK flag to support bulk mode.
 - Add one atomic variable to avoid the sg table race.

Changes since v1:
 - Refactor the blk_bio_map_sg() function to avoid duplicated code.
 - Move the sg table allocation to crypt_ctr_cipher() function to avoid memory
   allocation in the IO path.
 - Remove the crypt_sg_entry() function.
 - Other optimization.

Baolin Wang (4):
  block: Introduce blk_bio_map_sg() to map one bio
  crypto: Introduce CRYPTO_ALG_BULK flag
  md: dm-crypt: Introduce the bulk mode method when sending request
  crypto: Add the CRYPTO_ALG_BULK flag for ecb(aes) cipher

 block/blk-merge.c |   36 --
 drivers/crypto/omap-aes.c |2 +-
 drivers/md/dm-crypt.c |  159 -
 include/crypto/skcipher.h |7 ++
 include/linux/blkdev.h|2 +
 include/linux/crypto.h|6 ++
 6 files changed, 205 insertions(+), 7 deletions(-)

-- 
1.7.9.5



[RFC v3 3/4] md: dm-crypt: Introduce the bulk mode method when sending request

2016-05-31 Thread Baolin Wang
In the current dm-crypt code, it is inefficient to map only one segment
(always one sector) of a bio with a single scatterlist at a time for a
hardware crypto engine. In particular, some encryption modes (like ECB or
XTS) cooperating with the crypto engine need only one initial IV or a null
IV instead of a different IV for each sector. In this situation we can use
multiple scatterlists to map the whole bio and send all scatterlists of one
bio to the crypto engine to encrypt or decrypt, which improves the hardware
engine's efficiency.

With this optimization, on my test setup (BeagleBone Black board and dd
testing) using 64KB I/Os on an eMMC storage device, I saw about a 127%
improvement in throughput for encrypted writes and about a 206% improvement
for encrypted reads. But this does not suit other modes, which need a
different IV for each sector.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |  159 -
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4f3cb35..0b1d452 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -33,6 +33,7 @@
 #include 
 
 #define DM_MSG_PREFIX "crypt"
+#define DM_MAX_SG_LIST 512
 
 /*
  * context holding the current state of a multi-part conversion
@@ -142,6 +143,11 @@ struct crypt_config {
char *cipher;
char *cipher_string;
 
+   struct sg_table sgt_in;
+   struct sg_table sgt_out;
+   atomic_t sgt_init_done;
+   struct completion sgt_restart;
+
struct crypt_iv_operations *iv_gen_ops;
union {
struct iv_essiv_private essiv;
@@ -837,6 +843,141 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
crypto_skcipher_alignmask(any_tfm(cc)) + 1);
 }
 
+static void crypt_init_sg_table(struct scatterlist *sgl)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   if (i < DM_MAX_SG_LIST - 1 && sg_is_last(sg))
+   sg_unmark_end(sg);
+   else if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   memset(sg, 0, sizeof(struct scatterlist));
+
+   if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+}
+
+static void crypt_reinit_sg_table(struct crypt_config *cc)
+{
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return;
+
+   crypt_init_sg_table(cc->sgt_in.sgl);
+   crypt_init_sg_table(cc->sgt_out.sgl);
+
+   if (atomic_inc_and_test(&cc->sgt_init_done))
+   complete(&cc->sgt_restart);
+   atomic_set(&cc->sgt_init_done, 1);
+}
+
+static int crypt_alloc_sg_table(struct crypt_config *cc)
+{
+   unsigned int bulk_mode = skcipher_is_bulk_mode(any_tfm(cc));
+   int ret = 0;
+
+   if (!bulk_mode)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_in, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_out, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_free_table;
+
+   init_completion(&cc->sgt_restart);
+   atomic_set(&cc->sgt_init_done, 1);
+   return 0;
+
+out_free_table:
+   sg_free_table(&cc->sgt_in);
+out_skip_alloc:
+   cc->sgt_in.orig_nents = 0;
+   cc->sgt_out.orig_nents = 0;
+
+   return ret;
+}
+
+static int crypt_convert_bulk_block(struct crypt_config *cc,
+   struct convert_context *ctx,
+   struct skcipher_request *req)
+{
+   struct bio *bio_in = ctx->bio_in;
+   struct bio *bio_out = ctx->bio_out;
+   unsigned int total_bytes = bio_in->bi_iter.bi_size;
+   unsigned int total_sg_in, total_sg_out;
+   struct scatterlist *sg_in, *sg_out;
+   struct dm_crypt_request *dmreq;
+   u8 *iv;
+   int r;
+
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return -EINVAL;
+
+   if (!atomic_dec_and_test(&cc->sgt_init_done)) {
+   wait_for_completion(&cc->sgt_restart);
+   reinit_completion(&cc->sgt_restart);
+   }
+
+   dmreq = dmreq_of_req(cc, req);
+   iv = iv_of_dmreq(cc, dmreq);
+   dmreq->iv_sector = ctx->cc_sector;
+   dmreq->ctx = ctx;
+
+   total_sg_in = blk_bio_map_sg(bdev_get_queue(bio_in->bi_bdev),
+bio_in, cc->sgt_in.sgl);
+   if ((total_sg_in <= 0) || (total_sg_in > DM_MAX_SG_LIST)) {
+   DMERR("%s in sg map error %d, sg table nents[%d]\n",
+ __func__, total_sg_in, cc->sgt_in.orig_nents);
+   return -EINVAL;
+   }
+
+   ctx->iter_in.bi_size -= total_bytes;

[RFC v3 4/4] crypto: Add the CRYPTO_ALG_BULK flag for ecb(aes) cipher

2016-05-31 Thread Baolin Wang
Since the ecb(aes) cipher does not need to handle IVs for encryption or
decryption, it can support bulk blocks when handling data. Thus this patch
adds the CRYPTO_ALG_BULK flag to the ecb(aes) cipher to improve the hardware
AES engine's efficiency.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/crypto/omap-aes.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index ce174d3..ab09429 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -804,7 +804,7 @@ static struct crypto_alg algs_ecb_cbc[] = {
.cra_priority   = 300,
.cra_flags  = CRYPTO_ALG_TYPE_ABLKCIPHER |
  CRYPTO_ALG_KERN_DRIVER_ONLY |
- CRYPTO_ALG_ASYNC,
+ CRYPTO_ALG_ASYNC | CRYPTO_ALG_BULK,
.cra_blocksize  = AES_BLOCK_SIZE,
.cra_ctxsize= sizeof(struct omap_aes_ctx),
.cra_alignmask  = 0,
-- 
1.7.9.5



Re: [PATCH v2 1/2] crypto: engine: permit to enqueue ashash_request

2016-05-31 Thread Baolin Wang
t  *kworker_task;
> struct kthread_work pump_requests;
>
> void*priv_data;
> -   struct ablkcipher_request   *cur_req;
> +   struct crypto_async_request *cur_req;
>  };
>
>  int crypto_transfer_request(struct crypto_engine *engine,
> -   struct ablkcipher_request *req, bool need_pump);
> +   struct crypto_async_request *req, bool need_pump);
>  int crypto_transfer_request_to_engine(struct crypto_engine *engine,
> - struct ablkcipher_request *req);
> + struct crypto_async_request *req);
>  void crypto_finalize_request(struct crypto_engine *engine,
> -struct ablkcipher_request *req, int err);
> +struct crypto_async_request *req, int err);
>  int crypto_engine_start(struct crypto_engine *engine);
>  int crypto_engine_stop(struct crypto_engine *engine);
>  struct crypto_engine *crypto_engine_alloc_init(struct device *dev, bool rt);
> --
> 2.7.3
>

Reviewed-by: Baolin Wang <baolin.w...@linaro.org>


-- 
Baolin.wang
Best Regards


Re: [PATCH v2 2/2] crypto: omap: convert to the new cryptoengine API

2016-05-31 Thread Baolin Wang
On 30 May 2016 at 21:32, LABBE Corentin <clabbe.montj...@gmail.com> wrote:
> Since the crypto engine has been converted to use crypto_async_request
> instead of ablkcipher_request, minor changes are needed to use it.
>
> Signed-off-by: LABBE Corentin <clabbe.montj...@gmail.com>
> ---
>  drivers/crypto/omap-aes.c | 10 ++
>  drivers/crypto/omap-des.c | 10 ++
>  2 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
> index ce174d3..7007f13 100644
> --- a/drivers/crypto/omap-aes.c
> +++ b/drivers/crypto/omap-aes.c
> @@ -519,7 +519,7 @@ static void omap_aes_finish_req(struct omap_aes_dev *dd, 
> int err)
>
> pr_debug("err: %d\n", err);
>
> -   crypto_finalize_request(dd->engine, req, err);
> +   crypto_finalize_request(dd->engine, &req->base, err);
>  }
>
>  static int omap_aes_crypt_dma_stop(struct omap_aes_dev *dd)
> @@ -592,14 +592,15 @@ static int omap_aes_handle_queue(struct omap_aes_dev 
> *dd,
>  struct ablkcipher_request *req)
>  {
> if (req)
> -   return crypto_transfer_request_to_engine(dd->engine, req);
> +   return crypto_transfer_request_to_engine(dd->engine, 
> &req->base);
>
> return 0;
>  }
>
>  static int omap_aes_prepare_req(struct crypto_engine *engine,
> -   struct ablkcipher_request *req)
> +   struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
> @@ -642,8 +643,9 @@ static int omap_aes_prepare_req(struct crypto_engine 
> *engine,
>  }
>
>  static int omap_aes_crypt_req(struct crypto_engine *engine,
> - struct ablkcipher_request *req)
> + struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
> diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c
> index 3eedb03..0da5686 100644
> --- a/drivers/crypto/omap-des.c
> +++ b/drivers/crypto/omap-des.c
> @@ -506,7 +506,7 @@ static void omap_des_finish_req(struct omap_des_dev *dd, 
> int err)
> pr_debug("err: %d\n", err);
>
> pm_runtime_put(dd->dev);
> -   crypto_finalize_request(dd->engine, req, err);
> +   crypto_finalize_request(dd->engine, &req->base, err);
>  }
>
>  static int omap_des_crypt_dma_stop(struct omap_des_dev *dd)
> @@ -572,14 +572,15 @@ static int omap_des_handle_queue(struct omap_des_dev 
> *dd,
>  struct ablkcipher_request *req)
>  {
> if (req)
> -   return crypto_transfer_request_to_engine(dd->engine, req);
> +   return crypto_transfer_request_to_engine(dd->engine, 
> &req->base);
>
> return 0;
>  }
>
>  static int omap_des_prepare_req(struct crypto_engine *engine,
> -   struct ablkcipher_request *req)
> +   struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_des_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_des_dev *dd = omap_des_find_dev(ctx);
> @@ -620,8 +621,9 @@ static int omap_des_prepare_req(struct crypto_engine 
> *engine,
>  }
>
>  static int omap_des_crypt_req(struct crypto_engine *engine,
> - struct ablkcipher_request *req)
> +     struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_des_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_des_dev *dd = omap_des_find_dev(ctx);
> --
> 2.7.3
>

Reviewed-by: Baolin Wang <baolin.w...@linaro.org>

-- 
Baolin.wang
Best Regards


Re: [PATCH 2/2] crypto: omap: convert to the new cryptoengine API

2016-05-29 Thread Baolin Wang
On 18 May 2016 at 17:21, LABBE Corentin  wrote:
> Since the crypto engine has been converted to use crypto_async_request
> instead of ablkcipher_request, minor changes are needed to use it.

I think you missed the conversion of the OMAP DES driver; please rebase
your patch. Beyond that, I think you did a good job on the crypto engine,
if Herbert applies it.

>
> Signed-off-by: LABBE Corentin 
> ---
>  drivers/crypto/omap-aes.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
> index d420ec7..1368ab1 100644
> --- a/drivers/crypto/omap-aes.c
> +++ b/drivers/crypto/omap-aes.c
> @@ -530,7 +530,7 @@ static void omap_aes_finish_req(struct omap_aes_dev *dd, 
> int err)
>
> pr_debug("err: %d\n", err);
>
> -   crypto_finalize_request(dd->engine, req, err);
> +   crypto_finalize_request(dd->engine, &req->base, err);
>  }
>
>  static int omap_aes_crypt_dma_stop(struct omap_aes_dev *dd)
> @@ -603,14 +603,15 @@ static int omap_aes_handle_queue(struct omap_aes_dev 
> *dd,
>  struct ablkcipher_request *req)
>  {
> if (req)
> -   return crypto_transfer_request_to_engine(dd->engine, req);
> +   return crypto_transfer_request_to_engine(dd->engine, 
> &req->base);
>
> return 0;
>  }
>
>  static int omap_aes_prepare_req(struct crypto_engine *engine,
> -   struct ablkcipher_request *req)
> +   struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
> @@ -653,8 +654,9 @@ static int omap_aes_prepare_req(struct crypto_engine 
> *engine,
>  }
>
>  static int omap_aes_crypt_req(struct crypto_engine *engine,
> - struct ablkcipher_request *req)
> + struct crypto_async_request *areq)
>  {
> +   struct ablkcipher_request *req = ablkcipher_request_cast(areq);
> struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
> crypto_ablkcipher_reqtfm(req));
> struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
> --
> 2.7.3
>



-- 
Baolin.wang
Best Regards


[RFC v2 1/3] block: Introduce blk_bio_map_sg() to map one bio

2016-05-27 Thread Baolin Wang
In dm-crypt, one bio needs to be mapped to a scatterlist to improve the
hardware engine's encryption efficiency. Thus this patch introduces the
blk_bio_map_sg() function to map one bio to scatterlists.

To avoid duplicating code in the __blk_bios_map_sg() function, add one
parameter to distinguish a bio map from a request map.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 block/blk-merge.c  |   36 +++-
 include/linux/blkdev.h |2 ++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..badae44 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -376,7 +376,7 @@ new_segment:
 
 static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio,
 struct scatterlist *sglist,
-struct scatterlist **sg)
+struct scatterlist **sg, bool single_bio)
 {
struct bio_vec bvec, bvprv = { NULL };
struct bvec_iter iter;
@@ -408,13 +408,39 @@ single_segment:
return 1;
}
 
-   for_each_bio(bio)
+   if (!single_bio) {
+   for_each_bio(bio)
+   bio_for_each_segment(bvec, bio, iter)
+   __blk_segment_map_sg(q, &bvec, sglist, &bvprv,
+sg, &nsegs, &cluster);
+   } else {
bio_for_each_segment(bvec, bio, iter)
-   __blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
-&nsegs, &cluster);
+   __blk_segment_map_sg(q, &bvec, sglist, &bvprv,
+sg, &nsegs, &cluster);
+   }
+
+   return nsegs;
+}
+
+/*
+ * Map a bio to scatterlist, return number of sg entries setup. Caller must
+ * make sure sg can hold bio segments entries.
+ */
+int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+  struct scatterlist *sglist)
+{
+   struct scatterlist *sg = NULL;
+   int nsegs = 0;
+
+   if (bio)
+   nsegs = __blk_bios_map_sg(q, bio, sglist, &sg, true);
+
+   if (sg)
+   sg_mark_end(sg);
 
return nsegs;
 }
+EXPORT_SYMBOL(blk_bio_map_sg);
 
 /*
  * map a request to scatterlist, return number of sg entries setup. Caller
@@ -427,7 +453,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request 
*rq,
int nsegs = 0;
 
if (rq->bio)
-   nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg);
+   nsegs = __blk_bios_map_sg(q, rq->bio, sglist, &sg, false);
 
if (unlikely(rq->cmd_flags & REQ_COPY_USER) &&
(blk_rq_bytes(rq) & q->dma_pad_mask)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1fd8fdf..5868062 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1013,6 +1013,8 @@ extern void blk_queue_write_cache(struct request_queue 
*q, bool enabled, bool fu
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device 
*bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+extern int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+ struct scatterlist *sglist);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
-- 
1.7.9.5



[RFC v2 0/3] Introduce the bulk mode method when sending request to crypto layer

2016-05-27 Thread Baolin Wang
This patchset checks whether the cipher supports bulk mode; dm-crypt then
chooses how to send requests to the crypto layer according to the cipher
mode. For bulk mode, we can use an sg table to map the whole bio and send
all scatterlists of one bio to the crypto engine to encrypt or decrypt,
which can improve the hardware engine's efficiency.

As Milan pointed out, we need one driver using the new 'CRYPTO_ALG_BULK'
flag. I'll add one cipher engine driver with the 'CRYPTO_ALG_BULK' flag to
test this optimization in the next version, once your comments confirm that
this optimization is going in the right direction.

Looking forward to any comments and suggestions. Thanks.

Changes since v1:
 - Refactor the blk_bio_map_sg() function to avoid duplicated code.
 - Move the sg table allocation to crypt_ctr_cipher() function to avoid memory
 allocation in the IO path.
 - Remove the crypt_sg_entry() function.
 - Other optimization.

Baolin Wang (3):
  block: Introduce blk_bio_map_sg() to map one bio
  crypto: Introduce CRYPTO_ALG_BULK flag
  md: dm-crypt: Introduce the bulk mode method when sending request

 block/blk-merge.c |   36 +--
 drivers/md/dm-crypt.c |  145 -
 include/crypto/skcipher.h |7 +++
 include/linux/blkdev.h|2 +
 include/linux/crypto.h|6 ++
 5 files changed, 190 insertions(+), 6 deletions(-)

-- 
1.7.9.5



[RFC v2 3/3] md: dm-crypt: Introduce the bulk mode method when sending request

2016-05-27 Thread Baolin Wang
In the current dm-crypt code, it is inefficient to map only one segment
(always one sector) of a bio with a single scatterlist at a time for a
hardware crypto engine. In particular, some encryption modes (like ECB or
XTS) cooperating with the crypto engine need only one initial IV or a null
IV instead of a different IV for each sector. In this situation we can use
multiple scatterlists to map the whole bio and send all scatterlists of one
bio to the crypto engine to encrypt or decrypt, which improves the hardware
engine's efficiency.

With this optimization, on my test setup (BeagleBone Black board) using
64KB I/Os on an eMMC storage device, I saw about a 60% improvement in
throughput for encrypted writes and about a 100% improvement for encrypted
reads. But this does not suit other modes, which need a different IV for
each sector.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |  145 -
 1 file changed, 144 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4f3cb35..2101f35 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -33,6 +33,7 @@
 #include 
 
 #define DM_MSG_PREFIX "crypt"
+#define DM_MAX_SG_LIST 1024
 
 /*
  * context holding the current state of a multi-part conversion
@@ -142,6 +143,9 @@ struct crypt_config {
char *cipher;
char *cipher_string;
 
+   struct sg_table sgt_in;
+   struct sg_table sgt_out;
+
struct crypt_iv_operations *iv_gen_ops;
union {
struct iv_essiv_private essiv;
@@ -837,6 +841,129 @@ static u8 *iv_of_dmreq(struct crypt_config *cc,
crypto_skcipher_alignmask(any_tfm(cc)) + 1);
 }
 
+static void crypt_init_sg_table(struct scatterlist *sgl)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   if (i < DM_MAX_SG_LIST - 1 && sg_is_last(sg))
+   sg_unmark_end(sg);
+   else if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+
+   for_each_sg(sgl, sg, DM_MAX_SG_LIST, i) {
+   memset(sg, 0, sizeof(struct scatterlist));
+
+   if (i == DM_MAX_SG_LIST - 1)
+   sg_mark_end(sg);
+   }
+}
+
+static void crypt_reinit_sg_table(struct crypt_config *cc)
+{
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return;
+
+   crypt_init_sg_table(cc->sgt_in.sgl);
+   crypt_init_sg_table(cc->sgt_out.sgl);
+}
+
+static int crypt_alloc_sg_table(struct crypt_config *cc)
+{
+   unsigned int bulk_mode = skcipher_is_bulk_mode(any_tfm(cc));
+   int ret = 0;
+
+   if (!bulk_mode)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_in, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_skip_alloc;
+
+   ret = sg_alloc_table(&cc->sgt_out, DM_MAX_SG_LIST, GFP_KERNEL);
+   if (ret)
+   goto out_free_table;
+
+   return 0;
+
+out_free_table:
+   sg_free_table(&cc->sgt_in);
+out_skip_alloc:
+   cc->sgt_in.orig_nents = 0;
+   cc->sgt_out.orig_nents = 0;
+
+   return ret;
+}
+
+static int crypt_convert_bulk_block(struct crypt_config *cc,
+   struct convert_context *ctx,
+   struct skcipher_request *req)
+{
+   struct bio *bio_in = ctx->bio_in;
+   struct bio *bio_out = ctx->bio_out;
+   unsigned int total_bytes = bio_in->bi_iter.bi_size;
+   unsigned int total_sg_in, total_sg_out;
+   struct scatterlist *sg_in, *sg_out;
+   struct dm_crypt_request *dmreq;
+   u8 *iv;
+   int r;
+
+   if (!cc->sgt_in.orig_nents || !cc->sgt_out.orig_nents)
+   return -EINVAL;
+
+   dmreq = dmreq_of_req(cc, req);
+   iv = iv_of_dmreq(cc, dmreq);
+   dmreq->iv_sector = ctx->cc_sector;
+   dmreq->ctx = ctx;
+
+   total_sg_in = blk_bio_map_sg(bdev_get_queue(bio_in->bi_bdev),
+bio_in, cc->sgt_in.sgl);
+   if ((total_sg_in <= 0) || (total_sg_in > DM_MAX_SG_LIST)) {
+   DMERR("%s in sg map error %d, sg table nents[%d]\n",
+ __func__, total_sg_in, cc->sgt_in.orig_nents);
+   return -EINVAL;
+   }
+
+   ctx->iter_in.bi_size -= total_bytes;
+   sg_in = cc->sgt_in.sgl;
+   sg_out = cc->sgt_in.sgl;
+
+   if (bio_data_dir(bio_in) == READ)
+   goto set_crypt;
+
+   total_sg_out = blk_bio_map_sg(bdev_get_queue(bio_out->bi_bdev),
+ bio_out, cc->sgt_out.sgl);
+   if ((total_sg_out <= 0) || (total_sg_out > DM_MAX_SG_LIST)) {
+   DMERR("%s out sg map error %d, sg table nents[%d]\n",
+ __f

[RFC v2 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-05-27 Thread Baolin Wang
Now some cipher hardware engines prefer to handle bulk blocks rather than the one
sector (512 bytes) created by dm-crypt, because these cipher engines can handle
the intermediate values (IV) by themselves in one bulk block. This means we
can increase the size of the request by merging requests rather than always 512
bytes and thus increase the hardware engine processing speed.

So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher can support bulk
mode.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/crypto/skcipher.h |7 +++
 include/linux/crypto.h|6 ++
 2 files changed, 13 insertions(+)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 0f987f5..d89d29a 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -519,5 +519,12 @@ static inline void skcipher_request_set_crypt(
req->iv = iv;
 }
 
+static inline unsigned int skcipher_is_bulk_mode(struct crypto_skcipher 
*sk_tfm)
+{
+   struct crypto_tfm *tfm = crypto_skcipher_tfm(sk_tfm);
+
+   return crypto_tfm_alg_bulk(tfm);
+}
+
 #endif /* _CRYPTO_SKCIPHER_H */
 
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6e28c89..a315487 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -63,6 +63,7 @@
 #define CRYPTO_ALG_DEAD0x0020
 #define CRYPTO_ALG_DYING   0x0040
 #define CRYPTO_ALG_ASYNC   0x0080
+#define CRYPTO_ALG_BULK0x0100
 
 /*
  * Set this bit if and only if the algorithm requires another algorithm of
@@ -623,6 +624,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm 
*tfm)
return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline unsigned int crypto_tfm_alg_bulk(struct crypto_tfm *tfm)
+{
+   return tfm->__crt_alg->cra_flags & CRYPTO_ALG_BULK;
+}
+
 static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
 {
return tfm->__crt_alg->cra_blocksize;
-- 
1.7.9.5
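
For what it's worth, a caller such as dm-crypt would then only need to gate on
the new helper; a rough sketch (function names follow patch 3/3 of this series,
and the surrounding conversion loop is simplified):

static int crypt_convert_one_step(struct crypt_config *cc,
                                  struct convert_context *ctx)
{
        crypt_alloc_req(cc, ctx);

        if (skcipher_is_bulk_mode(any_tfm(cc)))
                /* One request covers the whole bio. */
                return crypt_convert_bulk_block(cc, ctx, ctx->req);

        /* Existing one-sector-at-a-time path. */
        return crypt_convert_block(cc, ctx, ctx->req);
}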



Re: [RFC 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-05-27 Thread Baolin Wang
On 27 May 2016 at 15:53, Milan Broz <gmazyl...@gmail.com> wrote:
> On 05/27/2016 09:04 AM, Baolin Wang wrote:
>> Hi Milan,
>>
>> On 27 May 2016 at 14:31, Milan Broz <gmazyl...@gmail.com> wrote:
>>> On 05/25/2016 08:12 AM, Baolin Wang wrote:
>>>> Now some cipher hardware engines prefer to handle bulk block rather than 
>>>> one
>>>> sector (512 bytes) created by dm-crypt, cause these cipher engines can 
>>>> handle
>>>> the intermediate values (IV) by themselves in one bulk block. This means we
>>>> can increase the size of the request by merging request rather than always 
>>>> 512
>>>> bytes and thus increase the hardware engine processing speed.
>>>
>>> Hi,
>>>
>>> could you please elaborate how exactly you are processing independently
>>> encrypted sectors? For example with XTS mode. Do you play internally with
>>> tweak calculation? Does this keep 512 bytes sector encryption blocks 
>>> independent?
>>>
>>> (If not, it is breaking compatibility everywhere and you are reinventing
>>> disk encryption logic here - just for performance reason for some hw
>>> not designed for this task... But that was said several times already.)
>>
>> These are what the cipher hardware engine and engine driver should do,
>> for software we just need send one initial IV and bulk data to crypto
>> layer, which is enough.
>
> Hi,
>
> Thanks for answer.
>
> So this is just doing some kind of batch processing optimization inside the 
> driver?
> I still do not understand why it is not possible to "batch" these requests 
> inside
> you driver (with async API) then and process them in one go without any 
> changes
> to dmcrypt though. (But I am not familiar with these drivers.)

I think it is not only a driver-level issue, but also a cipher API issue.
If a cipher engine can support bulk mode, then we should send bulk
data to it, not only one sector, which needs to be implemented at the top
API level, for example:

skcipher_request_set_crypt(req, sgt_in.sgl, sgt_out.sgl, total_bytes,
iv); /* send sg table to crypt layer */

If we move this optimization into the driver, it means we need to merge
requests from dm-crypt into one big request for the engine driver, but
that would be inefficient and would not help. Why can we not send one
bulk request from dm-crypt in the first place? We just need to set
different parameters for the skcipher_request_set_crypt() function
according to the cipher mode in dm-crypt.
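
For illustration only, the bulk path in dm-crypt could look roughly like the
sketch below. It assumes the blk_bio_map_sg() helper from patch 1/3, the
sgt_in/sgt_out tables added to struct crypt_config elsewhere in this series,
and a request whose tfm and callback are already set up as dm-crypt does
today; it is not the exact code from the patches:

/* Hypothetical helper, for illustration only. */
static int crypt_convert_bulk(struct crypt_config *cc,
                              struct convert_context *ctx,
                              struct skcipher_request *req, u8 *iv)
{
        struct bio *bio_in = ctx->bio_in;
        struct bio *bio_out = ctx->bio_out;
        unsigned int total_bytes = bio_in->bi_iter.bi_size;
        int nents_in, nents_out;

        /* Map the whole bio instead of one 512-byte sector. */
        nents_in = blk_bio_map_sg(bdev_get_queue(bio_in->bi_bdev),
                                  bio_in, cc->sgt_in.sgl);
        nents_out = blk_bio_map_sg(bdev_get_queue(bio_out->bi_bdev),
                                   bio_out, cc->sgt_out.sgl);
        if (nents_in <= 0 || nents_out <= 0)
                return -EINVAL;

        /* One request now covers every sector of the bio. */
        skcipher_request_set_crypt(req, cc->sgt_in.sgl, cc->sgt_out.sgl,
                                   total_bytes, iv);

        return crypto_skcipher_encrypt(req);
}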

>
> If I understand it correctly, subsequent IVs are calculated inside
> your driver just based on some initial value?

Yes, something like this.

> This can work only for sequential IVs (or, better said, for predictable
> IVs like plain64, implemented inside your hw/driver).
>
> It cannot work for IVs/tweaks that are randomized or keyed (like ESSIV).
> Yes, I know that using ESSIV for XTS is considered overkill but I do not
> see you are checking this condition anywhere in code (we can definitely
> configure such a mapping).

So if a cipher (like cbc(aes)) cannot support bulk mode, then we
should not set the CRYPTO_ALG_BULK flag for that cipher in its
driver.
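
For example, a driver for such an engine might advertise the proposed flag in
its algorithm definition, roughly as below (a sketch only; the driver name,
context structure and callbacks are placeholders, not a real driver):

static struct crypto_alg example_xts_alg = {
        .cra_name               = "xts(aes)",
        .cra_driver_name        = "xts-aes-example",   /* placeholder */
        .cra_priority           = 300,
        /* The engine derives per-sector IVs itself, so allow bulk requests. */
        .cra_flags              = CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC |
                                  CRYPTO_ALG_BULK,
        .cra_blocksize          = AES_BLOCK_SIZE,
        .cra_ctxsize            = sizeof(struct example_ctx),  /* placeholder */
        .cra_type               = &crypto_ablkcipher_type,
        .cra_module             = THIS_MODULE,
        .cra_u.ablkcipher = {
                .min_keysize    = 2 * AES_MIN_KEY_SIZE,
                .max_keysize    = 2 * AES_MAX_KEY_SIZE,
                .ivsize         = AES_BLOCK_SIZE,
                .setkey         = example_setkey,       /* placeholder */
                .encrypt        = example_encrypt,      /* placeholder */
                .decrypt        = example_decrypt,      /* placeholder */
        },
};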

>
>>>> So introduce 'CRYPTO_ALG_BULK' flag to indicate this cipher can support 
>>>> bulk
>>>> mode.
>>>
>>> What exactly skcipher will do if this flag is set?
>>
>> I think that depends on how to implement the cipher engine driver.
>
> I do not care about implementation details, I just would like to see
> some high-level description for the new API flag.
> (Or at least read the code that implements it :)

At a high level, we can use an sg table to map one bulk block and send it
to the crypto layer via the skcipher_request_set_crypt() function.

>
>>> Which drivers it should use? I do not see any posted patch that uses this 
>>> flag yet.
>>> How we can test it?
>>
Cipher engine drivers which support bulk mode should use this
flag. Yeah, we need to upstream one cipher driver with this flag for
testing.
>
> I think if you introduce new flag, you should also post drivers that uses it.
> Otherwise it is just unused code for mainline.

Yeah, that's right.

>
> And I definitely would like to test it somewhere and see real
> performance data (not just simple dd).

OK. Thanks.

>
> Thanks,
> Milan
>
>>
>>>
>>> Milan
>>>
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
>>>> ---
>>>>  include/crypto/skcipher.h |7 +++
>>>>  include/linux/crypto

Re: [RFC 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-05-27 Thread Baolin Wang
Hi Milan,

On 27 May 2016 at 14:31, Milan Broz <gmazyl...@gmail.com> wrote:
> On 05/25/2016 08:12 AM, Baolin Wang wrote:
>> Now some cipher hardware engines prefer to handle bulk block rather than one
>> sector (512 bytes) created by dm-crypt, cause these cipher engines can handle
>> the intermediate values (IV) by themselves in one bulk block. This means we
>> can increase the size of the request by merging request rather than always 
>> 512
>> bytes and thus increase the hardware engine processing speed.
>
> Hi,
>
> could you please elaborate how exactly you are processing independently
> encrypted sectors? For example with XTS mode. Do you play internally with
> tweak calculation? Does this keep 512 bytes sector encryption blocks 
> independent?
>
> (If not, it is breaking compatibility everywhere and you are reinventing
> disk encryption logic here - just for performance reason for some hw
> not designed for this task... But that was said several times already.)

These are what the cipher hardware engine and engine driver should do;
for software, we just need to send one initial IV and the bulk data to the
crypto layer, which is enough.

>
>> So introduce 'CRYPTO_ALG_BULK' flag to indicate this cipher can support bulk
>> mode.
>
> What exactly skcipher will do if this flag is set?

I think that depends on how to implement the cipher engine driver.

>
> Which drivers it should use? I do not see any posted patch that uses this 
> flag yet.
> How we can test it?

Cipher engine drivers which support bulk mode should use this
flag. Yeah, we need to upstream one cipher driver with this flag for
testing.

>
> Milan
>
>>
>> Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
>> ---
>>  include/crypto/skcipher.h |7 +++
>>  include/linux/crypto.h|6 ++
>>  2 files changed, 13 insertions(+)
>>
>> diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
>> index 0f987f5..d89d29a 100644
>> --- a/include/crypto/skcipher.h
>> +++ b/include/crypto/skcipher.h
>> @@ -519,5 +519,12 @@ static inline void skcipher_request_set_crypt(
>>   req->iv = iv;
>>  }
>>
>> +static inline unsigned int skcipher_is_bulk_mode(struct crypto_skcipher 
>> *sk_tfm)
>> +{
>> + struct crypto_tfm *tfm = crypto_skcipher_tfm(sk_tfm);
>> +
>> + return crypto_tfm_alg_bulk(tfm);
>> +}
>> +
>>  #endif   /* _CRYPTO_SKCIPHER_H */
>>
>> diff --git a/include/linux/crypto.h b/include/linux/crypto.h
>> index 6e28c89..a315487 100644
>> --- a/include/linux/crypto.h
>> +++ b/include/linux/crypto.h
>> @@ -63,6 +63,7 @@
>>  #define CRYPTO_ALG_DEAD  0x0020
>>  #define CRYPTO_ALG_DYING 0x0040
>>  #define CRYPTO_ALG_ASYNC 0x0080
>> +#define CRYPTO_ALG_BULK  0x0100
>>
>>  /*
>>   * Set this bit if and only if the algorithm requires another algorithm of
>> @@ -623,6 +624,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm 
>> *tfm)
>>   return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
>>  }
>>
>> +static inline unsigned int crypto_tfm_alg_bulk(struct crypto_tfm *tfm)
>> +{
>> + return tfm->__crt_alg->cra_flags & CRYPTO_ALG_BULK;
>> +}
>> +
>>  static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
>>  {
>>   return tfm->__crt_alg->cra_blocksize;
>>
>



-- 
Baolin.wang
Best Regards


Re: [RFC 1/3] block: Introduce blk_bio_map_sg() to map one bio

2016-05-25 Thread Baolin Wang
On 25 May 2016 at 16:52, Ming Lei  wrote:
>>  /*
>> + * map a bio to scatterlist, return number of sg entries setup.
>> + */
>> +int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
>> +  struct scatterlist *sglist,
>> +  struct scatterlist **sg)
>> +{
>> +   struct bio_vec bvec, bvprv = { NULL };
>> +   struct bvec_iter iter;
>> +   int nsegs, cluster;
>> +
>> +   nsegs = 0;
>> +   cluster = blk_queue_cluster(q);
>> +
>> +   if (bio->bi_rw & REQ_DISCARD) {
>> +   /*
>> +* This is a hack - drivers should be neither modifying the
>> +* biovec, nor relying on bi_vcnt - but because of
>> +* blk_add_request_payload(), a discard bio may or may not 
>> have
>> +* a payload we need to set up here (thank you Christoph) and
>> +* bi_vcnt is really the only way of telling if we need to.
>> +*/
>> +
>> +   if (bio->bi_vcnt)
>> +   goto single_segment;
>> +
>> +   return 0;
>> +   }
>> +
>> +   if (bio->bi_rw & REQ_WRITE_SAME) {
>> +single_segment:
>> +   *sg = sglist;
>> +   bvec = bio_iovec(bio);
>> +   sg_set_page(*sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
>> +   return 1;
>> +   }
>> +
>> +   bio_for_each_segment(bvec, bio, iter)
>> +   __blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
>> +&nsegs, &cluster);
>> +
>> +   return nsegs;
>> +}
>> +EXPORT_SYMBOL(blk_bio_map_sg);
>
> You can use __blk_bios_map_sg() to implement blk_bio_map_sg(),
> then code duplication may be avoided.

OK. I'll re-factor the code to map one bio.
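
For reference, one possible shape for that refactor is sketched below, assuming
the existing static helper __blk_bios_map_sg() in block/blk-merge.c keeps its
current signature (a sketch, not necessarily the final patch):

int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
                   struct scatterlist *sglist)
{
        struct scatterlist *sg = NULL;
        struct bio *next = bio->bi_next;
        int nsegs;

        /* Hide the chain link so only this bio is mapped. */
        bio->bi_next = NULL;
        nsegs = __blk_bios_map_sg(q, bio, sglist, &sg);
        bio->bi_next = next;

        if (sg)
                sg_mark_end(sg);

        return nsegs;
}
EXPORT_SYMBOL(blk_bio_map_sg);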

>
>> +
>> +/*
>>   * map a request to scatterlist, return number of sg entries setup. Caller
>>   * must make sure sg can hold rq->nr_phys_segments entries
>>   */
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 1fd8fdf..e5de4f8 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1013,6 +1013,9 @@ extern void blk_queue_write_cache(struct request_queue 
>> *q, bool enabled, bool fu
>>  extern struct backing_dev_info *blk_get_backing_dev_info(struct 
>> block_device *bdev);
>>
>>  extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
>> scatterlist *);
>> +extern int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
>> + struct scatterlist *sglist,
>> + struct scatterlist **sg);
>>  extern void blk_dump_rq_flags(struct request *, char *);
>>  extern long nr_blockdev_pages(void);
>>
>> --
>> 1.7.9.5
>>



-- 
Baolin.wang
Best Regards


[RFC 3/3] md: dm-crypt: Introduce the bulk mode method when sending request

2016-05-25 Thread Baolin Wang
In the current dm-crypt code, it is inefficient to map one segment (always one
sector) of one bio with only one scatterlist at a time for a hardware
crypto engine. Especially for some encryption modes (like ecb or xts mode)
cooperating with the crypto engine, they just need one initial IV or a null
IV instead of a different IV for each sector. In this situation we can consider
using multiple scatterlists to map the whole bio and sending all scatterlists
of one bio to the crypto engine to encrypt or decrypt, which can improve the
hardware engine's efficiency.

With this optimization, on my test setup (beaglebone black board) using 64KB
I/Os on an eMMC storage device I saw about 60% improvement in throughput for
encrypted writes, and about 100% improvement for encrypted reads. But this
does not fit other modes which need a different IV for each sector.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |  188 +
 1 file changed, 176 insertions(+), 12 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4f3cb35..1c86ea7 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -33,6 +33,7 @@
 #include 
 
 #define DM_MSG_PREFIX "crypt"
+#define DM_MAX_SG_LIST 1024
 
 /*
  * context holding the current state of a multi-part conversion
@@ -46,6 +47,8 @@ struct convert_context {
sector_t cc_sector;
atomic_t cc_pending;
struct skcipher_request *req;
+   struct sg_table sgt_in;
+   struct sg_table sgt_out;
 };
 
 /*
@@ -803,6 +806,108 @@ static struct crypt_iv_operations crypt_iv_tcw_ops = {
.post  = crypt_iv_tcw_post
 };
 
+/*
+ * Check in advance how many sg entries are needed to map one bio
+ * with scatterlists.
+ */
+static unsigned int crypt_sg_entry(struct bio *bio_t)
+{
+   struct request_queue *q = bdev_get_queue(bio_t->bi_bdev);
+   int cluster = blk_queue_cluster(q);
+   struct bio_vec bvec, bvprv = { NULL };
+   struct bvec_iter biter;
+   unsigned long nbytes = 0, sg_length = 0;
+   unsigned int sg_cnt = 0, first_bvec = 0;
+
+   if (bio_t->bi_rw & REQ_DISCARD) {
+   if (bio_t->bi_vcnt)
+   return 1;
+   return 0;
+   }
+
+   if (bio_t->bi_rw & REQ_WRITE_SAME)
+   return 1;
+
+   bio_for_each_segment(bvec, bio_t, biter) {
+   nbytes = bvec.bv_len;
+
+   if (!cluster) {
+   sg_cnt++;
+   continue;
+   }
+
+   if (!first_bvec) {
+   first_bvec = 1;
+   goto new_segment;
+   }
+
+   if (sg_length + nbytes > queue_max_segment_size(q))
+   goto new_segment;
+
+   if (!BIOVEC_PHYS_MERGEABLE(&bvprv, &bvec))
+   goto new_segment;
+
+   if (!BIOVEC_SEG_BOUNDARY(q, &bvprv, &bvec))
+   goto new_segment;
+
+   sg_length += nbytes;
+   continue;
+
+new_segment:
+   memcpy(&bvprv, &bvec, sizeof(struct bio_vec));
+   sg_length = nbytes;
+   sg_cnt++;
+   }
+
+   return sg_cnt;
+}
+
+static int crypt_convert_alloc_table(struct crypt_config *cc,
+struct convert_context *ctx)
+{
+   struct bio *bio_in = ctx->bio_in;
+   struct bio *bio_out = ctx->bio_out;
+   unsigned int mode = skcipher_is_bulk_mode(any_tfm(cc));
+   unsigned int sg_in_max, sg_out_max;
+   int ret = 0;
+
+   if (!mode)
+   goto out2;
+
+   /*
+* Need to calculate how many sg entry need to be used
+* for this bio.
+*/
+   sg_in_max = crypt_sg_entry(bio_in) + 1;
+   if (sg_in_max > DM_MAX_SG_LIST || sg_in_max <= 2)
+   goto out2;
+
+   ret = sg_alloc_table(&ctx->sgt_in, sg_in_max, GFP_KERNEL);
+   if (ret)
+   goto out2;
+
+   if (bio_data_dir(bio_in) == READ)
+   goto out1;
+
+   sg_out_max = crypt_sg_entry(bio_out) + 1;
+   if (sg_out_max > DM_MAX_SG_LIST || sg_out_max <= 2)
+   goto out3;
+
+   ret = sg_alloc_table(&ctx->sgt_out, sg_out_max, GFP_KERNEL);
+   if (ret)
+   goto out3;
+
+   return 0;
+
+out3:
+   sg_free_table(&ctx->sgt_in);
+out2:
+   ctx->sgt_in.orig_nents = 0;
+out1:
+   ctx->sgt_out.orig_nents = 0;
+   return ret;
+}
+
 static void crypt_convert_init(struct crypt_config *cc,
   struct convert_context *ctx,
   struct bio *bio_out, struct bio *bio_in,
@@ -843,7 +948,13 @@ static int crypt_convert_block(struct crypt_config *cc,
 {
struct bio_vec bv_in = bio_iter_iovec(ctx->bio_in, ctx->iter_in);
struct bio_vec bv_out = bio_iter_iovec(ctx->bio_out, ctx->iter_out);
+   unsi

[RFC 1/3] block: Introduce blk_bio_map_sg() to map one bio

2016-05-25 Thread Baolin Wang
In dm-crypt, one bio needs to be mapped to scatterlists to improve the
hardware engine's encryption efficiency. Thus this patch introduces the
blk_bio_map_sg() function to map one bio with scatterlists.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 block/blk-merge.c  |   45 +
 include/linux/blkdev.h |3 +++
 2 files changed, 48 insertions(+)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 2613531..9b92af4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -417,6 +417,51 @@ single_segment:
 }
 
 /*
+ * map a bio to scatterlist, return number of sg entries setup.
+ */
+int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+  struct scatterlist *sglist,
+  struct scatterlist **sg)
+{
+   struct bio_vec bvec, bvprv = { NULL };
+   struct bvec_iter iter;
+   int nsegs, cluster;
+
+   nsegs = 0;
+   cluster = blk_queue_cluster(q);
+
+   if (bio->bi_rw & REQ_DISCARD) {
+   /*
+* This is a hack - drivers should be neither modifying the
+* biovec, nor relying on bi_vcnt - but because of
+* blk_add_request_payload(), a discard bio may or may not have
+* a payload we need to set up here (thank you Christoph) and
+* bi_vcnt is really the only way of telling if we need to.
+*/
+
+   if (bio->bi_vcnt)
+   goto single_segment;
+
+   return 0;
+   }
+
+   if (bio->bi_rw & REQ_WRITE_SAME) {
+single_segment:
+   *sg = sglist;
+   bvec = bio_iovec(bio);
+   sg_set_page(*sg, bvec.bv_page, bvec.bv_len, bvec.bv_offset);
+   return 1;
+   }
+
+   bio_for_each_segment(bvec, bio, iter)
+   __blk_segment_map_sg(q, &bvec, sglist, &bvprv, sg,
+&nsegs, &cluster);
+
+   return nsegs;
+}
+EXPORT_SYMBOL(blk_bio_map_sg);
+
+/*
  * map a request to scatterlist, return number of sg entries setup. Caller
  * must make sure sg can hold rq->nr_phys_segments entries
  */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1fd8fdf..e5de4f8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1013,6 +1013,9 @@ extern void blk_queue_write_cache(struct request_queue 
*q, bool enabled, bool fu
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device 
*bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct 
scatterlist *);
+extern int blk_bio_map_sg(struct request_queue *q, struct bio *bio,
+ struct scatterlist *sglist,
+ struct scatterlist **sg);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
-- 
1.7.9.5



[RFC 2/3] crypto: Introduce CRYPTO_ALG_BULK flag

2016-05-25 Thread Baolin Wang
Now some cipher hardware engines prefer to handle bulk blocks rather than the one
sector (512 bytes) created by dm-crypt, because these cipher engines can handle
the intermediate values (IV) by themselves in one bulk block. This means we
can increase the size of the request by merging requests rather than always 512
bytes and thus increase the hardware engine processing speed.

So introduce the 'CRYPTO_ALG_BULK' flag to indicate that a cipher can support bulk
mode.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/crypto/skcipher.h |7 +++
 include/linux/crypto.h|6 ++
 2 files changed, 13 insertions(+)

diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h
index 0f987f5..d89d29a 100644
--- a/include/crypto/skcipher.h
+++ b/include/crypto/skcipher.h
@@ -519,5 +519,12 @@ static inline void skcipher_request_set_crypt(
req->iv = iv;
 }
 
+static inline unsigned int skcipher_is_bulk_mode(struct crypto_skcipher 
*sk_tfm)
+{
+   struct crypto_tfm *tfm = crypto_skcipher_tfm(sk_tfm);
+
+   return crypto_tfm_alg_bulk(tfm);
+}
+
 #endif /* _CRYPTO_SKCIPHER_H */
 
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6e28c89..a315487 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -63,6 +63,7 @@
 #define CRYPTO_ALG_DEAD0x0020
 #define CRYPTO_ALG_DYING   0x0040
 #define CRYPTO_ALG_ASYNC   0x0080
+#define CRYPTO_ALG_BULK0x0100
 
 /*
  * Set this bit if and only if the algorithm requires another algorithm of
@@ -623,6 +624,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm 
*tfm)
return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline unsigned int crypto_tfm_alg_bulk(struct crypto_tfm *tfm)
+{
+   return tfm->__crt_alg->cra_flags & CRYPTO_ALG_BULK;
+}
+
 static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
 {
return tfm->__crt_alg->cra_blocksize;
-- 
1.7.9.5



[RFC 0/3] Introduce the bulk mode method when sending request to crypto layer

2016-05-25 Thread Baolin Wang
This patchset checks whether the cipher can support bulk mode; dm-crypt then
chooses how to send requests to the crypto layer according to the
cipher mode.

Looking forward to any comments and suggestions. Thanks.

Baolin Wang (3):
  block: Introduce blk_bio_map_sg() to map one bio
  crypto: Introduce CRYPTO_ALG_BULK flag
  md: dm-crypt: Introduce the bulk mode method when sending request

 block/blk-merge.c |   45 +++
 drivers/md/dm-crypt.c |  188 ++---
 include/crypto/skcipher.h |7 ++
 include/linux/blkdev.h|3 +
 include/linux/crypto.h|6 ++
 5 files changed, 237 insertions(+), 12 deletions(-)

-- 
1.7.9.5



Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

2016-04-20 Thread Baolin Wang
Hi Robert,

On 5 April 2016 at 15:10, Baolin Wang <baolin.w...@linaro.org> wrote:
> Hi Robert,
>
> Sorry for the late reply.
>
> On 2 April 2016 at 23:00, Robert Jarzmik <robert.jarz...@free.fr> wrote:
>> Baolin Wang <baolin.w...@linaro.org> writes:
>>
>>> +/**
>>> + * sg_is_contiguous - Check if the scatterlists are contiguous
>>> + * @sga: SG entry
>>> + * @sgb: SG entry
>>> + *
>>> + * Description:
>>> + *   If the sga scatterlist is contiguous with the sgb scatterlist,
>>> + *   that means they can be merged together.
>>> + *
>>> + **/
>>> +static inline bool sg_is_contiguous(struct scatterlist *sga,
>>> + struct scatterlist *sgb)
>>> +{
>>> + return *(unsigned long *)sg_virt(sga) + sga->length ==
>>> + *(unsigned long *)sg_virt(sgb);
>> As I already said, I don't like casts.
>
> OK. Could you give me a good example. Thanks.
>
>>
>> But let's take some height : you're needing this function to decide to merge
>> scatterlists. That means that you think the probability of having 2 
>> scatterlist
>> mergeable is not meaningless, ie. 50% or more.
>>
>> I suppose your scatterlists are allocated through kmalloc(). I'd like to 
>> know,
>> through your testing, what is the success rate of sg_is_contiguous(), ie. I'd
>> like to know how many times sg_is_contiguous() was called, and amongst these
>> calls how many times it returned true.
>>
>> That will tell me how "worth" is this optimization.
>
> I think this is just one useful API for users, If the rate is only 1%,
> we also need to check if they are contiguous to decide if they can be
> coalesced.
>
>>
>>> + * sg_add_sg_to_table - Add one scatterlist into sg table
>>> + * @sgt: The sg table header to use
>>> + * @src: The sg need to be added into sg table
>>> + *
>>> + * Description:
>>> + *   The 'nents' member indicates how many mapped scatterlists has been 
>>> added
>>> + *   in the dynamic sg table. The 'orig_nents' member indicates the size 
>>> of the
>>> + *   dynamic sg table.
>>> + *
>>> + *   Copy one mapped @src@ scatterlist into the dynamic sg table and 
>>> increase
>>> + *   'nents' member.
>>> + *
>>> + **/
>> Okay, I still believe this one is wrong, because we don't understand each 
>> other.
>> So let's take an example :
>> sg_table = {
>>  .sgl = {
>> {
>> .page_link = PAGE_48,
>> .offset = 0,
>> .length = 2048,
>> .dma_address = 0x3,
>> .dma_length = 4096,
>> },
>> {
>> .page_link = PAGE_48 | 0x02,
>> .offset = 2048,
>> .length = 2048,
>> .dma_address = 0,
>> .dma_length = 0,
>> },
>> },
>>  .nents = 1,
>>  .orig_nents = 2,
>> };
>>
>> In this example, by sheer luck the 2 scatterlist entries were physically
>> contiguous, and the mapping function coallesced the 2 entries into only one
>> (dma_address, dma_length) entry. That could also happen with an IOMMU by the
>> way.  Therefore, sg_table.nents = 1.
>>
>> If I understand your function correctly, it will erase sg_table.sgl[1], and 
>> that
>> looks incorrect to me. This is why I can't understand how your code can be
>> correct, and why I say you add a new "meaning" to sg_table->nents, which is 
>> not
>> consistent with the meaning I understand.
>
> I think you misunderstood my point. The 'sg_add_sg_to_table()'
> function is used to add one mapped scatterlist into the dynamic sg
> table. For example:
> 1. How to add one mapped sg into dynamic sg table
> (1) If we allocate one dynamic sg table with 3 (small size can be
> showed easily) empty scatterlists.:
> sg_table = {
>   .sgl = {
>  {
>  .page_link = 0,
>  .offset = 0,
>  .length = 0,
>  },
>  {
>  .page_link = 0,
>  .offset = 0,
>  .length = 0,
>  },
>  {
>  .page_lin

Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 16:41, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 04:40:36PM +0800, Baolin Wang wrote:
>>
>> Simply to say, now there are many different hardware engines for
>> different vendors, some engines can support bulk block but some can
>> not (or no cipher hardware engine), then the dm-crypt can not know
>> your hardware engine features. If the dm-crypt send one bulk block to
>> low level, but the engine driver can not support bulk block, then it
>> will crash. So we did the merging action in driver level not dm-crypt
>> level.
>
> Surely we can handle it in the crypto API layer, just as we do GSO
> in the network stack for drivers that can't handle TSO?

Yes, it is similar. Thanks.
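
To make the analogy concrete, the fallback idea is roughly: if the underlying
cipher cannot take a bulk request, the API layer walks it one 512-byte sector
at a time and regenerates the IV per sector. The sketch below only illustrates
that loop; it assumes a linear buffer, in-place encryption and a plain64-style
IV, and it ignores async completion (it is not the actual crypto API code):

static int bulk_fallback_encrypt(struct crypto_skcipher *tfm, void *buf,
                                 unsigned int bytes, sector_t sector)
{
        SKCIPHER_REQUEST_ON_STACK(req, tfm);
        struct scatterlist sg;
        unsigned int done;
        u8 iv[16];
        int ret = 0;

        skcipher_request_set_tfm(req, tfm);
        skcipher_request_set_callback(req, 0, NULL, NULL);

        for (done = 0; done < bytes; done += 512, sector++) {
                /* plain64-style IV: little-endian sector number. */
                memset(iv, 0, sizeof(iv));
                *(__le64 *)iv = cpu_to_le64(sector);

                sg_init_one(&sg, buf + done, 512);
                skcipher_request_set_crypt(req, &sg, &sg, 512, iv);

                ret = crypto_skcipher_encrypt(req);     /* one sector at a time */
                if (ret)
                        break;
        }

        skcipher_request_zero(req);
        return ret;
}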

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 16:31, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 04:28:46PM +0800, Baolin Wang wrote:
>>
>> What I meaning is if the xts engine can support bulk block, then the
>> engine driver can select bulk mode to do encryption, but if their xts
>> engine can not support bulk mode, which depends on hardware design,
>> the engine driver can not select bulk mode. So the dm-crypt can not
>> know what will be selected by the engine driver, it can not send one
>> bulk block each time.
>
> Why can't the xts code just break it up if it can't handle it?

Simply put, there are many different hardware engines from
different vendors; some engines can support bulk blocks but some cannot
(or there is no cipher hardware engine at all), so dm-crypt cannot know
the hardware engine's features. If dm-crypt sends one bulk block to the
lower level but the engine driver cannot support bulk blocks, then it
will crash. So we did the merging at the driver level, not the dm-crypt
level.

>
> You want to postpone splitting as much as possible.  Even if the
> underlying xts code couldn't handle it, it would still make sense
> for the crypto API to see the request in one piece.
>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 16:04, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 03:58:59PM +0800, Baolin Wang wrote:
>>
>> That depends on the hardware engine. Some cipher hardware engines
>> (like xts(aes) engine) can handle the intermediate values (IV) by
>> themselves in one bulk block, which means we can increase the size of
>> the request by merging request rather than always 512 bytes and thus
>> increase the hardware engine processing speed. But for some other
>> hardware engines (like cbc(aes) engine), they can not support bulk
>> block, must sector by sector. So the engine drivers can select the
>> suitable mode to do encryption/decryption.
>
> So what is this supposed to handle, xts or cbc?

As far as I know, a cbc engine still needs to handle requests sector by
sector, but xts/ecb engines can support bulk blocks, which means requests
can be merged.

>
>> > Even with batching we should be involving the user because only the
>> > user knows (if anyone does) whether more data will be forthcoming.
>>
If the cipher engine can support bulk block encryption, the crypto
engine framework can automatically merge requests if they are
eligible. We don't need to worry about how much data will be
forthcoming.
>
> Merging is simply wrong when the data is coming in as one piece
> and you've just artifically broken it up, only to merge it later.

It will not break them up; it will check whether the requests coming
from dm-crypt can be merged together.

>
> If the data can be merged then it should have stayed as one piece
> rather than being fragmented.

Yes, usually one whole block can be merged into one request, given the latency.

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 15:24, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 03:21:16PM +0800, Baolin Wang wrote:
>>
>> I don't think so, the dm-crypt can not send maximal requests at some
>> situations. For example, the 'cbc(aes)' cipher, it must be handled
>> sector by sector (IV is dependency for each sector), so the dm-crypt
>> can not send maximal requests, but must sector by sector in this
>> situation.
>
> If you can't merge them as requests, then how is the hardware going
> to benefit from batching them? Or are you talking about parallel

That depends on the hardware engine. Some cipher hardware engines
(like an xts(aes) engine) can handle the intermediate values (IV) by
themselves in one bulk block, which means we can increase the size of
the request by merging requests rather than always sending 512 bytes and
thus increase the hardware engine processing speed. But some other
hardware engines (like a cbc(aes) engine) cannot support bulk blocks and
must work sector by sector. So the engine drivers can select the
suitable mode to do encryption/decryption.

> processing similar to sha-mb?
>
> Even with batching we should be involving the user because only the
> user knows (if anyone does) whether more data will be forthcoming.

If the cipher engine can support bulk block encryption, the crypto
engine framework can automatically merge requests if they are
eligible. We don't need to worry about how much data will be
forthcoming.
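
The eligibility test itself can stay simple; something like the sketch below,
assuming the per-request 'sector' field added later in this series and the
running 'nbytes' total kept by the merge helpers (not the code from the
patches):

static bool requests_are_mergeable(struct ablkcipher_request *prev,
                                   struct ablkcipher_request *next)
{
        /* 512-byte sectors: merge only if 'next' starts right after 'prev'. */
        return next->sector == prev->sector + (prev->nbytes >> 9);
}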

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 15:04, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 02:02:51PM +0800, Baolin Wang wrote:
>>
>> If the crypto hardware engine can support bulk data
>> encryption/decryption, so the engine driver can select bulk mode to
>> handle the requests. I think it is a totally driver things, not in
>> dmcrypt. The dmcrypt can not get the hardware engine's attributes.
>
> It has nothing to do with the hardware attributes.  dm-crypt should
> be sending maximal requests in the first place.

I don't think so; dm-crypt cannot send maximal requests in some
situations. For example, the 'cbc(aes)' cipher must be handled
sector by sector (the IV has a dependency on each sector), so dm-crypt
cannot send maximal requests and must go sector by sector in this
situation.

>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-18 Thread Baolin Wang
On 18 April 2016 at 13:45, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Mon, Apr 18, 2016 at 01:31:09PM +0800, Baolin Wang wrote:
>>
>> We've tried to do this in dm-crypt, but it failed.
>> The dm-crypt maintainer explained to me that I should optimize the
>> driver, not add strange hw-dependent crypto modes to dm-crypt, this is
>> not the first crypto accelerator that is just not suited for this kind
>> of use.
>> He thought if it can process batch of chunks of data each with own IV,
>> then it can work with dm-crypt, but he thought such optimized code
>> should be inside crypto API, not in dmcrypt.
>
> That's a completely bogus argument.  The user always has more
> information available than the underlying API.  So it is totally
> stupid to have the API try to extract information that the user
> could have provided in the first place.

If the crypto hardware engine can support bulk data
encryption/decryption, the engine driver can select bulk mode to
handle the requests. I think it is entirely a driver matter, not
dm-crypt's. dm-crypt cannot see the hardware engine's attributes.

>
> I'm not taking this patch-set.
>
> Cheers,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-04-17 Thread Baolin Wang
Hi Herbert,

On 15 April 2016 at 21:48, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Tue, Mar 15, 2016 at 03:47:58PM +0800, Baolin Wang wrote:
>> Now some cipher hardware engines prefer to handle bulk block by merging 
>> requests
>> to increase the block size and thus increase the hardware engine processing 
>> speed.
>>
>> This patchset introduces request bulk mode to help the crypto hardware 
>> drivers
>> improve in efficiency.
>
> Could you please explain why this merging can't be done in dm-crypt
> instead?

We've tried to do this in dm-crypt, but it failed.
The dm-crypt maintainer explained to me that I should optimize the
driver, not add strange hw-dependent crypto modes to dm-crypt; this is
not the first crypto accelerator that is just not suited for this kind
of use.
He thought that if it can process a batch of chunks of data, each with its
own IV, then it can work with dm-crypt, but such optimized code should be
inside the crypto API, not in dm-crypt.

I think his suggestion is reasonable, so we introduced the crypto
engine framework to factor out the common patterns for driving the
queue of operations. It is then more reasonable to do the bulk
mode optimization in the crypto engine framework. Thanks.

>
> Thanks,
> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


Re: [PATCH v2 1/4] scatterlist: Introduce some helper functions

2016-04-05 Thread Baolin Wang
Hi Robert,

Sorry for the late reply.

On 2 April 2016 at 23:00, Robert Jarzmik <robert.jarz...@free.fr> wrote:
> Baolin Wang <baolin.w...@linaro.org> writes:
>
>> +/**
>> + * sg_is_contiguous - Check if the scatterlists are contiguous
>> + * @sga: SG entry
>> + * @sgb: SG entry
>> + *
>> + * Description:
>> + *   If the sga scatterlist is contiguous with the sgb scatterlist,
>> + *   that means they can be merged together.
>> + *
>> + **/
>> +static inline bool sg_is_contiguous(struct scatterlist *sga,
>> + struct scatterlist *sgb)
>> +{
>> + return *(unsigned long *)sg_virt(sga) + sga->length ==
>> + *(unsigned long *)sg_virt(sgb);
> As I already said, I don't like casts.

OK. Could you give me a good example. Thanks.
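
One cast-free variant (just an assumption of what might be acceptable, not
Robert's suggestion) would be to compare physical addresses instead of
casting the virtual ones:

static inline bool sg_is_contiguous(struct scatterlist *sga,
                                    struct scatterlist *sgb)
{
        return sg_phys(sga) + sga->length == sg_phys(sgb);
}

Whether physical contiguity is the right test here is of course part of the
question being discussed.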

>
> But let's take some height : you're needing this function to decide to merge
> scatterlists. That means that you think the probability of having 2 
> scatterlist
> mergeable is not meaningless, ie. 50% or more.
>
> I suppose your scatterlists are allocated through kmalloc(). I'd like to know,
> through your testing, what is the success rate of sg_is_contiguous(), ie. I'd
> like to know how many times sg_is_contiguous() was called, and amongst these
> calls how many times it returned true.
>
> That will tell me how "worth" is this optimization.

I think this is just a useful API for users. Even if the rate is only 1%,
we still need to check whether they are contiguous to decide whether they
can be coalesced.

>
>> + * sg_add_sg_to_table - Add one scatterlist into sg table
>> + * @sgt: The sg table header to use
>> + * @src: The sg need to be added into sg table
>> + *
>> + * Description:
>> + *   The 'nents' member indicates how many mapped scatterlists has been 
>> added
>> + *   in the dynamic sg table. The 'orig_nents' member indicates the size of 
>> the
>> + *   dynamic sg table.
>> + *
>> + *   Copy one mapped @src@ scatterlist into the dynamic sg table and 
>> increase
>> + *   'nents' member.
>> + *
>> + **/
> Okay, I still believe this one is wrong, because we don't understand each 
> other.
> So let's take an example :
> sg_table = {
>  .sgl = {
> {
> .page_link = PAGE_48,
> .offset = 0,
> .length = 2048,
> .dma_address = 0x3,
> .dma_length = 4096,
> },
> {
> .page_link = PAGE_48 | 0x02,
> .offset = 2048,
> .length = 2048,
> .dma_address = 0,
> .dma_length = 0,
> },
> },
>  .nents = 1,
>  .orig_nents = 2,
> };
>
> In this example, by sheer luck the 2 scatterlist entries were physically
> contiguous, and the mapping function coallesced the 2 entries into only one
> (dma_address, dma_length) entry. That could also happen with an IOMMU by the
> way.  Therefore, sg_table.nents = 1.
>
> If I understand your function correctly, it will erase sg_table.sgl[1], and 
> that
> looks incorrect to me. This is why I can't understand how your code can be
> correct, and why I say you add a new "meaning" to sg_table->nents, which is 
> not
> consistent with the meaning I understand.

I think you misunderstood my point. The 'sg_add_sg_to_table()'
function is used to add one mapped scatterlist into the dynamic sg
table. For example:
1. How to add one mapped sg into dynamic sg table
(1) If we allocate one dynamic sg table with 3 empty scatterlists (a small
size is easier to show):
sg_table = {
  .sgl = {
 {
 .page_link = 0,
 .offset = 0,
 .length = 0,
 },
 {
 .page_link = 0,
 .offset = 0,
 .length = 0,
 },
 {
 .page_link = 0 | 0x02,
 .offset = 0,
 .length = 0,
 },
 },
  .nents = 0,
  .orig_nents = 3,
 };

(2) If some module (like dm-crypt module) always sends one mapped
scatterlist (size is always 512bytes) to another module (crypto
driver) to handle the mapped scatterlist at one time. But we want to
collect the mapped scatterlist into one dynamic sg table to manage
them together, means we can send multiple mapped scatterlists (from
sg_table->sgl) to the crypto driver to handle them together to improve
its e
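
A minimal usage sketch of the proposed helpers, following the semantics
described above (error handling trimmed; not code from the patches):

static int collect_mapped_sgs(struct sg_table *sgt,
                              struct scatterlist *first,
                              struct scatterlist *second)
{
        int ret;

        /* Three empty entries, as in the example above. */
        ret = sg_alloc_empty_table(sgt, 3, GFP_KERNEL);
        if (ret)
                return ret;

        /* Each call copies one mapped sg into the table and bumps sgt->nents. */
        ret = sg_add_sg_to_table(sgt, first);
        if (!ret)
                ret = sg_add_sg_to_table(sgt, second);
        if (ret)
                sg_free_table(sgt);

        return ret;
}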

[PATCH v2 4/4] md: dm-crypt: Initialize the sector number for one request

2016-03-15 Thread Baolin Wang
If the crypto engine can support bulk mode, contiguous requests from one
block can be merged into one request to be handled by the crypto engine.
In that case, the crypto engine needs the sector number of each request to
do the merging.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3147c8d..9e2dbfd 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -866,6 +866,7 @@ static int crypt_convert_block(struct crypt_config *cc,
return r;
}
 
+   req->sector = ctx->cc_sector;
ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
 1 << SECTOR_SHIFT, iv);
 
-- 
1.7.9.5



[PATCH v2 2/4] crypto: Introduce some helper functions to help to merge requests

2016-03-15 Thread Baolin Wang
Usually the dm-crypt subsystem sends encryption/decryption requests to
the crypto layer one block at a time, making each request 512 bytes long,
which is a rather small size for a hardware engine; it means the hardware
engine cannot reach its best performance.

Now some cipher hardware engines prefer to handle bulk blocks rather than the one
sector (512 bytes) created by dm-crypt, because these cipher engines can handle
the intermediate values (IV) by themselves in one bulk block. This means we
can increase the size of the request by merging requests rather than always 512
bytes and thus increase the hardware engine processing speed.

This patch introduces some helper functions to help to merge requests to improve
hardware engine efficiency.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 crypto/ablk_helper.c |  135 ++
 include/crypto/ablk_helper.h |3 +
 include/linux/crypto.h   |5 ++
 3 files changed, 143 insertions(+)

diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c
index e1fcf53..3cf15cb 100644
--- a/crypto/ablk_helper.c
+++ b/crypto/ablk_helper.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +35,140 @@
 #include 
 #include 
 
+/**
+ * ablk_link_request_if_contigous - try to link one request into previous one
+ * if the page address is contiguous.
+ * @list_req: the request from queue list
+ * @req: the new request need to be merged
+ *
+ * If the listed request and new request's pages of the scatterlists are
+ * contiguous, then merge the scatterlists of new request into the listed one.
+ *
+ * Return true on success, others means failed.
+ */
+static bool ablk_link_request_if_contigous(struct ablkcipher_request *list_req,
+  struct ablkcipher_request *req)
+{
+   struct scatterlist *last_src_sg =
+   sg_last(list_req->sgt_src.sgl, list_req->sgt_src.nents);
+   struct scatterlist *last_dst_sg =
+   sg_last(list_req->sgt_dst.sgl, list_req->sgt_dst.nents);
+   struct scatterlist *req_src_sg = req->src;
+   struct scatterlist *req_dst_sg = req->dst;
+
+   if (!last_src_sg || !last_dst_sg)
+   return false;
+
+   /* Check if the src/dst scatterlists are contiguous */
+   if (!sg_is_contiguous(last_src_sg, req_src_sg) ||
+   !sg_is_contiguous(last_dst_sg, req_dst_sg))
+   return false;
+
+   /*
+* If the request can be merged into the listed request after the
+* checking, then expand the listed request scatterlists' length.
+*/
+   last_src_sg->length += req_src_sg->length;
+   last_dst_sg->length += req_dst_sg->length;
+   list_req->nbytes += req->nbytes;
+
+   return true;
+}
+
+/**
+ * ablk_merge_request - try to merge one request into previous one
+ * @list_req: the request from queue list
+ * @req: the request need to be merged
+ *
+ * This function will create a dynamic scatterlist table for both source
+ * and destination if the request is the first coming in.
+ *
+ * Return true on success, others means failed.
+ */
+static bool ablk_merge_request(struct ablkcipher_request *list_req,
+  struct ablkcipher_request *req)
+{
+   struct sg_table *sgt_src = &list_req->sgt_src;
+   struct sg_table *sgt_dst = &list_req->sgt_dst;
+   unsigned int nents = SG_MAX_SINGLE_ALLOC;
+
+   if (sg_table_is_empty(sgt_src)) {
+   if (sg_alloc_empty_table(sgt_src, nents, GFP_ATOMIC))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_src, list_req->src))
+   return false;
+   }
+
+   if (sg_table_is_empty(sgt_dst)) {
+   if (sg_alloc_empty_table(sgt_dst, nents, GFP_ATOMIC))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_dst, list_req->dst))
+   return false;
+   }
+
+   /*
+* Check if the new request is contiguous for the listed request,
+* if it is contiguous then merge the new request into the listed one.
+*/
+   if (ablk_link_request_if_contigous(list_req, req))
+   return true;
+
+   if (sg_add_sg_to_table(sgt_src, req->src))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_dst, req->dst))
+   return false;
+
+   list_req->nbytes += req->nbytes;
+   return true;
+}
+
+/**
+ * ablk_try_merge - try to merge one request into previous one
+ * @queue: the crypto queue list
+ * @req: the request need to be merged
+ *
+ * Note: The merging action should be under the spinlock or mutex protection.
+ *
+ * Return 0 on success and others are failed.
+ */
+int ablk_try_merge(struct crypto_queue *queue,
+  struct ablkcipher_request *req)
+{
+   struct ablkcipher_request *list_req;
+   str

[PATCH v2 3/4] crypto: Introduce the bulk mode for crypto engine framework

2016-03-15 Thread Baolin Wang
Now some cipher hardware engines prefer to handle bulk blocks by merging
requests to increase the block size and thus increase the hardware engine
processing speed.

This patch introduces request bulk mode to help the crypto hardware drivers
improve their efficiency, and chooses the suitable mode (SECTOR_MODE) for
initializing the omap aes engine.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 crypto/Kconfig|1 +
 crypto/crypto_engine.c|  122 +++--
 drivers/crypto/omap-aes.c |2 +-
 include/crypto/algapi.h   |   23 -
 4 files changed, 143 insertions(+), 5 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index c844227..6a2f9a6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -229,6 +229,7 @@ config CRYPTO_GLUE_HELPER_X86
 
 config CRYPTO_ENGINE
tristate
+   select CRYPTO_ABLK_HELPER
 
 comment "Authenticated Encryption with Associated Data"
 
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index a55c82d..0de5829 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #define CRYPTO_ENGINE_MAX_QLEN 10
@@ -84,6 +85,17 @@ static void crypto_pump_requests(struct crypto_engine 
*engine,
 
req = ablkcipher_request_cast(async_req);
 
+   /*
+* If the engine supports bulk mode and the request has sg tables
+* allocated to hold the expanded scatterlist entries, then its
+* scatterlists need to point into those sg tables.
+*/
+   if (engine->mode == SECTOR_BULK_MODE && req->sgt_src.orig_nents &&
+   req->sgt_dst.orig_nents) {
+   req->src = req->sgt_src.sgl;
+   req->dst = req->sgt_dst.sgl;
+   }
+
engine->cur_req = req;
if (backlog)
backlog->complete(backlog, -EINPROGRESS);
@@ -137,9 +149,46 @@ static void crypto_pump_work(struct kthread_work *work)
 }
 
 /**
+ * crypto_merge_request_to_engine - try to merge one request into previous one
+ * @engine: the hardware engine
+ * @req: the request need to be merged
+ *
+ * If the crypto engine supports bulk mode, then try to merge the new request
+ * into the listed one from engine queue to handle them together.
+ *
+ * Return 0 on success and others are failed.
+ */
+static bool crypto_merge_request_to_engine(struct crypto_engine *engine,
+  struct ablkcipher_request *req)
+{
+   /*
+* The request is allocated from a memory pool in dm-crypt, so the sg
+* tables need to be initialized here in case they contain random values.
+*/
+   req->sgt_src.orig_nents = 0;
+   req->sgt_dst.orig_nents = 0;
+
+   /*
+* If the hardware engine can not support the bulk mode encryption,
+* just return 1 to indicate that merging failed.
+*/
+   if (engine->mode != SECTOR_BULK_MODE)
+   return 1;
+
+   return ablk_try_merge(&engine->queue, req);
+}
+
+/**
  * crypto_transfer_request - transfer the new request into the engine queue
  * @engine: the hardware engine
  * @req: the request need to be listed into the engine queue
+ *
+ * Firstly it will check if the new request can be merged into the previous one
+ * when their sector numbers are contiguous; if not, it should be listed into the
+ * engine queue.
+ *
+ * If the new request can be merged into the previous request, then just
+ * finalize the new request.
  */
 int crypto_transfer_request(struct crypto_engine *engine,
struct ablkcipher_request *req, bool need_pump)
@@ -154,6 +203,26 @@ int crypto_transfer_request(struct crypto_engine *engine,
return -ESHUTDOWN;
}
 
+   /*
+* Here we need to check if the request can be merged into the previous
+* request. If the hardware engine can support encryption with bulk
+* blocks, we can merge the new request into the previous request
+* when their sector numbers are contiguous, so that they can be handled
+* together by the engine to improve the encryption efficiency.
+* Returning -EINPROGRESS means it has been merged into the previous
+* request, so just end this request.
+*/
+   ret = crypto_merge_request_to_engine(engine, req);
+   if (!ret) {
+   spin_unlock_irqrestore(&engine->queue_lock, flags);
+   crypto_finalize_request(engine, req, 0);
+   return -EINPROGRESS;
+   }
+
+   /*
+* If the request can not be merged into previous request, then list it
+* into the queue of engine, and will be handled by kthread worker.
+*/
ret = ablkcipher_enqueue_request(&engine->queue, req);
 
if (!engine->busy && need_pump)
@@ -178,7 +247,8 @@ int crypto_transfer_request_to_engine(struct crypto_engine 
*engine,
 EXPORT_SYMBOL_GPL(crypto_transfer_re

[PATCH v2 1/4] scatterlist: Introduce some helper functions

2016-03-15 Thread Baolin Wang
In the crypto engine framework, one request can combine (copy) other requests'
scatterlists into its dynamic sg table to manage them together as a bulk
block, which can improve engine efficiency when handling bulk blocks. Thus
we need some helper functions to manage dynamic sg tables.

This patch introduces 'sg_is_contiguous()' function to check if two
scatterlists are contiguous, 'sg_alloc_empty_table()' function to
allocate one empty dynamic sg table, 'sg_add_sg_to_table()' function
to copy one mapped scatterlist into sg table and 'sg_table_is_empty'
function to check if the sg table is empty.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/linux/scatterlist.h |   33 +
 lib/scatterlist.c   |   69 +++
 2 files changed, 102 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..c1ed9f4 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -212,6 +212,20 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 }
 
 /**
+ * sg_table_is_empty - Check if the sg table is empty
+ * @sgt: sg table
+ *
+ * Description:
+ *   The 'orig_nents' member of one sg table is used to indicate how many
+ *   scatterlists in the sg table.
+ *
+ **/
+static inline bool sg_table_is_empty(struct sg_table *sgt)
+{
+   return !sgt->orig_nents;
+}
+
+/**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
  *
@@ -241,6 +255,23 @@ static inline void *sg_virt(struct scatterlist *sg)
return page_address(sg_page(sg)) + sg->offset;
 }
 
+/**
+ * sg_is_contiguous - Check if the scatterlists are contiguous
+ * @sga: SG entry
+ * @sgb: SG entry
+ *
+ * Description:
+ *   If the sga scatterlist is contiguous with the sgb scatterlist,
+ *   that means they can be merged together.
+ *
+ **/
+static inline bool sg_is_contiguous(struct scatterlist *sga,
+   struct scatterlist *sgb)
+{
+   return (unsigned long)sg_virt(sga) + sga->length ==
+   (unsigned long)sg_virt(sgb);
+}
+
 int sg_nents(struct scatterlist *sg);
 int sg_nents_for_len(struct scatterlist *sg, u64 len);
 struct scatterlist *sg_next(struct scatterlist *);
@@ -261,6 +292,8 @@ void sg_free_table(struct sg_table *);
 int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
 struct scatterlist *, gfp_t, sg_alloc_fn *);
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
+int sg_alloc_empty_table(struct sg_table *, unsigned int, gfp_t);
+int sg_add_sg_to_table(struct sg_table *, struct scatterlist *);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
struct page **pages, unsigned int n_pages,
unsigned long offset, unsigned long size,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..6d3f3b0 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -370,6 +370,75 @@ int sg_alloc_table(struct sg_table *table, unsigned int 
nents, gfp_t gfp_mask)
 EXPORT_SYMBOL(sg_alloc_table);
 
 /**
+ * sg_add_sg_to_table - Add one scatterlist into sg table
+ * @sgt:   The sg table header to use
+ * @src:   The sg need to be added into sg table
+ *
+ * Description:
+ *   The 'nents' member indicates how many mapped scatterlists have been added
+ *   to the dynamic sg table. The 'orig_nents' member indicates the size of the
+ *   dynamic sg table.
+ *
+ *   Copy one mapped @src@ scatterlist into the dynamic sg table and increase
+ *   the 'nents' member.
+ *
+ **/
+int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
+{
+   unsigned int i = 0, orig_nents = sgt->orig_nents;
+   struct scatterlist *sgl = sgt->sgl;
+   struct scatterlist *sg;
+
+   /* Check if there is enough space for the new sg to be added */
+   if (sgt->nents >= sgt->orig_nents)
+   return -EINVAL;
+
+   for_each_sg(sgl, sg, orig_nents, i) {
+   if (sgt->nents > 0 && i == (sgt->nents - 1)) {
+   sg_unmark_end(sg);
+   } else if (i == sgt->nents) {
+   memcpy(sg, src, sizeof(struct scatterlist));
+   sg_mark_end(sg);
+   sgt->nents++;
+   break;
+   }
+   }
+
+   return 0;
+}
+
+/**
+ * sg_alloc_empty_table - Allocate one empty dynamic sg table
+ * @sgt:   The sg table header to use
+ * @nents: Number of entries in sg list
+ * @gfp_mask:  GFP allocation mask
+ *
+ *  Description:
+ *Allocate and initialize one dynamic sg table. A dynamic sg table means
+ *that @nents@ empty scatterlist entries are allocated, so that other
+ *mapped scatterlists can later be copied into it.
+ *
+ *The 'nents' member indicates how many scatterlists have been copied into
+ *the dynamic sg table. It should be set to 0, meaning no scatterlists
+ *have been copied in yet.

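A minimal usage sketch of the helpers introduced above, roughly as the crypto
engine code is expected to call them. The table size, error handling and the
example_collect_two() name are illustrative assumptions, not part of the patch:

#include <linux/scatterlist.h>

/* Collect two already-mapped scatterlists into one dynamic sg table. */
static int example_collect_two(struct scatterlist *sg_a,
			       struct scatterlist *sg_b,
			       struct sg_table *sgt)
{
	int ret;

	/* Reserve room for up to 4 entries; 'nents' starts at 0. */
	ret = sg_alloc_empty_table(sgt, 4, GFP_ATOMIC);
	if (ret)
		return ret;

	/* sg_table_is_empty(sgt) is true here: nothing copied in yet. */

	ret = sg_add_sg_to_table(sgt, sg_a);		/* sgt->nents becomes 1 */
	if (!ret)
		ret = sg_add_sg_to_table(sgt, sg_b);	/* sgt->nents becomes 2 */

	if (ret)
		sg_free_table(sgt);
	return ret;
}
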
[PATCH v2 0/4] Introduce bulk mode for crypto engine framework

2016-03-15 Thread Baolin Wang
Now some cipher hardware engines prefer to handle bulk blocks by merging
requests to increase the block size and thus increase the hardware engine's
processing speed.

This patchset introduces a request bulk mode to help crypto hardware drivers
improve their efficiency.

Changes since v1:
 - Modify the sg_is_contiguous() function.

Baolin Wang (4):
  scatterlist: Introduce some helper functions
  crypto: Introduce some helper functions to help to merge requests
  crypto: Introduce the bulk mode for crypto engine framework
  md: dm-crypt: Initialize the sector number for one request

 crypto/Kconfig   |1 +
 crypto/ablk_helper.c |  135 ++
 crypto/crypto_engine.c   |  122 +-
 drivers/crypto/omap-aes.c|2 +-
 drivers/md/dm-crypt.c|1 +
 include/crypto/ablk_helper.h |3 +
 include/crypto/algapi.h  |   23 ++-
 include/linux/crypto.h   |5 ++
 include/linux/scatterlist.h  |   33 +++
 lib/scatterlist.c|   69 +
 10 files changed, 389 insertions(+), 5 deletions(-)

-- 
1.7.9.5



Re: [PATCH 1/4] scatterlist: Introduce some helper functions

2016-03-10 Thread Baolin Wang
On 10 March 2016 at 17:42, Robert Jarzmik <robert.jarz...@free.fr> wrote:
>>
>>
>> Ah, sorry that's a mistake. It should check as below:
>> static inline bool sg_is_contiguous(struct scatterlist *sga, struct
>> scatterlist *sgb)
>> {
>> return (unsigned int)sg_virt(sga) + sga->length == (unsigned
>> int)sg_virt(sgb);
>> }
> NAK.
> First, I don't like the cast.
> Second ask yourself: what if you compile on a 64 bit architecture ?

Sorry, it's a mistake. I've resend one email to correct that with
'unsigned long'.

> Third, I don't like void* arithmetics, some compilers might refuse it.

OK. I will modify that.

>
> And as a last comment, is it "virtual" contiguity you're looking after, 
> physical
> contiguity, or dma_addr_t contiguity ?

Yes, I need check the virtual contiguity.

>
 @@ -370,6 +370,65 @@ int sg_alloc_table(struct sg_table *table, unsigned 
 int
 nents, gfp_t gfp_mask)
>>> ...
  /**
 + * sg_add_sg_to_table - Add one scatterlist into sg table
 + * @sgt: The sg table header to use
 + * @src: The sg need to be added into sg table
 + *
 + * Description:
 + *   The 'nents' member indicates how many scatterlists added in the sg 
 table.
 + *   Copy the @src@ scatterlist into sg table and increase 'nents' member.
 + *
 + **/
 +int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
 +{
 + unsigned int i = 0, orig_nents = sgt->orig_nents;
 + struct scatterlist *sgl = sgt->sgl;
 + struct scatterlist *sg;
 +
 + /* Check if there are enough space for the new sg to be added */
 + if (sgt->nents >= sgt->orig_nents)
 + return -EINVAL;
>>> I must admit I don't understand that one either : how do comparing the 
>>> number of
>>> "mapped" entries against the number of "allocated" entries determines if 
>>> there
>>> is enough room ?
>>
>> That's for a dynamic sg table. If there is one sg table allocated
>> 'orig_nents' scatterlists, and we need copy another mapped scatterlist
>> into the sg table if there are some requirements. So we use 'nents' to
>> record how many scatterlists have been copied into the sg table.
> As I'm still having difficulities to understand, please explain to me :
>  - what is a dynamic sg_table

OK. That's for a special usage. For example, dm-crypt sends one request,
mapped to one scatterlist, to the engine level to be handled one at a time.
But we want to collect some requests together and handle them at one time to
improve engine efficiency. So in the crypto engine layer we need to copy the
mapped scatterlists into one sg table, which has been allocated some sg
memory.

>  - how it works

The 'orig_nents' member means how many scatterlists the sg table can hold.
The 'nents' member means how many scatterlists have been copied into the sg
table. So 'nents' will not be equal to 'orig_nents' unless the sg table is
full.

>  - if it's "struct sg_table", how the field of this structure are different 
> when
>it's a dynamic sg_table from a "static sg_table" ?

>
 +/**
 + * sg_alloc_empty_table - Allocate one empty sg table
 + * @sgt: The sg table header to use
 + * @nents:   Number of entries in sg list
 + * @gfp_mask:GFP allocation mask
 + *
 + *  Description:
 + *Allocate and initialize an sg table. The 'nents' member of sg_table
 + *indicates how many scatterlists added in the sg table. It should set
 + *0 which means there are no scatterlists added in this sg table now.
 + *
 + **/
 +int sg_alloc_empty_table(struct sg_table *sgt, unsigned int nents,
 +  gfp_t gfp_mask)
>>> As for this one, there has to be a purpose for it I fail to see. From far 
>>> away
>>> it looks exactly like sg_alloc_table(), excepting it "works around" the 
>>> nents >
>>> 0 protection of __sg_alloc_table().
>>> What is exactly the need for this one, and if it's usefull why not simply
>>> changing the __sg_alloc_table() "nents > 0" test and see what the outcome 
>>> of the
>>> review will be ?
>>
>> Like I said above. If we want to copy some mapped scatterlists into
>> one sg table, we should set the 'nents' to 0 to indicates how many
>> scatterlists coppied in the sg table.
> So how do you unmap this scatterlist once you've lost the number of mapped
> entries ?

I will not lose the number of mapped entries in the sg table; the 'nents'
member records how many mapped scatterlists are in the dynamic table. Every
time one mapped sg is copied into the dynamic table, 'nents' increases by 1.

>
> I think we'll come back to this once I understand how a "dynamic" sg_table is
> different from a "static" one.
>
> Cheers.
>
> --
> Robert



-- 
Baolin.wang
Best Regards


Re: [PATCH 1/4] scatterlist: Introduce some helper functions

2016-03-03 Thread Baolin Wang
Hi Robert,

On 4 March 2016 at 03:15, Robert Jarzmik <robert.jarz...@free.fr> wrote:
> Baolin Wang <baolin.w...@linaro.org> writes:
>
>> @@ -212,6 +212,37 @@ static inline void sg_unmark_end(struct scatterlist *sg)
>>  }
>>
>>  /**
>> + * sg_is_contiguous - Check if the scatterlists are contiguous
>> + * @sga: SG entry
>> + * @sgb: SG entry
>> + *
>> + * Description:
>> + *   If the sga scatterlist is contiguous with the sgb scatterlist,
>> + *   that means they can be merged together.
>> + *
>> + **/
>> +static inline bool sg_is_contiguous(struct scatterlist *sga,
>> + struct scatterlist *sgb)
>> +{
>> + return ((sga->page_link & ~0x3UL) + sga->offset + sga->length ==
>> + (sgb->page_link & ~0x3UL));
>> +}
> I don't understand that one.
> sga->page_link is a pointer to a "struct page *". How can it be added to an
> offset within a page ???


Ah, sorry that's a mistake. It should check as below:
static inline bool sg_is_contiguous(struct scatterlist *sga, struct
scatterlist *sgb)
{
return (unsigned int)sg_virt(sga) + sga->length == (unsigned
int)sg_virt(sgb);
}

>
>> @@ -370,6 +370,65 @@ int sg_alloc_table(struct sg_table *table, unsigned int
>> nents, gfp_t gfp_mask)
> ...
>>  /**
>> + * sg_add_sg_to_table - Add one scatterlist into sg table
>> + * @sgt: The sg table header to use
>> + * @src: The sg need to be added into sg table
>> + *
>> + * Description:
>> + *   The 'nents' member indicates how many scatterlists added in the sg 
>> table.
>> + *   Copy the @src@ scatterlist into sg table and increase 'nents' member.
>> + *
>> + **/
>> +int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
>> +{
>> + unsigned int i = 0, orig_nents = sgt->orig_nents;
>> + struct scatterlist *sgl = sgt->sgl;
>> + struct scatterlist *sg;
>> +
>> + /* Check if there are enough space for the new sg to be added */
>> + if (sgt->nents >= sgt->orig_nents)
>> + return -EINVAL;
> I must admit I don't understand that one either : how do comparing the number 
> of
> "mapped" entries against the number of "allocated" entries determines if there
> is enough room ?

That's for a dynamic sg table. One sg table is allocated with 'orig_nents'
scatterlist entries, and we may need to copy another mapped scatterlist into
that sg table when required. So we use 'nents' to record how many
scatterlists have been copied into the sg table.

>
>> +/**
>> + * sg_alloc_empty_table - Allocate one empty sg table
>> + * @sgt: The sg table header to use
>> + * @nents:   Number of entries in sg list
>> + * @gfp_mask:GFP allocation mask
>> + *
>> + *  Description:
>> + *Allocate and initialize an sg table. The 'nents' member of sg_table
>> + *indicates how many scatterlists added in the sg table. It should set
>> + *0 which means there are no scatterlists added in this sg table now.
>> + *
>> + **/
>> +int sg_alloc_empty_table(struct sg_table *sgt, unsigned int nents,
>> +  gfp_t gfp_mask)
> As for this one, there has to be a purpose for it I fail to see. From far away
> it looks exactly like sg_alloc_table(), excepting it "works around" the nents 
> >
> 0 protection of __sg_alloc_table().
> What is exactly the need for this one, and if it's usefull why not simply
> changing the __sg_alloc_table() "nents > 0" test and see what the outcome of 
> the
> review will be ?

Like I said above, if we want to copy some mapped scatterlists into one sg
table, we should set 'nents' to 0 first so that it indicates how many
scatterlists have been copied into the sg table.
Thanks for your comments.

>
> Cheers.
>
> --
> Robert



-- 
Baolin.wang
Best Regards


[PATCH 4/4] md: dm-crypt: Initialize the sector number for one request

2016-03-02 Thread Baolin Wang
If the crypto engine supports bulk mode, contiguous requests from one block
can be merged into one request to be handled by the crypto engine. For that,
the crypto engine needs the sector number of each request in order to do the
merging.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/md/dm-crypt.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3147c8d..9e2dbfd 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -866,6 +866,7 @@ static int crypt_convert_block(struct crypt_config *cc,
return r;
}
 
+   req->sector = ctx->cc_sector;
ablkcipher_request_set_crypt(req, &dmreq->sg_in, &dmreq->sg_out,
 1 << SECTOR_SHIFT, iv);
 
-- 
1.7.9.5

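The merging decision itself lives in ablk_try_merge() (patch 2/4, not shown in
full here). A plausible form of the continuity check it can now perform once
req->sector is initialized, assuming the 512-byte sectors that dm-crypt always
submits; the function name is illustrative only:

#include <linux/crypto.h>

/* Sketch only: does the new request start right where the queued one ends? */
static bool example_sectors_continuous(struct ablkcipher_request *prev,
				       struct ablkcipher_request *req)
{
	return prev->sector + (prev->nbytes >> 9) == req->sector;
}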


[PATCH 3/4] crypto: Introduce the bulk mode for crypto engine framework

2016-03-02 Thread Baolin Wang
Now some cipher hardware engines prefer to handle bulk blocks by merging
requests to increase the block size and thus increase the hardware engine's
processing speed.

This patch introduces a request bulk mode to help crypto hardware drivers
improve their efficiency, and chooses the suitable mode (SECTOR_MODE) for
initializing the omap aes engine.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 crypto/Kconfig|1 +
 crypto/crypto_engine.c|  122 +++--
 drivers/crypto/omap-aes.c |2 +-
 include/crypto/algapi.h   |   23 -
 4 files changed, 143 insertions(+), 5 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index c844227..6a2f9a6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -229,6 +229,7 @@ config CRYPTO_GLUE_HELPER_X86
 
 config CRYPTO_ENGINE
tristate
+   select CRYPTO_ABLK_HELPER
 
 comment "Authenticated Encryption with Associated Data"
 
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index a55c82d..0de5829 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #define CRYPTO_ENGINE_MAX_QLEN 10
@@ -84,6 +85,17 @@ static void crypto_pump_requests(struct crypto_engine 
*engine,
 
req = ablkcipher_request_cast(async_req);
 
+   /*
+* If the engine supports bulk mode and the request has sg tables
+* allocated to hold the expanded scatterlist entries, then its src/dst
+* scatterlists need to point at the ones in the sg tables.
+*/
+   if (engine->mode == SECTOR_BULK_MODE && req->sgt_src.orig_nents &&
+   req->sgt_dst.orig_nents) {
+   req->src = req->sgt_src.sgl;
+   req->dst = req->sgt_dst.sgl;
+   }
+
engine->cur_req = req;
if (backlog)
backlog->complete(backlog, -EINPROGRESS);
@@ -137,9 +149,46 @@ static void crypto_pump_work(struct kthread_work *work)
 }
 
 /**
+ * crypto_merge_request_to_engine - try to merge one request into previous one
+ * @engine: the hardware engine
+ * @req: the request need to be merged
+ *
+ * If the crypto engine supports bulk mode, then try to merge the new request
+ * into the listed one from engine queue to handle them together.
+ *
+ * Return 0 on success; any other value means the merge failed.
+ */
+static bool crypto_merge_request_to_engine(struct crypto_engine *engine,
+  struct ablkcipher_request *req)
+{
+   /*
+* The request is allocated from a memory pool in dm-crypt, so the sg
+* tables need to be initialized here to clear any stale values.
+*/
+   req->sgt_src.orig_nents = 0;
+   req->sgt_dst.orig_nents = 0;
+
+   /*
+* If the hardware engine cannot support bulk mode encryption, just
+* return 1 to indicate that merging failed.
+*/
+   if (engine->mode != SECTOR_BULK_MODE)
+   return 1;
+
+   return ablk_try_merge(&engine->queue, req);
+}
+
+/**
  * crypto_transfer_request - transfer the new request into the engine queue
  * @engine: the hardware engine
  * @req: the request need to be listed into the engine queue
+ *
+ * First it checks whether the new request can be merged into the previous one,
+ * i.e. whether their sector numbers are contiguous; if not, the request is
+ * added to the engine queue.
+ *
+ * If the new request can be merged into the previous request, then just
+ * finalize the new request.
  */
 int crypto_transfer_request(struct crypto_engine *engine,
struct ablkcipher_request *req, bool need_pump)
@@ -154,6 +203,26 @@ int crypto_transfer_request(struct crypto_engine *engine,
return -ESHUTDOWN;
}
 
+   /*
+* Check whether the request can be merged into the previous one. If
+* the hardware engine supports bulk encryption, the new request can
+* be merged into the previous request when their sector numbers are
+* contiguous, so that they are handled together by the engine to
+* improve encryption efficiency.
+* A return value of -EINPROGRESS means it has been merged into the
+* previous request, so just finalize this request.
+*/
+   ret = crypto_merge_request_to_engine(engine, req);
+   if (!ret) {
+   spin_unlock_irqrestore(&engine->queue_lock, flags);
+   crypto_finalize_request(engine, req, 0);
+   return -EINPROGRESS;
+   }
+
+   /*
+* If the request cannot be merged into the previous request, add it to
+* the engine queue; it will be handled by the kthread worker.
+*/
+*/
ret = ablkcipher_enqueue_request(&engine->queue, req);
 
if (!engine->busy && need_pump)
@@ -178,7 +247,8 @@ int crypto_transfer_request_to_engine(struct crypto_engine 
*engine,
 EXPORT_SYMBOL_GPL(crypto_transfer_re

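From the point of view of a driver whose hardware really supports
SECTOR_BULK_MODE, the merged request arrives with its scatterlists already
combined. A rough sketch of what such a driver's crypt callback would see;
example_hw_program() is a hypothetical hardware helper, not part of this
series:

/* Hypothetical hardware helper: programs one merged request into the engine. */
static int example_hw_program(struct scatterlist *src, struct scatterlist *dst,
			      unsigned int nbytes, void *iv,
			      unsigned int sectors);

static int example_bulk_crypt(struct crypto_engine *engine,
			      struct ablkcipher_request *req)
{
	/*
	 * After merging, req->src and req->dst already point at the combined
	 * sg tables (req->sgt_src.sgl / req->sgt_dst.sgl) and req->nbytes is
	 * a multiple of 512 covering all merged sectors.
	 */
	unsigned int sectors = req->nbytes >> 9;

	/*
	 * A bulk-capable engine derives the IV for each 512-byte sector by
	 * itself (e.g. from req->sector + i), so the whole merged request
	 * can be programmed into the hardware in one go.
	 */
	return example_hw_program(req->src, req->dst, req->nbytes,
				  req->info, sectors);
}
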
[PATCH 2/4] crypto: Introduce some helper functions to help to merge requests

2016-03-02 Thread Baolin Wang
Usually the dm-crypt subsystem sends encryption/decryption requests to the
crypto layer one block at a time, making each request 512 bytes long, which
is far too small for a hardware engine to reach its best performance.

Now some cipher hardware engines prefer to handle a bulk block rather than
one sector (512 bytes) created by dm-crypt, because these cipher engines can
handle the initialization vectors (IV) by themselves within one bulk block.
This means we can increase the size of the request by merging requests,
rather than always sending 512 bytes, and thus increase the hardware engine's
processing speed.

This patch introduces some helper functions that merge requests to improve
hardware engine efficiency.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 crypto/ablk_helper.c |  135 ++
 include/crypto/ablk_helper.h |3 +
 include/linux/crypto.h   |5 ++
 3 files changed, 143 insertions(+)

diff --git a/crypto/ablk_helper.c b/crypto/ablk_helper.c
index e1fcf53..3cf15cb 100644
--- a/crypto/ablk_helper.c
+++ b/crypto/ablk_helper.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -34,6 +35,140 @@
 #include 
 #include 
 
+/**
+ * ablk_link_request_if_contigous - try to link one request into previous one
+ * if the page address is contiguous.
+ * @list_req: the request from queue list
+ * @req: the new request need to be merged
+ *
+ * If the scatterlist pages of the listed request and the new request are
+ * contiguous, then merge the new request's scatterlists into the listed one.
+ *
+ * Return true on success, false otherwise.
+ */
+static bool ablk_link_request_if_contigous(struct ablkcipher_request *list_req,
+  struct ablkcipher_request *req)
+{
+   struct scatterlist *last_src_sg =
+   sg_last(list_req->sgt_src.sgl, list_req->sgt_src.nents);
+   struct scatterlist *last_dst_sg =
+   sg_last(list_req->sgt_dst.sgl, list_req->sgt_dst.nents);
+   struct scatterlist *req_src_sg = req->src;
+   struct scatterlist *req_dst_sg = req->dst;
+
+   if (!last_src_sg || !last_dst_sg)
+   return false;
+
+   /* Check if the src/dst scatterlists are contiguous */
+   if (!sg_is_contiguous(last_src_sg, req_src_sg) ||
+   !sg_is_contiguous(last_dst_sg, req_dst_sg))
+   return false;
+
+   /*
+* If the checks passed and the request can be merged into the listed
+* request, expand the length of the listed request's scatterlists.
+*/
+   last_src_sg->length += req_src_sg->length;
+   last_dst_sg->length += req_dst_sg->length;
+   list_req->nbytes += req->nbytes;
+
+   return true;
+}
+
+/**
+ * ablk_merge_request - try to merge one request into previous one
+ * @list_req: the request from queue list
+ * @req: the request need to be merged
+ *
+ * This function will create a dynamic scatterlist table for both the source
+ * and destination if this is the first request coming in.
+ *
+ * Return true on success, false otherwise.
+ */
+static bool ablk_merge_request(struct ablkcipher_request *list_req,
+  struct ablkcipher_request *req)
+{
+   struct sg_table *sgt_src = &list_req->sgt_src;
+   struct sg_table *sgt_dst = &list_req->sgt_dst;
+   unsigned int nents = SG_MAX_SINGLE_ALLOC;
+
+   if (sg_table_is_empty(sgt_src)) {
+   if (sg_alloc_empty_table(sgt_src, nents, GFP_ATOMIC))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_src, list_req->src))
+   return false;
+   }
+
+   if (sg_table_is_empty(sgt_dst)) {
+   if (sg_alloc_empty_table(sgt_dst, nents, GFP_ATOMIC))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_dst, list_req->dst))
+   return false;
+   }
+
+   /*
+* Check if the new request is contiguous with the listed request; if
+* so, merge the new request into the listed one.
+*/
+   if (ablk_link_request_if_contigous(list_req, req))
+   return true;
+
+   if (sg_add_sg_to_table(sgt_src, req->src))
+   return false;
+
+   if (sg_add_sg_to_table(sgt_dst, req->dst))
+   return false;
+
+   list_req->nbytes += req->nbytes;
+   return true;
+}
+
+/**
+ * ablk_try_merge - try to merge one request into previous one
+ * @queue: the crypto queue list
+ * @req: the request need to be merged
+ *
+ * Note: The merging action should be performed under spinlock or mutex
+ * protection.
+ *
+ * Return 0 on success; any other value means the merge failed.
+ */
+int ablk_try_merge(struct crypto_queue *queue,
+  struct ablkcipher_request *req)
+{
+   struct ablkcipher_request *list_req;
+   str

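The precondition for linking two requests is the sg_is_contiguous() check from
patch 1/4. A small self-contained illustration of when two 512-byte sectors
are considered mergeable; the buffer layout and function name are only an
example:

#include <linux/mm.h>
#include <linux/scatterlist.h>

static void example_contiguity(struct page *page)
{
	struct scatterlist a, b, c;

	sg_init_one(&a, page_address(page), 512);	  /* sector N         */
	sg_init_one(&b, page_address(page) + 512, 512);	  /* sector N + 1     */
	sg_init_one(&c, page_address(page) + 2048, 512);  /* a gap in between */

	/* a ends exactly where b starts: mergeable, b's 512 bytes can be
	 * folded into a by growing a's length instead of adding an entry. */
	WARN_ON(!sg_is_contiguous(&a, &b));

	/* b does not end where c starts: c would need its own sg entry.   */
	WARN_ON(sg_is_contiguous(&b, &c));
}
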
[PATCH 1/4] scatterlist: Introduce some helper functions

2016-03-02 Thread Baolin Wang
In the crypto engine framework, one request can combine other requests'
scatterlists into its sg table to improve engine efficiency when
handling bulk blocks. Thus we need some helper functions to manage
dynamic scatterlist tables.

This patch introduces the 'sg_is_contiguous()' function to check if two
scatterlists are contiguous, 'sg_alloc_empty_table()' to allocate one
empty sg table, 'sg_add_sg_to_table()' to add one scatterlist into an
sg table, and 'sg_table_is_empty()' to check if the sg table is empty.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/linux/scatterlist.h |   33 
 lib/scatterlist.c   |   59 +++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 556ec1e..ae9aee6 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -212,6 +212,37 @@ static inline void sg_unmark_end(struct scatterlist *sg)
 }
 
 /**
+ * sg_is_contiguous - Check if the scatterlists are contiguous
+ * @sga: SG entry
+ * @sgb: SG entry
+ *
+ * Description:
+ *   If the sga scatterlist is contiguous with the sgb scatterlist,
+ *   that means they can be merged together.
+ *
+ **/
+static inline bool sg_is_contiguous(struct scatterlist *sga,
+   struct scatterlist *sgb)
+{
+   return ((sga->page_link & ~0x3UL) + sga->offset + sga->length ==
+   (sgb->page_link & ~0x3UL));
+}
+
+/**
+ * sg_table_is_empty - Check if the sg table is empty
+ * @sgt: sg table
+ *
+ * Description:
+ *   The 'orig_nents' member of one sg table is used to indicate how many
+ *   scatterlists are in the sg table.
+ *
+ **/
+static inline bool sg_table_is_empty(struct sg_table *sgt)
+{
+   return !sgt->orig_nents;
+}
+
+/**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
  *
@@ -261,6 +292,8 @@ void sg_free_table(struct sg_table *);
 int __sg_alloc_table(struct sg_table *, unsigned int, unsigned int,
 struct scatterlist *, gfp_t, sg_alloc_fn *);
 int sg_alloc_table(struct sg_table *, unsigned int, gfp_t);
+int sg_alloc_empty_table(struct sg_table *, unsigned int, gfp_t);
+int sg_add_sg_to_table(struct sg_table *, struct scatterlist *);
 int sg_alloc_table_from_pages(struct sg_table *sgt,
struct page **pages, unsigned int n_pages,
unsigned long offset, unsigned long size,
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 004fc70..e9e5d5a 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -370,6 +370,65 @@ int sg_alloc_table(struct sg_table *table, unsigned int 
nents, gfp_t gfp_mask)
 EXPORT_SYMBOL(sg_alloc_table);
 
 /**
+ * sg_add_sg_to_table - Add one scatterlist into sg table
+ * @sgt:   The sg table header to use
+ * @src:   The sg need to be added into sg table
+ *
+ * Description:
+ *   The 'nents' member indicates how many scatterlists have been added to the
+ *   sg table. Copy the @src@ scatterlist into the sg table and increase the
+ *   'nents' member.
+ *
+ **/
+int sg_add_sg_to_table(struct sg_table *sgt, struct scatterlist *src)
+{
+   unsigned int i = 0, orig_nents = sgt->orig_nents;
+   struct scatterlist *sgl = sgt->sgl;
+   struct scatterlist *sg;
+
+   /* Check if there is enough space for the new sg to be added */
+   if (sgt->nents >= sgt->orig_nents)
+   return -EINVAL;
+
+   for_each_sg(sgl, sg, orig_nents, i) {
+   if (sgt->nents > 0 && i == (sgt->nents - 1)) {
+   sg_unmark_end(sg);
+   } else if (i == sgt->nents) {
+   memcpy(sg, src, sizeof(struct scatterlist));
+   sg_mark_end(sg);
+   sgt->nents++;
+   break;
+   }
+   }
+
+   return 0;
+}
+
+/**
+ * sg_alloc_empty_table - Allocate one empty sg table
+ * @sgt:   The sg table header to use
+ * @nents: Number of entries in sg list
+ * @gfp_mask:  GFP allocation mask
+ *
+ *  Description:
+ *Allocate and initialize an sg table. The 'nents' member of sg_table
+ *indicates how many scatterlists have been added to the sg table. It
+ *should be set to 0, which means no scatterlists have been added yet.
+ *
+ **/
+int sg_alloc_empty_table(struct sg_table *sgt, unsigned int nents,
+gfp_t gfp_mask)
+{
+   int ret;
+
+   ret = sg_alloc_table(sgt, nents, gfp_mask);
+   if (ret)
+   return ret;
+
+   sgt->nents = 0;
+   return 0;
+}
+
+/**
  * sg_alloc_table_from_pages - Allocate and initialize an sg table from
  *an array of pages
  * @sgt:   The sg table header to use
-- 
1.7.9.5



Re: [PATCH 0/3] Introduce the cypto engine framework

2016-02-01 Thread Baolin Wang
On 1 February 2016 at 22:33, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Tue, Jan 26, 2016 at 08:25:37PM +0800, Baolin Wang wrote:
>> Now block cipher engines need to implement and maintain their own 
>> queue/thread
>> for processing requests, moreover currently helpers provided for only the 
>> queue
>> itself (in crypto_enqueue_request() and crypto_dequeue_request()) but they
>> don't help with the mechanics of driving the hardware (things like running 
>> the
>> request immediately, DMA map it or providing a thread to process the queue 
>> in)
>> even though a lot of that code really shouldn't vary that much from device to
>> device.
>>
>> This patch introduces the crypto engine framework to help the crypto hardware
>> drivers to queue requests.
>
> Very nice, all applied.  Thanks!

Thanks a lot!

> --
> Email: Herbert Xu <herb...@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



-- 
Baolin.wang
Best Regards


[PATCH 2/3] crypto: Introduce the block request crypto engine framework

2016-01-26 Thread Baolin Wang
Now block cipher engines need to implement and maintain their own queue/thread
for processing requests. Moreover, helpers are currently provided only for the
queue itself (crypto_enqueue_request() and crypto_dequeue_request()); they
don't help with the mechanics of driving the hardware (things like running the
request immediately, DMA mapping it, or providing a thread to process the
queue in), even though a lot of that code really shouldn't vary much from
device to device.

Thus this patch provides a mechanism, which drivers can use, for pushing
requests to the hardware as it becomes free. The framework is patterned on
the SPI code, where it has worked out well.
(https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/
 drivers/spi/spi.c?id=ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0)

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 crypto/Kconfig  |3 +
 crypto/Makefile |1 +
 crypto/crypto_engine.c  |  355 +++
 include/crypto/algapi.h |   70 ++
 4 files changed, 429 insertions(+)
 create mode 100644 crypto/crypto_engine.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 7240821..90df301 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -227,6 +227,9 @@ config CRYPTO_GLUE_HELPER_X86
depends on X86
select CRYPTO_ALGAPI
 
+config CRYPTO_ENGINE
+   tristate
+
 comment "Authenticated Encryption with Associated Data"
 
 config CRYPTO_CCM
diff --git a/crypto/Makefile b/crypto/Makefile
index f7aba92..5490d21 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -7,6 +7,7 @@ crypto-y := api.o cipher.o compress.o memneq.o
 
 obj-$(CONFIG_CRYPTO_WORKQUEUE) += crypto_wq.o
 
+obj-$(CONFIG_CRYPTO_ENGINE) += crypto_engine.o
 obj-$(CONFIG_CRYPTO_FIPS) += fips.o
 
 crypto_algapi-$(CONFIG_PROC_FS) += proc.o
diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
new file mode 100644
index 000..a55c82d
--- /dev/null
+++ b/crypto/crypto_engine.c
@@ -0,0 +1,355 @@
+/*
+ * Handle async block request by crypto hardware engine.
+ *
+ * Copyright (C) 2016 Linaro, Inc.
+ *
+ * Author: Baolin Wang <baolin.w...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include 
+#include 
+#include "internal.h"
+
+#define CRYPTO_ENGINE_MAX_QLEN 10
+
+void crypto_finalize_request(struct crypto_engine *engine,
+struct ablkcipher_request *req, int err);
+
+/**
+ * crypto_pump_requests - dequeue one request from engine queue to process
+ * @engine: the hardware engine
+ * @in_kthread: true if we are in the context of the request pump thread
+ *
+ * This function checks if there is any request in the engine queue that
+ * needs processing and if so call out to the driver to initialize hardware
+ * and handle each request.
+ */
+static void crypto_pump_requests(struct crypto_engine *engine,
+bool in_kthread)
+{
+   struct crypto_async_request *async_req, *backlog;
+   struct ablkcipher_request *req;
+   unsigned long flags;
+   bool was_busy = false;
+   int ret;
+
+   spin_lock_irqsave(&engine->queue_lock, flags);
+
+   /* Make sure we are not already running a request */
+   if (engine->cur_req)
+   goto out;
+
+   /* If another context is idling then defer */
+   if (engine->idling) {
+   queue_kthread_work(&engine->kworker, &engine->pump_requests);
+   goto out;
+   }
+
+   /* Check if the engine queue is idle */
+   if (!crypto_queue_len(&engine->queue) || !engine->running) {
+   if (!engine->busy)
+   goto out;
+
+   /* Only do teardown in the thread */
+   if (!in_kthread) {
+   queue_kthread_work(&engine->kworker,
+  &engine->pump_requests);
+   goto out;
+   }
+
+   engine->busy = false;
+   engine->idling = true;
+   spin_unlock_irqrestore(&engine->queue_lock, flags);
+
+   if (engine->unprepare_crypt_hardware &&
+   engine->unprepare_crypt_hardware(engine))
+   pr_err("failed to unprepare crypt hardware\n");
+
+   spin_lock_irqsave(&engine->queue_lock, flags);
+   engine->idling = false;
+   goto out;
+   }
+
+   /* Get the first request from the engine queue to handle */
+   backlog = crypto_get_backlog(&engine->queue);
+   async_req = crypto_dequeue_request(&engine->queue);
+   if (!async_req)
+   goto out;
+
+   req = ablkcipher_request_cast(async_req);
+
+   engine->cur_req = 

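For reference, a sketch of how a driver is meant to plug into this framework,
assuming the crypto_engine_alloc_init()/crypto_engine_start() helpers and the
prepare_request/crypt_one_request callbacks that the engine API in
include/crypto/algapi.h exposes; my_dev, my_prepare() and my_crypt_hw() are
hypothetical driver state and functions, and error handling is trimmed down:

#include <crypto/algapi.h>

struct my_dev {
	struct crypto_engine *engine;
	/* ... hardware state ... */
};

/* Hypothetical driver callbacks, matching the engine callback signature. */
static int my_prepare(struct crypto_engine *engine,
		      struct ablkcipher_request *req);	/* e.g. DMA mapping  */
static int my_crypt_hw(struct crypto_engine *engine,
		       struct ablkcipher_request *req);	/* kick the hardware */

static int my_probe_engine(struct device *dev, struct my_dev *dd)
{
	dd->engine = crypto_engine_alloc_init(dev, 1);	/* realtime kthread */
	if (!dd->engine)
		return -ENOMEM;

	dd->engine->prepare_request = my_prepare;
	dd->engine->crypt_one_request = my_crypt_hw;

	return crypto_engine_start(dd->engine);
}

/* Submission path: dm-crypt's request ends up here via the ablkcipher ops. */
static int my_crypt(struct ablkcipher_request *req, struct my_dev *dd)
{
	return crypto_transfer_request_to_engine(dd->engine, req);
}

/* Completion (interrupt/tasklet) path: hand the request back to the engine. */
static void my_done(struct my_dev *dd, struct ablkcipher_request *req, int err)
{
	crypto_finalize_request(dd->engine, req, err);
}
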
[PATCH 3/3] crypto: omap-aes: Support crypto engine framework

2016-01-26 Thread Baolin Wang
Integrate with the newly added crypto engine framework. Previously the crypto
hardware engine was underutilized, as each block had to be fully processed
before the hardware could start working on the next block.

Requests from dm-crypt are now queued on the engine and processed by it
automatically, so remove the 'queue' and 'queue_task' handling in the
omap-aes driver.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/crypto/Kconfig|1 +
 drivers/crypto/omap-aes.c |   97 -
 2 files changed, 45 insertions(+), 53 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 2569e04..4945311 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -293,6 +293,7 @@ config CRYPTO_DEV_OMAP_AES
depends on ARCH_OMAP2 || ARCH_OMAP3 || ARCH_OMAP2PLUS
select CRYPTO_AES
select CRYPTO_BLKCIPHER
+   select CRYPTO_ENGINE
help
  OMAP processors have AES module accelerator. Select this if you
  want to use the OMAP module for AES algorithms.
diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index eba2314..4ee3acc 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DST_MAXBURST   4
 #define DMA_MIN(DST_MAXBURST * sizeof(u32))
@@ -152,13 +153,10 @@ struct omap_aes_dev {
unsigned long   flags;
int err;
 
-   spinlock_t  lock;
-   struct crypto_queue queue;
-
struct tasklet_struct   done_task;
-   struct tasklet_struct   queue_task;
 
struct ablkcipher_request   *req;
+   struct crypto_engine*engine;
 
/*
 * total is used by PIO mode for book keeping so introduce
@@ -532,9 +530,7 @@ static void omap_aes_finish_req(struct omap_aes_dev *dd, 
int err)
 
pr_debug("err: %d\n", err);
 
-   dd->flags &= ~FLAGS_BUSY;
-
-   req->base.complete(&req->base, err);
+   crypto_finalize_request(dd->engine, req, err);
 }
 
 static int omap_aes_crypt_dma_stop(struct omap_aes_dev *dd)
@@ -606,34 +602,25 @@ static int omap_aes_copy_sgs(struct omap_aes_dev *dd)
 }
 
 static int omap_aes_handle_queue(struct omap_aes_dev *dd,
-  struct ablkcipher_request *req)
+struct ablkcipher_request *req)
 {
-   struct crypto_async_request *async_req, *backlog;
-   struct omap_aes_ctx *ctx;
-   struct omap_aes_reqctx *rctx;
-   unsigned long flags;
-   int err, ret = 0, len;
-
-   spin_lock_irqsave(&dd->lock, flags);
if (req)
-   ret = ablkcipher_enqueue_request(&dd->queue, req);
-   if (dd->flags & FLAGS_BUSY) {
-   spin_unlock_irqrestore(&dd->lock, flags);
-   return ret;
-   }
-   backlog = crypto_get_backlog(&dd->queue);
-   async_req = crypto_dequeue_request(&dd->queue);
-   if (async_req)
-   dd->flags |= FLAGS_BUSY;
-   spin_unlock_irqrestore(&dd->lock, flags);
+   return crypto_transfer_request_to_engine(dd->engine, req);
 
-   if (!async_req)
-   return ret;
+   return 0;
+}
 
-   if (backlog)
-   backlog->complete(backlog, -EINPROGRESS);
+static int omap_aes_prepare_req(struct crypto_engine *engine,
+   struct ablkcipher_request *req)
+{
+   struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
+   crypto_ablkcipher_reqtfm(req));
+   struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
+   struct omap_aes_reqctx *rctx;
+   int len;
 
-   req = ablkcipher_request_cast(async_req);
+   if (!dd)
+   return -ENODEV;
 
/* assign new request to device */
dd->req = req;
@@ -664,16 +651,20 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd,
dd->ctx = ctx;
ctx->dd = dd;
 
-   err = omap_aes_write_ctrl(dd);
-   if (!err)
-   err = omap_aes_crypt_dma_start(dd);
-   if (err) {
-   /* aes_task will not finish it, so do it here */
-   omap_aes_finish_req(dd, err);
-   tasklet_schedule(&dd->queue_task);
-   }
+   return omap_aes_write_ctrl(dd);
+}
 
-   return ret; /* return ret, which is enqueue return value */
+static int omap_aes_crypt_req(struct crypto_engine *engine,
+ struct ablkcipher_request *req)
+{
+   struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
+   crypto_ablkcipher_reqtfm(req));
+   struct omap_aes_dev *dd = omap_aes_find_dev(ctx);
+
+   if (!dd)
+   return -ENODEV;
+
+   return omap_aes_crypt_dma_start(dd);
 }
 
 static void omap_aes_done_task(unsigned long data)
@@ -706,18 +697,10 @@ static void omap_a

[PATCH 1/3] crypto: Introduce crypto_queue_len() helper function

2016-01-26 Thread Baolin Wang
This patch introduces the crypto_queue_len() helper function to get the
current length of the crypto queue.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 include/crypto/algapi.h |4 
 1 file changed, 4 insertions(+)

diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index c9fe145..4f861c4 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -184,6 +184,10 @@ int crypto_enqueue_request(struct crypto_queue *queue,
   struct crypto_async_request *request);
 struct crypto_async_request *crypto_dequeue_request(struct crypto_queue 
*queue);
 int crypto_tfm_in_queue(struct crypto_queue *queue, struct crypto_tfm *tfm);
+static inline unsigned int crypto_queue_len(struct crypto_queue *queue)
+{
+   return queue->qlen;
+}
 
 /* These functions require the input/output to be aligned as u32. */
 void crypto_inc(u8 *a, unsigned int size);
-- 
1.7.9.5

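A one-line usage sketch, mirroring how the engine's pump function in patch 2/3
uses this helper to decide whether its queue has gone idle; the wrapper name is
illustrative only:

static bool example_engine_queue_idle(struct crypto_engine *engine)
{
	return !crypto_queue_len(&engine->queue);
}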