Re: [PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-10 Thread Dan Williams
On Fri, Feb 10, 2017 at 1:07 AM, Anup Patel  wrote:
> The Broadcom stream buffer accelerator (SBA) provides offloading
> capabilities for RAID operations. This SBA offload engine is
> accessible via a Broadcom SoC specific ring manager.
>
> This patch adds the Broadcom SBA RAID driver, which provides one
> DMA device with RAID capabilities using one or more Broadcom
> SoC specific ring manager channels. The SBA RAID driver in its
> current shape implements memcpy, xor, and pq operations.
>
> Signed-off-by: Anup Patel 
> Reviewed-by: Ray Jui 
> ---
>  drivers/dma/Kconfig|   13 +
>  drivers/dma/Makefile   |1 +
>  drivers/dma/bcm-sba-raid.c | 1711 
>  3 files changed, 1725 insertions(+)
>  create mode 100644 drivers/dma/bcm-sba-raid.c
>
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 263495d..bf8fb84 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -99,6 +99,19 @@ config AXI_DMAC
>   controller is often used in Analog Device's reference designs for FPGA
>   platforms.
>
> +config BCM_SBA_RAID
> +   tristate "Broadcom SBA RAID engine support"
> +   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
> +   select DMA_ENGINE
> +   select DMA_ENGINE_RAID
> +   select ASYNC_TX_ENABLE_CHANNEL_SWITCH

ASYNC_TX_ENABLE_CHANNEL_SWITCH violates the DMA mapping API and
Russell has warned it's especially problematic on ARM [1].  If you
need channel switching for this offload engine to be useful then you
need to move DMA mapping and channel switching responsibilities to MD
itself.

[1]: http://lists.infradead.org/pipermail/linux-arm-kernel/2011-January/036753.html


[..]
> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
> new file mode 100644
> index 0000000..bab9918
> --- /dev/null
> +++ b/drivers/dma/bcm-sba-raid.c
> @@ -0,0 +1,1711 @@
> +/*
> + * Copyright (C) 2017 Broadcom
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +/*
> + * Broadcom SBA RAID Driver
> + *
> + * The Broadcom stream buffer accelerator (SBA) provides offloading
> + * capabilities for RAID operations. The SBA offload engine is accessible
> + * via a Broadcom SoC specific ring manager. Two or more offload engines
> + * can share the same ring manager; because of this, the Broadcom SoC
> + * specific ring manager driver is implemented as a mailbox controller
> + * driver, and offload engine drivers are implemented as mailbox clients.
> + *
> + * Typically, a Broadcom SoC specific ring manager implements a large
> + * number of hardware rings over one or more SBA hardware devices. By
> + * design, the internal buffer size of an SBA hardware device is limited,
> + * but all offload operations supported by SBA can be broken down into
> + * multiple small-size requests and executed in parallel on multiple SBA
> + * hardware devices to achieve high throughput.
> + *
> + * The Broadcom SBA RAID driver does not require any register programming
> + * except submitting requests to the SBA hardware device via mailbox
> + * channels. This driver implements a DMA device with one DMA channel
> + * using a set of mailbox channels provided by the Broadcom SoC specific
> + * ring manager driver. To exploit parallelism (as described above), all
> + * DMA requests coming to the SBA RAID DMA channel are broken down into
> + * smaller requests and submitted to multiple mailbox channels in a
> + * round-robin fashion. To have more SBA DMA channels, we can create more
> + * SBA device nodes in the Broadcom SoC specific DTS, based on the number
> + * of hardware rings supported by the Broadcom SoC ring manager.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dmaengine.h"
> +
> +/* SBA command helper macros */
> +#define SBA_DEC(_d, _s, _m)(((_d) >> (_s)) & (_m))
> +#define SBA_ENC(_d, _v, _s, _m)\
> +   do {\
> +   (_d) &= ~((u64)(_m) << (_s));   \
> +   (_d) |= (((u64)(_v) & (_m)) << (_s));   \
> +   } while (0)

Reusing a macro argument multiple times is problematic (consider
SBA_ENC(..., arg++, ...)), and hiding an assignment in a macro makes this
hard to read. The compiler should inline it properly if you just make
this a function that returns a value. You could also mark it __pure.
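
As an illustration (editor's sketch, not code from the patch; sba_enc()
is a hypothetical name), the suggested function form could look like:

static inline u64 sba_enc(u64 cmd, u64 val, u32 shift, u64 mask)
{
        /* clear the field, then insert the new value */
        cmd &= ~(mask << shift);
        cmd |= (val & mask) << shift;
        return cmd;
}

Callers would then assign explicitly, e.g.
        cmd = sba_enc(cmd, type, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
so each argument is evaluated exactly once and the assignment is visible.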

[..]
> +
> +static struct sba_request *sba_alloc_request(struct sba_device *sba)
> +{
> +   unsigned long flags;
> +   struct sba_request *req = NULL;
> +
> +   spin_lock_irqsave(&sba->reqs_lock, flags);
> +
> +   if (!list_empty(&sba->reqs_free_list)) {
> +   req = list_first_entry(&sba->reqs_free_l

Re: [PATCH v5] KEYS: add SP800-56A KDF support for DH

2017-02-10 Thread Stephan Müller
On Friday, 19 August 2016, 20:39:09 CET, Stephan Mueller wrote:

Hi David,

> Hi,
> 
> This patch now folds the KDF into the keys support as requested by
> Herbert. The caller can only supply the hash name used for the KDF.
> 
> Note, the KDF implementation is identical to the kdf_ctr() support in
> the now unneeded KDF patches to the kernel crypto API.
> 
> The new patch also changes the variable name from kdfname to hashname.
> 
> Also, the patch adds a missing semicolon.
> 
> Finally, the patch adds a guard against compiling the compat code
> if the general Linux kernel configuration does not have the compat
> code enabled. Without that guard, compilation warnings are seen.

May I ask what plans you have for the KDF support for DH?

Ciao
Stephan


[PATCH 03/12] crypto: caam - fix JR IO mapping if one fails

2017-02-10 Thread Horia Geantă
From: Tudor Ambarus 

If one of the JRs failed at init, the next JR used
the failed JR's IO space. The patch fixes this bug.

Signed-off-by: Tudor Ambarus 
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/ctrl.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index f825e3765a4b..579f8263c479 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -308,10 +308,8 @@ static int caam_remove(struct platform_device *pdev)
ctrl = (struct caam_ctrl __iomem *)ctrlpriv->ctrl;
 
/* Remove platform devices for JobRs */
-   for (ring = 0; ring < ctrlpriv->total_jobrs; ring++) {
-   if (ctrlpriv->jrpdev[ring])
-   of_device_unregister(ctrlpriv->jrpdev[ring]);
-   }
+   for (ring = 0; ring < ctrlpriv->total_jobrs; ring++)
+   of_device_unregister(ctrlpriv->jrpdev[ring]);
 
/* De-initialize RNG state handles initialized by this driver. */
if (ctrlpriv->rng4_sh_init)
@@ -423,7 +421,7 @@ DEFINE_SIMPLE_ATTRIBUTE(caam_fops_u64_ro, caam_debugfs_u64_get, NULL, "%llu\n");
 /* Probe routine for CAAM top (controller) level */
 static int caam_probe(struct platform_device *pdev)
 {
-   int ret, ring, rspec, gen_sk, ent_delay = RTSDCTL_ENT_DLY_MIN;
+   int ret, ring, ridx, rspec, gen_sk, ent_delay = RTSDCTL_ENT_DLY_MIN;
u64 caam_id;
struct device *dev;
struct device_node *nprop, *np;
@@ -618,6 +616,7 @@ static int caam_probe(struct platform_device *pdev)
}
 
ring = 0;
+   ridx = 0;
ctrlpriv->total_jobrs = 0;
for_each_available_child_of_node(nprop, np)
if (of_device_is_compatible(np, "fsl,sec-v4.0-job-ring") ||
@@ -625,17 +624,19 @@ static int caam_probe(struct platform_device *pdev)
ctrlpriv->jrpdev[ring] =
of_platform_device_create(np, NULL, dev);
if (!ctrlpriv->jrpdev[ring]) {
-   pr_warn("JR%d Platform device creation error\n",
-   ring);
+   pr_warn("JR physical index %d: Platform device 
creation error\n",
+   ridx);
+   ridx++;
continue;
}
ctrlpriv->jr[ring] = (struct caam_job_ring __iomem __force *)
 ((__force uint8_t *)ctrl +
-(ring + JR_BLOCK_NUMBER) *
+(ridx + JR_BLOCK_NUMBER) *
  BLOCK_OFFSET
 );
ctrlpriv->total_jobrs++;
ring++;
+   ridx++;
}
 
/* Check to see if QI present. If so, enable */
-- 
2.4.4



[PATCH 02/12] crypto: caam - check return code of dma_set_mask_and_coherent()

2017-02-10 Thread Horia Geantă
Setting the dma mask could fail, thus make sure it succeeds
before going further.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/ctrl.c | 15 ++-
 drivers/crypto/caam/jr.c   | 19 ++-
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index 8957ec952212..f825e3765a4b 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -586,13 +586,18 @@ static int caam_probe(struct platform_device *pdev)
  JRSTART_JR1_START | JRSTART_JR2_START |
  JRSTART_JR3_START);
 
-   if (sizeof(dma_addr_t) == sizeof(u64))
+   if (sizeof(dma_addr_t) == sizeof(u64)) {
if (of_device_is_compatible(nprop, "fsl,sec-v5.0"))
-   dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+   ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
else
-   dma_set_mask_and_coherent(dev, DMA_BIT_MASK(36));
-   else
-   dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+   ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(36));
+   } else {
+   ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+   }
+   if (ret) {
+   dev_err(dev, "dma_set_mask_and_coherent failed (%d)\n", ret);
+   goto iounmap_ctrl;
+   }
 
/*
 * Detect and enable JobRs
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index c8604dfadbf5..27631000b9f8 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -498,13 +498,22 @@ static int caam_jr_probe(struct platform_device *pdev)
 
jrpriv->rregs = (struct caam_job_ring __iomem __force *)ctrl;
 
-   if (sizeof(dma_addr_t) == sizeof(u64))
+   if (sizeof(dma_addr_t) == sizeof(u64)) {
if (of_device_is_compatible(nprop, "fsl,sec-v5.0-job-ring"))
-   dma_set_mask_and_coherent(jrdev, DMA_BIT_MASK(40));
+   error = dma_set_mask_and_coherent(jrdev,
+ DMA_BIT_MASK(40));
else
-   dma_set_mask_and_coherent(jrdev, DMA_BIT_MASK(36));
-   else
-   dma_set_mask_and_coherent(jrdev, DMA_BIT_MASK(32));
+   error = dma_set_mask_and_coherent(jrdev,
+ DMA_BIT_MASK(36));
+   } else {
+   error = dma_set_mask_and_coherent(jrdev, DMA_BIT_MASK(32));
+   }
+   if (error) {
+   dev_err(jrdev, "dma_set_mask_and_coherent failed (%d)\n",
+   error);
+   iounmap(ctrl);
+   return error;
+   }
 
/* Identify the interrupt */
jrpriv->irq = irq_of_parse_and_map(nprop, 0);
-- 
2.4.4



[PATCH 10/12] crypto: caam - fix error path for ctx_dma mapping failure

2017-02-10 Thread Horia Geantă
In case ctx_dma dma mapping fails, ahash_unmap_ctx() tries to
dma unmap an invalid address:
map_seq_out_ptr_ctx() / ctx_map_to_sec4_sg() -> goto unmap_ctx ->
-> ahash_unmap_ctx() -> dma unmap ctx_dma

It is also possible to reach ahash_unmap_ctx() with ctx_dma
uninitialized, or to try to unmap the same address twice.

Fix these by setting ctx_dma = 0 where needed:
-initialize ctx_dma in ahash_init()
-clear ctx_dma in case of mapping error (instead of holding
the error code returned by the dma map function)
-clear ctx_dma after each unmapping
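
Condensed from the hunks below (editor's sketch of the invariant, not a
new hunk), the rule after this patch is that ctx_dma == 0 always means
"not mapped":

        state->ctx_dma = dma_map_single(jrdev, state->caam_ctx, ctx_len, flag);
        if (dma_mapping_error(jrdev, state->ctx_dma)) {
                state->ctx_dma = 0;     /* don't keep the error cookie */
                return -ENOMEM;
        }

        /* ... and every unmap site clears it again: */
        if (state->ctx_dma) {
                dma_unmap_single(dev, state->ctx_dma, ctx->ctx_len, flag);
                state->ctx_dma = 0;     /* guards against double unmap */
        }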

Fixes: 32686d34f8fb6 ("crypto: caam - ensure that we clean up after an error")
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamhash.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 2ad83a8dc0fe..6c6c005f417b 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -148,6 +148,7 @@ static inline int map_seq_out_ptr_ctx(u32 *desc, struct device *jrdev,
ctx_len, DMA_FROM_DEVICE);
if (dma_mapping_error(jrdev, state->ctx_dma)) {
dev_err(jrdev, "unable to map ctx\n");
+   state->ctx_dma = 0;
return -ENOMEM;
}
 
@@ -208,6 +209,7 @@ static inline int ctx_map_to_sec4_sg(u32 *desc, struct device *jrdev,
state->ctx_dma = dma_map_single(jrdev, state->caam_ctx, ctx_len, flag);
if (dma_mapping_error(jrdev, state->ctx_dma)) {
dev_err(jrdev, "unable to map ctx\n");
+   state->ctx_dma = 0;
return -ENOMEM;
}
 
@@ -482,8 +484,10 @@ static inline void ahash_unmap_ctx(struct device *dev,
struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
struct caam_hash_state *state = ahash_request_ctx(req);
 
-   if (state->ctx_dma)
+   if (state->ctx_dma) {
dma_unmap_single(dev, state->ctx_dma, ctx->ctx_len, flag);
+   state->ctx_dma = 0;
+   }
ahash_unmap(dev, edesc, req, dst_len);
 }
 
@@ -1463,6 +1467,7 @@ static int ahash_init(struct ahash_request *req)
state->finup = ahash_finup_first;
state->final = ahash_final_no_ctx;
 
+   state->ctx_dma = 0;
state->current_buf = 0;
state->buf_dma = 0;
state->buflen_0 = 0;
-- 
2.4.4



[PATCH 11/12] crypto: caam - abstract ahash request double buffering

2017-02-10 Thread Horia Geantă
caamhash uses double buffering for holding previous/current
and next chunks (data smaller than block size) to be hashed.

Add (inline) functions to abstract this mechanism.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamhash.c | 77 ++
 1 file changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 6c6c005f417b..b37d555a80d0 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -137,6 +137,31 @@ struct caam_export_state {
int (*finup)(struct ahash_request *req);
 };
 
+static inline void switch_buf(struct caam_hash_state *state)
+{
+   state->current_buf ^= 1;
+}
+
+static inline u8 *current_buf(struct caam_hash_state *state)
+{
+   return state->current_buf ? state->buf_1 : state->buf_0;
+}
+
+static inline u8 *alt_buf(struct caam_hash_state *state)
+{
+   return state->current_buf ? state->buf_0 : state->buf_1;
+}
+
+static inline int *current_buflen(struct caam_hash_state *state)
+{
+   return state->current_buf ? &state->buflen_1 : &state->buflen_0;
+}
+
+static inline int *alt_buflen(struct caam_hash_state *state)
+{
+   return state->current_buf ? &state->buflen_0 : &state->buflen_1;
+}
+
 /* Common job descriptor seq in/out ptr routines */
 
 /* Map state->caam_ctx, and append seq_out_ptr command that points to it */
@@ -695,11 +720,10 @@ static int ahash_update_ctx(struct ahash_request *req)
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
-   u8 *buf = state->current_buf ? state->buf_1 : state->buf_0;
-   int *buflen = state->current_buf ? &state->buflen_1 : &state->buflen_0;
-   u8 *next_buf = state->current_buf ? state->buf_0 : state->buf_1;
-   int *next_buflen = state->current_buf ? &state->buflen_0 :
-  &state->buflen_1, last_buflen;
+   u8 *buf = current_buf(state);
+   int *buflen = current_buflen(state);
+   u8 *next_buf = alt_buf(state);
+   int *next_buflen = alt_buflen(state), last_buflen;
int in_len = *buflen + req->nbytes, to_hash;
u32 *desc;
int src_nents, mapped_nents, sec4_sg_bytes, sec4_sg_src_index;
@@ -771,7 +795,7 @@ static int ahash_update_ctx(struct ahash_request *req)
cpu_to_caam32(SEC4_SG_LEN_FIN);
}
 
-   state->current_buf = !state->current_buf;
+   switch_buf(state);
 
desc = edesc->hw_desc;
 
@@ -829,10 +853,9 @@ static int ahash_final_ctx(struct ahash_request *req)
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
-   u8 *buf = state->current_buf ? state->buf_1 : state->buf_0;
-   int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
-   int last_buflen = state->current_buf ? state->buflen_0 :
- state->buflen_1;
+   u8 *buf = current_buf(state);
+   int buflen = *current_buflen(state);
+   int last_buflen = *alt_buflen(state);
u32 *desc;
int sec4_sg_bytes, sec4_sg_src_index;
int digestsize = crypto_ahash_digestsize(ahash);
@@ -908,10 +931,9 @@ static int ahash_finup_ctx(struct ahash_request *req)
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
-   u8 *buf = state->current_buf ? state->buf_1 : state->buf_0;
-   int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
-   int last_buflen = state->current_buf ? state->buflen_0 :
- state->buflen_1;
+   u8 *buf = current_buf(state);
+   int buflen = *current_buflen(state);
+   int last_buflen = *alt_buflen(state);
u32 *desc;
int sec4_sg_src_index;
int src_nents, mapped_nents;
@@ -1075,8 +1097,8 @@ static int ahash_final_no_ctx(struct ahash_request *req)
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
-   u8 *buf = state->current_buf ? state->buf_1 : state->buf_0;
-   int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
+   u8 *buf = current_buf(state);
+   int buflen = *current_buflen(state);
u32 *desc;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
@@ -1136,11 +1158,10 @@ static int ahash_update_no_ctx(struct ahash_request *req)
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_T

[PATCH 01/12] crypto: caam - don't include unneeded headers

2017-02-10 Thread Horia Geantă
intern.h, jr.h are not needed in error.c
error.h is not needed in ctrl.c

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/ctrl.c  | 1 -
 drivers/crypto/caam/error.c | 2 --
 2 files changed, 3 deletions(-)

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index 755109841cfd..8957ec952212 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -13,7 +13,6 @@
 #include "intern.h"
 #include "jr.h"
 #include "desc_constr.h"
-#include "error.h"
 #include "ctrl.h"
 
 bool caam_little_end;
diff --git a/drivers/crypto/caam/error.c b/drivers/crypto/caam/error.c
index 79a0cc70717f..6f44ccb55c63 100644
--- a/drivers/crypto/caam/error.c
+++ b/drivers/crypto/caam/error.c
@@ -6,9 +6,7 @@
 
 #include "compat.h"
 #include "regs.h"
-#include "intern.h"
 #include "desc.h"
-#include "jr.h"
 #include "error.h"
 
 static const struct {
-- 
2.4.4



[PATCH 06/12] crypto: caam - replace sg_count() with sg_nents_for_len()

2017-02-10 Thread Horia Geantă
Replace internal sg_count() function and the convoluted logic
around it with the standard sg_nents_for_len() function.
src_nents, dst_nents now hold the number of SW S/G entries,
instead of the HW S/G table entries.

With this change, null (zero-length) input data for the AEAD case
needs to be handled in a visible way: req->src is no longer (un)mapped,
and the pointer address is set to 0 in the SEQ IN PTR command.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c| 189 ++-
 drivers/crypto/caam/sg_sw_sec4.h |  11 ---
 2 files changed, 88 insertions(+), 112 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index ed8a04412767..14b7dc8d5dcb 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -887,8 +887,8 @@ static int xts_ablkcipher_setkey(struct crypto_ablkcipher *ablkcipher,
 
 /*
  * aead_edesc - s/w-extended aead descriptor
- * @src_nents: number of segments in input scatterlist
- * @dst_nents: number of segments in output scatterlist
+ * @src_nents: number of segments in input s/w scatterlist
+ * @dst_nents: number of segments in output s/w scatterlist
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @sec4_sg_dma: bus physical mapped address of h/w link table
  * @sec4_sg: pointer to h/w link table
@@ -905,8 +905,8 @@ struct aead_edesc {
 
 /*
  * ablkcipher_edesc - s/w-extended ablkcipher descriptor
- * @src_nents: number of segments in input scatterlist
- * @dst_nents: number of segments in output scatterlist
+ * @src_nents: number of segments in input s/w scatterlist
+ * @dst_nents: number of segments in output s/w scatterlist
  * @iv_dma: dma address of iv for checking continuity and link table
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
  * @sec4_sg_dma: bus physical mapped address of h/w link table
@@ -930,10 +930,11 @@ static void caam_unmap(struct device *dev, struct scatterlist *src,
   int sec4_sg_bytes)
 {
if (dst != src) {
-   dma_unmap_sg(dev, src, src_nents ? : 1, DMA_TO_DEVICE);
-   dma_unmap_sg(dev, dst, dst_nents ? : 1, DMA_FROM_DEVICE);
+   if (src_nents)
+   dma_unmap_sg(dev, src, src_nents, DMA_TO_DEVICE);
+   dma_unmap_sg(dev, dst, dst_nents, DMA_FROM_DEVICE);
} else {
-   dma_unmap_sg(dev, src, src_nents ? : 1, DMA_BIDIRECTIONAL);
+   dma_unmap_sg(dev, src, src_nents, DMA_BIDIRECTIONAL);
}
 
if (iv_dma)
@@ -1102,7 +1103,7 @@ static void init_aead_job(struct aead_request *req,
init_job_desc_shared(desc, ptr, len, HDR_SHARE_DEFER | HDR_REVERSE);
 
if (all_contig) {
-   src_dma = sg_dma_address(req->src);
+   src_dma = edesc->src_nents ? sg_dma_address(req->src) : 0;
in_options = 0;
} else {
src_dma = edesc->sec4_sg_dma;
@@ -1117,7 +1118,7 @@ static void init_aead_job(struct aead_request *req,
out_options = in_options;
 
if (unlikely(req->src != req->dst)) {
-   if (!edesc->dst_nents) {
+   if (edesc->dst_nents == 1) {
dst_dma = sg_dma_address(req->dst);
} else {
dst_dma = edesc->sec4_sg_dma +
@@ -1227,10 +1228,11 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
print_hex_dump(KERN_ERR, "presciv@"__stringify(__LINE__)": ",
   DUMP_PREFIX_ADDRESS, 16, 4, req->info,
   ivsize, 1);
-   printk(KERN_ERR "asked=%d, nbytes%d\n", (int)edesc->src_nents ? 100 : req->nbytes, req->nbytes);
+   pr_err("asked=%d, nbytes%d\n",
+  (int)edesc->src_nents > 1 ? 100 : req->nbytes, req->nbytes);
dbg_dump_sg(KERN_ERR, "src@"__stringify(__LINE__)": ",
DUMP_PREFIX_ADDRESS, 16, 4, req->src,
-   edesc->src_nents ? 100 : req->nbytes, 1);
+   edesc->src_nents > 1 ? 100 : req->nbytes, 1);
 #endif
 
len = desc_len(sh_desc);
@@ -1247,7 +1249,7 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
append_seq_in_ptr(desc, src_dma, req->nbytes + ivsize, in_options);
 
if (likely(req->src == req->dst)) {
-   if (!edesc->src_nents && iv_contig) {
+   if (edesc->src_nents == 1 && iv_contig) {
dst_dma = sg_dma_address(req->src);
} else {
dst_dma = edesc->sec4_sg_dma +
@@ -1255,7 +1257,7 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
out_options = LDST_SGF;
}
} else {
-   if (!edesc->dst_nents) {
+   if (edesc->dst_nents == 1) {
dst_dma = sg_dma_address(req->dst);
} else {
dst_dma = edesc->sec4_sg_dma +
@@ -1287,13 +1289,13 @@ static void init_ablkcipher

[PATCH 05/12] crypto: caam - check sg_count() return value

2017-02-10 Thread Horia Geantă
sg_count() internally calls sg_nents_for_len(), which could fail
in case the required number of bytes is larger than the total
bytes in the S/G.

Thus, add checks to validate the input.

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c | 44 +--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 05d4690351b9..ed8a04412767 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1335,13 +1335,31 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
 
if (unlikely(req->dst != req->src)) {
src_nents = sg_count(req->src, req->assoclen + req->cryptlen);
+   if (unlikely(src_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
+   req->assoclen + req->cryptlen);
+   return ERR_PTR(src_nents);
+   }
+
dst_nents = sg_count(req->dst,
 req->assoclen + req->cryptlen +
(encrypt ? authsize : (-authsize)));
+   if (unlikely(dst_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in dst S/G\n",
+   req->assoclen + req->cryptlen +
+   (encrypt ? authsize : (-authsize)));
+   return ERR_PTR(dst_nents);
+   }
} else {
src_nents = sg_count(req->src,
 req->assoclen + req->cryptlen +
(encrypt ? authsize : 0));
+   if (unlikely(src_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
+   req->assoclen + req->cryptlen +
+   (encrypt ? authsize : 0));
+   return ERR_PTR(src_nents);
+   }
}
 
/* Check if data are contiguous. */
@@ -1609,9 +1627,20 @@ static struct ablkcipher_edesc *ablkcipher_edesc_alloc(struct ablkcipher_request
int sec4_sg_index;
 
src_nents = sg_count(req->src, req->nbytes);
+   if (unlikely(src_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
+   req->nbytes);
+   return ERR_PTR(src_nents);
+   }
 
-   if (req->dst != req->src)
+   if (req->dst != req->src) {
dst_nents = sg_count(req->dst, req->nbytes);
+   if (unlikely(dst_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in dst S/G\n",
+   req->nbytes);
+   return ERR_PTR(dst_nents);
+   }
+   }
 
if (likely(req->src == req->dst)) {
sgc = dma_map_sg(jrdev, req->src, src_nents ? : 1,
@@ -1807,6 +1836,11 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
int sec4_sg_index;
 
src_nents = sg_count(req->src, req->nbytes);
+   if (unlikely(src_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in src S/G\n",
+   req->nbytes);
+   return ERR_PTR(src_nents);
+   }
 
if (likely(req->src == req->dst)) {
sgc = dma_map_sg(jrdev, req->src, src_nents ? : 1,
@@ -1826,6 +1860,12 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
}
 
dst_nents = sg_count(req->dst, req->nbytes);
+   if (unlikely(dst_nents < 0)) {
+   dev_err(jrdev, "Insufficient bytes (%d) in dst S/G\n",
+   req->nbytes);
+   return ERR_PTR(dst_nents);
+   }
+
sgc = dma_map_sg(jrdev, req->dst, dst_nents ? : 1,
 DMA_FROM_DEVICE);
if (unlikely(!sgc)) {
@@ -1914,7 +1954,7 @@ static int ablkcipher_givencrypt(struct skcipher_givcrypt_request *creq)
struct crypto_ablkcipher *ablkcipher = crypto_ablkcipher_reqtfm(req);
struct caam_ctx *ctx = crypto_ablkcipher_ctx(ablkcipher);
struct device *jrdev = ctx->jrdev;
-   bool iv_contig;
+   bool iv_contig = false;
u32 *desc;
int ret = 0;
 
-- 
2.4.4



[PATCH 09/12] crypto: caam - fix DMA API leaks for multiple setkey() calls

2017-02-10 Thread Horia Geantă
setkey() callback may be invoked multiple times for the same tfm.
In this case, DMA API leaks are caused by shared descriptors
(and key for caamalg) being mapped several times and unmapped only once.
Fix this by performing mapping / unmapping only in the crypto algorithm's
cra_init() / cra_exit() callbacks, and by calling
dma_sync_single_for_device() in the setkey() tfm callback.
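
The resulting lifecycle, condensed (editor's sketch based on the commit
message; the cra_init()/cra_exit() hunks are truncated from this message):

        /* cra_init(): map the per-tfm shared descriptor buffer once */
        ctx->sh_desc_enc_dma = dma_map_single(ctx->jrdev, ctx->sh_desc_enc,
                                              sizeof(ctx->sh_desc_enc),
                                              DMA_TO_DEVICE);
        if (dma_mapping_error(ctx->jrdev, ctx->sh_desc_enc_dma))
                return -ENOMEM;

        /* setkey(): rewrite the descriptor in place, then push it out */
        dma_sync_single_for_device(ctx->jrdev, ctx->sh_desc_enc_dma,
                                   desc_bytes(ctx->sh_desc_enc), DMA_TO_DEVICE);

        /* cra_exit(): unmap once, no matter how many setkey() calls ran */
        dma_unmap_single(ctx->jrdev, ctx->sh_desc_enc_dma,
                         sizeof(ctx->sh_desc_enc), DMA_TO_DEVICE);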

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c  | 275 +++--
 drivers/crypto/caam/caamhash.c |  79 +---
 2 files changed, 102 insertions(+), 252 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 71d09e896d48..9bc80eb06934 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -134,15 +134,15 @@ struct caam_aead_alg {
  * per-session context
  */
 struct caam_ctx {
-   struct device *jrdev;
u32 sh_desc_enc[DESC_MAX_USED_LEN];
u32 sh_desc_dec[DESC_MAX_USED_LEN];
u32 sh_desc_givenc[DESC_MAX_USED_LEN];
+   u8 key[CAAM_MAX_KEY_SIZE];
dma_addr_t sh_desc_enc_dma;
dma_addr_t sh_desc_dec_dma;
dma_addr_t sh_desc_givenc_dma;
-   u8 key[CAAM_MAX_KEY_SIZE];
dma_addr_t key_dma;
+   struct device *jrdev;
struct alginfo adata;
struct alginfo cdata;
unsigned int authsize;
@@ -171,13 +171,8 @@ static int aead_null_set_sh_desc(struct crypto_aead *aead)
/* aead_encrypt shared descriptor */
desc = ctx->sh_desc_enc;
cnstr_shdsc_aead_null_encap(desc, &ctx->adata, ctx->authsize);
-   ctx->sh_desc_enc_dma = dma_map_single(jrdev, desc,
- desc_bytes(desc),
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->sh_desc_enc_dma)) {
-   dev_err(jrdev, "unable to map shared descriptor\n");
-   return -ENOMEM;
-   }
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
+  desc_bytes(desc), DMA_TO_DEVICE);
 
/*
 * Job Descriptor and Shared Descriptors
@@ -194,13 +189,8 @@ static int aead_null_set_sh_desc(struct crypto_aead *aead)
/* aead_decrypt shared descriptor */
desc = ctx->sh_desc_dec;
cnstr_shdsc_aead_null_decap(desc, &ctx->adata, ctx->authsize);
-   ctx->sh_desc_dec_dma = dma_map_single(jrdev, desc,
- desc_bytes(desc),
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->sh_desc_dec_dma)) {
-   dev_err(jrdev, "unable to map shared descriptor\n");
-   return -ENOMEM;
-   }
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
+  desc_bytes(desc), DMA_TO_DEVICE);
 
return 0;
 }
@@ -278,13 +268,8 @@ static int aead_set_sh_desc(struct crypto_aead *aead)
desc = ctx->sh_desc_enc;
cnstr_shdsc_aead_encap(desc, &ctx->cdata, &ctx->adata, ctx->authsize,
   is_rfc3686, nonce, ctx1_iv_off);
-   ctx->sh_desc_enc_dma = dma_map_single(jrdev, desc,
- desc_bytes(desc),
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->sh_desc_enc_dma)) {
-   dev_err(jrdev, "unable to map shared descriptor\n");
-   return -ENOMEM;
-   }
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
+  desc_bytes(desc), DMA_TO_DEVICE);
 
 skip_enc:
/*
@@ -315,13 +300,8 @@ static int aead_set_sh_desc(struct crypto_aead *aead)
cnstr_shdsc_aead_decap(desc, &ctx->cdata, &ctx->adata, ivsize,
   ctx->authsize, alg->caam.geniv, is_rfc3686,
   nonce, ctx1_iv_off);
-   ctx->sh_desc_dec_dma = dma_map_single(jrdev, desc,
- desc_bytes(desc),
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->sh_desc_dec_dma)) {
-   dev_err(jrdev, "unable to map shared descriptor\n");
-   return -ENOMEM;
-   }
+   dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
+  desc_bytes(desc), DMA_TO_DEVICE);
 
if (!alg->caam.geniv)
goto skip_givenc;
@@ -354,13 +334,8 @@ static int aead_set_sh_desc(struct crypto_aead *aead)
cnstr_shdsc_aead_givencap(desc, &ctx->cdata, &ctx->adata, ivsize,
  ctx->authsize, is_rfc3686, nonce,
  ctx1_iv_off);
-   ctx->sh_desc_enc_dma = dma_map_single(jrdev, desc,
- desc_bytes(desc),
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->sh_desc_enc_dma)) {
-   de

[PATCH 07/12] crypto: caam - use dma_map_sg() return code

2017-02-10 Thread Horia Geantă
dma_map_sg() might coalesce S/G entries, so use the number of S/G
entries returned by it instead of what sg_nents_for_len() initially
returns.
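
In other words (editor's sketch of the pattern, with names taken from the
driver): count entries with sg_nents_for_len(), size the HW link table from
what dma_map_sg() returns, and keep the original count for unmapping:

        int src_nents, mapped_nents, sec4_sg_len;

        src_nents = sg_nents_for_len(req->src, req->nbytes);
        if (src_nents < 0)
                return src_nents;

        mapped_nents = dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
        if (!mapped_nents)
                return -ENOMEM;

        /* the IOMMU may have coalesced entries: size the HW S/G from this */
        sec4_sg_len = mapped_nents > 1 ? mapped_nents : 0;

        /* ... but dma_unmap_sg() must still be called with src_nents */
        dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);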

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c | 133 ++
 1 file changed, 71 insertions(+), 62 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 14b7dc8d5dcb..71d09e896d48 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1328,9 +1328,8 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
struct device *jrdev = ctx->jrdev;
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
-   int src_nents, dst_nents = 0;
+   int src_nents, mapped_src_nents, dst_nents = 0, mapped_dst_nents = 0;
struct aead_edesc *edesc;
-   int sgc;
int sec4_sg_index, sec4_sg_len, sec4_sg_bytes;
unsigned int authsize = ctx->authsize;
 
@@ -1365,60 +1364,62 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
}
}
 
-   sec4_sg_len = src_nents > 1 ? src_nents : 0;
-   sec4_sg_len += dst_nents > 1 ? dst_nents : 0;
-   sec4_sg_bytes = sec4_sg_len * sizeof(struct sec4_sg_entry);
-
-   /* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
-   GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
-   return ERR_PTR(-ENOMEM);
-   }
-
if (likely(req->src == req->dst)) {
-   sgc = dma_map_sg(jrdev, req->src, src_nents, DMA_BIDIRECTIONAL);
-   if (unlikely(!sgc)) {
+   mapped_src_nents = dma_map_sg(jrdev, req->src, src_nents,
+ DMA_BIDIRECTIONAL);
+   if (unlikely(!mapped_src_nents)) {
dev_err(jrdev, "unable to map source\n");
-   kfree(edesc);
return ERR_PTR(-ENOMEM);
}
} else {
/* Cover also the case of null (zero length) input data */
if (src_nents) {
-   sgc = dma_map_sg(jrdev, req->src, src_nents,
-DMA_TO_DEVICE);
-   if (unlikely(!sgc)) {
+   mapped_src_nents = dma_map_sg(jrdev, req->src,
+ src_nents, DMA_TO_DEVICE);
+   if (unlikely(!mapped_src_nents)) {
dev_err(jrdev, "unable to map source\n");
-   kfree(edesc);
return ERR_PTR(-ENOMEM);
}
+   } else {
+   mapped_src_nents = 0;
}
 
-   sgc = dma_map_sg(jrdev, req->dst, dst_nents, DMA_FROM_DEVICE);
-   if (unlikely(!sgc)) {
+   mapped_dst_nents = dma_map_sg(jrdev, req->dst, dst_nents,
+ DMA_FROM_DEVICE);
+   if (unlikely(!mapped_dst_nents)) {
dev_err(jrdev, "unable to map destination\n");
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
-   kfree(edesc);
return ERR_PTR(-ENOMEM);
}
}
 
+   sec4_sg_len = mapped_src_nents > 1 ? mapped_src_nents : 0;
+   sec4_sg_len += mapped_dst_nents > 1 ? mapped_dst_nents : 0;
+   sec4_sg_bytes = sec4_sg_len * sizeof(struct sec4_sg_entry);
+
+   /* allocate space for base edesc and hw desc commands, link tables */
+   edesc = kzalloc(sizeof(*edesc) + desc_bytes + sec4_sg_bytes,
+   GFP_DMA | flags);
+   if (!edesc) {
+   caam_unmap(jrdev, req->src, req->dst, src_nents, dst_nents, 0,
+  0, 0, 0);
+   return ERR_PTR(-ENOMEM);
+   }
+
edesc->src_nents = src_nents;
edesc->dst_nents = dst_nents;
edesc->sec4_sg = (void *)edesc + sizeof(struct aead_edesc) +
 desc_bytes;
-   *all_contig_ptr = !(src_nents > 1);
+   *all_contig_ptr = !(mapped_src_nents > 1);
 
sec4_sg_index = 0;
-   if (src_nents > 1) {
-   sg_to_sec4_sg_last(req->src, src_nents,
- edesc->sec4_sg + sec4_sg_index, 0);
-   sec4_sg_index += src_nents;
+   if (mapped_src_nents > 1) {
+   sg_to_sec4_sg_last(req->src, mapped_src_nents,
+  edesc->sec4_sg + sec4_sg_index, 0);
+   sec4_sg_index += mapped_src_nents;
}
-   if (dst_nents > 1) {
-   sg_to_sec4_sg_last(req->dst, dst_nents,
+   

[PATCH 04/12] crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc()

2017-02-10 Thread Horia Geantă
HW S/G generation does not work properly when the following conditions
are met:
-src == dst
-src/dst is S/G
-IV is right before (contiguous with) the first src/dst S/G entry
since "iv_contig" is set to true (iv_contig is a misnomer here and
it actually refers to the whole output being contiguous)

Fix this by setting dst S/G nents equal to src S/G nents, instead of
leaving it set to init value (0).

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamalg.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 662fe94cb2f8..05d4690351b9 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -1798,7 +1798,7 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
gfp_t flags = (req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
  CRYPTO_TFM_REQ_MAY_SLEEP)) ?
   GFP_KERNEL : GFP_ATOMIC;
-   int src_nents, dst_nents = 0, sec4_sg_bytes;
+   int src_nents, dst_nents, sec4_sg_bytes;
struct ablkcipher_edesc *edesc;
dma_addr_t iv_dma = 0;
bool iv_contig = false;
@@ -1808,9 +1808,6 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
 
src_nents = sg_count(req->src, req->nbytes);
 
-   if (unlikely(req->dst != req->src))
-   dst_nents = sg_count(req->dst, req->nbytes);
-
if (likely(req->src == req->dst)) {
sgc = dma_map_sg(jrdev, req->src, src_nents ? : 1,
 DMA_BIDIRECTIONAL);
@@ -1818,6 +1815,8 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
dev_err(jrdev, "unable to map source\n");
return ERR_PTR(-ENOMEM);
}
+
+   dst_nents = src_nents;
} else {
sgc = dma_map_sg(jrdev, req->src, src_nents ? : 1,
 DMA_TO_DEVICE);
@@ -1826,6 +1825,7 @@ static struct ablkcipher_edesc *ablkcipher_giv_edesc_alloc(
return ERR_PTR(-ENOMEM);
}
 
+   dst_nents = sg_count(req->dst, req->nbytes);
sgc = dma_map_sg(jrdev, req->dst, dst_nents ? : 1,
 DMA_FROM_DEVICE);
if (unlikely(!sgc)) {
-- 
2.4.4



[PATCH 08/12] crypto: caam - don't dma_map key for hash algorithms

2017-02-10 Thread Horia Geantă
Shared descriptors for hash algorithms are small enough
for (split) keys to be inlined in all cases.
Since the driver already does this, all that's left is to remove the
unused ctx->key_dma.

Fixes: 045e36780f115 ("crypto: caam - ahash hmac support")
Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamhash.c | 18 +-
 1 file changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index e58639ea53b1..117bbd8c08d4 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -109,7 +109,6 @@ struct caam_hash_ctx {
dma_addr_t sh_desc_digest_dma;
struct device *jrdev;
u8 key[CAAM_MAX_HASH_KEY_SIZE];
-   dma_addr_t key_dma;
int ctx_len;
struct alginfo adata;
 };
@@ -420,7 +419,6 @@ static int ahash_setkey(struct crypto_ahash *ahash,
const u8 *key, unsigned int keylen)
 {
struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
-   struct device *jrdev = ctx->jrdev;
int blocksize = crypto_tfm_alg_blocksize(&ahash->base);
int digestsize = crypto_ahash_digestsize(ahash);
int ret;
@@ -448,28 +446,14 @@ static int ahash_setkey(struct crypto_ahash *ahash,
if (ret)
goto bad_free_key;
 
-   ctx->key_dma = dma_map_single(jrdev, ctx->key, ctx->adata.keylen_pad,
- DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, ctx->key_dma)) {
-   dev_err(jrdev, "unable to map key i/o memory\n");
-   ret = -ENOMEM;
-   goto error_free_key;
-   }
 #ifdef DEBUG
print_hex_dump(KERN_ERR, "ctx.key@"__stringify(__LINE__)": ",
   DUMP_PREFIX_ADDRESS, 16, 4, ctx->key,
   ctx->adata.keylen_pad, 1);
 #endif
 
-   ret = ahash_set_sh_desc(ahash);
-   if (ret) {
-   dma_unmap_single(jrdev, ctx->key_dma, ctx->adata.keylen_pad,
-DMA_TO_DEVICE);
-   }
-
- error_free_key:
kfree(hashed_key);
-   return ret;
+   return ahash_set_sh_desc(ahash);
  bad_free_key:
kfree(hashed_key);
crypto_ahash_set_flags(ahash, CRYPTO_TFM_RES_BAD_KEY_LEN);
-- 
2.4.4



[PATCH 12/12] crypto: caam - fix state buffer DMA (un)mapping

2017-02-10 Thread Horia Geantă
If we register the DMA API debug notification chain to
receive platform bus events:
dma_debug_add_bus(&platform_bus_type);
we start receiving warnings after a simple test like "modprobe caam_jr &&
modprobe caamhash && modprobe -r caamhash && modprobe -r caam_jr":
platform ffe301000.jr: DMA-API: device driver has pending DMA allocations while released from device [count=1938]
One of leaked entries details: [device address=0x000173fda090] [size=63 bytes] [mapped with DMA_TO_DEVICE] [mapped as single]

It turns out there are several issues with handling buf_dma (mapping of buffer
holding the previous chunk smaller than hash block size):
-detection of buf_dma mapping failure occurs too late, after a job descriptor
using that value has been submitted for execution
-dma mapping leak - unmapping is not performed in all places: e.g.
in ahash_export or in most ahash_fin* callbacks (due to the current
back-to-back implementation of buf_dma unmapping/mapping)

Fix these by:
-calling dma_mapping_error() on buf_dma right after the mapping and providing
an error code if needed
-unmapping buf_dma during the "job done" (ahash_done_*) callbacks

Signed-off-by: Horia Geantă 
---
 drivers/crypto/caam/caamhash.c | 107 -
 1 file changed, 52 insertions(+), 55 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index b37d555a80d0..da4f94eab3da 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -194,36 +194,27 @@ static inline dma_addr_t map_seq_out_ptr_result(u32 *desc, struct device *jrdev,
return dst_dma;
 }
 
-/* Map current buffer in state and put it in link table */
-static inline dma_addr_t buf_map_to_sec4_sg(struct device *jrdev,
-   struct sec4_sg_entry *sec4_sg,
-   u8 *buf, int buflen)
+/* Map current buffer in state (if length > 0) and put it in link table */
+static inline int buf_map_to_sec4_sg(struct device *jrdev,
+struct sec4_sg_entry *sec4_sg,
+struct caam_hash_state *state)
 {
-   dma_addr_t buf_dma;
+   int buflen = *current_buflen(state);
 
-   buf_dma = dma_map_single(jrdev, buf, buflen, DMA_TO_DEVICE);
-   dma_to_sec4_sg_one(sec4_sg, buf_dma, buflen, 0);
+   if (!buflen)
+   return 0;
 
-   return buf_dma;
-}
+   state->buf_dma = dma_map_single(jrdev, current_buf(state), buflen,
+   DMA_TO_DEVICE);
+   if (dma_mapping_error(jrdev, state->buf_dma)) {
+   dev_err(jrdev, "unable to map buf\n");
+   state->buf_dma = 0;
+   return -ENOMEM;
+   }
 
-/*
- * Only put buffer in link table if it contains data, which is possible,
- * since a buffer has previously been used, and needs to be unmapped,
- */
-static inline dma_addr_t
-try_buf_map_to_sec4_sg(struct device *jrdev, struct sec4_sg_entry *sec4_sg,
-  u8 *buf, dma_addr_t buf_dma, int buflen,
-  int last_buflen)
-{
-   if (buf_dma && !dma_mapping_error(jrdev, buf_dma))
-   dma_unmap_single(jrdev, buf_dma, last_buflen, DMA_TO_DEVICE);
-   if (buflen)
-   buf_dma = buf_map_to_sec4_sg(jrdev, sec4_sg, buf, buflen);
-   else
-   buf_dma = 0;
-
-   return buf_dma;
+   dma_to_sec4_sg_one(sec4_sg, state->buf_dma, buflen, 0);
+
+   return 0;
 }
 
 /* Map state->caam_ctx, and add it to link table */
@@ -491,6 +482,8 @@ static inline void ahash_unmap(struct device *dev,
struct ahash_edesc *edesc,
struct ahash_request *req, int dst_len)
 {
+   struct caam_hash_state *state = ahash_request_ctx(req);
+
if (edesc->src_nents)
dma_unmap_sg(dev, req->src, edesc->src_nents, DMA_TO_DEVICE);
if (edesc->dst_dma)
@@ -499,6 +492,12 @@ static inline void ahash_unmap(struct device *dev,
if (edesc->sec4_sg_bytes)
dma_unmap_single(dev, edesc->sec4_sg_dma,
 edesc->sec4_sg_bytes, DMA_TO_DEVICE);
+
+   if (state->buf_dma) {
+   dma_unmap_single(dev, state->buf_dma, *current_buflen(state),
+DMA_TO_DEVICE);
+   state->buf_dma = 0;
+   }
 }
 
 static inline void ahash_unmap_ctx(struct device *dev,
@@ -557,8 +556,8 @@ static void ahash_done_bi(struct device *jrdev, u32 *desc, u32 err,
struct ahash_edesc *edesc;
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
-#ifdef DEBUG
struct caam_hash_state *state = ahash_request_ctx(req);
+#ifdef DEBUG
int digestsize = crypto_ahash_digestsize(ahash);
 
dev_err(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -569,6 +568,7 @@ static void ahash_done_bi

[PATCH 00/12] crypto: caam - fixes

2017-02-10 Thread Horia Geantă
This batch consists mostly of DMA API related fixes and simplifications.

Since no arch calls:
dma_debug_add_bus(&platform_bus_type);
DMA API debugging does not have the chance to report leaks when modules
are removed.

I am not sure why dma_debug_add_bus() is not used for the platform bus;
however, when I did that for testing purposes, I noticed quite a few
problems in the caam driver.
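
For anyone who wants to reproduce this, a sketch of the testing hook
described above (not part of this series; requires CONFIG_DMA_API_DEBUG
and could live in arch init code or a small test module):

#include <linux/dma-debug.h>
#include <linux/init.h>
#include <linux/platform_device.h>

static int __init platform_dma_debug_init(void)
{
        /* let DMA API debug track platform bus add/remove events */
        dma_debug_add_bus(&platform_bus_type);
        return 0;
}
late_initcall(platform_dma_debug_init);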

Thanks,
Horia

Horia Geantă (11):
  crypto: caam - don't include unneeded headers
  crypto: caam - check return code of dma_set_mask_and_coherent()
  crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc()
  crypto: caam - check sg_count() return value
  crypto: caam - replace sg_count() with sg_nents_for_len()
  crypto: caam - use dma_map_sg() return code
  crypto: caam - don't dma_map key for hash algorithms
  crypto: caam - fix DMA API leaks for multiple setkey() calls
  crypto: caam - fix error path for ctx_dma mapping failure
  crypto: caam - abstract ahash request double buffering
  crypto: caam - fix state buffer DMA (un)mapping

Tudor Ambarus (1):
  crypto: caam - fix JR IO mapping if one fails

 drivers/crypto/caam/caamalg.c| 589 ---
 drivers/crypto/caam/caamhash.c   | 268 +-
 drivers/crypto/caam/ctrl.c   |  33 ++-
 drivers/crypto/caam/error.c  |   2 -
 drivers/crypto/caam/jr.c |  19 +-
 drivers/crypto/caam/sg_sw_sec4.h |  11 -
 6 files changed, 407 insertions(+), 515 deletions(-)

-- 
2.4.4



[PATCH v3 1/4] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

2017-02-10 Thread Anup Patel
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework passes values from raid6_gfexp as coefficients
for each source to the prep_dma_pq() callback of a DMA channel with PQ
capability. This creates a problem for RAID6 offload engines (such as
Broadcom SBA) which take the disk position (i.e. the log of {2}) instead
of the multiplicative coefficients from the raid6_gfexp table.

This patch adds a raid6_gflog table holding the log-of-2 value for any
given x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. A RAID6
offload engine driver can use this newly added raid6_gflog table to
get the disk position from the multiplicative coefficient.
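
As a usage sketch (hypothetical helper, not from this patch): a driver
handed a coefficient from raid6_gfexp can recover the disk position with
a single table lookup:

#include <linux/raid/pq.h>

/* for any coefficient x = {2}^n (x != 0), raid6_gflog[x] == n */
static u8 coef_to_disk_position(u8 coef)
{
        return raid6_gflog[coef];
}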

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
Reviewed-by: Ray Jui 
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c| 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256]  __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256]  __attribute__((aligned(256)));
 
diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
printf("EXPORT_SYMBOL(raid6_gfexp);\n");
printf("#endif\n");
 
+   /* Compute log-of-2 table */
+   printf("\nconst u8 __attribute__((aligned(256)))\n"
+  "raid6_gflog[256] =\n" "{\n");
+   for (i = 0; i < 256; i += 8) {
+   printf("\t");
+   for (j = 0; j < 8; j++) {
+   v = 255;
+   for (k = 0; k < 256; k++)
+   if (exptbl[k] == (i + j)) {
+   v = k;
+   break;
+   }
+   printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+   }
+   }
+   printf("};\n");
+   printf("#ifdef __KERNEL__\n");
+   printf("EXPORT_SYMBOL(raid6_gflog);\n");
+   printf("#endif\n");
+
/* Compute inverse table x^-1 == x^254 */
printf("\nconst u8 __attribute__((aligned(256)))\n"
   "raid6_gfinv[256] =\n" "{\n");
-- 
2.7.4



[PATCH v3 0/4] Broadcom SBA RAID support

2017-02-10 Thread Anup Patel
The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as Broadcom FlexRM
ring manager) to provide ring-based programming interface. Due to
this, the Broadcom SBA RAID driver (a mailbox client) implements a
DMA device having one DMA channel using a set of mailbox channels
provided by Broadcom SoC specific ring manager driver (mailbox
controller).

The Broadcom SBA RAID hardware requires the PQ disk position instead
of the PQ disk coefficient. To address this, we have added a
raid6_gflog table which helps the driver convert a PQ disk
coefficient to a PQ disk position.

This patchset is based on Linux-4.10-rc2 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v3 branch of
https://github.com/Broadcom/arm64-linux.git

Changes since v2:
 - Dropped patch to handle DMA devices having support for fewer
   PQ coefficients in Linux Async Tx
 - Added work-around in bcm-sba-raid driver to handle unsupported
   PQ coefficients using multiple SBA requests

Changes since v1:
 - Dropped patch to add mbox_channel_device() API
 - Used GENMASK and BIT macros wherever possible in bcm-sba-raid driver
 - Replaced C_MDATA macros with static inline functions in
   bcm-sba-raid driver
 - Removed sba_alloc_chan_resources() callback in bcm-sba-raid driver
 - Used dev_err() instead of dev_info() wherever applicable
 - Removed call to sba_issue_pending() from sba_tx_submit() in
   bcm-sba-raid driver
 - Implemented SBA request chaining for handling (len > sba->req_size)
   in bcm-sba-raid driver
 - Implemented device_terminate_all() callback in bcm-sba-raid driver

Anup Patel (4):
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt |   29 +
 crypto/async_tx/async_pq.c |5 +-
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1711 
 include/linux/raid/pq.h|1 +
 lib/raid6/mktables.c   |   20 +
 7 files changed, 1777 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
 create mode 100644 drivers/dma/bcm-sba-raid.c

-- 
2.7.4



[PATCH v3 3/4] dmaengine: Add Broadcom SBA RAID driver

2017-02-10 Thread Anup Patel
The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via a Broadcom SoC specific ring manager.

This patch adds the Broadcom SBA RAID driver, which provides one
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
---
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1711 
 3 files changed, 1725 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..bf8fb84 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
  controller is often used in Analog Device's reference designs for FPGA
  platforms.
 
+config BCM_SBA_RAID
+   tristate "Broadcom SBA RAID engine support"
+   depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+   select DMA_ENGINE
+   select DMA_ENGINE_RAID
+   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+   default ARCH_BCM_IPROC
+   help
+ Enable support for Broadcom SBA RAID Engine. The SBA RAID
+ engine is available on most of the Broadcom iProc SoCs. It
+ has the capability to offload memcpy, xor and pq computation
+ for raid5/6.
+
 config COH901318
bool "ST-Ericsson COH901318 DMA support"
select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 0000000..bab9918
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1711 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via a Broadcom SoC specific ring manager. Two or more offload engines
+ * can share the same ring manager; because of this, the Broadcom SoC
+ * specific ring manager driver is implemented as a mailbox controller
+ * driver, and offload engine drivers are implemented as mailbox clients.
+ *
+ * Typically, a Broadcom SoC specific ring manager implements a large
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of an SBA hardware device is limited,
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small-size requests and executed in parallel on multiple SBA
+ * hardware devices to achieve high throughput.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting requests to the SBA hardware device via mailbox
+ * channels. This driver implements a DMA device with one DMA channel
+ * using a set of mailbox channels provided by the Broadcom SoC specific
+ * ring manager driver. To exploit parallelism (as described above), all
+ * DMA requests coming to the SBA RAID DMA channel are broken down into
+ * smaller requests and submitted to multiple mailbox channels in a
+ * round-robin fashion. To have more SBA DMA channels, we can create more
+ * SBA device nodes in the Broadcom SoC specific DTS, based on the number
+ * of hardware rings supported by the Broadcom SoC ring manager.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dmaengine.h"
+
+/* SBA command helper macros */
+#define SBA_DEC(_d, _s, _m)(((_d) >> (_s)) & (_m))
+#define SBA_ENC(_d, _v, _s, _m)\
+   do {\
+   (_d) &= ~((u64)(_m) << (_s));   \
+   (_d) |= (((u64)(_v) & (_m)) << (_s));   \
+   } while (0)
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT 48
+#define SBA_TYPE_MASK  GENMASK(1, 0)
+#define SBA_TYPE_A 0x0
+#define SBA_TYPE_B 0x2
+#define SBA_TYPE_C   

[PATCH v3 4/4] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

2017-02-10 Thread Anup Patel
This patch adds the DT bindings document for newly added Broadcom
SBA RAID driver.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt 
b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 0000000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+ "brcm,iproc-sba"
+ "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
raid_mbox: mbox@67400000 {
+   ...
+   #mbox-cells = <3>;
+   ...
+};
+
+raid0 {
+   compatible = "brcm,iproc-sba-v2";
+   mboxes = <&raid_mbox 0 0x1 0xffff>,
+<&raid_mbox 1 0x1 0xffff>,
+<&raid_mbox 2 0x1 0xffff>,
+<&raid_mbox 3 0x1 0xffff>,
+<&raid_mbox 4 0x1 0xffff>,
+<&raid_mbox 5 0x1 0xffff>,
+<&raid_mbox 6 0x1 0xffff>,
+<&raid_mbox 7 0x1 0xffff>;
+};
-- 
2.7.4



[PATCH v3 2/4] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()

2017-02-10 Thread Anup Patel
The DMA_PREP_FENCE flag is to be used when preparing a Tx descriptor if
the output of that Tx descriptor is to be used by the next/dependent Tx
descriptor.

DMA_PREP_FENCE will not be set correctly in do_async_gen_syndrome()
when calling dma->device_prep_dma_pq() under the following conditions:
1. ASYNC_TX_FENCE not set in submit->flags
2. DMA_PREP_FENCE not set in dma_flags
3. src_cnt (= disks - 2) is greater than dma_maxpq(dma, dma_flags)
(the split into multiple segments forces ASYNC_TX_FENCE into
submit->flags for the intermediate segments, while dma_flags was
derived from submit->flags only once, before the loop)

This patch fixes DMA_PREP_FENCE usage in do_async_gen_syndrome() by
re-evaluating submit->flags inside the loop, for each segment, taking
inspiration from the do_async_xor() implementation.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..56bd612 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
dma_addr_t dma_dest[2];
int src_off = 0;
 
-   if (submit->flags & ASYNC_TX_FENCE)
-   dma_flags |= DMA_PREP_FENCE;
-
while (src_cnt > 0) {
submit->flags = flags_orig;
pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
if (cb_fn_orig)
dma_flags |= DMA_PREP_INTERRUPT;
}
+   if (submit->flags & ASYNC_TX_FENCE)
+   dma_flags |= DMA_PREP_FENCE;
 
/* Drivers force forward progress in case they can not provide
 * a descriptor
-- 
2.7.4