Re: [PATCH v9 00/20] simplify crypto wait for async op

2017-10-17 Thread Russell King - ARM Linux
On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote:
> Many users of kernel async. crypto services have a pattern of
> starting an async. crypto op and then using a completion
> to wait for it to end.
> 
> This patch set simplifies this common use case in two ways:
> 
> First, by separating the return codes of the case where a
> request is queued to a backlog due to the provider being
> busy (-EBUSY) from the case where the request has failed due
> to the provider being busy and backlogging is not enabled
> (-EAGAIN).
> 
> Next, this change is then built on to create a generic API
> to wait for an async. crypto operation to complete.
> 
> The end result is a smaller code base and an API that is
> easier to use and more difficult to get wrong.
> 
> The patch set was boot tested on x86_64 and arm64, which
> at the very least tests the crypto users via testmgr and
> tcrypt, but I note that I do not have access to some
> of the HW whose drivers are modified, nor do I claim I was
> able to test all of the corner cases.
> 
> The patch set is based upon linux-next release tagged
> next-20171013.

Has there been any performance impact analysis of these changes?  I
ended up with patches for one of the crypto drivers which converted
its interrupt handling to threaded interrupts being reverted because
it caused a performance degradation.

Moving code to latest APIs to simplify it is not always beneficial.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


[PATCH] hw_random: update help description for omap-rng

2017-02-13 Thread Russell King
omap-rng also supports Marvell Armada 7k/8k SoCs, but no mention of this
is made in the help text, despite the dependency being added. Explicitly
mention these SoCs in the help description so people know that it covers
more than just TI SoCs.

Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K")
Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk>
---
 drivers/char/hw_random/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index ceff2fc524b1..0cafe08919c9 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -172,8 +172,8 @@ config HW_RANDOM_OMAP
default HW_RANDOM
---help---
  This driver provides kernel-side support for the Random Number
- Generator hardware found on OMAP16xx, OMAP2/3/4/5 and AM33xx/AM43xx
- multimedia processors.
+ Generator hardware found on OMAP16xx, OMAP2/3/4/5, AM33xx/AM43xx
+ multimedia processors, and Marvell Armada 7k/8k SoCs.
 
  To compile this driver as a module, choose M here: the
  module will be called omap-rng.
-- 
2.7.4



Re: [BISECT] ARM build errors on GCC v6.2 (crypto: arm/aes - replace scalar AES cipher)

2017-01-14 Thread Russell King - ARM Linux
On Sat, Jan 14, 2017 at 04:24:35PM +0200, Krzysztof Kozlowski wrote:
> Hi,
> 
> allyesconfig and multi_v7_defconfig fail to build on recent linux-next
> on GCC 6.2.0.
> 
> Errors:
> ../arch/arm/crypto/aes-cipher-core.S: Assembler messages:
> ../arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not 
> support `tt .req ip' in ARM mode
> ../arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- 
> `movw tt,#:lower16:crypto_ft_tab'
> ../arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- 
> `movt tt,#:upper16:crypto_ft_tab'
> 
> Compiler: arm-linux-gnueabi-gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005

It's worth pointing out that _assembler_ messages come from _binutils_
rather than _gcc_.   Please advise which version of _binutils_ you
are using.  Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2017-01-02 Thread Russell King - ARM Linux
On Mon, Jan 02, 2017 at 09:06:04PM +, Ard Biesheuvel wrote:
> On 31 October 2016 at 16:13, Russell King - ARM Linux
> <li...@armlinux.org.uk> wrote:
> > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote:
> >> On 18 October 2016 at 11:52, Ard Biesheuvel <ard.biesheu...@linaro.org> 
> >> wrote:
> >> > Wire up the generic support for exposing CPU feature bits via the
> >> > modalias in /sys/device/system/cpu. This allows udev to automatically
> >> > load modules for things like crypto algorithms that are implemented
> >> > using optional instructions.
> >> >
> >> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org>
> >> > ---
> >> >  arch/arm/Kconfig  |  1 +
> >> >  arch/arm/include/asm/cpufeature.h | 32 
> >> >  2 files changed, 33 insertions(+)
> >> >
> >>
> >> Russell,
> >>
> >> do you have any concerns regarding this patch? If not, I will drop it
> >> into the patch system.
> >
> > It's still something I need to look at... I've been offline last week,
> > and sort-of offline the previous week, so I'm catching up.
> >
> 
> Hi Russell,
> 
> Any thoughts yet?

None, and the patch is well buried now that it'll take me a while to
find... back in mid-October?  Yea, I'll have to drop everything and
go digging through my mailboxes to find it... and I'm just catching
up (again) after a week and a bit's time offline - yep, it's wonderful
timing.  Sorry, no time to look at it right now, you're not the only
one wanting my attention at the moment.

Please try again in about a week's time - don't leave it a few months,
and please include the patch.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 11/14] Revert "crypto: caam - get rid of tasklet"

2016-11-09 Thread Russell King - ARM Linux
Please include Thomas in this.

On Wed, Nov 09, 2016 at 10:46:21AM +0200, Horia Geantă wrote:
> This reverts commit 66d2e2028091a074aa1290d2eeda5ddb1a6c329c.
> 
> Quoting from Russell's findings:
> https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html
> 
> [quote]
> Okay, I've re-tested, using a different way of measuring, because using
> openssl speed is impractical for off-loaded engines.  I've decided to
> use this way to measure the performance:
> 
> dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5
> 
> For the threaded IRQs case gives:
> 
> 0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
> 0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
> 0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
>   => 5.36s => 25.0MB/s
> 
> and the tasklet case:
> 
> 0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
> 0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
> 0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
>   => 4.95s => 27.1MB/s
> 
> which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> tasklets are indeed faster than threaded IRQs.
> 
> [...]
> 
> I think I've proven from the above that this patch needs to be reverted
> due to the performance regression, and that there _is_ most definitely
> a detrimental effect of switching from tasklets to threaded IRQs.
> [/quote]
> 
> Signed-off-by: Horia Geantă 
> ---
>  drivers/crypto/caam/intern.h |  1 +
>  drivers/crypto/caam/jr.c | 25 -
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
> index 5d4c05074a5c..e2bcacc1a921 100644
> --- a/drivers/crypto/caam/intern.h
> +++ b/drivers/crypto/caam/intern.h
> @@ -41,6 +41,7 @@ struct caam_drv_private_jr {
>   struct device   *dev;
>   int ridx;
>   struct caam_job_ring __iomem *rregs;/* JobR's register space */
> + struct tasklet_struct irqtask;
>   int irq;/* One per queue */
>  
>   /* Number of scatterlist crypt transforms active on the JobR */
> diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
> index 7331ea734f37..c8604dfadbf5 100644
> --- a/drivers/crypto/caam/jr.c
> +++ b/drivers/crypto/caam/jr.c
> @@ -73,6 +73,8 @@ static int caam_jr_shutdown(struct device *dev)
>  
>   ret = caam_reset_hw_jr(dev);
>  
> + tasklet_kill(&jrp->irqtask);
> +
>   /* Release interrupt */
>   free_irq(jrp->irq, dev);
>  
> @@ -128,7 +130,7 @@ static irqreturn_t caam_jr_interrupt(int irq, void 
> *st_dev)
>  
>   /*
>* Check the output ring for ready responses, kick
> -  * the threaded irq if jobs done.
> +  * tasklet if jobs done.
>*/
>   irqstate = rd_reg32(&jrp->rregs->jrintstatus);
>   if (!irqstate)
> @@ -150,13 +152,18 @@ static irqreturn_t caam_jr_interrupt(int irq, void 
> *st_dev)
>   /* Have valid interrupt at this point, just ACK and trigger */
>   wr_reg32(&jrp->rregs->jrintstatus, irqstate);
>  
> - return IRQ_WAKE_THREAD;
> + preempt_disable();
> + tasklet_schedule(&jrp->irqtask);
> + preempt_enable();
> +
> + return IRQ_HANDLED;
>  }
>  
> -static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
> +/* Deferred service handler, run as interrupt-fired tasklet */
> +static void caam_jr_dequeue(unsigned long devarg)
>  {
>   int hw_idx, sw_idx, i, head, tail;
> - struct device *dev = st_dev;
> + struct device *dev = (struct device *)devarg;
>   struct caam_drv_private_jr *jrp = dev_get_drvdata(dev);
>   void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
>   u32 *userdesc, userstatus;
> @@ -230,8 +237,6 @@ static irqreturn_t caam_jr_threadirq(int irq, void 
> *st_dev)
>  
>   /* reenable / unmask IRQs */
>   clrsetbits_32(&jrp->rregs->rconfig_lo, JRCFG_IMSK, 0);
> -
> - return IRQ_HANDLED;
>  }
>  
>  /**
> @@ -389,10 +394,11 @@ static int caam_jr_init(struct device *dev)
>  
>   jrp = dev_get_drvdata(dev);
>  
> + tasklet_init(&jrp->irqtask, caam_jr_dequeue, (unsigned long)dev);
> +
>   /* Connect job ring interrupt handler. */
> - error = request_threaded_irq(jrp->irq, caam_jr_interrupt,
> -  caam_jr_threadirq, IRQF_SHARED,
> -  dev_name(dev), dev);
> + error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED,
> + dev_name(dev), dev);
>   if (error) {
>   dev_err(dev, "can't connect JobR %d interrupt (%d)\n",
>   jrp->ridx, jrp->irq);
> @@ -454,6 +460,7 @@ static int caam_jr_init(struct device *dev)
>  out_free_irq:
>   free_irq(jrp->irq, dev);
>  out_kill_deq:
> + tasklet_kill(&jrp->irqtask);
>   return error;
>  }
>  
> -- 
> 

Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias

2016-10-31 Thread Russell King - ARM Linux
On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote:
> On 18 October 2016 at 11:52, Ard Biesheuvel  wrote:
> > Wire up the generic support for exposing CPU feature bits via the
> > modalias in /sys/device/system/cpu. This allows udev to automatically
> > load modules for things like crypto algorithms that are implemented
> > using optional instructions.
> >
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  arch/arm/Kconfig  |  1 +
> >  arch/arm/include/asm/cpufeature.h | 32 
> >  2 files changed, 33 insertions(+)
> >
> 
> Russell,
> 
> do you have any concerns regarding this patch? If not, I will drop it
> into the patch system.

It's still something I need to look at... I've been offline last week,
and sort-of offline the previous week, so I'm catching up.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [RESEND][PATCH] crypto: caam: add support for iMX6UL

2016-10-31 Thread Russell King - ARM Linux
On Mon, Oct 17, 2016 at 01:28:00PM +0200, Marcus Folkesson wrote:
> i.MX6UL requires only three clocks to enable the CAAM module.
> 
> Signed-off-by: Marcus Folkesson 
> Acked-by: Rob Herring 
> Reviewed-by: Horia Geantă 
> ---
>  .../devicetree/bindings/crypto/fsl-sec4.txt| 20 +
>  drivers/crypto/caam/ctrl.c | 35 
> --
>  2 files changed, 40 insertions(+), 15 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt 
> b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
> index adeca34..10a425f 100644
> --- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
> +++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt
> @@ -123,6 +123,9 @@ PROPERTIES
>  
>  
>  EXAMPLE
> +
> +iMX6QDL/SX requires four clocks
> +
>   crypto@30 {
>   compatible = "fsl,sec-v4.0";
>   fsl,sec-era = <2>;
> @@ -139,6 +142,23 @@ EXAMPLE
>   clock-names = "mem", "aclk", "ipg", "emi_slow";
>   };
>  
> +
> +iMX6UL does only require three clocks
> +
> + crypto: caam@214 {
> + compatible = "fsl,sec-v4.0";
> + #address-cells = <1>;
> + #size-cells = <1>;
> + reg = <0x214 0x3c000>;
> + ranges = <0 0x214 0x3c000>;
> + interrupts = ;
> +
> + clocks = <&clks IMX6UL_CLK_CAAM_MEM>,
> +  <&clks IMX6UL_CLK_CAAM_ACLK>,
> +  <&clks IMX6UL_CLK_CAAM_IPG>;
> + clock-names = "mem", "aclk", "ipg";
> + };
> +
>  =
>  Job Ring (JR) Node
>  
> diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
> index 0ec112e..5abaf37 100644
> --- a/drivers/crypto/caam/ctrl.c
> +++ b/drivers/crypto/caam/ctrl.c
> @@ -329,8 +329,8 @@ static int caam_remove(struct platform_device *pdev)
>   clk_disable_unprepare(ctrlpriv->caam_ipg);
>   clk_disable_unprepare(ctrlpriv->caam_mem);
>   clk_disable_unprepare(ctrlpriv->caam_aclk);
> - clk_disable_unprepare(ctrlpriv->caam_emi_slow);
> -
> + if (!of_machine_is_compatible("fsl,imx6ul"))
> + clk_disable_unprepare(ctrlpriv->caam_emi_slow);

There is no need to re-lookup the platform here.  Can't you check the
validity of ctrlpriv->caam_emi_slow ?

>   return 0;
>  }
>  
> @@ -481,14 +481,16 @@ static int caam_probe(struct platform_device *pdev)
>   }
>   ctrlpriv->caam_aclk = clk;
>  
> - clk = caam_drv_identify_clk(&pdev->dev, "emi_slow");
> - if (IS_ERR(clk)) {
> - ret = PTR_ERR(clk);
> - dev_err(&pdev->dev,
> - "can't identify CAAM emi_slow clk: %d\n", ret);
> - return ret;
> + if (!of_machine_is_compatible("fsl,imx6ul")) {
> + clk = caam_drv_identify_clk(&pdev->dev, "emi_slow");
> + if (IS_ERR(clk)) {
> + ret = PTR_ERR(clk);
> + dev_err(&pdev->dev,
> + "can't identify CAAM emi_slow clk: %d\n", ret);
> + return ret;
> + }
> + ctrlpriv->caam_emi_slow = clk;
>   }
> - ctrlpriv->caam_emi_slow = clk;
>  
>   ret = clk_prepare_enable(ctrlpriv->caam_ipg);
>   if (ret < 0) {
> @@ -509,11 +511,13 @@ static int caam_probe(struct platform_device *pdev)
>   goto disable_caam_mem;
>   }
>  
> - ret = clk_prepare_enable(ctrlpriv->caam_emi_slow);
> - if (ret < 0) {
> - dev_err(&pdev->dev, "can't enable CAAM emi slow clock: %d\n",
> - ret);
> - goto disable_caam_aclk;
> + if (!of_machine_is_compatible("fsl,imx6ul")) {

Same here.

> + ret = clk_prepare_enable(ctrlpriv->caam_emi_slow);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "can't enable CAAM emi slow clock: 
> %d\n",
> + ret);
> + goto disable_caam_aclk;
> + }
>   }
>  
>   /* Get configuration properties from device tree */
> @@ -829,7 +833,8 @@ caam_remove:
>  iounmap_ctrl:
>   iounmap(ctrl);
>  disable_caam_emi_slow:
> - clk_disable_unprepare(ctrlpriv->caam_emi_slow);
> + if (!of_machine_is_compatible("fsl,imx6ul"))
> + clk_disable_unprepare(ctrlpriv->caam_emi_slow);

and here.

>  disable_caam_aclk:
>   clk_disable_unprepare(ctrlpriv->caam_aclk);
>  disable_caam_mem:
> -- 
> 2.8.0
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: crypto: caam from tasklet to threadirq

2016-09-20 Thread Russell King - ARM Linux
Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines.  I've decided to
use this way to measure the performance:

dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5

For the threaded IRQs case gives:

0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
=> 5.36s => 25.0MB/s

and the tasklet case:

0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
=> 4.95s => 27.1MB/s

which corresponds to an 8% slowdown for the threaded IRQ case.  So,
tasklets are indeed faster than threaded IRQs.

I guess the reason is that tasklets are much simpler, being able to
run just before we return to userspace without involving scheduler
overheads, but that's speculation.

I've tried to perf it, but...

Samples: 31K of event 'cycles', Event count (approx.): 3552246846
  Overhead  Command  Shared Object Symbol
+   33.22%  kworker/0:1  [kernel.vmlinux]  [k] __do_softirq
+   15.78%  irq/311-2101000  [kernel.vmlinux]  [k] __do_softirq
+7.49%  irqbalance   [kernel.vmlinux]  [k] __do_softirq
+7.26%  openssl  [kernel.vmlinux]  [k] __do_softirq
+5.71%  ksoftirqd/0  [kernel.vmlinux]  [k] __do_softirq
+3.64%  kworker/0:2  [kernel.vmlinux]  [k] __do_softirq
+3.52%  swapper  [kernel.vmlinux]  [k] __do_softirq
+3.14%  kworker/0:1  [kernel.vmlinux]  [k] _raw_spin_unlock_irq

I was going to try to get the threaded IRQ case, but I've ended up with
perf getting buggered because of the iMX6 SMP perf dysfunctionality:

[ 3448.810416] irq 24: nobody cared (try booting with the "irqpoll" option)
...
[ 3448.824528] Disabling IRQ #24

caused by FSL's utterly brain-dead idea of routing all the perf
interrupts to single non-CPU local interrupt input, and the refusal of
kernel folk to find an acceptable solution to support this.

So, sorry, I'm not going to bother trying to get any further with this.
If the job was not made harder by stupid hardware design and kernel
politics, then I might be more inclined to do deeper investigation, but
right now I'm finding that I'm not interested in trying to jump through
these stupid hoops.

I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a detrimental effect of switching from tasklets to threaded IRQs.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: crypto: caam from tasklet to threadirq

2016-09-16 Thread Russell King - ARM Linux
On Fri, Sep 16, 2016 at 02:01:00PM +, Cata Vasile wrote:
> Hi,
> 
> We've tried to test and benchmark your submitted work[1].
> 
> Cryptographic offloading is also used in IPsec in the Linux Kernel. In
> heavy traffic scenarios, the NIC driver competes with the crypto device
> driver. Most NICs use the NAPI context, which is one of the most
> prioritized context types. In IPsec scenarios the performance is
> trashed because, although raw data gets into the device, the data is
> encrypted/decrypted and the dequeue code in the CAAM driver has a hard
> time being scheduled to actually call the callback that notifies the
> networking stack it can continue working with that data.

Having received a reply from Thomas Gleixner today, there appears to be
some disagreement with your findings, and a suggestion that the problem
needs proper and more in-depth investigation.

Thomas indicates that the NAPI processing shows an improvement when
moved to the same context that threaded interrupts run in, as opposed
to the current softirq context - which also would run the tasklets.

What I would say is that if threaded IRQs are causing harm, then there
seems to be something very wrong somewhere.

> Being this scenario, at heavy load, the Kernel warns on rcu stalls and
> the forwarding path has a lot of latency.  Have you tried benchmarking
> the board you used for testing?

It's way too long ago for me to remember - these patches were created
almost a year ago - October 20th 2015, which is when I'd have tested
them.  So, I'm afraid I can't help very much at this point, apart from
trying to re-run some benchmarks.

I'd suggest testing the openssl (with AF_ALG support), which is probably
what I tested and benchmarked.  However, as I say, it's far too long
ago for me to really remember at this point.

> I have ran some on our other platforms. The after benchmark fails to
> run at the top level of the before results.

Sorry, that last sentence doesn't make any sense to me.

I don't have the bandwidth to look at this, and IPsec doesn't interest
me one bit - I've never been able to work out how to setup IPsec
locally.  From what I remember when I looked into it many years ago,
you had to have significant information about ipsec to get it up and
running.  Maybe things have changed since then, I don't know.

If you want me to reproduce it, please send me a step-by-step idiots
guide on setting up a working test scenario which reproduces your
problem.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


AF_ALG zero-size hash fails

2016-08-09 Thread Russell King - ARM Linux
Hi,

While testing AF_ALG with openssl af-alg-rr, I've found that:

OPENSSL_CONF=/shared/crypto/openssl-imx.cnf openssl dgst -sha1 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH] crypto: caam - avoid kernel warnings on probe failure

2016-08-09 Thread Russell King
 rc_cec 
snd_soc_fsl_spdif imx_pcm_dma videobuf2_vmalloc videobuf2_memops imx_thermal 
dw_hdmi_ahb_audio dw_hdmi_cec etnaviv fuse rc_pinnacle_pctv_hd [last unloaded: 
caam]
CPU: 0 PID: 2346 Comm: modprobe Tainted: G        W       4.8.0-rc1+ #2014
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[] (dump_backtrace) from [] (show_stack+0x18/0x1c)
[] (show_stack) from [] (dump_stack+0xa4/0xdc)
[] (dump_stack) from [] (__warn+0xdc/0x108)
[] (__warn) from [] (warn_slowpath_null+0x28/0x30)
[] (warn_slowpath_null) from [] 
(clk_core_unprepare+0x204/0x388)
[] (clk_core_unprepare) from [] (clk_unprepare+0x2c/0x34)
[] (clk_unprepare) from [] (caam_probe+0x404/0x1498 [caam])
[] (caam_probe [caam]) from [] 
(platform_drv_probe+0x58/0xb8)
[] (platform_drv_probe) from [] 
(driver_probe_device+0x1fc/0x2b8)
[] (driver_probe_device) from [] 
(__driver_attach+0xbc/0xc0) r10: r8:bf24b000 r7: r6:ef215844 
r5:bf2490c4 r4:ef215810
[] (__driver_attach) from [] (bus_for_each_dev+0x5c/0x90)
[] (bus_for_each_dev) from [] (driver_attach+0x24/0x28)
[] (driver_attach) from [] (bus_add_driver+0xf4/0x200)
[] (bus_add_driver) from [] (driver_register+0x80/0xfc)
[] (driver_register) from [] 
(__platform_driver_register+0x48/0x4c)
[] (__platform_driver_register) from [] 
(caam_driver_init+0x18/0x24 [caam])
[] (caam_driver_init [caam]) from [] 
(do_one_initcall+0x44/0x178)
[] (do_one_initcall) from [] (do_init_module+0x68/0x1d8)
[] (do_init_module) from [] (load_module+0x1974/0x20b0)
[] (load_module) from [] (SyS_finit_module+0x94/0xa0)
[] (SyS_finit_module) from [] (ret_fast_syscall+0x0/0x1c)
---[ end trace 34e3370d88bb1788 ]---

Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk>
---
 drivers/crypto/caam/ctrl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c
index 0ec112ee5204..f4c044f5bcb2 100644
--- a/drivers/crypto/caam/ctrl.c
+++ b/drivers/crypto/caam/ctrl.c
@@ -826,6 +826,8 @@ static int caam_probe(struct platform_device *pdev)
 
 caam_remove:
caam_remove(pdev);
+   return ret;
+
 iounmap_ctrl:
iounmap(ctrl);
 disable_caam_emi_slow:
-- 
2.1.0



Re: AF_ALG broken?

2016-08-09 Thread Russell King - ARM Linux
On Tue, Aug 09, 2016 at 03:14:02PM +0800, Herbert Xu wrote:
> On Tue, Aug 09, 2016 at 08:08:59AM +0100, Russell King - ARM Linux wrote:
> > 
> > I thought I gave the commands and link to your example code.  The
> > openssl case is md5, though sha* also gives the same result.  Your
> > example code was sha1 iirc.  I guess none of these would be using
> > HMAC - the openssl cases used to give results compatible with the
> > md5sum/ sha1sum etc userspace commands.
> > 
> > /proc/crypto:
> > 
> > name : md5
> > driver   : md5-caam
> 
> Right, caam is providing a setkey function for md5, which leads the
> API to think that a key is required.  We should fix it so that setkey
> is only set for the HMAC-variant.

Thanks, that works nicely again, and passes my tests.

8<
From: Russell King <rmk+ker...@armlinux.org.uk>
Subject: [PATCH] crypto: caam - fix non-hmac hashes

Since 6de62f15b581 ("crypto: algif_hash - Require setkey before
accept(2)"), the AF_ALG interface requires userspace to provide a key
to any algorithm that has a setkey method.  However, the non-HMAC
algorithms are not keyed, so setting a key is unnecessary.

Fix this by removing the setkey method from the non-keyed hash
algorithms.

Fixes: 6de62f15b581 ("crypto: algif_hash - Require setkey before accept(2)")
Cc: <sta...@vger.kernel.org>
Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index ea284e3909ef..9d7fc9ec0b7e 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1950,6 +1950,7 @@ caam_hash_alloc(struct caam_hash_template *template,
 template->name);
snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s",
 template->driver_name);
+   t_alg->ahash_alg.setkey = NULL;
}
alg->cra_module = THIS_MODULE;
alg->cra_init = caam_hash_cra_init;
-- 
2.1.0

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: AF_ALG broken?

2016-08-08 Thread Russell King - ARM Linux
On Mon, Aug 08, 2016 at 01:47:33PM -0400, Jeffrey Walton wrote:
> > When trying to use the openssl AF_ALG module with 4.8-rc1 with imx
> > caam, I get this:
> >
> > $ OPENSSL_CONF=/shared/crypto/openssl-imx.cnf strace openssl dgst -md5 
> >  > ...
> > socket(PF_ALG, SOCK_SEQPACKET, 0)   = 3
> > close(3)= 0
> > socket(PF_ALG, SOCK_SEQPACKET, 0)   = 3
> > bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0"}, 88) = 0
> > accept(3, 0, NULL)  = 4
> > fstat64(0, {st_mode=S_IFREG|0755, st_size=666864, ...}) = 0
> > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
> > 0xb6fab000
> > read(0, 
> > "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 8192) 
> > = 8192
> > send(4, 
> > "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 8192, 
> > MSG_MORE) = -1 ENOKEY (Required key not available)
> 
> As far as I know from testing on x86, it has never worked as expected.
> I believe you have to use 'sendto' and 'recvfrom' because 'send' and
> 'recv' use default structures, and they configure the object
> incorrectly.

This used to work, because that's how I've tested my previous CAAM
and Marvell CESA patches.  So, this is not a case of "never worked"
but is definitely a regression caused by some kernel change.

There's also people's presentations that illustrate example code:

https://events.linuxfoundation.org/sites/events/files/slides/lcj-2014-crypto-user.pdf

which can also be found at:

https://lwn.net/Articles/410833/

and that example code comes from Herbert, so one would assume that
it's tested and was working.  Note that it doesn't use any send*,
but uses write().

Testing that code on 4.8-rc (and 4.7 fwiw) gives:

socket(PF_ALG, SOCK_SEQPACKET, 0)   = 3
bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0"}, 88) = 0
accept(3, 0, NULL)  = 4
write(4, "abc", 3)  = -1 ENOKEY (Required key not available)
read(4, 0xbec50508, 20) = -1 ENOKEY (Required key not available)

IOW, the same problem - and it seems not to be a recent regression.

Since the last time I tested CESA or CAAM was back in 4.4 times,
it's got to be something between 4.4 and 4.7.

Looking at the history, my guess would be the setkey changes -
crypto: algif_skcipher - Require setkey before accept(2)
crypto: af_alg - Disallow bind/setkey/... after accept(2)
crypto: af_alg - Add nokey compatibility path
crypto: hash - Add crypto_ahash_has_setkey
crypto: algif_hash - Require setkey before accept(2)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH 09/11] crypto: caam: move job descriptor initialisation to ahash_edesc_alloc()

2016-08-08 Thread Russell King
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 84 +-
 1 file changed, 34 insertions(+), 50 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 2c2c15b63059..9c3e74e4088e 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -770,7 +770,9 @@ static void ahash_done_ctx_dst(struct device *jrdev, u32 
*desc, u32 err,
  * and space for hardware scatter table containing sg_num entries.
  */
 static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
-int sg_num, gfp_t flags)
+int sg_num, u32 *sh_desc,
+dma_addr_t sh_desc_dma,
+gfp_t flags)
 {
struct ahash_edesc *edesc;
unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
@@ -781,6 +783,9 @@ static struct ahash_edesc *ahash_edesc_alloc(struct 
caam_hash_ctx *ctx,
return NULL;
}
 
+   init_job_desc_shared(edesc->hw_desc, sh_desc_dma, desc_len(sh_desc),
+HDR_SHARE_DEFER | HDR_REVERSE);
+
return edesc;
 }
 
@@ -799,12 +804,10 @@ static int ahash_update_ctx(struct ahash_request *req)
int *next_buflen = state->current_buf ? &state->buflen_0 :
   &state->buflen_1, last_buflen;
int in_len = *buflen + req->nbytes, to_hash;
-   u32 *sh_desc = ctx->sh_desc_update, *desc;
-   dma_addr_t ptr = ctx->sh_desc_update_dma;
+   u32 *desc;
int src_nents, mapped_nents, sec4_sg_bytes, sec4_sg_src_index;
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
last_buflen = *next_buflen;
	*next_buflen = in_len & (crypto_tfm_alg_blocksize(&ahash->base) - 1);
@@ -838,7 +841,8 @@ static int ahash_update_ctx(struct ahash_request *req)
 * link tables
 */
edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index + mapped_nents,
- flags);
+ ctx->sh_desc_update,
+ ctx->sh_desc_update_dma, flags);
if (!edesc) {
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
@@ -872,10 +876,7 @@ static int ahash_update_ctx(struct ahash_request *req)
 
state->current_buf = !state->current_buf;
 
-   sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
-   init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER |
-HDR_REVERSE);
 
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
 sec4_sg_bytes,
@@ -936,25 +937,23 @@ static int ahash_final_ctx(struct ahash_request *req)
int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
int last_buflen = state->current_buf ? state->buflen_0 :
  state->buflen_1;
-   u32 *sh_desc = ctx->sh_desc_fin, *desc;
-   dma_addr_t ptr = ctx->sh_desc_fin_dma;
+   u32 *desc;
int sec4_sg_bytes, sec4_sg_src_index;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
sec4_sg_src_index = 1 + (buflen ? 1 : 0);
sec4_sg_bytes = sec4_sg_src_index * sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index, flags);
+   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index,
+ ctx->sh_desc_fin, ctx->sh_desc_fin_dma,
+ flags);
if (!edesc)
return -ENOMEM;
 
-   sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
-   init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = 0;
@@ -1018,14 +1017,12 @@ static int ahash_finup_ctx(struct ahash_request *req)
int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
int last_buflen = state->current_buf ? state->buflen_0 :
  state->buflen_1;
-   u32 *sh_desc = ctx->sh_desc_finup, *desc;
-   dma_addr_t ptr = ctx->sh_desc_finup_dma;
+   u32 *desc;
int sec4_sg_bytes, sec4_sg_src_index;
int src_nents, mapped_nents;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
src_nents = sg_nents_for_len(req

[PATCH 10/11] crypto: caam: add ahash_edesc_add_src()

2016-08-08 Thread Russell King
Add a helper to map the source scatterlist into the descriptor.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 137 +
 1 file changed, 57 insertions(+), 80 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 9c3e74e4088e..ea284e3909ef 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -789,6 +789,41 @@ static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
return edesc;
 }
 
+static int ahash_edesc_add_src(struct caam_hash_ctx *ctx,
+  struct ahash_edesc *edesc,
+  struct ahash_request *req, int nents,
+  unsigned int first_sg,
+  unsigned int first_bytes, size_t to_hash)
+{
+   dma_addr_t src_dma;
+   u32 options;
+
+   if (nents > 1 || first_sg) {
+   struct sec4_sg_entry *sg = edesc->sec4_sg;
+   unsigned int sgsize = sizeof(*sg) * (first_sg + nents);
+
+   sg_to_sec4_sg_last(req->src, nents, sg + first_sg, 0);
+
+   src_dma = dma_map_single(ctx->jrdev, sg, sgsize, DMA_TO_DEVICE);
+   if (dma_mapping_error(ctx->jrdev, src_dma)) {
+   dev_err(ctx->jrdev, "unable to map S/G table\n");
+   return -ENOMEM;
+   }
+
+   edesc->sec4_sg_bytes = sgsize;
+   edesc->sec4_sg_dma = src_dma;
+   options = LDST_SGF;
+   } else {
+   src_dma = sg_dma_address(req->src);
+   options = 0;
+   }
+
+   append_seq_in_ptr(edesc->hw_desc, src_dma, first_bytes + to_hash,
+ options);
+
+   return 0;
+}
+
 /* submit update job descriptor */
 static int ahash_update_ctx(struct ahash_request *req)
 {
@@ -1018,7 +1053,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
int last_buflen = state->current_buf ? state->buflen_0 :
  state->buflen_1;
u32 *desc;
-   int sec4_sg_bytes, sec4_sg_src_index;
+   int sec4_sg_src_index;
int src_nents, mapped_nents;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
@@ -1042,8 +1077,6 @@ static int ahash_finup_ctx(struct ahash_request *req)
}
 
sec4_sg_src_index = 1 + (buflen ? 1 : 0);
-   sec4_sg_bytes = (sec4_sg_src_index + mapped_nents) *
-sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index + mapped_nents,
@@ -1057,7 +1090,6 @@ static int ahash_finup_ctx(struct ahash_request *req)
desc = edesc->hw_desc;
 
edesc->src_nents = src_nents;
-   edesc->sec4_sg_bytes = sec4_sg_bytes;
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
@@ -1068,19 +1100,11 @@ static int ahash_finup_ctx(struct ahash_request *req)
buf, state->buf_dma, buflen,
last_buflen);
 
-   sg_to_sec4_sg_last(req->src, mapped_nents,
-  edesc->sec4_sg + sec4_sg_src_index, 0);
-
-   edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
-   sec4_sg_bytes, DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
-   dev_err(jrdev, "unable to map S/G table\n");
-   ret = -ENOMEM;
+   ret = ahash_edesc_add_src(ctx, edesc, req, mapped_nents,
+ sec4_sg_src_index, ctx->ctx_len + buflen,
+ req->nbytes);
+   if (ret)
goto err;
-   }
-
-   append_seq_in_ptr(desc, edesc->sec4_sg_dma, ctx->ctx_len +
-  buflen + req->nbytes, LDST_SGF);
 
edesc->dst_dma = map_seq_out_ptr_result(desc, jrdev, req->result,
digestsize);
@@ -1116,11 +1140,9 @@ static int ahash_digest(struct ahash_request *req)
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
u32 *desc;
int digestsize = crypto_ahash_digestsize(ahash);
-   int src_nents, mapped_nents, sec4_sg_bytes;
-   dma_addr_t src_dma;
+   int src_nents, mapped_nents;
struct ahash_edesc *edesc;
int ret = 0;
-   u32 options;
 
src_nents = sg_nents_for_len(req->src, req->nbytes);
if (src_nents < 0) {
@@ -1139,11 +1161,6 @@ static int ahash_digest(struct ahash_request *req)
mapped_nents = 0;
}
 
-   if (mapped_ne

[PATCH 11/11] crypto: caam: get rid of tasklet

2016-08-08 Thread Russell King
Threaded interrupts can perform the function of the tasklet, and much
more safely too - without races when trying to take the tasklet and
interrupt down on device removal.

With the old code, there is a window where we call tasklet_kill().  If
the interrupt handler happens to be running on a different CPU, and
subsequently calls tasklet_schedule(), the tasklet will be re-scheduled
for execution.

Switching to a hardirq/threadirq combination implementation avoids this,
and it also means generic code deals with the teardown sequencing of the
threaded and non-threaded parts.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/intern.h |  1 -
 drivers/crypto/caam/jr.c | 25 +
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index e2bcacc1a921..5d4c05074a5c 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -41,7 +41,6 @@ struct caam_drv_private_jr {
struct device   *dev;
int ridx;
struct caam_job_ring __iomem *rregs;/* JobR's register space */
-   struct tasklet_struct irqtask;
int irq;/* One per queue */
 
/* Number of scatterlist crypt transforms active on the JobR */
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index a81f551ac222..320228875e9a 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -73,8 +73,6 @@ static int caam_jr_shutdown(struct device *dev)
 
ret = caam_reset_hw_jr(dev);
 
-   tasklet_kill(&jrp->irqtask);
-
/* Release interrupt */
free_irq(jrp->irq, dev);
 
@@ -130,7 +128,7 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
 
/*
 * Check the output ring for ready responses, kick
-* tasklet if jobs done.
+* the threaded irq if jobs done.
 */
	irqstate = rd_reg32(&jrp->rregs->jrintstatus);
if (!irqstate)
@@ -152,18 +150,13 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
/* Have valid interrupt at this point, just ACK and trigger */
	wr_reg32(&jrp->rregs->jrintstatus, irqstate);
 
-   preempt_disable();
-   tasklet_schedule(&jrp->irqtask);
-   preempt_enable();
-
-   return IRQ_HANDLED;
+   return IRQ_WAKE_THREAD;
 }
 
-/* Deferred service handler, run as interrupt-fired tasklet */
-static void caam_jr_dequeue(unsigned long devarg)
+static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
 {
int hw_idx, sw_idx, i, head, tail;
-   struct device *dev = (struct device *)devarg;
+   struct device *dev = st_dev;
struct caam_drv_private_jr *jrp = dev_get_drvdata(dev);
void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
u32 *userdesc, userstatus;
@@ -237,6 +230,8 @@ static void caam_jr_dequeue(unsigned long devarg)
 
/* reenable / unmask IRQs */
	clrsetbits_32(&jrp->rregs->rconfig_lo, JRCFG_IMSK, 0);
+
+   return IRQ_HANDLED;
 }
 
 /**
@@ -394,11 +389,10 @@ static int caam_jr_init(struct device *dev)
 
jrp = dev_get_drvdata(dev);
 
-   tasklet_init(&jrp->irqtask, caam_jr_dequeue, (unsigned long)dev);
-
/* Connect job ring interrupt handler. */
-   error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED,
-   dev_name(dev), dev);
+   error = request_threaded_irq(jrp->irq, caam_jr_interrupt,
+caam_jr_threadirq, IRQF_SHARED,
+dev_name(dev), dev);
if (error) {
dev_err(dev, "can't connect JobR %d interrupt (%d)\n",
jrp->ridx, jrp->irq);
@@ -460,7 +454,6 @@ static int caam_jr_init(struct device *dev)
 out_free_irq:
free_irq(jrp->irq, dev);
 out_kill_deq:
-   tasklet_kill(&jrp->irqtask);
return error;
 }
 
-- 
2.1.0



[PATCH 07/11] crypto: caam: check and use dma_map_sg() return code

2016-08-08 Thread Russell King
Strictly, dma_map_sg() may coalesce SG entries, but in practice on iMX
hardware, this will never happen.  However, dma_map_sg() can fail, and
we completely fail to check its return value.  So, fix this properly.

Arrange the code to map the scatterlist early, so we know how many
scatter table entries to allocate, and then fill them in.  This allows
us to keep relatively simple error cleanup paths.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 138 ++---
 1 file changed, 103 insertions(+), 35 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index e1925bf3a7cc..a639183d0115 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -187,15 +187,6 @@ static inline dma_addr_t buf_map_to_sec4_sg(struct device *jrdev,
return buf_dma;
 }
 
-/* Map req->src and put it in link table */
-static inline void src_map_to_sec4_sg(struct device *jrdev,
- struct scatterlist *src, int src_nents,
- struct sec4_sg_entry *sec4_sg)
-{
-   dma_map_sg(jrdev, src, src_nents, DMA_TO_DEVICE);
-   sg_to_sec4_sg_last(src, src_nents, sec4_sg, 0);
-}
-
 /*
  * Only put buffer in link table if it contains data, which is possible,
  * since a buffer has previously been used, and needs to be unmapped,
@@ -791,7 +782,7 @@ static int ahash_update_ctx(struct ahash_request *req)
int in_len = *buflen + req->nbytes, to_hash;
u32 *sh_desc = ctx->sh_desc_update, *desc;
dma_addr_t ptr = ctx->sh_desc_update_dma;
-   int src_nents, sec4_sg_bytes, sec4_sg_src_index;
+   int src_nents, mapped_nents, sec4_sg_bytes, sec4_sg_src_index;
struct ahash_edesc *edesc;
int ret = 0;
int sh_len;
@@ -807,8 +798,20 @@ static int ahash_update_ctx(struct ahash_request *req)
dev_err(jrdev, "Invalid number of src SG.\n");
return src_nents;
}
+
+   if (src_nents) {
+   mapped_nents = dma_map_sg(jrdev, req->src, src_nents,
+ DMA_TO_DEVICE);
+   if (!mapped_nents) {
+   dev_err(jrdev, "unable to DMA map source\n");
+   return -ENOMEM;
+   }
+   } else {
+   mapped_nents = 0;
+   }
+
sec4_sg_src_index = 1 + (*buflen ? 1 : 0);
-   sec4_sg_bytes = (sec4_sg_src_index + src_nents) *
+   sec4_sg_bytes = (sec4_sg_src_index + mapped_nents) *
 sizeof(struct sec4_sg_entry);
 
/*
@@ -820,6 +823,7 @@ static int ahash_update_ctx(struct ahash_request *req)
if (!edesc) {
dev_err(jrdev,
"could not allocate extended descriptor\n");
+   dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
}
 
@@ -836,9 +840,10 @@ static int ahash_update_ctx(struct ahash_request *req)
buf, state->buf_dma,
*buflen, last_buflen);
 
-   if (src_nents) {
-   src_map_to_sec4_sg(jrdev, req->src, src_nents,
-  edesc->sec4_sg + sec4_sg_src_index);
+   if (mapped_nents) {
+   sg_to_sec4_sg_last(req->src, mapped_nents,
+  edesc->sec4_sg + sec4_sg_src_index,
+  0);
if (*next_buflen)
scatterwalk_map_and_copy(next_buf, req->src,
 to_hash - *buflen,
@@ -1001,7 +1006,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
u32 *sh_desc = ctx->sh_desc_finup, *desc;
dma_addr_t ptr = ctx->sh_desc_finup_dma;
int sec4_sg_bytes, sec4_sg_src_index;
-   int src_nents;
+   int src_nents, mapped_nents;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
@@ -1012,14 +1017,27 @@ static int ahash_finup_ctx(struct ahash_request *req)
dev_err(jrdev, "Invalid number of src SG.\n");
return src_nents;
}
+
+   if (src_nents) {
+   mapped_nents = dma_map_sg(jrdev, req->src, src_nents,
+ DMA_TO_DEVICE);
+   if (!mapped_nents) {
+   dev_err(jrdev, "unable to DMA map source\n");
+ 

[PATCH 03/11] crypto: caam: incorporate job descriptor into struct ahash_edesc

2016-08-08 Thread Russell King
Rather than giving the descriptor as hw_desc[0], give its real size.
All places where we allocate an ahash_edesc incorporate DESC_JOB_IO_LEN
bytes of job descriptor.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 49 --
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 47ea7b428156..ce9c1bc23795 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -604,7 +604,7 @@ struct ahash_edesc {
int src_nents;
int sec4_sg_bytes;
struct sec4_sg_entry *sec4_sg;
-   u32 hw_desc[0];
+   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)];
 };
 
 static inline void ahash_unmap(struct device *dev,
@@ -815,8 +815,8 @@ static int ahash_update_ctx(struct ahash_request *req)
 * allocate space for base edesc and hw desc commands,
 * link tables
 */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN +
-   sec4_sg_bytes, GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes,
+   GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev,
"could not allocate extended descriptor\n");
@@ -825,8 +825,7 @@ static int ahash_update_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
@@ -925,8 +924,7 @@ static int ahash_final_ctx(struct ahash_request *req)
sec4_sg_bytes = sec4_sg_src_index * sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
@@ -937,8 +935,7 @@ static int ahash_final_ctx(struct ahash_request *req)
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
edesc->src_nents = 0;
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
@@ -1016,8 +1013,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
 sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
@@ -1029,8 +1025,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
@@ -1106,14 +1101,12 @@ static int ahash_digest(struct ahash_request *req)
sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes + DESC_JOB_IO_LEN,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
}
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
- DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = src_nents;
 
@@ -1179,7 +1172,7 @@ static int ahash_final_no_ctx(struct ahash_request *req)
int sh_len;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzallo

[PATCH 05/11] crypto: caam: replace sec4_sg pointer with array

2016-08-08 Thread Russell King
Since the extended descriptor includes the hardware descriptor, and the
sec4 scatterlist immediately follows this, we can declare it as an array
at the very end of the extended descriptor.  This allows us to get rid
of an initialiser for every site where we allocate an extended
descriptor.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index e9c52cbf9a41..d2129be43bf1 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -595,16 +595,16 @@ static int ahash_setkey(struct crypto_ahash *ahash,
  * @sec4_sg_dma: physical mapped address of h/w link table
  * @src_nents: number of segments in input scatterlist
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
- * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
+ * @sec4_sg: h/w link table
  */
 struct ahash_edesc {
dma_addr_t dst_dma;
dma_addr_t sec4_sg_dma;
int src_nents;
int sec4_sg_bytes;
-   struct sec4_sg_entry *sec4_sg;
	u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)] ____cacheline_aligned;
+   struct sec4_sg_entry sec4_sg[0];
 };
 
 static inline void ahash_unmap(struct device *dev,
@@ -825,7 +825,6 @@ static int ahash_update_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
@@ -935,7 +934,6 @@ static int ahash_final_ctx(struct ahash_request *req)
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->src_nents = 0;
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
@@ -1025,7 +1023,6 @@ static int ahash_finup_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
@@ -1106,7 +1103,7 @@ static int ahash_digest(struct ahash_request *req)
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
}
-   edesc->sec4_sg = (void *)(edesc + 1);
+
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = src_nents;
 
@@ -1264,7 +1261,6 @@ static int ahash_update_no_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->dst_dma = 0;
 
state->buf_dma = buf_map_to_sec4_sg(jrdev, edesc->sec4_sg,
@@ -1375,7 +1371,6 @@ static int ahash_finup_no_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg, buf,
state->buf_dma, buflen,
@@ -1470,7 +1465,6 @@ static int ahash_update_first(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->dst_dma = 0;
 
if (src_nents > 1) {
-- 
2.1.0



[PATCH 04/11] crypto: caam: mark the hardware descriptor as cache line aligned

2016-08-08 Thread Russell King
Mark the hardware descriptor as being cache line aligned; on DMA
incoherent architectures, the hardware descriptor should sit in a
separate cache line from the CPU accessed data to avoid polluting
the caches.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index ce9c1bc23795..e9c52cbf9a41 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -604,7 +604,7 @@ struct ahash_edesc {
int src_nents;
int sec4_sg_bytes;
struct sec4_sg_entry *sec4_sg;
-   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)];
+   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)] ____cacheline_aligned;
 };
 
 static inline void ahash_unmap(struct device *dev,
-- 
2.1.0



[PATCH 02/11] crypto: caam: ensure descriptor buffers are cacheline aligned

2016-08-08 Thread Russell King
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 85c8b048bdc1..47ea7b428156 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -99,17 +99,17 @@ static struct list_head hash_list;
 
 /* ahash per-session context */
 struct caam_hash_ctx {
-   struct device *jrdev;
-   u32 sh_desc_update[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_update_first[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_fin[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_digest[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_finup[DESC_HASH_MAX_USED_LEN];
-   dma_addr_t sh_desc_update_dma;
+   u32 sh_desc_update[DESC_HASH_MAX_USED_LEN] ____cacheline_aligned;
+   u32 sh_desc_update_first[DESC_HASH_MAX_USED_LEN] ____cacheline_aligned;
+   u32 sh_desc_fin[DESC_HASH_MAX_USED_LEN] ____cacheline_aligned;
+   u32 sh_desc_digest[DESC_HASH_MAX_USED_LEN] ____cacheline_aligned;
+   u32 sh_desc_finup[DESC_HASH_MAX_USED_LEN] ____cacheline_aligned;
+   dma_addr_t sh_desc_update_dma ____cacheline_aligned;
dma_addr_t sh_desc_update_first_dma;
dma_addr_t sh_desc_fin_dma;
dma_addr_t sh_desc_digest_dma;
dma_addr_t sh_desc_finup_dma;
+   struct device *jrdev;
u32 alg_type;
u32 alg_op;
u8 key[CAAM_MAX_HASH_KEY_SIZE];
-- 
2.1.0



[PATCH 06/11] crypto: caam: ensure that we clean up after an error

2016-08-08 Thread Russell King
Ensure that we clean up allocations and DMA mappings after encountering
an error rather than just giving up and leaking memory and resources.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 132 -
 1 file changed, 79 insertions(+), 53 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index d2129be43bf1..e1925bf3a7cc 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -829,7 +829,7 @@ static int ahash_update_ctx(struct ahash_request *req)
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
if (ret)
-   return ret;
+   goto err;
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev,
edesc->sec4_sg + 1,
@@ -860,7 +860,8 @@ static int ahash_update_ctx(struct ahash_request *req)
 DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto err;
}
 
append_seq_in_ptr(desc, edesc->sec4_sg_dma, ctx->ctx_len +
@@ -875,13 +876,10 @@ static int ahash_update_ctx(struct ahash_request *req)
 #endif
 
ret = caam_jr_enqueue(jrdev, desc, ahash_done_bi, req);
-   if (!ret) {
-   ret = -EINPROGRESS;
-   } else {
-   ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len,
-  DMA_BIDIRECTIONAL);
-   kfree(edesc);
-   }
+   if (ret)
+   goto err;
+
+   ret = -EINPROGRESS;
} else if (*next_buflen) {
scatterwalk_map_and_copy(buf + *buflen, req->src, 0,
 req->nbytes, 0);
@@ -897,6 +895,11 @@ static int ahash_update_ctx(struct ahash_request *req)
 #endif
 
return ret;
+
+ err:
+   ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len, DMA_BIDIRECTIONAL);
+   kfree(edesc);
+   return ret;
 }
 
 static int ahash_final_ctx(struct ahash_request *req)
@@ -939,7 +942,7 @@ static int ahash_final_ctx(struct ahash_request *req)
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
if (ret)
-   return ret;
+   goto err;
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1,
buf, state->buf_dma, buflen,
@@ -951,7 +954,8 @@ static int ahash_final_ctx(struct ahash_request *req)
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto err;
}
 
append_seq_in_ptr(desc, edesc->sec4_sg_dma, ctx->ctx_len + buflen,
@@ -961,7 +965,8 @@ static int ahash_final_ctx(struct ahash_request *req)
digestsize);
if (dma_mapping_error(jrdev, edesc->dst_dma)) {
dev_err(jrdev, "unable to map dst\n");
-   return -ENOMEM;
+   ret = -ENOMEM;
+   goto err;
}
 
 #ifdef DEBUG
@@ -970,13 +975,14 @@ static int ahash_final_ctx(struct ahash_request *req)
 #endif
 
ret = caam_jr_enqueue(jrdev, desc, ahash_done_ctx_src, req);
-   if (!ret) {
-   ret = -EINPROGRESS;
-   } else {
-   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
-   kfree(edesc);
-   }
+   if (ret)
+   goto err;
 
+   return -EINPROGRESS;
+
+err:
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return ret;
 }
 
@@ -1027,7 +1033,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
if (ret)
-   return ret;
+   goto err;
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1,
buf, state->buf_dma, buflen,
@@ -1040,7 +1046,8 @@ static int ahash_finup_ctx(struct ahash_request *req)
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jr

[PATCH 01/11] crypto: caam: fix DMA API mapping leak

2016-08-08 Thread Russell King
caamhash contains this weird code:

src_nents = sg_count(req->src, req->nbytes);
dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
...
edesc->src_nents = src_nents;

sg_count() returns zero when sg_nents_for_len() returns zero or one.
This means we don't need to use a hardware scatterlist.  However,
setting src_nents to zero causes problems when we unmap:

if (edesc->src_nents)
dma_unmap_sg_chained(dev, req->src, edesc->src_nents,
 DMA_TO_DEVICE, edesc->chained);

as zero here means that we have no entries to unmap.  This causes us
to leak DMA mappings, where we map one scatterlist entry and then
fail to unmap it.

This can be fixed in two ways: either by writing the number of entries
that were requested of dma_map_sg(), or by reworking the "no SG
required" case.

We adopt the re-work solution here - we replace sg_count() with
sg_nents_for_len(), so src_nents now contains the real number of
scatterlist entries, and we then change the test for using the
hardware scatterlist to src_nents > 1 rather than just non-zero.

This change passes my sshd, openssl tests hashing /bin and tcrypt
tests.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index f1ecc8df8d41..85c8b048bdc1 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1094,13 +1094,16 @@ static int ahash_digest(struct ahash_request *req)
u32 options;
int sh_len;
 
-   src_nents = sg_count(req->src, req->nbytes);
+   src_nents = sg_nents_for_len(req->src, req->nbytes);
if (src_nents < 0) {
dev_err(jrdev, "Invalid number of src SG.\n");
return src_nents;
}
-   dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes + DESC_JOB_IO_LEN,
@@ -1118,7 +1121,7 @@ static int ahash_digest(struct ahash_request *req)
desc = edesc->hw_desc;
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents, edesc->sec4_sg, 0);
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
sec4_sg_bytes, DMA_TO_DEVICE);
@@ -1246,7 +1249,7 @@ static int ahash_update_no_ctx(struct ahash_request *req)
 
if (to_hash) {
src_nents = sg_nents_for_len(req->src,
-req->nbytes - (*next_buflen));
+req->nbytes - *next_buflen);
if (src_nents < 0) {
dev_err(jrdev, "Invalid number of src SG.\n");
return src_nents;
@@ -1450,13 +1453,18 @@ static int ahash_update_first(struct ahash_request *req)
to_hash = req->nbytes - *next_buflen;
 
if (to_hash) {
-   src_nents = sg_count(req->src, req->nbytes - (*next_buflen));
+   src_nents = sg_nents_for_len(req->src,
+req->nbytes - *next_buflen);
if (src_nents < 0) {
dev_err(jrdev, "Invalid number of src SG.\n");
return src_nents;
}
-   dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents *
+   sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/*
 * allocate space for base edesc and hw desc commands,
@@ -1476,7 +1484,7 @@ static int ahash_update_first(struct ahash_request *req)
 DESC_JOB_IO_LEN;
edesc->dst_dma = 0;
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents,
   edesc->sec4_sg, 0);
 

[PATCH 00/11] Further iMX CAAM updates

2016-08-08 Thread Russell King - ARM Linux
This is a re-post (with hopefully bugs fixed from December's review).
Untested, because AF_ALG appears to be broken in 4.8-rc1.  Maybe
someone can provide some hints how to test using tcrypt please?

Here are further imx-caam updates that I've had since before the
previous merge window.  Please review and (I guess) if Freescale
folk can provide acks etc that would be nice.  Thanks.

 drivers/crypto/caam/caamhash.c | 540 ++---
 drivers/crypto/caam/intern.h   |   1 -
 drivers/crypto/caam/jr.c   |  25 +-
 3 files changed, 305 insertions(+), 261 deletions(-)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] scatterlist: add sg_alloc_table_from_buf() helper

2016-03-31 Thread Russell King - ARM Linux
On Thu, Mar 31, 2016 at 04:45:57PM +0200, Boris Brezillon wrote:
> Hi Russell,
> 
> On Thu, 31 Mar 2016 15:14:13 +0100
> Russell King - ARM Linux <li...@arm.linux.org.uk> wrote:
> 
> > On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote:
> > > sg_alloc_table_from_buf() provides an easy solution to create an sg_table
> > > from a virtual address pointer. This function takes care of dealing with
> > > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
> > > DMA transfer size).
> > 
> > Please note that the DMA API does not take account of coherency of memory
> > regions other than non-high/lowmem - there are specific extensions to
> > deal with this.
> 
> Ok, you said 'non-high/lowmem', this means vmalloced and kmapped buffers
> already fall in this case, right?
> 
> Could you tell me more about those specific extensions?

I was slightly confused - the extensions I was thinking of are those
listed at the bottom of Documentation/cachetlb.txt, which have nothing
to do with DMA.

However, it's probably worth reading Documentation/DMA-API-HOWTO.txt
to read up on what kinds of memory are considered to be DMA-able in
the kernel.

> > What this means is that having an API that takes any virtual address
> > pointer, converts it to a scatterlist which is then DMA mapped, is
> > unsafe.
> 
> Which means some implementations already get this wrong (see
> spi_map_buf(), and I'm pretty sure it's not the only one).

Quite possible, but that is driver stuff, and driver stuff gets things
wrong all the time. :)

> > It'll be okay for PIPT and non-aliasing VIPT cache architectures, but
> > for other cache architectures this will hide this problem and make
> > review harder.
> > 
> 
> Ok, you lost me. I'll have to do my homework and try to understand what
> this means :).

P = physical address
V = virtual address
I = indexed
T = tag

The tag is held in each cache line.  When a location is looked up in the
cache, an index is used to locate a set of cache lines and the tag is
compared to check which cache line in the set is the correct one (or
whether the address even exists in the cache.)

How the index and tag are derived varies between cache architectures.

PIPT = indexed by physical address, tagged with physical address.  Never
aliases with itself in the presence of multiple virtual mappings.

VIPT = indexed by virtual address, tagged with physical address.  If the
bits from the virtual address do not overlap the MMU page size, it is
also alias free, otherwise aliases can exist, but can be eliminated by
"cache colouring" - ensuring that a physical address is always mapped
with the same overlapping bits.

VIVT = indexed by virtual address, tagged with virtual address.  The
worst kind of cache, since every different mapping of the same physical
address is guaranteed by design to alias with other mappings.

There is little cache colouring between different kernel mappings (eg,
between lowmem and vmalloc space.)

What this means is that, while the DMA API takes care of DMA aliases
in the lowmem mappings, an alias-prone VIPT cache will remain incoherent
with DMA if it is remapped into vmalloc space, and the mapping happens
to have a different cache colour.  In other words, this is a data
corruption issue.

Hence, taking a range of vmalloc() addresses, converting them into a
scatterlist, then using the DMA API on the scatterlist _only_ guarantees
that the lowmem (and kmap'd highmem mappings) are coherent with DMA.
There is no way for the DMA API to know that other mappings exist, and
obviously flushing every possible cache line just because a mapping might
exist multiplies the expense of the DMA API: not only in terms of time
spent running through all the possibilities, which doubles for every
aliasing bit of VIPT, but also TLB pressure since you'd have to create
a mapping for each alias and tear it back down.

VIVT is even worse, since there is no other virtual mapping which is
coherent, would need to be known, and each mapping would need to be
individually flushed.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 2/4] scatterlist: add sg_alloc_table_from_buf() helper

2016-03-31 Thread Russell King - ARM Linux
On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote:
> sg_alloc_table_from_buf() provides an easy solution to create an sg_table
> from a virtual address pointer. This function takes care of dealing with
> vmallocated buffers, buffer alignment, or DMA engine limitations (maximum
> DMA transfer size).

Please note that the DMA API does not take account of coherency of memory
regions other than non-high/lowmem - there are specific extensions to
deal with this.

What this means is that having an API that takes any virtual address
pointer, converts it to a scatterlist which is then DMA mapped, is
unsafe.

It'll be okay for PIPT and non-aliasing VIPT cache architectures, but
for other cache architectures this will hide this problem and make
review harder.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 1/3] crypto: marvell/cesa - replace dma_to_phys with dma_map_single

2016-03-19 Thread Russell King - ARM Linux
On Thu, Mar 17, 2016 at 07:17:24PM -0400, ok...@codeaurora.org wrote:
> What is the correct way? I don't want to write engine->sram_dma = sram

Well, what the driver _is_ wanting to do is to go from a CPU physical
address to a device DMA address.  phys_to_dma() looks like the correct
thing there to me, but I guess that's just an offset and doesn't take
account of any IOMMU that may be in the way.

If you have an IOMMU, then the whole phys_to_dma() thing is a total
failure as it only does a linear translation, and there are no
interfaces in the kernel to take account of an IOMMU in the way.  So,
it needs something designed for the job, implemented and discussed by
the normal methods of proposing a new cross-arch interface for drivers
to use.

What I'm certain of, though, is that the change proposed in this patch
will break current users of this driver: virt_to_page() on an address
returned by ioremap() is completely undefined, and will result in
either a kernel oops, or if not poking at memory which isn't a struct
page, ultimately resulting in something that isn't SRAM being pointed
to by "engine->sram_dma".

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 1/3] crypto: marvell/cesa - replace dma_to_phys with dma_map_single

2016-03-19 Thread Russell King - ARM Linux
On Thu, Mar 17, 2016 at 06:02:15PM -0400, Sinan Kaya wrote:
> Getting ready to remove dma_to_phys API. Drivers should not be
> using this API for DMA operations. Instead, they should go
> through the dma_map or dma_alloc APIs.
> 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/crypto/marvell/cesa.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
> index c0656e7..52ddfa4 100644
> --- a/drivers/crypto/marvell/cesa.c
> +++ b/drivers/crypto/marvell/cesa.c
> @@ -350,8 +350,8 @@ static int mv_cesa_get_sram(struct platform_device *pdev, int idx)
>   if (IS_ERR(engine->sram))
>   return PTR_ERR(engine->sram);
>  
> - engine->sram_dma = phys_to_dma(cesa->dev,
> -(phys_addr_t)res->start);
> + engine->sram_dma = dma_map_single(cesa->dev, engine->sram,
> +   DMA_TO_DEVICE);

Documentation/DMA-API.txt

dma_addr_t
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
  enum dma_data_direction direction)

Notes:  Not all memory regions in a machine can be mapped by this API.
Further, contiguous kernel virtual space may not be contiguous as
physical memory.  Since this API does not provide any scatter/gather
capability, it will fail if the user tries to map a non-physically
contiguous piece of memory.  For this reason, memory to be mapped by
this API should be obtained from sources which guarantee it to be
physically contiguous (like kmalloc).

Specifically, ioremapped memory will *not* work as you expect with
dma_map_single().

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH RFC 06/11] crypto: caam: ensure that we clean up after an error

2015-12-09 Thread Russell King - ARM Linux
On Wed, Dec 09, 2015 at 05:08:41PM +0200, Horia Geantă wrote:
> On 12/7/2015 9:12 PM, Russell King wrote:
> > Ensure that we clean up allocations and DMA mappings after encountering
> > an error rather than just giving up and leaking memory and resources.
> > 
> > Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
> 
> I guess the error cleanup code should be grouped under an "err" label,
> instead of duplicating it.

It'd be useful if you could quote the code you were commenting on please.
I now need to dig out the patch from my git tree to work out the relevance
of your comment is.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH RFC 07/11] crypto: caam: check and use dma_map_sg() return code

2015-12-09 Thread Russell King - ARM Linux
On Wed, Dec 09, 2015 at 05:20:45PM +0200, Horia Geantă wrote:
> On 12/7/2015 9:12 PM, Russell King wrote:
> > Strictly, dma_map_sg() may coalesce SG entries, but in practice on iMX
> > hardware, this will never happen.  However, dma_map_sg() can fail, and
> > we completely fail to check its return value.  So, fix this properly.
> > 
> > Arrange the code to map the scatterlist early, so we know how many
> > scatter table entries to allocate, and then fill them in.  This allows
> > us to keep relatively simple error cleanup paths.
> > 
> > Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
> 
> Some tcrypt tests fail - looks like those with zero plaintext:
> caam_jr ffe301000.jr: unable to map source for DMA
> alg: hash: digest failed on test 1 for sha1-caam: ret=12
> [...]
> 
> Need to be careful, dma_map_sg() returning zero is an error only if ptxt
> is not null (alternatively src_nents returned by sg_nents_for_len()
> could be checked).
> 
> > @@ -1091,7 +1102,7 @@ static int ahash_digest(struct ahash_request *req)
> > u32 *sh_desc = ctx->sh_desc_digest, *desc;
> > dma_addr_t ptr = ctx->sh_desc_digest_dma;
> > int digestsize = crypto_ahash_digestsize(ahash);
> > -   int src_nents, sec4_sg_bytes;
> > +   int src_nents, mapped_nents, sec4_sg_bytes;
> > dma_addr_t src_dma;
> > struct ahash_edesc *edesc;
> > int ret = 0;
> > @@ -1099,9 +1110,14 @@ static int ahash_digest(struct ahash_request *req)
> > int sh_len;
> >  
> > src_nents = sg_nents_for_len(req->src, req->nbytes);
> > -   dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
> > -   if (src_nents > 1)
> > -   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
> > +   mapped_nents = dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
> > +   if (mapped_nents == 0) {
> > +   dev_err(jrdev, "unable to map source for DMA\n");
> > +   return -ENOMEM;
> > +   }
> 
> This is at least one of the places where the error condition must change
> to smth. like (src_nents != 0 && mapped_nents == 0).

I guess we should avoid calling dma_map_sg() and dma_unmap_sg() when
src_nents is zero.  Thanks, I'll work that into the next patch revision.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH RFC 00/13] Further iMX CAAM updates

2015-12-09 Thread Russell King - ARM Linux
On Wed, Dec 09, 2015 at 05:06:03PM +0200, Horia Geantă wrote:
> On 12/7/2015 9:11 PM, Russell King - ARM Linux wrote:
> > Here are further imx-caam updates that I've had since before the
> > previous merge window.  Please review and (I guess) if Freescale
> > folk can provide acks etc that would be nice.  Thanks.
> 
> Thanks Russell.
> 
> Note that the patch set does not apply cleanly, please rebase on latest
> cryptodev-2.6 tree.

I'm afraid I'm not a fan of digging out random git trees, rebasing
patches on top of them, and then not being able to test them locally
without dragging the entire tree into my own tree.  I'd rather wait
another kernel cycle instead of polluting my own tree with commits
from other trees.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH RFC 07/11] crypto: caam: check and use dma_map_sg() return code

2015-12-07 Thread Russell King
Strictly, dma_map_sg() may coalesce SG entries, but in practice on iMX
hardware, this will never happen.  However, dma_map_sg() can fail, and
we completely fail to check its return value.  So, fix this properly.

Arrange the code to map the scatterlist early, so we know how many
scatter table entries to allocate, and then fill them in.  This allows
us to keep relatively simple error cleanup paths.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 109 -
 1 file changed, 74 insertions(+), 35 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 076cfddf32bb..9638e9f4f001 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -187,15 +187,6 @@ static inline dma_addr_t buf_map_to_sec4_sg(struct device *jrdev,
return buf_dma;
 }
 
-/* Map req->src and put it in link table */
-static inline void src_map_to_sec4_sg(struct device *jrdev,
- struct scatterlist *src, int src_nents,
- struct sec4_sg_entry *sec4_sg)
-{
-   dma_map_sg(jrdev, src, src_nents, DMA_TO_DEVICE);
-   sg_to_sec4_sg_last(src, src_nents, sec4_sg, 0);
-}
-
 /*
  * Only put buffer in link table if it contains data, which is possible,
  * since a buffer has previously been used, and needs to be unmapped,
@@ -791,7 +782,7 @@ static int ahash_update_ctx(struct ahash_request *req)
int in_len = *buflen + req->nbytes, to_hash;
u32 *sh_desc = ctx->sh_desc_update, *desc;
dma_addr_t ptr = ctx->sh_desc_update_dma;
-   int src_nents, sec4_sg_bytes, sec4_sg_src_index;
+   int src_nents, mapped_nents, sec4_sg_bytes, sec4_sg_src_index;
struct ahash_edesc *edesc;
int ret = 0;
int sh_len;
@@ -803,8 +794,19 @@ static int ahash_update_ctx(struct ahash_request *req)
if (to_hash) {
src_nents = sg_nents_for_len(req->src,
 req->nbytes - (*next_buflen));
+   if (src_nents) {
+   mapped_nents = dma_map_sg(jrdev, req->src, src_nents,
+ DMA_TO_DEVICE);
+   if (!mapped_nents) {
+   dev_err(jrdev, "unable to DMA map source\n");
+   return -ENOMEM;
+   }
+   } else {
+   mapped_nents = 0;
+   }
+
sec4_sg_src_index = 1 + (*buflen ? 1 : 0);
-   sec4_sg_bytes = (sec4_sg_src_index + src_nents) *
+   sec4_sg_bytes = (sec4_sg_src_index + mapped_nents) *
 sizeof(struct sec4_sg_entry);
 
/*
@@ -816,6 +818,7 @@ static int ahash_update_ctx(struct ahash_request *req)
if (!edesc) {
dev_err(jrdev,
"could not allocate extended descriptor\n");
+   dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
}
 
@@ -836,9 +839,10 @@ static int ahash_update_ctx(struct ahash_request *req)
buf, state->buf_dma,
*buflen, last_buflen);
 
-   if (src_nents) {
-   src_map_to_sec4_sg(jrdev, req->src, src_nents,
-  edesc->sec4_sg + sec4_sg_src_index);
+   if (mapped_nents) {
+   sg_to_sec4_sg_last(req->src, mapped_nents,
+  edesc->sec4_sg + sec4_sg_src_index,
+  0);
if (*next_buflen)
scatterwalk_map_and_copy(next_buf, req->src,
 to_hash - *buflen,
@@ -1004,21 +1008,28 @@ static int ahash_finup_ctx(struct ahash_request *req)
u32 *sh_desc = ctx->sh_desc_finup, *desc;
dma_addr_t ptr = ctx->sh_desc_finup_dma;
int sec4_sg_bytes, sec4_sg_src_index;
-   int src_nents;
+   int src_nents, mapped_nents;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
int sh_len;
 
src_nents = sg_nents_for_len(req->src, req->nbytes);
+   mapped_nents = dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
+   if (!mapped_nents) {
+   dev_err(jrdev, "unable to DMA map source\n");
+   return -ENOMEM;
+   }
+
sec4_sg_src_index = 1 + (buflen ? 1 : 0);
-   sec4_sg_bytes = (sec4_sg_src_index + src_nents) *
+   sec4_sg_bytes = (sec4_sg_src_index + mapped_nents) *

[PATCH RFC 02/11] crypto: caam: ensure descriptor buffers are cacheline aligned

2015-12-07 Thread Russell King
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index eccde7207f92..6a6d74f38300 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -99,17 +99,17 @@ static struct list_head hash_list;
 
 /* ahash per-session context */
 struct caam_hash_ctx {
-   struct device *jrdev;
-   u32 sh_desc_update[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_update_first[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_fin[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_digest[DESC_HASH_MAX_USED_LEN];
-   u32 sh_desc_finup[DESC_HASH_MAX_USED_LEN];
-   dma_addr_t sh_desc_update_dma;
+   u32 sh_desc_update[DESC_HASH_MAX_USED_LEN] cacheline_aligned;
+   u32 sh_desc_update_first[DESC_HASH_MAX_USED_LEN] cacheline_aligned;
+   u32 sh_desc_fin[DESC_HASH_MAX_USED_LEN] cacheline_aligned;
+   u32 sh_desc_digest[DESC_HASH_MAX_USED_LEN] cacheline_aligned;
+   u32 sh_desc_finup[DESC_HASH_MAX_USED_LEN] cacheline_aligned;
+   dma_addr_t sh_desc_update_dma cacheline_aligned;
dma_addr_t sh_desc_update_first_dma;
dma_addr_t sh_desc_fin_dma;
dma_addr_t sh_desc_digest_dma;
dma_addr_t sh_desc_finup_dma;
+   struct device *jrdev;
u32 alg_type;
u32 alg_op;
u8 key[CAAM_MAX_HASH_KEY_SIZE];
-- 
2.1.0



[PATCH RFC 04/11] crypto: caam: mark the hardware descriptor as cache line aligned

2015-12-07 Thread Russell King
Mark the hardware descriptor as being cache line aligned; on DMA
incoherent architectures, the hardware descriptor should sit in a
separate cache line from the CPU accessed data to avoid polluting
the caches.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index cf557e1291b3..0a9665140d26 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -604,7 +604,7 @@ struct ahash_edesc {
int src_nents;
int sec4_sg_bytes;
struct sec4_sg_entry *sec4_sg;
-   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)];
+   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)] cacheline_aligned;
 };
 
 static inline void ahash_unmap(struct device *dev,
-- 
2.1.0



[PATCH RFC 05/11] crypto: caam: replace sec4_sg pointer with array

2015-12-07 Thread Russell King
Since the extended descriptor includes the hardware descriptor, and the
sec4 scatterlist immediately follows this, we can declare it as an array
at the very end of the extended descriptor.  This allows us to get rid
of an initialiser for every site where we allocate an extended
descriptor.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 0a9665140d26..d48974d1897d 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -595,16 +595,16 @@ static int ahash_setkey(struct crypto_ahash *ahash,
  * @sec4_sg_dma: physical mapped address of h/w link table
  * @src_nents: number of segments in input scatterlist
  * @sec4_sg_bytes: length of dma mapped sec4_sg space
- * @sec4_sg: pointer to h/w link table
  * @hw_desc: the h/w job descriptor followed by any referenced link tables
+ * @sec4_sg: h/w link table
  */
 struct ahash_edesc {
dma_addr_t dst_dma;
dma_addr_t sec4_sg_dma;
int src_nents;
int sec4_sg_bytes;
-   struct sec4_sg_entry *sec4_sg;
u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)] cacheline_aligned;
+   struct sec4_sg_entry sec4_sg[0];
 };
 
 static inline void ahash_unmap(struct device *dev,
@@ -821,7 +821,6 @@ static int ahash_update_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
@@ -931,7 +930,6 @@ static int ahash_final_ctx(struct ahash_request *req)
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->src_nents = 0;
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
@@ -1016,7 +1014,6 @@ static int ahash_finup_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
@@ -1093,7 +1090,7 @@ static int ahash_digest(struct ahash_request *req)
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
}
-   edesc->sec4_sg = (void *)(edesc + 1);
+
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = src_nents;
 
@@ -1247,7 +1244,6 @@ static int ahash_update_no_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->dst_dma = 0;
 
state->buf_dma = buf_map_to_sec4_sg(jrdev, edesc->sec4_sg,
@@ -1354,7 +1350,6 @@ static int ahash_finup_no_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg, buf,
state->buf_dma, buflen,
@@ -1445,7 +1440,6 @@ static int ahash_update_first(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)(edesc + 1);
edesc->dst_dma = 0;
 
if (src_nents > 1) {
-- 
2.1.0



[PATCH RFC 11/11] crypto: caam: get rid of tasklet

2015-12-07 Thread Russell King
Threaded interrupts can perform the function of the tasklet, and much
more safely too - without races when trying to take the tasklet and
interrupt down on device removal.

With the old code, there is a window where we call tasklet_kill().  If
the interrupt handler happens to be running on a different CPU, and
subsequently calls tasklet_schedule(), the tasklet will be re-scheduled
for execution.

Switching to a hardirq/threadirq combination implementation avoids this,
and it also means generic code deals with the teardown sequencing of the
threaded and non-threaded parts.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/intern.h |  1 -
 drivers/crypto/caam/jr.c | 25 +
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h
index e2bcacc1a921..5d4c05074a5c 100644
--- a/drivers/crypto/caam/intern.h
+++ b/drivers/crypto/caam/intern.h
@@ -41,7 +41,6 @@ struct caam_drv_private_jr {
struct device   *dev;
int ridx;
struct caam_job_ring __iomem *rregs;/* JobR's register space */
-   struct tasklet_struct irqtask;
int irq;/* One per queue */
 
/* Number of scatterlist crypt transforms active on the JobR */
diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c
index f7e0d8d4c3da..b77ee2b88f37 100644
--- a/drivers/crypto/caam/jr.c
+++ b/drivers/crypto/caam/jr.c
@@ -73,8 +73,6 @@ int caam_jr_shutdown(struct device *dev)
 
ret = caam_reset_hw_jr(dev);
 
-   tasklet_kill(&jrp->irqtask);
-
/* Release interrupt */
free_irq(jrp->irq, dev);
 
@@ -130,7 +128,7 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
 
/*
 * Check the output ring for ready responses, kick
-* tasklet if jobs done.
+* the threaded irq if jobs done.
 */
irqstate = rd_reg32(&jrp->rregs->jrintstatus);
if (!irqstate)
@@ -152,18 +150,13 @@ static irqreturn_t caam_jr_interrupt(int irq, void *st_dev)
/* Have valid interrupt at this point, just ACK and trigger */
wr_reg32(&jrp->rregs->jrintstatus, irqstate);
 
-   preempt_disable();
-   tasklet_schedule(&jrp->irqtask);
-   preempt_enable();
-
-   return IRQ_HANDLED;
+   return IRQ_WAKE_THREAD;
 }
 
-/* Deferred service handler, run as interrupt-fired tasklet */
-static void caam_jr_dequeue(unsigned long devarg)
+static irqreturn_t caam_jr_threadirq(int irq, void *st_dev)
 {
int hw_idx, sw_idx, i, head, tail;
-   struct device *dev = (struct device *)devarg;
+   struct device *dev = st_dev;
struct caam_drv_private_jr *jrp = dev_get_drvdata(dev);
void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg);
u32 *userdesc, userstatus;
@@ -237,6 +230,8 @@ static void caam_jr_dequeue(unsigned long devarg)
 
/* reenable / unmask IRQs */
clrbits32(&jrp->rregs->rconfig_lo, JRCFG_IMSK);
+
+   return IRQ_HANDLED;
 }
 
 /**
@@ -394,11 +389,10 @@ static int caam_jr_init(struct device *dev)
 
jrp = dev_get_drvdata(dev);
 
-   tasklet_init(&jrp->irqtask, caam_jr_dequeue, (unsigned long)dev);
-
/* Connect job ring interrupt handler. */
-   error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED,
-   dev_name(dev), dev);
+   error = request_threaded_irq(jrp->irq, caam_jr_interrupt,
+caam_jr_threadirq, IRQF_SHARED,
+dev_name(dev), dev);
if (error) {
dev_err(dev, "can't connect JobR %d interrupt (%d)\n",
jrp->ridx, jrp->irq);
@@ -460,7 +454,6 @@ static int caam_jr_init(struct device *dev)
 out_free_irq:
free_irq(jrp->irq, dev);
 out_kill_deq:
tasklet_kill(&jrp->irqtask);
return error;
 }
 
-- 
2.1.0



[PATCH RFC 03/11] crypto: caam: incorporate job descriptor into struct ahash_edesc

2015-12-07 Thread Russell King
Rather than giving the descriptor as hw_desc[0], give its real size.
All places where we allocate an ahash_edesc incorporate DESC_JOB_IO_LEN
bytes of job descriptor.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 49 --
 1 file changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 6a6d74f38300..cf557e1291b3 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -604,7 +604,7 @@ struct ahash_edesc {
int src_nents;
int sec4_sg_bytes;
struct sec4_sg_entry *sec4_sg;
-   u32 hw_desc[0];
+   u32 hw_desc[DESC_JOB_IO_LEN / sizeof(u32)];
 };
 
 static inline void ahash_unmap(struct device *dev,
@@ -811,8 +811,8 @@ static int ahash_update_ctx(struct ahash_request *req)
 * allocate space for base edesc and hw desc commands,
 * link tables
 */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN +
-   sec4_sg_bytes, GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes,
+   GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev,
"could not allocate extended descriptor\n");
@@ -821,8 +821,7 @@ static int ahash_update_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
@@ -921,8 +920,7 @@ static int ahash_final_ctx(struct ahash_request *req)
sec4_sg_bytes = sec4_sg_src_index * sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
@@ -933,8 +931,7 @@ static int ahash_final_ctx(struct ahash_request *req)
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
edesc->src_nents = 0;
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
@@ -1007,8 +1004,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
 sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + DESC_JOB_IO_LEN + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
@@ -1020,8 +1016,7 @@ static int ahash_finup_ctx(struct ahash_request *req)
 
edesc->src_nents = src_nents;
edesc->sec4_sg_bytes = sec4_sg_bytes;
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
-DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
@@ -1093,14 +1088,12 @@ static int ahash_digest(struct ahash_request *req)
sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes + DESC_JOB_IO_LEN,
-   GFP_DMA | flags);
+   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
if (!edesc) {
dev_err(jrdev, "could not allocate extended descriptor\n");
return -ENOMEM;
}
-   edesc->sec4_sg = (void *)edesc + sizeof(struct ahash_edesc) +
- DESC_JOB_IO_LEN;
+   edesc->sec4_sg = (void *)(edesc + 1);
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = src_nents;
 
@@ -1166,7 +1159,7 @@ static int ahash_final_no_ctx(struct ahash_request *req)
int sh_len;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzallo

[PATCH RFC 08/11] crypto: caam: add ahash_edesc_alloc() for descriptor allocation

2015-12-07 Thread Russell King
Add a helper function to perform the descriptor allocation.
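The allocation pattern this helper factors out — one zeroed block holding the base descriptor plus a trailing scatter table starting at `(edesc + 1)` — can be sketched in plain userspace C. The struct and function names below are illustrative stand-ins, not the driver's types:

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the driver's ahash_edesc / sec4_sg_entry. */
struct sg_entry { unsigned long addr; unsigned int len; };

struct edesc {
	int src_nents;
	unsigned int sg_bytes;
	struct sg_entry *sg;	/* points just past the base struct */
};

/* One zeroed allocation covers the base descriptor plus sg_num
 * scatter-table entries, mirroring kzalloc(sizeof(*edesc) + sg_size). */
struct edesc *edesc_alloc(int sg_num)
{
	size_t sg_size = sg_num * sizeof(struct sg_entry);
	struct edesc *ed = calloc(1, sizeof(*ed) + sg_size);

	if (!ed)
		return NULL;

	ed->sg_bytes = (unsigned int)sg_size;
	ed->sg = (struct sg_entry *)(ed + 1);
	return ed;
}
```

The single allocation avoids a second failure path and keeps the table adjacent to the descriptor it belongs to.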

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 60 +++---
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 9638e9f4f001..f35f4cfc27a7 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -765,6 +765,25 @@ static void ahash_done_ctx_dst(struct device *jrdev, u32 *desc, u32 err,
	req->base.complete(&req->base, err);
 }
 
+/*
+ * Allocate an enhanced descriptor, which contains the hardware descriptor
+ * and space for hardware scatter table containing sg_num entries.
+ */
+static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
+int sg_num, gfp_t flags)
+{
+   struct ahash_edesc *edesc;
+   unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
+
+   edesc = kzalloc(sizeof(*edesc) + sg_size, GFP_DMA | flags);
+   if (!edesc) {
+   dev_err(ctx->jrdev, "could not allocate extended descriptor\n");
+   return NULL;
+   }
+
+   return edesc;
+}
+
 /* submit update job descriptor */
 static int ahash_update_ctx(struct ahash_request *req)
 {
@@ -813,11 +832,9 @@ static int ahash_update_ctx(struct ahash_request *req)
 * allocate space for base edesc and hw desc commands,
 * link tables
 */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index + mapped_nents,
+ flags);
if (!edesc) {
-   dev_err(jrdev,
-   "could not allocate extended descriptor\n");
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
}
@@ -930,11 +947,9 @@ static int ahash_final_ctx(struct ahash_request *req)
sec4_sg_bytes = sec4_sg_src_index * sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
+   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index, flags);
+   if (!edesc)
return -ENOMEM;
-   }
 
sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
@@ -1026,9 +1041,9 @@ static int ahash_finup_ctx(struct ahash_request *req)
 sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
+   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index + mapped_nents,
+ flags);
if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
}
@@ -1122,9 +1137,9 @@ static int ahash_digest(struct ahash_request *req)
sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes, GFP_DMA | flags);
+   edesc = ahash_edesc_alloc(ctx, mapped_nents > 1 ? mapped_nents : 0,
+ flags);
if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
}
@@ -1198,13 +1213,10 @@ static int ahash_final_no_ctx(struct ahash_request *req)
int sh_len;
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = kzalloc(sizeof(*edesc), GFP_DMA | flags);
-   if (!edesc) {
-   dev_err(jrdev, "could not allocate extended descriptor\n");
+   edesc = ahash_edesc_alloc(ctx, 0, flags);
+   if (!edesc)
return -ENOMEM;
-   }
 
-   edesc->sec4_sg_bytes = 0;
sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
@@ -1287,11 +1299,8 @@ static int ahash_update_no_ctx(struct ahash_request *req)
 * allocate space for base edesc and hw desc commands,
 * link tables
 */
-   edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes,
-   GFP_DMA | flags);
+   edesc = ahash_edesc_alloc(ctx, 1 + mapped_nents, flags);
   

[PATCH RFC 09/11] crypto: caam: move job descriptor initialisation to ahash_edesc_alloc()

2015-12-07 Thread Russell King
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 84 +-
 1 file changed, 34 insertions(+), 50 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index f35f4cfc27a7..241268d108ec 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -770,7 +770,9 @@ static void ahash_done_ctx_dst(struct device *jrdev, u32 *desc, u32 err,
  * and space for hardware scatter table containing sg_num entries.
  */
 static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
-int sg_num, gfp_t flags)
+int sg_num, u32 *sh_desc,
+dma_addr_t sh_desc_dma,
+gfp_t flags)
 {
struct ahash_edesc *edesc;
unsigned int sg_size = sg_num * sizeof(struct sec4_sg_entry);
@@ -781,6 +783,9 @@ static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
return NULL;
}
 
+   init_job_desc_shared(edesc->hw_desc, sh_desc_dma, desc_len(sh_desc),
+HDR_SHARE_DEFER | HDR_REVERSE);
+
return edesc;
 }
 
@@ -799,12 +804,10 @@ static int ahash_update_ctx(struct ahash_request *req)
	int *next_buflen = state->current_buf ? &state->buflen_0 :
			   &state->buflen_1, last_buflen;
int in_len = *buflen + req->nbytes, to_hash;
-   u32 *sh_desc = ctx->sh_desc_update, *desc;
-   dma_addr_t ptr = ctx->sh_desc_update_dma;
+   u32 *desc;
int src_nents, mapped_nents, sec4_sg_bytes, sec4_sg_src_index;
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
last_buflen = *next_buflen;
	*next_buflen = in_len & (crypto_tfm_alg_blocksize(&ahash->base) - 1);
@@ -833,7 +836,8 @@ static int ahash_update_ctx(struct ahash_request *req)
 * link tables
 */
edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index + mapped_nents,
- flags);
+ ctx->sh_desc_update,
+ ctx->sh_desc_update_dma, flags);
if (!edesc) {
dma_unmap_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
return -ENOMEM;
@@ -871,10 +875,7 @@ static int ahash_update_ctx(struct ahash_request *req)
 
state->current_buf = !state->current_buf;
 
-   sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
-   init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER |
-HDR_REVERSE);
 
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
 sec4_sg_bytes,
@@ -935,25 +936,23 @@ static int ahash_final_ctx(struct ahash_request *req)
int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
int last_buflen = state->current_buf ? state->buflen_0 :
  state->buflen_1;
-   u32 *sh_desc = ctx->sh_desc_fin, *desc;
-   dma_addr_t ptr = ctx->sh_desc_fin_dma;
+   u32 *desc;
int sec4_sg_bytes, sec4_sg_src_index;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
sec4_sg_src_index = 1 + (buflen ? 1 : 0);
sec4_sg_bytes = sec4_sg_src_index * sizeof(struct sec4_sg_entry);
 
/* allocate space for base edesc and hw desc commands, link tables */
-   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index, flags);
+   edesc = ahash_edesc_alloc(ctx, sec4_sg_src_index,
+ ctx->sh_desc_fin, ctx->sh_desc_fin_dma,
+ flags);
if (!edesc)
return -ENOMEM;
 
-   sh_len = desc_len(sh_desc);
desc = edesc->hw_desc;
-   init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = 0;
@@ -1020,14 +1019,12 @@ static int ahash_finup_ctx(struct ahash_request *req)
int buflen = state->current_buf ? state->buflen_1 : state->buflen_0;
int last_buflen = state->current_buf ? state->buflen_0 :
  state->buflen_1;
-   u32 *sh_desc = ctx->sh_desc_finup, *desc;
-   dma_addr_t ptr = ctx->sh_desc_finup_dma;
+   u32 *desc;
int sec4_sg_bytes, sec4_sg_src_index;
int src_nents, mapped_nents;
int digestsize = crypto_ahash_digestsize(ahash);
struct ahash_edesc *edesc;
int ret = 0;
-   int sh_len;
 
src_nents = sg_nents_for_len(r

[PATCH RFC 10/11] crypto: caam: add ahash_edesc_add_src()

2015-12-07 Thread Russell King
Add a helper to map the source scatterlist into the descriptor.
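The decision the helper encodes — hand a single mapped entry to the hardware directly, otherwise build a scatter table — reduces to a small predicate. This is a sketch with illustrative names, not the driver's API:

```c
#include <assert.h>

/* SRC_SG_TABLE: build and map a link table; SRC_DIRECT: pass the
 * single entry's DMA address straight to the descriptor. */
enum src_mode { SRC_DIRECT, SRC_SG_TABLE };

/* A table is needed whenever more than one entry is mapped, or when
 * earlier entries (first_sg) already occupy the front of the table. */
enum src_mode pick_src_mode(int nents, unsigned int first_sg)
{
	return (nents > 1 || first_sg) ? SRC_SG_TABLE : SRC_DIRECT;
}
```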

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 106 +++--
 1 file changed, 49 insertions(+), 57 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 241268d108ec..4e73d3218481 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -789,6 +789,40 @@ static struct ahash_edesc *ahash_edesc_alloc(struct caam_hash_ctx *ctx,
return edesc;
 }
 
+static int ahash_edesc_add_src(struct caam_hash_ctx *ctx,
+  struct ahash_edesc *edesc,
+  struct ahash_request *req, int nents,
+  unsigned int first_sg, unsigned int first_bytes)
+{
+   dma_addr_t src_dma;
+   u32 options;
+
+   if (nents > 1 || first_sg) {
+   struct sec4_sg_entry *sg = edesc->sec4_sg;
+   unsigned int sgsize = sizeof(*sg) * (first_sg + nents);
+
+   sg_to_sec4_sg_last(req->src, nents, sg + first_sg, 0);
+
+   src_dma = dma_map_single(ctx->jrdev, sg, sgsize, DMA_TO_DEVICE);
+   if (dma_mapping_error(ctx->jrdev, src_dma)) {
+   dev_err(ctx->jrdev, "unable to map S/G table\n");
+   return -ENOMEM;
+   }
+
+   edesc->sec4_sg_bytes = sgsize;
+   edesc->sec4_sg_dma = src_dma;
+   options = LDST_SGF;
+   } else {
+   src_dma = sg_dma_address(req->src);
+   options = 0;
+   }
+
+   append_seq_in_ptr(edesc->hw_desc, src_dma, first_bytes + req->nbytes,
+ options);
+
+   return 0;
+}
+
 /* submit update job descriptor */
 static int ahash_update_ctx(struct ahash_request *req)
 {
@@ -1112,11 +1146,9 @@ static int ahash_digest(struct ahash_request *req)
   CRYPTO_TFM_REQ_MAY_SLEEP)) ? GFP_KERNEL : GFP_ATOMIC;
u32 *desc;
int digestsize = crypto_ahash_digestsize(ahash);
-   int src_nents, mapped_nents, sec4_sg_bytes;
-   dma_addr_t src_dma;
+   int src_nents, mapped_nents;
struct ahash_edesc *edesc;
int ret = 0;
-   u32 options;
 
src_nents = sg_nents_for_len(req->src, req->nbytes);
mapped_nents = dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
@@ -1125,11 +1157,6 @@ static int ahash_digest(struct ahash_request *req)
return -ENOMEM;
}
 
-   if (mapped_nents > 1)
-   sec4_sg_bytes = mapped_nents * sizeof(struct sec4_sg_entry);
-   else
-   sec4_sg_bytes = 0;
-
/* allocate space for base edesc and hw desc commands, link tables */
edesc = ahash_edesc_alloc(ctx, mapped_nents > 1 ? mapped_nents : 0,
  ctx->sh_desc_digest, ctx->sh_desc_digest_dma,
@@ -1139,28 +1166,16 @@ static int ahash_digest(struct ahash_request *req)
return -ENOMEM;
}
 
-   edesc->sec4_sg_bytes = sec4_sg_bytes;
edesc->src_nents = src_nents;
 
-   desc = edesc->hw_desc;
-
-   if (src_nents > 1) {
-   sg_to_sec4_sg_last(req->src, mapped_nents, edesc->sec4_sg, 0);
-   edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
-   sec4_sg_bytes, DMA_TO_DEVICE);
-   if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
-   dev_err(jrdev, "unable to map S/G table\n");
-   ahash_unmap(jrdev, edesc, req, digestsize);
-   kfree(edesc);
-   return -ENOMEM;
-   }
-   src_dma = edesc->sec4_sg_dma;
-   options = LDST_SGF;
-   } else {
-   src_dma = sg_dma_address(req->src);
-   options = 0;
+   ret = ahash_edesc_add_src(ctx, edesc, req, mapped_nents, 0, 0);
+   if (ret) {
+   ahash_unmap(jrdev, edesc, req, digestsize);
+   kfree(edesc);
+   return ret;
}
-   append_seq_in_ptr(desc, src_dma, req->nbytes, options);
+
+   desc = edesc->hw_desc;
 
edesc->dst_dma = map_seq_out_ptr_result(desc, jrdev, req->result,
digestsize);
@@ -1471,9 +1486,7 @@ static int ahash_update_first(struct ahash_request *req)
			       &state->buflen_1 : &state->buflen_0;
int to_hash;
u32 *desc;
-   int sec4_sg_bytes, src_nents, mapped_nents;
-   dma_addr_t src_dma;
-   u32 options;
+   int src_nents, mapped_nents;
struct ahash_edesc *edesc;
int ret = 0;
 
@@ -1490,11 +1503,6 @@ static int ahash_update_first(struct ahash_request *req)
dev_err(jrdev

[PATCH RFC 06/11] crypto: caam: ensure that we clean up after an error

2015-12-07 Thread Russell King
Ensure that we clean up allocations and DMA mappings after encountering
an error rather than just giving up and leaking memory and resources.
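The unwind pattern the patch applies can be sketched in userspace C: every failure after a successful allocation must release what was already acquired before returning. The names and the simulated failure are illustrative only:

```c
#include <assert.h>
#include <stdlib.h>

int live_allocs;	/* counts outstanding allocations */

/* fail_map non-zero simulates a later step (e.g. a DMA mapping)
 * failing after the descriptor allocation succeeded. */
int do_op(int fail_map)
{
	void *edesc = malloc(64);

	if (!edesc)
		return -12;	/* -ENOMEM */
	live_allocs++;

	if (fail_map)
		goto err_free;	/* clean up instead of leaking */

	free(edesc);
	live_allocs--;
	return 0;

err_free:
	free(edesc);
	live_allocs--;
	return -12;
}
```

Either way the function returns, `live_allocs` is back to zero — the invariant the patch restores for the caamhash error paths.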

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 57 ++
 1 file changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index d48974d1897d..076cfddf32bb 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -824,8 +824,12 @@ static int ahash_update_ctx(struct ahash_request *req)
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_BIDIRECTIONAL);
-   if (ret)
+   if (ret) {
+   ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len,
+   DMA_BIDIRECTIONAL);
+   kfree(edesc);
return ret;
+   }
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev,
edesc->sec4_sg + 1,
@@ -856,6 +860,9 @@ static int ahash_update_ctx(struct ahash_request *req)
 DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
+   ahash_unmap_ctx(jrdev, edesc, req, ctx->ctx_len,
+   DMA_BIDIRECTIONAL);
+   kfree(edesc);
return -ENOMEM;
}
 
@@ -934,8 +941,11 @@ static int ahash_final_ctx(struct ahash_request *req)
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
-   if (ret)
+   if (ret) {
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return ret;
+   }
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1,
buf, state->buf_dma, buflen,
@@ -946,6 +956,8 @@ static int ahash_final_ctx(struct ahash_request *req)
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return -ENOMEM;
}
 
@@ -956,6 +968,8 @@ static int ahash_final_ctx(struct ahash_request *req)
digestsize);
if (dma_mapping_error(jrdev, edesc->dst_dma)) {
dev_err(jrdev, "unable to map dst\n");
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return -ENOMEM;
}
 
@@ -1017,8 +1031,11 @@ static int ahash_finup_ctx(struct ahash_request *req)
 
ret = ctx_map_to_sec4_sg(desc, jrdev, state, ctx->ctx_len,
 edesc->sec4_sg, DMA_TO_DEVICE);
-   if (ret)
+   if (ret) {
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return ret;
+   }
 
state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1,
buf, state->buf_dma, buflen,
@@ -1031,6 +1048,8 @@ static int ahash_finup_ctx(struct ahash_request *req)
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return -ENOMEM;
}
 
@@ -1041,6 +1060,8 @@ static int ahash_finup_ctx(struct ahash_request *req)
digestsize);
if (dma_mapping_error(jrdev, edesc->dst_dma)) {
dev_err(jrdev, "unable to map dst\n");
+   ahash_unmap_ctx(jrdev, edesc, req, digestsize, DMA_FROM_DEVICE);
+   kfree(edesc);
return -ENOMEM;
}
 
@@ -1104,6 +1125,8 @@ static int ahash_digest(struct ahash_request *req)
sec4_sg_bytes, DMA_TO_DEVICE);
if (dma_mapping_error(jrdev, edesc->sec4_sg_dma)) {
dev_err(jrdev, "unable to map S/G table\n");
+   ahash_unmap(jrdev, edesc, req, digestsize);
+   kfree(edesc);

[PATCH RFC 01/11] crypto: caam: fix DMA API mapping leak

2015-12-07 Thread Russell King
caamhash contains this weird code:

src_nents = sg_count(req->src, req->nbytes);
dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
...
edesc->src_nents = src_nents;

sg_count() returns zero when sg_nents_for_len() returns zero or one.
This means we don't need to use a hardware scatterlist.  However,
setting src_nents to zero causes problems when we unmap:

if (edesc->src_nents)
dma_unmap_sg_chained(dev, req->src, edesc->src_nents,
 DMA_TO_DEVICE, edesc->chained);

as zero here means that we have no entries to unmap.  This causes us
to leak DMA mappings, where we map one scatterlist entry and then
fail to unmap it.

This can be fixed in two ways: either by writing the number of entries
that were requested of dma_map_sg(), or by reworking the "no SG
required" case.

We adopt the re-work solution here - we replace sg_count() with
sg_nents_for_len(), so src_nents now contains the real number of
scatterlist entries, and we then change the test for using the
hardware scatterlist to src_nents > 1 rather than just non-zero.

This change passes my sshd, openssl tests hashing /bin and tcrypt
tests.
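The mismatch can be modelled in a few lines of userspace C: the old code mapped `src_nents ? : 1` entries but recorded `sg_count()`'s value, which is zero for a single entry, so the unmap path freed nothing. Toy counters stand in for the DMA API here:

```c
#include <assert.h>

int live_mappings;	/* outstanding "DMA mappings" */

void map_entries(int n)   { live_mappings += n; }
void unmap_entries(int n) { live_mappings -= n; }

/* Old behaviour: the recorded count collapses one entry to zero. */
void old_path(int src_nents)
{
	int recorded = src_nents > 1 ? src_nents : 0;

	map_entries(src_nents ? src_nents : 1);
	if (recorded)			/* zero for one entry: leak */
		unmap_entries(recorded);
}

/* Fixed behaviour: record the real count and always unmap it. */
void new_path(int src_nents)
{
	map_entries(src_nents);
	unmap_entries(src_nents);
}
```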

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 49106ea42887..eccde7207f92 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1085,9 +1085,12 @@ static int ahash_digest(struct ahash_request *req)
u32 options;
int sh_len;
 
-   src_nents = sg_count(req->src, req->nbytes);
-   dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   src_nents = sg_nents_for_len(req->src, req->nbytes);
+   dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes + DESC_JOB_IO_LEN,
@@ -1105,7 +1108,7 @@ static int ahash_digest(struct ahash_request *req)
desc = edesc->hw_desc;
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents, edesc->sec4_sg, 0);
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
sec4_sg_bytes, DMA_TO_DEVICE);
@@ -1232,8 +1235,8 @@ static int ahash_update_no_ctx(struct ahash_request *req)
to_hash = in_len - *next_buflen;
 
if (to_hash) {
-   src_nents = sg_nents_for_len(req->src,
-req->nbytes - (*next_buflen));
+   src_nents = sg_nents_for_len(req->src, req->nbytes -
+  *next_buflen);
sec4_sg_bytes = (1 + src_nents) *
sizeof(struct sec4_sg_entry);
 
@@ -1429,9 +1432,14 @@ static int ahash_update_first(struct ahash_request *req)
to_hash = req->nbytes - *next_buflen;
 
if (to_hash) {
-   src_nents = sg_count(req->src, req->nbytes - (*next_buflen));
-   dma_map_sg(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   src_nents = sg_nents_for_len(req->src, req->nbytes -
+  *next_buflen);
+   dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents *
+   sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/*
 * allocate space for base edesc and hw desc commands,
@@ -1451,7 +1459,7 @@ static int ahash_update_first(struct ahash_request *req)
 DESC_JOB_IO_LEN;
edesc->dst_dma = 0;
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents,
   edesc->sec4_sg, 0);
edesc->sec4_sg_dma = dma_map_single(jrdev,
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 00/13] Further iMX CAAM updates

2015-12-07 Thread Russell King - ARM Linux
Here are further imx-caam updates that I've had since before the
previous merge window.  Please review and (I guess) if Freescale
folk can provide acks etc that would be nice.  Thanks.

 drivers/crypto/caam/caamhash.c | 415 +++--
 drivers/crypto/caam/intern.h   |   1 -
 drivers/crypto/caam/jr.c   |  25 +--
 3 files changed, 246 insertions(+), 195 deletions(-)


-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

2015-10-19 Thread Russell King - ARM Linux
On Mon, Oct 19, 2015 at 03:04:51PM +, Jason Cooper wrote:
> Hey Russell,
> 
> On Sun, Oct 18, 2015 at 05:23:40PM +0100, Russell King wrote:
> >  static int mv_cesa_ahash_init(struct ahash_request *req,
> > - struct mv_cesa_op_ctx *tmpl)
> > + struct mv_cesa_op_ctx *tmpl, bool algo_le)
> 
> nit: Would it make more sense the do bool algo_endian, and then ...

I don't like "bool"s that don't self-document what they mean.
What does "if (algo_endian)" mean?  It's pretty clear what
"if (algo_le)" means in the context of endianness, though I'll
give you that "if (algo_little_endian)" would be a lot better.

> 
> > @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
> >  
> > mv_cesa_set_op_cfg(, CESA_SA_DESC_CFG_MACM_MD5);
> >  
> > -   mv_cesa_ahash_init(req, );
> > +   mv_cesa_ahash_init(req, , true);
> 
>   mv_cesa_ahash_init(req, , ALGO_ENDIAN_LITTLE);
> 
> 'true' doesn't seem as obvious.  But again, nit-picky.

I did think about:

enum {
ALGO_LITTLE_ENDIAN,
ALGO_BIG_ENDIAN,
};

and passing an int algo_endian around, but that seemed to be like too
much bloat... though if you want to insist, I could make that change.
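A sketch of the enum variant discussed above; call sites would then read `mv_cesa_ahash_init(req, &tmpl, ALGO_LITTLE_ENDIAN)` instead of passing a bare `true`:

```c
#include <assert.h>

enum algo_endian {
	ALGO_LITTLE_ENDIAN,
	ALGO_BIG_ENDIAN,
};

/* The predicate the driver would test where it currently tests algo_le. */
int algo_is_little_endian(enum algo_endian e)
{
	return e == ALGO_LITTLE_ENDIAN;
}
```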

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH 00/18] crypto: further fixes for Marvell CESA hash

2015-10-18 Thread Russell King - ARM Linux
Following on from the previous series, this series addresses further
problems with the Marvell CESA hash driver found while testing it in my
openssl/ssh scenarios.

The first patch improves one from the previous series: we can get the
transform more directly using req->base.tfm rather than going round
the houses.

The next few patches rework the algorithm endianness conversions.
There are two things which depend on the algorithm endianness - the
format of the result, and the format of the bit length in the last
block.  We introduce a flag to convey this information, and keep
the creq->state format in CPU endian mode for consistency.

Some of the inconsistent hash results are down to the template
operation not being properly initialised - so we zero initialise all
template operations.

The following patches (from "factor out first fragment decisions to
helper") rework the digest handling to ensure that we always provide
the hardware with a complete block of data to hash, otherwise it can
be left mid-calculation, which then causes state to leak to
subsequent operations.  This requires a re-structure of the way we
put together the DMA entries, so it's done in relatively small steps.

This results in the CESA driver passing all tests I can throw at it
via the AF_ALG openssl plugin with the exception of asking for the
hash of /dev/null.  This returns an all zero result, rather than the
correct hash value.  This bug is pending further diagnosis, but it
is believed not to be a driver-specific bug as iMX6 CAAM also behaves
the same.

Unfortunately, this is a large series, but the driver unfortunately
needs this level of bug fixing to work properly.

 drivers/crypto/marvell/cesa.h |  11 +-
 drivers/crypto/marvell/hash.c | 307 +-
 2 files changed, 164 insertions(+), 154 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH 08/18] crypto: marvell: factor out adding an operation and launching it

2015-10-18 Thread Russell King
Add a helper to add the fragment operation block followed by the DMA
entry to launch the operation.

Although at the moment this pattern only strictly appears at one site,
two other sites can be factored as well by slightly changing the order
in which the DMA operations are performed.  This should be harmless as
the only thing which matters is to have all the data loaded into SRAM
prior to launching the operation.
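The two-step pattern being factored — set the fragment length on the operation block, then append the dummy descriptor that launches it — reduces to this sketch. The types and names are illustrative, not the driver's:

```c
#include <assert.h>

struct frag_op {
	unsigned int frag_len;
	int launched;
};

/* Mirrors the shape of mv_cesa_dma_add_frag(): configure the op,
 * then launch it.  Data transfers may be queued before or after the
 * op is configured, as long as everything precedes the launch. */
int add_frag(struct frag_op *op, unsigned int frag_len)
{
	op->frag_len = frag_len;	/* the set-frag-len step */
	op->launched = 1;		/* the dummy-launch step */
	return 0;
}
```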

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 74 +--
 1 file changed, 36 insertions(+), 38 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 938ecfeb8ffe..8111e73ca848 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -472,6 +472,29 @@ static int mv_cesa_ahash_cache_req(struct ahash_request *req, bool *cached)
 }
 
 static struct mv_cesa_op_ctx *
+mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
+struct mv_cesa_op_ctx *tmpl, unsigned int frag_len,
+gfp_t flags)
+{
+   struct mv_cesa_op_ctx *op;
+   int ret;
+
+   op = mv_cesa_dma_add_op(chain, tmpl, false, flags);
+   if (IS_ERR(op))
+   return op;
+
+   /* Set the operation block fragment length. */
+   mv_cesa_set_mac_op_frag_len(op, frag_len);
+
+   /* Append dummy desc to launch operation */
+   ret = mv_cesa_dma_add_dummy_launch(chain, flags);
+   if (ret)
+   return ERR_PTR(ret);
+
+   return op;
+}
+
+static struct mv_cesa_op_ctx *
 mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_dma_iter *dma_iter,
struct mv_cesa_ahash_req *creq,
@@ -493,18 +516,9 @@ mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);
 
-   if (!dma_iter->base.op_len) {
-   op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
-   if (IS_ERR(op))
-   return op;
-
-   mv_cesa_set_mac_op_frag_len(op, creq->cache_ptr);
-
-   /* Add dummy desc to launch crypto operation */
-   ret = mv_cesa_dma_add_dummy_launch(chain, flags);
-   if (ret)
-   return ERR_PTR(ret);
-   }
+   if (!dma_iter->base.op_len)
+   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl,
+ creq->cache_ptr, flags);
 
return op;
 }
@@ -518,28 +532,22 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_op_ctx *op;
int ret;
 
-   op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
+   /* Add input transfers */
+   ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
+  &dma_iter->src, flags);
+   if (ret)
+   return ERR_PTR(ret);
+
+   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
+ flags);
if (IS_ERR(op))
return op;
 
-   mv_cesa_set_mac_op_frag_len(op, dma_iter->base.op_len);
-
	if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
		mv_cesa_update_op_cfg(&creq->op_tmpl,
  CESA_SA_DESC_CFG_MID_FRAG,
  CESA_SA_DESC_CFG_FRAG_MSK);
 
-   /* Add input transfers */
-   ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
-  &dma_iter->src, flags);
-   if (ret)
-   return ERR_PTR(ret);
-
-   /* Add dummy desc to launch crypto operation */
-   ret = mv_cesa_dma_add_dummy_launch(chain, flags);
-   if (ret)
-   return ERR_PTR(ret);
-
return op;
 }
 
@@ -603,12 +611,6 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
  CESA_SA_DESC_CFG_MID_FRAG,
  CESA_SA_DESC_CFG_FRAG_MSK);
 
-   op = mv_cesa_dma_add_op(chain, &creq->op_tmpl, false, flags);
-   if (IS_ERR(op))
-   return op;
-
-   mv_cesa_set_mac_op_frag_len(op, trailerlen - padoff);
-
ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET,
ahashdreq->padding_dma +
@@ -619,12 +621,8 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);
 
-   /* Add dummy desc to launch crypto operation */
-   ret = mv_cesa_dma_add_dummy_launch(chain, flags);
-   if (ret)
-   return ERR_PTR(ret);
-
-   return op;
+   return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, trailerlen - padoff,
+   flags);
 }
 
 static int 

[PATCH 12/18] crypto: marvell: ensure iter.base.op_len is the full op length

2015-10-18 Thread Russell King
When we process the last request of data, and the request contains user
data, the loop in mv_cesa_ahash_dma_req_init() marks the first data size
as being iter.base.op_len which does not include the size of the cache
data.  This means we end up hashing an insufficient amount of data.

Fix this by always including the cache size in the first operation
length of any request.

This has the effect that for a request containing no user data,

iter.base.op_len === iter.src.op_offset === creq->cache_ptr

As a result, we include one further change to use iter.base.op_len in
the cache-but-no-user-data case to make the next change clearer.
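With the fix, the iterator length computation behaves as below. This assumes a 64-byte hash block (so the block-size mask is 63); the function name is illustrative:

```c
#include <assert.h>

#define HASH_BLOCK_SIZE_MSK 63u	/* 64-byte block assumed */

/* Non-final requests hash only whole blocks; the remainder stays in
 * the cache.  The cached bytes always count toward the first op. */
unsigned int iter_len(unsigned int nbytes, unsigned int cache_ptr,
		      int last_req)
{
	unsigned int len = nbytes + cache_ptr;

	if (!last_req)
		len &= ~HASH_BLOCK_SIZE_MSK;
	return len;
}
```

For a request with no user data, `iter_len(0, cache_ptr, 1)` is just `cache_ptr`, which is why the patch can use `iter.base.op_len` in the cache-only branch.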

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index c4693321fdc8..b4da73d294af 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -27,10 +27,10 @@ mv_cesa_ahash_req_iter_init(struct mv_cesa_ahash_dma_iter *iter,
struct ahash_request *req)
 {
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
-   unsigned int len = req->nbytes;
+   unsigned int len = req->nbytes + creq->cache_ptr;
 
if (!creq->last_req)
-   len = (len + creq->cache_ptr) & ~CESA_HASH_BLOCK_SIZE_MSK;
+   len &= ~CESA_HASH_BLOCK_SIZE_MSK;
 
	mv_cesa_req_dma_iter_init(&iter->base, len);
	mv_cesa_sg_dma_iter_init(&iter->src, req->src, DMA_TO_DEVICE);
@@ -646,10 +646,10 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
goto err_free_tdma;
}
	} while (mv_cesa_ahash_req_iter_next_op(&iter));
-   } else if (creq->cache_ptr) {
+   } else if (iter.base.op_len) {
/* Account for the data that was in the cache. */
		op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- creq->cache_ptr, flags);
+ iter.base.op_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
-- 
2.1.0



[PATCH 10/18] crypto: marvell: move mv_cesa_dma_add_frag() calls

2015-10-18 Thread Russell King
Move the calls to mv_cesa_dma_add_frag() into the parent function,
mv_cesa_ahash_dma_req_init().  This is in preparation to changing
when we generate the operation blocks, as we need to avoid generating
a block for a partial hash block at the end of the user data.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 71 ++-
 1 file changed, 29 insertions(+), 42 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index f567243da005..787cec1715ed 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -499,51 +499,23 @@ mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
return op;
 }
 
-static struct mv_cesa_op_ctx *
+static int
 mv_cesa_ahash_dma_add_cache(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_ahash_dma_iter *dma_iter,
struct mv_cesa_ahash_req *creq,
gfp_t flags)
 {
	struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
-   struct mv_cesa_op_ctx *op = NULL;
-   int ret;
 
if (!creq->cache_ptr)
-   return NULL;
-
-   ret = mv_cesa_dma_add_data_transfer(chain,
-   CESA_SA_DATA_SRAM_OFFSET,
-   ahashdreq->cache_dma,
-   creq->cache_ptr,
-   CESA_TDMA_DST_IN_SRAM,
-   flags);
-   if (ret)
-   return ERR_PTR(ret);
-
-   if (!dma_iter->base.op_len)
-   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl,
- creq->cache_ptr, flags);
-
-   return op;
-}
-
-static struct mv_cesa_op_ctx *
-mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
-  struct mv_cesa_ahash_dma_iter *dma_iter,
-  struct mv_cesa_ahash_req *creq,
-  gfp_t flags)
-{
-   int ret;
-
-   /* Add input transfers */
-   ret = mv_cesa_dma_add_op_transfers(chain, &dma_iter->base,
-  &dma_iter->src, flags);
-   if (ret)
-   return ERR_PTR(ret);
+   return 0;
 
-   return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, 
dma_iter->base.op_len,
-   flags);
+   return mv_cesa_dma_add_data_transfer(chain,
+CESA_SA_DATA_SRAM_OFFSET,
+ahashdreq->cache_dma,
+creq->cache_ptr,
+CESA_TDMA_DST_IN_SRAM,
+flags);
 }
 
 static struct mv_cesa_op_ctx *
@@ -647,19 +619,34 @@ static int mv_cesa_ahash_dma_req_init(struct 
ahash_request *req)
	mv_cesa_tdma_desc_iter_init(&chain);
	mv_cesa_ahash_req_iter_init(&iter, req);
 
-   op = mv_cesa_ahash_dma_add_cache(&chain, &iter,
-creq, flags);
-   if (IS_ERR(op)) {
-   ret = PTR_ERR(op);
+   /*
+* Add the cache (left-over data from a previous block) first.
+* This will never overflow the SRAM size.
+*/
+   ret = mv_cesa_ahash_dma_add_cache(&chain, &iter, creq, flags);
+   if (ret)
goto err_free_tdma;
+
+   if (creq->cache_ptr && !iter.base.op_len) {
+   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ creq->cache_ptr, flags);
+   if (IS_ERR(op)) {
+   ret = PTR_ERR(op);
+   goto err_free_tdma;
+   }
}
 
do {
if (!iter.base.op_len)
break;
 
-   op = mv_cesa_ahash_dma_add_data(&chain, &iter,
-   creq, flags);
+   ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+  &iter.src, flags);
+   if (ret)
+   goto err_free_tdma;
+
+   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ iter.base.op_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/18] crypto: marvell: rearrange last request handling

2015-10-18 Thread Russell King
Move the test for the last request out of mv_cesa_ahash_dma_last_req()
to its caller, and move the mv_cesa_dma_add_frag() down into this
function.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index dc8ab343ad9f..b8ed0478031a 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -522,15 +522,21 @@ static struct mv_cesa_op_ctx *
 mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
   struct mv_cesa_ahash_dma_iter *dma_iter,
   struct mv_cesa_ahash_req *creq,
-  struct mv_cesa_op_ctx *op,
-  gfp_t flags)
+  unsigned int frag_len, gfp_t flags)
 {
	struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
unsigned int len, trailerlen, padoff = 0;
+   struct mv_cesa_op_ctx *op;
int ret;
 
-   if (!creq->last_req)
-   return op;
+   if (frag_len) {
+   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
+ flags);
+   if (IS_ERR(op))
+   return op;
+   } else {
+   op = NULL;
+   }
 
if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;
@@ -657,16 +663,18 @@ static int mv_cesa_ahash_dma_req_init(struct 
ahash_request *req)
frag_len = iter.base.op_len;
}
 
-   if (frag_len) {
+   /*
+* At this point, frag_len indicates whether we have any data
+* outstanding which needs an operation.  Queue up the final
+* operation, which depends whether this is the final request.
+*/
+   if (creq->last_req)
+   op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, frag_len,
+   flags);
+   else if (frag_len)
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
  flags);
-   if (IS_ERR(op)) {
-   ret = PTR_ERR(op);
-   goto err_free_tdma;
-   }
-   }
 
-   op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, op, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/18] crypto: marvell: fix the bit length endianness

2015-10-18 Thread Russell King
The endianness of the bit length used in the final stage depends on the
endianness of the algorithm - md5 hashes need it to be in little endian
format, whereas SHA hashes need it in big endian format.  Use the
previously added algorithm endianness flag to control this.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index aa12274608ab..dc2c65b52d7a 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -179,7 +179,6 @@ static int mv_cesa_ahash_pad_len(struct mv_cesa_ahash_req 
*creq)
 
 static int mv_cesa_ahash_pad_req(struct mv_cesa_ahash_req *creq, u8 *buf)
 {
-   __be64 bits = cpu_to_be64(creq->len << 3);
unsigned int index, padlen;
 
buf[0] = 0x80;
@@ -187,7 +186,14 @@ static int mv_cesa_ahash_pad_req(struct mv_cesa_ahash_req 
*creq, u8 *buf)
index = creq->len & CESA_HASH_BLOCK_SIZE_MSK;
padlen = mv_cesa_ahash_pad_len(creq);
memset(buf + 1, 0, padlen - 1);
-   memcpy(buf + padlen, &bits, sizeof(bits));
+
+   if (creq->algo_le) {
+   __le64 bits = cpu_to_le64(creq->len << 3);
+   memcpy(buf + padlen, &bits, sizeof(bits));
+   } else {
+   __be64 bits = cpu_to_be64(creq->len << 3);
+   memcpy(buf + padlen, &bits, sizeof(bits));
+   }
 
return padlen + 8;
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness

2015-10-18 Thread Russell King
Rather than determining whether we're using a MD5 hash by looking at
the digest size, switch to a cleaner solution using a per-request flag
initialised by the method type.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h |  1 +
 drivers/crypto/marvell/hash.c | 17 +
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e19302c9dec9..5d5b66ea2ceb 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -612,6 +612,7 @@ struct mv_cesa_ahash_req {
u64 len;
int src_nents;
bool last_req;
+   bool algo_le;
u32 state[8];
 };
 
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index f59faabcd34f..aa12274608ab 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -351,7 +351,7 @@ static int mv_cesa_ahash_process(struct 
crypto_async_request *req, u32 status)
 * Hardware's MD5 digest is in little endian format, but
 * SHA in big endian format
 */
-   if (digsize == MD5_DIGEST_SIZE) {
+   if (creq->algo_le) {
__le32 *result = (void *)ahashreq->result;
 
for (i = 0; i < digsize / 4; i++)
@@ -407,7 +407,7 @@ static const struct mv_cesa_req_ops mv_cesa_ahash_req_ops = 
{
 };
 
 static int mv_cesa_ahash_init(struct ahash_request *req,
- struct mv_cesa_op_ctx *tmpl)
+ struct mv_cesa_op_ctx *tmpl, bool algo_le)
 {
struct mv_cesa_ahash_req *creq = ahash_request_ctx(req);
 
@@ -421,6 +421,7 @@ static int mv_cesa_ahash_init(struct ahash_request *req,
mv_cesa_set_mac_op_frag_len(tmpl, 0);
creq->op_tmpl = *tmpl;
creq->len = 0;
+   creq->algo_le = algo_le;
 
return 0;
 }
@@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req)
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, true);
 
return 0;
 }
@@ -924,7 +925,7 @@ static int mv_cesa_sha1_init(struct ahash_request *req)
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA1);
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, false);
 
return 0;
 }
@@ -987,7 +988,7 @@ static int mv_cesa_sha256_init(struct ahash_request *req)
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA256);
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, false);
 
return 0;
 }
@@ -1218,7 +1219,7 @@ static int mv_cesa_ahmac_md5_init(struct ahash_request 
*req)
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_MD5);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, true);
 
return 0;
 }
@@ -1288,7 +1289,7 @@ static int mv_cesa_ahmac_sha1_init(struct ahash_request 
*req)
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA1);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, false);
 
return 0;
 }
@@ -1378,7 +1379,7 @@ static int mv_cesa_ahmac_sha256_init(struct ahash_request 
*req)
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA256);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
 
-   mv_cesa_ahash_init(req, &tmpl);
+   mv_cesa_ahash_init(req, &tmpl, false);
 
return 0;
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/18] crypto: marvell: keep creq->state in CPU endian format at all times

2015-10-18 Thread Russell King
Currently, we read/write the state in CPU endian, but on the final
request, we convert its endian according to the requested algorithm.
(md5 is little endian, SHA are big endian.)

Always keep creq->state in CPU native endian format, and perform the
necessary conversion when copying the hash to the result.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h |  2 +-
 drivers/crypto/marvell/hash.c | 25 ++---
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index bc2a55bc35e4..e19302c9dec9 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -612,7 +612,7 @@ struct mv_cesa_ahash_req {
u64 len;
int src_nents;
bool last_req;
-   __be32 state[8];
+   u32 state[8];
 };
 
 /* CESA functions */
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index d9cc49e9c43b..f59faabcd34f 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -347,18 +347,21 @@ static int mv_cesa_ahash_process(struct 
crypto_async_request *req, u32 status)
   ahashreq->nbytes - creq->cache_ptr);
 
if (creq->last_req) {
-   for (i = 0; i < digsize / 4; i++) {
-   /*
-* Hardware provides MD5 digest in a different
-* endianness than SHA-1 and SHA-256 ones.
-*/
-   if (digsize == MD5_DIGEST_SIZE)
-   creq->state[i] = cpu_to_le32(creq->state[i]);
-   else
-   creq->state[i] = cpu_to_be32(creq->state[i]);
-   }
+   /*
+* Hardware's MD5 digest is in little endian format, but
+* SHA in big endian format
+*/
+   if (digsize == MD5_DIGEST_SIZE) {
+   __le32 *result = (void *)ahashreq->result;
+
+   for (i = 0; i < digsize / 4; i++)
+   result[i] = cpu_to_le32(creq->state[i]);
+   } else {
+   __be32 *result = (void *)ahashreq->result;
 
-   memcpy(ahashreq->result, creq->state, digsize);
+   for (i = 0; i < digsize / 4; i++)
+   result[i] = cpu_to_be32(creq->state[i]);
+   }
}
 
return ret;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/18] crypto: marvell: ensure template operation is initialised

2015-10-18 Thread Russell King
Ensure that the template operation is fully initialised, otherwise we
end up loading data from the kernel stack into the engines, which can
upset the hash results.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index dc2c65b52d7a..82d9e3d09331 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -864,7 +864,7 @@ static int mv_cesa_ahash_import(struct ahash_request *req, 
const void *hash,
 
 static int mv_cesa_md5_init(struct ahash_request *req)
 {
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_MD5);
 
@@ -927,7 +927,7 @@ struct ahash_alg mv_md5_alg = {
 
 static int mv_cesa_sha1_init(struct ahash_request *req)
 {
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA1);
 
@@ -990,7 +990,7 @@ struct ahash_alg mv_sha1_alg = {
 
 static int mv_cesa_sha256_init(struct ahash_request *req)
 {
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_SHA256);
 
@@ -1220,7 +1220,7 @@ static int mv_cesa_ahmac_cra_init(struct crypto_tfm *tfm)
 static int mv_cesa_ahmac_md5_init(struct ahash_request *req)
 {
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_MD5);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
@@ -1290,7 +1290,7 @@ struct ahash_alg mv_ahmac_md5_alg = {
 static int mv_cesa_ahmac_sha1_init(struct ahash_request *req)
 {
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA1);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
@@ -1380,7 +1380,7 @@ static int mv_cesa_ahmac_sha256_setkey(struct 
crypto_ahash *tfm, const u8 *key,
 static int mv_cesa_ahmac_sha256_init(struct ahash_request *req)
 {
struct mv_cesa_hmac_ctx *ctx = crypto_tfm_ctx(req->base.tfm);
-   struct mv_cesa_op_ctx tmpl;
+   struct mv_cesa_op_ctx tmpl = { };
 
	mv_cesa_set_op_cfg(&tmpl, CESA_SA_DESC_CFG_MACM_HMAC_SHA256);
memcpy(tmpl.ctx.hash.iv, ctx->iv, sizeof(ctx->iv));
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/18] crypto: marvell: always ensure mid-fragments after first-fragment

2015-10-18 Thread Russell King
If we add a template first-fragment operation, always update the
template to be a mid-fragment.  This ensures that mid-fragments
always follow on from a first fragment in every case.

This means we can move the first to mid-fragment update code out of
mv_cesa_ahash_dma_add_data().

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 8111e73ca848..f567243da005 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -491,6 +491,11 @@ mv_cesa_dma_add_frag(struct mv_cesa_tdma_chain *chain,
if (ret)
return ERR_PTR(ret);
 
+   if (mv_cesa_mac_op_is_first_frag(tmpl))
+   mv_cesa_update_op_cfg(tmpl,
+ CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
return op;
 }
 
@@ -529,7 +534,6 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
   struct mv_cesa_ahash_req *creq,
   gfp_t flags)
 {
-   struct mv_cesa_op_ctx *op;
int ret;
 
/* Add input transfers */
@@ -538,17 +542,8 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain 
*chain,
if (ret)
return ERR_PTR(ret);
 
-   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, dma_iter->base.op_len,
- flags);
-   if (IS_ERR(op))
-   return op;
-
-   if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
-   mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
-   return op;
+   return mv_cesa_dma_add_frag(chain, &creq->op_tmpl, 
dma_iter->base.op_len,
+   flags);
 }
 
 static struct mv_cesa_op_ctx *
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/18] crypto: marvell: avoid adding final operation within loop

2015-10-18 Thread Russell King
Avoid adding the final operation within the loop, but instead add it
outside.  We combine this with the handling for the no-data case.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b4da73d294af..dc8ab343ad9f 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -602,6 +602,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
struct mv_cesa_tdma_chain chain;
struct mv_cesa_ahash_dma_iter iter;
struct mv_cesa_op_ctx *op = NULL;
+   unsigned int frag_len;
int ret;
 
dreq->chain.first = NULL;
@@ -631,25 +632,34 @@ static int mv_cesa_ahash_dma_req_init(struct 
ahash_request *req)
/*
 * Add all the new data, inserting an operation block and
 * launch command between each full SRAM block-worth of
-* data.
+* data. We intentionally do not add the final op block.
 */
-   do {
+   while (true) {
ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
   &iter.src, flags);
if (ret)
goto err_free_tdma;
 
+   frag_len = iter.base.op_len;
+
+   if (!mv_cesa_ahash_req_iter_next_op())
+   break;
+
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
+ frag_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
}
-   } while (mv_cesa_ahash_req_iter_next_op());
-   } else if (iter.base.op_len) {
+   }
+   } else {
/* Account for the data that was in the cache. */
-   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
+   frag_len = iter.base.op_len;
+   }
+
+   if (frag_len) {
+   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
+ flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
goto err_free_tdma;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/18] crypto: marvell: factor out first fragment decisions to helper

2015-10-18 Thread Russell King
Multiple locations in the driver test the operation context fragment
type, checking whether it is a first fragment or not.  Introduce a
mv_cesa_mac_op_is_first_frag() helper, which returns true if the
fragment operation is for a first fragment.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h | 6 ++
 drivers/crypto/marvell/hash.c | 9 +++--
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index cd646d7c14e2..e9f732138ba3 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -686,6 +686,12 @@ static inline u32 mv_cesa_get_int_mask(struct 
mv_cesa_engine *engine)
return engine->int_mask;
 }
 
+static inline bool mv_cesa_mac_op_is_first_frag(const struct mv_cesa_op_ctx 
*op)
+{
+   return (mv_cesa_get_op_cfg(op) & CESA_SA_DESC_CFG_FRAG_MSK) ==
+   CESA_SA_DESC_CFG_FIRST_FRAG;
+}
+
 int mv_cesa_queue_req(struct crypto_async_request *req);
 
 /*
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 82d9e3d09331..938ecfeb8ffe 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -524,8 +524,7 @@ mv_cesa_ahash_dma_add_data(struct mv_cesa_tdma_chain *chain,
 
mv_cesa_set_mac_op_frag_len(op, dma_iter->base.op_len);
 
-   if ((mv_cesa_get_op_cfg(&creq->op_tmpl) & CESA_SA_DESC_CFG_FRAG_MSK) ==
-   CESA_SA_DESC_CFG_FIRST_FRAG)
+   if (mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
  CESA_SA_DESC_CFG_MID_FRAG,
  CESA_SA_DESC_CFG_FRAG_MSK);
@@ -561,8 +560,7 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;
 
-   if ((mv_cesa_get_op_cfg(op) & CESA_SA_DESC_CFG_FRAG_MSK) !=
-   CESA_SA_DESC_CFG_FIRST_FRAG)
+   if (!mv_cesa_mac_op_is_first_frag(op))
frag = CESA_SA_DESC_CFG_LAST_FRAG;
 
mv_cesa_update_op_cfg(op, frag, CESA_SA_DESC_CFG_FRAG_MSK);
@@ -600,8 +598,7 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
if (padoff >= trailerlen)
return op;
 
-   if ((mv_cesa_get_op_cfg(&creq->op_tmpl) & CESA_SA_DESC_CFG_FRAG_MSK) !=
-   CESA_SA_DESC_CFG_FIRST_FRAG)
+   if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
  CESA_SA_DESC_CFG_MID_FRAG,
  CESA_SA_DESC_CFG_FRAG_MSK);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 17/18] crypto: marvell: fix first-fragment handling in mv_cesa_ahash_dma_last_req()

2015-10-18 Thread Russell King
When adding the software padding, this must be done using the first/mid
fragment mode, and any subsequent operation needs to be a mid-fragment.
Fix this.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index da541e59cc1d..34271e9eb3a5 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -575,20 +575,12 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain 
*chain,
if (IS_ERR(op))
return op;
 
-   mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
if (len == trailerlen)
return op;
 
padoff += len;
}
 
-   if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
-   mv_cesa_update_op_cfg(&creq->op_tmpl,
- CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-
ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET,
ahashdreq->padding_dma +
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/18] crypto: marvell: const-ify argument to mv_cesa_get_op_cfg()

2015-10-18 Thread Russell King
mv_cesa_get_op_cfg() does not write to its argument, it only reads.
So, let's make it const.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index 5d5b66ea2ceb..cd646d7c14e2 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -627,7 +627,7 @@ static inline void mv_cesa_update_op_cfg(struct 
mv_cesa_op_ctx *op,
op->desc.config |= cpu_to_le32(cfg);
 }
 
-static inline u32 mv_cesa_get_op_cfg(struct mv_cesa_op_ctx *op)
+static inline u32 mv_cesa_get_op_cfg(const struct mv_cesa_op_ctx *op)
 {
return le32_to_cpu(op->desc.config);
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/18] crypto: marvell: use presence of scatterlist to determine data load

2015-10-18 Thread Russell King
Use the presence of the scatterlist to determine whether we should load
any new user data to the engine.  The following shall always be true at
this point:

iter.base.op_len == 0 === iter.src.sg

In doing so, we can:

1. eliminate the test for iter.base.op_len inside the loop, which
   makes the loop operation more obvious and understandable.

2. move the operation generation for the cache-only case.

This prepares the code for the next step in its transformation, and also
uncovers a bug that will be fixed in the next patch.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 39 +--
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 787cec1715ed..c4693321fdc8 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -627,7 +627,27 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
if (ret)
goto err_free_tdma;
 
-   if (creq->cache_ptr && !iter.base.op_len) {
+   if (iter.src.sg) {
+   /*
+* Add all the new data, inserting an operation block and
+* launch command between each full SRAM block-worth of
+* data.
+*/
+   do {
+   ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+  &iter.src, flags);
+   if (ret)
+   goto err_free_tdma;
+
+   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+ iter.base.op_len, flags);
+   if (IS_ERR(op)) {
+   ret = PTR_ERR(op);
+   goto err_free_tdma;
+   }
+   } while (mv_cesa_ahash_req_iter_next_op());
+   } else if (creq->cache_ptr) {
+   /* Account for the data that was in the cache. */
op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
  creq->cache_ptr, flags);
if (IS_ERR(op)) {
@@ -636,23 +656,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
}
}
 
-   do {
-   if (!iter.base.op_len)
-   break;
-
-   ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
-  &iter.src, flags);
-   if (ret)
-   goto err_free_tdma;
-
-   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
- iter.base.op_len, flags);
-   if (IS_ERR(op)) {
-   ret = PTR_ERR(op);
-   goto err_free_tdma;
-   }
-   } while (mv_cesa_ahash_req_iter_next_op());
-
op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, op, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 15/18] crypto: marvell: rearrange handling for hw finished hashes

2015-10-18 Thread Russell King
Rearrange the last request handling for hardware finished hashes
by moving the generation of the fragment operation into this path.
This results in a simplified sequence to handle this case, and
allows us to move the software padded case further down into the
function.  Add comments describing these parts.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index b8ed0478031a..d2265beaaa6b 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -529,32 +529,45 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain 
*chain,
struct mv_cesa_op_ctx *op;
int ret;
 
-   if (frag_len) {
+   /*
+* If the transfer is smaller than our maximum length, and we have
+* some data outstanding, we can ask the engine to finish the hash.
+*/
+   if (creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX && frag_len) {
op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
  flags);
if (IS_ERR(op))
return op;
-   } else {
-   op = NULL;
-   }
-
-   if (op && creq->len <= CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX) {
-   u32 frag = CESA_SA_DESC_CFG_NOT_FRAG;
-
-   if (!mv_cesa_mac_op_is_first_frag(op))
-   frag = CESA_SA_DESC_CFG_LAST_FRAG;
 
-   mv_cesa_update_op_cfg(op, frag, CESA_SA_DESC_CFG_FRAG_MSK);
+   mv_cesa_set_mac_op_total_len(op, creq->len);
+   mv_cesa_update_op_cfg(op, mv_cesa_mac_op_is_first_frag(op) ?
+   CESA_SA_DESC_CFG_NOT_FRAG :
+   CESA_SA_DESC_CFG_LAST_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
 
return op;
}
 
+   /*
+* The request is longer than the engine can handle, or we have
+* no data outstanding. Manually generate the padding, adding it
+* as a "mid" fragment.
+*/
ret = mv_cesa_ahash_dma_alloc_padding(ahashdreq, flags);
if (ret)
return ERR_PTR(ret);
 
trailerlen = mv_cesa_ahash_pad_req(creq, ahashdreq->padding);
 
+   if (frag_len) {
+   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
+ flags);
+   if (IS_ERR(op))
+   return op;
+   } else {
+   op = NULL;
+   }
+
if (op) {
len = min(CESA_SA_SRAM_PAYLOAD_SIZE - dma_iter->base.op_len,
  trailerlen);
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 18/18] crypto: marvell/cesa: fix memory leak

2015-10-18 Thread Russell King
From: Boris Brezillon <boris.brezil...@free-electrons.com>
To: Boris Brezillon <boris.brezil...@free-electrons.com>,Arnaud Ebalard 
<a...@natisbad.org>,Thomas Petazzoni 
<thomas.petazz...@free-electrons.com>,Jason Cooper <ja...@lakedaemon.net>

The local chain variable is not cleaned up if an error occurs in the middle
of DMA chain creation. Fix that by dropping the local chain variable and
using the dreq->chain field which will be cleaned up by
mv_cesa_dma_cleanup() in case of errors.

Signed-off-by: Boris Brezillon <boris.brezil...@free-electrons.com>
Reported-by: Thomas Petazzoni <thomas.petazz...@free-electrons.com>
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 34271e9eb3a5..7cd0f0decf6c 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -602,7 +602,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
  GFP_KERNEL : GFP_ATOMIC;
struct mv_cesa_ahash_dma_req *ahashdreq = &creq->req.dma;
struct mv_cesa_tdma_req *dreq = &ahashdreq->base;
-   struct mv_cesa_tdma_chain chain;
struct mv_cesa_ahash_dma_iter iter;
struct mv_cesa_op_ctx *op = NULL;
unsigned int frag_len;
@@ -620,14 +619,14 @@ static int mv_cesa_ahash_dma_req_init(struct 
ahash_request *req)
}
}
 
-   mv_cesa_tdma_desc_iter_init(&chain);
+   mv_cesa_tdma_desc_iter_init(&dreq->chain);
mv_cesa_ahash_req_iter_init(&iter, req);
 
/*
 * Add the cache (left-over data from a previous block) first.
 * This will never overflow the SRAM size.
 */
-   ret = mv_cesa_ahash_dma_add_cache(&chain, &iter, creq, flags);
+   ret = mv_cesa_ahash_dma_add_cache(&dreq->chain, &iter, creq, flags);
if (ret)
goto err_free_tdma;
 
@@ -638,7 +637,8 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
 * data. We intentionally do not add the final op block.
 */
while (true) {
-   ret = mv_cesa_dma_add_op_transfers(&chain, &iter.base,
+   ret = mv_cesa_dma_add_op_transfers(&dreq->chain,
+  &iter.base,
   &iter.src, flags);
if (ret)
goto err_free_tdma;
@@ -648,7 +648,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request 
*req)
if (!mv_cesa_ahash_req_iter_next_op())
break;
 
-   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl,
+   op = mv_cesa_dma_add_frag(&dreq->chain, &creq->op_tmpl,
  frag_len, flags);
if (IS_ERR(op)) {
ret = PTR_ERR(op);
@@ -666,11 +666,11 @@ static int mv_cesa_ahash_dma_req_init(struct 
ahash_request *req)
 * operation, which depends whether this is the final request.
 */
if (creq->last_req)
-   op = mv_cesa_ahash_dma_last_req(&chain, &iter, creq, frag_len,
-   flags);
+   op = mv_cesa_ahash_dma_last_req(&dreq->chain, &iter, creq,
+   frag_len, flags);
else if (frag_len)
-   op = mv_cesa_dma_add_frag(&chain, &creq->op_tmpl, frag_len,
- flags);
+   op = mv_cesa_dma_add_frag(&dreq->chain, &creq->op_tmpl,
+ frag_len, flags);
 
if (IS_ERR(op)) {
ret = PTR_ERR(op);
@@ -679,7 +679,7 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
 
if (op) {
/* Add dummy desc to wait for crypto operation end */
-   ret = mv_cesa_dma_add_dummy_end(&chain, flags);
+   ret = mv_cesa_dma_add_dummy_end(>chain, flags);
if (ret)
goto err_free_tdma;
}
@@ -690,8 +690,6 @@ static int mv_cesa_ahash_dma_req_init(struct ahash_request *req)
else
creq->cache_ptr = 0;
 
-   dreq->chain = chain;
-
return 0;
 
 err_free_tdma:
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 16/18] crypto: marvell: rearrange handling for sw padded hashes

2015-10-18 Thread Russell King
Rearrange the last request handling for hashes which require software
padding.

We prepare the padding to be appended, and then append as much of the
padding to any existing data that's already queued up, adding an
operation block and launching the operation.

Any remainder is then appended as a separate operation.

This ensures that the hardware only ever sees multiples of the hash
block size to be operated on for software padded hashes, thus ensuring
that the engine always indicates that it has finished the calculation.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 44 ++-
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index d2265beaaa6b..da541e59cc1d 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -559,38 +559,30 @@ mv_cesa_ahash_dma_last_req(struct mv_cesa_tdma_chain *chain,
 
trailerlen = mv_cesa_ahash_pad_req(creq, ahashdreq->padding);
 
-   if (frag_len) {
-   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len,
- flags);
-   if (IS_ERR(op))
-   return op;
-   } else {
-   op = NULL;
-   }
-
-   if (op) {
-   len = min(CESA_SA_SRAM_PAYLOAD_SIZE - dma_iter->base.op_len,
- trailerlen);
-   if (len) {
-   ret = mv_cesa_dma_add_data_transfer(chain,
+   len = min(CESA_SA_SRAM_PAYLOAD_SIZE - frag_len, trailerlen);
+   if (len) {
+   ret = mv_cesa_dma_add_data_transfer(chain,
CESA_SA_DATA_SRAM_OFFSET +
-   dma_iter->base.op_len,
+   frag_len,
ahashdreq->padding_dma,
len, CESA_TDMA_DST_IN_SRAM,
flags);
-   if (ret)
-   return ERR_PTR(ret);
+   if (ret)
+   return ERR_PTR(ret);
 
-   mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
- CESA_SA_DESC_CFG_FRAG_MSK);
-   mv_cesa_set_mac_op_frag_len(op,
-   dma_iter->base.op_len + len);
-   padoff += len;
-   }
-   }
+   op = mv_cesa_dma_add_frag(chain, &creq->op_tmpl, frag_len + len,
+ flags);
+   if (IS_ERR(op))
+   return op;
 
-   if (padoff >= trailerlen)
-   return op;
+   mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
+ CESA_SA_DESC_CFG_FRAG_MSK);
+
+   if (len == trailerlen)
+   return op;
+
+   padoff += len;
+   }
 
if (!mv_cesa_mac_op_is_first_frag(&creq->op_tmpl))
mv_cesa_update_op_cfg(&creq->op_tmpl,
-- 
2.1.0



[PATCH v2 0/6] Fix CAAM hash driver

2015-10-18 Thread Russell King - ARM Linux
The following series fixes the CAAM hash driver, allowing it to work
with the previously merged "crypto: ahash - ensure statesize is non-
zero" patch.

This is non-trivial, because CAAM exports a huge 1600 bytes of data,
which, if we set .statesize to this, still results in the core code
rejecting the driver.  So, we need to shrink the amount of state
exported.

The first, most obvious one to get rid of is the export of the
caam_hash_ctx structure, which is shared between the socket being
exported from and imported to - copying it away and back again was
a complete no-op.

The second is that we don't need to export both pending-bytes buffers.
Only one will be in use at any time.

A problem was encountered while testing, where the size of the pending
bytes buffer was not added to the scatterlist with the correct length.
This is also fixed in this series, by patch 3.  This bug was introduced
by a prior commit trying to fix a tcrypt error.  However, that change is
wrong, and I have to question whether the test itself was correct - the
backtrace contains a function "test_ahash_pnum" which doesn't seem to
exist in mainline, nor does it seem to exist in any previous mainline
kernel.

Version 2 of this series addresses a mismerge in patch 5 of the
original series, and adds further information to patch 3.

Further testing with tcrypt showed up a problem identified by the
DMA API debugging (different to the original one referred to in
patch 3) where we leak DMA mappings.

 drivers/crypto/caam/caamhash.c | 101 +
 1 file changed, 71 insertions(+), 30 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH 1/6] crypto: caam: print errno code when hash registration fails

2015-10-18 Thread Russell King
Print the errno code when hash registration fails, so we know why the
failure occurred.  This aids debugging.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 94433b9fc200..2faf71ccbd43 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1952,8 +1952,9 @@ static int __init caam_algapi_hash_init(void)
 
err = crypto_register_ahash(&t_alg->ahash_alg);
if (err) {
-   pr_warn("%s alg registration failed\n",
-   t_alg->ahash_alg.halg.base.cra_driver_name);
+   pr_warn("%s alg registration failed: %d\n",
+   t_alg->ahash_alg.halg.base.cra_driver_name,
+   err);
kfree(t_alg);
} else
list_add_tail(&t_alg->entry, &hash_list);
@@ -1968,8 +1969,9 @@ static int __init caam_algapi_hash_init(void)
 
err = crypto_register_ahash(&t_alg->ahash_alg);
if (err) {
-   pr_warn("%s alg registration failed\n",
-   t_alg->ahash_alg.halg.base.cra_driver_name);
+   pr_warn("%s alg registration failed: %d\n",
+   t_alg->ahash_alg.halg.base.cra_driver_name,
+   err);
kfree(t_alg);
} else
list_add_tail(&t_alg->entry, &hash_list);
-- 
2.1.0



[PATCH 3/6] crypto: caam: fix non-block aligned hash calculation

2015-10-18 Thread Russell King
caam does not properly calculate the size of the retained state
when non-block aligned hashes are requested - it uses the wrong
buffer sizes, which results in errors such as:

caam_jr 2102000.jr1: 4501: DECO: desc idx 5: SGT Length Error. The descriptor is trying to read more data than is contained in the SGT table.

We end up here with:

in_len 0x46 blocksize 0x40 last_bufsize 0x0 next_bufsize 0x6
to_hash 0x40 ctx_len 0x28 nbytes 0x20

which results in a job descriptor of:

jobdesc@889: ed03d918: b0861c08 3daa0080 f1400000 3d03d938
jobdesc@889: ed03d928: 00000068 f8400000 3cde2a40 00000028

where the word at 0xed03d928 is the expected data size (0x68), and a
scatterlist containing:

sg@892: ed03d938: 00000000 3cde2a40 00000028 00000000
sg@892: ed03d948: 00000000 3d03d100 00000006 00000000
sg@892: ed03d958: 00000000 7e8aa700 40000020 00000000

0x68 comes from 0x28 (the context size) plus the "in_len" rounded down
to a block size (0x40).  in_len comes from 0x26 bytes of unhashed data
from the previous operation, plus the 0x20 bytes from the latest
operation.

The fixed version would create:

sg@892: ed03d938: 00000000 3cde2a40 00000028 00000000
sg@892: ed03d948: 00000000 3d03d100 00000026 00000000
sg@892: ed03d958: 00000000 7e8aa700 40000020 00000000

which replaces the 0x06 length with the correct 0x26 bytes of previously
unhashed data.

This fixes a previous commit which erroneously "fixed" this due to a
DMA-API bug report; that commit indicates that the bug was caused via a
test_ahash_pnum() function in the tcrypt module.  No such function has
ever existed in the mainline kernel.  Given that the change in this
commit has been tested with DMA API debug enabled and shows no issue,
I can only conclude that test_ahash_pnum() was triggering that bad
behaviour by CAAM.

Fixes: 7d5196aba3c8 ("crypto: caam - Correct DMA unmap size in ahash_update_ctx()")
Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 3ce6083c2e43..dcee360065f3 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -829,7 +829,7 @@ static int ahash_update_ctx(struct ahash_request *req)
state->buf_dma = try_buf_map_to_sec4_sg(jrdev,
edesc->sec4_sg + 1,
buf, state->buf_dma,
-   *next_buflen, *buflen);
+   *buflen, last_buflen);
 
if (src_nents) {
src_map_to_sec4_sg(jrdev, req->src, src_nents,
-- 
2.1.0



[PATCH 2/6] crypto: caam: avoid needlessly saving and restoring caam_hash_ctx

2015-10-18 Thread Russell King
When exporting and importing the hash state, we will only export and
import into hashes which share the same struct crypto_ahash pointer.
(See hash_accept->af_alg_accept->hash_accept_parent.)

This means that saving the caam_hash_ctx structure on export, and
restoring it on import is a waste of resources.  So, remove this code.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 2faf71ccbd43..3ce6083c2e43 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1575,24 +1575,20 @@ static int ahash_final(struct ahash_request *req)
 static int ahash_export(struct ahash_request *req, void *out)
 {
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
-   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
struct caam_hash_state *state = ahash_request_ctx(req);
 
-   memcpy(out, ctx, sizeof(struct caam_hash_ctx));
-   memcpy(out + sizeof(struct caam_hash_ctx), state,
-  sizeof(struct caam_hash_state));
+   memcpy(out, state, sizeof(struct caam_hash_state));
+
return 0;
 }
 
 static int ahash_import(struct ahash_request *req, const void *in)
 {
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
-   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
struct caam_hash_state *state = ahash_request_ctx(req);
 
-   memcpy(ctx, in, sizeof(struct caam_hash_ctx));
-   memcpy(state, in + sizeof(struct caam_hash_ctx),
-  sizeof(struct caam_hash_state));
+   memcpy(state, in, sizeof(struct caam_hash_state));
+
return 0;
 }
 
-- 
2.1.0



[PATCH 5/6] crypto: caam: fix indentation of close braces

2015-10-18 Thread Russell King
The kernel's coding style suggests that closing braces for initialisers
should not be aligned to the open brace column.  The CodingStyle doc
shows how this should be done.  Remove the additional tab.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 7dd80b0b3f51..f30c93840bba 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1653,8 +1653,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_SHA1,
.alg_op = OP_ALG_ALGSEL_SHA1 | OP_ALG_AAI_HMAC,
}, {
@@ -1675,8 +1675,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = SHA224_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_SHA224,
.alg_op = OP_ALG_ALGSEL_SHA224 | OP_ALG_AAI_HMAC,
}, {
@@ -1697,8 +1697,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_SHA256,
.alg_op = OP_ALG_ALGSEL_SHA256 | OP_ALG_AAI_HMAC,
}, {
@@ -1719,8 +1719,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = SHA384_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_SHA384,
.alg_op = OP_ALG_ALGSEL_SHA384 | OP_ALG_AAI_HMAC,
}, {
@@ -1741,8 +1741,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = SHA512_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_SHA512,
.alg_op = OP_ALG_ALGSEL_SHA512 | OP_ALG_AAI_HMAC,
}, {
@@ -1763,8 +1763,8 @@ static struct caam_hash_template driver_hash[] = {
.halg = {
.digestsize = MD5_DIGEST_SIZE,
.statesize = sizeof(struct caam_export_state),
-   },
},
+   },
.alg_type = OP_ALG_ALGSEL_MD5,
.alg_op = OP_ALG_ALGSEL_MD5 | OP_ALG_AAI_HMAC,
},
-- 
2.1.0



[PATCH 4/6] crypto: caam: only export the state we really need to export

2015-10-18 Thread Russell King
Avoid exporting lots of state by only exporting what we really require,
which is the buffer containing the set of pending bytes to be hashed,
number of pending bytes, the context buffer, and the function pointer
state.  This reduces the exported state size from 576 bytes to
216 bytes.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 44 ++
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index dcee360065f3..7dd80b0b3f51 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -134,6 +134,15 @@ struct caam_hash_state {
int current_buf;
 };
 
+struct caam_export_state {
+   u8 buf[CAAM_MAX_HASH_BLOCK_SIZE];
+   u8 caam_ctx[MAX_CTX_LEN];
+   int buflen;
+   int (*update)(struct ahash_request *req);
+   int (*final)(struct ahash_request *req);
+   int (*finup)(struct ahash_request *req);
+};
+
 /* Common job descriptor seq in/out ptr routines */
 
 /* Map state->caam_ctx, and append seq_out_ptr command that points to it */
@@ -1574,20 +1583,41 @@ static int ahash_final(struct ahash_request *req)
 
 static int ahash_export(struct ahash_request *req, void *out)
 {
-   struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct caam_hash_state *state = ahash_request_ctx(req);
+   struct caam_export_state *export = out;
+   int len;
+   u8 *buf;
 
-   memcpy(out, state, sizeof(struct caam_hash_state));
+   if (state->current_buf) {
+   buf = state->buf_1;
+   len = state->buflen_1;
+   } else {
+   buf = state->buf_0;
+   len = state->buflen_0;
+   }
+
+   memcpy(export->buf, buf, len);
+   memcpy(export->caam_ctx, state->caam_ctx, sizeof(export->caam_ctx));
+   export->buflen = len;
+   export->update = state->update;
+   export->final = state->final;
+   export->finup = state->finup;
 
return 0;
 }
 
 static int ahash_import(struct ahash_request *req, const void *in)
 {
-   struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct caam_hash_state *state = ahash_request_ctx(req);
+   const struct caam_export_state *export = in;
 
-   memcpy(state, in, sizeof(struct caam_hash_state));
+   memset(state, 0, sizeof(*state));
+   memcpy(state->buf_0, export->buf, export->buflen);
+   memcpy(state->caam_ctx, export->caam_ctx, sizeof(state->caam_ctx));
+   state->buflen_0 = export->buflen;
+   state->update = export->update;
+   state->final = export->final;
+   state->finup = export->finup;
 
return 0;
 }
@@ -1622,6 +1652,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA1,
@@ -1643,6 +1674,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA224_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA224,
@@ -1664,6 +1696,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA256,
@@ -1685,6 +1718,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA384_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA384,
@@ -1706,6 +1740,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA512_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA512,
@@ -1727,6 +1762,7 @@ static struct caam_hash_templ

[PATCH 6/6] crypto: caam: fix DMA API leak

2015-10-18 Thread Russell King
caamhash contains this weird code:

src_nents = sg_count(req->src, req->nbytes, &chained);
dma_map_sg_chained(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE,
   chained);
...
edesc->src_nents = src_nents;

sg_count() returns zero when __sg_count() returns zero or one.  This
is used to mean that we don't need to use a hardware scatterlist.
However, setting src_nents to zero causes problems when we unmap:

if (edesc->src_nents)
dma_unmap_sg_chained(dev, req->src, edesc->src_nents,
 DMA_TO_DEVICE, edesc->chained);

as zero here means that we have no entries to unmap.

This can be fixed in two ways: either by writing the number of entries
that were requested of dma_map_sg_chained(), or by reworking the "no
SG required" case.

We adopt the re-work solution here - we replace sg_count() with
__sg_count(), so src_nents now contains the real number of scatterlist
entries, and we then change the test for using the hardware scatterlist
to src_nents > 1 rather than just non-zero.

This change passes my sshd, openssl tests hashing /bin and tcrypt
tests.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index f30c93840bba..28434ad08cb4 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1095,10 +1095,13 @@ static int ahash_digest(struct ahash_request *req)
u32 options;
int sh_len;
 
-   src_nents = sg_count(req->src, req->nbytes, &chained);
-   dma_map_sg_chained(jrdev, req->src, src_nents ? : 1, DMA_TO_DEVICE,
+   src_nents = __sg_count(req->src, req->nbytes, &chained);
+   dma_map_sg_chained(jrdev, req->src, src_nents, DMA_TO_DEVICE,
   chained);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/* allocate space for base edesc and hw desc commands, link tables */
edesc = kzalloc(sizeof(*edesc) + sec4_sg_bytes + DESC_JOB_IO_LEN,
@@ -1117,7 +1120,7 @@ static int ahash_digest(struct ahash_request *req)
desc = edesc->hw_desc;
init_job_desc_shared(desc, ptr, sh_len, HDR_SHARE_DEFER | HDR_REVERSE);
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents, edesc->sec4_sg, 0);
edesc->sec4_sg_dma = dma_map_single(jrdev, edesc->sec4_sg,
sec4_sg_bytes, DMA_TO_DEVICE);
@@ -1447,11 +1450,15 @@ static int ahash_update_first(struct ahash_request *req)
to_hash = req->nbytes - *next_buflen;
 
if (to_hash) {
-   src_nents = sg_count(req->src, req->nbytes - (*next_buflen),
-    &chained);
-   dma_map_sg_chained(jrdev, req->src, src_nents ? : 1,
+   src_nents = __sg_count(req->src, req->nbytes - (*next_buflen),
+  &chained);
+   dma_map_sg_chained(jrdev, req->src, src_nents,
   DMA_TO_DEVICE, chained);
-   sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry);
+   if (src_nents > 1)
+   sec4_sg_bytes = src_nents *
+   sizeof(struct sec4_sg_entry);
+   else
+   sec4_sg_bytes = 0;
 
/*
 * allocate space for base edesc and hw desc commands,
@@ -1472,7 +1479,7 @@ static int ahash_update_first(struct ahash_request *req)
 DESC_JOB_IO_LEN;
edesc->dst_dma = 0;
 
-   if (src_nents) {
+   if (src_nents > 1) {
sg_to_sec4_sg_last(req->src, src_nents,
   edesc->sec4_sg, 0);
edesc->sec4_sg_dma = dma_map_single(jrdev,
-- 
2.1.0



[PATCH 5/6] crypto: marvell: fix missing cpu_to_le32() in mv_cesa_dma_add_op()

2015-10-18 Thread Russell King
When tdma->src is freed in mv_cesa_dma_cleanup(), we convert the DMA
address from a little-endian value prior to calling dma_pool_free().
However, mv_cesa_dma_add_op() assigns tdma->src without first converting
the DMA address to little endian.  Fix this.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/tdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 2738e24bcc59..06bdeac27e3d 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -140,7 +140,7 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
tdma = chain->last;
tdma->op = op;
tdma->byte_cnt = (skip_ctx ? sizeof(op->desc) : sizeof(*op)) | BIT(31);
-   tdma->src = dma_handle;
+   tdma->src = cpu_to_le32(dma_handle);
tdma->flags = CESA_TDMA_DST_IN_SRAM | CESA_TDMA_OP;
 
return op;
-- 
2.1.0



[PATCH 6/6] crypto: marvell: use __le32 for hardware descriptors

2015-10-18 Thread Russell King
Much of the driver uses cpu_to_le32() to convert values for descriptors
to little endian before writing.  Use __le32 to define the hardware-
accessed parts of the descriptors, and ensure most places where it's
reasonable to do so use cpu_to_le32() when assigning to these.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h | 32 
 drivers/crypto/marvell/tdma.c |  9 ++---
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index 752131ae1ef2..bd985e72520b 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -174,19 +174,19 @@
 
 #define CESA_SA_DESC_MAC_DATA(offset)  \
cpu_to_le32(CESA_SA_DATA_SRAM_OFFSET + (offset))
-#define CESA_SA_DESC_MAC_DATA_MSK  GENMASK(15, 0)
+#define CESA_SA_DESC_MAC_DATA_MSK  cpu_to_le32(GENMASK(15, 0))
 
 #define CESA_SA_DESC_MAC_TOTAL_LEN(total_len)  cpu_to_le32((total_len) << 16)
-#define CESA_SA_DESC_MAC_TOTAL_LEN_MSK GENMASK(31, 16)
+#define CESA_SA_DESC_MAC_TOTAL_LEN_MSK cpu_to_le32(GENMASK(31, 16))
 
 #define CESA_SA_DESC_MAC_SRC_TOTAL_LEN_MAX 0x
 
#define CESA_SA_DESC_MAC_DIGEST(offset)\
cpu_to_le32(CESA_SA_MAC_DIG_SRAM_OFFSET + (offset))
-#define CESA_SA_DESC_MAC_DIGEST_MSKGENMASK(15, 0)
+#define CESA_SA_DESC_MAC_DIGEST_MSKcpu_to_le32(GENMASK(15, 0))
 
 #define CESA_SA_DESC_MAC_FRAG_LEN(frag_len)cpu_to_le32((frag_len) << 16)
-#define CESA_SA_DESC_MAC_FRAG_LEN_MSK  GENMASK(31, 16)
+#define CESA_SA_DESC_MAC_FRAG_LEN_MSK  cpu_to_le32(GENMASK(31, 16))
 
 #define CESA_SA_DESC_MAC_IV(offset)\
cpu_to_le32((CESA_SA_MAC_IIV_SRAM_OFFSET + (offset)) |  \
@@ -219,14 +219,14 @@
  * to be executed.
  */
 struct mv_cesa_sec_accel_desc {
-   u32 config;
-   u32 enc_p;
-   u32 enc_len;
-   u32 enc_key_p;
-   u32 enc_iv;
-   u32 mac_src_p;
-   u32 mac_digest;
-   u32 mac_iv;
+   __le32 config;
+   __le32 enc_p;
+   __le32 enc_len;
+   __le32 enc_key_p;
+   __le32 enc_iv;
+   __le32 mac_src_p;
+   __le32 mac_digest;
+   __le32 mac_iv;
 };
 
 /**
@@ -293,10 +293,10 @@ struct mv_cesa_op_ctx {
  * operation.
  */
 struct mv_cesa_tdma_desc {
-   u32 byte_cnt;
-   u32 src;
-   u32 dst;
-   u32 next_dma;
+   __le32 byte_cnt;
+   __le32 src;
+   __le32 dst;
+   __le32 next_dma;
 
/* Software state */
dma_addr_t cur_dma;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 06bdeac27e3d..76427981275b 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -126,6 +126,7 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
struct mv_cesa_tdma_desc *tdma;
struct mv_cesa_op_ctx *op;
dma_addr_t dma_handle;
+   unsigned int size;
 
tdma = mv_cesa_dma_add_desc(chain, flags);
if (IS_ERR(tdma))
@@ -137,9 +138,11 @@ struct mv_cesa_op_ctx *mv_cesa_dma_add_op(struct mv_cesa_tdma_chain *chain,
 
*op = *op_templ;
 
+   size = skip_ctx ? sizeof(op->desc) : sizeof(*op);
+
tdma = chain->last;
tdma->op = op;
-   tdma->byte_cnt = (skip_ctx ? sizeof(op->desc) : sizeof(*op)) | BIT(31);
+   tdma->byte_cnt = cpu_to_le32(size | BIT(31));
tdma->src = cpu_to_le32(dma_handle);
tdma->flags = CESA_TDMA_DST_IN_SRAM | CESA_TDMA_OP;
 
@@ -156,7 +159,7 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
if (IS_ERR(tdma))
return PTR_ERR(tdma);
 
-   tdma->byte_cnt = size | BIT(31);
+   tdma->byte_cnt = cpu_to_le32(size | BIT(31));
tdma->src = src;
tdma->dst = dst;
 
@@ -185,7 +188,7 @@ int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags)
if (IS_ERR(tdma))
return PTR_ERR(tdma);
 
-   tdma->byte_cnt = BIT(31);
+   tdma->byte_cnt = cpu_to_le32(BIT(31));
 
return 0;
 }
-- 
2.1.0



[PATCH 2/6] crypto: marvell: use dma_addr_t for cur_dma

2015-10-18 Thread Russell King
cur_dma is part of the software state, not read by the hardware.
Storing it in LE32 format is wrong, use dma_addr_t for this.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h | 4 +++-
 drivers/crypto/marvell/tdma.c | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e19877402ec9..dbe4970d6c64 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -297,7 +297,9 @@ struct mv_cesa_tdma_desc {
u32 src;
u32 dst;
u32 next_dma;
-   u32 cur_dma;
+
+   /* Software state */
+   dma_addr_t cur_dma;
struct mv_cesa_tdma_desc *next;
union {
struct mv_cesa_op_ctx *op;
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index e8e8a7f7659b..c8256f5916b0 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -69,7 +69,7 @@ void mv_cesa_dma_cleanup(struct mv_cesa_tdma_req *dreq)
 
tdma = tdma->next;
dma_pool_free(cesa_dev->dma->tdma_desc_pool, old_tdma,
- le32_to_cpu(old_tdma->cur_dma));
+ old_tdma->cur_dma);
}
 
dreq->chain.first = NULL;
@@ -105,9 +105,9 @@ mv_cesa_dma_add_desc(struct mv_cesa_tdma_chain *chain, gfp_t flags)
return ERR_PTR(-ENOMEM);
 
memset(new_tdma, 0, sizeof(*new_tdma));
-   new_tdma->cur_dma = cpu_to_le32(dma_handle);
+   new_tdma->cur_dma = dma_handle;
if (chain->last) {
-   chain->last->next_dma = new_tdma->cur_dma;
+   chain->last->next_dma = cpu_to_le32(dma_handle);
chain->last->next = new_tdma;
} else {
chain->first = new_tdma;
-- 
2.1.0



[PATCH 4/6] crypto: marvell: use memcpy_fromio()/memcpy_toio()

2015-10-18 Thread Russell King
Use the IO memcpy() functions when copying from/to MMIO memory.
These locations were found via sparse.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cipher.c | 11 ++-
 drivers/crypto/marvell/hash.c   | 16 
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index 4db2d632204f..6edae64bb387 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -98,10 +98,10 @@ static void mv_cesa_ablkcipher_std_step(struct ablkcipher_request *req)
 
/* FIXME: only update enc_len field */
if (!sreq->skip_ctx) {
-   memcpy(engine->sram, >op, sizeof(sreq->op));
+   memcpy_toio(engine->sram, >op, sizeof(sreq->op));
sreq->skip_ctx = true;
} else {
-   memcpy(engine->sram, >op, sizeof(sreq->op.desc));
+   memcpy_toio(engine->sram, >op, sizeof(sreq->op.desc));
}
 
mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
@@ -145,8 +145,9 @@ static int mv_cesa_ablkcipher_process(struct crypto_async_request *req,
if (ret)
return ret;
 
-   memcpy(ablkreq->info, engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
-  crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));
+   memcpy_fromio(ablkreq->info,
+ engine->sram + CESA_SA_CRYPT_IV_SRAM_OFFSET,
+ crypto_ablkcipher_ivsize(crypto_ablkcipher_reqtfm(ablkreq)));
 
return 0;
 }
@@ -181,7 +182,7 @@ mv_cesa_ablkcipher_std_prepare(struct ablkcipher_request *req)
sreq->size = 0;
sreq->offset = 0;
mv_cesa_adjust_op(engine, >op);
-   memcpy(engine->sram, >op, sizeof(sreq->op));
+   memcpy_toio(engine->sram, >op, sizeof(sreq->op));
 }
 
 static inline void mv_cesa_ablkcipher_prepare(struct crypto_async_request *req,
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 84ddc4cbfa9d..8330815d72fc 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -209,8 +209,8 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
size_t  len;
 
if (creq->cache_ptr)
-   memcpy(engine->sram + CESA_SA_DATA_SRAM_OFFSET, creq->cache,
-  creq->cache_ptr);
+   memcpy_toio(engine->sram + CESA_SA_DATA_SRAM_OFFSET,
+   creq->cache, creq->cache_ptr);
 
len = min_t(size_t, req->nbytes + creq->cache_ptr - sreq->offset,
CESA_SA_SRAM_PAYLOAD_SIZE);
@@ -251,10 +251,10 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
if (len + trailerlen > CESA_SA_SRAM_PAYLOAD_SIZE) {
len &= CESA_HASH_BLOCK_SIZE_MSK;
new_cache_ptr = 64 - trailerlen;
-   memcpy(creq->cache,
-  engine->sram +
-  CESA_SA_DATA_SRAM_OFFSET + len,
-  new_cache_ptr);
+   memcpy_fromio(creq->cache,
+ engine->sram +
+ CESA_SA_DATA_SRAM_OFFSET + len,
+ new_cache_ptr);
} else {
len += mv_cesa_ahash_pad_req(creq,
engine->sram + len +
@@ -272,7 +272,7 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
mv_cesa_update_op_cfg(op, frag_mode, CESA_SA_DESC_CFG_FRAG_MSK);
 
/* FIXME: only update enc_len field */
-   memcpy(engine->sram, op, sizeof(*op));
+   memcpy_toio(engine->sram, op, sizeof(*op));
 
if (frag_mode == CESA_SA_DESC_CFG_FIRST_FRAG)
mv_cesa_update_op_cfg(op, CESA_SA_DESC_CFG_MID_FRAG,
@@ -312,7 +312,7 @@ static void mv_cesa_ahash_std_prepare(struct ahash_request *req)
 
sreq->offset = 0;
mv_cesa_adjust_op(engine, >op_tmpl);
-   memcpy(engine->sram, >op_tmpl, sizeof(creq->op_tmpl));
+   memcpy_toio(engine->sram, >op_tmpl, sizeof(creq->op_tmpl));
 }
 
 static void mv_cesa_ahash_step(struct crypto_async_request *req)
-- 
2.1.0



[PATCH 3/6] crypto: marvell: use gfp_t for gfp flags

2015-10-18 Thread Russell King
Use gfp_t not u32 for the GFP flags.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h | 6 ++
 drivers/crypto/marvell/tdma.c | 5 ++---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index dbe4970d6c64..752131ae1ef2 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -798,10 +798,8 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
  dma_addr_t dst, dma_addr_t src, u32 size,
  u32 flags, gfp_t gfp_flags);
 
-int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
-u32 flags);
-
-int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, u32 flags);
+int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain, gfp_t flags);
+int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags);
 
 int mv_cesa_dma_add_op_transfers(struct mv_cesa_tdma_chain *chain,
 struct mv_cesa_dma_iter *dma_iter,
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index c8256f5916b0..2738e24bcc59 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -166,8 +166,7 @@ int mv_cesa_dma_add_data_transfer(struct mv_cesa_tdma_chain *chain,
return 0;
 }
 
-int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
-u32 flags)
+int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain, gfp_t flags)
 {
struct mv_cesa_tdma_desc *tdma;
 
@@ -178,7 +177,7 @@ int mv_cesa_dma_add_dummy_launch(struct mv_cesa_tdma_chain *chain,
return 0;
 }
 
-int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, u32 flags)
+int mv_cesa_dma_add_dummy_end(struct mv_cesa_tdma_chain *chain, gfp_t flags)
 {
struct mv_cesa_tdma_desc *tdma;
 
-- 
2.1.0



[PATCH 1/6] crypto: marvell: use readl_relaxed()/writel_relaxed()

2015-10-18 Thread Russell King
Use relaxed IO accessors where appropriate.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/cesa.h   |  2 +-
 drivers/crypto/marvell/cipher.c |  2 +-
 drivers/crypto/marvell/hash.c   |  7 +++
 drivers/crypto/marvell/tdma.c   | 20 ++--
 4 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index e9f732138ba3..e19877402ec9 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -677,7 +677,7 @@ static inline void mv_cesa_set_int_mask(struct mv_cesa_engine *engine,
if (int_mask == engine->int_mask)
return;
 
-   writel(int_mask, engine->regs + CESA_SA_INT_MSK);
+   writel_relaxed(int_mask, engine->regs + CESA_SA_INT_MSK);
engine->int_mask = int_mask;
 }
 
diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
index 3df2f4e7adb2..4db2d632204f 100644
--- a/drivers/crypto/marvell/cipher.c
+++ b/drivers/crypto/marvell/cipher.c
@@ -105,7 +105,7 @@ static void mv_cesa_ablkcipher_std_step(struct ablkcipher_request *req)
}
 
mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
-   writel(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
+   writel_relaxed(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
 }
 
diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index 7cd0f0decf6c..84ddc4cbfa9d 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -281,7 +281,7 @@ static void mv_cesa_ahash_std_step(struct ahash_request *req)
creq->cache_ptr = new_cache_ptr;
 
mv_cesa_set_int_mask(engine, CESA_SA_INT_ACCEL0_DONE);
-   writel(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
+   writel_relaxed(CESA_SA_CFG_PARA_DIS, engine->regs + CESA_SA_CFG);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
 }
 
@@ -344,7 +344,7 @@ static int mv_cesa_ahash_process(struct crypto_async_request *req, u32 status)
 
digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
for (i = 0; i < digsize / 4; i++)
-   creq->state[i] = readl(engine->regs + CESA_IVDIG(i));
+   creq->state[i] = readl_relaxed(engine->regs + CESA_IVDIG(i));
 
if (creq->cache_ptr)
sg_pcopy_to_buffer(ahashreq->src, creq->src_nents,
@@ -390,8 +390,7 @@ static void mv_cesa_ahash_prepare(struct crypto_async_request *req,
 
digsize = crypto_ahash_digestsize(crypto_ahash_reqtfm(ahashreq));
for (i = 0; i < digsize / 4; i++)
-   writel(creq->state[i],
-  engine->regs + CESA_IVDIG(i));
+   writel_relaxed(creq->state[i], engine->regs + CESA_IVDIG(i));
 }
 
 static void mv_cesa_ahash_req_cleanup(struct crypto_async_request *req)
diff --git a/drivers/crypto/marvell/tdma.c b/drivers/crypto/marvell/tdma.c
index 64a366c50174..e8e8a7f7659b 100644
--- a/drivers/crypto/marvell/tdma.c
+++ b/drivers/crypto/marvell/tdma.c
@@ -41,18 +41,18 @@ void mv_cesa_dma_step(struct mv_cesa_tdma_req *dreq)
 {
struct mv_cesa_engine *engine = dreq->base.engine;
 
-   writel(0, engine->regs + CESA_SA_CFG);
+   writel_relaxed(0, engine->regs + CESA_SA_CFG);
 
mv_cesa_set_int_mask(engine, CESA_SA_INT_ACC0_IDMA_DONE);
-   writel(CESA_TDMA_DST_BURST_128B | CESA_TDMA_SRC_BURST_128B |
-  CESA_TDMA_NO_BYTE_SWAP | CESA_TDMA_EN,
-  engine->regs + CESA_TDMA_CONTROL);
-
-   writel(CESA_SA_CFG_ACT_CH0_IDMA | CESA_SA_CFG_MULTI_PKT |
-  CESA_SA_CFG_CH0_W_IDMA | CESA_SA_CFG_PARA_DIS,
-  engine->regs + CESA_SA_CFG);
-   writel(dreq->chain.first->cur_dma,
-  engine->regs + CESA_TDMA_NEXT_ADDR);
+   writel_relaxed(CESA_TDMA_DST_BURST_128B | CESA_TDMA_SRC_BURST_128B |
+  CESA_TDMA_NO_BYTE_SWAP | CESA_TDMA_EN,
+  engine->regs + CESA_TDMA_CONTROL);
+
+   writel_relaxed(CESA_SA_CFG_ACT_CH0_IDMA | CESA_SA_CFG_MULTI_PKT |
+  CESA_SA_CFG_CH0_W_IDMA | CESA_SA_CFG_PARA_DIS,
+  engine->regs + CESA_SA_CFG);
+   writel_relaxed(dreq->chain.first->cur_dma,
+  engine->regs + CESA_TDMA_NEXT_ADDR);
writel(CESA_SA_CMD_EN_CESA_SA_ACCL0, engine->regs + CESA_SA_CMD);
 }
 
-- 
2.1.0



[PATCH 0/6] Sparse related fixes

2015-10-18 Thread Russell King - ARM Linux
Continuing on from the previous set of 18 patches, I also fixed a
number of sparse problems and other cleanups.  I don't deem these
suitable for -rc merging, especially now that we're basically at
-rc6.

The first patch switches the driver over to appropriately using
the relaxed IO accessors - this avoids calling out to the heavy
barrier on every read and write operation, but only calling out on
those which really matter.

We switch to using dma_addr_t for DMA addresses which are not accessed
by hardware, and gfp_t for the get_free_page flags.  String-based
MMIO accessors are used instead of plain memcpy()/memset(), which keeps
us clear of any optimisations GCC might otherwise apply to those
functions.

We convert as much of the hardware state to __le32 endian markings,
and use cpu_to_le32() as appropriate.  A number of places are left
unfixed, as we temporarily store CPU native endian values at these
locations; these warnings should not be fixed (basically, only
appropriate sparse warnings should be fixed without penalising code.)

 drivers/crypto/marvell/cesa.h   | 44 -
 drivers/crypto/marvell/cipher.c | 13 ++--
 drivers/crypto/marvell/hash.c   | 23 +++--
 drivers/crypto/marvell/tdma.c   | 42 ---
 4 files changed, 62 insertions(+), 60 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Sat, Oct 17, 2015 at 11:52:54AM +0100, Russell King - ARM Linux wrote:
> Now, with that change, and with your change to buf_0/buf_1, I see
> (before the import/export functions are used) several of these errors:
> 
> caam_jr 2101000.jr0: 4501: DECO: desc idx 5: SGT Length Error. The 
> descriptor is trying to read more data than is contained in the SGT table.
> 
> when checking using openssh.  They don't occur when testing with openssl
> performing normal hashes (as per my previously posted script) but only
> with openssh.

The hardware seems to be quite justified in complaining - I've added code
to dump out the scatter list as well:

jobdesc@879: e4bd7c18: b0861c08 3cd29080 f140 34bd7c38  ...<..@.8|.4
jobdesc@879: e4bd7c28: 0068 f840 3ccf3640 0028  h.@.@6.<(...
sg@882: e4bd7c38:  3ccf3640 0028 
sg@882: e4bd7c48:  34bd7300 0006 
sg@882: e4bd7c58:  7528a070 4020 

If I'm interpreting this correctly:

f140 34bd7c38 0068

tells the CAAM to read 0x68 bytes using the scatterlist at 0x34bd7c38.

The scatterlist here contains three entries, which have lengths 0x28,
0x6 and 0x20, which total up to 0x4e bytes, not 0x68.

Now, looking at how the scatterlist is created, I have yet more questions.

1. What's this dma_map_sg_chained() and dma_unmap_sg_chained() stuff?
   What's wrong with just calling dma_map_sg() on the scatterlist, giving
   it the number of entries we want to map?  dma_map_sg() is supposed to
   deal with chained scatterlists, if it doesn't, it's buggy.

2. src_map_to_sec4_sg() fails to check whether dma_map_sg_chained() failed.
   if it failed, we continue anyway, poking invalid addresses into the
   hardware scatterlist, thereby causing memory corruption.

3. __sg_count() - what's going on with this.  This seems to assume that
   scatterlists aren't chained as:

if (!sg_is_last(sg) && (sg + 1)->length == 0)

   sg + 1 may _not_ be sg_next(sg) - and this will walk off the end of
   the array.

4. Is the try_buf_map_to_sec4_sg() call correct?  Shouldn't it be:

@@ -838,7 +838,7 @@ static int ahash_update_ctx(struct ahash_request *req)
state->buf_dma = try_buf_map_to_sec4_sg(jrdev,
edesc->sec4_sg + 1,
buf, state->buf_dma,
-   *next_buflen, *buflen);
+   *buflen, last_buflen);
 
if (src_nents) {
src_map_to_sec4_sg(jrdev, req->src, src_nents,

   Indeed, with this change, my ssh test works, and the DECO errors are
   gone.



Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote:
> @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, const void *in)
>   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
>   struct caam_hash_state *state = ahash_request_ctx(req);
>  
> + /* Allocate new data buffers to use for this request */
> + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> +

I'm really not sure you can do this at all.  What if the previous
digest calculation prior to the accept() cloning the state was for
a non-hash-block aligned size of data.  The above will lose that
state, and produce an incorrect hash result.  Herbert, can you
confirm please?



Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

2015-10-17 Thread Russell King - ARM Linux
On Fri, Oct 16, 2015 at 04:24:54PM -0700, Victoria Milhoan wrote:
> On Thu, 15 Oct 2015 21:13:38 +0800
> Herbert Xu <herb...@gondor.apana.org.au> wrote:
> 
> > On Thu, Oct 15, 2015 at 01:59:44PM +0100, Russell King - ARM Linux wrote:
> > >
> > > I think the CAAM driver is pretty unfixable from a trivial point of
> > > view.  This driver exports a huge amount of state - it contains both a
> > > struct caam_hash_ctx and a struct caam_hash_state, which totals up to
> > > 1600 bytes.  This fails the:
> > 
> > Right just dumping the state out as is not going to work.  This
> > is not supposed to be how export works anyway.  But it doesn't
> > look too bad as most of that 1600 is consumed by the hardware
> > program descriptor which can easily be regenerated upon import.
> > 
> > The only things that need to be exported AFAICS are key and buf_X.
> 
> I just pushed out a patch for export/import functions in the CAAM driver. The
> patch has been through testing with OpenSSL and the AF_ALG plugin on the MX6.

Be careful with that.  There are two ways to test:

1. Checking hash output.

Preparation - copy openssl.cnf and add this to openssl.cnf:

openssl_conf= openssl_def

[openssl_def]
engines = engine_section

[engine_section]
af_alg  = af_alg_engine

[af_alg_engine]
CIPHERS=aes-128-cbc aes-192-cbc aes-256-cbc des-cbc des-ede3-cbc

DIGESTS=md5 sha1 sha256 sha512

# Putting this last means we register the above as the default algorithms
default_algorithms  = ALL

Then:

#!/bin/sh

for type in md5 sha1 sha256 sha512; do
   echo -n "Checking $type hash:"
   for file in /bin/*; do
  echo -n " $file"
   if ! OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," | ${type}sum -c > /dev/null; then
 echo " ... failed"
 echo -n "Openssl says: " >&2
  OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," >&2
 echo -n "${type}sum says: " >&2
 ${type}sum $file >&2
 exit 1
  fi
   done
   echo " ... ok"
done
echo "All hashes passed"

The above will verify that the hashes are producing the correct answers
for a range of files.  This does _not_ test the export/import paths.

2. Backup the existing openssl.cnf and replace it with the above modified
   version.  Then try to ssh into the platform.  This will verify the
   export/import side of things.  If ssh fails to connect to the target,
   you know that the crypto drivers for SHA1 are broken, probably due to
   export/import.



Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote:
> @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, const void *in)
>   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
>   struct caam_hash_state *state = ahash_request_ctx(req);
>  
> + /* Allocate new data buffers to use for this request */
> + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> +
>   memcpy(ctx, in, sizeof(struct caam_hash_ctx));
>   memcpy(state, in + sizeof(struct caam_hash_ctx),
>  sizeof(struct caam_hash_state));

This is buggy.  You lose the reference to the two buffers you've just
allocated because this memcpy() overwrites them.



Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote:
> @@ -319,7 +319,7 @@ static int ahash_set_sh_desc(struct crypto_ahash *ahash)
>   have_key = OP_ALG_AAI_HMAC_PRECOMP;
>  
>   /* ahash_update shared descriptor */
> - desc = ctx->sh_desc_update;
> + desc = kmalloc(DESC_HASH_MAX_USED_BYTES, GFP_KERNEL | GFP_DMA);

What if kmalloc() fails?  Should this really oops the kernel?  Ditto
for every other kmalloc you've added below.
 
> @@ -1557,6 +1575,20 @@ static int ahash_export(struct ahash_request *req, void *out)
>   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
>   struct caam_hash_state *state = ahash_request_ctx(req);
>  
> + /*
> +  * Do not export the data buffers. New buffers are
> +  * allocated during import.
> +  */
> + kfree(state->buf_0);
> + kfree(state->buf_1);
> + state->buf_0 = NULL;
> + state->buf_1 = NULL;
> +
> + state->current_buf = 0;
> + state->buf_dma = 0;
> + state->buflen_0 = 0;
> + state->buflen_1 = 0;
> +

So what if I export, and then continue using _this_ context later?

> @@ -1605,6 +1641,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = SHA1_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),

Much prefer to have a 'struct caam_hash_export_state' thing rather than
litter the code with the knowledge that these two go together.

>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_SHA1,
> @@ -1626,6 +1664,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = SHA224_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),
>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_SHA224,
> @@ -1647,6 +1687,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = SHA256_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),
>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_SHA256,
> @@ -1668,6 +1710,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = SHA384_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),
>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_SHA384,
> @@ -1689,6 +1733,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = SHA512_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),
>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_SHA512,
> @@ -1710,6 +1756,8 @@ static struct caam_hash_template driver_hash[] = {
>   .setkey = ahash_setkey,
>   .halg = {
>   .digestsize = MD5_DIGEST_SIZE,
> + .statesize = sizeof(struct caam_hash_ctx) +
> +  sizeof(struct caam_hash_state),
>   },
>   },
>   .alg_type = OP_ALG_ALGSEL_MD5,
> @@ -1796,6 +1844,12 @@ static void caam_hash_cra_exit(struct crypto_tfm *tfm)
>   dma_unmap_single(ctx->jrdev, ctx->sh_desc_finup_dma,
>desc_bytes(ctx->sh_desc_finup), DMA_TO_DEVICE);
>  
> + kfree(ctx->sh_desc_update);
> + kfree(ctx->sh_desc_update_first);
> + kfree(ctx->sh_desc_fin);
> + kfree(ctx->sh_desc_digest);
> + kfree(ctx->sh_desc_finup);
> +

What happens to these when ahash_import() overwrites all the context
state?  Doesn't this mean we end up with double-kfree()s ?

Also, did you test this DMA debug enabled?


Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote:
> @@ -1557,6 +1575,20 @@ static int ahash_export(struct ahash_request *req, void *out)
>   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
>   struct caam_hash_state *state = ahash_request_ctx(req);
>  
> + /*
> +  * Do not export the data buffers. New buffers are
> +  * allocated during import.
> +  */
> + kfree(state->buf_0);
> + kfree(state->buf_1);
> + state->buf_0 = NULL;
> + state->buf_1 = NULL;
> +
> + state->current_buf = 0;
> + state->buf_dma = 0;
> + state->buflen_0 = 0;
> + state->buflen_1 = 0;
> +
>   memcpy(out, ctx, sizeof(struct caam_hash_ctx));
>   memcpy(out + sizeof(struct caam_hash_ctx), state,
>  sizeof(struct caam_hash_state));
> @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, const void *in)
>   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
>   struct caam_hash_state *state = ahash_request_ctx(req);
>  
> + /* Allocate new data buffers to use for this request */
> + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA);
> +
>   memcpy(ctx, in, sizeof(struct caam_hash_ctx));
>   memcpy(state, in + sizeof(struct caam_hash_ctx),
>  sizeof(struct caam_hash_state));

Why are we even saving and restoring the caam_hash_ctx here?

That comes from: crypto_tfm_ctx(crypto_ahash_tfm(tfm)), which, tracing
that through the inline functions comes out as:

tfm->base.__crt_ctx

tfm comes from: __crypto_ahash_cast(req->base.tfm), where req->base.tfm
is a pointer to struct crypto_ahash's base member.

When a hash is accept()ed, req->base.tfm is copied from the original
socket handle to the new socket handle (see ahash_request_set_tfm(),
called from hash_accept_parent(), which is in turn called from
af_alg_accept() in hash_accept()).  So both the exporting and importing
hashes end up with the same struct caam_hash_ctx.

We can see this if we add debug print:

[156147.994219] ahash_export(ctx=ed087080,state=e2a6e600)
[156147.994257] ahash_import(ctx=ed087080,state=e2a6d600)
[156147.998256] ahash_export(ctx=ed083080,state=e2a6c600)
[156147.998294] ahash_import(ctx=ed083080,state=edb9ee00)
[156147.998659] ahash_export(ctx=ed087080,state=e2a6e600)
[156147.998700] ahash_import(ctx=ed087080,state=e2a6fa00)
[156147.999002] ahash_export(ctx=ed080880,state=e2a6d200)
[156147.999040] ahash_import(ctx=ed080880,state=e2a6d600)

Notice that the state changes between each export and import, but the
context doesn't.

So, I think we can (and should) completely drop the saving and restoring
of context in these functions.

Now, with that change, and with your change to buf_0/buf_1, I see
(before the import/export functions are used) several of these errors:

caam_jr 2101000.jr0: 4501: DECO: desc idx 5: SGT Length Error. The 
descriptor is trying to read more data than is contained in the SGT table.

when checking using openssh.  They don't occur when testing with openssl
performing normal hashes (as per my previously posted script) but only
with openssh.



Re: [PATCH] crypto: caam - Add support for hashing export and import functions

2015-10-17 Thread Russell King - ARM Linux
On Sat, Oct 17, 2015 at 06:57:44AM -0700, Victoria Milhoan wrote:
> Correct - this was apparently the wrong patch I pushed out. The one I'm
> actively using has this fixed (this is the only difference). I will make
> this change in v2 after reviewing your other comments.

Thanks Victoria, but please perform the tests I've suggested in one of
my previous emails - merely testing with openssl's built-in tests isn't
sufficient.  Also, openssl is non-obvious about what it's testing,
especially if you're using "openssl speed".  I'm afraid it's a case of
"you don't know what's being tested unless you've read the openssl
source" :(

However, while trying to debug the DECO stuff, I've stumbled across this:

static inline void append_ptr(u32 *desc, dma_addr_t ptr)
{
dma_addr_t *offset = (dma_addr_t *)desc_end(desc);

*offset = ptr;

(*desc) += CAAM_PTR_SZ / CAAM_CMD_SZ;
}

where I think 'desc' points to memory used for the descriptor, which is
read by hardware, correct?

If that's true, why are we using dma_addr_t here?  dma_addr_t can be
either 32-bit or 64-bit depending on the kernel configuration, and I
suspect if this driver is built with dma_addr_t set to 64-bit, things
will go awry.

Also, what endianness are the hardware descriptors?  Are they always
native CPU endian, or are they fixed as little endian?  If they're always
little endian, is there any reason not to use __le32 and cpu_to_le32()/
le32_to_cpu() as appropriate to ensure that it works on big endian
systems?



[PATCH 1/5] crypto: caam: print errno code when hash registration fails

2015-10-17 Thread Russell King
Print the errno code when hash registration fails, so we know why the
failure occurred.  This aids debugging.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 94433b9fc200..2faf71ccbd43 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1952,8 +1952,9 @@ static int __init caam_algapi_hash_init(void)
 
err = crypto_register_ahash(&t_alg->ahash_alg);
if (err) {
-   pr_warn("%s alg registration failed\n",
-   t_alg->ahash_alg.halg.base.cra_driver_name);
+   pr_warn("%s alg registration failed: %d\n",
+   t_alg->ahash_alg.halg.base.cra_driver_name,
+   err);
kfree(t_alg);
} else
list_add_tail(&t_alg->entry, &hash_list);
@@ -1968,8 +1969,9 @@ static int __init caam_algapi_hash_init(void)
 
err = crypto_register_ahash(&t_alg->ahash_alg);
if (err) {
-   pr_warn("%s alg registration failed\n",
-   t_alg->ahash_alg.halg.base.cra_driver_name);
+   pr_warn("%s alg registration failed: %d\n",
+   t_alg->ahash_alg.halg.base.cra_driver_name,
+   err);
kfree(t_alg);
} else
list_add_tail(&t_alg->entry, &hash_list);
-- 
2.1.0



[PATCH 4/5] crypto: caam: only export the state we really need to export

2015-10-17 Thread Russell King
Avoid exporting lots of state by only exporting what we really require,
which is the buffer containing the set of pending bytes to be hashed,
number of pending bytes, the context buffer, and the function pointer
state.  This reduces down the exported state size to 216 bytes from
576 bytes.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 44 ++
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index dcee360065f3..7dd80b0b3f51 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -134,6 +134,15 @@ struct caam_hash_state {
int current_buf;
 };
 
+struct caam_export_state {
+   u8 buf[CAAM_MAX_HASH_BLOCK_SIZE];
+   u8 caam_ctx[MAX_CTX_LEN];
+   int buflen;
+   int (*update)(struct ahash_request *req);
+   int (*final)(struct ahash_request *req);
+   int (*finup)(struct ahash_request *req);
+};
+
 /* Common job descriptor seq in/out ptr routines */
 
 /* Map state->caam_ctx, and append seq_out_ptr command that points to it */
@@ -1574,20 +1583,41 @@ static int ahash_final(struct ahash_request *req)
 
 static int ahash_export(struct ahash_request *req, void *out)
 {
-   struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct caam_hash_state *state = ahash_request_ctx(req);
+   struct caam_export_state *export = out;
+   int len;
+   u8 *buf;
 
-   memcpy(out, state, sizeof(struct caam_hash_state));
+   if (state->current_buf) {
+   buf = state->buf_1;
+   len = state->buflen_1;
+   } else {
+   buf = state->buf_0;
+   len = state->buflen_0;
+   }
+
+   memcpy(export->buf, buf, len);
+   memcpy(export->caam_ctx, state->caam_ctx, sizeof(export->caam_ctx));
+   export->buflen = len;
+   export->update = state->update;
+   export->final = state->final;
+   export->finup = state->finup;
 
return 0;
 }
 
 static int ahash_import(struct ahash_request *req, const void *in)
 {
-   struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
struct caam_hash_state *state = ahash_request_ctx(req);
+   const struct caam_export_state *export = in;
 
-   memcpy(state, in, sizeof(struct caam_hash_state));
+   memset(state, 0, sizeof(*state));
+   memcpy(state->buf_0, export->buf, export->buflen);
+   memcpy(state->caam_ctx, export->caam_ctx, sizeof(state->caam_ctx));
+   state->buflen_0 = export->buflen;
+   state->update = export->update;
+   state->final = export->final;
+   state->finup = export->finup;
 
return 0;
 }
@@ -1622,6 +1652,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA1_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA1,
@@ -1643,6 +1674,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA224_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA224,
@@ -1664,6 +1696,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA256_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA256,
@@ -1685,6 +1718,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA384_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA384,
@@ -1706,6 +1740,7 @@ static struct caam_hash_template driver_hash[] = {
.setkey = ahash_setkey,
.halg = {
.digestsize = SHA512_DIGEST_SIZE,
+   .statesize = sizeof(struct caam_export_state),
},
},
.alg_type = OP_ALG_ALGSEL_SHA512,
@@ -1727,6 +1762,7 @@ static struct caam_hash_templ

[PATCH 3/5] crypto: caam: fix non-block aligned hash calculation

2015-10-17 Thread Russell King
caam does not properly calculate the size of the retained state
when non-block aligned hashes are requested - it uses the wrong
buffer sizes, which results in errors such as:

caam_jr 2102000.jr1: 4501: DECO: desc idx 5: SGT Length Error. The 
descriptor is trying to read more data than is contained in the SGT table.

We end up here with:

in_len 0x46 blocksize 0x40 last_bufsize 0x0 next_bufsize 0x6
to_hash 0x40 ctx_len 0x28 nbytes 0x20

which results in a job descriptor of:

jobdesc@889: ed03d918: b0861c08 3daa0080 f140 3d03d938
jobdesc@889: ed03d928: 0068 f840 3cde2a40 0028

where the word at 0xed03d928 is the expected data size (0x68), and a
scatterlist containing:

sg@892: ed03d938:  3cde2a40 0028 
sg@892: ed03d948:  3d03d100 0006 
sg@892: ed03d958:  7e8aa700 4020 

0x68 comes from 0x28 (the context size) plus the "in_len" rounded down
to a block size (0x40).  in_len comes from 0x26 bytes of unhashed data
from the previous operation, plus the 0x20 bytes from the latest
operation.

The fixed version would create:

sg@892: ed03d938:  3cde2a40 0028 
sg@892: ed03d948:  3d03d100 0026 
sg@892: ed03d958:  7e8aa700 4020 

which replaces the 0x06 length with the correct 0x26 bytes of previously
unhashed data.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 3ce6083c2e43..dcee360065f3 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -829,7 +829,7 @@ static int ahash_update_ctx(struct ahash_request *req)
state->buf_dma = try_buf_map_to_sec4_sg(jrdev,
edesc->sec4_sg + 1,
buf, state->buf_dma,
-   *next_buflen, *buflen);
+   *buflen, last_buflen);
 
if (src_nents) {
src_map_to_sec4_sg(jrdev, req->src, src_nents,
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] crypto: caam: avoid needlessly saving and restoring caam_hash_ctx

2015-10-17 Thread Russell King
When exporting and importing the hash state, we will only export and
import into hashes which share the same struct crypto_ahash pointer.
(See hash_accept->af_alg_accept->hash_accept_parent.)

This means that saving the caam_hash_ctx structure on export, and
restoring it on import is a waste of resources.  So, remove this code.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/caam/caamhash.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c
index 2faf71ccbd43..3ce6083c2e43 100644
--- a/drivers/crypto/caam/caamhash.c
+++ b/drivers/crypto/caam/caamhash.c
@@ -1575,24 +1575,20 @@ static int ahash_final(struct ahash_request *req)
 static int ahash_export(struct ahash_request *req, void *out)
 {
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
-   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
struct caam_hash_state *state = ahash_request_ctx(req);
 
-   memcpy(out, ctx, sizeof(struct caam_hash_ctx));
-   memcpy(out + sizeof(struct caam_hash_ctx), state,
-  sizeof(struct caam_hash_state));
+   memcpy(out, state, sizeof(struct caam_hash_state));
+
return 0;
 }
 
 static int ahash_import(struct ahash_request *req, const void *in)
 {
struct crypto_ahash *ahash = crypto_ahash_reqtfm(req);
-   struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash);
struct caam_hash_state *state = ahash_request_ctx(req);
 
-   memcpy(ctx, in, sizeof(struct caam_hash_ctx));
-   memcpy(state, in + sizeof(struct caam_hash_ctx),
-  sizeof(struct caam_hash_state));
+   memcpy(state, in, sizeof(struct caam_hash_state));
+
return 0;
 }
 
-- 
2.1.0



[PATCH 0/5] Fix CAAM hash driver

2015-10-17 Thread Russell King - ARM Linux
The following series fixes the CAAM hash driver, allowing it to work
with the previously merged "crypto: ahash - ensure statesize is non-
zero" patch.

This is non-trivial, because CAAM exports a huge 1600 bytes of data,
which, if we set .statesize to this, still results in the core code
rejecting the driver.  So, we need to shrink the amount of state
exported.

The first, most obvious one to get rid of is the export of the
caam_hash_ctx structure, which is shared between the socket being
exported from and imported to - copying it away and back again was
a complete no-op.

The second is that we don't need to export both pending-bytes buffers.
Only one will be in use at any time.

A problem was encountered while testing, where the size of the pending
bytes buffer was not added to the scatterlist with the correct length.
This is also fixed in this series, by patch 3.

 drivers/crypto/caam/caamhash.c | 79 +++---
 1 file changed, 59 insertions(+), 20 deletions(-)

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 0/5] Fix CAAM hash driver

2015-10-17 Thread Russell King - ARM Linux
On Sat, Oct 17, 2015 at 07:50:55PM +0100, Russell King - ARM Linux wrote:
> The following series fixes the CAAM hash driver, allowing it to work
> with the previously merged "crypto: ahash - ensure statesize is non-
> zero" patch.
> 
> This is non-trivial, because CAAM exports a huge 1600 bytes of data,
> which, if we set .statesize to this, still results in the core code
> rejecting the driver.  So, we need to shrink the amount of state
> exported.
> 
> The first, most obvious one to get rid of is the export of the
> caam_hash_ctx structure, which is shared between the socket being
> exported from and imported to - copying it away and back again was
> a complete no-op.
> 
> The second is that we don't need to export both pending-bytes buffers.
> Only one will be in use at any time.
> 
> A problem was encountered while testing, where the size of the pending
> bytes buffer was not added to the scatterlist with the correct length.
> This is also fixed in this series, by patch 3.
> 
>  drivers/crypto/caam/caamhash.c | 79 
> +++---
>  1 file changed, 59 insertions(+), 20 deletions(-)

This series has been tested using openssl to validate the hashes against
the md5sum, sha1sum, sha224sum, sha256sum and sha512sum programs, hashing
every file in my /bin directory, which checks correct functioning of the
hashing.

It has also been tested with openssh, where the crypto hardware is used
via openssl by sshd, which makes use of accept() and hence tests the
export/import paths.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

2015-10-15 Thread Russell King - ARM Linux
On Thu, Oct 15, 2015 at 05:41:47PM +0800, Herbert Xu wrote:
> On Thu, Oct 15, 2015 at 10:39:30AM +0100, Russell King - ARM Linux wrote:
> >
> > The CAAM driver is similarly buggy - it has export/import functions in
> > its ahash drivers, but zero statesize.
> > 
> > User exploitable kernel stack smashing... I'd suggest putting this patch
> > into stable kernels as high priority, as I'm pretty sure this could be
> 
> I agree.  It should already be on its way to stable as Linus has
> pulled it into his tree and it carries a stable cc.

Thanks.

I think the CAAM driver is pretty unfixable from a trivial point of
view.  This driver exports a huge amount of state - it contains both a
struct caam_hash_ctx and a struct caam_hash_state, which totals up to
1600 bytes.  This fails the:

alg->halg.statesize > PAGE_SIZE / 8 

in ahash_prepare_alg() if we set .statesize.  For ARM, this places a
limit of 512 bytes on the state size.

The CAAM authors need to come up with a better solution (and quickly,
as caamhash is going to fail in all kernels soon), or we need to
support larger exported states.

BTW, I can't find a MAINTAINERS entry for CAAM, so I've just grabbed
a couple of addresses from recent git history in the hope they'll know
who's responsible.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

2015-10-15 Thread Russell King - ARM Linux
On Tue, Oct 13, 2015 at 10:33:12PM +0800, Herbert Xu wrote:
> On Fri, Oct 09, 2015 at 08:43:33PM +0100, Russell King wrote:
> > If the algorithm passed a zero statesize, do not pass a valid pointer
> > into the export/import functions.  Passing a valid pointer covers up
> > bugs in driver code which then go on to smash the kernel stack.
> > Instead, pass NULL, which will cause any attempt to write to the
> > pointer to fail.
> > 
> > Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
> 
> Patch applied without the shash hunk.  I also replaced your commit
> message as it no longer makes any sense:
> 
> crypto: ahash - ensure statesize is non-zero
> 
> Unlike shash algorithms, ahash drivers must implement export
> and import as their descriptors may contain hardware state and
> cannot be exported as is.  Unfortunately some ahash drivers did
> not provide them and end up causing crashes with algif_hash.
> 
> This patch adds a check to prevent these drivers from registering
> ahash algorithms until they are fixed.
> 
> Thanks,

There will be fallout from this.

The CAAM driver is similarly buggy - it has export/import functions in
its ahash drivers, but zero statesize.

User exploitable kernel stack smashing... I'd suggest putting this patch
into stable kernels as high priority, as I'm pretty sure this could be
used to gain privileges via carefully crafted md5 hashes.  I've not
proven it, but given that the md5 hash and state data get copied over
the kernel stack, it's highly likely that it _is_ exploitable from any
user that can create an AF_ALG socket.

Yes, it means regressions in the form of various hw crypto no longer
being loadable, but I think that's preferable to the security hole here.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

2015-10-13 Thread Russell King - ARM Linux
On Tue, Oct 13, 2015 at 09:00:48PM +0800, Herbert Xu wrote:
> On Sun, Oct 11, 2015 at 02:55:05PM +0800, Herbert Xu wrote:
> > On Sat, Oct 10, 2015 at 05:17:54PM +0100, Russell King - ARM Linux wrote:
> > >
> > > I _also_ notice that despite having the ARM assembly crypto functions
> > > enabled and built-in to the kernel, they don't appear in /proc/crypto -
> > > and this is because of patch 1 in this series, which blocks out any
> > > crypto driver which has a zero statesize (as the ARM crypto functions
> > > appear to have.)  I found this by rebuilding the ARM crypto stuff as
> > > modules, and then trying to insert them:
> > 
> > They're buggy and unfortunately this wasn't caught during the
> > review process.  The import/export functions are required and
> > certainly not optional.
> 
> OK it looks like I was confused.  Yes the import and export functions
> are required but for shash algorithms we provide a simple default.
> This means that implementations can simply leave it NULL and they
> will then use the default import/export functions which exports
> the shash_desc as the state.
> 
> So Russell what I'll do is apply your patch without the hunk
> for shash_prepare_alg.  IOW we will block any ahash algorithms
> that leave state as zero since ahash does not provide a default
> import/export function (it can't because it may involve hardware
> state).  shash algorithms on the other hand won't be affected.

Okay.

I've a larger queue of patches building at the moment for the Marvell
driver - I've found that it's touch-and-go whether it produces the
correct hash result, and previous hashes can affect subsequent hashes
in some circumstances.  I'm currently at another 7 patches, plus a
"big" as-yet uncommitted patch which is virtually a rewrite of the
DMA preparation - diffstat wise, in total it's looking like:

 drivers/crypto/marvell/cesa.h |   8 +-
 drivers/crypto/marvell/hash.c | 258 ++
 drivers/crypto/marvell/tdma.c |   7 ++
 3 files changed, 151 insertions(+), 122 deletions(-)

and there's still a few more cases that need solving: zero bytes of
input should not give a zero hash, and an input which is the multiple
of the blocksize causes the hardware to hang - though I'm not yet
sure whether that's a result of my above changes.  At least with my
above changes, I now get stable and correct hash values for sizes
which do not hit any of those remaining bugs.

Given it's size so far, I'm not sure it makes much sense to backport
any of these fixes to stable; I think maybe we should just say "okay,
before these fixes, it's just broken."

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

2015-10-13 Thread Russell King - ARM Linux
On Tue, Oct 13, 2015 at 09:57:22PM +0800, Herbert Xu wrote:
> On Tue, Oct 13, 2015 at 02:55:18PM +0100, Russell King - ARM Linux wrote:
> >
> > Given it's size so far, I'm not sure it makes much sense to backport
> > any of these fixes to stable; I think maybe we should just say "okay,
> > before these fixes, it's just broken."
> 
> OK as long as your first patch goes into stable it should be fine
> as it'll essentially disable the marvell driver.

It will disable the Marvell driver, along with the SW implementations in
arch/arm/crypto/ which don't set .statesize, .import and .export.  I
would not be surprised if it also affects others too.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


hash import/export

2015-10-11 Thread Russell King - ARM Linux
Herbert,

I wonder if you can clear something up about the hash export/import
functionality.  In:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/344120.html

you seem to imply that the exported and imported state can't be defined
by the driver.

Boris tells me, "AFAIR, crypto users are expecting md5/sha1/sha256_state
depending on the crypto req state they are exporting".

From what I can see, there is only one place in the core crypto code
where hash state is exported and imported, and that's in hash_accept(),
and that's always done with the same driver.  The only other place is
in the marvell cesa driver itself when initialising the hmac state.

Is there any reason a driver can't define its own structure to be
exported here which can be shared between each of the different methods
it supports?

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: hash import/export

2015-10-11 Thread Russell King - ARM Linux
On Sun, Oct 11, 2015 at 08:34:27PM +0100, Russell King - ARM Linux wrote:
> Herbert,
> 
> I wonder if you can clear something up about the hash export/import
> functionality.  In:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/344120.html
> 
> you seem to imply that the exported and imported state can't be defined
> by the driver.
> 
> Boris tells me, "AFAIR, crypto users are expecting md5/sha1/sha256_state
> depending on the crypto req state they are exporting".
> 
> From what I can see, there is only one place in the core crypto code
> where hash state is exported and imported, and that's in hash_accept(),
> and that's always done with the same driver.  The only other place is
> in the marvell cesa driver itself when initialising the hmac state.
> 
> Is there any reason a driver can't define its own structure to be
> exported here which can be shared between each of the different methods
> it supports?

A further question: it looks like struct md5_state's hash member is
always supposed to be in little endian format, whereas struct
sha*_state's state is always big endian format.  Is there a reason
why these are typed 'u32', rather than __le32 and __be32?

This would make the intention here more obvious, and should also allow
the opportunity for sparse to verify that we're getting the endianness
correct everywhere.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state

2015-10-10 Thread Russell King - ARM Linux
On Sat, Oct 10, 2015 at 06:46:07PM +0200, Boris Brezillon wrote:
> On Fri, 09 Oct 2015 20:43:33 +0100
> Russell King <rmk+ker...@arm.linux.org.uk> wrote:
> 
> > If the algorithm passed a zero statesize, do not pass a valid pointer
> > into the export/import functions.  Passing a valid pointer covers up
> > bugs in driver code which then go on to smash the kernel stack.
> > Instead, pass NULL, which will cause any attempt to write to the
> > pointer to fail.
> > 
> > Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
> > ---
> >  crypto/ahash.c | 3 ++-
> >  crypto/shash.c | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/crypto/ahash.c b/crypto/ahash.c
> > index 8acb886032ae..9c1dc8d6106a 100644
> > --- a/crypto/ahash.c
> > +++ b/crypto/ahash.c
> > @@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
> > struct crypto_alg *base = &alg->halg.base;
> >  
> > if (alg->halg.digestsize > PAGE_SIZE / 8 ||
> > -   alg->halg.statesize > PAGE_SIZE / 8)
> > +   alg->halg.statesize > PAGE_SIZE / 8 ||
> > +   alg->halg.statesize == 0)
> 
> Just read Russel's answer to the cover letter, and I wonder if the
> following test wouldn't fix the problem:
> 
>   (alg->halg.statesize == 0 && (alg->import || alg->export))
> 
> I mean, the only valid case where statesize can be zero is when you
> don't have any state associated to the crypto algorithm, and if that's
> the case, ->import() and ->export() functions are useless, isn't ?

However, that brings up a question.

If you're using AF_ALG, and you attach to (say) the ARM Neon SHA512
implementation through it, and then use accept() to duplicate it's
state, what prevents the kernel from oopsing when hash_accept() calls
crypto_ahash_export(), which then dereferences the NULL alg->export
function pointer?

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

2015-10-10 Thread Russell King - ARM Linux
On Sat, Oct 10, 2015 at 12:29:25PM +0100, Russell King - ARM Linux wrote:
> On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote:
> > Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
> > > Software:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes256 bytes   1024 bytes   8192 
> > > bytes
> > > md5  13948.89k42477.61k   104619.41k   165140.82k   
> > > 199273.13k
> > > sha1 13091.91k36463.89k75393.88k   103893.33k   
> > > 117104.50k
> > > sha256   13573.92k30492.25k52700.33k64247.81k
> > > 68722.69k
> > >
> > > Hardware:
> > > The 'numbers' are in 1000s of bytes per second processed.
> > > type 16 bytes 64 bytes256 bytes   1024 bytes   8192 
> > > bytes
> > > md5   3964.55k13782.11k43181.71k   180263.38k  
> > > 1446616.18k
> > > sha1  4609.16k 8922.35k35422.87k   333575.31k  
> > > 2122547.20k
> > > sha256   13519.62k30484.10k52547.47k64285.21k
> > > 68530.60k
> 
> Okay, the reason for the difference in SHA256 speed is because the
> "openssl speed" code *totally* *bypasses* the engine support, whereas
> the md5 and sha1 do not.  It even bypasses the normal method used to
> get hold of the sha256 implementation (EVP_sha256), and goes straight
> to using SHA256() directly in openssl/crypto/sha/sha256.c.  It looks
> like the same goes for the AES tests too.

Okay, what's required is openssl speed -evp <cipher|digest> and it can
only do one at a time.  That's really confusing and non-intuitive, given
that some of the non-evp options end up testing the EVP methods.

So, running:

$ openssl speed -evp aes-128-cbc

I get the kernel spitting out a number of these warnings during its run:

DRBG: could not allocate digest TFM handle: hmac(sha512)

I also notice is that 50% of CPU time is spent in the kernel, presumably
because the lack of hardware sha512 means that we end up having to do
sha512 in software in the kernel.

I _also_ notice that despite having the ARM assembly crypto functions
enabled and built-in to the kernel, they don't appear in /proc/crypto -
and this is because of patch 1 in this series, which blocks out any
crypto driver which has a zero statesize (as the ARM crypto functions
appear to have.)  I found this by rebuilding the ARM crypto stuff as
modules, and then trying to insert them:

# modprobe sha512-arm
modprobe: ERROR: could not insert 'sha512_arm': Invalid argument

So, I think it's best if this patch series is *NOT* merged until someone
who knows the kernel crypto code gets to grips with what's supposed to be
done in various parts of the code.  Yes, I know that Herbert suggested
the approach in patch 1, but that _will_ cause regressions like the above
if it's merged.

Either the ARM crypto code in arch/arm/crypto is buggy, or the approach
in patch 1 is buggy.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

2015-10-10 Thread Russell King - ARM Linux
On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote:
> Hi Russel,
   ^

> Russell King - ARM Linux <li...@arm.linux.org.uk> writes:
> > Software:
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes256 bytes   1024 bytes   8192 
> > bytes
> > md5  13948.89k42477.61k   104619.41k   165140.82k   
> > 199273.13k
> > sha1 13091.91k36463.89k75393.88k   103893.33k   
> > 117104.50k
> > sha256   13573.92k30492.25k52700.33k64247.81k
> > 68722.69k
> >
> > Hardware:
> > The 'numbers' are in 1000s of bytes per second processed.
> > type 16 bytes 64 bytes256 bytes   1024 bytes   8192 
> > bytes
> > md5   3964.55k13782.11k43181.71k   180263.38k  
> > 1446616.18k
> > sha1  4609.16k 8922.35k35422.87k   333575.31k  
> > 2122547.20k
> > sha256   13519.62k30484.10k52547.47k64285.21k
> > 68530.60k

Okay, the reason for the difference in SHA256 speed is because the
"openssl speed" code *totally* *bypasses* the engine support, whereas
the md5 and sha1 do not.  It even bypasses the normal method used to
get hold of the sha256 implementation (EVP_sha256), and goes straight
to using SHA256() directly in openssl/crypto/sha/sha256.c.  It looks
like the same goes for the AES tests too.

> I had a lot of performance results at various levels (tcrypt module on
> variations of the drivers (tasklet, threaded irq, full polling, etc),
> IPsec tunnel and transport mode through to see how it behaves w/ two
> mvneta instances also eating CPU cycles for incoming/outgoing packets)
> but those where done on an encryption use case. Some are provided
> in [2]. In an early (read dirty) polling-based version of the driver,
> the CESA on an Armada 370 (mirabox) was verified to be capable of near
> 100MB/s on buffers of 1500+ bytes for AES CBC encryption. Current
> version of the driver is not as good (say half that value) but it
> behaves better. A Mirabox can easily route 1500 bytes packets at 100MB/s
> between its two interfaces but when you mix both using IPsec in tunnel
> mode on one side, you end up w/ perfs between 10 to 15MB/s, IIRC. I
> think it's interesting to see where it ends up w/ the engine exposed to
> userland consumers (e.g. sth like SSH).
> 
> I cannot promise a huge amount of time but I'll try and find some to
> play w/ AF_ALG using openssl and CESA in the coming weeks.

I think what we draw from my investigation is that "openssl speed" is
utterly crap - you don't actually know what's being tested there.  Some
things test the engine, others bypass the engine infrastructure totally
and test the openssl software implementation instead.

So, if you think "openssl speed" is a good way to measure the speed of
digests and ciphers that openssl supplies to applications, *think again*.
It doesn't.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions

2015-10-09 Thread Russell King - ARM Linux
On Sat, Oct 10, 2015 at 12:37:33AM +0200, Arnaud Ebalard wrote:
> Hi Russel,
> 
> Russell King <rmk+ker...@arm.linux.org.uk> writes:
> 
> > As all the import functions and export functions are virtually
> > identical, factor out their common parts into a generic
> > mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively.  This
> > performs the actual import or export, and we pass the data pointers and
> > length into these functions.
> >
> > We have to switch a % const operation to do_div() in the common import
> > function to avoid provoking gcc to use the expensive 64-bit by 64-bit
> > modulus operation.
> >
> > Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
> 
> Thanks for the refactoring and for the fixes. All patches look good to 
> me. Out of curiosity, can I ask what perf you get w/ openssh or openssl
> using AF_ALG and the CESA?

I would do, but it seems this AF_ALG plugin for openssl isn't
actually using it for encryption.  When I try:

openssl speed -engine af_alg aes-128-cbc

I get results for using openssl's software implementation.  If I do:

openssl speed -engine af_alg md5

then I get results from using the kernel's MD5.  Hence, I think the
only thing that I think openssh is using it for is the digest stuff,
not the crypto itself.  I can't be certain about that though.

I've tried debugging the af_alg engine plugin, but I'm not getting
very far (I'm not an openssl hacker!)  I see it registering the
function to get the ciphers (via ENGINE_set_ciphers), and I see this
called several times, returning a list of NID_xxx values describing
the methods it supports, which includes aes-128-cbc.  However,
unlike the equivalent digest function, I never see it called
requesting any of the ciphers.  Maybe it's an openssl bug, or a
"feature" preventing hardware crypto?  Maybe something is missing
from its initialisation?  I've no idea yet.  It seems I'm not alone
in this - this report from April 2015 is exactly what I'm seeing:

https://mta.openssl.org/pipermail/openssl-users/2015-April/001124.html

However, I'm coming to the conclusion that AF_ALG with openssl is a
dead project, and the only interface that everyone is using for that
is cryptodev - probably contary to Herbert and/or DaveM's wishes.  For
example, the openwrt guys seem to only support cryptodev, according to
their wiki page on the subject of hardware crypto:

http://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators

Here's the references to code for AF_ALG with openssl I've found so far:

Original af_alg plugin (dead):

http://src.carnivore.it/users/common/af_alg/

3rd party "maintained" af_alg openssl plugin, derived from commit
1851bbb010c38878c83729be844f168192059189 in the above repo but with
no history:

https://github.com/RidgeRun/af-alg-rr

and that doesn't contain any changes to the C code originally committed.
Whether this C code contains changes or not is anyone's guess: there's
no way to refer back to the original repository.

Anyway, here's the digest results:

Software:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes256 bytes   1024 bytes   8192 bytes
md5  13948.89k42477.61k   104619.41k   165140.82k   199273.13k
sha1 13091.91k36463.89k75393.88k   103893.33k   117104.50k
sha256   13573.92k30492.25k52700.33k64247.81k68722.69k

Hardware:
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes256 bytes   1024 bytes   8192 bytes
md5   3964.55k13782.11k43181.71k   180263.38k  1446616.18k
sha1  4609.16k 8922.35k35422.87k   333575.31k  2122547.20k
sha256   13519.62k30484.10k52547.47k64285.21k68530.60k

There's actually something suspicious while running these tests:

Doing md5 for 3s on 16 size blocks: 32212 md5's in 0.13s
Doing md5 for 3s on 64 size blocks: 23688 md5's in 0.11s
Doing md5 for 3s on 256 size blocks: 23615 md5's in 0.14s
Doing md5 for 3s on 1024 size blocks: 22885 md5's in 0.13s
Doing md5 for 3s on 8192 size blocks: 15893 md5's in 0.09s
Doing sha1 for 3s on 16 size blocks: 31688 sha1's in 0.11s
Doing sha1 for 3s on 64 size blocks: 23700 sha1's in 0.17s
Doing sha1 for 3s on 256 size blocks: 23523 sha1's in 0.17s
Doing sha1 for 3s on 1024 size blocks: 22803 sha1's in 0.07s
Doing sha1 for 3s on 8192 size blocks: 15546 sha1's in 0.06s
Doing sha256 for 3s on 16 size blocks: 2518030 sha256's in 2.98s
Doing sha256 for 3s on 64 size blocks: 1419416 sha256's in 2.98s
Doing sha256 for 3s on 256 size blocks: 613738 sha256's in 2.99s
Doing sha256 for 3s on 1024 size blocks: 187080 sha256's in 2.98s
Doing sha256 for 3s on 8192 size blocks: 25013 sha256's in 2.99s

from the hardware - note the "in" figures are ridiculously low, yet
t

[PATCH 3/3] crypto: marvell: initialise struct mv_cesa_ahash_req

2015-10-09 Thread Russell King
When a AF_ALG fd is accepted a second time (hence hash_accept() is
used), hash_accept_parent() allocates a new private context using
sock_kmalloc().  This context is uninitialised.  After use of the new
fd, we eventually end up with the kernel complaining:

marvell-cesa f109.crypto: dma_pool_free cesa_padding, c0627770/0 (bad dma)

where c0627770 is a random address.  Poisoning the memory allocated by
the above sock_kmalloc() produces kernel oopses within the marvell hash
code, particularly the interrupt handling.

The following simplified call sequence occurs:

hash_accept()
  crypto_ahash_export()
marvell hash export function
  af_alg_accept()
hash_accept_parent()<== allocates uninitialised struct hash_ctx
  crypto_ahash_import()
marvell hash import function

hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member,
and, as the marvell hash import function only partially initialises
this structure, we end up with a lot of members which are left with
whatever data was in memory prior to sock_kmalloc().

Add zero-initialisation of this structure.

Signed-off-by: Russell King <rmk+ker...@arm.linux.org.uk>
---
 drivers/crypto/marvell/hash.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
index a259aced3b42..699839816505 100644
--- a/drivers/crypto/marvell/hash.c
+++ b/drivers/crypto/marvell/hash.c
@@ -831,6 +831,8 @@ static int mv_cesa_md5_import(struct ahash_request *req, 
const void *in)
unsigned int cache_ptr;
int ret;
 
+   memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->byte_count;
memcpy(creq->state, in_state->hash, digsize);
creq->cache_ptr = 0;
@@ -921,6 +923,8 @@ static int mv_cesa_sha1_import(struct ahash_request *req, 
const void *in)
unsigned int cache_ptr;
int ret;
 
+   memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
@@ -1022,6 +1026,8 @@ static int mv_cesa_sha256_import(struct ahash_request 
*req, const void *in)
unsigned int cache_ptr;
int ret;
 
+   memset(creq, 0, sizeof(*creq));
+
creq->len = in_state->count;
memcpy(creq->state, in_state->state, digsize);
creq->cache_ptr = 0;
-- 
2.1.0


