Re: [PATCH v9 00/20] simplify crypto wait for async op
On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote: > Many users of kernel async. crypto services have a pattern of > starting an async. crypto op and than using a completion > to wait for it to end. > > This patch set simplifies this common use case in two ways: > > First, by separating the return codes of the case where a > request is queued to a backlog due to the provider being > busy (-EBUSY) from the case the request has failed due > to the provider being busy and backlogging is not enabled > (-EAGAIN). > > Next, this change is than built on to create a generic API > to wait for a async. crypto operation to complete. > > The end result is a smaller code base and an API that is > easier to use and more difficult to get wrong. > > The patch set was boot tested on x86_64 and arm64 which > at the very least tests the crypto users via testmgr and > tcrypt but I do note that I do not have access to some > of the HW whose drivers are modified nor do I claim I was > able to test all of the corner cases. > > The patch set is based upon linux-next release tagged > next-20171013. Has there been any performance impact analysis of these changes? I ended up with patches for one of the crypto drivers which converted its interrupt handling to threaded interrupts being reverted because it caused a performance degredation. Moving code to latest APIs to simplify it is not always beneficial. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up
Re: [BISECT] ARM build errors on GCC v6.2 (crypto: arm/aes - replace scalar AES cipher)
On Sat, Jan 14, 2017 at 04:24:35PM +0200, Krzysztof Kozlowski wrote: > Hi, > > allyesconfig and multi_v7_defconfig fail to build on recent linux-next > on GCC 6.2.0. > > Errors: > ../arch/arm/crypto/aes-cipher-core.S: Assembler messages: > ../arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does not > support `tt .req ip' in ARM mode > ../arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- > `movw tt,#:lower16:crypto_ft_tab' > ../arch/arm/crypto/aes-cipher-core.S:174: Error: ARM register expected -- > `movt tt,#:upper16:crypto_ft_tab' > > Compiler: arm-linux-gnueabi-gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005 It's worth pointing out that _assembler_ messages come from _binutils_ rather than _gcc_. Please advise which version of _binutils_ you are using. Thanks. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias
On Mon, Jan 02, 2017 at 09:06:04PM +, Ard Biesheuvel wrote: > On 31 October 2016 at 16:13, Russell King - ARM Linux > <li...@armlinux.org.uk> wrote: > > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote: > >> On 18 October 2016 at 11:52, Ard Biesheuvel <ard.biesheu...@linaro.org> > >> wrote: > >> > Wire up the generic support for exposing CPU feature bits via the > >> > modalias in /sys/device/system/cpu. This allows udev to automatically > >> > load modules for things like crypto algorithms that are implemented > >> > using optional instructions. > >> > > >> > Signed-off-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > >> > --- > >> > arch/arm/Kconfig | 1 + > >> > arch/arm/include/asm/cpufeature.h | 32 > >> > 2 files changed, 33 insertions(+) > >> > > >> > >> Russell, > >> > >> do you have any concerns regarding this patch? If not, I will drop it > >> into the patch system. > > > > It's still something I need to look at... I've been offline last week, > > and sort-of offline the previous week, so I'm catching up. > > > > Hi Russell, > > Any thoughts yet? None, and the patch is well buried now that it'll take me a while to find... back in mid-October? Yea, I'll have to drop everything and go digging through my mailboxes to find it... and I'm just catching up (again) after a week and a bit's time offline - yep, it's wonderful timing. Sorry, no time to look at it right now, you're not the only one wanting my attention at the moment. Please try again in about a week's time - don't leave it a few months, and please include the patch. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/14] Revert "crypto: caam - get rid of tasklet"
Please include Thomas in this. On Wed, Nov 09, 2016 at 10:46:21AM +0200, Horia Geantă wrote: > This reverts commit 66d2e2028091a074aa1290d2eeda5ddb1a6c329c. > > Quoting from Russell's findings: > https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg21136.html > > [quote] > Okay, I've re-tested, using a different way of measuring, because using > openssl speed is impractical for off-loaded engines. I've decided to > use this way to measure the performance: > > dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5 > > For the threaded IRQs case gives: > > 0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k > 0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k > 0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k > => 5.36s => 25.0MB/s > > and the tasklet case: > > 0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k > 0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k > 0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k > => 4.95 => 27.1MB/s > > which corresponds to an 8% slowdown for the threaded IRQ case. So, > tasklets are indeed faster than threaded IRQs. > > [...] > > I think I've proven from the above that this patch needs to be reverted > due to the performance regression, and that there _is_ most definitely > a deterimental effect of switching from tasklets to threaded IRQs. > [/quote] > > Signed-off-by: Horia Geantă> --- > drivers/crypto/caam/intern.h | 1 + > drivers/crypto/caam/jr.c | 25 - > 2 files changed, 17 insertions(+), 9 deletions(-) > > diff --git a/drivers/crypto/caam/intern.h b/drivers/crypto/caam/intern.h > index 5d4c05074a5c..e2bcacc1a921 100644 > --- a/drivers/crypto/caam/intern.h > +++ b/drivers/crypto/caam/intern.h > @@ -41,6 +41,7 @@ struct caam_drv_private_jr { > struct device *dev; > int ridx; > struct caam_job_ring __iomem *rregs;/* JobR's register space */ > + struct tasklet_struct irqtask; > int irq;/* One per queue */ > > /* Number of scatterlist crypt transforms active on the JobR */ > diff --git a/drivers/crypto/caam/jr.c b/drivers/crypto/caam/jr.c > index 7331ea734f37..c8604dfadbf5 100644 > --- a/drivers/crypto/caam/jr.c > +++ b/drivers/crypto/caam/jr.c > @@ -73,6 +73,8 @@ static int caam_jr_shutdown(struct device *dev) > > ret = caam_reset_hw_jr(dev); > > + tasklet_kill(>irqtask); > + > /* Release interrupt */ > free_irq(jrp->irq, dev); > > @@ -128,7 +130,7 @@ static irqreturn_t caam_jr_interrupt(int irq, void > *st_dev) > > /* >* Check the output ring for ready responses, kick > - * the threaded irq if jobs done. > + * tasklet if jobs done. >*/ > irqstate = rd_reg32(>rregs->jrintstatus); > if (!irqstate) > @@ -150,13 +152,18 @@ static irqreturn_t caam_jr_interrupt(int irq, void > *st_dev) > /* Have valid interrupt at this point, just ACK and trigger */ > wr_reg32(>rregs->jrintstatus, irqstate); > > - return IRQ_WAKE_THREAD; > + preempt_disable(); > + tasklet_schedule(>irqtask); > + preempt_enable(); > + > + return IRQ_HANDLED; > } > > -static irqreturn_t caam_jr_threadirq(int irq, void *st_dev) > +/* Deferred service handler, run as interrupt-fired tasklet */ > +static void caam_jr_dequeue(unsigned long devarg) > { > int hw_idx, sw_idx, i, head, tail; > - struct device *dev = st_dev; > + struct device *dev = (struct device *)devarg; > struct caam_drv_private_jr *jrp = dev_get_drvdata(dev); > void (*usercall)(struct device *dev, u32 *desc, u32 status, void *arg); > u32 *userdesc, userstatus; > @@ -230,8 +237,6 @@ static irqreturn_t caam_jr_threadirq(int irq, void > *st_dev) > > /* reenable / unmask IRQs */ > clrsetbits_32(>rregs->rconfig_lo, JRCFG_IMSK, 0); > - > - return IRQ_HANDLED; > } > > /** > @@ -389,10 +394,11 @@ static int caam_jr_init(struct device *dev) > > jrp = dev_get_drvdata(dev); > > + tasklet_init(>irqtask, caam_jr_dequeue, (unsigned long)dev); > + > /* Connect job ring interrupt handler. */ > - error = request_threaded_irq(jrp->irq, caam_jr_interrupt, > - caam_jr_threadirq, IRQF_SHARED, > - dev_name(dev), dev); > + error = request_irq(jrp->irq, caam_jr_interrupt, IRQF_SHARED, > + dev_name(dev), dev); > if (error) { > dev_err(dev, "can't connect JobR %d interrupt (%d)\n", > jrp->ridx, jrp->irq); > @@ -454,6 +460,7 @@ static int caam_jr_init(struct device *dev) > out_free_irq: > free_irq(jrp->irq, dev); > out_kill_deq: > + tasklet_kill(>irqtask); > return error; > } > > -- >
Re: [PATCH 1/5] ARM: wire up HWCAP2 feature bits to the CPU modalias
On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote: > On 18 October 2016 at 11:52, Ard Biesheuvelwrote: > > Wire up the generic support for exposing CPU feature bits via the > > modalias in /sys/device/system/cpu. This allows udev to automatically > > load modules for things like crypto algorithms that are implemented > > using optional instructions. > > > > Signed-off-by: Ard Biesheuvel > > --- > > arch/arm/Kconfig | 1 + > > arch/arm/include/asm/cpufeature.h | 32 > > 2 files changed, 33 insertions(+) > > > > Russell, > > do you have any concerns regarding this patch? If not, I will drop it > into the patch system. It's still something I need to look at... I've been offline last week, and sort-of offline the previous week, so I'm catching up. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RESEND][PATCH] crypto: caam: add support for iMX6UL
On Mon, Oct 17, 2016 at 01:28:00PM +0200, Marcus Folkesson wrote: > i.MX6UL does only require three clocks to enable CAAM module. > > Signed-off-by: Marcus Folkesson> Acked-by: Rob Herring > Reviewed-by: Horia Geantă > --- > .../devicetree/bindings/crypto/fsl-sec4.txt| 20 + > drivers/crypto/caam/ctrl.c | 35 > -- > 2 files changed, 40 insertions(+), 15 deletions(-) > > diff --git a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt > b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt > index adeca34..10a425f 100644 > --- a/Documentation/devicetree/bindings/crypto/fsl-sec4.txt > +++ b/Documentation/devicetree/bindings/crypto/fsl-sec4.txt > @@ -123,6 +123,9 @@ PROPERTIES > > > EXAMPLE > + > +iMX6QDL/SX requires four clocks > + > crypto@30 { > compatible = "fsl,sec-v4.0"; > fsl,sec-era = <2>; > @@ -139,6 +142,23 @@ EXAMPLE > clock-names = "mem", "aclk", "ipg", "emi_slow"; > }; > > + > +iMX6UL does only require three clocks > + > + crypto: caam@214 { > + compatible = "fsl,sec-v4.0"; > + #address-cells = <1>; > + #size-cells = <1>; > + reg = <0x214 0x3c000>; > + ranges = <0 0x214 0x3c000>; > + interrupts = ; > + > + clocks = < IMX6UL_CLK_CAAM_MEM>, > + < IMX6UL_CLK_CAAM_ACLK>, > + < IMX6UL_CLK_CAAM_IPG>; > + clock-names = "mem", "aclk", "ipg"; > + }; > + > = > Job Ring (JR) Node > > diff --git a/drivers/crypto/caam/ctrl.c b/drivers/crypto/caam/ctrl.c > index 0ec112e..5abaf37 100644 > --- a/drivers/crypto/caam/ctrl.c > +++ b/drivers/crypto/caam/ctrl.c > @@ -329,8 +329,8 @@ static int caam_remove(struct platform_device *pdev) > clk_disable_unprepare(ctrlpriv->caam_ipg); > clk_disable_unprepare(ctrlpriv->caam_mem); > clk_disable_unprepare(ctrlpriv->caam_aclk); > - clk_disable_unprepare(ctrlpriv->caam_emi_slow); > - > + if (!of_machine_is_compatible("fsl,imx6ul")) > + clk_disable_unprepare(ctrlpriv->caam_emi_slow); There is no need to re-lookup the platform here. Can't you check the validity of ctrlpriv->caam_emi_slow ? > return 0; > } > > @@ -481,14 +481,16 @@ static int caam_probe(struct platform_device *pdev) > } > ctrlpriv->caam_aclk = clk; > > - clk = caam_drv_identify_clk(>dev, "emi_slow"); > - if (IS_ERR(clk)) { > - ret = PTR_ERR(clk); > - dev_err(>dev, > - "can't identify CAAM emi_slow clk: %d\n", ret); > - return ret; > + if (!of_machine_is_compatible("fsl,imx6ul")) { > + clk = caam_drv_identify_clk(>dev, "emi_slow"); > + if (IS_ERR(clk)) { > + ret = PTR_ERR(clk); > + dev_err(>dev, > + "can't identify CAAM emi_slow clk: %d\n", ret); > + return ret; > + } > + ctrlpriv->caam_emi_slow = clk; > } > - ctrlpriv->caam_emi_slow = clk; > > ret = clk_prepare_enable(ctrlpriv->caam_ipg); > if (ret < 0) { > @@ -509,11 +511,13 @@ static int caam_probe(struct platform_device *pdev) > goto disable_caam_mem; > } > > - ret = clk_prepare_enable(ctrlpriv->caam_emi_slow); > - if (ret < 0) { > - dev_err(>dev, "can't enable CAAM emi slow clock: %d\n", > - ret); > - goto disable_caam_aclk; > + if (!of_machine_is_compatible("fsl,imx6ul")) { Same here. > + ret = clk_prepare_enable(ctrlpriv->caam_emi_slow); > + if (ret < 0) { > + dev_err(>dev, "can't enable CAAM emi slow clock: > %d\n", > + ret); > + goto disable_caam_aclk; > + } > } > > /* Get configuration properties from device tree */ > @@ -829,7 +833,8 @@ caam_remove: > iounmap_ctrl: > iounmap(ctrl); > disable_caam_emi_slow: > - clk_disable_unprepare(ctrlpriv->caam_emi_slow); > + if (!of_machine_is_compatible("fsl,imx6ul")) > + clk_disable_unprepare(ctrlpriv->caam_emi_slow); and here. > disable_caam_aclk: > clk_disable_unprepare(ctrlpriv->caam_aclk); > disable_caam_mem: > -- > 2.8.0 > -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: caam from tasklet to threadirq
Okay, I've re-tested, using a different way of measuring, because using openssl speed is impractical for off-loaded engines. I've decided to use this way to measure the performance: dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5 For the threaded IRQs case gives: 0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k 0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k 0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k => 5.36s => 25.0MB/s and the tasklet case: 0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k 0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k 0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k => 4.95 => 27.1MB/s which corresponds to an 8% slowdown for the threaded IRQ case. So, tasklets are indeed faster than threaded IRQs. I guess the reason is that tasklets are much simpler, being able to run just before we return to userspace without involving scheduler overheads, but that's speculation. I've tried to perf it, but... Samples: 31K of event 'cycles', Event count (approx.): 3552246846 Overhead Command Shared Object Symbol + 33.22% kworker/0:1 [kernel.vmlinux] [k] __do_softirq + 15.78% irq/311-2101000 [kernel.vmlinux] [k] __do_softirq +7.49% irqbalance [kernel.vmlinux] [k] __do_softirq +7.26% openssl [kernel.vmlinux] [k] __do_softirq +5.71% ksoftirqd/0 [kernel.vmlinux] [k] __do_softirq +3.64% kworker/0:2 [kernel.vmlinux] [k] __do_softirq +3.52% swapper [kernel.vmlinux] [k] __do_softirq +3.14% kworker/0:1 [kernel.vmlinux] [k] _raw_spin_unlock_irq I was going to try to get the threaded IRQ case, but I've ended up with perf getting buggered because of the iMX6 SMP perf disfunctionality: [ 3448.810416] irq 24: nobody cared (try booting with the "irqpoll" option) ... [ 3448.824528] Disabling IRQ #24 caused by FSL's utterly brain-dead idea of routing all the perf interrupts to single non-CPU local interrupt input, and the refusal of kernel folk to find an acceptable solution to support this. So, sorry, I'm not going to bother trying to get any further with this. If the job was not made harder stupid hardware design and kernel politics, then I might be more inclined to do deeper investigation, but right now I'm finding that I'm not interested in trying to jump through these stupid hoops. I think I've proven from the above that this patch needs to be reverted due to the performance regression, and that there _is_ most definitely a deterimental effect of switching from tasklets to threaded IRQs. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: caam from tasklet to threadirq
On Fri, Sep 16, 2016 at 02:01:00PM +, Cata Vasile wrote: > Hi, > > We've tried to test and benchmark your submitted work[1]. > > Cryptographic offloading is also used in IPsec in the Linux Kernel. In > heavy traffic scenarios, the NIC driver competes with the crypto device > driver. Most NICs use the NAPI context, which is one of the most > prioritized context types. In IPsec scenarios the performance is > trashed because, although raw data gets in to device, the data is > encrypted/decrypted and the dequeue code in CAAM driver has a hard time > being scheduled to actually call the callback to notify the networking > stack it can continue working with that data. Having received a reply from Thomas Gleixner today, there appears to be some disagreement with your findings, and a suggestion that the problem needs proper and more in-depth investigation. Thomas indicates that the NAPI processing shows an improvement when moved to the same context that threaded interrupts run in, as opposed to the current softirq context - which also would run the tasklets. What I would say is that if threaded IRQs are causing harm, then there seems to be something very wrong somewhere. > Being this scenario, at heavy load, the Kernel warns on rcu stalls and > the forwarding path has a lot of latency. Have you tried benchmarking > the board you used for testing? It's way too long ago for me to remember - these patches were created almost a year ago - October 20th 2015, which is when I'd have tested them. So, I'm afraid I can't help very much at this point, apart from trying to re-run some benchmarks. I'd suggest testing the openssl (with AF_ALG support), which is probably what I tested and benchmarked. However, as I say, it's far too long ago for me to really remember at this point. > I have ran some on our other platforms. The after benchmark fails to > run at the top level of the before results. Sorry, that last sentence doesn't make any sense to me. I don't have the bandwidth to look at this, and IPsec doesn't interest me one bit - I've never been able to work out how to setup IPsec locally. From what I remember when I looked into it many years ago, you had to have significant information about ipsec to get it up and running. Maybe things have changed since then, I don't know. If you want me to reproduce it, please send me a step-by-step idiots guide on setting up a working test scenario which reproduces your problem. Thanks. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
AF_ALG zero-size hash fails
Hi, While testing AF_ALG with openssl af-alg-rr, I've found that: OPENSSL_CONF=/shared/crypto/openssl-imx.cnf openssl dgst -sha1 http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AF_ALG broken?
On Tue, Aug 09, 2016 at 03:14:02PM +0800, Herbert Xu wrote: > On Tue, Aug 09, 2016 at 08:08:59AM +0100, Russell King - ARM Linux wrote: > > > > I thought I gave the commands and link to your example code. The > > openssl case is md5, though sha* also gives the same result. Your > > example code was sha1 iirc. I guess none of these would be using > > HMAC - the openssl cases used to give results compatible with the > > md5sum/ sha1sum etc userspace commands. > > > > /proc/crypto: > > > > name : md5 > > driver : md5-caam > > Right, caam is providing a setkey function for md5, which leads the > API to think that a key is required. We should fix it so that setkey > is only set for the HMAC-variant. Thanks, that works nicely again, and passes my tests. 8< From: Russell King <rmk+ker...@armlinux.org.uk> Subject: [PATCH] crypto: caam - fix non-hmac hashes Since 6de62f15b581 ("crypto: algif_hash - Require setkey before accept(2)"), the AF_ALG interface requires userspace to provide a key to any algorithm that has a setkey method. However, the non-HMAC algorithms are not keyed, so setting a key is unnecessary. Fix this by removing the setkey method from the non-keyed hash algorithms. Fixes: 6de62f15b581 ("crypto: algif_hash - Require setkey before accept(2)") Cc: <sta...@vger.kernel.org> Signed-off-by: Russell King <rmk+ker...@armlinux.org.uk> --- drivers/crypto/caam/caamhash.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/caam/caamhash.c b/drivers/crypto/caam/caamhash.c index ea284e3909ef..9d7fc9ec0b7e 100644 --- a/drivers/crypto/caam/caamhash.c +++ b/drivers/crypto/caam/caamhash.c @@ -1950,6 +1950,7 @@ caam_hash_alloc(struct caam_hash_template *template, template->name); snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s", template->driver_name); + t_alg->ahash_alg.setkey = NULL; } alg->cra_module = THIS_MODULE; alg->cra_init = caam_hash_cra_init; -- 2.1.0 -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: AF_ALG broken?
On Mon, Aug 08, 2016 at 01:47:33PM -0400, Jeffrey Walton wrote: > > When trying to use the openssl AF_ALG module with 4.8-rc1 with imx > > caam, I get this: > > > > $ OPENSSL_CONF=/shared/crypto/openssl-imx.cnf strace openssl dgst -md5 > > > ... > > socket(PF_ALG, SOCK_SEQPACKET, 0) = 3 > > close(3)= 0 > > socket(PF_ALG, SOCK_SEQPACKET, 0) = 3 > > bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0"}, 88) = 0 > > accept(3, 0, NULL) = 4 > > fstat64(0, {st_mode=S_IFREG|0755, st_size=666864, ...}) = 0 > > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = > > 0xb6fab000 > > read(0, > > "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 8192) > > = 8192 > > send(4, > > "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\2\0(\0\1\0\0\0\21'\2\0004\0\0\0"..., 8192, > > MSG_MORE) = -1 ENOKEY (Required key not available) > > As far as I know from testing on x86, it has never worked as expected. > I believe you have to use 'sendto' and 'recvfrom' because 'send' and > 'recv' use default structures, and they configure the object > incorrectly. This used to work, because that's how I've tested my previous CAAM and Marvell CESA patches. So, this is not a case of "never worked" but is definitely a regression caused by some kernel change. There's also people's presentations that illustrate example code: https://events.linuxfoundation.org/sites/events/files/slides/lcj-2014-crypto-user.pdf which can also be found at: https://lwn.net/Articles/410833/ and that example code comes from Herbert, so one would assume that it's tested and was working. Note that it doesn't use any send*, but uses write(). Testing that code on 4.8-rc (and 4.7 fwiw) gives: socket(PF_ALG, SOCK_SEQPACKET, 0) = 3 bind(3, {sa_family=AF_ALG, sa_data="hash\0\0\0\0\0\0\0\0\0\0"}, 88) = 0 accept(3, 0, NULL) = 4 write(4, "abc", 3) = -1 ENOKEY (Required key not available) read(4, 0xbec50508, 20) = -1 ENOKEY (Required key not available) IOW, the same problem - and it seems not to be a recent regression. Since the last time I tested CESA or CAAM was back in 4.4 times, it's got to be something between 4.4 and 4.7. Looking at the history, my guess would be the setkey changes - crypto: algif_skcipher - Require setkey before accept(2) crypto: af_alg - Disallow bind/setkey/... after accept(2) crypto: af_alg - Add nokey compatibility path crypto: hash - Add crypto_ahash_has_setkey crypto: algif_hash - Require setkey before accept(2) -- Rmk's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/11] Further iMX CAAM updates
This is a re-post (with hopefully bugs fixed from December's review). Untested, because AF_ALG appears to be broken in 4.8-rc1. Maybe someone can provide some hints how to test using tcrypt please? Here are further imx-caam updates that I've had since before the previous merge window. Please review and (I guess) if Freescale folk can provide acks etc that would be nice. Thanks. drivers/crypto/caam/caamhash.c | 540 ++--- drivers/crypto/caam/intern.h | 1 - drivers/crypto/caam/jr.c | 25 +- 3 files changed, 305 insertions(+), 261 deletions(-) -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] scatterlist: add sg_alloc_table_from_buf() helper
On Thu, Mar 31, 2016 at 04:45:57PM +0200, Boris Brezillon wrote: > Hi Russell, > > On Thu, 31 Mar 2016 15:14:13 +0100 > Russell King - ARM Linux <li...@arm.linux.org.uk> wrote: > > > On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote: > > > sg_alloc_table_from_buf() provides an easy solution to create an sg_table > > > from a virtual address pointer. This function takes care of dealing with > > > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum > > > DMA transfer size). > > > > Please note that the DMA API does not take account of coherency of memory > > regions other than non-high/lowmem - there are specific extensions to > > deal with this. > > Ok, you said 'non-high/lowmem', this means vmalloced and kmapped buffers > already fall in this case, right? > > Could you tell me more about those specific extensions? I was slightly confused - the extensions I was thinking of are those listed at the bottom of Documentation/cachetlb.txt, which have nothing to do with DMA. However, it's probably worth reading Documentation/DMA-API-HOWTO.txt to read up on what kinds of memory are considered to be DMA-able in the kernel. > > What this means is that having an API that takes any virtual address > > pointer, converts it to a scatterlist which is then DMA mapped, is > > unsafe. > > Which means some implementations already get this wrong (see > spi_map_buf(), and I'm pretty sure it's not the only one). Quite possible, but that is driver stuff, and driver stuff gets things wrong all the time. :) > > It'll be okay for PIPT and non-aliasing VIPT cache architectures, but > > for other cache architectures this will hide this problem and make > > review harder. > > > > Ok, you lost me. I'll have to do my homework and try to understand what > this means :). P = physical address V = virtual address I = indexed T = tag The tag is held in each cache line. When a location is looked up in the cache, an index is used to locate a set of cache lines and the tag is compared to check which cache line in the set is the correct one (or whether the address even exists in the cache.) How the index and tag are derived varies between cache architectures. PIPT = indexed by physical address, tagged with physical address. Never aliases with itself in the presence of multiple virtual mappings. VIPT = indexed by virtual address, tagged with physical address. If the bits from the virtual address do not overlap the MMU page size, it is also alias free, otherwise aliases can exist, but can be eliminated by "cache colouring" - ensuring that a physical address is always mapped with the same overlapping bits. VIVT = indexed by virtual address, tagged with virtual address. The worst kind of cache, since every different mapping of the same physical address is guaranteed by design to alias with other mappings. There is little cache colouring between different kernel mappings (eg, between lowmem and vmalloc space.) What this means is that, while the DMA API takes care of DMA aliases in the lowmem mappings, an alias-prone VIPT cache will remain incoherent with DMA if it is remapped into vmalloc space, and the mapping happens to have a different cache colour. In other words, this is a data corruption issue. Hence, taking a range of vmalloc() addresses, converting them into a scatterlist, then using the DMA API on the scatterlist _only_ guarantees that the lowmem (and kmap'd highmem mappings) are coherent with DMA. There is no way for the DMA API to know that other mappings exist, and obviously flushing every possible cache line just because a mapping might exist multiplies the expense of the DMA API: not only in terms of time spent running through all the possibilities, which doubles for every aliasing bit of VIPT, but also TLB pressure since you'd have to create a mapping for each alias and tear it back down. VIVT is even worse, since there is no other virtual mapping which is coherent, would need to be known, and each mapping would need to be individually flushed. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] scatterlist: add sg_alloc_table_from_buf() helper
On Thu, Mar 31, 2016 at 02:29:42PM +0200, Boris Brezillon wrote: > sg_alloc_table_from_buf() provides an easy solution to create an sg_table > from a virtual address pointer. This function takes care of dealing with > vmallocated buffers, buffer alignment, or DMA engine limitations (maximum > DMA transfer size). Please note that the DMA API does not take account of coherency of memory regions other than non-high/lowmem - there are specific extensions to deal with this. What this means is that having an API that takes any virtual address pointer, converts it to a scatterlist which is then DMA mapped, is unsafe. It'll be okay for PIPT and non-aliasing VIPT cache architectures, but for other cache architectures this will hide this problem and make review harder. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] crypto: marvell/cesa - replace dma_to_phys with dma_map_single
On Thu, Mar 17, 2016 at 07:17:24PM -0400, ok...@codeaurora.org wrote: > What is the correct way? I don't want to write engine->sram_dma = sram Well, what the driver _is_ wanting to do is to go from a CPU physical address to a device DMA address. phys_to_dma() looks like the correct thing there to me, but I guess that's just an offset and doesn't take account of any IOMMU that may be in the way. If you have an IOMMU, then the whole phys_to_dma() thing is a total failure as it only does a linear translation, and there are no interfaces in the kernel to take account of an IOMMU in the way. So, it needs something designed for the job, implemented and discussed by the normal methods of proposing a new cross-arch interface for drivers to use. What I'm certain of, though, is that the change proposed in this patch will break current users of this driver: virt_to_page() on an address returned by ioremap() is completely undefined, and will result in either a kernel oops, or if not poking at memory which isn't a struct page, ultimately resulting in something that isn't SRAM being pointed to by "engine->sram_dma". -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] crypto: marvell/cesa - replace dma_to_phys with dma_map_single
On Thu, Mar 17, 2016 at 06:02:15PM -0400, Sinan Kaya wrote: > Getting ready to remove dma_to_phys API. Drivers should not be > using this API for DMA operations. Instead, they should go > through the dma_map or dma_alloc APIs. > > Signed-off-by: Sinan Kaya> --- > drivers/crypto/marvell/cesa.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c > index c0656e7..52ddfa4 100644 > --- a/drivers/crypto/marvell/cesa.c > +++ b/drivers/crypto/marvell/cesa.c > @@ -350,8 +350,8 @@ static int mv_cesa_get_sram(struct platform_device *pdev, > int idx) > if (IS_ERR(engine->sram)) > return PTR_ERR(engine->sram); > > - engine->sram_dma = phys_to_dma(cesa->dev, > -(phys_addr_t)res->start); > + engine->sram_dma = dma_map_single(cesa->dev, engine->sram, > + DMA_TO_DEVICE); Documentation/DMA-API.txt dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, size_t size, enum dma_data_direction direction) Notes: Not all memory regions in a machine can be mapped by this API. Further, contiguous kernel virtual space may not be contiguous as physical memory. Since this API does not provide any scatter/gather capability, it will fail if the user tries to map a non-physically contiguous piece of memory. For this reason, memory to be mapped by this API should be obtained from sources which guarantee it to be physically contiguous (like kmalloc). Specifically, ioremapped memory will *not* work as you expect with dma_map_single(). -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 06/11] crypto: caam: ensure that we clean up after an error
On Wed, Dec 09, 2015 at 05:08:41PM +0200, Horia Geantă wrote: > On 12/7/2015 9:12 PM, Russell King wrote: > > Ensure that we clean up allocations and DMA mappings after encountering > > an error rather than just giving up and leaking memory and resources. > > > > Signed-off-by: Russell King> > I guess the error cleanup code should be grouped under an "err" label, > instead of duplicating it. It'd be useful if you could quote the code you were commenting on please. I now need to dig out the patch from my git tree to work out the relevance of your comment is. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 07/11] crypto: caam: check and use dma_map_sg() return code
On Wed, Dec 09, 2015 at 05:20:45PM +0200, Horia Geantă wrote: > On 12/7/2015 9:12 PM, Russell King wrote: > > Strictly, dma_map_sg() may coalesce SG entries, but in practise on iMX > > hardware, this will never happen. However, dma_map_sg() can fail, and > > we completely fail to check its return value. So, fix this properly. > > > > Arrange the code to map the scatterlist early, so we know how many > > scatter table entries to allocate, and then fill them in. This allows > > us to keep relatively simple error cleanup paths. > > > > Signed-off-by: Russell King> > Some tcrypt tests fail - looks like those with zero plaintext: > caam_jr ffe301000.jr: unable to map source for DMA > alg: hash: digest failed on test 1 for sha1-caam: ret=12 > [...] > > Need to be careful, dma_map_sg() returning zero is an error only if ptxt > is not null (alternatively src_nents returned by sg_nents_for_len() > could be checked). > > > @@ -1091,7 +1102,7 @@ static int ahash_digest(struct ahash_request *req) > > u32 *sh_desc = ctx->sh_desc_digest, *desc; > > dma_addr_t ptr = ctx->sh_desc_digest_dma; > > int digestsize = crypto_ahash_digestsize(ahash); > > - int src_nents, sec4_sg_bytes; > > + int src_nents, mapped_nents, sec4_sg_bytes; > > dma_addr_t src_dma; > > struct ahash_edesc *edesc; > > int ret = 0; > > @@ -1099,9 +1110,14 @@ static int ahash_digest(struct ahash_request *req) > > int sh_len; > > > > src_nents = sg_nents_for_len(req->src, req->nbytes); > > - dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE); > > - if (src_nents > 1) > > - sec4_sg_bytes = src_nents * sizeof(struct sec4_sg_entry); > > + mapped_nents = dma_map_sg(jrdev, req->src, src_nents, DMA_TO_DEVICE); > > + if (mapped_nents == 0) { > > + dev_err(jrdev, "unable to map source for DMA\n"); > > + return -ENOMEM; > > + } > > This is at least one of the places where the error condition must change > to smth. like (src_nents != 0 && mapped_nents == 0). I guess we should avoid calling dma_map_sg() and dma_unmap_sg() when src_nents is zero. Thanks, I'll work that into the next patch revision. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/13] Further iMX CAAM updates
On Wed, Dec 09, 2015 at 05:06:03PM +0200, Horia Geantă wrote: > On 12/7/2015 9:11 PM, Russell King - ARM Linux wrote: > > Here are further imx-caam updates that I've had since before the > > previous merge window. Please review and (I guess) if Freescale > > folk can provide acks etc that would be nice. Thanks. > > Thanks Russell. > > Note that the patch set does not apply cleanly, please rebase on latest > cryptodev-2.6 tree. I'm afraid I'm not a fan of digging out random git trees, rebasing patches on top of them, and then not being able to test them locally without dragging the entire tree into my own tree. I'd rather wait another kernel cycle instead of polluting my own tree with commits from other trees. -- RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 00/13] Further iMX CAAM updates
Here are further imx-caam updates that I've had since before the previous merge window. Please review and (I guess) if Freescale folk can provide acks etc that would be nice. Thanks. drivers/crypto/caam/caamhash.c | 415 +++-- drivers/crypto/caam/intern.h | 1 - drivers/crypto/caam/jr.c | 25 +-- 3 files changed, 246 insertions(+), 195 deletions(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/18] crypto: marvell: add flag to determine algorithm endianness
On Mon, Oct 19, 2015 at 03:04:51PM +, Jason Cooper wrote: > Hey Russell, > > On Sun, Oct 18, 2015 at 05:23:40PM +0100, Russell King wrote: > > static int mv_cesa_ahash_init(struct ahash_request *req, > > - struct mv_cesa_op_ctx *tmpl) > > + struct mv_cesa_op_ctx *tmpl, bool algo_le) > > nit: Would it make more sense the do bool algo_endian, and then ... I don't like "bool"s that don't self-document what they mean. What does "if (algo_endian)" mean? It's pretty clear what "if (algo_le)" means in the context of endianness, though I'll give you that "if (algo_little_endian)" would be a lot better. > > > @@ -861,7 +862,7 @@ static int mv_cesa_md5_init(struct ahash_request *req) > > > > mv_cesa_set_op_cfg(, CESA_SA_DESC_CFG_MACM_MD5); > > > > - mv_cesa_ahash_init(req, ); > > + mv_cesa_ahash_init(req, , true); > > mv_cesa_ahash_init(req, , ALGO_ENDIAN_LITTLE); > > 'true' doesn't seem as obvious. But again, nit-picky. I did think about: enum { ALGO_LITTLE_ENDIAN, ALGO_BIG_ENDIAN, }; and passing an int algo_endian around, but that seemed to be like too much bloat... though if you want to insist, I could make that change. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/18] crypto: further fixes for Marvell CESA hash
Following on from the previous series, this series addresses further problems with the Marvell CESA hash driver found while testing it my openssl/ssh scenarios. The first patch improves one from the previous series: we can get the transform more directly using req->base.tfm rather than going round the houses. The next few patches rework the algorithm endianness conversions. There are two things which depend on the algorithm endianness - the format of the result, and the format of the bit length in the last block. We introduce a flag to convey this information, and keep the creq->state format in CPU endian mode for consistency. Some of the inconsistent hash results are down to the template operation not being properly initialised - so we zero initialise all template operations. The following patches (from "factor out first fragment decisions to helper") rework the digest handling to ensure that we always provide the hardware with a complete block of data to hash, otherwise it can be left mid-calculation, which then causes state to leak to subsequent operations. This requires a re-structure of the way we put together the DMA entries, so it's done in relatively small steps. This results in the CESA driver passing all tests I can throw at it via the AF_ALG openssl plugin with the exception of asking for the hash of /dev/null. This returns an all zero result, rather than the correct hash value. This bug is pending further diagnosis, but it is believed not to be a driver specific bug as iMX6 CAAM also behaves the same. Unfortunately, this is a large series, but the driver unfortunately needs this level of bug fixing to work properly. drivers/crypto/marvell/cesa.h | 11 +- drivers/crypto/marvell/hash.c | 307 +- 2 files changed, 164 insertions(+), 154 deletions(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/6] Fix CAAM hash driver
The following series fixes the CAAM hash driver, allowing it to work with the previously merged "crypto: ahash - ensure statesize is non- zero" patch. This is non-trivial, because CAAM exports a huge 1600 bytes of data, which, if we set .statesize to this, still results in the core code rejecting the driver. So, we need to shrink the amount of state exported. The first, most obvious one to get rid of is the export of the caam_hash_ctx structure, which is shared between the socket being exported from and imported to - copying it away and back again was a complete no-op. The second is that we don't need to export both pending-bytes buffers. Only one will be in use at any time. A problem was encountered while testing, where the size of the pending bytes buffer was not added to the scatterlist with the correct length. This is also fixed in this series, by patch 3. This bug was introduced by a prior commit trying to fix a tcrypt error. However, the change is wrong, but I have to question whether the test was correct or not - the backtrace contains a function "test_ahash_pnum" which doesn't seem to exist in mainline, nor does it seem to exist in any previous mainline kernel. Version 2 of this series addresses a mismerge in patch 5 of the original series, and adds further information to patch 3. Further testing with tcrypt showed up a problem identified by the DMA API debugging (different to the original one referred to in patch 3) where we leak DMA mappings. drivers/crypto/caam/caamhash.c | 101 + 1 file changed, 71 insertions(+), 30 deletions(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/6] Sparse related fixes
Continuing on from the previous set of 18 patches, I also fixed a number of sparse problems and other cleanups. I don't deem these suitable for -rc merging, especially now that we're basically at -rc6. The first patch switches the driver over to appropriately using the relaxed IO accessors - this avoids calling out to the heavy barrier on every read and write operation, but only calling out on those which really matter. We switch to using dma_addr_t for DMA addresses which are not accessed by hardware, and using gfp_t for the get_free_page flags. String-based MMIO accesses are used instead of plain memcpy()/memset() which prevents us potentially stumbling over GCC optimisations that it thinks it may make with these functions. We convert as much of the hardware state to __le32 endian markings, and use cpu_to_le32() as appropriate. A number of places are left unfixed, as we temporarily store CPU native endian values at these locations; these warnings should not be fixed (basically, only appropriate sparse warnings should be fixed without penalising code.) drivers/crypto/marvell/cesa.h | 44 - drivers/crypto/marvell/cipher.c | 13 ++-- drivers/crypto/marvell/hash.c | 23 +++-- drivers/crypto/marvell/tdma.c | 42 --- 4 files changed, 62 insertions(+), 60 deletions(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Sat, Oct 17, 2015 at 11:52:54AM +0100, Russell King - ARM Linux wrote: > Now, with that change, and with your change to buf_0/buf_1, I see > (before the import/export functions are used) several of these errors: > > caam_jr 2101000.jr0: 4501: DECO: desc idx 5: SGT Length Error. The > descriptor is trying to read more data than is contained in the SGT table. > > when checking using openssh. They don't occur when testing with openssl > performing normal hashes (as per my previously posted script) but only > with openssh. The hardware seems to be quite justified in complaining - I've added code to dump out the scatter list as well: jobdesc@879: e4bd7c18: b0861c08 3cd29080 f140 34bd7c38 ...<..@.8|.4 jobdesc@879: e4bd7c28: 0068 f840 3ccf3640 0028 h.@.@6.<(... sg@882: e4bd7c38: 3ccf3640 0028 sg@882: e4bd7c48: 34bd7300 0006 sg@882: e4bd7c58: 7528a070 4020 If I'm interpreting this correctly: f140 34bd7c38 0068 tells the CAAM to read 0x68 bytes using the scatterlist at 0x34bd7c38. The scatterlist here contains three entries, which have lengths 0x28, 0x6 and 0x20, which total up to 0x4e bytes, not 0x68. Now, looking at how the scatterlist is created, I have yet more questions. 1. What's this dma_map_sg_chained() and dma_unmap_sg_chained() stuff? What's wrong with just calling dma_map_sg() on the scatterlist, giving it the number of entries we want to map? dma_map_sg() is supposed to deal with chained scatterlists, if it doesn't, it's buggy. 2. src_map_to_sec4_sg() fails to check whether dma_map_sg_chained() failed. if it failed, we continue anyway, poking invalid addresses into the hardware scatterlist, thereby causing memory corruption. 3. __sg_count() - what's going on with this. This seems to assume that scatterlists aren't chained as: if (!sg_is_last(sg) && (sg + 1)->length == 0) sg + 1 may _not_ be sg_next(sg) - and this will walk off the end of the array. 4. Is the try_buf_map_to_sec4_sg() call correct? Shouldn't it be: @@ -838,7 +838,7 @@ static int ahash_update_ctx(struct ahash_request *req) state->buf_dma = try_buf_map_to_sec4_sg(jrdev, edesc->sec4_sg + 1, buf, state->buf_dma, - *next_buflen, *buflen); + *buflen, last_buflen); if (src_nents) { src_map_to_sec4_sg(jrdev, req->src, src_nents, Indeed, with this change, my ssh test works, and the DECO errors are gone. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote: > @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, > const void *in) > struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); > struct caam_hash_state *state = ahash_request_ctx(req); > > + /* Allocate new data buffers to use for this request */ > + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + I'm really not sure you can do this at all. What if the previous digest calculation prior to the accept() cloning the state was for a non-hash-block aligned size of data. The above will lose that state, and produce an incorrect hash result. Herbert, can you confirm please? -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state
On Fri, Oct 16, 2015 at 04:24:54PM -0700, Victoria Milhoan wrote: > On Thu, 15 Oct 2015 21:13:38 +0800 > Herbert Xu <herb...@gondor.apana.org.au> wrote: > > > On Thu, Oct 15, 2015 at 01:59:44PM +0100, Russell King - ARM Linux wrote: > > > > > > I think the CAAM driver is pretty unfixable from a trivial point of > > > view. This driver exports a huge amount of state - it contains both a > > > struct caam_hash_ctx and a struct caam_hash_state, which totals up to > > > 1600 bytes. This fails the: > > > > Right just dumping the state out as is not going to work. This > > is not supposed to be how export works anyway. But it doesn't > > look too bad as most of that 1600 is consumed by the hardware > > program descriptor which can easily be regenerated upon import. > > > > The only things that need to be exported AFAICS are key and buf_X. > > I just pushed out a patch for export/import functions in the CAAM driver. The > patch has been through testing with OpenSSL and the AF_ALG plugin on the MX6. Be careful with that. There's two ways to test: 1. Checking hash output. Preparation - copy openssl.cnf and add this to openssl.cnf: openssl_conf= openssl_def [openssl_def] engines = engine_section [engine_section] af_alg = af_alg_engine [af_alg_engine] CIPHERS=aes-128-cbc aes-192-cbc aes-256-cbc des-cbc des-ede3-cbc DIGESTS=md5 sha1 sha256 sha512 # Putting this last means we register the above as the default algorithms default_algorithms = ALL Then: #!/bin/sh for type in md5 sha1 sha256 sha512; do echo -n "Checking $type hash:" for file in /bin/*; do echo -n " $file" if ! OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," | ${type}sum -c > /dev/null; then echo " ... failed" echo -n "Openssl says: " >&2 OPENSSL_CONF=./openssl.cnf openssl dgst -$type < $file | sed "s,(stdin)= ,,;s,\$,\t$file," >&2 echo -n "${type}sum says: " >&2 ${type}sum $file >&2 exit 1 fi done echo " ... ok" done echo "All hashes passed" The above will verify that the hashes are producing the correct answers for a range of files. This does _not_ test the export/import paths. 2. Backup the existing openssl.cnf and replace it with the above modified version. Then try to ssh into the platform. This will verify the export/import side of things. If ssh fails to connect to the target, you know that the crypto drivers for SHA1 are broken, probably due to export/import. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote: > @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, > const void *in) > struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); > struct caam_hash_state *state = ahash_request_ctx(req); > > + /* Allocate new data buffers to use for this request */ > + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + > memcpy(ctx, in, sizeof(struct caam_hash_ctx)); > memcpy(state, in + sizeof(struct caam_hash_ctx), > sizeof(struct caam_hash_state)); This is buggy. You loose the reference to the two buffers you've just allocated because this memcpy() overwrites it. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote: > @@ -319,7 +319,7 @@ static int ahash_set_sh_desc(struct crypto_ahash *ahash) > have_key = OP_ALG_AAI_HMAC_PRECOMP; > > /* ahash_update shared descriptor */ > - desc = ctx->sh_desc_update; > + desc = kmalloc(DESC_HASH_MAX_USED_BYTES, GFP_KERNEL | GFP_DMA); What if kmalloc() fails? Should this really oops the kernel? Ditto for every other kmalloc you've added below. > @@ -1557,6 +1575,20 @@ static int ahash_export(struct ahash_request *req, > void *out) > struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); > struct caam_hash_state *state = ahash_request_ctx(req); > > + /* > + * Do not export the data buffers. New buffers are > + * allocated during import. > + */ > + kfree(state->buf_0); > + kfree(state->buf_1); > + state->buf_0 = NULL; > + state->buf_1 = NULL; > + > + state->current_buf = 0; > + state->buf_dma = 0; > + state->buflen_0 = 0; > + state->buflen_1 = 0; > + So what if I export, and then continue using _this_ context later? > @@ -1605,6 +1641,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = SHA1_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), Much prefer to have a 'struct caam_hash_export_state' thing rather than litter the code with the knowledge that these two go together. > }, > }, > .alg_type = OP_ALG_ALGSEL_SHA1, > @@ -1626,6 +1664,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = SHA224_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), > }, > }, > .alg_type = OP_ALG_ALGSEL_SHA224, > @@ -1647,6 +1687,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = SHA256_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), > }, > }, > .alg_type = OP_ALG_ALGSEL_SHA256, > @@ -1668,6 +1710,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = SHA384_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), > }, > }, > .alg_type = OP_ALG_ALGSEL_SHA384, > @@ -1689,6 +1733,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = SHA512_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), > }, > }, > .alg_type = OP_ALG_ALGSEL_SHA512, > @@ -1710,6 +1756,8 @@ static struct caam_hash_template driver_hash[] = { > .setkey = ahash_setkey, > .halg = { > .digestsize = MD5_DIGEST_SIZE, > + .statesize = sizeof(struct caam_hash_ctx) + > + sizeof(struct caam_hash_state), > }, > }, > .alg_type = OP_ALG_ALGSEL_MD5, > @@ -1796,6 +1844,12 @@ static void caam_hash_cra_exit(struct crypto_tfm *tfm) > dma_unmap_single(ctx->jrdev, ctx->sh_desc_finup_dma, >desc_bytes(ctx->sh_desc_finup), DMA_TO_DEVICE); > > + kfree(ctx->sh_desc_update); > + kfree(ctx->sh_desc_update_first); > + kfree(ctx->sh_desc_fin); > + kfree(ctx->sh_desc_digest); > + kfree(ctx->sh_desc_finup); > + What happens to these when ahash_import() overwrites all the context state? Doesn't this mean we end up with double-kfree()s ? Also, did you test this DMA debug enabled? -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Fri, Oct 16, 2015 at 04:19:33PM -0700, Victoria Milhoan wrote: > @@ -1557,6 +1575,20 @@ static int ahash_export(struct ahash_request *req, > void *out) > struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); > struct caam_hash_state *state = ahash_request_ctx(req); > > + /* > + * Do not export the data buffers. New buffers are > + * allocated during import. > + */ > + kfree(state->buf_0); > + kfree(state->buf_1); > + state->buf_0 = NULL; > + state->buf_1 = NULL; > + > + state->current_buf = 0; > + state->buf_dma = 0; > + state->buflen_0 = 0; > + state->buflen_1 = 0; > + > memcpy(out, ctx, sizeof(struct caam_hash_ctx)); > memcpy(out + sizeof(struct caam_hash_ctx), state, > sizeof(struct caam_hash_state)); > @@ -1569,6 +1601,10 @@ static int ahash_import(struct ahash_request *req, > const void *in) > struct caam_hash_ctx *ctx = crypto_ahash_ctx(ahash); > struct caam_hash_state *state = ahash_request_ctx(req); > > + /* Allocate new data buffers to use for this request */ > + state->buf_0 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + state->buf_1 = kmalloc(CAAM_MAX_HASH_BLOCK_SIZE, GFP_KERNEL | GFP_DMA); > + > memcpy(ctx, in, sizeof(struct caam_hash_ctx)); > memcpy(state, in + sizeof(struct caam_hash_ctx), > sizeof(struct caam_hash_state)); Why are we even saving and restoring the caam_hash_ctx here? That comes from: crypto_tfm_ctx(crypto_ahash_tfm(tfm)), which, tracing that through the inline functions comes out as: tfm->base.__crt_ctx tfm comes from: __crypto_ahash_cast(req->base.tfm), where req->base.tfm is a pointer to struct crypto_ahash's base member. When a hash is accept()ed, req->base.tfm is copied from the original socket handle to the new socket handle (see ahash_request_set_tfm(), called from hash_accept_parent(), which is in turn called from af_alg_accept() in hash_accept()). So both the exporting and importing hashes end up with the same struct caam_hash_ctx. We can see this if we add debug print: [156147.994219] ahash_export(ctx=ed087080,state=e2a6e600) [156147.994257] ahash_import(ctx=ed087080,state=e2a6d600) [156147.998256] ahash_export(ctx=ed083080,state=e2a6c600) [156147.998294] ahash_import(ctx=ed083080,state=edb9ee00) [156147.998659] ahash_export(ctx=ed087080,state=e2a6e600) [156147.998700] ahash_import(ctx=ed087080,state=e2a6fa00) [156147.999002] ahash_export(ctx=ed080880,state=e2a6d200) [156147.999040] ahash_import(ctx=ed080880,state=e2a6d600) Notice that the state changes between each export and import, but the context doesn't. So, I think we can (and should) completely drop the saving and restoring of context in these functions. Now, with that change, and with your change to buf_0/buf_1, I see (before the import/export functions are used) several of these errors: caam_jr 2101000.jr0: 4501: DECO: desc idx 5: SGT Length Error. The descriptor is trying to read more data than is contained in the SGT table. when checking using openssh. They don't occur when testing with openssl performing normal hashes (as per my previously posted script) but only with openssh. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: caam - Add support for hashing export and import functions
On Sat, Oct 17, 2015 at 06:57:44AM -0700, Victoria Milhoan wrote: > Correct - this was apparently the wrong patch I pushed out. The one I'm > actively using has this fixed (this is the only difference). I will make > this change in v2 after reviewing your other comments. Thanks Victoria, but please perform the tests I've suggested in one of my previous emails - merely testing with openssl's built-in tests aren't sufficient. Also, openssl is non-obvious with what it's testing, especially if you're using "openssl speed". I'm afraid it's a case of "you don't know what's being tested unless you've read the openssl source" :( However, while trying to debug the DECO stuff, I've stumbled across this: static inline void append_ptr(u32 *desc, dma_addr_t ptr) { dma_addr_t *offset = (dma_addr_t *)desc_end(desc); *offset = ptr; (*desc) += CAAM_PTR_SZ / CAAM_CMD_SZ; } where I think 'desc' points to memory used for the descriptor, which is read by hardware, correct? If that's true, why are we using dma_addr_t here? dma_addr_t can be either 32-bit or 64-bit depending on the kernel configuration, and I suspect if this driver is built with dma_addr_t set to 64-bit, things will go awry. Also, what endianness are the hardware descriptors? Are they always native CPU endian, or are they fixed as little endian? If they're always little endian, is there any reason not to use __le32 and cpu_to_le32()/ le32_to_cpu() as appropriate to ensure that it works on big endian systems? -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] Fix CAAM hash driver
The following series fixes the CAAM hash driver, allowing it to work with the previously merged "crypto: ahash - ensure statesize is non- zero" patch. This is non-trivial, because CAAM exports a huge 1600 bytes of data, which, if we set .statesize to this, still results in the core code rejecting the driver. So, we need to shrink the amount of state exported. The first, most obvious one to get rid of is the export of the caam_hash_ctx structure, which is shared between the socket being exported from and imported to - copying it away and back again was a complete no-op. The second is that we don't need to export both pending-bytes buffers. Only one will be in use at any time. A problem was encountered while testing, where the size of the pending bytes buffer was not added to the scatterlist with the correct length. This is also fixed in this series, by patch 3. drivers/crypto/caam/caamhash.c | 79 +++--- 1 file changed, 59 insertions(+), 20 deletions(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] Fix CAAM hash driver
On Sat, Oct 17, 2015 at 07:50:55PM +0100, Russell King - ARM Linux wrote: > The following series fixes the CAAM hash driver, allowing it to work > with the previously merged "crypto: ahash - ensure statesize is non- > zero" patch. > > This is non-trivial, because CAAM exports a huge 1600 bytes of data, > which, if we set .statesize to this, still results in the core code > rejecting the driver. So, we need to shrink the amount of state > exported. > > The first, most obvious one to get rid of is the export of the > caam_hash_ctx structure, which is shared between the socket being > exported from and imported to - copying it away and back again was > a complete no-op. > > The second is that we don't need to export both pending-bytes buffers. > Only one will be in use at any time. > > A problem was encountered while testing, where the size of the pending > bytes buffer was not added to the scatterlist with the correct length. > This is also fixed in this series, by patch 3. > > drivers/crypto/caam/caamhash.c | 79 > +++--- > 1 file changed, 59 insertions(+), 20 deletions(-) This series has been tested using openssl to validate the hashes against md5sum, sha1sum, sha224sum, sha256sum and sha512sum programs of every program in my /bin directory, which checks correct functioning of the hashing. It has also been tested with openssh, where the crypto hardware is used via openssl by sshd, which makes use of accept() and hence tests the export/import paths. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state
On Thu, Oct 15, 2015 at 05:41:47PM +0800, Herbert Xu wrote: > On Thu, Oct 15, 2015 at 10:39:30AM +0100, Russell King - ARM Linux wrote: > > > > The CAAM driver is similarly buggy - it has export/import functions in > > its ahash drivers, but zero statesize. > > > > User exploitable kernel stack smashing... I'd suggest putting this patch > > into stable kernels as high priority, as I'm pretty sure this could be > > I agree. It should already be on its way to stable as Linus has > pulled it into his tree and it carries a stable cc. Thanks. I think the CAAM driver is pretty unfixable from a trivial point of view. This driver exports a huge amount of state - it contains both a struct caam_hash_ctx and a struct caam_hash_state, which totals up to 1600 bytes. This fails the: alg->halg.statesize > PAGE_SIZE / 8 in ahash_prepare_alg() if we set .statesize. For ARM, this places a limit of 512 bytes on the state size. The CAAM authors need to come up with a better solution (and quickly, as caamhash is going to fail in all kernels soon), or we need to support larger exported states. BTW, I can't find a MAINTAINERS entry for CAAM, so I've just grabbed a couple of addresses from recent git history in the hope they'll know who's responsible. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state
On Tue, Oct 13, 2015 at 10:33:12PM +0800, Herbert Xu wrote: > On Fri, Oct 09, 2015 at 08:43:33PM +0100, Russell King wrote: > > If the algorithm passed a zero statesize, do not pass a valid pointer > > into the export/import functions. Passing a valid pointer covers up > > bugs in driver code which then go on to smash the kernel stack. > > Instead, pass NULL, which will cause any attempt to write to the > > pointer to fail. > > > > Signed-off-by: Russell King> > Patch applied without the shash hunk. I also replaced your commit > message as it no longer makes any sense: > > crypto: ahash - ensure statesize is non-zero > > Unlike shash algorithms, ahash drivers must implement export > and import as their descriptors may contain hardware state and > cannot be exported as is. Unfortunately some ahash drivers did > not provide them and end up causing crashes with algif_hash. > > This patch adds a check to prevent these drivers from registering > ahash algorithms until they are fixed. > > Thanks, There will be fallout from this. The CAAM driver is similarly buggy - it has export/import functions in its ahash drivers, but zero statesize. User exploitable kernel stack smashing... I'd suggest putting this patch into stable kernels as high priority, as I'm pretty sure this could be used to gain privileges via carefully crafted md5 hashes. I've not proven it, but given that the md5 hash and state data get copied over the kernel stack, it's highly likely that it _is_ exploitable from any user that can create an AF_ALG socket. Yes, it means regressions in the form of various hw crypto no longer being loadable, but I think that's preferable to the security hole here. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions
On Tue, Oct 13, 2015 at 09:00:48PM +0800, Herbert Xu wrote: > On Sun, Oct 11, 2015 at 02:55:05PM +0800, Herbert Xu wrote: > > On Sat, Oct 10, 2015 at 05:17:54PM +0100, Russell King - ARM Linux wrote: > > > > > > I _also_ notice that despite having the ARM assembly crypto functions > > > enabled and built-in to the kernel, they don't appear in /proc/crypto - > > > and this is because of patch 1 in this series, which blocks out any > > > crypto driver which has a zero statesize (as the ARM crypto functions > > > appear to have.) I found this by rebuilding the ARM crypto stuff as > > > modules, and then trying to insert them: > > > > They're buggy and unfortunately this wasn't caught during the > > review process. The import/export functions are required and > > certainly not optional. > > OK it looks like I was confused. Yes the import and export functions > are required but for shash algorithms we provide a simple default. > This means that implementations can simply leave it NULL and they > will then use the default import/export functions which exports > the shash_desc as the state. > > So Russell what I'll do is apply your patch without the hunk > for shash_prepare_alg. IOW we will block any ahash algorithms > that leave state as zero since ahash does not provide a default > import/export function (it can't because it may involve hardware > state). shash algorithms on the other hand won't be affected. Okay. I've a larger queue of patches building at the moment for the Marvell driver - I've found that it's touch-and-go whether it produces the correct hash result, and previous hashes can affect subsequent hashes in some circumstances. I'm currently at another 7 patches, plus a "big" as-yet uncommitted patch which is virtually a rewrite of the DMA preparation - diffstat wise, in total it's looking like: drivers/crypto/marvell/cesa.h | 8 +- drivers/crypto/marvell/hash.c | 258 ++ drivers/crypto/marvell/tdma.c | 7 ++ 3 files changed, 151 insertions(+), 122 deletions(-) and there's still a few more cases that need solving: zero bytes of input should not give a zero hash, and an input which is the multiple of the blocksize causes the hardware to hang - though I'm not yet sure whether that's a result of my above changes. At least with my above changes, I now get stable and correct hash values for sizes which do not hit any of those remaining bugs. Given it's size so far, I'm not sure it makes much sense to backport any of these fixes to stable; I think maybe we should just say "okay, before these fixes, it's just broken." -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions
On Tue, Oct 13, 2015 at 09:57:22PM +0800, Herbert Xu wrote: > On Tue, Oct 13, 2015 at 02:55:18PM +0100, Russell King - ARM Linux wrote: > > > > Given it's size so far, I'm not sure it makes much sense to backport > > any of these fixes to stable; I think maybe we should just say "okay, > > before these fixes, it's just broken." > > OK as long as your first patch goes into stable it should be fine > as it'll essentially disable the marvell driver. It will disable the Marvell driver, along with the SW implementations in arch/arm/crypto/ which don't set .statesize, .import and .export. I would not be surprised if it also affects others too. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
hash import/export
Herbert, I wonder if you can clear something up about the hash export/import functionality. In: http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/344120.html you seem to imply that the exported and imported state can't be defined by the driver. Boris tells me, "AFAIR, crypto users are expecting md5/sha1/sha256_state depending on the crypto req state they are exporting". >From what I can see, there is only one place in the core crypto code where hash state is exported and imported, and that's in hash_accept(), and that's always done with the same driver. The only other place is in the marvell cesa driver itself when initialising the hmac state. Is there any reason a driver can't define its own structure to be exported here which can be shared between each of the different methods it supports? Thanks. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: hash import/export
On Sun, Oct 11, 2015 at 08:34:27PM +0100, Russell King - ARM Linux wrote: > Herbert, > > I wonder if you can clear something up about the hash export/import > functionality. In: > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/344120.html > > you seem to imply that the exported and imported state can't be defined > by the driver. > > Boris tells me, "AFAIR, crypto users are expecting md5/sha1/sha256_state > depending on the crypto req state they are exporting". > > From what I can see, there is only one place in the core crypto code > where hash state is exported and imported, and that's in hash_accept(), > and that's always done with the same driver. The only other place is > in the marvell cesa driver itself when initialising the hmac state. > > Is there any reason a driver can't define its own structure to be > exported here which can be shared between each of the different methods > it supports? A further question: it looks like struct md5_state's hash member is always supposed to be in little endian format, whereas struct sha*_state's state is always big endian format. Is there a reason why these are typed 'u32', rather than __le32 and __be32? This would make the intention here more obvious, and should also allow the opportunity for sparse to verify that we're getting the endianness correct everywhere. Thanks. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/5] crypto: ensure algif_hash does not pass a zero-sized state
On Sat, Oct 10, 2015 at 06:46:07PM +0200, Boris Brezillon wrote: > On Fri, 09 Oct 2015 20:43:33 +0100 > Russell Kingwrote: > > > If the algorithm passed a zero statesize, do not pass a valid pointer > > into the export/import functions. Passing a valid pointer covers up > > bugs in driver code which then go on to smash the kernel stack. > > Instead, pass NULL, which will cause any attempt to write to the > > pointer to fail. > > > > Signed-off-by: Russell King > > --- > > crypto/ahash.c | 3 ++- > > crypto/shash.c | 3 ++- > > 2 files changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/crypto/ahash.c b/crypto/ahash.c > > index 8acb886032ae..9c1dc8d6106a 100644 > > --- a/crypto/ahash.c > > +++ b/crypto/ahash.c > > @@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg) > > struct crypto_alg *base = >halg.base; > > > > if (alg->halg.digestsize > PAGE_SIZE / 8 || > > - alg->halg.statesize > PAGE_SIZE / 8) > > + alg->halg.statesize > PAGE_SIZE / 8 || > > + alg->halg.statesize == 0) > > Just read Russel's answer to the cover letter, and I wonder if the > following test wouldn't fix the problem: > > (alg->halg.statesize == 0 && (alg->import || alg->export)) > > I mean, the only valid case where statesize can be zero is when you > don't have any state associated to the crypto algorithm, and if that's > the case, ->import() and ->export() functions are useless, isn't ? However, that brings up a question. If you're using AF_ALG, and you attach to (say) the ARM Neon SHA512 implementation through it, and then use accept() to duplicate it's state, what prevents the kernel from oopsing when hash_accept() calls crypto_ahash_export(), which then dereferences the NULL alg->export function pointer? -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions
On Sat, Oct 10, 2015 at 12:29:25PM +0100, Russell King - ARM Linux wrote: > On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote: > > Russell King - ARM Linux <li...@arm.linux.org.uk> writes: > > > Software: > > > The 'numbers' are in 1000s of bytes per second processed. > > > type 16 bytes 64 bytes256 bytes 1024 bytes 8192 > > > bytes > > > md5 13948.89k42477.61k 104619.41k 165140.82k > > > 199273.13k > > > sha1 13091.91k36463.89k75393.88k 103893.33k > > > 117104.50k > > > sha256 13573.92k30492.25k52700.33k64247.81k > > > 68722.69k > > > > > > Hardware: > > > The 'numbers' are in 1000s of bytes per second processed. > > > type 16 bytes 64 bytes256 bytes 1024 bytes 8192 > > > bytes > > > md5 3964.55k13782.11k43181.71k 180263.38k > > > 1446616.18k > > > sha1 4609.16k 8922.35k35422.87k 333575.31k > > > 2122547.20k > > > sha256 13519.62k30484.10k52547.47k64285.21k > > > 68530.60k > > Okay, the reason for the difference in SHA256 speed is because the > "openssl speed" code *totally* *bypasses* the engine support, whereas > the md5 and sha1 do not. It even bypasses the normal method used to > get hold of the sha256 implementation (EVP_sha256), and goes straight > to using SHA256() directly in openssl/crypto/sha/sha256.c. It looks > like the same goes for the AES tests too. Okay, what's required is openssl speed -evp <cipher|digest> and it can only do one at a time. That's really confusing and non-intuitive, given that some of the non-evp options end up testing the EVP methods. So, running: $ openssl speed -evp aes-128-cbc I get the kernel spitting out a number of these warnings during its run: DRBG: could not allocate digest TFM handle: hmac(sha512) I also notice is that 50% of CPU time is spent in the kernel, presumably because the lack of hardware sha512 means that we end up having to do sha512 in software in the kernel. I _also_ notice that despite having the ARM assembly crypto functions enabled and built-in to the kernel, they don't appear in /proc/crypto - and this is because of patch 1 in this series, which blocks out any crypto driver which has a zero statesize (as the ARM crypto functions appear to have.) I found this by rebuilding the ARM crypto stuff as modules, and then trying to insert them: # modprobe sha512-arm modprobe: ERROR: could not insert 'sha512_arm': Invalid argument So, I think it's best if this patch series is *NOT* merged until someone who knows the kernel crypto code gets to grips with what's supposed to be done in various parts of the code. Yes, I know that Herbert suggested the approach in patch 1, but that _will_ cause regressions like the above when if it's merged. Either the ARM crypto code in arch/arm/crypto is buggy, or the approach in patch 1 is buggy. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions
On Sat, Oct 10, 2015 at 12:31:29PM +0200, Arnaud Ebalard wrote: > Hi Russel, ^ > Russell King - ARM Linux <li...@arm.linux.org.uk> writes: > > Software: > > The 'numbers' are in 1000s of bytes per second processed. > > type 16 bytes 64 bytes256 bytes 1024 bytes 8192 > > bytes > > md5 13948.89k42477.61k 104619.41k 165140.82k > > 199273.13k > > sha1 13091.91k36463.89k75393.88k 103893.33k > > 117104.50k > > sha256 13573.92k30492.25k52700.33k64247.81k > > 68722.69k > > > > Hardware: > > The 'numbers' are in 1000s of bytes per second processed. > > type 16 bytes 64 bytes256 bytes 1024 bytes 8192 > > bytes > > md5 3964.55k13782.11k43181.71k 180263.38k > > 1446616.18k > > sha1 4609.16k 8922.35k35422.87k 333575.31k > > 2122547.20k > > sha256 13519.62k30484.10k52547.47k64285.21k > > 68530.60k Okay, the reason for the difference in SHA256 speed is because the "openssl speed" code *totally* *bypasses* the engine support, whereas the md5 and sha1 do not. It even bypasses the normal method used to get hold of the sha256 implementation (EVP_sha256), and goes straight to using SHA256() directly in openssl/crypto/sha/sha256.c. It looks like the same goes for the AES tests too. > I had a lot of performance results at various levels (tcrypt module on > variations of the drivers (tasklet, threaded irq, full polling, etc), > IPsec tunnel and transport mode through to see how it behaves w/ two > mvneta instances also eating CPU cycles for incoming/outgoing packets) > but those where done on an encryption use case. Some are provided > in [2]. In an early (read dirty) polling-based version of the driver, > the CESA on an Armada 370 (mirabox) was verified to be capable of near > 100MB/s on buffers of 1500+ bytes for AES CBC encryption. Current > version of the driver is not as good (say half that value) but it > behaves better. A Mirabox can easily route 1500 bytes packets at 100MB/s > between its two interfaces but when you mix both using IPsec in tunnel > mode on one side, you end up w/ perfs between 10 to 15MB/s, IIRC. I > think it's interesting to see where it ends up w/ the engine exposed to > userland consumers (e.g. sth like SSH). > > I cannot promise a huge amount of time but I'll try and find some to > play w/ AF_ALG using openssl and CESA in the coming weeks. I think what we draw from my investigation is that "openssl speed" is utterly crap - you don't actually know what's being tested there. Some things test the engine, others bypass the engine infrastructure totally and test the openssl software implementation instead. So, if you think "openssl speed" is a good way to measure the speed of digests and ciphers that openssl supplies to applications, *think again*. It doesn't. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3b 5/5] crypto: marvell: factor out common import/export functions
On Sat, Oct 10, 2015 at 12:37:33AM +0200, Arnaud Ebalard wrote: > Hi Russel, > > Russell Kingwrites: > > > As all the import functions and export functions are virtually > > identical, factor out their common parts into a generic > > mv_cesa_ahash_import() and mv_cesa_ahash_export() respectively. This > > performs the actual import or export, and we pass the data pointers and > > length into these functions. > > > > We have to switch a % const operation to do_div() in the common import > > function to avoid provoking gcc to use the expensive 64-bit by 64-bit > > modulus operation. > > > > Signed-off-by: Russell King > > Thanks for the refactoring and for the fixes. All patches look good to > me. Out of curiosity, can I ask what perf you get w/ openssh or openssl > using AF_ALG and the CESA? I would do, but it seems this AF_ALG plugin for openssl isn't actually using it for encryption. When I try: openssl speed -engine af_alg aes-128-cbc I get results for using openssl's software implementation. If I do: openssl speed -engine af_alg md5 then I get results from using the kernel's MD5. Hence, I think the only thing that I think openssh is using it for is the digest stuff, not the crypto itself. I can't be certain about that though. I've tried debugging the af_alg engine plugin, but I'm not getting very far (I'm not an openssl hacker!) I see it registering the function to get the ciphers (via ENGINE_set_ciphers), and I see this called several times, returning a list of NID_xxx values describing the methods it supports, which includes aes-128-cbc. However, unlike the equivalent digest function, I never see it called requesting any of the ciphers. Maybe it's an openssl bug, or a "feature" preventing hardware crypto? Maybe something is missing from its initialisation? I've no idea yet. It seems I'm not alone in this - this report from April 2015 is exactly what I'm seeing: https://mta.openssl.org/pipermail/openssl-users/2015-April/001124.html However, I'm coming to the conclusion that AF_ALG with openssl is a dead project, and the only interface that everyone is using for that is cryptodev - probably contary to Herbert and/or DaveM's wishes. For example, the openwrt guys seem to only support cryptodev, according to their wiki page on the subject of hardware crypto: http://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators Here's the references to code for AF_ALG with openssl I've found so far: Original af_alg plugin (dead): http://src.carnivore.it/users/common/af_alg/ 3rd party "maintained" af_alg openssl plugin, derived from commit 1851bbb010c38878c83729be844f168192059189 in the above repo but with no history: https://github.com/RidgeRun/af-alg-rr and that doesn't contain any changes to the C code originally committed. Whether this C code contains changes or not is anyone's guess: there's no way to refer back to the original repository. Anyway, here's the digest results: Software: The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes256 bytes 1024 bytes 8192 bytes md5 13948.89k42477.61k 104619.41k 165140.82k 199273.13k sha1 13091.91k36463.89k75393.88k 103893.33k 117104.50k sha256 13573.92k30492.25k52700.33k64247.81k68722.69k Hardware: The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes256 bytes 1024 bytes 8192 bytes md5 3964.55k13782.11k43181.71k 180263.38k 1446616.18k sha1 4609.16k 8922.35k35422.87k 333575.31k 2122547.20k sha256 13519.62k30484.10k52547.47k64285.21k68530.60k There's actually something suspicious while running these tests: Doing md5 for 3s on 16 size blocks: 32212 md5's in 0.13s Doing md5 for 3s on 64 size blocks: 23688 md5's in 0.11s Doing md5 for 3s on 256 size blocks: 23615 md5's in 0.14s Doing md5 for 3s on 1024 size blocks: 22885 md5's in 0.13s Doing md5 for 3s on 8192 size blocks: 15893 md5's in 0.09s Doing sha1 for 3s on 16 size blocks: 31688 sha1's in 0.11s Doing sha1 for 3s on 64 size blocks: 23700 sha1's in 0.17s Doing sha1 for 3s on 256 size blocks: 23523 sha1's in 0.17s Doing sha1 for 3s on 1024 size blocks: 22803 sha1's in 0.07s Doing sha1 for 3s on 8192 size blocks: 15546 sha1's in 0.06s Doing sha256 for 3s on 16 size blocks: 2518030 sha256's in 2.98s Doing sha256 for 3s on 64 size blocks: 1419416 sha256's in 2.98s Doing sha256 for 3s on 256 size blocks: 613738 sha256's in 2.99s Doing sha256 for 3s on 1024 size blocks: 187080 sha256's in 2.98s Doing sha256 for 3s on 8192 size blocks: 25013 sha256's in 2.99s from the hardware - note the "in" figures are rediculously low, yet they do wait 3s for each test. Also, the sha256 results are close enough to being the software version. No ideas on any of this yet...
[PATCH 0/3] crypto: fixes for Marvell hash
This small series of patches addresses oopses seen when trying to use the AF_ALG interface via openssl with openssh. This series does not address all problems, but merely stops the kernel from smashing its kernel stack and oopsing. With these fixes in place, the kernel no longer oopses. However, with the digests enabled in openssl, openssh refuses to work, producing the following when attempting to connect to the target system: Corrupted MAC on input. Disconnecting: Packet corrupt It's been hard enough to get this far; the crypto code is not the easiest code to debug for a new-comer due to the amount of state needed to be retained to understand the code (all the inline functions masking multiple levels of containerisation and pointer dereference does not make it easy to track what is stored where, and once I've been through one bit of code, I find I'm having to revisit the same piece of code a bit later to re-understand what it's doing.) It's been difficult enough to find the engine plugin for openssl - the original git repo which hosted it is now dead (http://src.carnivore.it/users/common/af_alg/). All that seems to be left is someone's modified version on github, which seems to get some maintanence. Debian doesn't seem to carry AF_ALG openssl support, and seems to only carry one package (strongswan) which supports this interface. Hence, I'm leaving further debugging to other parties, especially as the userspace tooling for the AF_ALG seems rather lacking. (Are there any test programs, if so, can their location be documented and placed in Documentation/crypto please?) I'm not sure who the maintainer for drivers/crypto/marvell is, so I've picked Thomas. It would be nice if there was an entry in MAINTAINERS for this driver. The first patch in this series avoids kernel stack smashing if a crypto driver forgets to set the 'statesize' member, but writes to what seems to be a valid pointer passed to its export function. Of course, this won't completely stop stack smashing if the statesize member is smaller than the data which the export function writes. This patch is optional. The second patch adds the necessary statesize members to the Marvell code which were previously missing. Fixing this uncovered a further problem, which the third patch addresses. crypto/algif_hash.c | 6 +- drivers/crypto/marvell/hash.c | 9 + 2 files changed, 14 insertions(+), 1 deletion(-) -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/3] crypto: fixes for Marvell hash
Version 2, same as version 1 but with a different patch 1, thanks to Herbert for an alternative approach on that one. crypto/ahash.c| 3 ++- crypto/shash.c| 3 ++- drivers/crypto/marvell/hash.c | 9 + 3 files changed, 13 insertions(+), 2 deletions(-) On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote: > This small series of patches addresses oopses seen when trying to use > the AF_ALG interface via openssl with openssh. This series does not > address all problems, but merely stops the kernel from smashing its > kernel stack and oopsing. > > With these fixes in place, the kernel no longer oopses. However, with > the digests enabled in openssl, openssh refuses to work, producing the > following when attempting to connect to the target system: > > Corrupted MAC on input. > Disconnecting: Packet corrupt > > It's been hard enough to get this far; the crypto code is not the easiest > code to debug for a new-comer due to the amount of state needed to be > retained to understand the code (all the inline functions masking > multiple levels of containerisation and pointer dereference does not > make it easy to track what is stored where, and once I've been through > one bit of code, I find I'm having to revisit the same piece of code a > bit later to re-understand what it's doing.) > > It's been difficult enough to find the engine plugin for openssl - the > original git repo which hosted it is now dead > (http://src.carnivore.it/users/common/af_alg/). All that seems to be > left is someone's modified version on github, which seems to get some > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and > seems to only carry one package (strongswan) which supports this > interface. > > Hence, I'm leaving further debugging to other parties, especially as > the userspace tooling for the AF_ALG seems rather lacking. (Are there > any test programs, if so, can their location be documented and placed > in Documentation/crypto please?) > > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've > picked Thomas. It would be nice if there was an entry in MAINTAINERS > for this driver. > > The first patch in this series avoids kernel stack smashing if a crypto > driver forgets to set the 'statesize' member, but writes to what seems > to be a valid pointer passed to its export function. Of course, this > won't completely stop stack smashing if the statesize member is > smaller than the data which the export function writes. This patch is > optional. > > The second patch adds the necessary statesize members to the Marvell > code which were previously missing. Fixing this uncovered a further > problem, which the third patch addresses. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] crypto: fixes for Marvell hash
On Fri, Oct 09, 2015 at 02:12:43PM +0200, Thomas Petazzoni wrote: > Hello Russell, > > On Fri, 9 Oct 2015 11:29:05 +0100, Russell King - ARM Linux wrote: > > This small series of patches addresses oopses seen when trying to use > > the AF_ALG interface via openssl with openssh. This series does not > > address all problems, but merely stops the kernel from smashing its > > kernel stack and oopsing. > > Thanks for debugging this and for the patches. However, could you Cc > the main authors of the driver: > > Boris Brezillon <boris.brezil...@free-electrons.com> > Arnaud Ebalard <a...@natisbad.org> Hmm. You know, given how broken this is, I've a good mind to say that this is more hassle than it's worth to get these simple fixes merged, especially as there's no MAINTAINERS entry for them. I'm not sending the patch series a third time. I could say if maintainers can't be bothered to ensure that they're listed in MAINTAINERS then they aren't maintainers at all. If this means it won't be merged, so be it, I don't care enough given that there's very little userspace support for kernel hw crypto. > > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've > > picked Thomas. It would be nice if there was an entry in MAINTAINERS > > for this driver. > > Indeed. However get_maintainer.pl gets this right: Try with --no-git, which is the only sane mode for using get_maintainer.pl. Picking up everyone who's ever touched a driver with as little as a spelling fix in a git commit is abhorrent. > $ ./scripts/get_maintainer.pl -f drivers/crypto/marvell/cesa.c > Herbert Xu <herb...@gondor.apana.org.au> (maintainer:CRYPTO > API,commit_signer:11/12=92%) > "David S. Miller" <da...@davemloft.net> (maintainer:CRYPTO API) > Boris BREZILLON <boris.brezil...@free-electrons.com> > (commit_signer:11/12=92%,authored:6/12=50%,added_lines:538/558=96%,removed_lines:7/12=58%) > Arnaud Ebalard <a...@natisbad.org> > (commit_signer:7/12=58%,authored:4/12=33%,removed_lines:1/12=8%) > Vladimir Zapolskiy <vladimir_zapols...@mentor.com> > (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:3/12=25%) > Krzysztof Kozlowski <k.kozlow...@samsung.com> > (commit_signer:1/12=8%,authored:1/12=8%,removed_lines:1/12=8%) > > Thomas > -- > Thomas Petazzoni, CTO, Free Electrons > Embedded Linux, Kernel and Android engineering > http://free-electrons.com -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] crypto: fixes for Marvell hash
Thomas says you're the maintainers of the marvell crypto driver. I picked sending these patches to Thomas because I couldn't work out who should be receiving them. If you wish to be copied on patches, please put yourselves in the MAINTAINERS file. In the mean time, you can find the patches on the crypto list if you wish to review them. Thanks. On Fri, Oct 09, 2015 at 11:46:37AM +0100, Russell King - ARM Linux wrote: > Version 2, same as version 1 but with a different patch 1, thanks to > Herbert for an alternative approach on that one. > > crypto/ahash.c| 3 ++- > crypto/shash.c| 3 ++- > drivers/crypto/marvell/hash.c | 9 + > 3 files changed, 13 insertions(+), 2 deletions(-) > > On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote: > > This small series of patches addresses oopses seen when trying to use > > the AF_ALG interface via openssl with openssh. This series does not > > address all problems, but merely stops the kernel from smashing its > > kernel stack and oopsing. > > > > With these fixes in place, the kernel no longer oopses. However, with > > the digests enabled in openssl, openssh refuses to work, producing the > > following when attempting to connect to the target system: > > > > Corrupted MAC on input. > > Disconnecting: Packet corrupt > > > > It's been hard enough to get this far; the crypto code is not the easiest > > code to debug for a new-comer due to the amount of state needed to be > > retained to understand the code (all the inline functions masking > > multiple levels of containerisation and pointer dereference does not > > make it easy to track what is stored where, and once I've been through > > one bit of code, I find I'm having to revisit the same piece of code a > > bit later to re-understand what it's doing.) > > > > It's been difficult enough to find the engine plugin for openssl - the > > original git repo which hosted it is now dead > > (http://src.carnivore.it/users/common/af_alg/). All that seems to be > > left is someone's modified version on github, which seems to get some > > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and > > seems to only carry one package (strongswan) which supports this > > interface. > > > > Hence, I'm leaving further debugging to other parties, especially as > > the userspace tooling for the AF_ALG seems rather lacking. (Are there > > any test programs, if so, can their location be documented and placed > > in Documentation/crypto please?) > > > > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've > > picked Thomas. It would be nice if there was an entry in MAINTAINERS > > for this driver. > > > > The first patch in this series avoids kernel stack smashing if a crypto > > driver forgets to set the 'statesize' member, but writes to what seems > > to be a valid pointer passed to its export function. Of course, this > > won't completely stop stack smashing if the statesize member is > > smaller than the data which the export function writes. This patch is > > optional. > > > > The second patch adds the necessary statesize members to the Marvell > > code which were previously missing. Fixing this uncovered a further > > problem, which the third patch addresses. > > -- > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] crypto: ensure algif_hash does not pass a zero-sized state
On Fri, Oct 09, 2015 at 06:34:28PM +0800, Herbert Xu wrote: > On Fri, Oct 09, 2015 at 11:29:44AM +0100, Russell King wrote: > > If the algorithm passed a zero statesize, do not pass a valid pointer > > into the export/import functions. Passing a valid pointer covers up > > bugs in driver code which then go on to smash the kernel stack. > > Instead, pass NULL, which will cause any attempt to write to the > > pointer to fail. > > > > Signed-off-by: Russell King> > The state size should never be zero for a hash algorithm. Having > a zero state means that the hash output must always be identical. > Such an algorithm would be quite useless. > > So how about adding a check upon hash registration to verify that > the state size is greater than zero? The place to do it would be > shash_prepare_alg and ahash_prepare_alg. Do you mean something like this? As statesize is an unsigned int, testing for zero should be sufficient. diff --git a/crypto/ahash.c b/crypto/ahash.c index 8acb886032ae..9c1dc8d6106a 100644 --- a/crypto/ahash.c +++ b/crypto/ahash.c @@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg) struct crypto_alg *base = >halg.base; if (alg->halg.digestsize > PAGE_SIZE / 8 || - alg->halg.statesize > PAGE_SIZE / 8) + alg->halg.statesize > PAGE_SIZE / 8 || + alg->halg.statesize == 0) return -EINVAL; base->cra_type = _ahash_type; diff --git a/crypto/shash.c b/crypto/shash.c index ecb1e3d39bf0..ab3384b38542 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -585,7 +585,8 @@ static int shash_prepare_alg(struct shash_alg *alg) if (alg->digestsize > PAGE_SIZE / 8 || alg->descsize > PAGE_SIZE / 8 || - alg->statesize > PAGE_SIZE / 8) + alg->statesize > PAGE_SIZE / 8 || + alg->statesize == 0) return -EINVAL; base->cra_type = _shash_type; -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/5] crypto: fixes for Marvell hash
After all, here's a version 3, which fixes the other bug I mentioned below - we now have correct hash results. A couple of patches have been added, one to fix that, and a second patch to factor out the duplication in the import functions. This gets openssh working with the digests enabled. Acks so far received have been added. Patch 3 has changed, so I didn't add the ack for the previous version. Please re-review patch 3. Thanks. crypto/ahash.c| 3 +- crypto/shash.c| 3 +- drivers/crypto/marvell/hash.c | 80 +-- 3 files changed, 36 insertions(+), 50 deletions(-) On Fri, Oct 09, 2015 at 11:46:37AM +0100, Russell King - ARM Linux wrote: > Version 2, same as version 1 but with a different patch 1, thanks to > Herbert for an alternative approach on that one. > > crypto/ahash.c| 3 ++- > crypto/shash.c| 3 ++- > drivers/crypto/marvell/hash.c | 9 + > 3 files changed, 13 insertions(+), 2 deletions(-) > > On Fri, Oct 09, 2015 at 11:29:04AM +0100, Russell King - ARM Linux wrote: > > This small series of patches addresses oopses seen when trying to use > > the AF_ALG interface via openssl with openssh. This series does not > > address all problems, but merely stops the kernel from smashing its > > kernel stack and oopsing. > > > > With these fixes in place, the kernel no longer oopses. However, with > > the digests enabled in openssl, openssh refuses to work, producing the > > following when attempting to connect to the target system: > > > > Corrupted MAC on input. > > Disconnecting: Packet corrupt > > > > It's been hard enough to get this far; the crypto code is not the easiest > > code to debug for a new-comer due to the amount of state needed to be > > retained to understand the code (all the inline functions masking > > multiple levels of containerisation and pointer dereference does not > > make it easy to track what is stored where, and once I've been through > > one bit of code, I find I'm having to revisit the same piece of code a > > bit later to re-understand what it's doing.) > > > > It's been difficult enough to find the engine plugin for openssl - the > > original git repo which hosted it is now dead > > (http://src.carnivore.it/users/common/af_alg/). All that seems to be > > left is someone's modified version on github, which seems to get some > > maintanence. Debian doesn't seem to carry AF_ALG openssl support, and > > seems to only carry one package (strongswan) which supports this > > interface. > > > > Hence, I'm leaving further debugging to other parties, especially as > > the userspace tooling for the AF_ALG seems rather lacking. (Are there > > any test programs, if so, can their location be documented and placed > > in Documentation/crypto please?) > > > > I'm not sure who the maintainer for drivers/crypto/marvell is, so I've > > picked Thomas. It would be nice if there was an entry in MAINTAINERS > > for this driver. > > > > The first patch in this series avoids kernel stack smashing if a crypto > > driver forgets to set the 'statesize' member, but writes to what seems > > to be a valid pointer passed to its export function. Of course, this > > won't completely stop stack smashing if the statesize member is > > smaller than the data which the export function writes. This patch is > > optional. > > > > The second patch adds the necessary statesize members to the Marvell > > code which were previously missing. Fixing this uncovered a further > > problem, which the third patch addresses. > > -- > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/5] crypto: marvell: initialise struct mv_cesa_ahash_req
On Fri, Oct 09, 2015 at 09:50:18PM +0200, Boris Brezillon wrote: > Hi Russel, > > On Fri, 09 Oct 2015 20:43:43 +0100 > Russell Kingwrote: > > > When a AF_ALG fd is accepted a second time (hence hash_accept() is > > used), hash_accept_parent() allocates a new private context using > > sock_kmalloc(). This context is uninitialised. After use of the new > > fd, we eventually end up with the kernel complaining: > > > > marvell-cesa f109.crypto: dma_pool_free cesa_padding, c0627770/0 (bad > > dma) > > > > where c0627770 is a random address. Poisoning the memory allocated by > > the above sock_kmalloc() produces kernel oopses within the marvell hash > > code, particularly the interrupt handling. > > > > The following simplfied call sequence occurs: > > > > hash_accept() > > crypto_ahash_export() > > marvell hash export function > > af_alg_accept() > > hash_accept_parent()<== allocates uninitialised struct hash_ctx > > crypto_ahash_import() > > marvell hash import function > > > > hash_ctx contains the struct mv_cesa_ahash_req in its req.__ctx member, > > and, as the marvell hash import function only partially initialises > > this structure, we end up with a lot of members which are left with > > whatever data was in memory prior to sock_kmalloc(). > > > > Add zero-initialisation of this structure. > > Maybe you should also change your commit message since this patch no > longer initializes the req struct to zero, otherwise Oops. s/zero-// :) > Acked-by: Boris Brezillon > > > > > Signed-off-by: Russell King > > --- > > drivers/crypto/marvell/hash.c | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c > > index a259aced3b42..458867ce9515 100644 > > --- a/drivers/crypto/marvell/hash.c > > +++ b/drivers/crypto/marvell/hash.c > > @@ -831,6 +831,10 @@ static int mv_cesa_md5_import(struct ahash_request > > *req, const void *in) > > unsigned int cache_ptr; > > int ret; > > > > + ret = crypto_ahash_init(req); > > + if (ret) > > + return ret; > > + > > creq->len = in_state->byte_count; > > memcpy(creq->state, in_state->hash, digsize); > > creq->cache_ptr = 0; > > @@ -921,6 +925,10 @@ static int mv_cesa_sha1_import(struct ahash_request > > *req, const void *in) > > unsigned int cache_ptr; > > int ret; > > > > + ret = crypto_ahash_init(req); > > + if (ret) > > + return ret; > > + > > creq->len = in_state->count; > > memcpy(creq->state, in_state->state, digsize); > > creq->cache_ptr = 0; > > @@ -1022,6 +1030,10 @@ static int mv_cesa_sha256_import(struct > > ahash_request *req, const void *in) > > unsigned int cache_ptr; > > int ret; > > > > + ret = crypto_ahash_init(req); > > + if (ret) > > + return ret; > > + > > creq->len = in_state->count; > > memcpy(creq->state, in_state->state, digsize); > > creq->cache_ptr = 0; > > > > -- > Boris Brezillon, Free Electrons > Embedded Linux and Kernel engineering > http://free-electrons.com -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] MAINTAINERS: add maintainers for the Marvell Crypto driver
On Fri, Oct 09, 2015 at 05:39:02PM +0200, Thomas Petazzoni wrote: > A new crypto driver for Marvell ARM platforms was added in > drivers/crypto/marvell/ as part of commit f63601fd616ab ("crypto: > marvell/cesa - add a new driver for Marvell's CESA"). This commit adds > the relevant developers to the list of maintainers. > > Cc: Boris Brezillon> Cc: Arnaud Ebalard > Cc: Herbert Xu > Cc: Russell King Suggested-by: Russell King ;) > Signed-off-by: Thomas Petazzoni > --- > MAINTAINERS | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 7ba7ab7..d3f47f5c 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6555,6 +6555,13 @@ M: Guenter Roeck > S: Maintained > F: drivers/net/dsa/mv88e6352.c > > +MARVELL CRYPTO DRIVER > +M: Boris Brezillon > +M: Arnaud Ebalard > +F: drivers/crypto/marvell/ > +S: Maintained > +L: linux-crypto@vger.kernel.org > + > MARVELL GIGABIT ETHERNET DRIVERS (skge/sky2) > M: Mirko Lindner > M: Stephen Hemminger > -- > 2.6.1 > -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line "unsubscribe linux-crypto" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
BTW, off-topic for this thread... but I notice from Mark Brown's builder that mv_cesa is causing build errors in mainline now: arm-allmodconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] arm-multi_v5_defconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] It seems it was fine on July 2nd, but the above was introduced today. A few other things I notice when looking at this code: /* Not all platforms can gate the clock, so it is not an error if the clock does not exists. */ cp-clk = clk_get(pdev-dev, NULL); if (!IS_ERR(cp-clk)) clk_prepare_enable(cp-clk); So, if clk_get() returns PTR_ERR(-EPROBE_DEFER) we treat that clock as missing? Is that really the behaviour you want there? ret = request_irq(irq, crypto_int, 0, dev_name(pdev-dev), cp); What happens if crypto_int() is called when request_irq() unlocks its spinlock, before the clock has been found? Eg, because we're booting from a kexec'd kernel. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote: Which led us to think that this could be related to a non cache-line aligned buffer problem: if we share the cache line with someone modifying its data during the DMA transfer, we could experience data loss when the cpu decide to flush the data from this cache line to the memory. There's really only two options when you have an overlapping cache line and the cache line contains CPU-dirty data, but the memory contains DMA-dirty data. One of those dirty data has to win out, and the other has to be destroyed. (Either you invalidate the CPU cache line, loosing the CPU-dirty data, or you write-back the cache line, overwriting the DMA- dirty data.) What that means is that it's a _bug_ to end up in this situation, nothing more, nothing less. There's nothing that arch code can do about it, it's a DMA-API user bug. What you'll find is that this is well documented. See Documentation/DMA-API.txt, part Id (Streaming DMA mappings): Warnings: Memory coherency operates at a granularity called the cache line width. In order for memory mapped by this API to operate correctly, the mapped region must begin exactly on a cache line boundary and end exactly on one (to prevent two separately mapped regions from sharing a single cache line). Since the cache line size may not be known at compile time, the API will not enforce this requirement. Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). The behaviour you're seeing here is not ARM specific - any CPU which uses cache lines and doesn't have hardware coherency will suffer from this problem. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 05/14] crypto: marvell/CESA: add TDMA support
On Wed, Jun 17, 2015 at 05:50:01PM +0800, Herbert Xu wrote: On Wed, Jun 17, 2015 at 09:45:33AM +0200, Boris Brezillon wrote: + ret = dma_map_sg(cesa_dev-dev, req-src, creq-src_nents, +DMA_TO_DEVICE); + if (!ret) + return -ENOMEM; + + creq-src_nents = ret; DMA-API-HOWTO says that you must retain the original nents and use it when you call dma_unmap_sg. So I'm afraid one more repost is needed :) It's worse than that... You're right on that point, but there's an additional point. If dma_map_sg() coalesces scatterlist entries, then ret will be smaller than src_nents, and ret indicates how many scatterlist entries to be walked during DMA - you should not use src_nents for that. I couldn't see where the driver used that information. In fact, the driver seems to be capable of walking more than src_nents/ret numbers of scatterlist entries: it just keeps going with sg_next() until it hits the end of the allocated scatterlist. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v6 05/14] crypto: marvell/CESA: add TDMA support
On Thu, Jun 18, 2015 at 11:33:24AM +0200, Boris Brezillon wrote: Hi Russel, On Thu, 18 Jun 2015 10:04:00 +0100 Russell King - ARM Linux li...@arm.linux.org.uk wrote: On Wed, Jun 17, 2015 at 05:50:01PM +0800, Herbert Xu wrote: On Wed, Jun 17, 2015 at 09:45:33AM +0200, Boris Brezillon wrote: + ret = dma_map_sg(cesa_dev-dev, req-src, creq-src_nents, +DMA_TO_DEVICE); + if (!ret) + return -ENOMEM; + + creq-src_nents = ret; DMA-API-HOWTO says that you must retain the original nents and use it when you call dma_unmap_sg. So I'm afraid one more repost is needed :) It's worse than that... You're right on that point, but there's an additional point. If dma_map_sg() coalesces scatterlist entries, then ret will be smaller than src_nents, and ret indicates how many scatterlist entries to be walked during DMA - you should not use src_nents for that. I couldn't see where the driver used that information. In fact, the driver seems to be capable of walking more than src_nents/ret numbers of scatterlist entries: it just keeps going with sg_next() until it hits the end of the allocated scatterlist. Yes, I realized that, and I never used the value returned by dma_map_sg() to walk the scatterlist anyway: I was using the sg_next() and sg-length value (which I replaced by sg_dma_len() in v7 as suggested by Herbert). So the -src_nents assignment to dma_map_sg() return value was just a silly mistake caused by an uncareful read of the DMA-API-HOWTO. Am I missing something else ? Yes. 'ret' should be used to indicate the number of scatterlist entries to walk for DMA purposes after the scatterlist has been mapped. For PIO purposes, using src_nents is still acceptable. As Herbert points out, you're stopping after the sum of transferred bytes matches, so I suppose that's fine. One other point though: you should use sg_dma_address() rather than dereferencing sg-dma_address directly. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC
On Mon, Jun 15, 2015 at 06:33:17PM +0200, Jon Nettleton wrote: Funny enough I tackled this problem over the weekend as well. My approach was to switch the driver over to use the *_relaxed() io functions and then special case the bits missing from the various ARCHs. Basically adding setbits32 and clrbits32 for !PPC architectures and letting PPC and ARM share a writeq/readq set of functions. I left the existing LITTLE_ENDIAN special case until I could verify if it was needed, or had been tested. I'll follow up here with what I've mentioned elsewhere, and some further thoughts. I think this shows the dangers of using struct { } to define register offsets. Let's start here: /* * caam_job_ring - direct job ring setup * 1-4 possible per instantiation, base + 1000/2000/3000/4000 * Padded out to 0x1000 */ struct caam_job_ring { /* Input ring */ u64 inpring_base; /* IRBAx - Input desc ring baseaddr */ u32 rsvd1; Apparently, this is a CPU-endian 64-bit value (it's not defined using le64 or be64 which would fix it's endian.) The second question, which comes up in light of the breakage that's being reported is: is this really a 64-bit register, or is it a pair of 32-bit registers side-by-side? The documentation I'm looking at doesn't document the register at base + 0x1000, but documents the one at base + 0x1004, and the one at 0x1004 is given the name IRBAR0_LS, which presumably stands for input ring base address register 0, least significant. As the code originally stood for PPC, IRBAR0_LS is also at 0x1004, but appears to be big endian. On ARM, IRBAR0_LS appears at the same address, but is little endian. This is *not* a 64-bit register at all, but is a pair of 32-bit registers side by side. Moreover, readq() should not be used - no amount of arch mangling could ever produce a sane readq() which coped with this. So, the CAAM code is buggy in this regard: using readq() here when endian-portability is required is wrong. It's got to be two 32-bit reads or two 32-bit writes in the appropriate endian. Also, note that there's a big difference between __raw_readl() and readl_relaxed(). readl_relaxed() is always little-endian. __raw_readl() is god-knows-what-the-archtecture-decides endian. Switching PPC drivers from __raw_readl() to readl_relaxed() is really not a good idea unless someone from the PPC camp reviews and tests the code. So, what I'd suggest is just fixing rd_reg64() and wr_reg64() to do the right thing: keeping the two 32-bit words in the same order irrespective of the endian-ness, and staying with the __raw_* accessors until PPC people can look at this. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG?] crypto: caam: little/big endianness on ARM vs PPC
On Mon, Jun 15, 2015 at 05:59:07PM +0200, Steffen Trumtrar wrote: I'm working on CAAM support for the ARM-based i.MX6 SoCs. The current drivers/crypto/caam driver only works for PowerPC AFAIK. Actually, there isn't that much to do, to get support for the i.MX6 but one patch breaks the driver severely: commit ef94b1d834aace7101de77c3a7c2631b9ae9c5f6 crypto: caam - Add definition of rd/wr_reg64 for little endian platform You're not the only one who's hitting problems with this - Jon Nettleton has been working on it recently. The way this has been done is fairly yucky to start with: several things about it are particularly horrid. The first is the repetitive code for the BE and LE cases, when all that's actually different is the register order between the two code cases. The second thing is the excessive use of masking - I'm pretty sure the compiler won't complain with the below. diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h index 378ddc17f60e..ba0fa630a25d 100644 --- a/drivers/crypto/caam/regs.h +++ b/drivers/crypto/caam/regs.h @@ -84,34 +84,28 @@ #endif #ifndef CONFIG_64BIT -#ifdef __BIG_ENDIAN -static inline void wr_reg64(u64 __iomem *reg, u64 data) -{ - wr_reg32((u32 __iomem *)reg, (data 0xull) 32); - wr_reg32((u32 __iomem *)reg + 1, data 0xull); -} - -static inline u64 rd_reg64(u64 __iomem *reg) -{ - return (((u64)rd_reg32((u32 __iomem *)reg)) 32) | - ((u64)rd_reg32((u32 __iomem *)reg + 1)); -} +#if defined(__BIG_ENDIAN) +#define REG64_HI32(reg) ((u32 __iomem *)(reg)) +#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1) +#elif defined(__LITTLE_ENDIAN) +#define REG64_HI32(reg) ((u32 __iomem *)(reg) + 1) +#define REG64_LO32(reg) ((u32 __iomem *)(reg)) #else -#ifdef __LITTLE_ENDIAN +#error Broken endian? +#endif + static inline void wr_reg64(u64 __iomem *reg, u64 data) { - wr_reg32((u32 __iomem *)reg + 1, (data 0xull) 32); - wr_reg32((u32 __iomem *)reg, data 0xull); + wr_reg32(REG64_HI32(reg), data 32); + wr_reg32(REG64_LO32(reg), data); } static inline u64 rd_reg64(u64 __iomem *reg) { - return (((u64)rd_reg32((u32 __iomem *)reg + 1)) 32) | - ((u64)rd_reg32((u32 __iomem *)reg)); + return ((u64)rd_reg32(REG64_HI32(reg))) 32 | + rd_reg32(REG64_LO32(reg)); } #endif -#endif -#endif /* * jr_outentry The second issue is that apparently, the register order doesn't actually change for LE devices - in other words, the byte order within each register does change, but they aren't a 64-bit register, they're two separate 32-bit registers. So, they should always be written as such. So, I'd get rid of the #if defined(__BIG_ENDIAN) stuff and instead have: +/* + * The DMA address register is a pair of 32-bit registers, and should + * always be accessed as such. + */ +#define REG64_HI32(reg) ((u32 __iomem *)(reg)) +#define REG64_LO32(reg) ((u32 __iomem *)(reg) + 1) -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [V4 PATCH 3/6] pci: Generic function for setting up PCI device DMA coherency
On Wed, May 20, 2015 at 07:00:25AM -0500, Suravee Suthikulanit wrote: On 5/20/2015 4:34 AM, Catalin Marinas wrote: We have a dummy of_dma_configure() already when !CONFIG_OF, otherwise we would need #ifndef here. I already replied, I think for other architectures we need this check to avoid a useless host-of_node test. It seems that there are several places that have similar check. Would it be good to convert this into a macro? Something like: #define OF_NODE_ENABLED(dev) (IS_ENABLED(CONFIG_OF) dev-of_node) This /could/ be a useful compile-time optimisation: when CONFIG_OF is disabled, dev-of_node exists but will always be NULL - but the compiler doesn't know this. Your suggestion above would tell the compiler that when CONFIG_OF is disabled, OF_NODE_ENABLED() will evaluate to a constant false, which means it can eliminate code. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ARM: convert all mov.* pc, reg to bx reg for ARMv6+ (part1)
On Tue, Jul 01, 2014 at 05:42:42PM +0100, Måns Rullgård wrote: Russell King rmk+ker...@arm.linux.org.uk writes: ARMv6 and greater introduced a new instruction (bx) which can be used to return from function calls. Recent CPUs perform better when the bx lr instruction is used rather than the mov pc, lr instruction, and this sequence is strongly recommended to be used by the ARM architecture manual (section A.4.1.1). We provide a new macro ret with all its variants for the condition code which will resolve to the appropriate instruction. When the source register is not lr the name ret is a misnomer since only the bx lr instruction is predicted as a function return. The bx instruction with other source registers uses the normal prediction mechanisms, leaving the return stack alone, and should not be used for function returns. Any code currently using another register to return from a function should probably be modified to use lr instead, unless there are special reasons for doing otherwise. If code jumping to an address in a non-lr register is not a return, using the ret name will make for some rather confusing reading. I would suggest either using a more neutral name than ret or adding an alias to be used for non-return jumps so as to make the intent clearer. If you read the patch, the branches which are changed are those which do indeed return in some way. There are those which do this having moved lr to a different register. As you point out, bx lr /may/ be treated specially (I've actually been discussing this with Will Deacon over the last couple of days, who has also been talking to the hardware people in ARM, and Will is happy with this patch as in its current form.) This is why I've changed all mov pc, reg instructions which return in some way to use this macro, and left others (those which are used to call some function and return back to the same point) alone. I have thought about introducing a call macro for those other sites, but as there are soo few of them, I've left them as-is for the time being (this patch is already large enough.) If there are any in the patch which you have specific concerns about, please specifically point them out those which give you a concern. Thanks. -- FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly improving, and getting towards what was expected from it. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/4] crypto: Add Allwinner Security System crypto accelerator
On Sun, Jun 22, 2014 at 02:23:15PM +0200, Marek Vasut wrote: On Sunday, June 22, 2014 at 01:58:08 PM, Corentin LABBE wrote: [...] + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation version 2 of the License The license text seems incomplete. [...] I will replace it with a simplier line Licensed under the GPL-2. I'd suggest you to use the SPDX license identifiers then, but that's not something the kernel crowd agreed upon yet IIRC.Therefore , just make the text complete please. Marek, The full text of the first paragraph (in COPYING) is: This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. which is deemed to be entirely sufficient; quoting the full text is discouraged, especially when it includes the FSF address. It is also acceptable to restrict it to version 2 only, in which case something like this can be used: This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. Of course, it's up to the author to make up their own mind at the end of the day. Oh, ok, good question -- dear list, shall one use sg_page()+kmap or sg_virt()? sg_page() + kmap() is preferred, because sg_virt() fails with highmem. Using sg_virt() means you restrict the driver to non-highmem memory, and if the kernel wants to place the data into a highmem page, it will have to use bounce buffers (so it's inefficient). What's even better is to use the scatterlist iterator, which will handle this for you. See the sg_miter_*() functions. -- FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly improving, and getting towards what was expected from it. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 1/4] crypto: Add Allwinner Security System crypto accelerator
On Tue, Jun 10, 2014 at 02:43:14PM +0200, LABBE Corentin wrote: +int sunxi_aes_poll(struct ablkcipher_request *areq) +{ ... + if (areq-src == NULL || areq-dst == NULL) { + dev_err(ss-dev, ERROR: Some SGs are NULL %u\n, areq-nbytes); + return -1; You return -1 from here quite frequently. Have you verified that there is no way for this return value to get to userspace? Even if it can't get to userspace now, what if someone modifies the code so it can be returned? I detest crap that's written with return -1 in it because it looks like there is no care with the established error handling semantics for the kernel (small negative numbers are errno codes.) -- FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly improving, and getting towards what was expected from it. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 19/51] DMA-API: media: dt3155v4l: replace dma_set_mask()+dma_set_coherent_mask() with new helper
On Thu, Oct 31, 2013 at 09:46:40AM -0200, Mauro Carvalho Chehab wrote: Hi Russell, Em Mon, 30 Sep 2013 13:57:47 +0200 Hans Verkuil hverk...@xs4all.nl escreveu: On 09/19/2013 11:44 PM, Russell King wrote: Replace the following sequence: dma_set_mask(dev, mask); dma_set_coherent_mask(dev, mask); with a call to the new helper dma_set_mask_and_coherent(). Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk Acked-by: Hans Verkuil hans.verk...@cisco.com Somehow, I lost your original post (I got unsubscribed on a few days from all vger mailing lists at the end of september). I suspect that you want to sent this via your tree, right? Yes please. If so: Acked-by: Mauro Carvalho Chehab m.che...@samsung.com Added, thanks. - err = dma_set_mask(pdev-dev, DMA_BIT_MASK(32)); - if (err) - return -ENODEV; - err = dma_set_coherent_mask(pdev-dev, DMA_BIT_MASK(32)); + err = dma_set_mask_and_coherent(pdev-dev, DMA_BIT_MASK(32)); if (err) return -ENODEV; One thing I've just noticed is that return should be return err not return -ENODEV - are you okay for me to change that in this patch? Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] ARM: NEON based fast(er) AES in CBC/CTR/XTS modes
On Fri, Oct 04, 2013 at 08:04:50PM +0200, Ard Biesheuvel wrote: First of all, please note that the whole point of working so closely with the OpenSSL maintainer on this is that the version I am presenting here is the verbatim output of the Perl script that lives in the OpenSSL tree. So just shipped, not shipped and hacked. Personally, I would much prefer merging the .pl file as well, I just thought (and I did poll some people informally) that this is not something most people are happy about. If I am wrong about this, than I am quite happy to respin so the .S is generated on the fly. While it is desirable to keep the dependencies on external tools to a minimum, as perl is already on the list of required dependencies, there is no problem with including a script. Also, remember that the GPL says: The source code for a work means the preferred form of the work for making modifications to it. So here's the question: is the assembly code the perferred form to make modifications? From what you're saying above, the answer to that seems to be no. Now, I'm not going to throw toys out of the pram and say that this is a hard requirement, but just take a moment to think about how we treat vendors who don't do this, instead supplying non-preferred forms of source code, and think about whether there's double standards here. Now, while I can imagine that people have an ideological objection to perl (using comments like write only code etc) that's not a good enough excuse to avoid including the perferred form for future modification. Now, we have mechanisms in the kernel build where we can include a prepared source which can be used to lessen the burden on the toolset required to build the kernel. So, including both the perl script and the pre-generated assembly is entirely acceptable. This tends to nullify the excuse that we don't want to add additional tool burden to kbuild argument. So, what I'd strongly recommend is that we add both the pre-generated assembly with a _shipped suffix, the perl script, and include a rule like this: quiet_cmd_perl = PERL$@ cmd_perl = perl $ $@ $(src)/blahblah.S_shipped: $(src)/myperlscript $(call cmd,perl) and that should end up running myperlscript whenever it has a date stamp newer than the _shipped file, or if that file is missing. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] ARM: NEON based fast(er) AES in CBC/CTR/XTS modes
On Fri, Oct 04, 2013 at 02:34:01PM -0400, Nicolas Pitre wrote: Do you have an example of something that does require perl to build the kernel on ARM? I was under the impression that people try to avoid it as much as possible in general. I'm personally sitting on the fence between effectively adding a new kernel build dependencies or carrying the output of the perl script. But if the kernel build does already require perl in practice then this might tip the balance. That is really not a concern with the modern kbuild. It has supported having shipped versions of generated files included in the source tree for years. So, we can have the perl version included (the preferred form for editing) while avoiding the issue of requiring everyone to have perl. In other words, if you want to build a kernel, you don't need perl for this. If you want to edit the preferred form, then you do need perl so that the preferred form can be turned into assembly to update the shipped version. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/3] ARM: NEON based fast(er) AES in CBC/CTR/XTS modes
On Fri, Oct 04, 2013 at 08:41:35PM +0200, Ard Biesheuvel wrote: On 4 October 2013 20:34, Nicolas Pitre nicolas.pi...@linaro.org wrote: On Fri, 4 Oct 2013, Will Deacon wrote: [...] Why do you consider it unsuitable to ship the perl script with the kernel? Perl 5 is already documented as a build dependency in Documentation/Changes Do you have an example of something that does require perl to build the kernel on ARM? I was under the impression that people try to avoid it as much as possible in general. I'm personally sitting on the fence between effectively adding a new kernel build dependencies or carrying the output of the perl script. But if the kernel build does already require perl in practice then this might tip the balance. I like Russell's suggestion the most, in fact. In this case, the build time requirement for Perl effectively gets suspended until you start making modifications to the perl script, and the relation between the .S and the .pl files is made explicit by the make rule. Should I put the cmd_perl rule in scripts/Makefile.build ? Or can I just keep it under arch/arm/crypto ? Just running through the Makefiles, it seems we have a fair amount of stuff already using perl in various ways. So I wouldn't worry too much about where it's placed. It's probably something that should eventually end up in scripts/ at _some_ point. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/51] DMA mask changes
On Thu, Sep 26, 2013 at 10:23:08PM +0200, Rafał Miłecki wrote: 2013/9/19 Russell King - ARM Linux li...@arm.linux.org.uk: This email is only being sent to the mailing lists in question, not to anyone personally. The list of individuals is far to great to do that. I'm hoping no mailing lists reject the patches based on the number of recipients. Huh, I think it was enough to send only 3 patches to the b43-dev@. Like: [PATCH 01/51] DMA-API: provide a helper to set both DMA and coherent DMA masks [PATCH 14/51] DMA-API: net: b43: (...) [PATCH 15/51] DMA-API: net: b43legacy: (...) ;) I believe Joe has some nice script for doing it that way. When fixing some coding style / formatting, he sends only related patches to the given ML. If I did that, then the mailing lists would not get the first patch, because almost none of the lists would be referred to by patch 1. Moreover, people complain when they don't have access to the full patch series - they assume patches are missing for some reason, and they ask for the rest of the series. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [alsa-devel] [PATCH 43/51] DMA-API: dma: edma.c: no need to explicitly initialize DMA masks
On Mon, Sep 23, 2013 at 03:55:33PM +0530, Vinod Koul wrote: On Fri, Sep 20, 2013 at 12:15:39AM +0100, Russell King wrote: register_platform_device_full() can setup the DMA mask provided the appropriate member is set in struct platform_device_info. So lets make that be the case. This avoids a direct reference to the DMA masks by this driver. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk Acked-by: Vinod Koul vinod.k...@intel.com This also brings me question that should we force the driver to use the dma_set_mask_and_coherent() API or they have below flexiblity too? There's two issues here: 1. dma_set_mask_and_coherent() will only work if dev-dma_mask points at some storage for the mask. This needs to have .dma_mask in the platform_device_info initialised. 2. Yes, this driver should also be calling the appropriate DMA mask setting functions in addition to having the mask initialized at device creation time. Here's a replacement patch, though maybe it would be better to roll all the additions of dma_set_mask_and_coherent() in drivers/dma into one patch? In other words, combine the addition of this with these two patches: dma: pl330: add dma_set_mask_and_coherent() call dma: pl08x: add dma_set_mask_and_coherent() call 8= From: Russell King rmk+ker...@arm.linux.org.uk Subject: [PATCH] DMA-API: dma: edma.c: no need to explicitly initialize DMA masks register_platform_device_full() can setup the DMA mask provided the appropriate member is set in struct platform_device_info. So lets make that be the case. This avoids a direct reference to the DMA masks by this driver. While here, add the dma_set_mask_and_coherent() call which the DMA API requires DMA-using drivers to call. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk --- drivers/dma/edma.c | 10 ++ 1 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c index ff50ff4..fd5e48c 100644 --- a/drivers/dma/edma.c +++ b/drivers/dma/edma.c @@ -631,6 +631,10 @@ static int edma_probe(struct platform_device *pdev) struct edma_cc *ecc; int ret; + ret = dma_set_mask_and_coherent(pdev-dev, DMA_BIT_MASK(32)); + if (ret) + return ret; + ecc = devm_kzalloc(pdev-dev, sizeof(*ecc), GFP_KERNEL); if (!ecc) { dev_err(pdev-dev, Can't allocate controller\n); @@ -702,11 +706,13 @@ static struct platform_device *pdev0, *pdev1; static const struct platform_device_info edma_dev_info0 = { .name = edma-dma-engine, .id = 0, + .dma_mask = DMA_BIT_MASK(32), }; static const struct platform_device_info edma_dev_info1 = { .name = edma-dma-engine, .id = 1, + .dma_mask = DMA_BIT_MASK(32), }; static int edma_init(void) @@ -720,8 +726,6 @@ static int edma_init(void) ret = PTR_ERR(pdev0); goto out; } - pdev0-dev.dma_mask = pdev0-dev.coherent_dma_mask; - pdev0-dev.coherent_dma_mask = DMA_BIT_MASK(32); } if (EDMA_CTLRS == 2) { @@ -731,8 +735,6 @@ static int edma_init(void) platform_device_unregister(pdev0); ret = PTR_ERR(pdev1); } - pdev1-dev.dma_mask = pdev1-dev.coherent_dma_mask; - pdev1-dev.coherent_dma_mask = DMA_BIT_MASK(32); } out: -- 1.7.4.4 -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 36/51] DMA-API: usb: use dma_set_coherent_mask()
On Mon, Sep 23, 2013 at 02:27:39PM -0400, Alan Stern wrote: On Thu, 19 Sep 2013, Russell King wrote: The correct way for a driver to specify the coherent DMA mask is not to directly access the field in the struct device, but to use dma_set_coherent_mask(). Only arch and bus code should access this member directly. Convert all direct write accesses to using the correct API. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk diff --git a/drivers/usb/host/ehci-platform.c b/drivers/usb/host/ehci-platform.c index f6b790c..5b0cd2d 100644 --- a/drivers/usb/host/ehci-platform.c +++ b/drivers/usb/host/ehci-platform.c @@ -91,8 +91,9 @@ static int ehci_platform_probe(struct platform_device *dev) dev-dev.platform_data = ehci_platform_defaults; if (!dev-dev.dma_mask) dev-dev.dma_mask = dev-dev.coherent_dma_mask; - if (!dev-dev.coherent_dma_mask) - dev-dev.coherent_dma_mask = DMA_BIT_MASK(32); + err = dma_set_coherent_mask(dev-dev, DMA_BIT_MASK(32)); + if (err) + return err; pdata = dev_get_platdata(dev-dev); ehci-platform.c is a generic file, intended for use by multiple platforms. Is it not possible that on some platforms, the arch or bus code may already have initialized the DMA masks? Isn't something like this (i.e., DMA bindings) planned for Device Tree? By eliminating the tests above and calling dma_set_coherent_mask or dma_coerce_mask_and_coherent unconditionally, this patch (and the next) would overwrite those initial settings. I don't know to what extent the same may be true for the other, platform-specific, drivers changed by this patch. But it's something to be aware of. Please check the DMA API documentation: = For correct operation, you must interrogate the kernel in your device probe routine to see if the DMA controller on the machine can properly support the DMA addressing limitation your device has. It is good style to do this even if your device holds the default setting, because this shows that you did think about these issues wrt. your device. The query is performed via a call to dma_set_mask(): int dma_set_mask(struct device *dev, u64 mask); The query for consistent allocations is performed via a call to dma_set_coherent_mask(): int dma_set_coherent_mask(struct device *dev, u64 mask); = So, All drivers which use DMA are supposed to issue the appropriate calls to check the DMA masks before they perform DMA, even if the default is correct. And yes, this is all part of sorting out the DT mess - we should start not from the current mess (which is really totally haphazard) but from the well established position of how the DMA API _should_ be used. What that means is that all drivers should be issuing these calls as appropriate today. The default mask setup when the device is created is just that - it's a default mask, and it should not be used to decide anything about the device. That's something which the driver should compute on its own accord and then inform the various other layers via the appropriate call. Remember, on PCI, even when we have 64-bit, we do not declare PCI devices with a 64-bit DMA mask: that's up to PCI device drivers to _try_ to set a 64-bit DMA mask, which when successful _allows_ them to use 64-bit DMA. If it fails, they have to only use 32-bit. If they want a smaller mask, the _driver_ has to set the smaller mask, not the device creating code. The reason we're into this (particularly on ARM) is that we got lazy because we could get by with declaring a DMA mask at device creation time, since all devices were statically declared. Now it's time to get rid of those old lazy ways and start doing things correctly and to the requirements of the APIs. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 42/51] DMA-API: usb: musb: use platform_device_register_full() to avoid directly messing with dma masks
On Fri, Sep 20, 2013 at 08:11:25AM -0500, Felipe Balbi wrote: Hi, On Fri, Sep 20, 2013 at 12:14:38AM +0100, Russell King wrote: Use platform_device_register_full() for those drivers which can, to avoid messing directly with DMA masks. This can only be done when the driver does not need to access the allocated musb platform device from within its callbacks, which may be called during the musb device probing. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk you want me to carry this one through my tree or you prefer getting my Acked-by ? Either way works for me: Acked-by: Felipe Balbi ba...@ti.com there's also the third option of me setting up a branch with only this patch and we both merge it, that'd also work. I think this patch is sufficiently stand-alone that it should be fine if you want to take it through your tree. That may be better in the long run to avoid conflicts with this patch and any future work in this area during this cycle. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 39/51] DMA-API: others: use dma_set_coherent_mask()
On Fri, Sep 20, 2013 at 07:16:52AM -0500, Tejun Heo wrote: On Fri, Sep 20, 2013 at 12:11:38AM +0100, Russell King wrote: The correct way for a driver to specify the coherent DMA mask is not to directly access the field in the struct device, but to use dma_set_coherent_mask(). Only arch and bus code should access this member directly. Convert all direct write accesses to using the correct API. Signed-off-by: Russell King rmk+ker...@arm.linux.org.uk Acked-by: Tejun Heo t...@kernel.org The patch is pretty widely spread. I don't mind how it gets routed but what's the plan? The plan is... I'm going to try and avoid going through the hell of re-posting this patch series to all the recipients another time... (It's taken some 17 hours and lots of hand holding to get this patch set out without exim jumping off a cliff into deep OOM - soo deep that even the OOM killer doesn't run and the CPU is 100% idle because every single process stuck in an uninterruptible sleep waiting for every other process to free some memory - ouch!) I know that dealing with this patch set will be a problem due to how widespread this is, but much of the driver level changes come down to depending on a couple of patches. One solution would be if I published a branch with just the dependencies in, which subsystem maintainers could pull, and then apply the appropriate patches on top. Another would be if subsystem maintainers are happy that I carry them, I can add the acks, and then later on towards the end of the cycle, provide a branch subsystem maintainers could pull. Or... if you can think of something easier... -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/51] DMA-API: provide a helper to set both DMA and coherent DMA masks
On Fri, Sep 20, 2013 at 02:21:37AM +0100, Ben Hutchings wrote: On Thu, 2013-09-19 at 22:25 +0100, Russell King wrote: [...] -dma_set_coherent_mask() will always be able to set the same or a -smaller mask as dma_set_mask(). However for the rare case that a +The coherent coherent mask will always be able to set the same or a +smaller mask as the streaming mask. However for the rare case that a [...] The new wording doesn't make sense; a mask doesn't set itself. I would suggest: The coherent mask can always be set to the same or a smaller mask than the streaming mask. Yes, the original sentence is not particularly good, but I think even your modified version can be interpreted as a mask setting itself for all the same reasons that the original can be (which mask does same refer to?) Even so, I prefer your version. Thanks. :) -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 00/51] DMA mask changes
This started out as a request to look at the DMA mask situation, and how to solve the issues which we have on ARM - notably how the DMA mask should be setup. However, I started off reviewing how the dma_mask and coherent_dma_mask was being used, and what I found was rather messy, and in some cases rather buggy. I tried to get some of the bug fixes in before the last merge window, but it seems that the maintainers preferred to have the full solution rather than a simple -rc suitable bug fix. So, this is an attempt to clean things up. The first point here is that drivers performing DMA should be calling dma_set_mask()/dma_set_coherent_mask() in their probe function to verify that DMA can be performed. Lots of ARM drivers omit this step; please refer to the DMA API documentation on this subject. What this means is that the DMA mask provided by bus code is a default value - nothing more. It doesn't have to accurately reflect what the device is actually capable of. Apart from the storage for dev-dma_mask being initialised for any device which is DMA capable, there is no other initialisation which is strictly necessary at device creation time. Now, these cleanups address two major areas: 1. The setting of DMA masks, particularly when both the coherent and streaming DMA masks are set together. 2. The initialisation of DMA masks by drivers - this seems to be becoming a popular habbit, one which may not be entirely the right solution. Rather than having this scattered throughout the tree, I've pulled that into a central location (and called it coercing the DMA mask - because it really is about forcing the DMA mask to be that value.) 3. Finally, addressing the long held misbelief that DMA masks somehow correspond with physical addresses. We already have established long ago that dma_addr_t values returned from the DMA API are the values which you program into the DMA controller, and so are the bus addresses. It is _only_ sane that DMA masks are also bus related too, and not related to physical address spaces. (3) is a very important point for LPAE systems, which may still have less than 4GB of memory, but this memory is all located above the 4GB physical boundary. This means with the current model, any device using a 32-bit DMA mask fails - even though the DMA controller is still only a 32-bit DMA controller but the 32-bit bus addresses map to system memory. To put it another way, the bus addresses have a 4GB physical offset on them. This email is only being sent to the mailing lists in question, not to anyone personally. The list of individuals is far to great to do that. I'm hoping no mailing lists reject the patches based on the number of recipients. Patches based on v3.12-rc1. Documentation/DMA-API-HOWTO.txt | 37 +-- Documentation/DMA-API.txt |8 +++ arch/arm/include/asm/dma-mapping.h|8 +++ arch/arm/mm/dma-mapping.c | 49 ++-- arch/arm/mm/init.c| 12 +++--- arch/arm/mm/mm.h |2 + arch/powerpc/kernel/vio.c |3 +- block/blk-settings.c |8 ++-- drivers/amba/bus.c|6 +-- drivers/ata/pata_ixp4xx_cf.c |5 ++- drivers/ata/pata_octeon_cf.c |5 +- drivers/block/nvme-core.c | 10 ++--- drivers/crypto/ixp4xx_crypto.c| 48 ++-- drivers/dma/amba-pl08x.c |5 ++ drivers/dma/dw/platform.c |8 +-- drivers/dma/edma.c|6 +-- drivers/dma/pl330.c |4 ++ drivers/firmware/dcdbas.c | 23 +- drivers/firmware/google/gsmi.c| 13 +++-- drivers/gpu/drm/exynos/exynos_drm_drv.c |6 ++- drivers/gpu/drm/omapdrm/omap_dmm_tiler.c |5 +- drivers/media/platform/omap3isp/isp.c |6 +- drivers/media/platform/omap3isp/isp.h |3 - drivers/mmc/card/queue.c |3 +- drivers/mmc/host/sdhci-acpi.c |5 +- drivers/net/ethernet/broadcom/b44.c |3 +- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |8 +--- drivers/net/ethernet/brocade/bna/bnad.c | 13 ++ drivers/net/ethernet/emulex/benet/be_main.c | 12 + drivers/net/ethernet/intel/e1000/e1000_main.c |9 +--- drivers/net/ethernet/intel/e1000e/netdev.c| 18 +++- drivers/net/ethernet/intel/igb/igb_main.c | 18 +++- drivers/net/ethernet/intel/igbvf/netdev.c | 18 +++- drivers/net/ethernet/intel/ixgb/ixgb_main.c | 16 ++- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
Re: [RFC PATCH 2/2] arm64: add support for AES using ARMv8 Crypto Extensions
On Sat, Sep 14, 2013 at 05:11:53PM +0300, Jussi Kivilinna wrote: On 14.09.2013 16:30, Ard Biesheuvel wrote: On 14 September 2013 10:08, Jussi Kivilinna jussi.kivili...@iki.fi wrote: On 13.09.2013 18:08, Ard Biesheuvel wrote: This adds ARMv8 Crypto Extensions based implemenations of AES in CBC, CTR and XTS mode. Signed-off-by: Ard Biesheuvel ard.biesheu...@linaro.org --- ..snip.. +static int xts_set_key(struct crypto_tfm *tfm, const u8 *in_key, +unsigned int key_len) +{ + struct crypto_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm); + u32 *flags = tfm-crt_flags; + int ret; + + ret = crypto_aes_expand_key(ctx-key1, in_key, key_len/2); + if (!ret) + ret = crypto_aes_expand_key(ctx-key2, in_key[key_len/2], + key_len/2); Use checkpatch. Um, I did get a bunch of errors and warnings from checkpatch.pl tbh, put not in this particular location. Care to elaborate? Well, the checkpatch.pl I had stored to brain had become corrupted and kept saying that you need spaces around all operators. But apparently spaces are just required for assignment operators. checkpatch is not definitive. It is merely an implementation of the coding style. The coding style is the definitive documentation, and it says about this: | Use one space around (on each side of) most binary and ternary operators, | such as any of these: | |= + - * / % |^ = = == != ? : | | but no space after unary operators: | * + - ~ ! sizeof typeof alignof __attribute__ defined | | no space before the postfix increment decrement unary operators: |++ -- | | no space after the prefix increment decrement unary operators: |++ -- | | and no space around the '.' and - structure member operators. So, you're quite right that the above is wrong. It needs spaces around the / operators. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] crypto: ux500/cryp - Set DMA configuration though dma_slave_config()
On Fri, Apr 26, 2013 at 01:46:46PM +0530, Vinod Koul wrote: On Fri, Apr 26, 2013 at 10:28:39AM +0200, Linus Walleij wrote: On Thu, Apr 25, 2013 at 4:11 PM, Arnd Bergmann a...@arndb.de wrote: The dma engine driver must know the address in its dma space, while the slave driver has it available in physical space. These two are often the same, but there is no generic way to convert between the two, especially if the dma engine resides behind an IOMMU. The best assumption we can make is that the dma engine driver knows how to convert between the two. Interestingly the documentation for dma_slave_config talks about physical address, while the structure itself uses a dma_addr_t. Linus Walleij introduced the structure in c156d0a5b0 DMAENGINE: generic slave channel control v3, so I assume he can shed some light on what he was thinking. I assume the documentation is right but the structure is not and should be converted to use phys_add_t or resource_size_t. OK I could cook a patch for that, but I think I need some input from Vinod and/or Russell on this. the dma_slave_config is physical address that should be passed directly to the controller. Obviosuly it should phys_addr_t :) What you've just said is actually confusing. physical address is normally the term used to describe the addresses seen to the RAM. phys_addr_t describes this. This is not necessarily what needs to be programmed into the DMA controller. For RAM addresses, they must be mapped via the DMA API - and this gives you a dma_addr_t. DMA address is the address to be programmed into a DMA controller to access a particular address in RAM or device, and has type dma_addr_t. When you're programming a DMA controller to access a device, you are clearly telling it the address on the _DMA controller's bus_ to access that register, which may or may not be the same as the physical address. There are platforms in existence where phys_addr_t can be 32-bit but dma_addr_t can be 64-bit. Getting this stuff wrong can cause problems. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] crypto: ux500/cryp - Set DMA configuration though dma_slave_config()
On Fri, Apr 26, 2013 at 11:39:20AM +0200, Arnd Bergmann wrote: On Friday 26 April 2013 13:46:46 Vinod Koul wrote: The mapping unmapping of dma buffer (memcpy and memory buffer in this txn) is required to be performed by the client driver. The dmanegine or dmaengine driver will not do that for you... I've been wondering about this part: since we need to pass the 'struct device' of the dma engine (rather than the slave) into dma_map_single, what is the official way to do that? Should the slave driver just rely on dma_chan-device-dev to work correctly for the purpose of dma-mapping.h interfaces? Yes. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mv_cesa dcache problem since 2.6.37 was: Re: mv_cesa hash functions
On Sun, May 06, 2012 at 12:49:30AM +0200, Simon Baatz wrote: Am 23.02.2012 19:34, schrieb Phil Sutter: But you might suffer from another problem, which is only present on ARM machines with VIVT cache and linux = 2.6.37: due to commit f8b63c1, ARM: 6382/1: Remove superfluous flush_kernel_dcache_page() which prevents pages being flushed from inside the scatterlist iterator API. This patch seems to introduce problems in other places (namely NFS), too but I sadly did not have time to investigate this further. I will post a possible (cryptodev-internal) solution to cryptodev-linux-de...@gna.org, maybe this fixes the problem with openssl. Greetings, Phil Well, the reason it has been removed is as follows: When pages are allocated, they have a special flag bit called PG_arch_1 cleared by the page allocator. We used to use PG_arch_1 to indicate that the page contained dirty cache lines - so by default, the page was marked clean. Normal kernel behaviour upon allocating a page into the page cache is to read data off the disk into the new page cache page, calling flush_kernel_dcache_page() in the process. This would ensure that data would be pushed out of the cache lines associated with the kernel mapping of the page into RAM. This means when we map the page into userspace, we can be sure that the new mapping will see the data written there. Moreover, when we place a page into a user process, we check this PG_arch_1 bit, and if it's set, we flush the kernel mapping cache lines at this point as well. Patch 6379/1 changed this behaviour: PG_arch_1 being _clear_ now indicates that the page contains dirty data. What this means is that when a page is placed into userspace, we check the PG_arch_1 bit. If it is clear, we flush the kernel mapping for the page to push dirty data out to RAM. This makes the flush in flush_kernel_dcache_page() entirely redundant. since there has been no reaction on this, I would like to bring this issue up again (I sadly don't have the expertise to investigate this further...). The issue is not limited to cryptodev, but seems to be either a problem with commit f8b63c1 or a problem in mv_cesa that was uncovered by this commit. Can you reproduce it with anything else? It could be a problem with the way crypto stuff works - I've never had any dealings with that subsystem, so I really can't comment. If crypto uses scatterlists, and walks them with the standard API, and uses scatterlists with pages which are already mapped into userspace, then I can see how the above commit would make things go wrong. So, without knowledge of the crypto subsystem, I'm not sure I can help sort this out. If it's possible to reproduce with NFS, and it seems sorted in the latest kernel, that's probably because something changed with NFS - NFS did for a time have issues on ARM for various reasons (because of remapping kernel memory into vmalloc space and expecting it to be DMA coherent...) and that got fixed. Whether that's why you're not seeing problems with v3.4-rc5, I couldn't be sure unless you did a bisection between the bad and good kernel versions. That would at least allow us to confirm that that issue has been properly resolved. Thanks. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm: new W macro to WORD_ACCESS
On Wed, Sep 16, 2009 at 07:58:12PM +0200, Sebastian Andrzej Siewior wrote: leads to a build error because the crypto/cast6.c defines a function which is named W. W has nothing to do with the access size, so this change makes it _really_ confusing. What it's about is telling the compiler to use the ARM 32-bit version of the instruction coding when building for Thumb2 (which can intermix Thumb and ARM instruction sets.) So the macro needs to be renamed... NAK on this patch as it stands. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] arm/orion5x: add sram support for crypto
On Thu, Jun 11, 2009 at 11:07:34PM +0200, Sebastian Andrzej Siewior wrote: If you thing it is too early I can keep hacking in my own git tree until I get the dmac_flush_range() hack out or so. The problem that I percieve with these kinds of hacks is that they tend to spread into other code, and then we end up with problems when new architectures come along. For these interfaces, I am a strong believer in purpose-defined interfaces to caches and the like. If what we have doesn't provide what's required, we need to provide something else. So, the question is what are you trying to do with this dmac_flush_range() and your SRAM? Are you trying to achieve I/D cache coherency for data written there, so you can execute it later, or are you trying to ensure DMA coherency from SRAM? -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] crypto: fix handling of sg buffers in ixp4xx driver
On Mon, Mar 02, 2009 at 12:45:12PM +0100, Christian Hohnstaedt wrote: - keep dma functions away from chained scatterlists. Use the existing scatterlist iteration inside the driver to call dma_map_single() for each chunk and avoid dma_map_sg(). Hmm, interesting. So crypto has its own scatterlist chaining stuff which is different and incompatible from what is supported by the rest of the kernel. That's really confusing. There's know way to know whether a struct scatterlist should be walked by using the scatterwalk_* stuff or the generic stuff. Excellent for reviewing. So, sorry, I'm not qualified to review this. Please find someone else who knows about crypto stuff. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ixp4xx_crypto panic with fragmented packets in scatterlist
On Wed, Feb 25, 2009 at 04:35:40PM +0100, Karl Hiramoto wrote: The attached patch fixes my issue, but am not sure if it is correct or will cause problems else where. diff -Naurp linux-2.6.28.7.a/arch/arm/include/asm/scatterlist.h linux-2.6.28.7.b/arch/arm/include/asm/scatterlist.h --- linux-2.6.28.7.a/arch/arm/include/asm/scatterlist.h 2009-02-20 23:41:27.0 +0100 +++ linux-2.6.28.7.b/arch/arm/include/asm/scatterlist.h 2009-02-25 16:19:59.0 +0100 @@ -24,4 +24,6 @@ struct scatterlist { #define sg_dma_address(sg) ((sg)-dma_address) #define sg_dma_len(sg) ((sg)-length) +#define ARCH_HAS_SG_CHAIN + We can't merge this until _all_ of ARM has been fixed for walking scatterlist chains. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ixp4xx_crypto panic with fragmented packets in scatterlist
On Thu, Feb 26, 2009 at 08:10:43PM +0800, Herbert Xu wrote: Russell King - ARM Linux li...@arm.linux.org.uk wrote: We can't merge this until _all_ of ARM has been fixed for walking scatterlist chains. Right, this is definitely not the way to fix this bug. Because even if ARM completely supported chaining, you still have to fix all the other architectures as well. I don't think you can use chained scatterlists unless the architecture supports it. It's not a case that you have to flatten the chaining before passing it over to the arch - it seems you're not allowed to create a chained scatterlist in the first place: static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents, struct scatterlist *sgl) { #ifndef ARCH_HAS_SG_CHAIN BUG(); #endif -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html