Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote: We also noticed that the cra_alignmask fields were all set to 0 in the CESA driver, but modifying them (setting them to ARCH_DMA_MINALIGN - 1) does not seem to guarantee any alignment. ITOH, the xxx_walk_xxx API seem to check these misalignment issues and allocate new buffers to prevent such cases, but AFAICT, this is not automatically done by the crypto framework, and it's the responsibility of each driver to ensure the correct alignment. Herbert, could you confirm that ? Correct. The crypto API does provide limited help for things like the key and the tfm context, but as far as the input/output data is concerned it is up to the driver to deal with them. Note that I don't think we have a suitable helper for DMA-aligned processing of SG lists. The ablkcipher_walk_* interface is mostly identical to the blkcipher_walk_* interface. Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
On Sat, Jul 04, 2015 at 03:33:55PM +0800, Herbert Xu wrote: Note that I don't think we have a suitable helper for DMA-aligned processing of SG lists. The ablkcipher_walk_* interface is mostly identical to the blkcipher_walk_* interface. Actually I take that back. The ablkcipher_walk_* should work even if may be a bit unwieldy. Patches to improve it are of course welcome :) Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
Hi Russel, On Fri, 3 Jul 2015 14:10:59 +0100 Russell King - ARM Linux li...@arm.linux.org.uk wrote: BTW, off-topic for this thread... but I notice from Mark Brown's builder that mv_cesa is causing build errors in mainline now: arm-allmodconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] arm-multi_v5_defconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] It seems it was fine on July 2nd, but the above was introduced today. It has been fixed by Stephen in linux-next (see [1]). A few other things I notice when looking at this code: /* Not all platforms can gate the clock, so it is not an error if the clock does not exists. */ cp-clk = clk_get(pdev-dev, NULL); if (!IS_ERR(cp-clk)) clk_prepare_enable(cp-clk); So, if clk_get() returns PTR_ERR(-EPROBE_DEFER) we treat that clock as missing? Is that really the behaviour you want there? Nope, I'll fix that. ret = request_irq(irq, crypto_int, 0, dev_name(pdev-dev), cp); What happens if crypto_int() is called when request_irq() unlocks its spinlock, before the clock has been found? Eg, because we're booting from a kexec'd kernel. I don't know, and I agree that we should either request the clock before registering the irq, or disable the IRQ using the SEC_ACCEL_INT_MASK before calling request_irq. Ideally we should do both. Note that none of my patches changed the clk_get(), request_irq() calls order, or the clk_get() error path ;-). Anyway, I'll propose a patch fixing the problem. Thanks for the feedback, Boris [1]https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/mv_cesa.c?id=7e6ce3db5329a32c83da9753bae162eb3f22dfca -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
On Fri, 3 Jul 2015 10:58:07 +0100 Russell King - ARM Linux li...@arm.linux.org.uk wrote: On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote: Which led us to think that this could be related to a non cache-line aligned buffer problem: if we share the cache line with someone modifying its data during the DMA transfer, we could experience data loss when the cpu decide to flush the data from this cache line to the memory. There's really only two options when you have an overlapping cache line and the cache line contains CPU-dirty data, but the memory contains DMA-dirty data. One of those dirty data has to win out, and the other has to be destroyed. (Either you invalidate the CPU cache line, loosing the CPU-dirty data, or you write-back the cache line, overwriting the DMA- dirty data.) What that means is that it's a _bug_ to end up in this situation, nothing more, nothing less. There's nothing that arch code can do about it, it's a DMA-API user bug. What you'll find is that this is well documented. See Documentation/DMA-API.txt, part Id (Streaming DMA mappings): Warnings: Memory coherency operates at a granularity called the cache line width. In order for memory mapped by this API to operate correctly, the mapped region must begin exactly on a cache line boundary and end exactly on one (to prevent two separately mapped regions from sharing a single cache line). Since the cache line size may not be known at compile time, the API will not enforce this requirement. Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). The behaviour you're seeing here is not ARM specific - any CPU which uses cache lines and doesn't have hardware coherency will suffer from this problem. That's pretty much the answer I was expecting :-). I'll wait for Herbert's feedback to know how this should be fixed (I guess using the alg_walk_xxx functions is the right this to do, but I'd like to be sure). Thanks, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
BTW, off-topic for this thread... but I notice from Mark Brown's builder that mv_cesa is causing build errors in mainline now: arm-allmodconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] arm-multi_v5_defconfig ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration] It seems it was fine on July 2nd, but the above was introduced today. A few other things I notice when looking at this code: /* Not all platforms can gate the clock, so it is not an error if the clock does not exists. */ cp-clk = clk_get(pdev-dev, NULL); if (!IS_ERR(cp-clk)) clk_prepare_enable(cp-clk); So, if clk_get() returns PTR_ERR(-EPROBE_DEFER) we treat that clock as missing? Is that really the behaviour you want there? ret = request_irq(irq, crypto_int, 0, dev_name(pdev-dev), cp); What happens if crypto_int() is called when request_irq() unlocks its spinlock, before the clock has been found? Eg, because we're booting from a kexec'd kernel. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
A few other things I notice when looking at this code: /* Not all platforms can gate the clock, so it is not an error if the clock does not exists. */ cp-clk = clk_get(pdev-dev, NULL); if (!IS_ERR(cp-clk)) clk_prepare_enable(cp-clk); So, if clk_get() returns PTR_ERR(-EPROBE_DEFER) we treat that clock as missing? Is that really the behaviour you want there? Hi Russell I probably added that to the older driver and it got copied into the newer one. The clock it is using is added to the system using CLK_OF_DECLARE() This results in an entry in the __clk_of_table table. That table is walked in of_clk_init(), which is called by time_init() in arch/arc/kernel/time.c, early in the boot. If we get -EPROBE_DEFER, something is seriously wrong. Andrew -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote: Which led us to think that this could be related to a non cache-line aligned buffer problem: if we share the cache line with someone modifying its data during the DMA transfer, we could experience data loss when the cpu decide to flush the data from this cache line to the memory. There's really only two options when you have an overlapping cache line and the cache line contains CPU-dirty data, but the memory contains DMA-dirty data. One of those dirty data has to win out, and the other has to be destroyed. (Either you invalidate the CPU cache line, loosing the CPU-dirty data, or you write-back the cache line, overwriting the DMA- dirty data.) What that means is that it's a _bug_ to end up in this situation, nothing more, nothing less. There's nothing that arch code can do about it, it's a DMA-API user bug. What you'll find is that this is well documented. See Documentation/DMA-API.txt, part Id (Streaming DMA mappings): Warnings: Memory coherency operates at a granularity called the cache line width. In order for memory mapped by this API to operate correctly, the mapped region must begin exactly on a cache line boundary and end exactly on one (to prevent two separately mapped regions from sharing a single cache line). Since the cache line size may not be known at compile time, the API will not enforce this requirement. Therefore, it is recommended that driver writers who don't take special care to determine the cache line size at run time only map virtual regions that begin and end on page boundaries (which are guaranteed also to be cache line boundaries). The behaviour you're seeing here is not ARM specific - any CPU which uses cache lines and doesn't have hardware coherency will suffer from this problem. -- FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
crypto: marvell/CESA: Issues with non cache-line aligned buffers
Hello, We've been facing a bug with the new CESA driver on armada 370 for a couple of days. Arnaud found out that this bug appeared when we fixed the req-src == req-dst case (by passing DMA_BIDIRECTIONAL when mapping the sg list instead of mapping the same sg list twice, once with the TO_DEVICE option and the second time with the FROM_DEVICE option). After further investigations, we found out that invalidating the cache (in addition to the clean operation already done when mapping the region for DMA with the BIDIR option) was somehow fixing the problem. Which led us to think that this could be related to a non cache-line aligned buffer problem: if we share the cache line with someone modifying its data during the DMA transfer, we could experience data loss when the cpu decide to flush the data from this cache line to the memory. Russel, Will, could you confirm this is an issue that could happen ? To validate this assumption we decided to test the alignment of the src and dst sg list entries passed by the crypto framework, and most of them are indeed not aligned on ARCH_DMA_MINALIGN. We made a dirty hack to copy the content of non aligned sg lists into a new ones containing a single aligned buffer (allocated with kmalloc) and it seems to solve the problem. We also noticed that the cra_alignmask fields were all set to 0 in the CESA driver, but modifying them (setting them to ARCH_DMA_MINALIGN - 1) does not seem to guarantee any alignment. ITOH, the xxx_walk_xxx API seem to check these misalignment issues and allocate new buffers to prevent such cases, but AFAICT, this is not automatically done by the crypto framework, and it's the responsibility of each driver to ensure the correct alignment. Herbert, could you confirm that ? Do you see anything else we should check ? Best Regards, Boris -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com -- To unsubscribe from this list: send the line unsubscribe linux-crypto in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html