Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-05 Thread Herbert Xu
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote:

 We also noticed that the cra_alignmask fields were all set to 0 in the
 CESA driver, but modifying them (setting them to ARCH_DMA_MINALIGN - 1)
 does not seem to guarantee any alignment.
 OTOH, the xxx_walk_xxx API seems to check these misalignment issues and
 allocate new buffers to prevent such cases, but AFAICT, this is not
 automatically done by the crypto framework, and it's the
 responsibility of each driver to ensure the correct alignment.
 Herbert, could you confirm that?

Correct.  The crypto API does provide limited help for things like
the key and the tfm context, but as far as the input/output data is
concerned it is up to the driver to deal with them.

Note that I don't think we have a suitable helper for DMA-aligned
processing of SG lists.  The ablkcipher_walk_* interface is mostly
identical to the blkcipher_walk_* interface.

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-05 Thread Herbert Xu
On Sat, Jul 04, 2015 at 03:33:55PM +0800, Herbert Xu wrote:
 Note that I don't think we have a suitable helper for DMA-aligned
 processing of SG lists.  The ablkcipher_walk_* interface is mostly
 identical to the blkcipher_walk_* interface.

Actually I take that back.  The ablkcipher_walk_* should work
even if it may be a bit unwieldy.  Patches to improve it are of
course welcome :)
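
In case it helps, a walk with that interface looks roughly like the
sketch below inside the driver's encrypt/decrypt handler (untested,
based on the ablkcipher_walk helpers in include/crypto/algapi.h;
error handling and the actual engine programming omitted):

        struct ablkcipher_walk walk;
        int err;

        /* Walk src/dst in page + offset chunks that the driver can
         * then DMA-map itself. */
        ablkcipher_walk_init(&walk, req->dst, req->src, req->nbytes);
        err = ablkcipher_walk_phys(req, &walk);

        while (!err && walk.nbytes) {
                /* walk.src.page/offset and walk.dst.page/offset
                 * describe the current chunk. */
                err = ablkcipher_walk_done(req, &walk, 0);
        }

        /* Copies back/frees any bounce buffers the walk allocated
         * for misaligned chunks. */
        ablkcipher_walk_complete(&walk);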

Cheers,
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
Hi Russell,

On Fri, 3 Jul 2015 14:10:59 +0100
Russell King - ARM Linux li...@arm.linux.org.uk wrote:

 BTW, off-topic for this thread... but I notice from Mark Brown's builder
 that mv_cesa is causing build errors in mainline now:
 
 arm-allmodconfig
 ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]
 
 arm-multi_v5_defconfig
 ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]
 
 It seems it was fine on July 2nd, but the above was introduced today.

It has been fixed by Stephen in linux-next (see [1]).

 
 A few other things I notice when looking at this code:
 
 /* Not all platforms can gate the clock, so it is not
    an error if the clock does not exists. */
 cp->clk = clk_get(&pdev->dev, NULL);
 if (!IS_ERR(cp->clk))
         clk_prepare_enable(cp->clk);
 
 So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
 missing?  Is that really the behaviour you want there?

Nope, I'll fix that.
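
Something along these lines I guess (sketch only, error-path cleanup
omitted):

        cp->clk = clk_get(&pdev->dev, NULL);
        if (PTR_ERR(cp->clk) == -EPROBE_DEFER)
                return -EPROBE_DEFER;   /* clock exists but isn't ready yet */
        /* Not all platforms can gate the clock, so any other error
         * just means there is no clock to handle. */
        if (!IS_ERR(cp->clk))
                clk_prepare_enable(cp->clk);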

 
 ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev), cp);
 
 What happens if crypto_int() is called when request_irq() unlocks its
 spinlock, before the clock has been found?  Eg, because we're booting
 from a kexec'd kernel.

I don't know, and I agree that we should either request the clock
before registering the irq, or disable the IRQ using the
SEC_ACCEL_INT_MASK before calling request_irq. Ideally we should do
both.
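
For the masking part, something as simple as this before request_irq()
should do (sketch, reusing the driver's existing SEC_ACCEL_INT_MASK
define and crypto_priv layout):

        /* Mask all engine interrupts before installing the handler,
         * e.g. when booting from a kexec'd kernel that left the
         * engine enabled. */
        writel(0, cp->reg + SEC_ACCEL_INT_MASK);

        ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev), cp);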

Note that none of my patches changed the clk_get(), request_irq() calls
order, or the clk_get() error path ;-).

Anyway, I'll propose a patch fixing the problem.

Thanks for the feedback,

Boris

[1]https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/mv_cesa.c?id=7e6ce3db5329a32c83da9753bae162eb3f22dfca

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
On Fri, 3 Jul 2015 10:58:07 +0100
Russell King - ARM Linux li...@arm.linux.org.uk wrote:

 On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote:
  Which led us to think that this could be related to a non cache-line
  aligned buffer problem: if we share the cache line with someone
  modifying its data during the DMA transfer, we could experience data
  loss when the CPU decides to flush the data from this cache line to the
  memory.
 
 There's really only two options when you have an overlapping cache line
 and the cache line contains CPU-dirty data, but the memory contains
 DMA-dirty data.  One of those dirty data has to win out, and the other
 has to be destroyed.  (Either you invalidate the CPU cache line, losing
 the CPU-dirty data, or you write-back the cache line, overwriting the DMA-
 dirty data.)
 
 What that means is that it's a _bug_ to end up in this situation, nothing
 more, nothing less.  There's nothing that arch code can do about it, it's
 a DMA-API user bug.
 
 What you'll find is that this is well documented.  See
 Documentation/DMA-API.txt, part Id (Streaming DMA mappings):
 
   Warnings:  Memory coherency operates at a granularity called the cache
   line width.  In order for memory mapped by this API to operate
   correctly, the mapped region must begin exactly on a cache line
   boundary and end exactly on one (to prevent two separately mapped
   regions from sharing a single cache line).  Since the cache line size
   may not be known at compile time, the API will not enforce this
   requirement.  Therefore, it is recommended that driver writers who
   don't take special care to determine the cache line size at run time
   only map virtual regions that begin and end on page boundaries (which
   are guaranteed also to be cache line boundaries).
 
 The behaviour you're seeing here is not ARM specific - any CPU which
 uses cache lines and doesn't have hardware coherency will suffer from
 this problem.
 

That's pretty much the answer I was expecting :-).
I'll wait for Herbert's feedback to know how this should be fixed (I
guess using the alg_walk_xxx functions is the right thing to do, but
I'd like to be sure).

Thanks,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Russell King - ARM Linux
BTW, off-topic for this thread... but I notice from Mark Brown's builder
that mv_cesa is causing build errors in mainline now:

arm-allmodconfig
../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]

arm-multi_v5_defconfig
../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]

It seems it was fine on July 2nd, but the above was introduced today.

A few other things I notice when looking at this code:

/* Not all platforms can gate the clock, so it is not
   an error if the clock does not exists. */
cp->clk = clk_get(&pdev->dev, NULL);
if (!IS_ERR(cp->clk))
        clk_prepare_enable(cp->clk);

So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
missing?  Is that really the behaviour you want there?

ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev), cp);

What happens if crypto_int() is called when request_irq() unlocks its
spinlock, before the clock has been found?  Eg, because we're booting
from a kexec'd kernel.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Andrew Lunn
 A few other things I notice when looking at this code:
 
 /* Not all platforms can gate the clock, so it is not
    an error if the clock does not exists. */
 cp->clk = clk_get(&pdev->dev, NULL);
 if (!IS_ERR(cp->clk))
         clk_prepare_enable(cp->clk);
 
 So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
 missing?  Is that really the behaviour you want there?

Hi Russell

I probably added that to the older driver and it got copied into the
newer one.

The clock it is using is added to the system using CLK_OF_DECLARE().
This results in an entry in the __clk_of_table table. That table is
walked in of_clk_init(), which is called by time_init() in
arch/arm/kernel/time.c, early in the boot.
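
Roughly this shape (generic illustration, not the actual mvebu clock
driver):

        static void __init foo_clk_init(struct device_node *np)
        {
                /* parse the node and register the clock(s) it describes */
        }
        CLK_OF_DECLARE(foo_clk, "vendor,foo-core-clock", foo_clk_init);

so the clock provider is registered long before any platform driver
probes.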

If we get -EPROBE_DEFER, something is seriously wrong.

   Andrew


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Russell King - ARM Linux
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote:
 Which led us to think that this could be related to a non cache-line
 aligned buffer problem: if we share the cache line with someone
 modifying its data during the DMA transfer, we could experience data
 loss when the CPU decides to flush the data from this cache line to the
 memory.

There's really only two options when you have an overlapping cache line
and the cache line contains CPU-dirty data, but the memory contains
DMA-dirty data.  One of those dirty data has to win out, and the other
has to be destroyed.  (Either you invalidate the CPU cache line, losing
the CPU-dirty data, or you write-back the cache line, overwriting the DMA-
dirty data.)

What that means is that it's a _bug_ to end up in this situation, nothing
more, nothing less.  There's nothing that arch code can do about it, it's
a DMA-API user bug.

What you'll find is that this is well documented.  See
Documentation/DMA-API.txt, part Id (Streaming DMA mappings):

  Warnings:  Memory coherency operates at a granularity called the cache
  line width.  In order for memory mapped by this API to operate
  correctly, the mapped region must begin exactly on a cache line
  boundary and end exactly on one (to prevent two separately mapped
  regions from sharing a single cache line).  Since the cache line size
  may not be known at compile time, the API will not enforce this
  requirement.  Therefore, it is recommended that driver writers who
  don't take special care to determine the cache line size at run time
  only map virtual regions that begin and end on page boundaries (which
  are guaranteed also to be cache line boundaries).

The behaviour you're seeing here is not ARM specific - any CPU which
uses cache lines and doesn't have hardware coherency will suffer from
this problem.
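
In other words, a driver that can be handed arbitrary buffers has to
check this itself before mapping, something like (illustration only):

        if (!IS_ALIGNED((unsigned long)buf, dma_get_cache_alignment()) ||
            !IS_ALIGNED(len, dma_get_cache_alignment())) {
                /* not safe to map as-is: bounce via a kmalloc()'d
                 * buffer, which is guaranteed to be DMA-aligned */
        }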

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
Hello, 

We've been facing a bug with the new CESA driver on Armada 370 for a
couple of days.

Arnaud found out that this bug appeared when we fixed the
req->src == req->dst case (by passing DMA_BIDIRECTIONAL when mapping the
sg list instead of mapping the same sg list twice, once with the
TO_DEVICE option and the second time with the FROM_DEVICE option).

After further investigations, we found out that invalidating the cache
(in addition to the clean operation already done when mapping the
region for DMA with the BIDIR option) was somehow fixing the problem.

Which led us to think that this could be related to a non cache-line
aligned buffer problem: if we share the cache line with someone
modifying its data during the DMA transfer, we could experience data
loss when the CPU decides to flush the data from this cache line to the
memory.
Russell, Will, could you confirm this is an issue that could happen?

To validate this assumption we decided to test the alignment of the
src and dst sg list entries passed by the crypto framework, and most of
them are indeed not aligned on ARCH_DMA_MINALIGN. We made a dirty hack
to copy the content of non-aligned sg lists into new ones containing a
single aligned buffer (allocated with kmalloc), and it seems to solve
the problem.
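
For reference, the hack is essentially of this shape (simplified,
hypothetical names, destination path and error handling omitted):

        /* Bounce the possibly misaligned source SG list into a single
         * kmalloc'd buffer; on ARM, kmalloc memory is aligned to
         * ARCH_DMA_MINALIGN and therefore cache-line safe. */
        struct scatterlist bounce_sg;
        unsigned int len = req->nbytes;
        void *buf = kmalloc(ALIGN(len, ARCH_DMA_MINALIGN), GFP_KERNEL);

        if (!buf)
                return -ENOMEM;

        sg_copy_to_buffer(req->src, sg_nents(req->src), buf, len);
        sg_init_one(&bounce_sg, buf, len);
        /* ... DMA-map and process &bounce_sg instead of req->src ... */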

We also noticed that the cra_alignmask fields were all set to 0 in the
CESA driver, but modifying them (setting them to ARCH_DMA_MINALIGN - 1)
does not seem to guarantee any alignment.
OTOH, the xxx_walk_xxx API seems to check these misalignment issues and
allocate new buffers to prevent such cases, but AFAICT, this is not
automatically done by the crypto framework, and it's the
responsibility of each driver to ensure the correct alignment.
Herbert, could you confirm that?

Do you see anything else we should check?

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com