Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
Hi Russell,

On Fri, 3 Jul 2015 14:10:59 +0100
Russell King - ARM Linux li...@arm.linux.org.uk wrote:

 BTW, off-topic for this thread... but I notice from Mark Brown's builder
 that mv_cesa is causing build errors in mainline now:
 
 arm-allmodconfig
 ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 
 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]
 
 arm-multi_v5_defconfig
 ../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 
 'of_get_named_gen_pool' [-Werror=implicit-function-declaration]
 
 It seems it was fine on July 2nd, but the above was introduced today.

It has been fixed by Stephen in linux-next (see [1]).

 
 A few other things I notice when looking at this code:
 
 /* Not all platforms can gate the clock, so it is not
    an error if the clock does not exists. */
 cp->clk = clk_get(&pdev->dev, NULL);
 if (!IS_ERR(cp->clk))
 	clk_prepare_enable(cp->clk);
 
 So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
 missing?  Is that really the behaviour you want there?

Nope, I'll fix that.

 
 ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev),
 		  cp);
 
 What happens if crypto_int() is called when request_irq() unlocks its
 spinlock, before the clock has been found?  Eg, because we're booting
 from a kexec'd kernel.

I don't know, and I agree that we should either request the clock
before registering the irq, or disable the IRQ using the
SEC_ACCEL_INT_MASK before calling request_irq. Ideally we should do
both.
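A rough sketch of that ordering (untested; the register names follow my reading of the existing mv_cesa driver, and the cp->reg field is assumed):

```c
/* Sketch only: quiesce the engine before installing the handler, so an
 * interrupt left pending by a kexec'd kernel cannot fire before the
 * clock and the rest of the driver state are set up. */
writel(0, cp->reg + SEC_ACCEL_INT_MASK);	/* mask all engine IRQs */
writel(0, cp->reg + SEC_ACCEL_INT_STATUS);	/* ack anything pending */

ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev), cp);
```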

Note that none of my patches changed the clk_get(), request_irq() calls
order, or the clk_get() error path ;-).

Anyway, I'll propose a patch fixing the problem.

Thanks for the feedback,

Boris

[1]https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/crypto/mv_cesa.c?id=7e6ce3db5329a32c83da9753bae162eb3f22dfca

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line unsubscribe linux-crypto in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
On Fri, 3 Jul 2015 10:58:07 +0100
Russell King - ARM Linux li...@arm.linux.org.uk wrote:

 On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote:
  Which led us to think that this could be related to a non cache-line
  aligned buffer problem: if we share the cache line with someone
  modifying its data during the DMA transfer, we could experience data
  loss when the cpu decides to flush the data from this cache line to the
  memory.
 
 There's really only two options when you have an overlapping cache line
 and the cache line contains CPU-dirty data, but the memory contains
 DMA-dirty data.  One of those dirty data has to win out, and the other
 has to be destroyed.  (Either you invalidate the CPU cache line, losing
 the CPU-dirty data, or you write-back the cache line, overwriting the DMA-
 dirty data.)
 
 What that means is that it's a _bug_ to end up in this situation, nothing
 more, nothing less.  There's nothing that arch code can do about it, it's
 a DMA-API user bug.
 
 What you'll find is that this is well documented.  See
 Documentation/DMA-API.txt, part Id (Streaming DMA mappings):
 
   Warnings:  Memory coherency operates at a granularity called the cache
   line width.  In order for memory mapped by this API to operate
   correctly, the mapped region must begin exactly on a cache line
   boundary and end exactly on one (to prevent two separately mapped
   regions from sharing a single cache line).  Since the cache line size
   may not be known at compile time, the API will not enforce this
   requirement.  Therefore, it is recommended that driver writers who
   don't take special care to determine the cache line size at run time
   only map virtual regions that begin and end on page boundaries (which
   are guaranteed also to be cache line boundaries).
 
 The behaviour you're seeing here is not ARM specific - any CPU which
 uses cache lines and doesn't have hardware coherency will suffer from
 this problem.
 

That's pretty much the answer I was expecting :-).
I'll wait for Herbert's feedback to know how this should be fixed (I
guess using the alg_walk_xxx functions is the right thing to do, but
I'd like to be sure).

Thanks,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Russell King - ARM Linux
BTW, off-topic for this thread... but I notice from Mark Brown's builder
that mv_cesa is causing build errors in mainline now:

arm-allmodconfig
../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 
'of_get_named_gen_pool' [-Werror=implicit-function-declaration]

arm-multi_v5_defconfig
../drivers/crypto/mv_cesa.c:1037:2: error: implicit declaration of function 
'of_get_named_gen_pool' [-Werror=implicit-function-declaration]

It seems it was fine on July 2nd, but the above was introduced today.

A few other things I notice when looking at this code:

/* Not all platforms can gate the clock, so it is not
   an error if the clock does not exists. */
cp->clk = clk_get(&pdev->dev, NULL);
if (!IS_ERR(cp->clk))
	clk_prepare_enable(cp->clk);

So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
missing?  Is that really the behaviour you want there?

ret = request_irq(irq, crypto_int, 0, dev_name(&pdev->dev),
		  cp);

What happens if crypto_int() is called when request_irq() unlocks its
spinlock, before the clock has been found?  Eg, because we're booting
from a kexec'd kernel.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Andrew Lunn
 A few other things I notice when looking at this code:
 
 /* Not all platforms can gate the clock, so it is not
an error if the clock does not exists. */
 cp->clk = clk_get(&pdev->dev, NULL);
 if (!IS_ERR(cp->clk))
 	clk_prepare_enable(cp->clk);
 
 So, if clk_get() returns ERR_PTR(-EPROBE_DEFER) we treat that clock as
 missing?  Is that really the behaviour you want there?

Hi Russell

I probably added that to the older driver and it got copied into the
newer one.

The clock it is using is added to the system using CLK_OF_DECLARE().
This results in an entry in the __clk_of_table table. That table is
walked in of_clk_init(), which is called by time_init() in
arch/arm/kernel/time.c, early in the boot.

If we get -EPROBE_DEFER, something is seriously wrong.

   Andrew


Re: [PATCH 5/6] [RFC] crypto/testmgr: add null test for 842 algorithm

2015-07-03 Thread Stephan Mueller
On Thursday, 2 July 2015 at 15:41:19, Nishanth Aravamudan wrote:

Hi Nishanth,

Currently, when the nx-842-pseries driver loads, the following message
is emitted:

alg: No test for 842 (842-nx)

It seems like the simplest way to fix this message (other than adding a
proper test) is to just insert the null test into the list in the
testmgr.

Signed-off-by: Nishanth Aravamudan n...@linux.vnet.ibm.com

---
 crypto/testmgr.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index d0a42bd3aae9..ff0f76e0d0b4 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1982,6 +1982,9 @@ static int alg_test_null(const struct alg_test_desc *desc,
 /* Please keep this list sorted by algorithm name. */
 static const struct alg_test_desc alg_test_descs[] = {
 	{
+		.alg = "842",
+		.test = alg_test_null,
+	}, {
 		.alg = "__cbc-cast5-avx",
 		.test = alg_test_null,

As this is a compression algo, it is safe to add .fips_allowed = 1 here, as
otherwise the algo is not available when booting with fips=1.

   }, {
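The resulting entry would then look something like this (a sketch, following the struct layout in the hunk above):

```c
	{
		.alg = "842",
		.test = alg_test_null,
		.fips_allowed = 1,
	}, {
```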


Ciao
Stephan


Re: crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Russell King - ARM Linux
On Fri, Jul 03, 2015 at 11:43:05AM +0200, Boris Brezillon wrote:
 Which led us to think that this could be related to a non cache-line
 aligned buffer problem: if we share the cache line with someone
 modifying its data during the DMA transfer, we could experience data
 loss when the cpu decides to flush the data from this cache line to the
 memory.

There's really only two options when you have an overlapping cache line
and the cache line contains CPU-dirty data, but the memory contains
DMA-dirty data.  One of those dirty data has to win out, and the other
has to be destroyed.  (Either you invalidate the CPU cache line, losing
the CPU-dirty data, or you write-back the cache line, overwriting the DMA-
dirty data.)

What that means is that it's a _bug_ to end up in this situation, nothing
more, nothing less.  There's nothing that arch code can do about it, it's
a DMA-API user bug.

What you'll find is that this is well documented.  See
Documentation/DMA-API.txt, part Id (Streaming DMA mappings):

  Warnings:  Memory coherency operates at a granularity called the cache
  line width.  In order for memory mapped by this API to operate
  correctly, the mapped region must begin exactly on a cache line
  boundary and end exactly on one (to prevent two separately mapped
  regions from sharing a single cache line).  Since the cache line size
  may not be known at compile time, the API will not enforce this
  requirement.  Therefore, it is recommended that driver writers who
  don't take special care to determine the cache line size at run time
  only map virtual regions that begin and end on page boundaries (which
  are guaranteed also to be cache line boundaries).

The behaviour you're seeing here is not ARM specific - any CPU which
uses cache lines and doesn't have hardware coherency will suffer from
this problem.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.


crypto: marvell/CESA: Issues with non cache-line aligned buffers

2015-07-03 Thread Boris Brezillon
Hello, 

We've been facing a bug with the new CESA driver on Armada 370 for a
couple of days.

Arnaud found out that this bug appeared when we fixed the
req->src == req->dst case (by passing DMA_BIDIRECTIONAL when mapping the
sg list instead of mapping the same sg list twice, once with the
TO_DEVICE option and the second time with the FROM_DEVICE option).

After further investigations, we found out that invalidating the cache
(in addition to the clean operation already done when mapping the
region for DMA with the BIDIR option) was somehow fixing the problem.

Which led us to think that this could be related to a non cache-line
aligned buffer problem: if we share the cache line with someone
modifying its data during the DMA transfer, we could experience data
loss when the cpu decides to flush the data from this cache line to the
memory.
Russell, Will, could you confirm this is an issue that could happen?

To validate this assumption we decided to test the alignment of the
src and dst sg list entries passed by the crypto framework, and most of
them are indeed not aligned on ARCH_DMA_MINALIGN. We made a dirty hack
to copy the content of non-aligned sg lists into new ones containing a
single aligned buffer (allocated with kmalloc) and it seems to solve
the problem.

We also noticed that the cra_alignmask fields were all set to 0 in the
CESA driver, but modifying them (setting them to ARCH_DMA_MINALIGN - 1)
does not seem to guarantee any alignment.
OTOH, the xxx_walk_xxx APIs seem to check for these misalignment issues
and allocate new buffers to prevent such cases, but AFAICT this is not
done automatically by the crypto framework, and it's the responsibility
of each driver to ensure the correct alignment.
Herbert, could you confirm that?

Do you see anything else we should check?

Best Regards,

Boris

-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com