Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 07:48, Ard Biesheuvel  wrote:
> On 2 February 2017 at 07:38, Eric Biggers  wrote:
>> Hi Ard,
>>
>> On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote:
>>>
>>> Note that this only implements AES encryption, which is all we need
>>> for CTR and CBC-MAC. AES decryption can easily be implemented in a
>>> similar way, but is significantly more costly.
>>
>> Is the expectation of decryption being more costly the only reason why
>> "aes-ti" couldn't be a "cipher" algorithm, allowing it to automatically be
>> used by the existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.?
>
> Yes.
>
>> It doesn't seem to do anything expensive on a per-block basis like loading
>> SSE registers, so it seems it would fit better as a "cipher" algorithm if
>> at all possible.  Then there would be no need to implement all these modes
>> yet again.
>>
>
> True.
>
>> Also, what would be the feasibility of simply replacing aes-generic with the
>> time-invariant implementation, rather than offering two implementations and
>> requiring users to choose one, usually without the needed expertise?
>>
>
> Well, it is a policy decision whether you want the best performance or
> reduced correlation between timing and the input, so there is no way
> to make everybody happy by replacing one with the other. But I can
> certainly implement it as a cipher, and we can take the discussion
> from there.
>
>>> +
>>> +/*
>>> + * Emit the sbox as __weak with external linkage to prevent the compiler
>>> + * from doing constant folding on sbox references involving fixed indexes.
>>> + */
>>> +__weak const u8 __cacheline_aligned __aesti_sbox[] = {
>>
>> Did you consider marking it 'volatile' instead?
>>
>
> I did not, and I expect the result to be the same. I can replace it if
> it matters.
>
>>> +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key,
>>> +  unsigned int key_len)
>>> +{
>>> + struct crypto_aes_ctx rk;
>>> + int err;
>>> +
>>> + err = crypto_aes_expand_key(&rk, in_key, key_len);
>>> + if (err)
>>> + return err;
>>
>> crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think
>> that's guaranteed here.
>>
>
> No, I don't think so. AFAICT it expects the round keys to be u32
> aligned, which is guaranteed due to the fact that struct
> crypto_aes_ctx encapsulates an array of u32
>

I stand corrected: I misread le32_to_cpu() for get_unaligned_le32()
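
(For anyone following along: the distinction matters because le32_to_cpu()
on a dereferenced u32 pointer requires that pointer to be naturally aligned,
whereas get_unaligned_le32() is safe at any byte address. A minimal sketch
with a deliberately misaligned, hypothetical buffer:)

	u8 buf[20] = { 0 };
	const u8 *key = buf + 1;	/* not 4-byte aligned */

	/* The cast-and-dereference is the unaligned access; le32_to_cpu()
	 * itself is only a byte swap. This can fault or trap on
	 * strict-alignment CPUs.
	 */
	u32 a = le32_to_cpu(*(const __le32 *)key);

	/* Safe at any alignment. */
	u32 b = get_unaligned_le32(key);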


Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 06:47, Eric Biggers  wrote:
> On Mon, Jan 30, 2017 at 02:11:29PM +, Ard Biesheuvel wrote:
>> Instead of unconditionally forcing 4 byte alignment for all generic
>> chaining modes that rely on crypto_xor() or crypto_inc() (which may
>> result in unnecessary copying of data when the underlying hardware
>> can perform unaligned accesses efficiently), make those functions
>> deal with unaligned input explicitly, but only if the Kconfig symbol
>> HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
>> the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.
>>
>> For crypto_inc(), this simply involves making the 4-byte stride
>> conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
>> it typically operates on 16 byte buffers.
>>
>> For crypto_xor(), an algorithm is implemented that simply runs through
>> the input using the largest strides possible if unaligned accesses are
>> allowed. If they are not, an optimal sequence of memory accesses is
>> emitted that takes the relative alignment of the input buffers into
>> account, e.g., if the relative misalignment of dst and src is 4 bytes,
>> the entire xor operation will be completed using 4 byte loads and stores
>> (modulo unaligned bits at the start and end). Note that all expressions
>> involving startalign and misalign are simply eliminated by the compiler
>> if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
>>
>
> Hi Ard,
>
> This is a good idea, and I think it was error-prone to be requiring 4-byte
> alignment always, and also inefficient on many architectures.
>
> The new crypto_inc() looks fine, but the new crypto_xor() is quite
> complicated. I'm wondering whether it has to be that way, especially since
> it seems to most commonly be used on very small input buffers, e.g. 8 or
> 16-byte blocks.  There are a couple trivial ways it could be simplified,
> e.g. using 'dst' and 'src' directly instead of 'a' and 'b' (which also
> seems to improve code generation by getting rid of the '+= len & ~mask'
> parts), or using sizeof(long) directly instead of 'size' and 'mask'.
>
> But also when I tried testing the proposed crypto_xor() on MIPS, it didn't
> work correctly on a misaligned buffer.  With startalign=1, it did one
> iteration of the following loop and then exited with startalign=0 and
> entered the "unsigned long at a time" loop, which is incorrect since at
> that point the buffers were not yet fully aligned:

Right. I knew it was convoluted but I thought that was justified by
its correctness :-)

>>   do {
>>   if (len < sizeof(u8))
>>   break;
>>
>>   if (len >= size && !(startalign & 1) && !(misalign & 1))
>>   break;
>>
>>   *dst++ ^= *src++;
>>   len -= sizeof(u8);
>>   startalign &= ~sizeof(u8);
>>   } while (misalign & 1);
>
> I think it would need to do instead:
>
> startalign += sizeof(u8);
> startalign %= sizeof(unsigned long);
>
> But I am wondering whether you considered something simpler, using the
> get_unaligned/put_unaligned helpers, maybe even using a switch statement
> for the last (sizeof(long) - 1) bytes so it can be compiled as a jump
> table.  Something like this:
>
> #define xor_unaligned(dst, src) \
> put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst))
>
> void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
> {
> while (len >= sizeof(unsigned long)) {
> xor_unaligned((unsigned long *)dst, (unsigned long *)src);
> dst += sizeof(unsigned long);
> src += sizeof(unsigned long);
> len -= sizeof(unsigned long);
> }
>
> switch (len) {
> #ifdef CONFIG_64BIT
> case 7:
> dst[6] ^= src[6];
> /* fall through */
> case 6:
> xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]);
> goto len_4;
> case 5:
> dst[4] ^= src[4];
> /* fall through */
> case 4:
> len_4:
> xor_unaligned((u32 *)dst, (u32 *)src);
> break;
> #endif
> case 3:
> dst[2] ^= src[2];
> /* fall through */
> case 2:
> xor_unaligned((u16 *)dst, (u16 *)src);
> break;
> case 1:
> dst[0] ^= src[0];
> break;
> }
> }
>
> That would seem like a better choice for small buffers, which seems to be
> the more common case.  It should generate slightly faster code on
> architectures with fast unaligned access like x86_64, while still being
> sufficient on architectures without --- perhaps even faster, since it
> wouldn't have as many branches.
>

Well, what I tried to deal with explicitly is misaligned dst and

Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-02-01 Thread Ard Biesheuvel
On 2 February 2017 at 07:38, Eric Biggers  wrote:
> Hi Ard,
>
> On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote:
>>
>> Note that this only implements AES encryption, which is all we need
>> for CTR and CBC-MAC. AES decryption can easily be implemented in a
>> similar way, but is significantly more costly.
>
> Is the expectation of decryption being more costly the only reason why
> "aes-ti" couldn't be a "cipher" algorithm, allowing it to automatically be
> used by the existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.?

Yes.

> It doesn't seem to do anything expensive on a per-block basis like loading
> SSE registers, so it seems it would fit better as a "cipher" algorithm if
> at all possible.  Then there would be no need to implement all these modes
> yet again.
>

True.

> Also, what would be the feasibility of simply replacing aes-generic with the
> time-invariant implementation, rather than offering two implementations and
> requiring users to choose one, usually without the needed expertise?
>

Well, it is a policy decision whether you want the best performance or
reduced correlation between timing and the input, so there is no way
to make everybody happy by replacing one with the other. But I can
certainly implement it as a cipher, and we can take the discussion
from there.

>> +
>> +/*
>> + * Emit the sbox as __weak with external linkage to prevent the compiler
>> + * from doing constant folding on sbox references involving fixed indexes.
>> + */
>> +__weak const u8 __cacheline_aligned __aesti_sbox[] = {
>
> Did you consider marking it 'volatile' instead?
>

I did not, and I expect the result to be the same. I can replace it if
it matters.

>> +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key,
>> +  unsigned int key_len)
>> +{
>> + struct crypto_aes_ctx rk;
>> + int err;
>> +
>> + err = crypto_aes_expand_key(&rk, in_key, key_len);
>> + if (err)
>> + return err;
>
> crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think
> that's guaranteed here.
>

No, I don't think so. AFAICT it expects the round keys to be u32
aligned, which is guaranteed due to the fact that struct
crypto_aes_ctx encapsulates an array of u32

Regards,
Ard.


Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM

2017-02-01 Thread Eric Biggers
Hi Ard,

On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote:
> 
> Note that this only implements AES encryption, which is all we need
> for CTR and CBC-MAC. AES decryption can easily be implemented in a
> similar way, but is significantly more costly.

Is the expectation of decryption being more costly the only reason why "aes-ti"
couldn't be a "cipher" algorithm, allowing it to automatically be used by the
existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.?  It doesn't seem
to do anything expensive on a per-block basis like loading SSE registers, so it
seems it would fit better as a "cipher" algorithm if at all possible.  Then
there would be no need to implement all these modes yet again.

Also, what would be the feasibility of simply replacing aes-generic with the
time-invariant implementation, rather than offering two implementations and
requiring users to choose one, usually without the needed expertise?

> +
> +/*
> + * Emit the sbox as __weak with external linkage to prevent the compiler
> + * from doing constant folding on sbox references involving fixed indexes.
> + */
> +__weak const u8 __cacheline_aligned __aesti_sbox[] = {

Did you consider marking it 'volatile' instead?

> +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key,
> +  unsigned int key_len)
> +{
> + struct crypto_aes_ctx rk;
> + int err;
> +
> + err = crypto_aes_expand_key(&rk, in_key, key_len);
> + if (err)
> + return err;

crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think
that's guaranteed here.

Eric


Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic

2017-02-01 Thread Eric Biggers
On Mon, Jan 30, 2017 at 02:11:29PM +, Ard Biesheuvel wrote:
> Instead of unconditionally forcing 4 byte alignment for all generic
> chaining modes that rely on crypto_xor() or crypto_inc() (which may
> result in unnecessary copying of data when the underlying hardware
> can perform unaligned accesses efficiently), make those functions
> deal with unaligned input explicitly, but only if the Kconfig symbol
> HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
> the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.
> 
> For crypto_inc(), this simply involves making the 4-byte stride
> conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
> it typically operates on 16 byte buffers.
> 
> For crypto_xor(), an algorithm is implemented that simply runs through
> the input using the largest strides possible if unaligned accesses are
> allowed. If they are not, an optimal sequence of memory accesses is
> emitted that takes the relative alignment of the input buffers into
> account, e.g., if the relative misalignment of dst and src is 4 bytes,
> the entire xor operation will be completed using 4 byte loads and stores
> (modulo unaligned bits at the start and end). Note that all expressions
> involving startalign and misalign are simply eliminated by the compiler
> if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
> 

Hi Ard,

This is a good idea, and I think it was error-prone to be requiring 4-byte
alignment always, and also inefficient on many architectures.

The new crypto_inc() looks fine, but the new crypto_xor() is quite complicated.
I'm wondering whether it has to be that way, especially since it seems to most
commonly be used on very small input buffers, e.g. 8 or 16-byte blocks.  There
are a couple trivial ways it could be simplified, e.g. using 'dst' and 'src'
directly instead of 'a' and 'b' (which also seems to improve code generation by
getting rid of the '+= len & ~mask' parts), or using sizeof(long) directly
instead of 'size' and 'mask'.

But also when I tried testing the proposed crypto_xor() on MIPS, it didn't work
correctly on a misaligned buffer.  With startalign=1, it did one iteration of
the following loop and then exited with startalign=0 and entered the "unsigned
long at a time" loop, which is incorrect since at that point the buffers were
not yet fully aligned:

>   do {
>   if (len < sizeof(u8))
>   break;
>
>   if (len >= size && !(startalign & 1) && !(misalign & 1))
>   break;
>
>   *dst++ ^= *src++;
>   len -= sizeof(u8);
>   startalign &= ~sizeof(u8);
>   } while (misalign & 1);

I think it would need to do instead:

startalign += sizeof(u8);
startalign %= sizeof(unsigned long);
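
Applied to the loop quoted above, the correction would slot in like this
(an illustrative sketch against the RFC code, not a tested patch):

	do {
		if (len < sizeof(u8))
			break;

		if (len >= size && !(startalign & 1) && !(misalign & 1))
			break;

		*dst++ ^= *src++;
		len -= sizeof(u8);
		startalign += sizeof(u8);	/* was: startalign &= ~sizeof(u8) */
		startalign %= sizeof(unsigned long);
	} while (misalign & 1);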

But I am wondering whether you considered something simpler, using the
get_unaligned/put_unaligned helpers, maybe even using a switch statement for the
last (sizeof(long) - 1) bytes so it can be compiled as a jump table.  Something
like this:

#define xor_unaligned(dst, src) \
put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst))

void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
{
while (len >= sizeof(unsigned long)) {
xor_unaligned((unsigned long *)dst, (unsigned long *)src);
dst += sizeof(unsigned long);
src += sizeof(unsigned long);
len -= sizeof(unsigned long);
}

switch (len) {
#ifdef CONFIG_64BIT
case 7:
dst[6] ^= src[6];
/* fall through */
case 6:
xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]);
goto len_4;
case 5:
dst[4] ^= src[4];
/* fall through */
case 4:
len_4:
xor_unaligned((u32 *)dst, (u32 *)src);
break;
#endif
case 3:
dst[2] ^= src[2];
/* fall through */
case 2:
xor_unaligned((u16 *)dst, (u16 *)src);
break;
case 1:
dst[0] ^= src[0];
break;
}
}

That would seem like a better choice for small buffers, which seems to be the
more common case.  It should generate slightly faster code on architectures with
fast unaligned access like x86_64, while still being sufficient on architectures
without --- perhaps even faster, since it wouldn't have as many branches.

Eric


Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

2017-02-01 Thread Dan Williams
On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel  wrote:
> The DMAENGINE framework assumes that if PQ offload is supported by a
> DMA device then all 256 PQ coefficients are supported. This assumption
> does not hold anymore because we now have the BCM-SBA-RAID offload engine
> which supports PQ offload with a limited number of PQ coefficients.
>
> This patch extends async_tx APIs to handle DMA devices with support
> for fewer PQ coefficients.
>
> Signed-off-by: Anup Patel 
> Reviewed-by: Scott Branden 
> ---
>  crypto/async_tx/async_pq.c  |  3 +++
>  crypto/async_tx/async_raid6_recov.c | 12 ++--
>  include/linux/dmaengine.h   | 19 +++
>  include/linux/raid/pq.h |  3 +++
>  4 files changed, 35 insertions(+), 2 deletions(-)

So, I hate the way async_tx does these checks on each operation, and
it's ok for me to say that because it's my fault. Really it's md that
should be validating engine offload capabilities once at the beginning
of time. I'd rather we move in that direction than continue to pile
onto a bad design.


Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-02-01 Thread Herbert Xu
On Wed, Feb 01, 2017 at 08:08:09PM +, Ard Biesheuvel wrote:
>
> Could you please forward this patch to Linus as well? I noticed that the patch

Sure, I will do that.

> crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
> 
> is now in mainline, which means CCM is now broken on arm64, given that
> the iv_out requirement for CTR apparently isn't honored by *any*
> implementation, and CCM wrongly assumes that req->iv retains its value
> across the call into the CTR skcipher.

Hmm, I wonder why we don't see this breakage with the generic
CTR as it seems to do exactly the same thing.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: 4.10 aesni-intel no longer having lrw/ablk_helper dependencies?

2017-02-01 Thread Herbert Xu
On Wed, Feb 01, 2017 at 05:08:03PM +0100, Arkadiusz Miśkiewicz wrote:
> 
> q: Will later loading of pcbc (so intel-aesni loaded from initrd, no pcbc
> available, rootfs gets mounted; pcbc is loaded) enable its "functionality"
> for intel-aesni just like it would be available at intel-aesni load time?

No it won't but it's no big deal as pcbc(aes) isn't actually used
anywhere so it's really just an example of using fpu.  AFAIK pcbc
is only ever used with fcrypt.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH 4/6] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()

2017-02-01 Thread Anup Patel
The DMA_PREP_FENCE flag is to be used when preparing a Tx descriptor
whose output is to be consumed by the next/dependent Tx descriptor.

DMA_PREP_FENCE will not be set correctly by do_async_gen_syndrome()
when calling dma->device_prep_dma_pq() under the following conditions:
1. ASYNC_TX_FENCE is not set in submit->flags
2. DMA_PREP_FENCE is not set in dma_flags
3. src_cnt (= disks - 2) is greater than dma_maxpq(dma, dma_flags)

This patch fixes the DMA_PREP_FENCE usage in do_async_gen_syndrome(),
taking inspiration from the do_async_xor() implementation.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 crypto/async_tx/async_pq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 16c6526..947cf35 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan,
dma_addr_t dma_dest[2];
int src_off = 0;
 
-   if (submit->flags & ASYNC_TX_FENCE)
-   dma_flags |= DMA_PREP_FENCE;
-
while (src_cnt > 0) {
submit->flags = flags_orig;
pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags));
@@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan,
if (cb_fn_orig)
dma_flags |= DMA_PREP_INTERRUPT;
}
+   if (submit->flags & ASYNC_TX_FENCE)
+   dma_flags |= DMA_PREP_FENCE;
 
/* Drivers force forward progress in case they can not provide
 * a descriptor
-- 
2.7.4



[PATCH 0/6] Broadcom SBA RAID support

2017-02-01 Thread Anup Patel
The Broadcom SBA RAID is a stream-based device which provides
RAID5/6 offload.

It requires a SoC specific ring manager (such as the Broadcom FlexRM
ring manager) to provide a ring-based programming interface. Because of
this, the Broadcom SBA RAID driver (a mailbox client) implements a DMA
device having one DMA channel using a set of mailbox channels provided
by the Broadcom SoC specific ring manager driver (a mailbox controller).

Important limitations of Broadcom SBA RAID hardware are:
1. Requires disk position instead of disk coefficient 
2. Supports only 30 PQ disk coefficients

To address limitation #1, we have added a raid6_gflog table which
helps the driver convert a disk coefficient into a disk position. To
address limitation #2, we have extended the Linux async_tx APIs to
check for the available PQ coefficients before doing PQ offload.
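
In driver terms, the conversion would look roughly like this (a sketch;
'scf' and 'i' stand in for the coefficient array and source index that a
prep_dma_pq() callback receives):

	/* async_tx hands down multiplicative coefficients taken from
	 * raid6_gfexp[]; raid6_gflog[] inverts that mapping, yielding the
	 * disk position expected by position-based hardware.
	 */
	u8 coef = scf[i];		/* coefficient, i.e. {2}^position */
	u8 pos = raid6_gflog[coef];	/* recovered disk position */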

This patchset is based on Linux-4.10-rc6 and depends on patchset
"[PATCH v4 0/2] Broadcom FlexRM ring manager support"

It is also available at sba-raid-v1 branch of
https://github.com/Broadcom/arm64-linux.git

Anup Patel (6):
  mailbox: Add new API mbox_channel_device() for clients
  lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
  async_tx: Handle DMA devices having support for fewer PQ coefficients
  async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
  dmaengine: Add Broadcom SBA RAID driver
  dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

 .../devicetree/bindings/dma/brcm,iproc-sba.txt |   29 +
 crypto/async_tx/async_pq.c |8 +-
 crypto/async_tx/async_raid6_recov.c|   12 +-
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1309 
 drivers/mailbox/mailbox.c  |   21 +
 include/linux/dmaengine.h  |   19 +
 include/linux/mailbox_client.h |1 +
 include/linux/raid/pq.h|4 +
 lib/raid6/mktables.c   |   20 +
 11 files changed, 1432 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
 create mode 100644 drivers/dma/bcm-sba-raid.c

-- 
2.7.4



[PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver

2017-02-01 Thread Anup Patel
The Broadcom stream buffer accelerator (SBA) provides offloading
capabilities for RAID operations. This SBA offload engine is
accessible via a Broadcom SoC specific ring manager.

This patch adds the Broadcom SBA RAID driver, which provides a
DMA device with RAID capabilities using one or more Broadcom
SoC specific ring manager channels. The SBA RAID driver in its
current shape implements memcpy, xor, and pq operations.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 drivers/dma/Kconfig|   13 +
 drivers/dma/Makefile   |1 +
 drivers/dma/bcm-sba-raid.c | 1309 
 3 files changed, 1323 insertions(+)
 create mode 100644 drivers/dma/bcm-sba-raid.c

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 263495d..58d0463 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -99,6 +99,19 @@ config AXI_DMAC
  controller is often used in Analog Device's reference designs for FPGA
  platforms.
 
+config BCM_SBA_RAID
+tristate "Broadcom SBA RAID engine support"
+depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST
+select DMA_ENGINE
+select DMA_ENGINE_RAID
+   select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+   default ARCH_BCM_IPROC
+help
+ Enable support for Broadcom SBA RAID Engine. The SBA RAID
+ engine is available on most of the Broadcom iProc SoCs. It
+ has the capability to offload memcpy, xor and pq computation
+ for raid5/6.
+
 config COH901318
bool "ST-Ericsson COH901318 DMA support"
select DMA_ENGINE
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index a4fa336..ba96bdd 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/
 obj-$(CONFIG_AT_HDMAC) += at_hdmac.o
 obj-$(CONFIG_AT_XDMAC) += at_xdmac.o
 obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o
+obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o
 obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o
 obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o
 obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
new file mode 100644
index 000..bf39f3f
--- /dev/null
+++ b/drivers/dma/bcm-sba-raid.c
@@ -0,0 +1,1309 @@
+/*
+ * Copyright (C) 2017 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Broadcom SBA RAID Driver
+ *
+ * The Broadcom stream buffer accelerator (SBA) provides offloading
+ * capabilities for RAID operations. The SBA offload engine is accessible
+ * via a Broadcom SoC specific ring manager. Two or more offload engines
+ * can share the same Broadcom SoC specific ring manager; because of this,
+ * the ring manager driver is implemented as a mailbox controller driver
+ * and offload engine drivers are implemented as mailbox clients.
+ *
+ * Typically, a Broadcom SoC specific ring manager will implement a larger
+ * number of hardware rings over one or more SBA hardware devices. By
+ * design, the internal buffer size of an SBA hardware device is limited,
+ * but all offload operations supported by SBA can be broken down into
+ * multiple small-size requests and executed in parallel on multiple SBA
+ * hardware devices to achieve high throughput.
+ *
+ * The Broadcom SBA RAID driver does not require any register programming
+ * except submitting requests to the SBA hardware device via mailbox channels.
+ * This driver implements a DMA device with one DMA channel using a set
+ * of mailbox channels provided by Broadcom SoC specific ring manager
+ * driver. To exploit parallelism (as described above), all DMA requests
+ * coming to the SBA RAID DMA channel are broken down into smaller requests
+ * and submitted to multiple mailbox channels in round-robin fashion.
+ * To have more SBA DMA channels, we can create more SBA device nodes in
+ * the Broadcom SoC specific DTS, based on the number of hardware rings
+ * supported by the Broadcom SoC ring manager.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dmaengine.h"
+
+/* SBA command helper macros */
+#define SBA_DEC(_d, _s, _m)(((_d) >> (_s)) & (_m))
+#define SBA_ENC(_d, _v, _s, _m)\
+   do {\
+   (_d) &= ~((u64)(_m) << (_s));   \
+   (_d) |= (((u64)(_v) & (_m)) << (_s));   \
+   } while (0)
+
+/* SBA command related defines */
+#define SBA_TYPE_SHIFT 48
+#define SBA_TYPE_MASK  0x3
+#define SBA_TYPE_A 0x0
+#define SBA_TYPE_B 0x2
+#define SBA_TYPE_C   

[PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients

2017-02-01 Thread Anup Patel
The DMAENGINE framework assumes that if PQ offload is supported by a
DMA device then all 256 PQ coefficients are supported. This assumption
does not hold anymore because we now have the BCM-SBA-RAID offload engine
which supports PQ offload with a limited number of PQ coefficients.

This patch extends async_tx APIs to handle DMA devices with support
for fewer PQ coefficients.

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
---
 crypto/async_tx/async_pq.c  |  3 +++
 crypto/async_tx/async_raid6_recov.c | 12 ++--
 include/linux/dmaengine.h   | 19 +++
 include/linux/raid/pq.h |  3 +++
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index f83de99..16c6526 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -187,6 +187,9 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
 
BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks)));
 
+   if (device && dma_maxpqcoef(device) < src_cnt)
+   device = NULL;
+
if (device)
unmap = dmaengine_get_unmap_data(device->dev, disks, GFP_NOWAIT);
 
diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 8fab627..2916f95 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -352,6 +352,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 {
void *scribble = submit->scribble;
int non_zero_srcs, i;
+   struct dma_chan *chan = async_dma_find_channel(DMA_PQ);
 
BUG_ON(faila == failb);
if (failb < faila)
@@ -359,12 +360,15 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 
pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
 
+   if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+   chan = NULL;
+
/* if a dma resource is not available or a scribble buffer is not
 * available punt to the synchronous path.  In the 'dma not
 * available' case be sure to use the scribble buffer to
 * preserve the content of 'blocks' as the caller intended.
 */
-   if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+   if (!chan || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;
 
async_tx_quiesce(&submit->depend_tx);
@@ -432,15 +436,19 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
void *scribble = submit->scribble;
int good_srcs, good, i;
struct page *srcs[2];
+   struct dma_chan *chan = async_dma_find_channel(DMA_PQ);
 
pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
 
+   if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF)
+   chan = NULL;
+
/* if a dma resource is not available or a scribble buffer is not
 * available punt to the synchronous path.  In the 'dma not
 * available' case be sure to use the scribble buffer to
 * preserve the content of 'blocks' as the caller intended.
 */
-   if (!async_dma_find_channel(DMA_PQ) || !scribble) {
+   if (!chan || !scribble) {
void **ptrs = scribble ? scribble : (void **) blocks;
 
async_tx_quiesce(&submit->depend_tx);
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index feee6ec..d938a8b 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /**
@@ -668,6 +669,7 @@ struct dma_filter {
  * @cap_mask: one or more dma_capability flags
  * @max_xor: maximum number of xor sources, 0 if no capability
  * @max_pq: maximum number of PQ sources and PQ-continue capability
+ * @max_pqcoef: maximum number of PQ coefficients, 0 if all supported
  * @copy_align: alignment shift for memcpy operations
  * @xor_align: alignment shift for xor operations
  * @pq_align: alignment shift for pq operations
@@ -727,11 +729,13 @@ struct dma_device {
dma_cap_mask_t  cap_mask;
unsigned short max_xor;
unsigned short max_pq;
+   unsigned short max_pqcoef;
enum dmaengine_alignment copy_align;
enum dmaengine_alignment xor_align;
enum dmaengine_alignment pq_align;
enum dmaengine_alignment fill_align;
#define DMA_HAS_PQ_CONTINUE (1 << 15)
+   #define DMA_HAS_FEWER_PQ_COEF (1 << 15)
 
int dev_id;
struct device *dev;
@@ -1122,6 +1126,21 @@ static inline int dma_maxpq(struct dma_device *dma, enum dma_ctrl_flags flags)
BUG();
 }
 
+static inline void dma_set_maxpqcoef(struct dma_device *dma,
+unsigned short max_pqcoef)
+{
+   if (max_pqcoef < RAID6_PQ_MAX_COEF) {
+   dma->max_pqcoef = max_pqcoef;
+   dma->max_pqcoef |= DMA_HAS_FEWER

[PATCH 6/6] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver

2017-02-01 Thread Anup Patel
This patch adds the DT bindings document for the newly added Broadcom
SBA RAID driver.

Signed-off-by: Anup Patel 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
---
 .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++
 1 file changed, 29 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt

diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
new file mode 100644
index 000..092913a
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt
@@ -0,0 +1,29 @@
+* Broadcom SBA RAID engine
+
+Required properties:
+- compatible: Should be one of the following
+ "brcm,iproc-sba"
+ "brcm,iproc-sba-v2"
+  The "brcm,iproc-sba" has support for only 6 PQ coefficients
+  The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients
+- mboxes: List of phandle and mailbox channel specifiers
+
+Example:
+
+raid_mbox: mbox@6740 {
+   ...
+   #mbox-cells = <3>;
+   ...
+};
+
+raid0 {
+   compatible = "brcm,iproc-sba-v2";
+   mboxes = <&raid_mbox 0 0x1 0x>,
+<&raid_mbox 1 0x1 0x>,
+<&raid_mbox 2 0x1 0x>,
+<&raid_mbox 3 0x1 0x>,
+<&raid_mbox 4 0x1 0x>,
+<&raid_mbox 5 0x1 0x>,
+<&raid_mbox 6 0x1 0x>,
+<&raid_mbox 7 0x1 0x>;
+};
-- 
2.7.4



[PATCH 2/6] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position

2017-02-01 Thread Anup Patel
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The
Linux async_tx framework passes values from raid6_gfexp as coefficients
for each source to the prep_dma_pq() callback of a DMA channel with PQ
capability. This creates a problem for RAID6 offload engines (such as
Broadcom SBA) which take a disk position (i.e. the log of {2}) instead
of multiplicative coefficients from the raid6_gfexp table.

This patch adds a raid6_gflog table holding the log-of-2 value for any
given x such that 0 <= x < 256. For any given disk coefficient x, the
corresponding disk position is given by raid6_gflog[x]. A RAID6
offload engine driver can use this newly added raid6_gflog table to
get the disk position from a multiplicative coefficient.
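
Since raid6_gflog inverts raid6_gfexp over the multiplicative group of
GF(256), a quick self-check of the generated table could look like this
(illustrative only, not part of the patch):

	int n;

	/* {2}^n -> log -> n must round-trip for every exponent below the
	 * group order (255). raid6_gflog[0] is a sentinel (0xff), since
	 * zero has no discrete log.
	 */
	for (n = 0; n < 255; n++)
		BUG_ON(raid6_gflog[raid6_gfexp[n]] != n);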

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
Reviewed-by: Ray Jui 
---
 include/linux/raid/pq.h |  1 +
 lib/raid6/mktables.c| 20 
 2 files changed, 21 insertions(+)

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..30f9453 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -142,6 +142,7 @@ int raid6_select_algo(void);
 extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256)));
 extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256)));
 extern const u8 raid6_gfexp[256]  __attribute__((aligned(256)));
+extern const u8 raid6_gflog[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfinv[256]  __attribute__((aligned(256)));
 extern const u8 raid6_gfexi[256]  __attribute__((aligned(256)));
 
diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c
index 39787db..e824d08 100644
--- a/lib/raid6/mktables.c
+++ b/lib/raid6/mktables.c
@@ -125,6 +125,26 @@ int main(int argc, char *argv[])
printf("EXPORT_SYMBOL(raid6_gfexp);\n");
printf("#endif\n");
 
+   /* Compute log-of-2 table */
+   printf("\nconst u8 __attribute__((aligned(256)))\n"
+  "raid6_gflog[256] =\n" "{\n");
+   for (i = 0; i < 256; i += 8) {
+   printf("\t");
+   for (j = 0; j < 8; j++) {
+   v = 255;
+   for (k = 0; k < 256; k++)
+   if (exptbl[k] == (i + j)) {
+   v = k;
+   break;
+   }
+   printf("0x%02x,%c", v, (j == 7) ? '\n' : ' ');
+   }
+   }
+   printf("};\n");
+   printf("#ifdef __KERNEL__\n");
+   printf("EXPORT_SYMBOL(raid6_gflog);\n");
+   printf("#endif\n");
+
/* Compute inverse table x^-1 == x^254 */
printf("\nconst u8 __attribute__((aligned(256)))\n"
   "raid6_gfinv[256] =\n" "{\n");
-- 
2.7.4



[PATCH 1/6] mailbox: Add new API mbox_channel_device() for clients

2017-02-01 Thread Anup Patel
The remote processor can have DMAENGINE capabilities and the client
can pass data to be processed via main memory. In such cases, the
client will require DMAble memory for the remote processor.

This patch adds a new API, mbox_channel_device(), which can be used
by clients to get the struct device pointer of the underlying
mailbox controller. This struct device pointer can be used by
clients to allocate DMAble memory for the remote processor.
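
A client would then use the new API along these lines (an illustrative
sketch; 'chan', 'size' and the surrounding error handling are
hypothetical):

	struct device *dev = mbox_channel_device(chan);
	dma_addr_t dma_addr;
	void *buf;

	if (IS_ERR(dev))
		return PTR_ERR(dev);

	/* Allocate DMAble memory against the mailbox controller's device. */
	buf = dma_alloc_coherent(dev, size, &dma_addr, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;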

Signed-off-by: Anup Patel 
Reviewed-by: Scott Branden 
Reviewed-by: Ray Jui 
---
 drivers/mailbox/mailbox.c  | 21 +
 include/linux/mailbox_client.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
index 4671f8a..d4380fc 100644
--- a/drivers/mailbox/mailbox.c
+++ b/drivers/mailbox/mailbox.c
@@ -281,6 +281,27 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg)
 EXPORT_SYMBOL_GPL(mbox_send_message);
 
 /**
+ * mbox_channel_device - Get device pointer of a mailbox channel.
+ * @chan: Mailbox channel assigned to this client.
+ *
+ * The remote processor can have DMAENGINE capabilities and the client
+ * can pass data to be processed via main memory. In such cases, the
+ * client will require the struct device pointer of the mailbox
+ * channel to map/unmap/allocate/free DMAble memory.
+ *
+ * Return: Pointer to the struct device of mailbox channel.
+ *ERR_PTR on failure.
+ */
+struct device *mbox_channel_device(struct mbox_chan *chan)
+{
+   if (!chan || !chan->cl)
+   return ERR_PTR(-EINVAL);
+
+   return chan->mbox->dev;
+}
+EXPORT_SYMBOL_GPL(mbox_channel_device);
+
+/**
  * mbox_request_channel - Request a mailbox channel.
  * @cl: Identity of the client requesting the channel.
  * @index: Index of mailbox specifier in 'mboxes' property.
diff --git a/include/linux/mailbox_client.h b/include/linux/mailbox_client.h
index 4434871..3daffad 100644
--- a/include/linux/mailbox_client.h
+++ b/include/linux/mailbox_client.h
@@ -40,6 +40,7 @@ struct mbox_client {
void (*tx_done)(struct mbox_client *cl, void *mssg, int r);
 };
 
+struct device *mbox_channel_device(struct mbox_chan *chan);
 struct mbox_chan *mbox_request_channel_byname(struct mbox_client *cl,
  const char *name);
 struct mbox_chan *mbox_request_channel(struct mbox_client *cl, int index);
-- 
2.7.4



Re: [PATCH v6 1/5] lib: Update LZ4 compressor module

2017-02-01 Thread Sven Schmidt
On Tue, Jan 31, 2017 at 03:27:44PM -0700, Jonathan Corbet wrote:
> On Fri, 27 Jan 2017 23:02:00 +0100
> Sven Schmidt <4ssch...@informatik.uni-hamburg.de> wrote:
> 
> I have one quick question...
> 
> >  /*
> > + * LZ4_compress_default()
> > + * Compresses 'sourceSize' bytes from buffer 'source'
> > + * into already allocated 'dest' buffer of size 'maxOutputSize'.
> > + * Compression is guaranteed to succeed if
> > + * 'maxOutputSize' >= LZ4_compressBound(inputSize).
> > + * It also runs faster, so it's a recommended setting.
> > + * If the function cannot compress 'source'
> > + * into a more limited 'dest' budget,
> > + * compression stops *immediately*,
> > + * and the function result is zero.
> > + * As a consequence, 'dest' content is not valid.
> > + *
> > + * source   : source address of the original data
> > + * dest : output buffer address
> > + * of the compressed data
> > + * inputSize: Max supported value is
> > + * LZ4_MAX_INPUT_SIZE
> > + * maxOutputSize: full or partial size of buffer 'dest'
> > + * (which must be already allocated)
> > + * workmem  : address of the working memory.
> > + * This requires 'workmem' of size LZ4_MEM_COMPRESS.
> > + * return   : the number of bytes written into buffer 'dest'
> > + *   (necessarily <= maxOutputSize) or 0 if compression fails
> > + */
> > +int LZ4_compress_default(const char *source, char *dest, int inputSize,
> > +   int maxOutputSize, void *wrkmem);
> 
> Is there any chance you could format these as kerneldoc comments?  You're
> not too far from it now, and that would allow the LZ4 interface to be
> pulled into the documentation.
> 
> Thanks,
> 
> jon

Hi Jon,

of course, that makes sense. I already checked the documentation and you're 
right, I'm not that far from it.
Will do the necessary changes.
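
For reference, the comment above recast as kerneldoc would look roughly
like this (a sketch of the conversion, not the final patch):

/**
 * LZ4_compress_default() - Compress data from source to dest
 * @source: source address of the original data
 * @dest: output buffer address of the compressed data
 * @inputSize: size of the input data. Max supported value is
 *	LZ4_MAX_INPUT_SIZE
 * @maxOutputSize: full or partial size of buffer 'dest'
 *	(which must be already allocated)
 * @wrkmem: address of the working memory.
 *	This requires 'workmem' of size LZ4_MEM_COMPRESS.
 *
 * Compresses 'sourceSize' bytes from buffer 'source' into an already
 * allocated 'dest' buffer of size 'maxOutputSize'. Compression is
 * guaranteed to succeed if 'maxOutputSize' >= LZ4_compressBound(inputSize).
 * If the function cannot compress 'source' into a more limited 'dest'
 * budget, compression stops *immediately* and the function result is zero;
 * as a consequence, 'dest' content is not valid.
 *
 * Return: the number of bytes written into buffer 'dest'
 *	(necessarily <= maxOutputSize) or 0 if compression fails
 */
int LZ4_compress_default(const char *source, char *dest, int inputSize,
	int maxOutputSize, void *wrkmem);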

Thanks,

Sven


Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out

2017-02-01 Thread Ard Biesheuvel
On 28 January 2017 at 20:40, Ard Biesheuvel  wrote:
> The skcipher API mandates that chaining modes involving IVs calculate
> an outgoing IV value that is suitable for encrypting additional blocks
> of data. This means the CCM driver cannot assume that req->iv points to
> the original IV value when it calls crypto_ccm_auth. So pass a copy to
> the skcipher instead.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  crypto/ccm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/crypto/ccm.c b/crypto/ccm.c
> index b388ac6edfb9..8976ef9bc2e7 100644
> --- a/crypto/ccm.c
> +++ b/crypto/ccm.c
> @@ -362,7 +362,7 @@ static int crypto_ccm_decrypt(struct aead_request *req)
> unsigned int cryptlen = req->cryptlen;
> u8 *authtag = pctx->auth_tag;
> u8 *odata = pctx->odata;
> -   u8 *iv = req->iv;
> +   u8 iv[16];
> int err;
>
> cryptlen -= authsize;
> @@ -378,6 +378,7 @@ static int crypto_ccm_decrypt(struct aead_request *req)
> if (req->src != req->dst)
> dst = pctx->dst;
>
> +   memcpy(iv, req->iv, sizeof(iv));
> skcipher_request_set_tfm(skreq, ctx->ctr);
> skcipher_request_set_callback(skreq, pctx->flags,
>   crypto_ccm_decrypt_done, req);
> --
> 2.7.4
>

Herbert,

Could you please forward this patch to Linus as well? I noticed that the patch

crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes

is now in mainline, which means CCM is now broken on arm64, given that
the iv_out requirement for CTR apparently isn't honored by *any*
implementation, and CCM wrongly assumes that req->iv retains its value
across the call into the CTR skcipher.

Thanks,
Ard.


Re: crypto: NULL deref in sha512_mb_mgr_get_comp_job_avx2

2017-02-01 Thread Tim Chen
On Tue, Jan 31, 2017 at 02:16:31PM +0100, Dmitry Vyukov wrote:
> Hello,
> 
> I am getting the following reports with low frequency while running
> syzkaller fuzzer. Unfortunately they are not reproducible and happen
> in a background thread, so it is difficult to extract any context on
> my side. I see only few such crashes per week, so most likely it is
> some hard to trigger data race. The following reports are from mmotm
> tree, commits 00e20cfc2bf04a0cbe1f5405f61c8426f43eee84 and
> fff7e71eac7788904753136f09bcad7471f7799e. Any ideas as to how this can
> happen?
> 
> BUG: unable to handle kernel NULL pointer dereference at 0060
> IP: [] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> PGD 1d2395067 [  220.874864] PUD 1d2860067
> Oops: 0002 [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 516 Comm: kworker/0:1 Not tainted 4.9.0 #4
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Workqueue: crypto mcryptd_queue_worker
> task: 8801d9f346c0 task.stack: 8801d9f08000
> RIP: 0010:[]  []
> sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
> RSP: 0018:8801d9f0eef8  EFLAGS: 00010202
> RAX:  RBX: 8801d7db1190 RCX: 0006
> RDX: 0001 RSI: 8801d9f34ee8 RDI: 8801d7db1040
> RBP: 8801d9f0f258 R08: 0001 R09: 0001
> R10: 0002 R11: 0003 R12: 8801d9f0f230
> R13: 8801c8bbc4e0 R14: 8801c8bbc530 R15: 8801d9f0ef70
> FS:  () GS:8801dc00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0060 CR3: 0001cc15a000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Stack:
>  8801d7db1040 813fa207 dc00 e8c0f238
>  0002 11003b3e1dea e8c0f218 8801d9f0f190
>  0282 e8c0f140 e8c0f220 41b58ab3
> Call Trace:
>  [] sha512_mb_update+0x2f7/0x4e0
> arch/x86/crypto/sha512-mb/sha512_mb.c:588
>  [] crypto_ahash_update include/crypto/hash.h:512 [inline]
>  [] ahash_mcryptd_update crypto/mcryptd.c:627 [inline]
>  [] mcryptd_hash_update+0xcd/0x1c0 crypto/mcryptd.c:373
>  [] mcryptd_queue_worker+0xff/0x6a0 crypto/mcryptd.c:181
>  [] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
>  [] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
>  [] kthread+0x323/0x3e0 kernel/kthread.c:209
>  [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
> Code: 49 0f 42 d3 48 f7 c2 f0 ff ff ff 0f 85 9a 00 00 00 48 83 e2 0f
> 48 6b da 08 48 8d 9c 1f 48 01 00 00 48 8b 03 48 c7 03 00 00 00 00 
> 40 60 02 00 00 00 48 8b 9f 40 01 00 00 48 c1 e3 08 48 09 d3
> RIP  [] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee
> arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251
>  RSP 
> CR2: 0060
> ---[ end trace 139fd4cda5dfe2c4 ]---
> 

Dmitry,

One theory that Megha and I have is that perhaps the flusher
and regular computation updates are stepping on each other.
Can you try this patch and see if it helps?

Tim

--->8---

From: Tim Chen 
Subject: [PATCH] crypto/sha512-mb: Protect sha512 mb ctx mgr access
To: Herbert Xu , Dmitry Vyukov 
Cc: Tim Chen , David Miller , 
linux-crypto@vger.kernel.org, LKML , 
megha@linux.intel.com, fenghua...@intel.com

The flusher and regular multi-buffer computation via mcryptd may race
with one another. Add a lock and turn off interrupts to protect access
to the multi-buffer computation state cstate->mgr before a round of
computation. This should prevent the flusher code from jumping in.

Signed-off-by: Tim Chen 
---
 arch/x86/crypto/sha512-mb/sha512_mb.c | 64 +++
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/arch/x86/crypto/sha512-mb/sha512_mb.c b/arch/x86/crypto/sha512-mb/sha512_mb.c
index d210174..f3c1c21 100644
--- a/arch/x86/crypto/sha512-mb/sha512_mb.c
+++ b/arch/x86/crypto/sha512-mb/sha512_mb.c
@@ -221,7 +221,7 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_resubmit
 }
 
 static struct sha512_hash_ctx
-   *sha512_ctx_mgr_get_comp_ctx(struct sha512_ctx_mgr *mgr)
+   *sha512_ctx_mgr_get_comp_ctx(struct mcryptd_alg_cstate *cstate)
 {
/*
 * If get_comp_job returns NULL, there are no jobs complete.
@@ -233,11 +233,17 @@ static struct sha512_hash_ctx
 * Otherwise, all jobs currently being managed by the hash_ctx_mgr
 * still need processing.
 */
+   struct sha512_ctx_mgr *mgr;
struct sha512_hash_ctx *ctx;
+   unsigned long flags;
 
+   mgr = cstate->mgr;
+   spin_lock_irqsave(&cstate->work_lock, flags);
ctx = (struct sha512_hash_ctx *)
sha512

Re: [PATCH 1/1] crypto:algif_aead - Fix kernel panic on list_del

2017-02-01 Thread Stephan Müller
Am Mittwoch, 1. Februar 2017, 21:10:28 CET schrieb Harsh Jain:

Hi Harsh,

> Kernel panics when a userspace program tries to access the AEAD interface.
> Remove the node from the linked list before freeing its memory.

Very good catch. Thank you.

Reviewed-by: Stephan Müller 

(PS: Herbert, in case you want to apply my patches regarding fixing the memory 
management for algif_aead and algif_skcipher, please note that this error is 
in the new function aead_free_rsgl/skcipher_free_sgl. Thus, if you think that 
my approach is good after all, I will need to re-send the patch.)

Ciao
Stephan


[PATCH 1/1] crypto:algif_aead - Fix kernel panic on list_del

2017-02-01 Thread Harsh Jain
Kernel panics when a userspace program tries to access the AEAD interface.
Remove the node from the linked list before freeing its memory.

Signed-off-by: Harsh Jain 
---
 crypto/algif_aead.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index f849311..533265f 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -661,9 +661,9 @@ static int aead_recvmsg_sync(struct socket *sock, struct msghdr *msg, int flags)
 unlock:
list_for_each_entry_safe(rsgl, tmp, &ctx->list, list) {
af_alg_free_sg(&rsgl->sgl);
+   list_del(&rsgl->list);
if (rsgl != &ctx->first_rsgl)
sock_kfree_s(sk, rsgl, sizeof(*rsgl));
-   list_del(&rsgl->list);
}
INIT_LIST_HEAD(&ctx->list);
aead_wmem_wakeup(sk);
-- 
1.8.2.3



[PATCH] crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers

2017-02-01 Thread Ard Biesheuvel
The PMULL based CRC32 implementation already contains code based on the
separate, optional CRC32 instructions to fall back to when operating on
small quantities of data. We can expose these routines directly on systems
that lack the 64x64 PMULL instructions but do implement the CRC32 ones,
which makes the driver that is based solely on those CRC32 instructions
redundant. So remove it.

Note that this aligns arm64 with ARM, whose accelerated CRC32 driver
also combines the CRC32 extension based and the PMULL based versions.

Signed-off-by: Ard Biesheuvel 
---

This is a meaningful patch by itself imho, but also fixes the issue reported
by Matthias where their v4.8 based GCC does not understand the -mcpu=generic+crc
command line option, resulting in failed builds.

 arch/arm64/configs/defconfig  |   1 -
 arch/arm64/crypto/Kconfig |   9 +-
 arch/arm64/crypto/Makefile|   4 -
 arch/arm64/crypto/crc32-arm64.c   | 290 
 arch/arm64/crypto/crc32-ce-glue.c |  49 +++-
 5 files changed, 41 insertions(+), 312 deletions(-)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 33b744d54739..6fc6f5a2a6e5 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -516,4 +516,3 @@ CONFIG_CRYPTO_GHASH_ARM64_CE=y
 CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
 CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
 # CONFIG_CRYPTO_AES_ARM64_NEON_BLK is not set
-CONFIG_CRYPTO_CRC32_ARM64=y
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index bed7feddfeed..d92293747d63 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -37,8 +37,8 @@ config CRYPTO_CRCT10DIF_ARM64_CE
select CRYPTO_HASH
 
 config CRYPTO_CRC32_ARM64_CE
-   tristate "CRC32 and CRC32C digest algorithms using PMULL instructions"
-   depends on KERNEL_MODE_NEON && CRC32
+   tristate "CRC32 and CRC32C digest algorithms using ARMv8 extensions"
+   depends on CRC32
select CRYPTO_HASH
 
 config CRYPTO_AES_ARM64
@@ -71,11 +71,6 @@ config CRYPTO_AES_ARM64_NEON_BLK
select CRYPTO_AES
select CRYPTO_SIMD
 
-config CRYPTO_CRC32_ARM64
-   tristate "CRC32 and CRC32C using optional ARMv8 instructions"
-   depends on ARM64
-   select CRYPTO_HASH
-
 config CRYPTO_CHACHA20_NEON
tristate "NEON accelerated ChaCha20 symmetric cipher"
depends on KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index d1ae1b9cbe70..b5edc5918c28 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -55,10 +55,6 @@ AFLAGS_aes-neon.o:= -DINTERLEAVE=4
 
 CFLAGS_aes-glue-ce.o   := -DUSE_V8_CRYPTO_EXTENSIONS
 
-obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o
-
-CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
-
 $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
$(call if_changed_rule,cc_o_c)
 
diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c
deleted file mode 100644
index 6a37c3c6b11d..
--- a/arch/arm64/crypto/crc32-arm64.c
+++ /dev/null
@@ -1,290 +0,0 @@
-/*
- * crc32-arm64.c - CRC32 and CRC32C using optional ARMv8 instructions
- *
- * Module based on crypto/crc32c_generic.c
- *
- * CRC32 loop taken from Ed Nevill's Hadoop CRC patch
- * http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3C1403687030.3355.19.camel%40localhost.localdomain%3E
- *
- * Using inline assembly instead of intrinsics in order to be backwards
- * compatible with older compilers.
- *
- * Copyright (C) 2014 Linaro Ltd 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-
-MODULE_AUTHOR("Yazen Ghannam ");
-MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions");
-MODULE_LICENSE("GPL v2");
-
-#define CRC32X(crc, value) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32H(crc, value) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32B(crc, value) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32CX(crc, value) __asm__("crc32cx %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32CW(crc, value) __asm__("crc32cw %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32CH(crc, value) __asm__("crc32ch %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-#define CRC32CB(crc, value) __asm__("crc32cb %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
-
-static u32 crc32_arm64_le_hw(u32 crc, const u8 *p, unsigned int len)
-{
-   s64 length = len;
-
-   while ((length -= sizeof(u64)) >= 0) {
-   CRC32X(crc, get_unaligned_le64(p));
-   p += sizeof(u64);
-   }
-
-   /* The following is more

Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 1 February 2017 at 13:58, Alexander Graf  wrote:
> On 02/01/2017 10:43 AM, Ard Biesheuvel wrote:
>>
>> On 1 February 2017 at 09:07, Ard Biesheuvel 
>> wrote:
>>>
>>> On 27 January 2017 at 10:52, Will Deacon  wrote:

 On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote:
>
> On 27 January 2017 at 10:40, Matthias Brugger 
> wrote:
>>
>> Older compilers may not be able to detect the crc32 extended cpu type.
>
> What do you mean 'detect'? Could you describe the failure in more
> detail please?
>
>> Anyway only inline assembler code is used, which gets passed to the
>> assembler. This patch moves the crc detection to the assembler.
>>
>> Suggested-by: Alexander Graf 
>> Signed-off-by: Matthias Brugger 
>> ---
>>   arch/arm64/crypto/Makefile  | 2 --
>>   arch/arm64/crypto/crc32-arm64.c | 3 +++
>>   2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> index aad7b744..0d779dac75cd 100644
>> --- a/arch/arm64/crypto/Makefile
>> +++ b/arch/arm64/crypto/Makefile
>> @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o  := -DUSE_V8_CRYPTO_EXTENSIONS
>>
>>   obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o
>>
>> -CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
>> -
>>   $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
>>  $(call if_changed_rule,cc_o_c)
>>
>> diff --git a/arch/arm64/crypto/crc32-arm64.c
>> b/arch/arm64/crypto/crc32-arm64.c
>> index 6a37c3c6b11d..10f5dd075323 100644
>> --- a/arch/arm64/crypto/crc32-arm64.c
>> +++ b/arch/arm64/crypto/crc32-arm64.c
>> @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam
>> ");
>>   MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8
>> instructions");
>>   MODULE_LICENSE("GPL v2");
>>
>> +/* Request crc extension capabilities from the assembler */
>> +asm(".arch_extension crc");
>> +
>
> Will should confirm, but I think this is a recent feature in GAS for
> AArch64, so this may break older toolchains as well.

 Yes, the .arch_extension directive isn't universally supported by AArch64
 gas so we can't rely on it unconditionally. The best bet is to check for
 the support and, if it's not present, then disable whatever feature relies
 on it. See the lseinstr variable in Makefile.

>>> Actually, this driver has become somewhat redundant now that we have
>>> an alternative that combines an implementation based on 64x64
>>> polynomial multiplication with an implementation based on the CRC32
>>> instructions.
>>>
>>> I will propose a patch that makes the latter usable when only the
>>> CRC32 instructions are supported.
>>
>> ... although you still haven't explained what the actual problem is
>> that you are trying to solve.
>
>
>
>
> The problem is that in Leap 42.2 (as well as SLES12 SP2) we have a 4.8 based
> system compiler, but recent binutils. That means that while our assembler is
> happy to work with crc instructions, passing the -mcpu parameter to gcc
> fails because gcc isn't aware of the flavor yet.
>
> That in turn means that we want to tell the assembler about feature
> requirements rather than the compiler. Fortunately the ".arch_extension"
> primitive allows you to do so.
>
> As far as checking for availability of it goes, I agree that it'd be nice to
> check if ".arch_extension" is supported. But so would be to check if
> -mcpu=generic+crc is supported. IMHO this patch doesn't make the current
> non-checking situation any worse. But of course checking is always nicer
> than not checking :)
>

Well, I am pretty sure binutils v2.25 and older are in wider use than
GCC v4.8 and older, which means the runtime test would disable the CRC
module altogether for a lot of users.

However, as I mentioned, we can also remove this driver once the PMULL
based one is updated.
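
For reference, probing assembler support at build time, as Will suggested
earlier in the thread, would look roughly like this in the crypto Makefile
(a sketch modeled on the existing lseinstr check; the CONFIG_AS_CRC symbol
is hypothetical):

# Pass -DCONFIG_AS_CRC=1 only if the assembler accepts the directive.
crcinstr := $(call as-instr,.arch_extension crc,-DCONFIG_AS_CRC=1)
CFLAGS_crc32-arm64.o := $(crcinstr)

The C code would then emit asm(".arch_extension crc") only under
#ifdef CONFIG_AS_CRC, and the feature would be disabled otherwise.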


Re: 4.10 aesni-intel no longer having lrw/ablk_helper dependencies?

2017-02-01 Thread Herbert Xu
On Mon, Jan 30, 2017 at 05:42:35PM +0100, Arkadiusz Miśkiewicz wrote:
> On Monday 30 of January 2017, Eric Biggers wrote:
> 
> > First, aesni-intel no longer includes an LRW implementation itself. 
> > Instead, the generic LRW module must be selected.  Internally it will use
> > the aesni-intel accelerated ECB algorithm if available.  So you need to
> > make sure that the "lrw" module is included in the initrd if it's not
> > already.
> > 
> > But I think the bigger problem is that aesni-intel couldn't be insmod'ed at
> > all, which shouldn't happen.  The problem might actually be related to the
> > "pcbc" algorithm.  Upon initialization, aesni-intel now tries to wrap
> > "pcbc(__aes-aesni)" with the "fpu" template.  This will fail if the "pcbc"
> > module hasn't been inserted.  I think this wasn't a problem before because
> > the old code using ablk_helper instead of crypto_simd didn't try to find
> > "pcbc" until someone asked for it, while now aesni-intel will try to find
> > it immediately.  And since aesni-intel has no direct dependency on pcbc,
> > I'm guessing what happened is that pcbc didn't end up in your initrd even
> > though it may have been built.  (You can verify this by adding pcbc to
> > your initrd and seeing if that works around the problem.)
> 
> (hardcoded) loading of pcbc fixed my problem, intel-aesni loaded fine and
> luks partition unlocked correctly.

Thanks for the report.  Does this patch fix the problem?

---8<---
Subject: crypto: aesni - Fix failure when pcbc module is absent

When aesni is built as a module together with pcbc, the pcbc module
must be present for aesni to load.  However, the pcbc module may not
be present for reasons such as its absence from the initramfs.  This
patch allows aesni to function even if the pcbc module is enabled but
not present.

Reported-by: Arkadiusz Miśkiewicz 
Signed-off-by: Herbert Xu 

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 7ad0ed7..93de8ea 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1089,9 +1089,9 @@ static void aesni_free_simds(void)
 	for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers) &&
 		    aesni_simd_skciphers[i]; i++)
 		simd_skcipher_free(aesni_simd_skciphers[i]);
 
-	for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2) &&
-		    aesni_simd_skciphers2[i].simd; i++)
-		simd_skcipher_free(aesni_simd_skciphers2[i].simd);
+	for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2); i++)
+		if (aesni_simd_skciphers2[i].simd)
+			simd_skcipher_free(aesni_simd_skciphers2[i].simd);
 }
 
 static int __init aesni_init(void)
@@ -1172,7 +1172,7 @@ static int __init aesni_init(void)
 		simd = simd_skcipher_create_compat(algname, drvname, basename);
 		err = PTR_ERR(simd);
 		if (IS_ERR(simd))
-			goto unregister_simds;
+			continue;
 
 		aesni_simd_skciphers2[i].simd = simd;
 	}
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
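
For readers following along, the optional second table that the patch
iterates over looks roughly like the sketch below; the exact strings
are an assumption based on the names quoted in this thread.
aesni_init() feeds each entry to simd_skcipher_create_compat(), and
resolving the basename requires the pcbc template, which is where the
load-time dependency came from.

#include <crypto/internal/simd.h>

static struct {
	const char *algname;	/* outer name users request */
	const char *drvname;	/* driver name of the simd wrapper */
	const char *basename;	/* inner instantiation to wrap */
	struct simd_skcipher_alg *simd;
} aesni_simd_skciphers2[] = {
	{ "pcbc(aes)", "pcbc-aes-aesni", "fpu(pcbc(__aes-aesni))" },
};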


Re: [PATCH] crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg

2017-02-01 Thread Herbert Xu
On Mon, Jan 23, 2017 at 05:06:34PM +, Benedetto, Salvatore wrote:
> 
> I forgot to add CC stable to it.
> 
> This error was introduced in 4.8 and so I think it should go into stable 4.8 
> and 4.9.
> 
> Should I resend or can you add that?

I had added it when applying the patch.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: tcrypt - Add mode to test specified algs

2017-02-01 Thread Herbert Xu
On Mon, Jan 23, 2017 at 04:13:04PM +0100, Rabin Vincent wrote:
> 
> That's what I thought so too, but that doesn't seem to be the case.  The
> mode=0 handling is this:
> 
>   switch (m) {
>   case 0:
>           if (alg) {
>                   if (!crypto_has_alg(alg, type,
>                                       mask ?: CRYPTO_ALG_TYPE_MASK))
>                           ret = -ENOENT;
>                   break;
>           }
> 
>           for (i = 1; i < 200; i++)
>                   ret += do_test(NULL, 0, 0, i);
>           break;
> 
> So, if alg= is specified, after first checking if the specified alg is
> present, it just goes ahead and runs all the tests.  I'm not sure what
> mode=0 alg=foo is meant to be used for.

You need to set the type and mask for it to work.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
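
As a concrete illustration of the type/mask semantics (a sketch, not
tcrypt's own code): the mask selects which flag bits of a candidate
algorithm must equal the corresponding bits in type, so a zero type
with CRYPTO_ALG_ASYNC in the mask restricts the lookup to synchronous
implementations, while tcrypt's mask ?: CRYPTO_ALG_TYPE_MASK fallback
compares only the algorithm-type bits.

#include <linux/crypto.h>

/* Succeeds only if a synchronous cbc(aes) implementation exists. */
static int have_sync_cbc_aes(void)
{
	return crypto_has_alg("cbc(aes)", 0, CRYPTO_ALG_ASYNC);
}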


Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Alexander Graf

On 02/01/2017 10:43 AM, Ard Biesheuvel wrote:

On 1 February 2017 at 09:07, Ard Biesheuvel  wrote:

On 27 January 2017 at 10:52, Will Deacon  wrote:

On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote:

On 27 January 2017 at 10:40, Matthias Brugger  wrote:

Older compilers may not be able to detect the crc32 extended cpu type.

What do you mean 'detect'? Could you describe the failure in more detail
please?


Anyway only inline assembler code is used, which gets passed to the
assembler. This patch moves the crc detection to the assembler.

Suggested-by: Alexander Graf 
Signed-off-by: Matthias Brugger 
---
  arch/arm64/crypto/Makefile  | 2 --
  arch/arm64/crypto/crc32-arm64.c | 3 +++
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aad7b744..0d779dac75cd 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o  := -DUSE_V8_CRYPTO_EXTENSIONS

  obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o

-CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
-
  $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
 $(call if_changed_rule,cc_o_c)

diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c
index 6a37c3c6b11d..10f5dd075323 100644
--- a/arch/arm64/crypto/crc32-arm64.c
+++ b/arch/arm64/crypto/crc32-arm64.c
@@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam ");
  MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions");
  MODULE_LICENSE("GPL v2");

+/* Request crc extension capabilities from the assembler */
+asm(".arch_extension crc");
+

Will should confirm, but I think this is a recent feature in GAS for
AArch64, so this may break older toolchains as well.

Yes, the .arch_extension directive isn't universally supported by AArch64
gas so we can't rely on it unconditionally. The best bet is to check for
the support and, if it's not present, then disable whatever feature relies
on it. See the lseinstr variable in Makefile.


Actually, this driver has become somewhat redundant now that we have
an alternative that combines an implementation based on 64x64
polynomial multiplication with an implementation based on the CRC32
instructions.

I will propose a patch that makes the latter usable when only the
CRC32 instructions are supported.

... although you still haven't explained what the actual problem is
that you are trying to solve.




The problem is that in Leap 42.2 (as well as SLES12 SP2) we have a 4.8 
based system compiler, but recent binutils. That means that while our 
assembler is happy to work with crc instructions, passing the -mcpu 
parameter to gcc fails because gcc isn't aware of the flavor yet.


That in turn means that we want to tell the assembler about feature 
requirements rather than the compiler. Fortunately the ".arch_extension" 
primitive allows you to do so.


As far as checking for availability of it goes, I agree that it'd be 
nice to check if ".arch_extension" is supported. But so would be to 
check if -mcpu=generic+crc is supported. IMHO this patch doesn't make 
the current non-checking situation any worse. But of course checking is 
always nicer than not checking :)



Alex
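
Note that the assembler capability question is separate from runtime
CPU support: even when gas accepts the instructions, the module must
still bind only on CPUs that actually implement them. A minimal sketch
of the usual arm64 pattern follows, with a placeholder init body:

#include <linux/cpufeature.h>
#include <linux/module.h>

static int __init crc32_mod_init(void)
{
	/* register the CRC32/CRC32C shash implementations here */
	return 0;
}

/* Only match (and auto-load) on CPUs advertising the CRC32 feature. */
module_cpu_feature_match(CRC32, crc32_mod_init);
MODULE_LICENSE("GPL v2");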



Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 1 February 2017 at 09:07, Ard Biesheuvel  wrote:
> On 27 January 2017 at 10:52, Will Deacon  wrote:
>> On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote:
>>> On 27 January 2017 at 10:40, Matthias Brugger  wrote:
>>> > Older compilers may not be able to detect the crc32 extended cpu type.
>>>
>>> What do you mean 'detect'? Could you describe the failure in more detail
>>> please?
>>>
>>> > Anyway only inline assembler code is used, which gets passed to the
>>> > assembler. This patch moves the crc detection to the assembler.
>>> >
>>> > Suggested-by: Alexander Graf 
>>> > Signed-off-by: Matthias Brugger 
>>> > ---
>>> >  arch/arm64/crypto/Makefile  | 2 --
>>> >  arch/arm64/crypto/crc32-arm64.c | 3 +++
>>> >  2 files changed, 3 insertions(+), 2 deletions(-)
>>> >
>>> > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>>> > index aad7b744..0d779dac75cd 100644
>>> > --- a/arch/arm64/crypto/Makefile
>>> > +++ b/arch/arm64/crypto/Makefile
>>> > @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o  := -DUSE_V8_CRYPTO_EXTENSIONS
>>> >
>>> >  obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o
>>> >
>>> > -CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
>>> > -
>>> >  $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
>>> > $(call if_changed_rule,cc_o_c)
>>> >
>>> > diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c
>>> > index 6a37c3c6b11d..10f5dd075323 100644
>>> > --- a/arch/arm64/crypto/crc32-arm64.c
>>> > +++ b/arch/arm64/crypto/crc32-arm64.c
>>> >  MODULE_AUTHOR("Yazen Ghannam ");
>>> >  MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions");
>>> >  MODULE_LICENSE("GPL v2");
>>> >
>>> > +/* Request crc extension capabilities from the assembler */
>>> > +asm(".arch_extension crc");
>>> > +
>>>
>>> Will should confirm, but I think this is a recent feature in GAS for
>>> AArch64, so this may break older toolchains as well.
>>
>> Yes, the .arch_extension directive isn't universally supported by AArch64
>> gas so we can't rely on it unconditionally. The best bet is to check for
>> the support and, if it's not present, then disable whatever feature relies
>> on it. See the lseinstr variable in Makefile.
>>
>
> Actually, this driver has become somewhat redundant now that we have
> an alternative that combines an implementation based on 64x64
> polynomial multiplication with an implementation based on the CRC32
> instructions.
>
> I will propose a patch that makes the latter usable when only the
> CRC32 instructions are supported.

... although you still haven't explained what the actual problem is
that you are trying to solve.


Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler

2017-02-01 Thread Ard Biesheuvel
On 27 January 2017 at 10:52, Will Deacon  wrote:
> On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote:
>> On 27 January 2017 at 10:40, Matthias Brugger  wrote:
>> > Older compilers may not be able to detect the crc32 extended cpu type.
>>
>> What do you mean 'detect'? Could you describe the failure in more detail
>> please?
>>
>> > Anyway only inline assembler code is used, which gets passed to the
>> > assembler. This patch moves the crc detection to the assembler.
>> >
>> > Suggested-by: Alexander Graf 
>> > Signed-off-by: Matthias Brugger 
>> > ---
>> >  arch/arm64/crypto/Makefile  | 2 --
>> >  arch/arm64/crypto/crc32-arm64.c | 3 +++
>> >  2 files changed, 3 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
>> > index aad7b744..0d779dac75cd 100644
>> > --- a/arch/arm64/crypto/Makefile
>> > +++ b/arch/arm64/crypto/Makefile
>> > @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o  := -DUSE_V8_CRYPTO_EXTENSIONS
>> >
>> >  obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o
>> >
>> > -CFLAGS_crc32-arm64.o   := -mcpu=generic+crc
>> > -
>> >  $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
>> > $(call if_changed_rule,cc_o_c)
>> >
>> > diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c
>> > index 6a37c3c6b11d..10f5dd075323 100644
>> > --- a/arch/arm64/crypto/crc32-arm64.c
>> > +++ b/arch/arm64/crypto/crc32-arm64.c
>> >  MODULE_AUTHOR("Yazen Ghannam ");
>> >  MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions");
>> >  MODULE_LICENSE("GPL v2");
>> >
>> > +/* Request crc extension capabilities from the assembler */
>> > +asm(".arch_extension crc");
>> > +
>>
>> Will should confirm, but I think this is a recent feature in GAS for
>> AArch64, so this may break older toolchains as well.
>
> Yes, the .arch_extension directive isn't universally supported by AArch64
> gas so we can't rely on it unconditionally. The best bet is to check for
> the support and, if it's not present, then disable whatever feature relies
> on it. See the lseinstr variable in Makefile.
>

Actually, this driver has become somewhat redundant now that we have
an alternative that combines an implementation based on 64x64
polynomial multiplication with an implementation based on the CRC32
instructions.

I will propose a patch that makes the latter usable when only the
CRC32 instructions are supported.


Crypto Fixes for 4.10

2017-02-01 Thread Herbert Xu
Hi Linus:

This push fixes a bug in CBC/CTR on ARM64 that breaks chaining, as
well as a bug in the core API that causes registration failures when
a driver unloads and then reloads an algorithm.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Ard Biesheuvel (1):
  crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes

Salvatore Benedetto (1):
  crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg

 arch/arm64/crypto/aes-modes.S |   88 -
 crypto/algapi.c   |1 +
 2 files changed, 43 insertions(+), 46 deletions(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt