Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM
On 2 February 2017 at 07:48, Ard Biesheuvel wrote: > On 2 February 2017 at 07:38, Eric Biggers wrote: >> Hi Ard, >> >> On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote: >>> >>> Note that this only implements AES encryption, which is all we need >>> for CTR and CBC-MAC. AES decryption can easily be implemented in a >>> similar way, but is significantly more costly. >> >> Is the expectation of decryption being more costly the only reason why >> "aes-ti" >> couldn't be a "cipher" algorithm, allowing it to automatically be used by the >> existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.? > > Yes. > >> It doesn't seem >> to do anything expensive on a per-block basis like loading SSE registers, so >> it >> seems it would fit better as a "cipher" algorithm if at all possible. Then >> there would be no need to implement all these modes yet again. >> > > True. > >> Also, what would be the feasibility of simply replacing aes-generic with the >> time-invariant implementation, rather than offering two implementations and >> requiring users to choose one, usually without the needed expertise? >> > > Well, it is a policy decision whether you want the best performance or > reduced correlation between timing and the input, so there is no way > to make everybody happy by replacing one with the other. But I can > certainly implement is as a cipher, and we can take the discussion > from there. > >>> + >>> +/* >>> + * Emit the sbox as __weak with external linkage to prevent the compiler >>> + * from doing constant folding on sbox references involving fixed indexes. >>> + */ >>> +__weak const u8 __cacheline_aligned __aesti_sbox[] = { >> >> Did you consider marking it 'volatile' instead? >> > > I did not, and I expect the result to be the same. I can replace it if > it matters. 
> >>> +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key, >>> + unsigned int key_len) >>> +{ >>> + struct crypto_aes_ctx rk; >>> + int err; >>> + >>> + err = crypto_aes_expand_key(&rk, in_key, key_len); >>> + if (err) >>> + return err; >> >> crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think >> that's guaranteed here. >> > > No, I don't think so. AFAICT it expects the round keys to be u32 > aligned, which is guaranteed due to the fact that struct > crypto_aes_ctx encapsulates an array of u32 > I stand corrected: I misread le32_to_cpu() for get_unaligned_le32()
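For readers following the sbox discussion in this thread: the point of a time-invariant AES is that table lookups must not leave a data-dependent cache footprint. Below is a minimal userspace sketch of one common constant-time lookup technique (not the actual aes-ti code; demo_init() and the masking trick are illustrative assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in table; the real code would use the AES sbox. */
static uint8_t demo[256];

static void demo_init(void)
{
	for (int i = 0; i < 256; i++)
		demo[i] = (uint8_t)(i * 7 + 3);
}

/*
 * Constant-time lookup: touch every table entry and select the wanted
 * one with a mask, so neither branches nor the memory access pattern
 * depend on the secret index.
 */
static uint8_t ct_lookup(const uint8_t *table, uint8_t idx)
{
	uint8_t out = 0;

	for (uint32_t i = 0; i < 256; i++) {
		/* mask is 0xff iff i == idx, computed without a branch */
		uint8_t mask = (uint8_t)(((i ^ idx) - 1u) >> 24);

		out |= table[i] & mask;
	}
	return out;
}
```

The patch under review instead relies on __weak linkage so the compiler cannot constant-fold fixed-index sbox references; the masked scan above is just one common way to make the lookup itself time-invariant.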
Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
On 2 February 2017 at 06:47, Eric Biggers wrote: > On Mon, Jan 30, 2017 at 02:11:29PM +, Ard Biesheuvel wrote: >> Instead of unconditionally forcing 4 byte alignment for all generic >> chaining modes that rely on crypto_xor() or crypto_inc() (which may >> result in unnecessary copying of data when the underlying hardware >> can perform unaligned accesses efficiently), make those functions >> deal with unaligned input explicitly, but only if the Kconfig symbol >> HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop >> the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers. >> >> For crypto_inc(), this simply involves making the 4-byte stride >> conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that >> it typically operates on 16 byte buffers. >> >> For crypto_xor(), an algorithm is implemented that simply runs through >> the input using the largest strides possible if unaligned accesses are >> allowed. If they are not, an optimal sequence of memory accesses is >> emitted that takes the relative alignment of the input buffers into >> account, e.g., if the relative misalignment of dst and src is 4 bytes, >> the entire xor operation will be completed using 4 byte loads and stores >> (modulo unaligned bits at the start and end). Note that all expressions >> involving startalign and misalign are simply eliminated by the compiler >> if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. >> > > Hi Ard, > > This is a good idea, and I think it was error-prone to be requiring 4-byte > alignment always, and also inefficient on many architectures. > > The new crypto_inc() looks fine, but the new crypto_xor() is quite > complicated. > I'm wondering whether it has to be that way, especially since it seems to most > commonly be used on very small input buffers, e.g. 8 or 16-byte blocks. There > are a couple trivial ways it could be simplified, e.g. 
using 'dst' and 'src' > directly instead of 'a' and 'b' (which also seems to improve code generation > by > getting rid of the '+= len & ~mask' parts), or using sizeof(long) directly > instead of 'size' and 'mask'. > > But also when I tried testing the proposed crypto_xor() on MIPS, it didn't > work > correctly on a misaligned buffer. With startalign=1, it did one iteration of > the following loop and then exited with startalign=0 and entered the "unsigned > long at a time" loop, which is incorrect since at that point the buffers were > not yet fully aligned: > Right. I knew it was convoluted but I thought that was justified by its correctness :-) >> do { >> if (len < sizeof(u8)) >> break; >> >> if (len >= size && !(startalign & 1) && !(misalign & >> 1)) >> break; >> >> *dst++ ^= *src++; >> len -= sizeof(u8); >> startalign &= ~sizeof(u8); >> } while (misalign & 1); > > I think it would need to do instead: > > startalign += sizeof(u8); > startalign %= sizeof(unsigned long); > > But I am wondering whether you considered something simpler, using the > get_unaligned/put_unaligned helpers, maybe even using a switch statement for > the > last (sizeof(long) - 1) bytes so it can be compiled as a jump table. 
> Something > like this: > > #define xor_unaligned(dst, src) \ > put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst)) > > void crypto_xor(u8 *dst, const u8 *src, unsigned int len) > { > while (len >= sizeof(unsigned long)) { > xor_unaligned((unsigned long *)dst, (unsigned long *)src); > dst += sizeof(unsigned long); > src += sizeof(unsigned long); > len -= sizeof(unsigned long); > } > > switch (len) { > #ifdef CONFIG_64BIT > case 7: > dst[6] ^= src[6]; > /* fall through */ > case 6: > xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]); > goto len_4; > case 5: > dst[4] ^= src[4]; > /* fall through */ > case 4: > len_4: > xor_unaligned((u32 *)dst, (u32 *)src); > break; > #endif > case 3: > dst[2] ^= src[2]; > /* fall through */ > case 2: > xor_unaligned((u16 *)dst, (u16 *)src); > break; > case 1: > dst[0] ^= src[0]; > break; > } > } > > That would seem like a better choice for small buffers, which seems to be the > more common case. It should generate slightly faster code on architectures > with > fast unaligned access like x86_64, while still being sufficient on > architectures > without --- perhaps even faster, since it wouldn't have as many branches. > Well, what I tried to deal with explicitly is misaligned dst and
Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM
On 2 February 2017 at 07:38, Eric Biggers wrote: > Hi Ard, > > On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote: >> >> Note that this only implements AES encryption, which is all we need >> for CTR and CBC-MAC. AES decryption can easily be implemented in a >> similar way, but is significantly more costly. > > Is the expectation of decryption being more costly the only reason why > "aes-ti" > couldn't be a "cipher" algorithm, allowing it to automatically be used by the > existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.? Yes. > It doesn't seem > to do anything expensive on a per-block basis like loading SSE registers, so > it > seems it would fit better as a "cipher" algorithm if at all possible. Then > there would be no need to implement all these modes yet again. > True. > Also, what would be the feasibility of simply replacing aes-generic with the > time-invariant implementation, rather than offering two implementations and > requiring users to choose one, usually without the needed expertise? > Well, it is a policy decision whether you want the best performance or reduced correlation between timing and the input, so there is no way to make everybody happy by replacing one with the other. But I can certainly implement it as a cipher, and we can take the discussion from there. >> + >> +/* >> + * Emit the sbox as __weak with external linkage to prevent the compiler >> + * from doing constant folding on sbox references involving fixed indexes. >> + */ >> +__weak const u8 __cacheline_aligned __aesti_sbox[] = { > > Did you consider marking it 'volatile' instead? > I did not, and I expect the result to be the same. I can replace it if it matters.
>> +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key, >> + unsigned int key_len) >> +{ >> + struct crypto_aes_ctx rk; >> + int err; >> + >> + err = crypto_aes_expand_key(&rk, in_key, key_len); >> + if (err) >> + return err; > > crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think > that's guaranteed here. > No, I don't think so. AFAICT it expects the round keys to be u32 aligned, which is guaranteed due to the fact that struct crypto_aes_ctx encapsulates an array of u32 Regards, Ard.
Re: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM
Hi Ard, On Sat, Jan 28, 2017 at 11:33:33PM +, Ard Biesheuvel wrote: > > Note that this only implements AES encryption, which is all we need > for CTR and CBC-MAC. AES decryption can easily be implemented in a > similar way, but is significantly more costly. Is the expectation of decryption being more costly the only reason why "aes-ti" couldn't be a "cipher" algorithm, allowing it to automatically be used by the existing templates for CTR, CBC-MAC, CBC, ECB, XTS, CMAC, etc.? It doesn't seem to do anything expensive on a per-block basis like loading SSE registers, so it seems it would fit better as a "cipher" algorithm if at all possible. Then there would be no need to implement all these modes yet again. Also, what would be the feasibility of simply replacing aes-generic with the time-invariant implementation, rather than offering two implementations and requiring users to choose one, usually without the needed expertise? > + > +/* > + * Emit the sbox as __weak with external linkage to prevent the compiler > + * from doing constant folding on sbox references involving fixed indexes. > + */ > +__weak const u8 __cacheline_aligned __aesti_sbox[] = { Did you consider marking it 'volatile' instead? > +static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key, > + unsigned int key_len) > +{ > + struct crypto_aes_ctx rk; > + int err; > + > + err = crypto_aes_expand_key(&rk, in_key, key_len); > + if (err) > + return err; crypto_aes_expand_key() assumes that the key is u32-aligned; I don't think that's guaranteed here. Eric
Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
On Mon, Jan 30, 2017 at 02:11:29PM +, Ard Biesheuvel wrote: > Instead of unconditionally forcing 4 byte alignment for all generic > chaining modes that rely on crypto_xor() or crypto_inc() (which may > result in unnecessary copying of data when the underlying hardware > can perform unaligned accesses efficiently), make those functions > deal with unaligned input explicitly, but only if the Kconfig symbol > HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop > the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers. > > For crypto_inc(), this simply involves making the 4-byte stride > conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that > it typically operates on 16 byte buffers. > > For crypto_xor(), an algorithm is implemented that simply runs through > the input using the largest strides possible if unaligned accesses are > allowed. If they are not, an optimal sequence of memory accesses is > emitted that takes the relative alignment of the input buffers into > account, e.g., if the relative misalignment of dst and src is 4 bytes, > the entire xor operation will be completed using 4 byte loads and stores > (modulo unaligned bits at the start and end). Note that all expressions > involving startalign and misalign are simply eliminated by the compiler > if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined. > Hi Ard, This is a good idea, and I think it was error-prone to be requiring 4-byte alignment always, and also inefficient on many architectures. The new crypto_inc() looks fine, but the new crypto_xor() is quite complicated. I'm wondering whether it has to be that way, especially since it seems to most commonly be used on very small input buffers, e.g. 8 or 16-byte blocks. There are a couple trivial ways it could be simplified, e.g. 
using 'dst' and 'src' directly instead of 'a' and 'b' (which also seems to improve code generation by getting rid of the '+= len & ~mask' parts), or using sizeof(long) directly instead of 'size' and 'mask'. But also when I tried testing the proposed crypto_xor() on MIPS, it didn't work correctly on a misaligned buffer. With startalign=1, it did one iteration of the following loop and then exited with startalign=0 and entered the "unsigned long at a time" loop, which is incorrect since at that point the buffers were not yet fully aligned: > do { > if (len < sizeof(u8)) > break; > > if (len >= size && !(startalign & 1) && !(misalign & 1)) > break; > > *dst++ ^= *src++; > len -= sizeof(u8); > startalign &= ~sizeof(u8); > } while (misalign & 1); I think it would need to do instead: startalign += sizeof(u8); startalign %= sizeof(unsigned long); But I am wondering whether you considered something simpler, using the get_unaligned/put_unaligned helpers, maybe even using a switch statement for the last (sizeof(long) - 1) bytes so it can be compiled as a jump table. Something like this: #define xor_unaligned(dst, src) \ put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst)) void crypto_xor(u8 *dst, const u8 *src, unsigned int len) { while (len >= sizeof(unsigned long)) { xor_unaligned((unsigned long *)dst, (unsigned long *)src); dst += sizeof(unsigned long); src += sizeof(unsigned long); len -= sizeof(unsigned long); } switch (len) { #ifdef CONFIG_64BIT case 7: dst[6] ^= src[6]; /* fall through */ case 6: xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]); goto len_4; case 5: dst[4] ^= src[4]; /* fall through */ case 4: len_4: xor_unaligned((u32 *)dst, (u32 *)src); break; #endif case 3: dst[2] ^= src[2]; /* fall through */ case 2: xor_unaligned((u16 *)dst, (u16 *)src); break; case 1: dst[0] ^= src[0]; break; } } That would seem like a better choice for small buffers, which seems to be the more common case. 
It should generate slightly faster code on architectures with fast unaligned access like x86_64, while still being sufficient on architectures without --- perhaps even faster, since it wouldn't have as many branches. Eric
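For comparison, here is a self-contained userspace sketch of the word-at-a-time approach, using memcpy() in place of the kernel's get_unaligned()/put_unaligned() helpers (compilers lower these fixed-size copies to single, possibly unaligned, loads/stores on architectures that allow them):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace sketch of a word-at-a-time XOR. memcpy() stands in for the
 * kernel's get_unaligned()/put_unaligned() helpers, and the byte loop
 * handles the tail.
 */
static void xor_sketch(uint8_t *dst, const uint8_t *src, size_t len)
{
	while (len >= sizeof(unsigned long)) {
		unsigned long a, b;

		memcpy(&a, dst, sizeof(a));
		memcpy(&b, src, sizeof(b));
		a ^= b;
		memcpy(dst, &a, sizeof(a));
		dst += sizeof(unsigned long);
		src += sizeof(unsigned long);
		len -= sizeof(unsigned long);
	}
	while (len--)
		*dst++ ^= *src++;
}

/* Self-test over deliberately misaligned pointers. */
static int xor_selftest(void)
{
	uint8_t dst[32], src[32];

	for (int i = 0; i < 32; i++) {
		dst[i] = (uint8_t)i;
		src[i] = (uint8_t)(0xa5 ^ i);
	}
	xor_sketch(dst + 1, src + 3, 29);
	for (int i = 0; i < 29; i++)
		if (dst[1 + i] != (uint8_t)((1 + i) ^ 0xa5 ^ (3 + i)))
			return 0;
	return 1;
}
```

This loses the relative-alignment optimization of the original patch but keeps the common small-buffer case branch-light, which is the trade-off argued for above.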
Re: [PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients
On Wed, Feb 1, 2017 at 8:47 PM, Anup Patel wrote: > The DMAENGINE framework assumes that if PQ offload is supported by a > DMA device then all 256 PQ coefficients are supported. This assumption > does not hold anymore because we now have BCM-SBA-RAID offload engine > which supports PQ offload with limited number of PQ coefficients. > > This patch extends async_tx APIs to handle DMA devices with support > for fewer PQ coefficients. > > Signed-off-by: Anup Patel > Reviewed-by: Scott Branden > --- > crypto/async_tx/async_pq.c | 3 +++ > crypto/async_tx/async_raid6_recov.c | 12 ++-- > include/linux/dmaengine.h | 19 +++ > include/linux/raid/pq.h | 3 +++ > 4 files changed, 35 insertions(+), 2 deletions(-) So, I hate the way async_tx does these checks on each operation, and it's ok for me to say that because it's my fault. Really it's md that should be validating engine offload capabilities once at the beginning of time. I'd rather we move in that direction than continue to pile onto a bad design.
Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out
On Wed, Feb 01, 2017 at 08:08:09PM +, Ard Biesheuvel wrote: > > Could you please forward this patch to Linus as well? I noticed that the patch Sure, I will do that. > crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes > > is now in mainline, which means CCM is now broken on arm64, given that > the iv_out requirement for CTR apparently isn't honored by *any* > implementation, and CCM wrongly assumes that req->iv retains its value > across the call into the CTR skcipher Hmm, I wonder why we don't see this breakage with the generic CTR as it seems to do exactly the same thing. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
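For context on the iv_out requirement discussed here: after a CTR request covering n blocks, req->iv must hold the counter for the next block, i.e. the big-endian IV incremented n times, because CCM issues a follow-on request that reuses it. A minimal sketch of that counter arithmetic (plain C, not the kernel skcipher API):

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian increment of a 16-byte CTR counter block. */
static void ctr_inc(uint8_t ctr[16])
{
	for (int i = 15; i >= 0; i--)
		if (++ctr[i])
			break;
}

/* After a request of n blocks, iv_out is the IV incremented n times. */
static void ctr_add_blocks(uint8_t ctr[16], unsigned int n)
{
	while (n--)
		ctr_inc(ctr);
}

static int ctr_selftest(void)
{
	uint8_t iv[16] = { 0 };

	ctr_add_blocks(iv, 5);
	if (iv[15] != 5)
		return 0;
	iv[15] = 0xff;
	ctr_inc(iv);	/* must carry into the next byte */
	return iv[15] == 0 && iv[14] == 1;
}
```

An implementation that fails to write this incremented counter back to req->iv breaks CCM's chaining, which is the bug being discussed.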
Re: 4.10 aesni-intel no longer having lrw/ablk_helper dependencies?
On Wed, Feb 01, 2017 at 05:08:03PM +0100, Arkadiusz Miśkiewicz wrote: > > q: Will later loading of pcbc (so intel-aesni loaded from initrd, no pcbc > available, rootfs gets mounted; pcbc is loaded) enable its "functionality" > for > intel-aesni just like it would be available at intel-aesni load time ? No it won't, but it's no big deal as pcbc(aes) isn't actually used anywhere so it's really just an example of using fpu. AFAIK pcbc is only ever used with fcrypt. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[PATCH 4/6] async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
The DMA_PREP_FENCE flag is to be used when preparing a Tx descriptor if the output of that descriptor is to be used by the next/dependent Tx descriptor. The DMA_PREP_FENCE flag will not be set correctly in do_async_gen_syndrome() when calling dma->device_prep_dma_pq() under the following conditions: 1. ASYNC_TX_FENCE not set in submit->flags 2. DMA_PREP_FENCE not set in dma_flags 3. src_cnt (= (disks - 2)) is greater than dma_maxpq(dma, dma_flags) This patch fixes DMA_PREP_FENCE usage in do_async_gen_syndrome() taking inspiration from the do_async_xor() implementation. Signed-off-by: Anup Patel Reviewed-by: Ray Jui Reviewed-by: Scott Branden --- crypto/async_tx/async_pq.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c index 16c6526..947cf35 100644 --- a/crypto/async_tx/async_pq.c +++ b/crypto/async_tx/async_pq.c @@ -62,9 +62,6 @@ do_async_gen_syndrome(struct dma_chan *chan, dma_addr_t dma_dest[2]; int src_off = 0; - if (submit->flags & ASYNC_TX_FENCE) - dma_flags |= DMA_PREP_FENCE; - while (src_cnt > 0) { submit->flags = flags_orig; pq_src_cnt = min(src_cnt, dma_maxpq(dma, dma_flags)); @@ -83,6 +80,8 @@ do_async_gen_syndrome(struct dma_chan *chan, if (cb_fn_orig) dma_flags |= DMA_PREP_INTERRUPT; } + if (submit->flags & ASYNC_TX_FENCE) + dma_flags |= DMA_PREP_FENCE; /* Drivers force forward progress in case they can not provide * a descriptor -- 2.7.4
[PATCH 0/6] Broadcom SBA RAID support
The Broadcom SBA RAID is a stream-based device which provides RAID5/6 offload. It requires a SoC specific ring manager (such as Broadcom FlexRM ring manager) to provide ring-based programming interface. Due to this, the Broadcom SBA RAID driver (mailbox client) implements DMA device having one DMA channel using a set of mailbox channels provided by Broadcom SoC specific ring manager driver (mailbox controller). Important limitations of Broadcom SBA RAID hardware are: 1. Requires disk position instead of disk coefficient 2. Supports only 30 PQ disk coefficients To address limitation #1, we have added raid_gflog table which will help driver convert disk coefficient to disk position. To address limitation #2, we have extended Linux Async Tx APIs to check for available PQ coefficients before doing PQ offload. This patchset is based on Linux-4.10-rc6 and depends on patchset "[PATCH v4 0/2] Broadcom FlexRM ring manager support" It is also available at sba-raid-v1 branch of https://github.com/Broadcom/arm64-linux.git Anup Patel (6): mailbox: Add new API mbox_channel_device() for clients lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position async_tx: Handle DMA devices having support for fewer PQ coefficients async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome() dmaengine: Add Broadcom SBA RAID driver dt-bindings: Add DT bindings document for Broadcom SBA RAID driver .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 + crypto/async_tx/async_pq.c |8 +- crypto/async_tx/async_raid6_recov.c| 12 +- drivers/dma/Kconfig| 13 + drivers/dma/Makefile |1 + drivers/dma/bcm-sba-raid.c | 1309 drivers/mailbox/mailbox.c | 21 + include/linux/dmaengine.h | 19 + include/linux/mailbox_client.h |1 + include/linux/raid/pq.h|4 + lib/raid6/mktables.c | 20 + 11 files changed, 1432 insertions(+), 5 deletions(-) create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt create mode 100644 drivers/dma/bcm-sba-raid.c -- 2.7.4
[PATCH 5/6] dmaengine: Add Broadcom SBA RAID driver
The Broadcom stream buffer accelerator (SBA) provides offloading capabilities for RAID operations. This SBA offload engine is accessible via Broadcom SoC specific ring manager. This patch adds Broadcom SBA RAID driver which provides one DMA device with RAID capabilities using one or more Broadcom SoC specific ring manager channels. The SBA RAID driver in its current shape implements memcpy, xor, and pq operations. Signed-off-by: Anup Patel Reviewed-by: Ray Jui Reviewed-by: Scott Branden --- drivers/dma/Kconfig| 13 + drivers/dma/Makefile |1 + drivers/dma/bcm-sba-raid.c | 1309 3 files changed, 1323 insertions(+) create mode 100644 drivers/dma/bcm-sba-raid.c diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 263495d..58d0463 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -99,6 +99,19 @@ config AXI_DMAC controller is often used in Analog Device's reference designs for FPGA platforms. +config BCM_SBA_RAID +tristate "Broadcom SBA RAID engine support" +depends on (ARM64 && MAILBOX && RAID6_PQ) || COMPILE_TEST +select DMA_ENGINE +select DMA_ENGINE_RAID + select ASYNC_TX_ENABLE_CHANNEL_SWITCH + default ARCH_BCM_IPROC +help + Enable support for Broadcom SBA RAID Engine. The SBA RAID + engine is available on most of the Broadcom iProc SoCs. It + has the capability to offload memcpy, xor and pq computation + for raid5/6. 
+ config COH901318 bool "ST-Ericsson COH901318 DMA support" select DMA_ENGINE diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index a4fa336..ba96bdd 100644 --- a/drivers/dma/Makefile +++ b/drivers/dma/Makefile @@ -17,6 +17,7 @@ obj-$(CONFIG_AMCC_PPC440SPE_ADMA) += ppc4xx/ obj-$(CONFIG_AT_HDMAC) += at_hdmac.o obj-$(CONFIG_AT_XDMAC) += at_xdmac.o obj-$(CONFIG_AXI_DMAC) += dma-axi-dmac.o +obj-$(CONFIG_BCM_SBA_RAID) += bcm-sba-raid.o obj-$(CONFIG_COH901318) += coh901318.o coh901318_lli.o obj-$(CONFIG_DMA_BCM2835) += bcm2835-dma.o obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c new file mode 100644 index 000..bf39f3f --- /dev/null +++ b/drivers/dma/bcm-sba-raid.c @@ -0,0 +1,1309 @@ +/* + * Copyright (C) 2017 Broadcom + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +/* + * Broadcom SBA RAID Driver + * + * The Broadcom stream buffer accelerator (SBA) provides offloading + * capabilities for RAID operations. The SBA offload engine is accessible + * via a Broadcom SoC specific ring manager. Two or more offload engines + * can share the same Broadcom SoC specific ring manager; due to this, the + * Broadcom SoC specific ring manager driver is implemented as a mailbox + * controller driver and offload engine drivers are implemented as mailbox + * clients. + * + * Typically, the Broadcom SoC specific ring manager will implement a larger + * number of hardware rings over one or more SBA hardware devices. By + * design, the internal buffer size of the SBA hardware device is limited, + * but all offload operations supported by SBA can be broken down into + * multiple small requests and executed in parallel on multiple SBA + * hardware devices to achieve high throughput.
+ * + * The Broadcom SBA RAID driver does not require any register programming + * except submitting requests to the SBA hardware device via mailbox channels. + * This driver implements a DMA device with one DMA channel using a set + * of mailbox channels provided by the Broadcom SoC specific ring manager + * driver. To exploit parallelism (as described above), all DMA requests + * coming to the SBA RAID DMA channel are broken down into smaller requests + * and submitted to multiple mailbox channels in round-robin fashion. + * For more SBA DMA channels, we can create more SBA device nodes + * in the Broadcom SoC specific DTS based on the number of hardware rings + * supported by the Broadcom SoC ring manager. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "dmaengine.h" + +/* SBA command helper macros */ +#define SBA_DEC(_d, _s, _m)(((_d) >> (_s)) & (_m)) +#define SBA_ENC(_d, _v, _s, _m)\ + do {\ + (_d) &= ~((u64)(_m) << (_s)); \ + (_d) |= (((u64)(_v) & (_m)) << (_s)); \ + } while (0) + +/* SBA command related defines */ +#define SBA_TYPE_SHIFT 48 +#define SBA_TYPE_MASK 0x3 +#define SBA_TYPE_A 0x0 +#define SBA_TYPE_B 0x2 +#define SBA_TYPE_C
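The SBA_DEC()/SBA_ENC() helpers quoted above are plain bitfield accessors and can be exercised in a small userspace harness (u64 typedef'd locally; sba_cmd_set_type() is an illustrative wrapper, not from the driver):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Helpers and defines as they appear in bcm-sba-raid.c above. */
#define SBA_DEC(_d, _s, _m)	(((_d) >> (_s)) & (_m))
#define SBA_ENC(_d, _v, _s, _m)				\
	do {						\
		(_d) &= ~((u64)(_m) << (_s));		\
		(_d) |= (((u64)(_v) & (_m)) << (_s));	\
	} while (0)

#define SBA_TYPE_SHIFT	48
#define SBA_TYPE_MASK	0x3
#define SBA_TYPE_A	0x0
#define SBA_TYPE_B	0x2

/* Illustrative wrapper (not from the driver): set the type field. */
static u64 sba_cmd_set_type(u64 cmd, u64 type)
{
	/* SBA_ENC clears the field first, so re-encoding overwrites it */
	SBA_ENC(cmd, type, SBA_TYPE_SHIFT, SBA_TYPE_MASK);
	return cmd;
}
```

Note that SBA_ENC() masks out the field before OR-ing in the new value, so repeated encodes of the same field are safe.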
[PATCH 3/6] async_tx: Handle DMA devices having support for fewer PQ coefficients
The DMAENGINE framework assumes that if PQ offload is supported by a DMA device then all 256 PQ coefficients are supported. This assumption does not hold anymore because we now have BCM-SBA-RAID offload engine which supports PQ offload with limited number of PQ coefficients. This patch extends async_tx APIs to handle DMA devices with support for fewer PQ coefficients. Signed-off-by: Anup Patel Reviewed-by: Scott Branden --- crypto/async_tx/async_pq.c | 3 +++ crypto/async_tx/async_raid6_recov.c | 12 ++-- include/linux/dmaengine.h | 19 +++ include/linux/raid/pq.h | 3 +++ 4 files changed, 35 insertions(+), 2 deletions(-) diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c index f83de99..16c6526 100644 --- a/crypto/async_tx/async_pq.c +++ b/crypto/async_tx/async_pq.c @@ -187,6 +187,9 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks, BUG_ON(disks > 255 || !(P(blocks, disks) || Q(blocks, disks))); + if (device && dma_maxpqcoef(device) < src_cnt) + device = NULL; + if (device) unmap = dmaengine_get_unmap_data(device->dev, disks, GFP_NOWAIT); diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c index 8fab627..2916f95 100644 --- a/crypto/async_tx/async_raid6_recov.c +++ b/crypto/async_tx/async_raid6_recov.c @@ -352,6 +352,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb, { void *scribble = submit->scribble; int non_zero_srcs, i; + struct dma_chan *chan = async_dma_find_channel(DMA_PQ); BUG_ON(faila == failb); if (failb < faila) @@ -359,12 +360,15 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb, pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes); + if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF) + chan = NULL; + /* if a dma resource is not available or a scribble buffer is not * available punt to the synchronous path. 
In the 'dma not * available' case be sure to use the scribble buffer to * preserve the content of 'blocks' as the caller intended. */ - if (!async_dma_find_channel(DMA_PQ) || !scribble) { + if (!chan || !scribble) { void **ptrs = scribble ? scribble : (void **) blocks; async_tx_quiesce(&submit->depend_tx); @@ -432,15 +436,19 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila, void *scribble = submit->scribble; int good_srcs, good, i; struct page *srcs[2]; + struct dma_chan *chan = async_dma_find_channel(DMA_PQ); pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes); + if (chan && dma_maxpqcoef(chan->device) < RAID6_PQ_MAX_COEF) + chan = NULL; + /* if a dma resource is not available or a scribble buffer is not * available punt to the synchronous path. In the 'dma not * available' case be sure to use the scribble buffer to * preserve the content of 'blocks' as the caller intended. */ - if (!async_dma_find_channel(DMA_PQ) || !scribble) { + if (!chan || !scribble) { void **ptrs = scribble ? 
scribble : (void **) blocks; async_tx_quiesce(&submit->depend_tx); diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index feee6ec..d938a8b 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -24,6 +24,7 @@ #include #include #include +#include #include /** @@ -668,6 +669,7 @@ struct dma_filter { * @cap_mask: one or more dma_capability flags * @max_xor: maximum number of xor sources, 0 if no capability * @max_pq: maximum number of PQ sources and PQ-continue capability + * @max_pqcoef: maximum number of PQ coefficients, 0 if all supported * @copy_align: alignment shift for memcpy operations * @xor_align: alignment shift for xor operations * @pq_align: alignment shift for pq operations @@ -727,11 +729,13 @@ struct dma_device { dma_cap_mask_t cap_mask; unsigned short max_xor; unsigned short max_pq; + unsigned short max_pqcoef; enum dmaengine_alignment copy_align; enum dmaengine_alignment xor_align; enum dmaengine_alignment pq_align; enum dmaengine_alignment fill_align; #define DMA_HAS_PQ_CONTINUE (1 << 15) + #define DMA_HAS_FEWER_PQ_COEF (1 << 15) int dev_id; struct device *dev; @@ -1122,6 +1126,21 @@ static inline int dma_maxpq(struct dma_device *dma, enum dma_ctrl_flags flags) BUG(); } +static inline void dma_set_maxpqcoef(struct dma_device *dma, +unsigned short max_pqcoef) +{ + if (max_pqcoef < RAID6_PQ_MAX_COEF) { + dma->max_pqcoef = max_pqcoef; + dma->max_pqcoef |= DMA_HAS_FEWER
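The dma_maxpqcoef() getter is cut off in the quoted diff; below is a plausible userspace reconstruction of the encoding, assuming it mirrors the flag-in-field trick dma_maxpq() uses with DMA_HAS_PQ_CONTINUE, and assuming RAID6_PQ_MAX_COEF is 256 (the patch adds it to raid/pq.h but its value is not visible here):

```c
#include <assert.h>

/*
 * Userspace sketch of the max_pqcoef encoding added in this patch.
 * RAID6_PQ_MAX_COEF's value and the dma_maxpqcoef() getter are
 * assumptions reconstructed from the visible half of the diff.
 */
#define RAID6_PQ_MAX_COEF	256
#define DMA_HAS_FEWER_PQ_COEF	(1 << 15)

struct dma_device_sketch {
	unsigned short max_pqcoef;
};

static void dma_set_maxpqcoef(struct dma_device_sketch *dma,
			      unsigned short max_pqcoef)
{
	if (max_pqcoef < RAID6_PQ_MAX_COEF) {
		dma->max_pqcoef = max_pqcoef;
		dma->max_pqcoef |= DMA_HAS_FEWER_PQ_COEF;
	}
}

static int dma_maxpqcoef(const struct dma_device_sketch *dma)
{
	/* Flag clear means the device supports all coefficients. */
	if (!(dma->max_pqcoef & DMA_HAS_FEWER_PQ_COEF))
		return RAID6_PQ_MAX_COEF;
	return dma->max_pqcoef & ~DMA_HAS_FEWER_PQ_COEF;
}

static int pqcoef_selftest(void)
{
	struct dma_device_sketch dma = { .max_pqcoef = 0 };

	if (dma_maxpqcoef(&dma) != RAID6_PQ_MAX_COEF)
		return 0;	/* no limit set: all coefficients supported */
	dma_set_maxpqcoef(&dma, 30);	/* the BCM-SBA-RAID v2 limit */
	return dma_maxpqcoef(&dma) == 30;
}
```

Callers in async_pq.c then compare dma_maxpqcoef(device) against src_cnt and fall back to the synchronous path when the engine's limit is too small, as the patch shows.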
[PATCH 6/6] dt-bindings: Add DT bindings document for Broadcom SBA RAID driver
This patch adds the DT bindings document for newly added Broadcom SBA RAID driver. Signed-off-by: Anup Patel Reviewed-by: Ray Jui Reviewed-by: Scott Branden --- .../devicetree/bindings/dma/brcm,iproc-sba.txt | 29 ++ 1 file changed, 29 insertions(+) create mode 100644 Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt diff --git a/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt new file mode 100644 index 000..092913a --- /dev/null +++ b/Documentation/devicetree/bindings/dma/brcm,iproc-sba.txt @@ -0,0 +1,29 @@ +* Broadcom SBA RAID engine + +Required properties: +- compatible: Should be one of the following + "brcm,iproc-sba" + "brcm,iproc-sba-v2" + The "brcm,iproc-sba" has support for only 6 PQ coefficients + The "brcm,iproc-sba-v2" has support for only 30 PQ coefficients +- mboxes: List of phandle and mailbox channel specifiers + +Example: + +raid_mbox: mbox@6740 { + ... + #mbox-cells = <3>; + ... +}; + +raid0 { + compatible = "brcm,iproc-sba-v2"; + mboxes = <&raid_mbox 0 0x1 0x>, +<&raid_mbox 1 0x1 0x>, +<&raid_mbox 2 0x1 0x>, +<&raid_mbox 3 0x1 0x>, +<&raid_mbox 4 0x1 0x>, +<&raid_mbox 5 0x1 0x>, +<&raid_mbox 6 0x1 0x>, +<&raid_mbox 7 0x1 0x>; +}; -- 2.7.4
[PATCH 2/6] lib/raid6: Add log-of-2 table for RAID6 HW requiring disk position
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The Linux async_tx framework passes values from raid6_gfexp as coefficients for each source to the prep_dma_pq() callback of a DMA channel with PQ capability. This creates a problem for RAID6 offload engines (such as Broadcom SBA) which take a disk position (i.e. the log of {2}) instead of multiplicative coefficients from the raid6_gfexp table. This patch adds a raid6_gflog table holding the log-of-2 value for any given x such that 0 <= x < 256. For any given disk coefficient x, the corresponding disk position is given by raid6_gflog[x]. The RAID6 offload engine driver can use this newly added raid6_gflog table to get the disk position from a multiplicative coefficient. Signed-off-by: Anup Patel Reviewed-by: Scott Branden Reviewed-by: Ray Jui --- include/linux/raid/pq.h | 1 + lib/raid6/mktables.c| 20 2 files changed, 21 insertions(+) diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h index 4d57bba..30f9453 100644 --- a/include/linux/raid/pq.h +++ b/include/linux/raid/pq.h @@ -142,6 +142,7 @@ int raid6_select_algo(void); extern const u8 raid6_gfmul[256][256] __attribute__((aligned(256))); extern const u8 raid6_vgfmul[256][32] __attribute__((aligned(256))); extern const u8 raid6_gfexp[256] __attribute__((aligned(256))); +extern const u8 raid6_gflog[256] __attribute__((aligned(256))); extern const u8 raid6_gfinv[256] __attribute__((aligned(256))); extern const u8 raid6_gfexi[256] __attribute__((aligned(256))); diff --git a/lib/raid6/mktables.c b/lib/raid6/mktables.c index 39787db..e824d08 100644 --- a/lib/raid6/mktables.c +++ b/lib/raid6/mktables.c @@ -125,6 +125,26 @@ int main(int argc, char *argv[]) printf("EXPORT_SYMBOL(raid6_gfexp);\n"); printf("#endif\n"); + /* Compute log-of-2 table */ + printf("\nconst u8 __attribute__((aligned(256)))\n" + "raid6_gflog[256] =\n" "{\n"); + for (i = 0; i < 256; i += 8) { + printf("\t"); + for (j = 0; j < 8; j++) { + v = 255; + for (k = 0; k < 256; k++) + if (exptbl[k] == (i + j)) { + v = 
k; + break; + } + printf("0x%02x,%c", v, (j == 7) ? '\n' : ' '); + } + } + printf("};\n"); + printf("#ifdef __KERNEL__\n"); + printf("EXPORT_SYMBOL(raid6_gflog);\n"); + printf("#endif\n"); + /* Compute inverse table x^-1 == x^254 */ printf("\nconst u8 __attribute__((aligned(256)))\n" "raid6_gfinv[256] =\n" "{\n"); -- 2.7.4
[PATCH 1/6] mailbox: Add new API mbox_channel_device() for clients
The remote processor can have DMAENGINE capabilities and the client can pass data to be processed via main memory. In such cases, the client will require DMAble memory for the remote processor. This patch adds a new API mbox_channel_device() which can be used by clients to get the struct device pointer of the underlying mailbox controller. This struct device pointer of the mailbox controller can then be used by clients to allocate DMAble memory for the remote processor. Signed-off-by: Anup Patel Reviewed-by: Scott Branden Reviewed-by: Ray Jui --- drivers/mailbox/mailbox.c | 21 + include/linux/mailbox_client.h | 1 + 2 files changed, 22 insertions(+) diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c index 4671f8a..d4380fc 100644 --- a/drivers/mailbox/mailbox.c +++ b/drivers/mailbox/mailbox.c @@ -281,6 +281,27 @@ int mbox_send_message(struct mbox_chan *chan, void *mssg) EXPORT_SYMBOL_GPL(mbox_send_message); /** + * mbox_channel_device - Get device pointer of a mailbox channel. + * @chan: Mailbox channel assigned to this client. + * + * The remote processor can have DMAENGINE capabilities and client + * can pass data to be processed via main memory. In such cases, + * the client will require struct device pointer of the mailbox + * channel to map/unmap/allocate/free DMAble memory. + * + * Return: Pointer to the struct device of mailbox channel. + *ERR_PTR on failure. + */ +struct device *mbox_channel_device(struct mbox_chan *chan) +{ + if (!chan || !chan->cl) + return ERR_PTR(-EINVAL); + + return chan->mbox->dev; +} +EXPORT_SYMBOL_GPL(mbox_channel_device); + +/** * mbox_request_channel - Request a mailbox channel. * @cl: Identity of the client requesting the channel. * @index: Index of mailbox specifier in 'mboxes' property. 
diff --git a/include/linux/mailbox_client.h b/include/linux/mailbox_client.h index 4434871..3daffad 100644 --- a/include/linux/mailbox_client.h +++ b/include/linux/mailbox_client.h @@ -40,6 +40,7 @@ struct mbox_client { void (*tx_done)(struct mbox_client *cl, void *mssg, int r); }; +struct device *mbox_channel_device(struct mbox_chan *chan); struct mbox_chan *mbox_request_channel_byname(struct mbox_client *cl, const char *name); struct mbox_chan *mbox_request_channel(struct mbox_client *cl, int index); -- 2.7.4
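A client-side usage sketch of the new API might look like the following. This is a hypothetical example, not part of the patch: everything except the mbox_channel_device() call itself (the function name, buffer sizes, error handling style) is illustrative:

```c
/* Hypothetical mailbox client helper: obtain the controller's struct
 * device via mbox_channel_device() and use it for a DMA-coherent
 * allocation that the remote processor can reach. */
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/mailbox_client.h>

static int example_setup_dma_buf(struct mbox_chan *chan, size_t size,
				 void **cpu_addr, dma_addr_t *dma_addr)
{
	struct device *dev = mbox_channel_device(chan);

	if (IS_ERR(dev))
		return PTR_ERR(dev);

	/* Allocate DMAble memory against the mailbox controller's device */
	*cpu_addr = dma_alloc_coherent(dev, size, dma_addr, GFP_KERNEL);
	if (!*cpu_addr)
		return -ENOMEM;

	return 0;
}
```

The key point is that the allocation is made against the controller's device rather than the client's own, so the DMA mask and IOMMU mappings of the mailbox hardware apply.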
Re: [PATCH v6 1/5] lib: Update LZ4 compressor module
On Tue, Jan 31, 2017 at 03:27:44PM -0700, Jonathan Corbet wrote: > On Fri, 27 Jan 2017 23:02:00 +0100 > Sven Schmidt <4ssch...@informatik.uni-hamburg.de> wrote: > > I have one quick question... > > > /* > > + * LZ4_compress_default() > > + * Compresses 'sourceSize' bytes from buffer 'source' > > + * into already allocated 'dest' buffer of size 'maxOutputSize'. > > + * Compression is guaranteed to succeed if > > + * 'maxOutputSize' >= LZ4_compressBound(inputSize). > > + * It also runs faster, so it's a recommended setting. > > + * If the function cannot compress 'source' > > + * into a more limited 'dest' budget, > > + * compression stops *immediately*, > > + * and the function result is zero. > > + * As a consequence, 'dest' content is not valid. > > + * > > + * source : source address of the original data > > + * dest : output buffer address > > + * of the compressed data > > + * inputSize: Max supported value is > > + * LZ4_MAX_INPUT_SIZE > > + * maxOutputSize: full or partial size of buffer 'dest' > > + * (which must be already allocated) > > + * workmem : address of the working memory. > > + * This requires 'workmem' of size LZ4_MEM_COMPRESS. > > + * return : the number of bytes written into buffer 'dest' > > + * (necessarily <= maxOutputSize) or 0 if compression fails > > + */ > > +int LZ4_compress_default(const char *source, char *dest, int inputSize, > > + int maxOutputSize, void *wrkmem); > > Is there any chance you could format these as kerneldoc comments? You're > not too far from it now, and that would allow the LZ4 interface to be > pulled into the documentation. > > Thanks, > > jon Hi Jon, of course, that makes sense. I already checked the documentation and you're right, I'm not that far from it. Will do the necessary changes. Thanks, Sven
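For reference, the quoted comment recast in kerneldoc form might look roughly like this. This is a sketch of the format only, not the wording of Sven's eventual patch:

```c
/**
 * LZ4_compress_default() - Compress data with the default LZ4 algorithm
 * @source: source address of the original data
 * @dest: output buffer address of the compressed data
 * @inputSize: size of the input data; max supported value is
 *	LZ4_MAX_INPUT_SIZE
 * @maxOutputSize: full or partial size of buffer 'dest'
 *	(which must be already allocated)
 * @wrkmem: address of the working memory, of size LZ4_MEM_COMPRESS
 *
 * Compresses 'inputSize' bytes from buffer 'source' into an already
 * allocated 'dest' buffer of size 'maxOutputSize'. Compression is
 * guaranteed to succeed if 'maxOutputSize' >= LZ4_compressBound(inputSize),
 * and that setting also runs faster. If the function cannot compress
 * 'source' into the more limited 'dest' budget, compression stops
 * *immediately* and the result is zero; in that case the 'dest' content
 * is not valid.
 *
 * Return: the number of bytes written into buffer 'dest'
 *	(necessarily <= maxOutputSize) or 0 if compression fails
 */
int LZ4_compress_default(const char *source, char *dest, int inputSize,
			 int maxOutputSize, void *wrkmem);
```

The `@param:` lines and the `Return:` section are what lets scripts/kernel-doc pull the interface into the generated documentation.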
Re: [PATCH -stable] crypto: ccm - deal with CTR ciphers that honour iv_out
On 28 January 2017 at 20:40, Ard Biesheuvel wrote: > The skcipher API mandates that chaining modes involving IVs calculate > an outgoing IV value that is suitable for encrypting additional blocks > of data. This means the CCM driver cannot assume that req->iv points to > the original IV value when it calls crypto_ccm_auth. So pass a copy to > the skcipher instead. > > Signed-off-by: Ard Biesheuvel > --- > crypto/ccm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/crypto/ccm.c b/crypto/ccm.c > index b388ac6edfb9..8976ef9bc2e7 100644 > --- a/crypto/ccm.c > +++ b/crypto/ccm.c > @@ -362,7 +362,7 @@ static int crypto_ccm_decrypt(struct aead_request *req) > unsigned int cryptlen = req->cryptlen; > u8 *authtag = pctx->auth_tag; > u8 *odata = pctx->odata; > - u8 *iv = req->iv; > + u8 iv[16]; > int err; > > cryptlen -= authsize; > @@ -378,6 +378,7 @@ static int crypto_ccm_decrypt(struct aead_request *req) > if (req->src != req->dst) > dst = pctx->dst; > > + memcpy(iv, req->iv, sizeof(iv)); > skcipher_request_set_tfm(skreq, ctx->ctr); > skcipher_request_set_callback(skreq, pctx->flags, > crypto_ccm_decrypt_done, req); > -- > 2.7.4 > Herbert, Could you please forward this patch to Linus as well? I noticed that the patch crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes is now in mainline, which means CCM is now broken on arm64, given that the iv_out requirement for CTR apparently isn't honored by *any* implementation, and CCM wrongly assumes that req->iv retains its value across the call into the CTR skcipher Thanks, Ard.
Re: crypto: NULL deref in sha512_mb_mgr_get_comp_job_avx2
On Tue, Jan 31, 2017 at 02:16:31PM +0100, Dmitry Vyukov wrote: > Hello, > > I am getting the following reports with low frequency while running > syzkaller fuzzer. Unfortunately they are not reproducible and happen > in a background thread, so it is difficult to extract any context on > my side. I see only few such crashes per week, so most likely it is > some hard to trigger data race. The following reports are from mmotm > tree, commits 00e20cfc2bf04a0cbe1f5405f61c8426f43eee84 and > fff7e71eac7788904753136f09bcad7471f7799e. Any ideas as to how this can > happen? > > BUG: unable to handle kernel NULL pointer dereference at 0060 > IP: [] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251 > PGD 1d2395067 [ 220.874864] PUD 1d2860067 > Oops: 0002 [#1] SMP KASAN > Dumping ftrace buffer: >(ftrace buffer empty) > Modules linked in: > CPU: 0 PID: 516 Comm: kworker/0:1 Not tainted 4.9.0 #4 > Hardware name: Google Google Compute Engine/Google Compute Engine, > BIOS Google 01/01/2011 > Workqueue: crypto mcryptd_queue_worker > task: 8801d9f346c0 task.stack: 8801d9f08000 > RIP: 0010:[] [] > sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251 > RSP: 0018:8801d9f0eef8 EFLAGS: 00010202 > RAX: RBX: 8801d7db1190 RCX: 0006 > RDX: 0001 RSI: 8801d9f34ee8 RDI: 8801d7db1040 > RBP: 8801d9f0f258 R08: 0001 R09: 0001 > R10: 0002 R11: 0003 R12: 8801d9f0f230 > R13: 8801c8bbc4e0 R14: 8801c8bbc530 R15: 8801d9f0ef70 > FS: () GS:8801dc00() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 0060 CR3: 0001cc15a000 CR4: 001406f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Stack: > 8801d7db1040 813fa207 dc00 e8c0f238 > 0002 11003b3e1dea e8c0f218 8801d9f0f190 > 0282 e8c0f140 e8c0f220 41b58ab3 > Call Trace: > [] sha512_mb_update+0x2f7/0x4e0 > arch/x86/crypto/sha512-mb/sha512_mb.c:588 > [] crypto_ahash_update include/crypto/hash.h:512 [inline] > [] ahash_mcryptd_update crypto/mcryptd.c:627 [inline] > [] 
mcryptd_hash_update+0xcd/0x1c0 crypto/mcryptd.c:373 > [] mcryptd_queue_worker+0xff/0x6a0 crypto/mcryptd.c:181 > [] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096 > [] worker_thread+0x223/0x1990 kernel/workqueue.c:2230 > [] kthread+0x323/0x3e0 kernel/kthread.c:209 > [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 > Code: 49 0f 42 d3 48 f7 c2 f0 ff ff ff 0f 85 9a 00 00 00 48 83 e2 0f > 48 6b da 08 48 8d 9c 1f 48 01 00 00 48 8b 03 48 c7 03 00 00 00 00 > 40 60 02 00 00 00 48 8b 9f 40 01 00 00 48 c1 e3 08 48 09 d3 > RIP [] sha512_mb_mgr_get_comp_job_avx2+0x6e/0xee > arch/x86/crypto/sha512-mb/sha512_mb_mgr_flush_avx2.S:251 > RSP > CR2: 0060 > ---[ end trace 139fd4cda5dfe2c4 ]--- > Dmitry, One theory that Megha and I have is that perhaps the flusher and regular computation updates are stepping on each other. Can you try this patch and see if it helps? Tim --->8--- From: Tim Chen Subject: [PATCH] crypto/sha512-mb: Protect sha512 mb ctx mgr access To: Herbert Xu , Dmitry Vyukov Cc: Tim Chen , David Miller , linux-crypto@vger.kernel.org, LKML , megha@linux.intel.com, fenghua...@intel.com The flusher and regular multi-buffer computation via mcryptd may race with one another. Add a lock here and turn off interrupts to access the multi-buffer computation state cstate->mgr before a round of computation. This should prevent the flusher code jumping in. Signed-off-by: Tim Chen --- arch/x86/crypto/sha512-mb/sha512_mb.c | 64 +++ 1 file changed, 42 insertions(+), 22 deletions(-) diff --git a/arch/x86/crypto/sha512-mb/sha512_mb.c b/arch/x86/crypto/sha512-mb/sha512_mb.c index d210174..f3c1c21 100644 --- a/arch/x86/crypto/sha512-mb/sha512_mb.c +++ b/arch/x86/crypto/sha512-mb/sha512_mb.c @@ -221,7 +221,7 @@ static struct sha512_hash_ctx *sha512_ctx_mgr_resubmit } static struct sha512_hash_ctx - *sha512_ctx_mgr_get_comp_ctx(struct sha512_ctx_mgr *mgr) + *sha512_ctx_mgr_get_comp_ctx(struct mcryptd_alg_cstate *cstate) { /* * If get_comp_job returns NULL, there are no jobs complete. 
@@ -233,11 +233,17 @@ static struct sha512_hash_ctx * Otherwise, all jobs currently being managed by the hash_ctx_mgr * still need processing. */ + struct sha512_ctx_mgr *mgr; struct sha512_hash_ctx *ctx; + unsigned long flags; + mgr = cstate->mgr; + spin_lock_irqsave(&cstate->work_lock, flags); ctx = (struct sha512_hash_ctx *) sha512
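In outline, the serialization the patch introduces follows this pattern. This is a hedged sketch only — the function and field names are taken from the patch context above, but it is not the complete change:

```c
/* Sketch: every access to the shared multi-buffer manager (cstate->mgr)
 * is wrapped in spin_lock_irqsave()/spin_unlock_irqrestore() on
 * cstate->work_lock, so the timer-driven flusher and the
 * workqueue-driven update cannot manipulate the manager concurrently. */
static struct sha512_hash_ctx *
sha512_ctx_mgr_get_comp_ctx(struct mcryptd_alg_cstate *cstate)
{
	struct sha512_ctx_mgr *mgr = cstate->mgr;
	struct sha512_hash_ctx *ctx;
	unsigned long flags;

	spin_lock_irqsave(&cstate->work_lock, flags);
	/* fetch a completed job, if any, while holding the lock */
	ctx = (struct sha512_hash_ctx *)
		sha512_job_mgr_get_comp_job(&mgr->mgr);
	spin_unlock_irqrestore(&cstate->work_lock, flags);

	return ctx;
}
```

Disabling interrupts while holding the lock matters because the flusher can run from timer context; a plain spin_lock() would allow it to preempt the update mid-round on the same CPU.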
Re: [PATCH 1/1] crypto:algif_aead - Fix kernel panic on list_del
On Wednesday, 1 February 2017 at 21:10:28 CET, Harsh Jain wrote: Hi Harsh, > Kernel panics when a userspace program tries to access the AEAD interface. > Remove the node from the linked list before freeing its memory. Very good catch. Thank you. Reviewed-by: Stephan Müller (PS: Herbert, in case you want to apply my patches regarding fixing the memory management for algif_aead and algif_skcipher, please note that this error is in the new function aead_free_rsgl/skcipher_free_sgl. Thus, if you think that my approach is good after all, I will need to re-send the patch.) Ciao Stephan
[PATCH 1/1] crypto:algif_aead - Fix kernel panic on list_del
Kernel panics when a userspace program tries to access the AEAD interface. Remove the node from the linked list before freeing its memory. Signed-off-by: Harsh Jain --- crypto/algif_aead.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index f849311..533265f 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -661,9 +661,9 @@ static int aead_recvmsg_sync(struct socket *sock, struct msghdr *msg, int flags) unlock: list_for_each_entry_safe(rsgl, tmp, &ctx->list, list) { af_alg_free_sg(&rsgl->sgl); + list_del(&rsgl->list); if (rsgl != &ctx->first_rsgl) sock_kfree_s(sk, rsgl, sizeof(*rsgl)); - list_del(&rsgl->list); } INIT_LIST_HEAD(&ctx->list); aead_wmem_wakeup(sk); -- 1.8.2.3
[PATCH] crypto: arm64/crc32 - merge CRC32 and PMULL instruction based drivers
The PMULL based CRC32 implementation already contains code based on the separate, optional CRC32 instructions to fallback to when operating on small quantities of data. We can expose these routines directly on systems that lack the 64x64 PMULL instructions but do implement the CRC32 ones, which makes the driver that is based solely on those CRC32 instructions redundant. So remove it. Note that this aligns arm64 with ARM, whose accelerated CRC32 driver also combines the CRC32 extension based and the PMULL based versions. Signed-off-by: Ard Biesheuvel --- This is a meaningful patch by itself imho, but also fixes the issue reported by Matthias where their v4.8 based GCC does not understand the -mcpu=generic+crc command line option, resulting in failed builds. arch/arm64/configs/defconfig | 1 - arch/arm64/crypto/Kconfig | 9 +- arch/arm64/crypto/Makefile| 4 - arch/arm64/crypto/crc32-arm64.c | 290 arch/arm64/crypto/crc32-ce-glue.c | 49 +++- 5 files changed, 41 insertions(+), 312 deletions(-) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 33b744d54739..6fc6f5a2a6e5 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -516,4 +516,3 @@ CONFIG_CRYPTO_GHASH_ARM64_CE=y CONFIG_CRYPTO_AES_ARM64_CE_CCM=y CONFIG_CRYPTO_AES_ARM64_CE_BLK=y # CONFIG_CRYPTO_AES_ARM64_NEON_BLK is not set -CONFIG_CRYPTO_CRC32_ARM64=y diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig index bed7feddfeed..d92293747d63 100644 --- a/arch/arm64/crypto/Kconfig +++ b/arch/arm64/crypto/Kconfig @@ -37,8 +37,8 @@ config CRYPTO_CRCT10DIF_ARM64_CE select CRYPTO_HASH config CRYPTO_CRC32_ARM64_CE - tristate "CRC32 and CRC32C digest algorithms using PMULL instructions" - depends on KERNEL_MODE_NEON && CRC32 + tristate "CRC32 and CRC32C digest algorithms using ARMv8 extensions" + depends on CRC32 select CRYPTO_HASH config CRYPTO_AES_ARM64 @@ -71,11 +71,6 @@ config CRYPTO_AES_ARM64_NEON_BLK select CRYPTO_AES select CRYPTO_SIMD -config 
CRYPTO_CRC32_ARM64 - tristate "CRC32 and CRC32C using optional ARMv8 instructions" - depends on ARM64 - select CRYPTO_HASH - config CRYPTO_CHACHA20_NEON tristate "NEON accelerated ChaCha20 symmetric cipher" depends on KERNEL_MODE_NEON diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index d1ae1b9cbe70..b5edc5918c28 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -55,10 +55,6 @@ AFLAGS_aes-neon.o:= -DINTERLEAVE=4 CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS -obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o - -CFLAGS_crc32-arm64.o := -mcpu=generic+crc - $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE $(call if_changed_rule,cc_o_c) diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c deleted file mode 100644 index 6a37c3c6b11d.. --- a/arch/arm64/crypto/crc32-arm64.c +++ /dev/null @@ -1,290 +0,0 @@ -/* - * crc32-arm64.c - CRC32 and CRC32C using optional ARMv8 instructions - * - * Module based on crypto/crc32c_generic.c - * - * CRC32 loop taken from Ed Nevill's Hadoop CRC patch - * http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3C1403687030.3355.19.camel%40localhost.localdomain%3E - * - * Using inline assembly instead of intrinsics in order to be backwards - * compatible with older compilers. - * - * Copyright (C) 2014 Linaro Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. 
- */ - -#include -#include -#include -#include -#include -#include - -#include - -MODULE_AUTHOR("Yazen Ghannam "); -MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); -MODULE_LICENSE("GPL v2"); - -#define CRC32X(crc, value) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32H(crc, value) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32B(crc, value) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32CX(crc, value) __asm__("crc32cx %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32CW(crc, value) __asm__("crc32cw %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32CH(crc, value) __asm__("crc32ch %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) -#define CRC32CB(crc, value) __asm__("crc32cb %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value)) - -static u32 crc32_arm64_le_hw(u32 crc, const u8 *p, unsigned int len) -{ - s64 length = len; - - while ((length -= sizeof(u64)) >= 0) { - CRC32X(crc, get_unaligned_le64(p)); - p += sizeof(u64); - } - - /* The following is more
Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler
On 1 February 2017 at 13:58, Alexander Graf wrote: > On 02/01/2017 10:43 AM, Ard Biesheuvel wrote: >> >> On 1 February 2017 at 09:07, Ard Biesheuvel >> wrote: >>> >>> On 27 January 2017 at 10:52, Will Deacon wrote: On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote: > > On 27 January 2017 at 10:40, Matthias Brugger > wrote: >> >> Older compilers may not be able to detect the crc32 extended cpu type. > > What do you mean 'detect'? Could you describe the failure in more > detail > please? > >> Anyway only inline assembler code is used, which gets passed to the >> assembler. This patch moves the crc detection to the assembler. >> >> Suggested-by: Alexander Graf >> Signed-off-by: Matthias Brugger >> --- >> arch/arm64/crypto/Makefile | 2 -- >> arch/arm64/crypto/crc32-arm64.c | 3 +++ >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile >> index aad7b744..0d779dac75cd 100644 >> --- a/arch/arm64/crypto/Makefile >> +++ b/arch/arm64/crypto/Makefile >> @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS >> >> obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o >> >> -CFLAGS_crc32-arm64.o := -mcpu=generic+crc >> - >> $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE >> $(call if_changed_rule,cc_o_c) >> >> diff --git a/arch/arm64/crypto/crc32-arm64.c >> b/arch/arm64/crypto/crc32-arm64.c >> index 6a37c3c6b11d..10f5dd075323 100644 >> --- a/arch/arm64/crypto/crc32-arm64.c >> +++ b/arch/arm64/crypto/crc32-arm64.c >> @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam >> "); >> MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 >> instructions"); >> MODULE_LICENSE("GPL v2"); >> >> +/* Request crc extension capabilities from the assembler */ >> +asm(".arch_extension crc"); >> + > > Will should confirm, but I think this is a recent feature in GAS for > AArch64, so this may break older toolchains as well. 
Yes, the .arch_extension directive isn't universally supported by AArch64 gas so we can't rely on it unconditionally. The best bet is to check for the support and, if it's not present, then disable whatever feature relies on it. See the lseinstr variable in Makefile. >>> Actually, this driver has become somewhat redundant now that we have >>> an alternative that combines an implementation based on 64x64 >>> polynomial multiplication with an implementation based on the CRC32 >>> instructions. >>> >>> I will propose a patch that makes the latter usable when only the >>> CRC32 instructions are supported. >> >> ... although you still haven't explained what the actual problem is >> that you are trying to solve. > > > > > The problem is that in Leap 42.2 (as well as SLES12 SP2) we have a 4.8 based > system compiler, but recent binutils. That means that while our assembler is > happy to work with crc instructions, passing the -mcpu parameter to gcc > fails because gcc isn't aware of the flavor yet. > > That in turn means that we want to tell the assembler about feature > requirements rather than the compiler. Fortunately the ".arch_extension" > primitive allows you to do so. > > As far as checking for availability of it goes, I agree that it'd be nice to > check if ".arch_extension" is supported. But so would be to check if > -mcpu=generic+crc is supported. IMHO this patch doesn't make the current > non-checking situation any worse. But of course checking is always nicer > than not checking :) > Well, I am pretty sure binutils v2.25 and older are in wider use than GCC v4.8 and older, which means the runtime test would disable the CRC module altogether for a lot of users. However, as I mentioned, we can also remove this driver once the PMULL based one is updated.
Re: 4.10 aesni-intel no longer having lrw/ablk_helper dependencies?
On Mon, Jan 30, 2017 at 05:42:35PM +0100, Arkadiusz Miśkiewicz wrote: > On Monday 30 of January 2017, Eric Biggers wrote: > > > First, aesni-intel no longer includes an LRW implementation itself. > > Instead, the generic LRW module must be selected. Internally it will use > > the aesni-intel accelerated ECB algorithm if available. So you need to > > make sure that the "lrw" module is included in the initrd if it's not > > already. > > > > But I think the bigger problem is that aesni-intel couldn't be insmod'ed at > > all, which shouldn't happen. The problem might actually be related to the > > "pcbc" algorithm. Upon initialization, aesni-intel now tries to wrap > > "pcbc(__aes-aesni)" with the "fpu" template. This will fail if the "pcbc" > > module hasn't been inserted. I think this wasn't a problem before because > > the old code using ablk_helper instead of crypto_simd didn't try to find > > "pcbc" until someone asked for it, while now aesni-intel will try to find > > it immediately. And since aesni-intel has no direct dependency on pcbc, > > I'm guessing what happened is that pcbc didn't end up in your initrd even > > though it may have been built. (You can verify this by adding pcbc to > > your initrd and seeing if that works around the problem.) > > (hardcoded) loading of pcbc fixed my problem, intel-aesni loaded fine and > luks > partition unlocked correctly. Thanks for the report. Does this patch fix the problem? ---8<--- Subject: crypto: aesni - Fix failure when pcbc module is absent When aesni is built as a module together with pcbc, the pcbc module must be present for aesni to load. However, the pcbc module may not be present for reasons such as its absence on initramfs. This patch allows the aesni to function even if the pcbc module is enabled but not present. 
Reported-by: Arkadiusz Miśkiewicz Signed-off-by: Herbert Xu diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 7ad0ed7..93de8ea 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -1089,9 +1089,9 @@ static void aesni_free_simds(void) aesni_simd_skciphers[i]; i++) simd_skcipher_free(aesni_simd_skciphers[i]); - for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2) && - aesni_simd_skciphers2[i].simd; i++) - simd_skcipher_free(aesni_simd_skciphers2[i].simd); + for (i = 0; i < ARRAY_SIZE(aesni_simd_skciphers2); i++) + if (aesni_simd_skciphers2[i].simd) + simd_skcipher_free(aesni_simd_skciphers2[i].simd); } static int __init aesni_init(void) @@ -1172,7 +1172,7 @@ static int __init aesni_init(void) simd = simd_skcipher_create_compat(algname, drvname, basename); err = PTR_ERR(simd); if (IS_ERR(simd)) - goto unregister_simds; + continue; aesni_simd_skciphers2[i].simd = simd; } -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
On Mon, Jan 23, 2017 at 05:06:34PM +, Benedetto, Salvatore wrote: > > I forgot to add CC stable to it. > > This error was introduced in 4.8 and so I think it should go into stable 4.8 > and 4.9. > > Should I resend or can you add that? I had added it when applying the patch. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: tcrypt - Add mode to test specified algs
On Mon, Jan 23, 2017 at 04:13:04PM +0100, Rabin Vincent wrote: > > That's what I thought so too, but that doesn't seem to be the case. The > mode=0 handling is this: > > switch (m) { > case 0: > if (alg) { > if (!crypto_has_alg(alg, type, > mask ?: CRYPTO_ALG_TYPE_MASK)) > ret = -ENOENT; > break; > } > > for (i = 1; i < 200; i++) > ret += do_test(NULL, 0, 0, i); > break; > > So, if alg= is specified, after first checking if the specified alg is > present, it just goes ahead and runs all the tests. I'm not sure what > mode=0 alg=foo is meant to be used for. You need to set the type and mask for it to work. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler
On 02/01/2017 10:43 AM, Ard Biesheuvel wrote: On 1 February 2017 at 09:07, Ard Biesheuvel wrote: On 27 January 2017 at 10:52, Will Deacon wrote: On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote: On 27 January 2017 at 10:40, Matthias Brugger wrote: Older compilers may not be able to detect the crc32 extended cpu type. What do you mean 'detect'? Could you describe the failure in more detail please? Anyway only inline assembler code is used, which gets passed to the assembler. This patch moves the crc detection to the assembler. Suggested-by: Alexander Graf Signed-off-by: Matthias Brugger --- arch/arm64/crypto/Makefile | 2 -- arch/arm64/crypto/crc32-arm64.c | 3 +++ 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index aad7b744..0d779dac75cd 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o -CFLAGS_crc32-arm64.o := -mcpu=generic+crc - $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE $(call if_changed_rule,cc_o_c) diff --git a/arch/arm64/crypto/crc32-arm64.c b/arch/arm64/crypto/crc32-arm64.c index 6a37c3c6b11d..10f5dd075323 100644 --- a/arch/arm64/crypto/crc32-arm64.c +++ b/arch/arm64/crypto/crc32-arm64.c @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam "); MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); MODULE_LICENSE("GPL v2"); +/* Request crc extension capabilities from the assembler */ +asm(".arch_extension crc"); + Will should confirm, but I think this is a recent feature in GAS for AArch64, so this may break older toolchains as well. Yes, the .arch_extension directive isn't universally supported by AArch64 gas so we can't rely on it unconditionally. The best bet is to check for the support and, if it's not present, then disable whatever feature relies on it. See the lseinstr variable in Makefile. 
Actually, this driver has become somewhat redundant now that we have an alternative that combines an implementation based on 64x64 polynomial multiplication with an implementation based on the CRC32 instructions. I will propose a patch that makes the latter usable when only the CRC32 instructions are supported. ... although you still haven't explained what the actual problem is that you are trying to solve. The problem is that in Leap 42.2 (as well as SLES12 SP2) we have a 4.8 based system compiler, but recent binutils. That means that while our assembler is happy to work with crc instructions, passing the -mcpu parameter to gcc fails because gcc isn't aware of the flavor yet. That in turn means that we want to tell the assembler about feature requirements rather than the compiler. Fortunately the ".arch_extension" primitive allows you to do so. As far as checking for availability of it goes, I agree that it'd be nice to check if ".arch_extension" is supported. But so would be to check if -mcpu=generic+crc is supported. IMHO this patch doesn't make the current non-checking situation any worse. But of course checking is always nicer than not checking :) Alex
Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler
On 1 February 2017 at 09:07, Ard Biesheuvel wrote: > On 27 January 2017 at 10:52, Will Deacon wrote: >> On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote: >>> On 27 January 2017 at 10:40, Matthias Brugger wrote: >>> > Older compilers may not be able to detect the crc32 extended cpu type. >>> >>> What do you mean 'detect'? Could you describe the failure in more detail >>> please? >>> >>> > Anyway only inline assembler code is used, which gets passed to the >>> > assembler. This patch moves the crc detection to the assembler. >>> > >>> > Suggested-by: Alexander Graf >>> > Signed-off-by: Matthias Brugger >>> > --- >>> > arch/arm64/crypto/Makefile | 2 -- >>> > arch/arm64/crypto/crc32-arm64.c | 3 +++ >>> > 2 files changed, 3 insertions(+), 2 deletions(-) >>> > >>> > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile >>> > index aad7b744..0d779dac75cd 100644 >>> > --- a/arch/arm64/crypto/Makefile >>> > +++ b/arch/arm64/crypto/Makefile >>> > @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS >>> > >>> > obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o >>> > >>> > -CFLAGS_crc32-arm64.o := -mcpu=generic+crc >>> > - >>> > $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE >>> > $(call if_changed_rule,cc_o_c) >>> > >>> > diff --git a/arch/arm64/crypto/crc32-arm64.c >>> > b/arch/arm64/crypto/crc32-arm64.c >>> > index 6a37c3c6b11d..10f5dd075323 100644 >>> > --- a/arch/arm64/crypto/crc32-arm64.c >>> > +++ b/arch/arm64/crypto/crc32-arm64.c >>> > @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam >>> > "); >>> > MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); >>> > MODULE_LICENSE("GPL v2"); >>> > >>> > +/* Request crc extension capabilities from the assembler */ >>> > +asm(".arch_extension crc"); >>> > + >>> >>> Will should confirm, but I think this is a recent feature in GAS for >>> AArch64, so this may break older toolchains as well. 
>> >> Yes, the .arch_extension directive isn't universally supported by AArch64 >> gas so we can't rely on it unconditionally. The best bet is to check for >> the support and, if it's not present, then disable whatever feature relies >> on it. See the lseinstr variable in Makefile. >> > > Actually, this driver has become somewhat redundant now that we have > an alternative that combines an implementation based on 64x64 > polynomial multiplication with an implementation based on the CRC32 > instructions. > > I will propose a patch that makes the latter usable when only the > CRC32 instructions are supported. ... although you still haven't explained what the actual problem is that you are trying to solve.
Re: [PATCH] crypto: arm64/crc32 - detect crc32 support in assembler
On 27 January 2017 at 10:52, Will Deacon wrote: > On Fri, Jan 27, 2017 at 10:43:16AM +, Ard Biesheuvel wrote: >> On 27 January 2017 at 10:40, Matthias Brugger wrote: >> > Older compilers may not be able to detect the crc32 extended cpu type. >> >> What do you mean 'detect'? Could you describe the failure in more detail >> please? >> >> > Anyway only inline assembler code is used, which gets passed to the >> > assembler. This patch moves the crc detection to the assembler. >> > >> > Suggested-by: Alexander Graf >> > Signed-off-by: Matthias Brugger >> > --- >> > arch/arm64/crypto/Makefile | 2 -- >> > arch/arm64/crypto/crc32-arm64.c | 3 +++ >> > 2 files changed, 3 insertions(+), 2 deletions(-) >> > >> > diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile >> > index aad7b744..0d779dac75cd 100644 >> > --- a/arch/arm64/crypto/Makefile >> > +++ b/arch/arm64/crypto/Makefile >> > @@ -48,8 +48,6 @@ CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS >> > >> > obj-$(CONFIG_CRYPTO_CRC32_ARM64) += crc32-arm64.o >> > >> > -CFLAGS_crc32-arm64.o := -mcpu=generic+crc >> > - >> > $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE >> > $(call if_changed_rule,cc_o_c) >> > >> > diff --git a/arch/arm64/crypto/crc32-arm64.c >> > b/arch/arm64/crypto/crc32-arm64.c >> > index 6a37c3c6b11d..10f5dd075323 100644 >> > --- a/arch/arm64/crypto/crc32-arm64.c >> > +++ b/arch/arm64/crypto/crc32-arm64.c >> > @@ -29,6 +29,9 @@ MODULE_AUTHOR("Yazen Ghannam >> > "); >> > MODULE_DESCRIPTION("CRC32 and CRC32C using optional ARMv8 instructions"); >> > MODULE_LICENSE("GPL v2"); >> > >> > +/* Request crc extension capabilities from the assembler */ >> > +asm(".arch_extension crc"); >> > + >> >> Will should confirm, but I think this is a recent feature in GAS for >> AArch64, so this may break older toolchains as well. > > Yes, the .arch_extension directive isn't universally supported by AArch64 > gas so we can't rely on it unconditionally. 
The best bet is to check for > the support and, if it's not present, then disable whatever feature relies > on it. See the lseinstr variable in Makefile. > Actually, this driver has become somewhat redundant now that we have an alternative that combines an implementation based on 64x64 polynomial multiplication with an implementation based on the CRC32 instructions. I will propose a patch that makes the latter usable when only the CRC32 instructions are supported.
Crypto Fixes for 4.10
Hi Linus: This push fixes a bug in CBC/CTR on ARM64 that breaks chaining as well as a bug in the core API that causes registration failures when a driver unloads and then reloads an algorithm. Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus Ard Biesheuvel (1): crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes Salvatore Benedetto (1): crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg arch/arm64/crypto/aes-modes.S | 88 - crypto/algapi.c |1 + 2 files changed, 43 insertions(+), 46 deletions(-) Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt