The existing test cases only exercise a small slice of the various
possible code paths through the x86 SSE/PCLMULQDQ implementation,
and the upcoming ports of it for arm64. So add one that exceeds 256
bytes in size, and convert another to a chunked test.
Signed-off-by: Ard Biesheuvel
---
crypto
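As a hedged illustration of what the new chunked test exercises (this is
not the actual testmgr code; the helper name and the 17-byte stride are
made up), hashing the same input in one shot and in uneven chunks through
the shash API must produce the same 16-bit CRC:

#include <crypto/hash.h>
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/string.h>

static int check_chunked_crct10dif(const u8 *in, unsigned int len)
{
	struct crypto_shash *tfm = crypto_alloc_shash("crct10dif", 0, 0);
	u8 once[2], chunked[2];
	unsigned int off, n;
	int err;

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	do {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;

		/* one-shot digest over the whole buffer */
		err = crypto_shash_digest(desc, in, len, once);
		if (err)
			break;

		/* the same input again, fed in irregular pieces */
		err = crypto_shash_init(desc);
		for (off = 0; !err && off < len; off += n) {
			n = min(len - off, 17U);
			err = crypto_shash_update(desc, in + off, n);
		}
		if (!err)
			err = crypto_shash_final(desc, chunked);
		if (!err && memcmp(once, chunked, sizeof(once)))
			err = -EINVAL;
	} while (0);

	crypto_free_shash(tfm);
	return err;
}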
This is a straight transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S.
Suggested-by: YueHaibing
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 5 +
arch/arm64
This is a straight transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 5 +
arch/arm/crypto/Makefile
On 24 November 2016 at 15:43, Ard Biesheuvel wrote:
> This is a straight transliteration of the Intel algorithm implemented
> using SSE and PCLMULQDQ instructions that resides in the file
> arch/x86/crypto/crct10dif-pcl-asm_64.S.
>
> Signed-off-by: Ard Biesheuvel
> ---
last Thursday.
https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crc32
Ard Biesheuvel (2):
crypto: arm64/crc32 - accelerated support based on x86 SSE
implementation
crypto: arm/crc32 - accelerated support based on x86 SSE
implementation
arch/arm/crypto/Kconfig
blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/cry
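For illustration, the shape of that dispatch looks roughly like the
sketch below; cpu_has_pmull, crc32_pmull_le() and crc32_armv8_le() are
hypothetical names standing in for the PMULL/NEON routine and the
CRC32-instruction fallback, not the actual glue code:

static u32 crc32_le_update(u32 crc, const u8 *data, unsigned int len)
{
	if (cpu_has_pmull && may_use_simd() && len >= 64) {
		/* PMULL path: 64 bytes or more, multiples of 16 only */
		unsigned int blocks = round_down(len, 16);

		kernel_neon_begin();
		crc = crc32_pmull_le(crc, data, blocks);
		kernel_neon_end();

		data += blocks;
		len -= blocks;
	}

	/* the tail, or all input when 64x64->128 PMULL is unavailable */
	return crc32_armv8_le(crc, data, len);
}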
blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 5 +
arch/arm/crypto/Makef
On 20 November 2016 at 11:43, Ard Biesheuvel wrote:
> On 20 November 2016 at 11:42, Ard Biesheuvel
> wrote:
>> This integrates both the accelerated scalar and the NEON implementations
>> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
>>
>> Relati
On 28 November 2016 at 13:05, Will Deacon wrote:
> On Sun, Nov 20, 2016 at 11:42:01AM +0000, Ard Biesheuvel wrote:
>> This integrates both the accelerated scalar and the NEON implementations
>> of SHA-224/256 as well as SHA-384/512 from the OpenSSL project.
>>
>> Relat
Add the files that are generated by the recently merged OpenSSL
SHA-256/512 implementation to .gitignore so Git disregards them
when showing untracked files.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/.gitignore | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 arch/arm64
On 28 November 2016 at 14:17, Herbert Xu wrote:
> On Thu, Nov 24, 2016 at 05:32:42PM +0000, Ard Biesheuvel wrote:
>> On 24 November 2016 at 15:43, Ard Biesheuvel
>> wrote:
>> > This is a straight transliteration of the Intel algorithm implemented
>> > using SS
Fix a missing statement that got lost in the skcipher conversion of
the CTR transform.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-glue.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 5c43b92b3714..4e3f8adb1793
s+0x0): first defined here
Fix this by making aes_simd_algs 'static'.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-glue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 24f6137c1a6e..5c43
The new skcipher walk interface does not take into account whether we
are encrypting or decrypting. In the latter case, the walk should
disregard the MAC. Fix this in the arm64 CE driver.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-ce-ccm-glue.c | 7 +++
1 file changed, 3
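The rule being enforced can be sketched as follows (a hedged
illustration, not the actual aes-ce-ccm-glue.c hunk): on decryption the
trailing MAC is not payload, so the walk has to stop short of it.

static unsigned int ccm_walk_len(struct aead_request *req, bool enc)
{
	struct crypto_aead *aead = crypto_aead_reqtfm(req);

	/* the MAC at the end of the input is not part of the plaintext */
	return enc ? req->cryptlen
		   : req->cryptlen - crypto_aead_authsize(aead);
}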
Signed-off-by: Ard Biesheuvel
---
crypto/skcipher.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 0f3071991b13..5367f817b40e 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -506,6 +506,8 @@ int skcipher_walk_aead(struct skcipher_walk *walk
The CBC encryption routine should use the encryption round keys, not
the decryption round keys.
Signed-off-by: Ard Biesheuvel
---
Another fix for the queued changes, this time for 32-bit ARM.
I must say, I'm not impressed with the level of testing that has been
carried out after applying
On 30 November 2016 at 13:14, Herbert Xu wrote:
> On Tue, Nov 29, 2016 at 01:05:32PM +0000, Ard Biesheuvel wrote:
>> The new skcipher walk interface does not take into account whether we
>> are encrypting or decrypting. In the latter case, the walk should
>> disregard the
> On 30 Nov 2016, at 13:19, Herbert Xu wrote:
>
>> On Tue, Nov 29, 2016 at 05:23:36PM +0000, Ard Biesheuvel wrote:
>> The CBC encryption routine should use the encryption round keys, not
>> the decryption round keys.
>>
>> Signed-off-by: Ard Bie
The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.
Signed-off-by: Ard Biesheuvel
---
crypto/testmgr.c | 2 +-
1 file changed, 1 insertion
This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and
arm64 that I sent out a couple of weeks ago, and adds support to the latter
for CRC32C.
Ard Biesheuvel (6):
crypto: testmgr - avoid overlap in chunked tests
crypto: testmgr - add/enhance test cases for CRC-T10DIF
The existing test cases only exercise a small slice of the various
possible code paths through the x86 SSE/PCLMULQDQ implementation,
and the upcoming ports of it for arm64. So add one that exceeds 256
bytes in size, and convert another to a chunked test.
Signed-off-by: Ard Biesheuvel
---
crypto
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 5 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/crct10dif-ce-core.S | 317
arch/arm64/crypto/crct10dif-ce-glue.c | 91 ++
4 files changed, 416 insertions(+)
diff --git a/arch/arm64/crypto
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 5 +
arch/arm/crypto/Makefile | 2 +
arch/arm/crypto/crct10dif-ce-core.S | 349
arch/arm/crypto/crct10dif-ce-glue.c | 95 ++
4 files changed, 451 insertions(+)
diff --git a/arch/arm/crypto/Kconfig b
)
+{
+ crypto_unregister_shashes(crc32_pmull_algs,
+ ARRAY_SIZE(crc32_pmull_algs));
+}
+
+module_cpu_feature_match(PMULL, crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel ");
+MODULE_LICENSE("GPL v2");
--
2.7.4
ARRAY_SIZE(crc32_pmull_algs));
+}
+
+static void __exit crc32_pmull_mod_exit(void)
+{
+ crypto_unregister_shashes(crc32_pmull_algs,
+ ARRAY_SIZE(crc32_pmull_algs));
+}
+
+module_init(crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
On 4 December 2016 at 11:54, Ard Biesheuvel wrote:
> This v2 combines the CRC-T10DIF and CRC32 implementations for both ARM and
> arm64 that I sent out a couple of weeks ago, and adds support to the latter
> for CRC32C.
>
Please don't apply yet. There is an issue in the 32-bi
The IDXn offsets are chosen such that tap values (which may go up to
255) end up overlapping in the xbuf allocation. In particular, IDX1
and IDX3 are too close together, so update IDX3 to avoid this issue.
Signed-off-by: Ard Biesheuvel
---
crypto/testmgr.c | 2 +-
1 file changed, 1 insertion
The existing test cases only exercise a small slice of the various
possible code paths through the x86 SSE/PCLMULQDQ implementation,
and the upcoming ports of it for arm64. So add one that exceeds 256
bytes in size, and convert another to a chunked test.
Signed-off-by: Ard Biesheuvel
---
crypto
not a
multiple of 16 bytes (but they still must be 16-byte aligned)
Ard Biesheuvel (6):
crypto: testmgr - avoid overlap in chunked tests
crypto: testmgr - add/enhance test cases for CRC-T10DIF
crypto: arm64/crct10dif - port x86 SSE implementation to arm64
crypto: arm/crct10dif - port x86
CRC32 and one for CRC32C.
The PMULL/NEON algorithm is faster, but operates on blocks of at least
64 bytes, and on multiples of 16 bytes only. For the remaining input,
or for all input on systems that lack the PMULL 64x64->128 instructions,
the CRC32 instructions will be used.
Signed-off-by: Ard Biesheuvel
ull_mod_exit(void)
+{
+ crypto_unregister_shashes(crc32_pmull_algs,
+ ARRAY_SIZE(crc32_pmull_algs));
+}
+
+module_cpu_feature_match(PMULL, crc32_pmull_mod_init);
+module_exit(crc32_pmull_mod_exit);
+
+MODULE_AUTHOR("Ard Biesheuvel ");
+MODULE_LICENSE("GPL v2");
--
2.7.4
This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16-byte aligned (but of any size).
Signed-off-by: Ard Biesheuvel
---
arch/arm
This is a transliteration of the Intel algorithm implemented
using SSE and PCLMULQDQ instructions that resides in the file
arch/x86/crypto/crct10dif-pcl-asm_64.S, but simplified to only
operate on buffers that are 16-byte aligned (but of any size).
Signed-off-by: Ard Biesheuvel
---
arch/arm64
On 7 December 2016 at 19:19, Eric Biggers wrote:
> On Mon, Dec 05, 2016 at 06:42:23PM +0000, Ard Biesheuvel wrote:
>> The IDXn offsets are chosen such that tap values (which may go up to
>> 255) end up overlapping in the xbuf allocation. In particular, IDX1
>> and IDX3 are t
by putting IDX3 within 492 bytes of IDX1, which causes overlap if the
first chunk exceeds 492 bytes, which is the case for at least one of
the xts(aes) test cases.
So increase IDX3 by another 1000 bytes.
Signed-off-by: Ard Biesheuvel
---
crypto/testmgr.c | 2 +-
1 file changed, 1 insertion(+),
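To make the constraint concrete: assuming, as described above, that
chunk i of a chunked test is copied into the shared xbuf allocation at
offset IDXi, two offsets must not sit closer together than the largest
chunk that can be placed at the lower one. A minimal sketch of the
predicate (illustrative, not testmgr code):

static bool idx_regions_overlap(unsigned int lo_idx, unsigned int hi_idx,
				unsigned int chunk_len)
{
	/* a chunk placed at lo_idx runs into the region at hi_idx */
	return lo_idx + chunk_len > hi_idx;
}

With IDX3 only 492 bytes above IDX1, this predicate becomes true as soon
as the first chunk exceeds 492 bytes, as it does for one of the xts(aes)
vectors; moving IDX3 up by another 1000 bytes restores the gap.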
generic C code
(measured on Cortex-A57 using the arm64 version)
I'm aware that blkciphers are deprecated in favor of skciphers, but this
code (like the x86 version) uses the init and setkey routines of the generic
version, so it is probably better to port all implementations at once.
Ar
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/chacha20-neon-core.S | 480
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto/Makefile | 2 +
arch/arm/crypto/chacha20-neon-core.S | 524
arch
not* guarantee that those steps
produce an exact multiple of the chunk size.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/aesbs-glue.c | 68 +---
1 file changed, 38 insertions(+), 30 deletions(-)
diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/
generic C code
(measured on Cortex-A57 using the arm64 version)
Changes in v2:
- add patch to convert the generic and x86 to skciphers first
- tweaked the arm64 version for some additional performance
- use chunksize == 4x blocksize for optimal speed
Ard Biesheuvel (3):
crypto: chacha20 - conve
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/chacha20-neon-core.S | 450
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto/Makefile | 2 +
arch/arm/crypto/chacha20-neon-core.S | 524
arch
that all presented blocks
except the final one are a multiple of the chunk size, so we can simplify
the encrypt() routine somewhat.
Signed-off-by: Ard Biesheuvel
---
arch/x86/crypto/chacha20_glue.c | 69 +-
crypto/chacha20_generic.c | 73
include/crypto
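Under that guarantee, the walk loop can be sketched as below (a hedged
illustration based on the skcipher walk API, with error handling
abbreviated; chacha20_docrypt() stands for the existing block function):
no partial-block handling is needed before the final iteration.

static int chacha20_simple_crypt(struct skcipher_request *req, u32 *state)
{
	struct skcipher_walk walk;
	int err;

	err = skcipher_walk_virt(&walk, req, true);

	while (walk.nbytes > 0) {
		/* every step but the last is a multiple of the chunk size */
		chacha20_docrypt(state, walk.dst.virt.addr,
				 walk.src.virt.addr, walk.nbytes);
		err = skcipher_walk_done(&walk, 0);
	}

	return err;
}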
tions
introduced in ARMv8, but those are part of an optional extension, and so
it is good to have a fallback.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-neonbs-core.S
to be the intention
that walk->buffer point to walk->page after skcipher_next_slow(), so
ensure that is the case.
Signed-off-by: Ard Biesheuvel
---
crypto/skcipher.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
ind
we should
not use sg_init_one() with the address of a kernel symbol).
But I will leave it up to Herbert to decide whether he prefers that or not.
In any case,
Acked-by: Ard Biesheuvel
> ---
> crypto/testmgr.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>
entation anyway, and the base layer was already
a huge improvement compared to the open coded implementations of the
SHA boilerplate.
> Cc: Ard Biesheuvel
> Cc: Herbert Xu
> Signed-off-by: Andy Lutomirski
> ---
> arch/arm/crypto/sha2-ce-glue.c | 10 ---
> arch/ar
On 26 December 2016 at 07:57, Herbert Xu wrote:
> On Sat, Dec 24, 2016 at 09:57:53AM -0800, Andy Lutomirski wrote:
>>
>> I actually do use incremental hashing later on. BPF currently
>> vmallocs() a big temporary buffer just so it can fill it and hash it.
>> I change it to hash as it goes.
>
> H
> On 27 Dec 2016, at 10:04, Herbert Xu wrote:
>
>> On Thu, Dec 08, 2016 at 02:28:57PM +0000, Ard Biesheuvel wrote:
>> Another port of existing x86 SSE code to NEON, again both for arm64 and ARM.
>>
>> ChaCha20 is a stream cipher described in RFC 7539, and is
On 27 December 2016 at 08:57, Herbert Xu wrote:
> On Fri, Dec 09, 2016 at 01:47:26PM +0000, Ard Biesheuvel wrote:
>> The bit-sliced NEON implementation of AES only performs optimally if
>> it can process 8 blocks of input in parallel. This is due to the nature
>> of bit sl
On 27 December 2016 at 15:36, Jeffrey Walton wrote:
>> ChaCha20 is a stream cipher described in RFC 7539, and is intended to be
>> an efficient software implementable 'standby cipher', in case AES cannot
>> be used.
>
> That's not quite correct.
>
> The IETF changed the algorithm a bit, and it's no
> On 28 Dec 2016, at 09:03, Herbert Xu wrote:
>
>> On Tue, Dec 27, 2016 at 02:26:35PM +0000, Ard Biesheuvel wrote:
>>
>> You just nacked the v2 of this series (due to the chunksize/walksize) and I
>> rewrote them as skciphers as well
>
> Sorry. Would you
> On 28 Dec 2016, at 09:10, Herbert Xu wrote:
>
>> On Tue, Dec 27, 2016 at 06:35:46PM +0000, Ard Biesheuvel wrote:
>>
>> OK, I will try to hack something up.
>>
>> One thing to keep in mind though is that stacked chaining modes should
>> present the
> On 28 Dec 2016, at 09:18, Herbert Xu wrote:
>
>> On Tue, Dec 27, 2016 at 06:04:52PM +0800, Herbert Xu wrote:
>>> On Fri, Dec 09, 2016 at 02:33:51PM +0000, Ard Biesheuvel wrote:
>>> This converts the ChaCha20 code from a blkcipher to a skcipher, which
>>>
On 28 December 2016 at 09:23, Herbert Xu wrote:
> On Wed, Dec 28, 2016 at 09:19:32AM +0000, Ard Biesheuvel wrote:
>>
>> Ok, so that implies a field in the skcipher algo struct then, rather than
>> some definition internal to the driver?
>
> Oh yes it should definitely
On 29 December 2016 at 02:23, Herbert Xu wrote:
> On Wed, Dec 28, 2016 at 07:50:44PM +0000, Ard Biesheuvel wrote:
>>
>> So about this chunksize, is it ever expected to assume other values
>> than 1 (for stream ciphers) or the block size (for block ciphers)?
>> Havi
e walksize (in the skcipher
case) or from the chunksize (in the AEAD case).
Signed-off-by: Ard Biesheuvel
---
crypto/skcipher.c | 20 +++-
include/crypto/internal/skcipher.h | 2 +-
include/crypto/skcipher.h | 34
3 files changed, 47 insert
implementation of AES
in XTS mode for arm64, where using the 8-way cipher (and its ~2 KB
expanded key schedule) to generate the initial tweak is suboptimal.
Signed-off-by: Ard Biesheuvel
---
crypto/aes_generic.c | 10 ++
include/crypto/aes.h | 3 +++
2 files changed, 9 insertions(+), 4
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto
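As a rough sketch of how such a stride is advertised (assuming the
walksize field this series adds to struct skcipher_alg; the values shown
are illustrative, not the exact driver definition):

static struct skcipher_alg chacha20_neon_alg = {
	.base.cra_name		= "chacha20",
	.base.cra_blocksize	= 1,			/* stream cipher */
	.chunksize		= CHACHA20_BLOCK_SIZE,
	.walksize		= 4 * CHACHA20_BLOCK_SIZE, /* 4x stride */
};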
modes.
Ard Biesheuvel (6):
crypto: generic/aes - export encrypt and decrypt entry points
crypto: arm/aes-neonbs - process 8 blocks in parallel if we can
crypto: arm/chacha20 - implement NEON version based on SSE3 code
crypto: arm64/chacha20 - implement NEON version based on SSE3 code
ll available. However, it does *not* guarantee that those steps
produce an exact multiple of the walk size.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/aesbs-glue.c | 67 +++-
1 file changed, 38 insertions(+), 29 deletions(-)
diff --git a/arch/arm/crypto/aesbs-glue.c b/arc
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64
in places where synchronous
transforms are required, such as the mac80211 encryption code, which
executes in softirq context, where SIMD processing is allowed on arm64.
Users of the async transform will keep the existing behavior.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-glue.c
tions
introduced in ARMv8, but those are part of an optional extension, and so
it is good to have a fallback.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 7 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-neonbs-core.S | 879
On 31 October 2016 at 16:13, Russell King - ARM Linux
wrote:
> On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote:
>> On 18 October 2016 at 11:52, Ard Biesheuvel
>> wrote:
>> > Wire up the generic support for exposing CPU feature bits via the
>> > m
On 2 January 2017 at 18:21, Ard Biesheuvel wrote:
> This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*),
> and a port of the ARM bit-sliced AES algorithm to arm64, and
>
> Patch #1 is a prerequisite for the AES-XTS implementation in #6, which needs
> a secondar
-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per
byte.)
Signed-off-by: Ard Biesheuvel
---
Raw performance data after the
On 2 January 2017 at 23:40, Russell King - ARM Linux
wrote:
> On Mon, Jan 02, 2017 at 09:06:04PM +0000, Ard Biesheuvel wrote:
>> On 31 October 2016 at 16:13, Russell King - ARM Linux
>> wrote:
>> > On Sat, Oct 29, 2016 at 11:08:36AM +0100, Ard Biesheuvel wrote:
>>
Signed-off-by: Ard Biesheuvel
---
It makes sense to test this on a variety of cores before deciding whether
to merge it or not. Test results welcome. (insmod tcrypt.ko mode=200 sec=1)
arch/arm/crypto/Kconfig | 20 +--
arch/arm/crypto/Makefile | 4 +-
arch/arm/crypto/aes-cipher-core.S | 169
On 3 January 2017 at 20:01, Ard Biesheuvel wrote:
> On 2 January 2017 at 18:21, Ard Biesheuvel wrote:
>> This series adds SIMD implementations for arm64 and ARM of ChaCha20 (*),
>> and a port of the ARM bit-sliced AES algorithm to arm64, and
>>
>> Patch #1 is a p
On 10 January 2017 at 14:33, Herbert Xu wrote:
> I recently applied the patch
>
> https://patchwork.kernel.org/patch/9468391/
>
> and ended up with a boot crash when it tried to run the x86 chacha20
> code. It turned out that the patch changed a manually aligned
> stack buffer to one that
On 10 January 2017 at 19:00, Andy Lutomirski wrote:
> On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel
> wrote:
>> On 10 January 2017 at 14:33, Herbert Xu wrote:
>>> I recently applied the patch
>>>
>>> https://patchwork.kernel.org/patch/9468391/
On 10 January 2017 at 19:22, Andy Lutomirski wrote:
> On Tue, Jan 10, 2017 at 11:16 AM, Ard Biesheuvel
> wrote:
>> On 10 January 2017 at 19:00, Andy Lutomirski wrote:
>>> On Tue, Jan 10, 2017 at 9:30 AM, Ard Biesheuvel
>>> wrote:
>>>> On 10 Janu
On 11 January 2017 at 06:53, Linus Torvalds
wrote:
>
>
> On Jan 10, 2017 8:36 PM, "Herbert Xu" wrote:
>
>
> Sure we can ban the use of attribute aligned on stacks. But
> what about indirect uses through structures?
>
>
> It should be pretty trivial to add a sparse warning for that, though.
>
Co
On 11 January 2017 at 12:08, Herbert Xu wrote:
> The kernel on x86-64 cannot use gcc attribute align to align to
> a 16-byte boundary. This patch reverts to the old way of aligning
> it by hand.
>
> Incidentally the old way was actually broken in not allocating
> enough space and would silently c
On 11 January 2017 at 12:28, Herbert Xu wrote:
> On Wed, Jan 11, 2017 at 12:14:24PM +0000, Ard Biesheuvel wrote:
>>
>> I think the old code was fine, actually:
>>
>> u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
>>
>> ends up all
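For reference, the by-hand alignment under discussion amounts to the
following sketch (assuming CHACHA20_STATE_ALIGN is 16): over-allocate
the stack buffer by one alignment's worth of u32 slots, then round the
pointer up with PTR_ALIGN.

u32 state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
u32 *state = PTR_ALIGN(state_buf, CHACHA20_STATE_ALIGN);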
in places where synchronous
transforms are required, such as the mac80211 encryption code, which
executes in softirq context, where SIMD processing is allowed on arm64.
Users of the async transform will keep the existing behavior.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-glue.c
://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git crypto-arm-v4.11
https://git.kernel.org/cgit/linux/kernel/git/ardb/linux.git/log/?h=crypto-arm-v4.11
Ard Biesheuvel (7):
crypto: arm64/chacha20 - implement NEON version based on SSE3 code
crypto: arm/chacha20 - implement NEON version
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 20 +--
arch/arm/crypto/Makefile | 4 +-
arch/arm/crypto/aes-cipher-core.S | 179
arch/arm/crypto/aes-cipher-glue.c | 74
arch/arm/crypto/aes_glue.c | 98 ---
5 files changed, 256
This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.
Signed-off-by: Ard Biesheuvel
---
arch/arm/crypto/Kconfig | 6 +
arch/arm/crypto
This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher. It uses the new skcipher walksize
attribute to process the input in strides of 4x the block size.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64
tions
introduced in ARMv8, but those are part of an optional extension, and so
it is good to have a fallback.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 7 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-neonbs-core.S | 963
-A57, this code manages 13.0 cycles per byte, which is ~34% faster
than the generic C code. (Note that this is still >13x slower than the code
that uses the optional ARMv8 Crypto Extensions, which manages <1 cycles per
byte.)
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/K
On 12 January 2017 at 06:12, Herbert Xu wrote:
> On Tue, Jan 10, 2017 at 05:30:48PM +0000, Ard Biesheuvel wrote:
>>
>> Apologies for introducing this breakage. It seemed like an obvious and
>> simple cleanup, so I didn't even bother to mention it in the commit
>>
ORS 2 -> 3
AES_CCM_ENC_TEST_VECTORS 8 -> 14
AES_CCM_DEC_TEST_VECTORS 7 -> 17
AES_CCM_4309_ENC_TEST_VECTORS 7 -> 23
AES_CCM_4309_DEC_TEST_VECTORS 10 -> 23
CAMELLIA_CTR_ENC_TEST_VECTORS 2 -> 3
CAMELLIA_CTR_DEC_TEST_VECTORS 2 -> 3
Signed-off-by: Ard Biesheuvel
On 12 January 2017 at 16:45, Herbert Xu wrote:
> On Wed, Jan 11, 2017 at 04:41:48PM +0000, Ard Biesheuvel wrote:
>> This adds ARM and arm64 implementations of ChaCha20, scalar AES and SIMD
>> AES (using bit slicing). The SIMD algorithms in this series take advantage
>>
Hi Arnd,
On 12 January 2017 at 19:04, kbuild test robot wrote:
> tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> master
> head: 1abee99eafab67fb1c98f9ecfc43cd5735384a86
> commit: 81edb42629758bacdf813dd5e4542ae26e3ad73a [43/44] crypto: arm/aes -
> replac
The ARMv8-M architecture introduces 'tt' and 'ttt' instructions,
which means we can no longer use 'tt' as a register alias on recent
versions of binutils for ARM. So replace the alias with 'ttab'.
Fixes: 81edb4262975 ("crypto: arm/aes - replace scal
On 14 January 2017 at 14:24, Krzysztof Kozlowski wrote:
> Hi,
>
> allyesconfig and multi_v7_defconfig fail to build on recent linux-next
> on GCC 6.2.0.
>
> Errors:
> ../arch/arm/crypto/aes-cipher-core.S: Assembler messages:
> ../arch/arm/crypto/aes-cipher-core.S:21: Error: selected processor does
While this is usually the case, it is not mandated by the API, and
given that the CTS code already accesses the ciphertext scatterlist
to retrieve those bytes, we can simply copy them into req->iv before
proceeding.
Fixes: 0605c41cc53c ("crypto: cts - Convert to skcipher")
Signed-off-by: Ard Biesheuvel
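The copy described above boils down to something like this hedged
sketch (not the actual cts.c change; bsize is the cipher block size):
lift the last ciphertext block out of the destination scatterlist into
req->iv once the CBC step completes.

#include <crypto/scatterwalk.h>

static void cts_copy_iv_back(struct skcipher_request *req,
			     unsigned int bsize)
{
	/* read the final ciphertext block back into the IV for chaining */
	scatterwalk_map_and_copy(req->iv, req->dst,
				 req->cryptlen - bsize, bsize, 0);
}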
On 17 January 2017 at 09:11, Herbert Xu wrote:
> On Mon, Jan 16, 2017 at 09:16:35AM +0000, Ard Biesheuvel wrote:
>> Since the skcipher conversion in commit 0605c41cc53c ("crypto:
>> cts - Convert to skcipher"), the cts code tacitly assumes that
>> the underlying CBC
On 17 January 2017 at 09:25, Herbert Xu wrote:
> On Tue, Jan 17, 2017 at 09:20:11AM +0000, Ard Biesheuvel wrote:
>>
>> So to be clear, it is part of the API that after calling
>> crypto_skcipher_encrypt(req), and completing the request, req->iv
>> should contain a v
otherwise, chaining is impossible anyway.
Cc: # v3.16+
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-modes.S | 88 ++--
1 file changed, 42 insertions(+), 46 deletions(-)
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index c53dbeae79f2
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-ce-ccm-glue.c | 1 -
1 file changed, 1 deletion
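The alternative the patch prefers can be sketched as follows, assuming
a core routine that loads its input words itself (illustrative, not the
driver code): the unaligned accessors make any input alignment safe, so
the alignmask, and the bounce-buffer copies it triggers, can go.

#include <asm/unaligned.h>

static inline u32 load_input_word(const u8 *p)
{
	return get_unaligned_le32(p);	/* safe for any alignment */
}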
sensitivity to cache timing attacks. So switch the fallback
handling to the plain NEON driver.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/Kconfig | 2 +-
arch/arm64/crypto/aes-neonbs-glue.c | 38 ++--
2 files changed, 29 insertions(+), 11 deletions(-)
diff
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel
---
NOTE: this won't apply unless 'crypto: arm64/aes-blk - hon
Shuffle some instructions around in the __hround macro to shave off
0.1 cycles per byte on Cortex-A57.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-cipher-core.S | 52 +++-
1 file changed, 19 insertions(+), 33 deletions(-)
diff --git a/arch/arm64/crypto/aes-cipher
constants from memory in every round.
To allow the ECB and CBC encrypt routines to be reused by the bitsliced
NEON code in a subsequent patch, export them from the module.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-glue.c | 2 +
arch/arm64/crypto/aes-neon.S | 199
Remove the unnecessary alignmask: it is much more efficient to deal with
the misalignment in the core algorithm than relying on the crypto API to
copy the data to a suitably aligned buffer.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/chacha20-neon-glue.c | 1 -
1 file changed, 1
KASLR"),
which is why the AES code used literals instead.
So now we can get rid of the literals, and switch to the adr_l macro.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-cipher-core.S | 7 ++-
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/crypto/aes-cip