On Thu, Dec 15, 2016 at 10:09 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> On Thu, Dec 15, 2016 at 07:50:36PM +0100, Jason A. Donenfeld wrote:
>> There's no 32-bit platform
>> that will trap on a 64-bit unaligned access because there's no such
>> thing as a 64
On Thu, Dec 15, 2016 at 9:31 PM, Hannes Frederic Sowa
wrote:
> ARM64 and x86-64 have memory operations that are not vector operations
> that operate on 128-bit memory.
Fair enough. imull I guess.
> How do you know that the compiler for some architecture will not
These restore parity with the jhash interface by providing
high-performance helpers for common input sizes.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Tom Herbert <t...@herbertland.com>
---
include/linux/siphash.h | 33 ++
lib/siphash.c
high-speed solution to a widely known set of
problems, and it's time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
Cc: Linus Torvalds <torva...@linux-f
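Since several of the commit messages in this thread pitch SipHash without showing it, here is a self-contained userspace C sketch of SipHash-2-4. This is my rendering for illustration, not the kernel patch itself (the kernel version uses u64, le64_to_cpup, and the unaligned-access helpers discussed below); it is checked against the well-known test vector from the SipHash reference material.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define ROTL64(x, b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))

/* Byte-by-byte little-endian load, correct on any host endianness. */
static uint64_t le64(const uint8_t *p)
{
	uint64_t v = 0;
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static void sipround(uint64_t *v0, uint64_t *v1, uint64_t *v2, uint64_t *v3)
{
	*v0 += *v1; *v1 = ROTL64(*v1, 13); *v1 ^= *v0; *v0 = ROTL64(*v0, 32);
	*v2 += *v3; *v3 = ROTL64(*v3, 16); *v3 ^= *v2;
	*v0 += *v3; *v3 = ROTL64(*v3, 21); *v3 ^= *v0;
	*v2 += *v1; *v1 = ROTL64(*v1, 17); *v1 ^= *v2; *v2 = ROTL64(*v2, 32);
}

uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[16])
{
	uint64_t k0 = le64(key), k1 = le64(key + 8);
	/* Initialization constants: "somepseudorandomlygeneratedbytes". */
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;
	size_t left = len & 7;
	const uint8_t *end = data + (len - left);

	/* Compression: 2 rounds per 64-bit message word. */
	for (; data != end; data += 8) {
		uint64_t m = le64(data);
		v3 ^= m;
		sipround(&v0, &v1, &v2, &v3);
		sipround(&v0, &v1, &v2, &v3);
		v0 ^= m;
	}
	/* Fold the 0..7 tail bytes into b alongside the length byte. */
	for (size_t i = 0; i < left; i++)
		b |= (uint64_t)end[i] << (8 * i);
	v3 ^= b;
	sipround(&v0, &v1, &v2, &v3);
	sipround(&v0, &v1, &v2, &v3);
	v0 ^= b;
	/* Finalization: 4 rounds. */
	v2 ^= 0xff;
	sipround(&v0, &v1, &v2, &v3);
	sipround(&v0, &v1, &v2, &v3);
	sipround(&v0, &v1, &v2, &v3);
	sipround(&v0, &v1, &v2, &v3);
	return v0 ^ v1 ^ v2 ^ v3;
}
```

The compact fixed state (four u64 words) is what makes it competitive with jhash while remaining a keyed PRF.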
your Reviewed-by,
so we can begin to get this completed. If there are still lingering issues,
let me know and I'll incorporate them into a v6 if necessary.
Thanks,
Jason
Jason A. Donenfeld (4):
siphash: add cryptographically secure PRF
siphash: add Nu{32,64} helpers
secure_seq: use SipHash
the values directly to the
short input convenience functions.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: David Miller <da...@davemloft.net>
Cc: David Laight <david.lai...@aculab.com>
Cc: Tom Herbert <t...@herbertland.com>
get_random_long | 137130 | 415983 | 3.03x |
get_random_int | 86384 | 343323 | 3.97x |
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Ted Tso <ty...@mit.edu>
---
drivers/
Hi David & Hannes,
This conversation is veering off course. I think this doesn't really
matter at all. Gcc converts a u64 into essentially a pair of u32s on
32-bit platforms, so the alignment requirement on 32-bit is at most 32
bits. On 64-bit platforms the alignment requirements are
related
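The alignment claim above is easy to check from userspace with C11's `alignof`; this is a quick illustrative probe, not anything from the patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <stdalign.h>

/* On many 32-bit ABIs (e.g. i386 SysV), alignof(uint64_t) is 4, not 8:
 * the compiler manipulates a u64 as a pair of 32-bit halves, so its
 * natural alignment requirement never exceeds the machine word size.
 * On typical 64-bit ABIs it is 8. */
size_t u64_alignment(void)
{
	return alignof(uint64_t);
}
```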
Hi David,
On Thu, Dec 15, 2016 at 11:14 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Behalf Of Jason A. Donenfeld
>> Sent: 14 December 2016 18:46
> ...
>> + ret = *chaining = siphash24((u8 *),
>> offsetof(typeof(combined), end),
>
> If you
These restore parity with the jhash interface by providing
high-performance helpers for common input sizes.
Linus doesn't like the use of "qword" and "dword", but I haven't been
able to come up with another name for these that fits as well.
Signed-off-by: Jason A. Donenfeld
the values directly to the
short input convenience functions.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Andi Kleen <a...@linux.intel.com>
Cc: David Miller <da...@davemloft.net>
Cc: David Laight <david.lai...@aculab.com>
Cc: Tom Herbert <t...@herbertland.com>
known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
Cc: Linus Torvalds <torva...@linux-foundatio
get_random_long | 137130 | 415983 | 3.03x |
get_random_int | 86384 | 343323 | 3.97x |
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Ted Tso <ty...@mit.edu>
---
Changes from v3->v4:
Hey Ted,
On Wed, Dec 14, 2016 at 8:12 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> I think this opens up a big window for optimizing it even
> further.
I optimized it a bit further and siphash is now the clear winner over chacha:
[1.784801] random benchmark!!
Hi Hannes,
On Wed, Dec 14, 2016 at 11:03 PM, Hannes Frederic Sowa
wrote:
> I fear that the alignment requirement will be a source of bugs on 32 bit
> machines, where you cannot even simply take a well aligned struct on a
> stack and put it into the normal
Hey Tom,
On Thu, Dec 15, 2016 at 12:14 AM, Tom Herbert wrote:
> I'm confused, doesn't 2dword == 1qword? Anyway, I think the qword
> functions are good enough. If someone needs to hash over some odd
> length they can either put them in a structure padded to 64 bits or
> call
Hey Tom,
On Wed, Dec 14, 2016 at 10:35 PM, Tom Herbert wrote:
> Those look good, although I would probably just do 1,2,3 words and
> then have a function that takes n words like jhash. Might want to call
> these dword to distinguish from 32 bit words in jhash.
So actually
Interesting. Evidently gcc 4.8 doesn't like my use of:
enum siphash_lengths {
	SIPHASH24_KEY_LEN = 16,
	SIPHASH24_ALIGNMENT = 8
};
I'll convert this to the more boring:
#define SIPHASH24_KEY_LEN 16
#define SIPHASH24_ALIGNMENT 8
On Wed, Dec 14, 2016 at 9:12 PM, Tom Herbert wrote:
> If you pad the data structure to 64 bits then we can call the version
> of siphash that only deals in 64 bit words. Writing a zero in the
> padding will be cheaper than dealing with odd lengths in siphash24.
On Wed, Dec
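Tom's padding suggestion can be sketched as follows; the struct layout and names here are illustrative, not taken from the actual patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Lay the hash input out as a struct padded to a multiple of 64 bits,
 * zero the padding, and feed whole u64 words to the hash: writing a
 * zero is cheaper than an odd-length tail path in siphash24. */
struct flow_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint32_t pad;	/* explicit padding out to 16 bytes */
};

size_t flow_key_words(void)
{
	struct flow_key k;

	memset(&k, 0, sizeof(k));	/* zero padding before hashing */
	return sizeof(k) / sizeof(uint64_t);
}
```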
Hey Tom,
Just following up on what I mentioned in my last email...
On Wed, Dec 14, 2016 at 8:35 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> I think your suggestion for (2) will contribute to further
> optimizations for (1). In v2, I had another patch in there adding
>
Hi Hannes,
On Wed, Dec 14, 2016 at 4:09 PM, Hannes Frederic Sowa
wrote:
> Yes, numbers would be very usable here. I am mostly concerned about
> small plastic router cases. E.g. assume you double packet processing
> time with a change of the hashing function at what
Hi Hannes,
On Wed, Dec 14, 2016 at 8:22 PM, Hannes Frederic Sowa
wrote:
> I don't think this helps. Did you test it? I don't see reason why
> padding could be left out between `d' and `end' because of the flexible
> array member?
Because the type u8 doesn't require
Hi Tom,
On Wed, Dec 14, 2016 at 8:18 PM, Tom Herbert wrote:
> "super fast" is relative. My quick test shows that this faster than
> Toeplitz (good, but not exactly hard to achieve), but is about 4x
> slower than jhash.
Fast relative to other cryptographically secure PRFs.
Hi again,
On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o wrote:
> [3.606139] random benchmark!!
> [3.606276] get_random_int # cycles: 326578
> [3.606317] get_random_int_new # cycles: 95438
> [3.607423] get_random_bytes # cycles: 2653388
Looks to me like my siphash
does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Eric Biggers <ebigge...@gmail.com>
Cc: David Laight <david
e/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved. */
+
#include
#include
#include
@@ -8,14 +10,14 @@
#include
#include
#include
-
+#include
#include
#if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INE
Hi David,
On Wed, Dec 14, 2016 at 6:56 PM, David Miller wrote:
> Just marking the structure __packed, whether necessary or not, makes
> the compiler assume that the members are not aligned and causes
> byte-by-byte accesses to be performed for words.
> Never, _ever_, use
Hey Ted,
On Wed, Dec 14, 2016 at 5:37 PM, Theodore Ts'o wrote:
> One somewhat undesirable aspect of the current algorithm is that we
> never change random_int_secret.
Why exactly would this be a problem? So long as the secret is kept
secret, the PRF is secure. If an attacker can
On Wed, Dec 14, 2016 at 3:47 PM, David Laight wrote:
> Just remove the __packed and ensure that the structure is 'nice'.
> This includes ensuring there is no 'tail padding'.
> In some cases you'll need to put the port number into a 32-bit field.
I'd rather not. There's no
Hi David,
On Wed, Dec 14, 2016 at 10:51 AM, David Laight <david.lai...@aculab.com> wrote:
> From: Jason A. Donenfeld
>> Sent: 14 December 2016 00:17
>> This gives a clear speed and security improvement. Rather than manually
>> filling MD5 buffers, we simply create a l
Hi David,
On Wed, Dec 14, 2016 at 10:56 AM, David Laight wrote:
> ...
>> +u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
> ...
>> + u64 k0 = get_unaligned_le64(key);
>> + u64 k1 = get_unaligned_le64(key + sizeof(u64));
> ...
>> +
, 79 insertions(+), 81 deletions(-)
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..abadc79cd5d3 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,5 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <ja...@zx2c4.com>. All Rights Reserved. */
+
Many jhash users currently rely on the Nwords functions. In order to
make transitions to siphash fit something people already know about, we
provide analog functions here. This also winds up being nice for the
networking stack, where hashing 32-bit fields is common.
Signed-off-by: Jason
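The jhash_2words-style convenience this commit message describes boils down to packing 32-bit fields into 64-bit words so the word-at-a-time hash never sees an odd-length tail. A minimal sketch of the packing step (illustrative name, not the final interface):

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit fields into one little-endian-ordered 64-bit
 * word, suitable for a siphash helper that consumes whole u64s.
 * A siphash_2u32(a, b, key) helper would hash the result of this. */
uint64_t pack_2u32(uint32_t a, uint32_t b)
{
	return (uint64_t)b << 32 | a;
}
```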
does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte md5 state, there is
instead a per-cpu previously returned value for chaining.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
Cc: Linus Torvalds <torva...@linux-f
Hi Linus,
On Tue, Dec 13, 2016 at 8:25 PM, Linus Torvalds
wrote:
> Yeah,. the TCP sequence number md5_transform() cases are likely the
> best example of something where siphash might be good. That tends to
> be really just a couple words of data (the address and
Hi Eric,
On Tue, Dec 13, 2016 at 9:39 AM, Eric Biggers wrote:
> Hmm, I don't think you can really do load_unaligned_zeropad() without first
> checking for 'left != 0'. The fixup section for load_unaligned_zeropad()
> assumes that rounding the pointer down to a word boundary
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
---
Changes from v3->v4:
- load_u
On Tue, Dec 13, 2016 at 12:01 AM, Andi Kleen wrote:
> It would be nice if the network code could be converted to use siphash
> for the secure sequence numbers. Right now it pulls in a lot of code
> for bigger secure hashes just for that, which is a problem for tiny
>
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
---
Changes from v2->v3:
- The
On Mon, Dec 12, 2016 at 10:44 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> #if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
>switch (left) {
>case 0: break;
>case 1: b |= data[0]; break;
>case 2: b |= get_unaligned_le1
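The tail-loading switch quoted above can be rendered in full as portable userspace C; memcpy-free byte loaders stand in here for the kernel's get_unaligned_le16/get_unaligned_le32, and this is my reconstruction of the pattern, not the exact patch text.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/* Gather the last `left` (0..7) bytes into a little-endian u64,
 * using at most one multi-byte load plus single-byte loads. */
uint64_t load_tail(const uint8_t *data, size_t left)
{
	uint64_t b = 0;

	switch (left) {
	case 7: b |= (uint64_t)data[6] << 48; /* fall through */
	case 6: b |= (uint64_t)data[5] << 40; /* fall through */
	case 5: b |= (uint64_t)data[4] << 32; /* fall through */
	case 4: b |= le32(data); break;
	case 3: b |= (uint64_t)data[2] << 16; /* fall through */
	case 2: b |= le16(data); break;
	case 1: b |= data[0]; break;
	case 0: break;
	}
	return b;
}
```

The deliberate fallthroughs mean each length does the minimum number of loads, which is what Linus's follow-up about mispredicts and the byte-mask model is weighing against.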
Hi Linus,
> I guess you could try to just remove the "if (left)" test entirely, if
> it is at least partly the mispredict. It should do the right thing
> even with a zero count, and it might schedule the code better. Code
> size _should_ be better with the byte mask model (which won't matter
> in
Hey Eric,
Lots of good points; thanks for the review. Responses are inline below.
On Mon, Dec 12, 2016 at 6:42 AM, Eric Biggers wrote:
> Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?
Good call. Will do.
> This assumes the key and message
Hey Linus,
On Mon, Dec 12, 2016 at 5:01 AM, Linus Torvalds
wrote:
> The above is extremely inefficient. Considering that most kernel data
> would be expected to be smallish, that matters (ie the usual benchmark
> would not be about hashing megabytes of data, but
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
---
include/linux/siphash.h |
Hi Greg,
Thanks for the review. Responses to your suggestions are inline below:
On Sat, Dec 10, 2016 at 1:37 PM, Greg KH wrote:
> Please use u64 and u8 instead of the userspace uint64_t and uint8_t
> types for kernel code. Yes, the ship has probably sailed for
Hi Herbert,
On Sat, Dec 10, 2016 at 6:37 AM, Herbert Xu wrote:
> As for AEAD we never had a sync interface to begin with and I
> don't think I'm going to add one.
That's too bad to hear. I hope you'll reconsider. Modern cryptographic
design is heading more and more
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumas...@gmail.com>
Cc: Daniel J. Bernstein <d...@cr.yp.to>
---
include/linux/siphash.
Hi Herbert,
The scatterwalk_map_and_copy function copies ordinary buffers to and
from scatterlists. These buffers can, of course, be on the stack, and
this remains the most popular use of this function -- getting info
between stack buffers and DMA regions. It's mostly used for adding or
checking
Hello friendly test robot,
On Sun, Nov 13, 2016 at 12:27 AM, kbuild test robot wrote:
> Hi Jason,
>
> [auto build test ERROR on cryptodev/master]
That error was fixed by v4 in this series. The version that should be
tested and ultimately applied is v4 and can be found here:
By using the unaligned access helpers, we drastically improve
performance on small MIPS routers that have to go through the exception
fix-up handler for these unaligned accesses.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
crypto/poly1305_generic.
By using the unaligned access helpers, we drastically improve
performance on small MIPS routers that have to go through the exception
fix-up handler for these unaligned accesses.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
crypto/poly1305_generic.
On Mon, Nov 7, 2016 at 8:25 PM, Eric Biggers wrote:
> No it does *not* buffer all incoming blocks, which is why the source pointer
> can
> fall out of alignment. Yes, I actually tested this. In fact this situation
> is
> even hit, in both possible places, in the
By using the unaligned access helpers, we drastically improve
performance on small MIPS routers that have to go through the exception
fix-up handler for these unaligned accesses.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
crypto/poly1305_generic.c | 12 ++--
1 file c
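The idea behind the unaligned access helpers referenced in this patch can be sketched in userspace: route the load through memcpy so the compiler emits whatever access the CPU permits, instead of a direct u32 load that traps (and falls into the exception fix-up handler) on strict-alignment MIPS. This is an illustration of the technique, not the kernel's actual get_unaligned_le32 implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a little-endian u32 from any address, aligned or not.
 * The memcpy is typically folded into a single load on machines
 * with efficient unaligned access, and into byte loads elsewhere. */
uint32_t load_le32_any_alignment(const void *p)
{
	uint8_t b[4];

	memcpy(b, p, sizeof(b));
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```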
On Mon, Nov 7, 2016 at 7:26 PM, Eric Biggers wrote:
>
> I was not referring to any users in particular, only what users could do. As
> an
> example, if you did crypto_shash_update() with 32, 15, then 17 bytes, and the
> underlying algorithm is poly1305-generic, the last
On Mon, Nov 7, 2016 at 7:08 PM, Jason A. Donenfeld <ja...@zx2c4.com> wrote:
> Hmm... The general data flow that strikes me as most pertinent is
> something like:
>
> struct sk_buff *skb = get_it_from_somewhere();
> skb = skb_share_check(skb, GFP_ATOMIC);
> num_
Hi Eric,
On Fri, Nov 4, 2016 at 6:37 PM, Eric Biggers wrote:
> I agree, and the current code is wrong; but do note that this proposal is
> correct for poly1305_setrkey() but not for poly1305_setskey() and
> poly1305_blocks(). In the latter two cases, 4-byte alignment of the
Hi David,
On Thu, Nov 3, 2016 at 6:08 PM, David Miller wrote:
> In any event no piece of code should be doing 32-bit word reads from
> addresses like "x + 3" without, at a very minimum, going through the
> kernel unaligned access handlers.
Excellent point. In other words,
Hi Herbert,
On Thu, Nov 3, 2016 at 1:49 AM, Herbert Xu wrote:
> FWIW I'd rather live with a 6% slowdown than having two different
> code paths in the generic code. Anyone who cares about 6% would
> be much better off writing an assembly version of the code.
Please
On Wed, Nov 2, 2016 at 10:26 PM, Herbert Xu wrote:
> What I'm interested in is whether the new code is sufficiently
> close in performance to the old code, particularly on x86.
>
> I'd much rather only have a single set of code for all architectures.
> After all,
These architectures select HAVE_EFFICIENT_UNALIGNED_ACCESS:
s390 arm arm64 powerpc x86 x86_64
So, these will use the original old code.
The architectures that will thus use the new code are:
alpha arc avr32 blackfin c6x cris frv h8300 hexagon ia64 m32r m68k
metag microblaze mips mn10300 nios2
On Wed, Nov 2, 2016 at 9:09 PM, Herbert Xu wrote:
> Can you give some numbers please? What about other architectures
> that your patch impacts?
Per [1], the patch gives a 181% speedup on MIPS32r2.
[1]
On MIPS chips commonly found in inexpensive routers, this makes a big
difference in performance.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
crypto/poly1305_generic.c | 29 -
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/
Hi Steffen,
On Fri, Oct 7, 2016 at 5:15 AM, Steffen Klassert
wrote:
> Why you want to have this?
I'm working on some bufferbloat/queue code that could benefit from
knowing how many items are currently in flight. The goal is to always
keep padata busy, but never
Since padata has a maximum number of inflight jobs, currently 1000, it's
very useful to know how many jobs are currently queued up. This adds a
simple helper function to expose this information.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
---
include/linux/padata.h | 2 ++
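Since padata already bounds the number of in-flight jobs (around 1000), exposing the count amounts to an atomic counter bumped on submission and dropped on completion; a userspace sketch of that shape, with illustrative names rather than the actual padata API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_size_t queued_jobs;	/* bumped/dropped around each job */

void job_submitted(void)
{
	atomic_fetch_add(&queued_jobs, 1);
}

void job_completed(void)
{
	atomic_fetch_sub(&queued_jobs, 1);
}

/* The helper the patch proposes: how many jobs are queued right now,
 * so a submitter can keep padata busy without overrunning the limit. */
size_t jobs_in_flight(void)
{
	return atomic_load(&queued_jobs);
}
```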
Hi Richard,
On Fri, Jul 1, 2016 at 1:42 PM, Richard Weinberger
wrote:
> So every logical tunnel will allocate a new net device?
> Doesn't this scale badly? I have ipsec alike setups
> with many, many road warriors in mind.
No, this isn't the case. Each net device
kernel tree and ditch the backwards-compatibility
#ifdefs. Importantly, though, WireGuard doesn't require any
modifications in other parts of the kernel, making it nicely
standalone. And most of all, the codebase is pretty short; I hope you
find it enjoyable to read.
I look forward to your feedback
Hi Steffen & Folks,
I submit a job to padata_do_parallel(). When the parallel() function
triggers, I do some things, and then call padata_do_serial(). Finally
the serial() function triggers, where I complete the job (check a
nonce, etc).
The padata API is very appealing because not only does it
Can we queue this up for stable too, please?
On Mon, Jan 25, 2016 at 2:59 PM, Herbert Xu wrote:
> Patch applied. Thanks!
Hi Martin,
Your ChaCha20Poly1305 implementation when decrypting calls chacha20
decryption before it verifies the auth tag. Not only does this waste
CPU cycles, but it makes it impossible to attempt decryption of cipher
texts using different keys (until one is right) without creating a
copy, which
Hi Martin,
I'm trying to use your ChaPoly implementation and I'm running into
quite a bit of trouble. Firstly, it appears that the routine
intermittently overwrites boundaries that it shouldn't, resulting in
panics, which I'm still looking into, and could be caused by error on
my part. But, more
https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg17498.html
Eli's patch fixes the issue!
I'd recommend you send this off for stable inclusion, as it's
potentially remotely exploitable.
If the length of the plaintext is zero, there's no need to waste cycles
on encryption and decryption. Using the chacha20poly1305 construction
for zero-length plaintexts is a common way of using a shared encryption
key for AAD authentication.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
ing chacha20poly1305 with a zero length
input, which then calls chacha20, which calls the key setup routine,
which eventually OOPSes due to the uninitialized ->iv member.
Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
Cc: <sta...@vger.kernel.org>
---
crypto/ablkcipher.c | 2 +-
crypto/blkc