Re: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function

2016-12-17 Thread Christian Kujau
On Thu, 15 Dec 2016, Jason A. Donenfeld wrote:
> > I'd still drop the "24" unless you really think we're going to have
> > multiple variants coming into the kernel.
> 
> Okay. I don't have a problem with this, unless anybody has some reason
> to the contrary.

What if the 2/4-round version falls and we need more rounds to withstand 
future cryptanalysis? We'd then have siphash_ and siphash48_ functions, 
no? My amateurish bike-shedding argument would be "let's keep the 24 then" :-)

C.
-- 
BOFH excuse #354:

Chewing gum on /dev/sd3c


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security-conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at about 3.0 to
4.0 cpb. Both are well below MD5 times, at least with the data sets I've
tested.
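
For reference, the only difference between the variants is how many
SipRounds are applied per message word (c) and in finalization (d).
Roughly like this (a self-contained sketch for illustration, not the
kernel patch; the helper names are mine):

#include <stdint.h>
#include <stddef.h>

static inline uint64_t rotl64(uint64_t x, int b)
{
        return (x << b) | (x >> (64 - b));
}

/* Byte-by-byte little-endian load, so the sketch works on any host. */
static inline uint64_t load_le64(const uint8_t *p)
{
        uint64_t r = 0;
        int i;

        for (i = 0; i < 8; i++)
                r |= (uint64_t)p[i] << (8 * i);
        return r;
}

#define SIPROUND do { \
        v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32); \
        v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2; \
        v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0; \
        v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32); \
} while (0)

/* SipHash-c-d: crounds per message word, drounds in finalization.
 * (2, 4) gives SipHash-2-4; (4, 8) gives SipHash-4-8. */
static uint64_t siphash_c_d(const void *data, size_t len,
                            const uint64_t key[2], int crounds, int drounds)
{
        const uint8_t *p = data;
        uint64_t v0 = 0x736f6d6570736575ULL ^ key[0];
        uint64_t v1 = 0x646f72616e646f6dULL ^ key[1];
        uint64_t v2 = 0x6c7967656e657261ULL ^ key[0];
        uint64_t v3 = 0x7465646279746573ULL ^ key[1];
        uint64_t m, last = (uint64_t)len << 56;
        size_t left = len;
        int i;

        for (; left >= 8; left -= 8, p += 8) {
                m = load_le64(p);
                v3 ^= m;
                for (i = 0; i < crounds; i++)
                        SIPROUND;
                v0 ^= m;
        }
        for (i = 0; i < (int)left; i++) /* trailing bytes + length byte */
                last |= (uint64_t)p[i] << (8 * i);
        v3 ^= last;
        for (i = 0; i < crounds; i++)
                SIPROUND;
        v0 ^= last;
        v2 ^= 0xff;
        for (i = 0; i < drounds; i++)
                SIPROUND;
        return v0 ^ v1 ^ v2 ^ v3;
}

With (2, 4) this should reproduce the published SipHash-2-4 test
vectors; going to (4, 8) roughly doubles the per-word work, which is
consistent with the cpb estimates above.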

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access to the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find any reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper mentions random numbers only once, in a different context.)

Making the leap from internal hash tables and short-lived network
packets to the RNG case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long-term
private keys.

Jeff


Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Theodore Ts'o
On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
> >> - Ted, Andy Lutomirski and I will try to figure out a construction of
> >>   get_random_long() that we all like.

We don't have to find the optimal solution right away; we can
approach this incrementally, after all.

So long as we replace get_random_{long,int}() with something which is
(a) strictly better in terms of security than today's MD5-based
construction, and (b) strictly *faster* than the current construction
on 32-bit and 64-bit systems, we can do that now, and then try to make
it faster still while maintaining some minimum level of security which
is sufficient for all current users of get_random_{long,int}() and
which can be clearly articulated for future users of
get_random_{long,int}().

The main worry I have at this point is benchmarking siphash on a
32-bit system.  It may be that simply batching the chacha20 output so
that we're using the urandom construction more efficiently is the
better way to go, since that *does* meet the criterion of strictly more
secure and strictly faster than the current MD5 solution.  I'm open to
using siphash, but I want to see the 32-bit numbers first.
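
To make the batching idea concrete, here's a rough userspace-style
sketch (all the names are made up, and crng_fill() just stands in for
pulling a ChaCha20 block from the CRNG; a real implementation would
want per-cpu buffers and proper locking):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH_BYTES 64                  /* one ChaCha20 block */

/* Stand-in for the kernel's ChaCha20 CRNG output. */
static void crng_fill(uint8_t *out, size_t len)
{
        FILE *f = fopen("/dev/urandom", "rb");

        if (!f || fread(out, 1, len, f) != len)
                abort();
        fclose(f);
}

static uint8_t batch[BATCH_BYTES];
static size_t batch_pos = BATCH_BYTES;  /* force a refill on first use */

static uint64_t get_random_long_batched(void)
{
        uint64_t r;

        if (batch_pos + sizeof(r) > BATCH_BYTES) {
                crng_fill(batch, sizeof(batch));
                batch_pos = 0;
        }
        memcpy(&r, batch + batch_pos, sizeof(r));
        batch_pos += sizeof(r);
        return r;
}

The point is simply that one 64-byte ChaCha20 block amortizes over
eight get_random_long() calls, so the per-call cost drops well below
running the full extraction path on every call.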

As far as half-siphash is concerned, it occurs to me that the main
problem will be those users who need to guarantee that output can't be
guessed over a long period of time.  For example, if you have a
long-running process, then the output needs to remain unguessable over
potentially months or years, or else you might be weakening the ASLR
protections.  If on the other hand, the hash table or the process will
be going away in a matter of seconds or minutes, the requirements with
respect to cryptographic strength go down significantly.

Now, maybe this doesn't matter that much if we can guarantee (or make
assumptions) that the attacker doesn't have unlimited access to the
output stream of get_random_{long,int}(), or if it's being used in an
anti-DOS use case where it ultimately only needs to be harder than
alternate ways of attacking the system.

Rekeying every five minutes doesn't necessarily help with respect
to ASLR, but it might reduce the amount of the output stream that
would be available to the attacker in order to be able to attack the
get_random_{long,int}() generator, and it also reduces the value of
doing that attack to only compromising the ASLR for those processes
started within that five minute window.

Cheers,

- Ted

P.S.  I'm using ASLR as an example use case, above; of course we will
need to make similar examinations of the other uses of
get_random_{long,int}().

P.P.S.  We might also want to think about defining
get_random_{long,int}() to be unambiguously strong, and then creating
a get_weak_random_{long,int}() which, on platforms where performance
is a consideration, uses a weaker algorithm, perhaps with some kind of
rekeying interval.
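
Something along these lines, perhaps (all names and the interval are
illustrative only; prf64() stands in for whatever keyed primitive gets
picked, e.g. SipHash, and get_random_long() is the strong interface):

#include <stdint.h>
#include <time.h>

extern uint64_t get_random_long(void);                      /* strong source */
extern uint64_t prf64(uint64_t ctr, const uint64_t key[2]); /* e.g. SipHash */

#define REKEY_INTERVAL 300      /* seconds; the five-minute window above */

static uint64_t weak_key[2];
static uint64_t weak_ctr;
static time_t last_rekey;

static uint64_t get_weak_random_long(void)
{
        time_t now = time(NULL);

        /* Refresh the key from the strong generator periodically, so a
         * recovered key only exposes outputs from one interval. */
        if (!last_rekey || now - last_rekey >= REKEY_INTERVAL) {
                weak_key[0] = get_random_long();
                weak_key[1] = get_random_long();
                last_rekey = now;
        }
        return prf64(weak_ctr++, weak_key);
}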


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3; cf. https://oeis.org/A000292)
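
The measurement loop is basically the following (a sketch, not the
exact harness; __rdtsc() from x86intrin.h is my stand-in for the cycle
counter, and siphash() is whichever function is being timed):

#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>          /* __rdtsc(), GCC/Clang on x86 */

/* Whatever function is under test, assumed to have this shape. */
extern uint64_t siphash(const void *data, size_t len, const uint64_t key[2]);

static void bench(const uint8_t *buf, unsigned max, const uint64_t key[2])
{
        uint64_t sum = 0, bytes = 0, t0, t1;
        unsigned start, len;

        t0 = __rdtsc();
        /* All max*(max+1)/2 non-empty substrings of buf[0..max-1]. */
        for (start = 0; start < max; start++)
                for (len = 1; len <= max - start; len++) {
                        sum += siphash(buf + start, len, key);
                        bytes += len;
                }
        t1 = __rdtsc();
        printf(" SipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%016llx)\n",
               max, (unsigned long long)(t1 - t0),
               (double)(t1 - t0) / bytes, (unsigned long long)sum);
}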

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles= 10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  3400 cpb=170.0000 (sum=146a863e)
 SipHash: max=   8 cycles= 24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=  2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=  2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=  5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=  7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles= 19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles= 23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles= 80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
return *p;
}


#else
#include <linux/bitops.h>	/* For rol64 */
#include <linux/kernel.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *          i386    i386    i386    x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4     1069    1029    1608    195     160     399     690
 * 1..8     2483    2381    3851    410     360     988     1659
 * 1..12    4303    4152    6207    690     618     1642    2690
 * 1..16    6122    5931    8668    968     876     2363    3786

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread Jeffrey Walton
> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index ..93549e4e22c5
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,83 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + *
> + * This implementation is specifically for SipHash2-4.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +
> +/* Test vectors taken from official reference source available at:
> + * https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +   0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +   0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +   0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +   0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +   0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +   0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +   0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +   0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +   0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +   0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +   0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +   0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +   0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +   0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +   0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +   0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +   0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +   0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +   0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +   0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +   0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +   0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +   { 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +   u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +   u8 in_unaligned[65];
> +   u8 i;
> +   int ret = 0;
> +
> +   for (i = 0; i < 64; ++i) {
> +   in[i] = i;
> +   in_unaligned[i + 1] = i;
> +   if (siphash(in, i, test_key) != test_vectors[i]) {
> +   pr_info("self-test aligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   if (siphash_unaligned(in_unaligned + 1, i, test_key) != test_vectors[i]) {
> +   pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +   ret = -EINVAL;
> +   }
> +   }
> +   if (!ret)
> +   pr_info("self-tests: pass\n");
> +   return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a
byte array.

I think that means on little-endian machines, values like element 0
must be byte reversed:

0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and the other test vectors) is returned here:

return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big-endian machines produce the expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that the result needed to
be reversed before returning it to a caller.
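
In other words, something like this on the way out (a sketch; the
helper name is made up):

#include <stdint.h>

/* Serialize the 64-bit SipHash result in the reference code's
 * canonical (little-endian) byte order, independent of host order. */
static void siphash_to_bytes(uint64_t hash, uint8_t out[8])
{
        int i;

        for (i = 0; i < 8; i++)
                out[i] = (uint8_t)(hash >> (8 * i));
}

/* e.g. 0x726fdb47dd0e0e31 => 31 0e 0e dd 47 db 6f 72 */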

Jeff


Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago.

My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable.  Reducing I-cache impact speeds
up the calling code.

One thing I'd like to suggest you steal is the way it handles the
fetch of the final partial word.  It's a lot smaller and faster than
an 8-way case statement.


#include <linux/bitops.h>	/* For rol64 */
#include <linux/kernel.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few percent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *          i386    i386    i386    x86_64  x86_64  x86_64  x86_64
 * Length   small   unroll  halfmd4 small   unroll  halfmd4 teahash
 * 1..4     1069    1029    1608    195     160     399     690
 * 1..8     2483    2381    3851    410     360     988     1659
 * 1..12    4303    4152    6207    690     618     1642    2690
 * 1..16    6122    5931    8668    968     876     2363    3786
 * 1..20    8348    8137    11245   1323    1185    3162    5567
 * 1..24    10580   10327   13935   1657    1504    4066    7635
 * 1..28    13211   12956   16803   2069    1871    5028    9759
 * 1..32    15843   15572   19725   2470    2260    6084    11932
 * 1..36    18864   18609   24259   2934    2678    7566    14794
 * 1..1024  5890194 6130242 10264816 881933 881244  3617392 7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
uint64_t a = 0x736f6d6570736575;/* somepseu */
uint64_t b = 0x646f72616e646f6d;/* dorandom */
uint64_t c = 0x6c7967656e657261;/* lygenera */
uint64_t d = 0x7465646279746573;/* tedbytes */
uint64_t m = 0;
uint8_t padbyte = len;

/*
 * Mix in the 128-bit hash seed.  This is in a format convenient
 * to the ext3/ext4 code.  Please feel free to adapt the
 * seed format as needed.
 */
if (seed) {
m = seed[2] | (uint64_t)seed[3] << 32;
b ^= m;
d ^= m;
m = seed[0] | (uint64_t)seed[1] << 32;
/* a ^= m; is done in loop below */
c ^= m;
}

/*
 * By using the same SipRound code for all iterations, we
 * save space, at the expense of some branch prediction.  But
 * branch prediction is hard because of variable length anyway.
 */
len = len/8 + 3;/* Now number of rounds to perform */
do {
a ^= m;

switch (--len) {
unsigned bytes;

default:/* Full words */
d ^= m = get_unaligned_le64(in);
in += 8;
break;
case 2: /* Final partial word */
/*
 * We'd like to do one 64-bit fetch rather than
 * mess around with bytes, but reading past the end
 * might hit a protection boundary.  Fortunately,
 * we know that protection boundaries are aligned,
 * so we can consider only three cases:
 * - The remainder occupies zero words
 * - The remainder fits into one word
 * - The remainder straddles two words
 */
bytes = padbyte & 7;

if (bytes == 0) {
m = 0;
} else {
unsigned offset = (unsigned)(uintptr_t)in & 7;

if (offset + bytes <= 8) {
m = le64_to_cpup((uint64_t const *)
(in - offset));
m >>= 8*offset;
} else {
m = get_unaligned_le64(in);
}
m &= ((uint64_t)1 << 8*