Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Jason A. Donenfeld
On Fri, Dec 23, 2016 at 7:19 PM, Hannes Frederic Sowa
 wrote:
> Factoring out sha3

Per the other thread, you probably don't actually want SHA3, because
it's slow in software. You want SHA2. If you want something faster and
better, then Blake2 is most certainly the way to go. I'll be
submitting some patches in a month or so adding Blake2 for future use
in the tree.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Hannes Frederic Sowa
On 23.12.2016 17:42, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski  wrote:
>> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann  
>> wrote:
>>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:

 On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>
> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>
>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> [...]
>>>
>> The hashing is not a proper sha1 neither, unfortunately. I think that
>> is why it will have a custom implementation in iproute2?
>
>
> Still trying to catch up on this admittedly bit confusing thread. I
> did run automated tests over couple of days comparing the data I got
> from fdinfo with the one from af_alg and found no mismatch on the test
> cases varying from min to max possible program sizes. In the process
> of testing, as you might have seen on netdev, I found couple of other
> bugs in bpf code along the way and fixed them up as well. So my question,
> do you or Andy or anyone participating in claiming this have any
> concrete data or test cases that suggests something different? If yes,
> I'm very curious to hear about it and willing fix it up, of course.
> When I'm back from pto I'll prep and cook up my test suite to be
> included into the selftests/bpf/, should have done this initially,
> sorry about that. I'll also post something to expose the alg, that
> sounds fine to me.


 Looking into your code closer, I noticed that you indeed seem to do the
 finalization of sha-1 by hand by aligning and padding the buffer
 accordingly and also patching in the necessary payload length.

 Apologies for my side for claiming that this is not correct sha1
 output, I was only looking at sha_transform and its implementation and
 couldn't see the padding and finalization round with embedding the data
 length in there and hadn't thought of it being done manually.

 Anyway, is it difficult to get the sha finalization into some common
 code library? It is not very bpf specific and crypto code reviewers
 won't find it there at all.
>>>
>>>
>>> Yes, sure, I'll rework it that way (early next year when I'm back if
>>> that's fine with you).
>>
>> Can we make it SHA-256 before 4.10 comes out, though?  This really
>> looks like it will be used in situations where collisions matter and
>> it will be exposed to malicious programs, and SHA-1 should not be used
>> for new designs for this purpose because it simply isn't long enough.
>>
>> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
>> misleading.  That should be u8 or, at the very least, __be32.
>>
>> I realize that there isn't a sha-256 implementation in lib, but would
>> it really be so bad to make the bpf digest only work (for now) when
>> crypto is enabled?  I would *love* to see the crypto core learn how to
>> export simple primitives for direct use without needing the whole
>> crypto core, and this doesn't seem particularly hard to do, but I
>> don't think that's 4.10 material.
> 
> I'm going to try to send out RFC patches for all of this today or
> tomorrow.  It doesn't look bad at all.

Factoring out sha3 to lib/ and use it as standalone and in crypto api
doesn't seem hard, yep. I also proposed this to Daniel offlist.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski  wrote:
> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann  wrote:
>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>>
>>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:

 On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> [...]
>>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?


 Still trying to catch up on this admittedly bit confusing thread. I
 did run automated tests over couple of days comparing the data I got
 from fdinfo with the one from af_alg and found no mismatch on the test
 cases varying from min to max possible program sizes. In the process
 of testing, as you might have seen on netdev, I found couple of other
 bugs in bpf code along the way and fixed them up as well. So my question,
 do you or Andy or anyone participating in claiming this have any
 concrete data or test cases that suggests something different? If yes,
 I'm very curious to hear about it and willing fix it up, of course.
 When I'm back from pto I'll prep and cook up my test suite to be
 included into the selftests/bpf/, should have done this initially,
 sorry about that. I'll also post something to expose the alg, that
 sounds fine to me.
>>>
>>>
>>> Looking into your code closer, I noticed that you indeed seem to do the
>>> finalization of sha-1 by hand by aligning and padding the buffer
>>> accordingly and also patching in the necessary payload length.
>>>
>>> Apologies for my side for claiming that this is not correct sha1
>>> output, I was only looking at sha_transform and its implementation and
>>> couldn't see the padding and finalization round with embedding the data
>>> length in there and hadn't thought of it being done manually.
>>>
>>> Anyway, is it difficult to get the sha finalization into some common
>>> code library? It is not very bpf specific and crypto code reviewers
>>> won't find it there at all.
>>
>>
>> Yes, sure, I'll rework it that way (early next year when I'm back if
>> that's fine with you).
>
> Can we make it SHA-256 before 4.10 comes out, though?  This really
> looks like it will be used in situations where collisions matter and
> it will be exposed to malicious programs, and SHA-1 should not be used
> for new designs for this purpose because it simply isn't long enough.
>
> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
> misleading.  That should be u8 or, at the very least, __be32.
>
> I realize that there isn't a sha-256 implementation in lib, but would
> it really be so bad to make the bpf digest only work (for now) when
> crypto is enabled?  I would *love* to see the crypto core learn how to
> export simple primitives for direct use without needing the whole
> crypto core, and this doesn't seem particularly hard to do, but I
> don't think that's 4.10 material.

I'm going to try to send out RFC patches for all of this today or
tomorrow.  It doesn't look bad at all.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Andy Lutomirski
On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann  wrote:
> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>
>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>
>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:

 On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>
> [...]
>
 The hashing is not a proper sha1 neither, unfortunately. I think that
 is why it will have a custom implementation in iproute2?
>>>
>>>
>>> Still trying to catch up on this admittedly bit confusing thread. I
>>> did run automated tests over couple of days comparing the data I got
>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>> cases varying from min to max possible program sizes. In the process
>>> of testing, as you might have seen on netdev, I found couple of other
>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>> do you or Andy or anyone participating in claiming this have any
>>> concrete data or test cases that suggests something different? If yes,
>>> I'm very curious to hear about it and willing fix it up, of course.
>>> When I'm back from pto I'll prep and cook up my test suite to be
>>> included into the selftests/bpf/, should have done this initially,
>>> sorry about that. I'll also post something to expose the alg, that
>>> sounds fine to me.
>>
>>
>> Looking into your code closer, I noticed that you indeed seem to do the
>> finalization of sha-1 by hand by aligning and padding the buffer
>> accordingly and also patching in the necessary payload length.
>>
>> Apologies for my side for claiming that this is not correct sha1
>> output, I was only looking at sha_transform and its implementation and
>> couldn't see the padding and finalization round with embedding the data
>> length in there and hadn't thought of it being done manually.
>>
>> Anyway, is it difficult to get the sha finalization into some common
>> code library? It is not very bpf specific and crypto code reviewers
>> won't find it there at all.
>
>
> Yes, sure, I'll rework it that way (early next year when I'm back if
> that's fine with you).

Can we make it SHA-256 before 4.10 comes out, though?  This really
looks like it will be used in situations where collisions matter and
it will be exposed to malicious programs, and SHA-1 should not be used
for new designs for this purpose because it simply isn't long enough.

Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
misleading.  That should be u8 or, at the very least, __be32.

I realize that there isn't a sha-256 implementation in lib, but would
it really be so bad to make the bpf digest only work (for now) when
crypto is enabled?  I would *love* to see the crypto core learn how to
export simple primitives for direct use without needing the whole
crypto core, and this doesn't seem particularly hard to do, but I
don't think that's 4.10 material.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Daniel Borkmann

On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:

On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:

On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:

[...]

The hashing is not a proper sha1 neither, unfortunately. I think that
is why it will have a custom implementation in iproute2?


Still trying to catch up on this admittedly bit confusing thread. I
did run automated tests over couple of days comparing the data I got
from fdinfo with the one from af_alg and found no mismatch on the test
cases varying from min to max possible program sizes. In the process
of testing, as you might have seen on netdev, I found couple of other
bugs in bpf code along the way and fixed them up as well. So my question,
do you or Andy or anyone participating in claiming this have any
concrete data or test cases that suggests something different? If yes,
I'm very curious to hear about it and willing fix it up, of course.
When I'm back from pto I'll prep and cook up my test suite to be
included into the selftests/bpf/, should have done this initially,
sorry about that. I'll also post something to expose the alg, that
sounds fine to me.


Looking into your code closer, I noticed that you indeed seem to do the
finalization of sha-1 by hand by aligning and padding the buffer
accordingly and also patching in the necessary payload length.

Apologies for my side for claiming that this is not correct sha1
output, I was only looking at sha_transform and its implementation and
couldn't see the padding and finalization round with embedding the data
length in there and hadn't thought of it being done manually.

Anyway, is it difficult to get the sha finalization into some common
code library? It is not very bpf specific and crypto code reviewers
won't find it there at all.


Yes, sure, I'll rework it that way (early next year when I'm back if
that's fine with you).

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Hannes Frederic Sowa
On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
> > >  wrote:
> > > > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > > > >  wrote:
> > > > > > IPv6 you cannot touch anymore. The hashing algorithm is part of 
> > > > > > uAPI.
> > > > > > You don't want to give people new IPv6 addresses with the same 
> > > > > > stable
> > > > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > > > connectivity then and it is extra work?
> > > > > 
> > > > > Ahh, too bad. So it goes.
> > > > 
> > > > If no other users survive we can put it into the ipv6 module.
> > > > 
> > > > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > > > something like sha256 here, which can be easily replicated by user
> > > > > > space tools (minus the problem of patching out references to not
> > > > > > hashable data, which must be zeroed).
> > > > > 
> > > > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > > > as part of a ne
> > > 
> > > w patchset that can fast track itself to David? And
> > > > > then I can preserve my large series for the next merge window.
> > > > 
> > > > This algorithm should be a non-seeded algorithm, because the hashes
> > > > should be stable and verifiable by user space tooling. Thus this would
> > > > need a hashing algorithm that is hardened against pre-image
> > > > attacks/collision resistance, which siphash is not. I would prefer some
> > > > higher order SHA algorithm for that actually.
> > > > 
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann 
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > >  bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs being loaded based on the digest but
> > just to uniquely identify loaded programs from user space and match up
> > with their source.
> > 
> > There have been talks about signing bpf programs, thus this would
> > probably need another digest algorithm additionally to that one,
> > wasting probably instructions. Probably going somewhere in direction of
> > PKCS#7 might be the thing to do (which leads to the problem to make
> > PKCS#7 attackable by ordinary unpriv users, hmpf).
> > 
> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.
> > > 
> > > Also:
> > > 
> > > +   result = (__force __be32 *)fp->digest;
> > > +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +   result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not a proper sha1 neither, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Still trying to catch up on this admittedly bit confusing thread. I
> did run automated tests over couple of days comparing the data I got
> from fdinfo with the one from af_alg and found no mismatch on the test
> cases varying from min to max possible program sizes. In the process
> of testing, as you might have seen on netdev, I found couple of other
> bugs in bpf code along the way and fixed them up as well. So my question,
> do you or Andy or anyone participating in claiming this have any
> concrete data or test cases that suggests something different? If yes,
> I'm very curious to hear about it and willing fix it up, of course.
> When I'm back from pto I'll prep 

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Daniel Borkmann

On 12/22/2016 06:25 PM, Andy Lutomirski wrote:

On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa

[...]

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...


That would be way too complicated and would be nasty for the unprivileged cases.


Also, there are users be it privileged or not that don't require to have a full
obj loader from user space, but are fine with just hard-coding parts or all of
the insns in their application. Back then we settled with using fds based on
Andy's suggestion, it has both ups and downs as we saw along the way but worked
okay thus far.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-23 Thread Daniel Borkmann

On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:

On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:

On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
 wrote:

On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:

On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
 wrote:

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
You don't want to give people new IPv6 addresses with the same stable
secret (across reboots) after a kernel upgrade. Maybe they lose
connectivity then and it is extra work?


Ahh, too bad. So it goes.


If no other users survive we can put it into the ipv6 module.


The bpf hash stuff can be changed during this merge window, as it is
not yet in a released kernel. Albeit I would probably have preferred
something like sha256 here, which can be easily replicated by user
space tools (minus the problem of patching out references to not
hashable data, which must be zeroed).


Oh, interesting, so time is of the essence then. Do you want to handle
changing the new eBPF code to something not-SHA1 before it's too late,
as part of a ne


w patchset that can fast track itself to David? And

then I can preserve my large series for the next merge window.


This algorithm should be a non-seeded algorithm, because the hashes
should be stable and verifiable by user space tooling. Thus this would
need a hashing algorithm that is hardened against pre-image
attacks/collision resistance, which siphash is not. I would prefer some
higher order SHA algorithm for that actually.



You mean:

commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
Author: Daniel Borkmann 
Date:   Sun Dec 4 23:19:41 2016 +0100

 bpf: add prog_digest and expose it via fdinfo/netlink


Yes, please!  This actually matters for security -- imagine a
malicious program brute-forcing a collision so that it gets loaded
wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
(preferably the latter).  Speed basically doesn't matter here and
Blake2 is both less stable (didn't they slightly change it recently?)
and much less well studied.


We don't prevent ebpf programs being loaded based on the digest but
just to uniquely identify loaded programs from user space and match up
with their source.

There have been talks about signing bpf programs, thus this would
probably need another digest algorithm additionally to that one,
wasting probably instructions. Probably going somewhere in direction of
PKCS#7 might be the thing to do (which leads to the problem to make
PKCS#7 attackable by ordinary unpriv users, hmpf).


My inclination would have been to seed them with something that isn't
exposed to userspace for the precise reason that it would prevent user
code from making assumptions about what's in the hash.  But if there's
a use case for why user code needs to be able to calculate the hash on
its own, then that's fine.  But perhaps the actual fdinfo string
should be "sha256:abcd1234..." to give some flexibility down the road.

Also:

+   result = (__force __be32 *)fp->digest;
+   for (i = 0; i < SHA_DIGEST_WORDS; i++)
+   result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives.
Is this and the code above it even correct?  It might be but on a very
brief glance it looks wrong to me.  If you're doing this to avoid
depending on crypto, then fix crypto so you can pull in the algorithm
without pulling in the whole crypto core.


The hashing is not a proper sha1 neither, unfortunately. I think that
is why it will have a custom implementation in iproute2?


Still trying to catch up on this admittedly bit confusing thread. I
did run automated tests over couple of days comparing the data I got
from fdinfo with the one from af_alg and found no mismatch on the test
cases varying from min to max possible program sizes. In the process
of testing, as you might have seen on netdev, I found couple of other
bugs in bpf code along the way and fixed them up as well. So my question,
do you or Andy or anyone participating in claiming this have any
concrete data or test cases that suggests something different? If yes,
I'm very curious to hear about it and willing fix it up, of course.
When I'm back from pto I'll prep and cook up my test suite to be
included into the selftests/bpf/, should have done this initially,
sorry about that. I'll also post something to expose the alg, that
sounds fine to me.


I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...


At the very least, there should be a separate function that calculates
the hash of a buffer and that function should explicitly run itself
against test vectors of various lengths to make sure that it's
calculating what it claims to be calculating.  And it doesn't look
like the userspace code has landed, so, if this thing isn't
calculating SHA1 correctly, it's 

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 20:56, Andy Lutomirski wrote:
> It's also not quite clear to me why userspace needs to be able to
> calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
> command that takes a BPF program as input and hashes it would seem to
> serve the same purpose, and that would allow the kernel to key the
> digest and change the algorithm down the road without breaking things.

I think that people expect digests of BPF programs to be stable over
time and reboots.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov
 wrote:
> On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski  wrote:
>> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>>  wrote:
>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> We don't prevent ebpf programs being loaded based on the digest but
>>> just to uniquely identify loaded programs from user space and match up
>>> with their source.
>>
>> The commit log talks about using the hash to see if the program has
>> already been compiled and JITted.  If that's done, then a collision
>> will directly cause the kernel to malfunction.
>
> Andy, please read the code.
> we could have used jhash there just as well.
> Collisions are fine.

There's relevant in the code to read yet AFAICS.  The code exports it
via fdinfo, and userspace is expected to do something with it.  The
commit message says:

When programs are pinned and retrieved by an ELF loader, the loader
can check the program's digest through fdinfo and compare it against
one that was generated over the ELF file's program section to see
if the program needs to be reloaded.

I assume this means that a userspace component is expected to compare
the digest of a loaded program to a digest of a program it wants to
load and to use the result of the comparison to decide whether the
programs are the same.  If that's indeed the case (and it sure sounds
like it, and I fully expect CRIU to do very similar things when
support is added), then malicious collisions do matter.

It's also not quite clear to me why userspace needs to be able to
calculate the digest on its own.  A bpf(BPF_CALC_PROGRAM_DIGEST)
command that takes a BPF program as input and hashes it would seem to
serve the same purpose, and that would allow the kernel to key the
digest and change the algorithm down the road without breaking things.

Regardless, adding a new hash algorithm that is almost-but-not-quite
SHA-1 and making it a stable interface to userspace is not a good
thing.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 07:08:37PM +0100, Hannes Frederic Sowa wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).
> 
> In ext4, is it actually possible that a "disrupter" learns about the
> hashing secret in the way how the inodes are returned during getdents?

They'd have to be a local user, who can execute telldir(3) --- in
which case there are plenty of other denial of service attacks one
could carry out that would be far more devastating.

It might also be an issue if the file system is exposed via NFS, but
again, there are so many other ways an attacker could DoS a NFS server
that I don't think of it as a much of a concern.

Keep in mind that worst someone can do is cause directory inserts to
fail with an ENOSPC, and there are plenty of other ways of doing that
--- such as consuming all of the blocks and inodes in the file system,
for example.

So it's a threat, but not a high priority one as far as I'm concerned.
And if this was a problem in actual practice, users could switch to
the TEA based hash, which should be far harder to attack, and
available today.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Alexei Starovoitov
On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski  wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>  wrote:
>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> We don't prevent ebpf programs being loaded based on the digest but
>> just to uniquely identify loaded programs from user space and match up
>> with their source.
>
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Andy, please read the code.
we could have used jhash there just as well.
Collisions are fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa
 wrote:
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

Okay, so in that case, a weak hashing function like SHA1 could result
in a real vulnerability. Therefore, this SHA1 stuff needs to be
reverted immediately, pending a different implementation. If this has
ever shipped in a kernel version, it could even deserve a CVE. No
SHA1!

> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Jeepers creepers. So for some ungodly reason, LKML has invented yet
another homebrewed crypto primitive. This story really gets more
horrifying every day. No bueno.

So yea, let's revert and re-commit (repeal and replace? just
kidding...). Out with SHA-1, in with Blake2 or SHA2.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 7:08 PM, Hannes Frederic Sowa
 wrote:
> I wasn't concerned about performance but more about DoS resilience. I
> wonder how safe half md4 actually is in terms of allowing users to
> generate long hash chains in the filesystem (in terms of length
> extension attacks against half_md4).

AFAIK, this is a real vulnerability that needs to be addressed.

Judging by Ted's inquiry about my siphash testing suite, I assume he's
probably tinkering around with it as we speak. :)


Meanwhile I've separated things into several trees:

1. chacha20 rng, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=random-next

2. md5 cleanup, not yet submitted:
https://git.zx2c4.com/linux-dev/log/?h=md5-cleanup

3. md4 cleanup, already submitted:
https://git.zx2c4.com/linux-dev/log/?h=ext4-next-md4-cleanup

4. siphash and networking, not yet submitted as a x/4 series:
https://git.zx2c4.com/linux-dev/log/?h=net-next-siphash

I'll submit (4) in a couple of days, waiting for any comments on the
existing patch-set.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 16:54, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
>> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>>  wrote:
>>> following up on what appears to be a random subject: ;)
>>>
>>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>>> in the htree. siphash seems to fit this use case pretty good.
>>
>> I saw this too. I'll try to address it in v8 of this series.
> 
> This is a separate issue, and this series is getting a bit too
> complex.  So I'd suggest pushing this off to a separate change.
> 
> Changing the htree hash algorithm is an on-disk format change, and so
> we couldn't roll it out until e2fsprogs gets updated and rolled out
> pretty broadley.  In fact George sent me patches to add siphash as a
> hash algorithm for htree a while back (for both the kernel and
> e2fsprogs), but I never got around to testing and applying them,
> mainly because while it's technically faster, I had other higher
> priority issues to work on --- and see previous comments regarding
> pixel peeping.  Improving the hash algorithm by tens or even hundreds
> of nanoseconds isn't really going to matter since we only do a htree
> lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
> times will dominate.  And from the power perspective, saving
> microwatts of CPU power isn't going to matter if you're going to be
> spinning up the storage device

I wasn't concerned about performance but more about DoS resilience. I
wonder how safe half md4 actually is in terms of allowing users to
generate long hash chains in the filesystem (in terms of length
extension attacks against half_md4).

In ext4, is it actually possible that a "disrupter" learns about the
hashing secret in the way how the inodes are returned during getdents?

Thanks,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
>  wrote:
> > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> > > 
> > > You mean:
> > > 
> > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> > > Author: Daniel Borkmann 
> > > Date:   Sun Dec 4 23:19:41 2016 +0100
> > > 
> > > bpf: add prog_digest and expose it via fdinfo/netlink
> > > 
> > > 
> > > Yes, please!  This actually matters for security -- imagine a
> > > malicious program brute-forcing a collision so that it gets loaded
> > > wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> > > (preferably the latter).  Speed basically doesn't matter here and
> > > Blake2 is both less stable (didn't they slightly change it recently?)
> > > and much less well studied.
> > 
> > We don't prevent ebpf programs being loaded based on the digest but
> > just to uniquely identify loaded programs from user space and match up
> > with their source.
> 
> The commit log talks about using the hash to see if the program has
> already been compiled and JITted.  If that's done, then a collision
> will directly cause the kernel to malfunction.

Yeah, it still shouldn't crash the kernel but it could cause
malfunctions because assumptions are not met from user space thus it
could act in a strange way:

My personal biggest concern is that users of this API will at some
point in time assume this digist is unique (as a key itself for a
hashtables f.e.), while it is actually not (and not enforced so by the
kernel). If you can get an unpriv ebpf program inserted to the kernel
with the same weak hash, a controller daemon could pick it up and bind
it to another ebpf hook, probably outside of the unpriv realm the user
was in before. Only the sorting matters, which might be unstable and is
not guaranteed by anything in most hash table based data structures.

The API seems flawed to me.

> > > My inclination would have been to seed them with something that isn't
> > > exposed to userspace for the precise reason that it would prevent user
> > > code from making assumptions about what's in the hash.  But if there's
> > > a use case for why user code needs to be able to calculate the hash on
> > > its own, then that's fine.  But perhaps the actual fdinfo string
> > > should be "sha256:abcd1234..." to give some flexibility down the road.

To add to this, I am very much in favor of that. Right now it doesn't
have a name because it is a custom algorithm. ;)

> > > 
> > > Also:
> > > 
> > > +   result = (__force __be32 *)fp->digest;
> > > +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> > > +   result[i] = cpu_to_be32(fp->digest[i]);
> > > 
> > > Everyone, please, please, please don't open-code crypto primitives.
> > > Is this and the code above it even correct?  It might be but on a very
> > > brief glance it looks wrong to me.  If you're doing this to avoid
> > > depending on crypto, then fix crypto so you can pull in the algorithm
> > > without pulling in the whole crypto core.
> > 
> > The hashing is not a proper sha1 neither, unfortunately. I think that
> > is why it will have a custom implementation in iproute2?
> 
> Putting on crypto hat:
> 
> NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
> 2016 when people know better and is going to handle it by porting that
> new primitive to userspace" is not a particularly good argument.
> 
> Okay, crypto hack back off.
> 
> > 
> > I wondered if bpf program loading should have used the module loading
> > infrastructure from the beginning...
> 
> That would be way too complicated and would be nasty for the unprivileged 
> cases.

I was more or less just thinking about using the syscalls and user
space representation not the generic infrastructure, as it is anyway
too much concerned with real kernel modules (would probably also solve
your cgroup-ebpf thread, as it can be passed by unique name or name and
hash ;) ). Anyway...

> > > At the very least, there should be a separate function that calculates
> > > the hash of a buffer and that function should explicitly run itself
> > > against test vectors of various lengths to make sure that it's
> > > calculating what it claims to be calculating.  And it doesn't look
> > > like the userspace code has landed, so, if this thing isn't
> > > calculating SHA1 correctly, it's plausible that no one has noticed.
> > 
> > I hope this was known from the beginning, this is not sha1 unfortunately.
> > 
> > But ebpf elf programs also need preprocessing to get rid of some
> > embedded load-depending data, so maybe it was considered to be just
> > enough?
> 
> I suspect it was actually an accident.

Maybe, I don't know.

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa
 wrote:
> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>>  wrote:
>> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> > > Hi Hannes,
>> > >
>> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>> > >  wrote:
>> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > > > You don't want to give people new IPv6 addresses with the same stable
>> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > > > connectivity then and it is extra work?
>> > >
>> > > Ahh, too bad. So it goes.
>> >
>> > If no other users survive we can put it into the ipv6 module.
>> >
>> > > > The bpf hash stuff can be changed during this merge window, as it is
>> > > > not yet in a released kernel. Albeit I would probably have preferred
>> > > > something like sha256 here, which can be easily replicated by user
>> > > > space tools (minus the problem of patching out references to not
>> > > > hashable data, which must be zeroed).
>> > >
>> > > Oh, interesting, so time is of the essence then. Do you want to handle
>> > > changing the new eBPF code to something not-SHA1 before it's too late,
>> > > as part of a ne
>>
>> w patchset that can fast track itself to David? And
>> > > then I can preserve my large series for the next merge window.
>> >
>> > This algorithm should be a non-seeded algorithm, because the hashes
>> > should be stable and verifiable by user space tooling. Thus this would
>> > need a hashing algorithm that is hardened against pre-image
>> > attacks/collision resistance, which siphash is not. I would prefer some
>> > higher order SHA algorithm for that actually.
>> >
>>
>> You mean:
>>
>> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Author: Daniel Borkmann 
>> Date:   Sun Dec 4 23:19:41 2016 +0100
>>
>> bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>
>> Yes, please!  This actually matters for security -- imagine a
>> malicious program brute-forcing a collision so that it gets loaded
>> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
>> (preferably the latter).  Speed basically doesn't matter here and
>> Blake2 is both less stable (didn't they slightly change it recently?)
>> and much less well studied.
>
> We don't prevent ebpf programs being loaded based on the digest but
> just to uniquely identify loaded programs from user space and match up
> with their source.

The commit log talks about using the hash to see if the program has
already been compiled and JITted.  If that's done, then a collision
will directly cause the kernel to malfunction.

>> My inclination would have been to seed them with something that isn't
>> exposed to userspace for the precise reason that it would prevent user
>> code from making assumptions about what's in the hash.  But if there's
>> a use case for why user code needs to be able to calculate the hash on
>> its own, then that's fine.  But perhaps the actual fdinfo string
>> should be "sha256:abcd1234..." to give some flexibility down the road.
>>
>> Also:
>>
>> +   result = (__force __be32 *)fp->digest;
>> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
>> +   result[i] = cpu_to_be32(fp->digest[i]);
>>
>> Everyone, please, please, please don't open-code crypto primitives.
>> Is this and the code above it even correct?  It might be but on a very
>> brief glance it looks wrong to me.  If you're doing this to avoid
>> depending on crypto, then fix crypto so you can pull in the algorithm
>> without pulling in the whole crypto core.
>
> The hashing is not a proper sha1 neither, unfortunately. I think that
> is why it will have a custom implementation in iproute2?

Putting on crypto hat:

NAK NAK NAK NAK NAK.  "The Linux kernel invented a new primitive in
2016 when people know better and is going to handle it by porting that
new primitive to userspace" is not a particularly good argument.

Okay, crypto hack back off.

>
> I wondered if bpf program loading should have used the module loading
> infrastructure from the beginning...

That would be way too complicated and would be nasty for the unprivileged cases.

>
>> At the very least, there should be a separate function that calculates
>> the hash of a buffer and that function should explicitly run itself
>> against test vectors of various lengths to make sure that it's
>> calculating what it claims to be calculating.  And it doesn't look
>> like the userspace code has landed, so, if this thing isn't
>> calculating SHA1 correctly, it's plausible that no one has noticed.
>
> I hope this was known from the beginning, this is not sha1 unfortunately.
>
> But ebpf elf programs also need preprocessing to get rid of some
> embedded load-depending data, so maybe it was considered to be just
> enough?

I 

Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
>  wrote:
> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> > > Hi Hannes,
> > > 
> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
> > >  wrote:
> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > > > You don't want to give people new IPv6 addresses with the same stable
> > > > secret (across reboots) after a kernel upgrade. Maybe they lose
> > > > connectivity then and it is extra work?
> > > 
> > > Ahh, too bad. So it goes.
> > 
> > If no other users survive we can put it into the ipv6 module.
> > 
> > > > The bpf hash stuff can be changed during this merge window, as it is
> > > > not yet in a released kernel. Albeit I would probably have preferred
> > > > something like sha256 here, which can be easily replicated by user
> > > > space tools (minus the problem of patching out references to not
> > > > hashable data, which must be zeroed).
> > > 
> > > Oh, interesting, so time is of the essence then. Do you want to handle
> > > changing the new eBPF code to something not-SHA1 before it's too late,
> > > as part of a ne
> 
> w patchset that can fast track itself to David? And
> > > then I can preserve my large series for the next merge window.
> > 
> > This algorithm should be a non-seeded algorithm, because the hashes
> > should be stable and verifiable by user space tooling. Thus this would
> > need a hashing algorithm that is hardened against pre-image
> > attacks/collision resistance, which siphash is not. I would prefer some
> > higher order SHA algorithm for that actually.
> > 
> 
> You mean:
> 
> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
> Author: Daniel Borkmann 
> Date:   Sun Dec 4 23:19:41 2016 +0100
> 
> bpf: add prog_digest and expose it via fdinfo/netlink
> 
> 
> Yes, please!  This actually matters for security -- imagine a
> malicious program brute-forcing a collision so that it gets loaded
> wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
> (preferably the latter).  Speed basically doesn't matter here and
> Blake2 is both less stable (didn't they slightly change it recently?)
> and much less well studied.

We don't prevent ebpf programs being loaded based on the digest but
just to uniquely identify loaded programs from user space and match up
with their source.

There have been talks about signing bpf programs, thus this would
probably need another digest algorithm additionally to that one,
wasting probably instructions. Probably going somewhere in direction of
PKCS#7 might be the thing to do (which leads to the problem to make
PKCS#7 attackable by ordinary unpriv users, hmpf).

> My inclination would have been to seed them with something that isn't
> exposed to userspace for the precise reason that it would prevent user
> code from making assumptions about what's in the hash.  But if there's
> a use case for why user code needs to be able to calculate the hash on
> its own, then that's fine.  But perhaps the actual fdinfo string
> should be "sha256:abcd1234..." to give some flexibility down the road.
> 
> Also:
> 
> +   result = (__force __be32 *)fp->digest;
> +   for (i = 0; i < SHA_DIGEST_WORDS; i++)
> +   result[i] = cpu_to_be32(fp->digest[i]);
> 
> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very
> brief glance it looks wrong to me.  If you're doing this to avoid
> depending on crypto, then fix crypto so you can pull in the algorithm
> without pulling in the whole crypto core.

The hashing is not a proper sha1 neither, unfortunately. I think that
is why it will have a custom implementation in iproute2?

I wondered if bpf program loading should have used the module loading
infrastructure from the beginning...

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

I hope this was known from the beginning, this is not sha1 unfortunately.

But ebpf elf programs also need preprocessing to get rid of some
embedded load-depending data, so maybe it was considered to be just
enough?

Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld  wrote:
> Hi all,
>
> I don't know what your design requirements are for this. It looks like
> you're generating some kind of crypto digest of a program, and you
> need to avoid collisions. If you'd like to go with a PRF (keyed hash
> function) that uses some kernel secret key, then I'd strongly suggest
> using Keyed-Blake2. Alternatively, if you need for userspace to be
> able to calculate the same hash, and don't want to use some kernel
> secret, then I'd still suggest using Blake2, which will be fast and
> secure.
>
> If you can wait until January, I'll work on a commit adding the
> primitive to the tree. I've already written it and I just need to get
> things cleaned up.
>
>> Blake2 is both less stable (didn't they slightly change it recently?)
>
> No, Blake2 is very stable. It's also extremely secure and has been
> extensively studied. Not to mention it's faster than SHA2. And if you
> want to use it as a PRF, it's obvious better suited and faster to use
> Blake2's keyed PRF mode than HMAC-SHA2.
>
> If you don't care about performance, and you don't want to use a PRF,
> then just use SHA2-256. If you're particularly concerned about certain
> types of attacks, you could go with SHA2-512 truncated to 256 bytes,
> but somehow I doubt you need this.

I don't think this cares about performance.  (Well, it cares about
performance, but the verifier will likely dominiate the cost by such a
large margin that the hash algo doesn't matter.)  And staying
FIPS-compliant-ish is worth a little bit, so I'd advocate for
something in the SHA2 family.

> If userspace hasn't landed, can we get away with changing this code
> after 4.10? Or should we just fix it before 4.10? Or should we revert
> it before 4.10? Development-policy-things like this I have zero clue
> about, so I heed to your guidance.

I think it should be fixed or reverted before 4.10.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o  wrote:
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.

Ack.

>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?

Sorry, I meant to write this. Long day, little sleep. Yes, of course.
Batched entropy.

>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.

Alright, sure.

>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.

Okay, I'll submit it to you separately.

> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.

I've already written these. They're behind TEST_HASH. They currently
test every single line of code of all implementations of siphash. I
spent a long time on these. The test vectors themselves were taken
from the SipHash creators' reference publication. Check out
lib/test_siphash.c in my tree.

Jason

On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o  wrote:
> On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
>> Could you offer a bit of advice on how to manage dependencies between
>> patchsets during merge windows? I'm a bit new to the process.
>>
>> Specifically, we how have 4 parts:
>> 1. add siphash, and use it for some networking code. to: david miller's 
>> net-next
>
> I'd do this first, as one set.  Adding a new file to crypto is
> unlikely to cause merge conflicts.
>
>> 2. convert char/random to use siphash. to: ted ts'o's random-next
>
> I'm confused, I thought you had agreed to the batched chacha20
> approach?
>
>> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
>> inside of linux/cryptohash.h. to: ??'s ??-next
>
> This is cleanup, so it doesn't matter that much when it happens.  md5
> changes to crypto is also unlikely to cause conflicts, so we could do
> this at the same time as (2), if Herbert (the crypto maintainer) agrees.
>
>> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
>> entry inside of linux/cryptohash.c. to: td ts'o's ext-next
>
> This is definitely separate.
>
> One more thing.  Can you add some test cases to lib/siphash.h?
> Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
> some test inputs and known outputs?  I'm going to need to add a
> version of siphash to e2fsprogs, and I want to make sure the userspace
> version is implementing the same algorithm as the kernel siphash.
>
>   - Ted



-- 
Jason A. Donenfeld
Deep Space Explorer
fr: +33 6 51 90 82 66
us: +1 513 476 1200
www.jasondonenfeld.com
www.zx2c4.com
zx2c4.com/keys/AB9942E6D4A4CFC3412620A749FC7012A5DE03AE.asc
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote:
> Could you offer a bit of advice on how to manage dependencies between
> patchsets during merge windows? I'm a bit new to the process.
> 
> Specifically, we how have 4 parts:
> 1. add siphash, and use it for some networking code. to: david miller's 
> net-next

I'd do this first, as one set.  Adding a new file to crypto is
unlikely to cause merge conflicts.

> 2. convert char/random to use siphash. to: ted ts'o's random-next

I'm confused, I thought you had agreed to the batched chacha20
approach?

> 3. move lib/md5.c to static function in crypto/md5.c, remove entry
> inside of linux/cryptohash.h. to: ??'s ??-next

This is cleanup, so it doesn't matter that much when it happens.  md5
changes to crypto is also unlikely to cause conflicts, so we could do
this at the same time as (2), if Herbert (the crypto maintainer) agrees.

> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
> entry inside of linux/cryptohash.c. to: td ts'o's ext-next

This is definitely separate.

One more thing.  Can you add some test cases to lib/siphash.h?
Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with
some test inputs and known outputs?  I'm going to need to add a
version of siphash to e2fsprogs, and I want to make sure the userspace
version is implementing the same algorithm as the kernel siphash.

  - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Jason A. Donenfeld
Hi all,

I don't know what your design requirements are for this. It looks like
you're generating some kind of crypto digest of a program, and you
need to avoid collisions. If you'd like to go with a PRF (keyed hash
function) that uses some kernel secret key, then I'd strongly suggest
using Keyed-Blake2. Alternatively, if you need for userspace to be
able to calculate the same hash, and don't want to use some kernel
secret, then I'd still suggest using Blake2, which will be fast and
secure.

If you can wait until January, I'll work on a commit adding the
primitive to the tree. I've already written it and I just need to get
things cleaned up.

> Blake2 is both less stable (didn't they slightly change it recently?)

No, Blake2 is very stable. It's also extremely secure and has been
extensively studied. Not to mention it's faster than SHA2. And if you
want to use it as a PRF, it's obvious better suited and faster to use
Blake2's keyed PRF mode than HMAC-SHA2.

If you don't care about performance, and you don't want to use a PRF,
then just use SHA2-256. If you're particularly concerned about certain
types of attacks, you could go with SHA2-512 truncated to 256 bytes,
but somehow I doubt you need this.

Anyway, the take away is: stop using SHA1. It's time has come.

> Everyone, please, please, please don't open-code crypto primitives.
> Is this and the code above it even correct?  It might be but on a very

I shuttered looking at this too. How did this possibly make it past
review? The attitude toward crypto here is generally *terrifying*.
Unless you're a professional cryptographer, the only correct attitude
to have is a careful and conservative one.

> At the very least, there should be a separate function that calculates
> the hash of a buffer and that function should explicitly run itself
> against test vectors of various lengths to make sure that it's
> calculating what it claims to be calculating.  And it doesn't look
> like the userspace code has landed, so, if this thing isn't
> calculating SHA1 correctly, it's plausible that no one has noticed.

If userspace hasn't landed, can we get away with changing this code
after 4.10? Or should we just fix it before 4.10? Or should we revert
it before 4.10? Development-policy-things like this I have zero clue
about, so I heed to your guidance.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
Hi Ted,

On Thu, Dec 22, 2016 at 4:58 PM, Theodore Ts'o  wrote:
> Can we do this as a separate series, please?  At this point, it's a
> completely separate change from a logical perspective, and we can take
> in the change through the random.git tree.
>
> Changes that touch files that are normally changed in several
> different git trees leads to the potential for merge conflicts during
> the linux-next integration and merge window processes.  Which is why
> it's generally best to try to isolate changes as much as possible.

Sure, I can separate things out.

Could you offer a bit of advice on how to manage dependencies between
patchsets during merge windows? I'm a bit new to the process.

Specifically, we how have 4 parts:
1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash. to: ted ts'o's random-next
3. move lib/md5.c to static function in crypto/md5.c, remove entry
inside of linux/cryptohash.h. to: ??'s ??-next
4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
entry inside of linux/cryptohash.c. to: td ts'o's ext-next

Problem: 2 depends on 1, 3 depends on 1 & 2. But this can be
simplified into 3 parts:

1. add siphash, and use it for some networking code. to: david miller's net-next
2. convert char/random to use siphash, move lib/md5.c to static
function in crypto/md5.c, remove entry inside of linux/cryptohash.h.
to: ted ts'o's random-next
3. move lib/halfmd4.c to static function in fs/ext/hash.c, remove
entry inside of linux/cryptohash.c. to: td ts'o's ext-next

Problem: 2 depends on 1. Is that okay with you?

Also, would you like me to merge (3) and (2) of the second list into
one series for you?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)

2016-12-22 Thread Andy Lutomirski
On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa
 wrote:
> On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
>> Hi Hannes,
>>
>> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>>  wrote:
>> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
>> > You don't want to give people new IPv6 addresses with the same stable
>> > secret (across reboots) after a kernel upgrade. Maybe they lose
>> > connectivity then and it is extra work?
>>
>> Ahh, too bad. So it goes.
>
> If no other users survive we can put it into the ipv6 module.
>
>> > The bpf hash stuff can be changed during this merge window, as it is
>> > not yet in a released kernel. Albeit I would probably have preferred
>> > something like sha256 here, which can be easily replicated by user
>> > space tools (minus the problem of patching out references to not
>> > hashable data, which must be zeroed).
>>
>> Oh, interesting, so time is of the essence then. Do you want to handle
>> changing the new eBPF code to something not-SHA1 before it's too late,
>> as part of a ne
w patchset that can fast track itself to David? And
>> then I can preserve my large series for the next merge window.
>
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.
>

You mean:

commit 7bd509e311f408f7a5132fcdde2069af65fa05ae
Author: Daniel Borkmann 
Date:   Sun Dec 4 23:19:41 2016 +0100

bpf: add prog_digest and expose it via fdinfo/netlink


Yes, please!  This actually matters for security -- imagine a
malicious program brute-forcing a collision so that it gets loaded
wrong.  And this is IMO a use case for SHA-256 or SHA-512/256
(preferably the latter).  Speed basically doesn't matter here and
Blake2 is both less stable (didn't they slightly change it recently?)
and much less well studied.

My inclination would have been to seed them with something that isn't
exposed to userspace for the precise reason that it would prevent user
code from making assumptions about what's in the hash.  But if there's
a use case for why user code needs to be able to calculate the hash on
its own, then that's fine.  But perhaps the actual fdinfo string
should be "sha256:abcd1234..." to give some flexibility down the road.

Also:

+   result = (__force __be32 *)fp->digest;
+   for (i = 0; i < SHA_DIGEST_WORDS; i++)
+   result[i] = cpu_to_be32(fp->digest[i]);

Everyone, please, please, please don't open-code crypto primitives.
Is this and the code above it even correct?  It might be but on a very
brief glance it looks wrong to me.  If you're doing this to avoid
depending on crypto, then fix crypto so you can pull in the algorithm
without pulling in the whole crypto core.

At the very least, there should be a separate function that calculates
the hash of a buffer and that function should explicitly run itself
against test vectors of various lengths to make sure that it's
calculating what it claims to be calculating.  And it doesn't look
like the userspace code has landed, so, if this thing isn't
calculating SHA1 correctly, it's plausible that no one has noticed.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 07:03:29AM +0100, Jason A. Donenfeld wrote:
> I find this compelling. We'll have one csprng for both
> get_random_int/long and for /dev/urandom, and we'll be able to update
> that silly warning on the comment of get_random_int/long to read
> "gives output of either rdrand quality or of /dev/urandom quality",
> which makes it more useful for other things. It introduces less error
> prone code, and it lets the RNG analysis be spent on just one RNG, not
> two.
> 
> So, with your blessing, I'm going to move ahead with implementing a
> pretty version of this for v8.

Can we do this as a separate series, please?  At this point, it's a
completely separate change from a logical perspective, and we can take
in the change through the random.git tree.

Changes that touch files that are normally changed in several
different git trees leads to the potential for merge conflicts during
the linux-next integration and merge window processes.  Which is why
it's generally best to try to isolate changes as much as possible.

Cheers,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>  wrote:
> > following up on what appears to be a random subject: ;)
> >
> > IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> > in the htree. siphash seems to fit this use case pretty good.
> 
> I saw this too. I'll try to address it in v8 of this series.

This is a separate issue, and this series is getting a bit too
complex.  So I'd suggest pushing this off to a separate change.

Changing the htree hash algorithm is an on-disk format change, and so
we couldn't roll it out until e2fsprogs gets updated and rolled out
pretty broadley.  In fact George sent me patches to add siphash as a
hash algorithm for htree a while back (for both the kernel and
e2fsprogs), but I never got around to testing and applying them,
mainly because while it's technically faster, I had other higher
priority issues to work on --- and see previous comments regarding
pixel peeping.  Improving the hash algorithm by tens or even hundreds
of nanoseconds isn't really going to matter since we only do a htree
lookup on a file creation or cold cache lookup, and the SSD or HDD I/O
times will dominate.  And from the power perspective, saving
microwatts of CPU power isn't going to matter if you're going to be
spinning up the storage device

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:51 PM, Hannes Frederic Sowa
 wrote:
> This algorithm should be a non-seeded algorithm, because the hashes
> should be stable and verifiable by user space tooling. Thus this would
> need a hashing algorithm that is hardened against pre-image
> attacks/collision resistance, which siphash is not. I would prefer some
> higher order SHA algorithm for that actually.

Right. SHA-256, SHA-512/256, Blake2s, or Blake2b would probably be
good candidates for this.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote:
> Hi Hannes,
> 
> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
>  wrote:
> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> > You don't want to give people new IPv6 addresses with the same stable
> > secret (across reboots) after a kernel upgrade. Maybe they lose
> > connectivity then and it is extra work?
> 
> Ahh, too bad. So it goes.

If no other users survive we can put it into the ipv6 module.

> > The bpf hash stuff can be changed during this merge window, as it is
> > not yet in a released kernel. Albeit I would probably have preferred
> > something like sha256 here, which can be easily replicated by user
> > space tools (minus the problem of patching out references to not
> > hashable data, which must be zeroed).
> 
> Oh, interesting, so time is of the essence then. Do you want to handle
> changing the new eBPF code to something not-SHA1 before it's too late,
> as part of a new patchset that can fast track itself to David? And
> then I can preserve my large series for the next merge window.

This algorithm should be a non-seeded algorithm, because the hashes
should be stable and verifiable by user space tooling. Thus this would
need a hashing algorithm that is hardened against pre-image
attacks/collision resistance, which siphash is not. I would prefer some
higher order SHA algorithm for that actually.

Bye,
Hannes
 
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
Hi Hannes,

On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa
 wrote:
> IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
> You don't want to give people new IPv6 addresses with the same stable
> secret (across reboots) after a kernel upgrade. Maybe they lose
> connectivity then and it is extra work?

Ahh, too bad. So it goes.

> The bpf hash stuff can be changed during this merge window, as it is
> not yet in a released kernel. Albeit I would probably have preferred
> something like sha256 here, which can be easily replicated by user
> space tools (minus the problem of patching out references to not
> hashable data, which must be zeroed).

Oh, interesting, so time is of the essence then. Do you want to handle
changing the new eBPF code to something not-SHA1 before it's too late,
as part of a new patchset that can fast track itself to David? And
then I can preserve my large series for the next merge window.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On Thu, 2016-12-22 at 16:29 +0100, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld  wrote:
> > As a first step, I'm considering adding a patch to move halfmd4.c
> > inside the ext4 domain, or at the very least, simply remove it from
> > linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> > usages to consider.
> 
> Specifically something like this:
> 
> https://git.zx2c4.com/linux-dev/commit/?h=siphash=978213351f9633bd1e3d1fdc3f19d28e36eeac90
> 
> That only leaves two more uses of "cryptohash" to consider, but they
> require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
> That might be a straight-forward conversion to SipHash, but perhaps
> not; I need to look closely and think about it. The next is
> sha_transform in kernel/bpf/core.c. I really have no idea what's going
> on with the eBPF stuff, so that will take a bit longer to study. Maybe
> sha1 is fine in the end there? I'm not sure yet.

IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI.
You don't want to give people new IPv6 addresses with the same stable
secret (across reboots) after a kernel upgrade. Maybe they lose
connectivity then and it is extra work?

The bpf hash stuff can be changed during this merge window, as it is
not yet in a released kernel. Albeit I would probably have preferred
something like sha256 here, which can be easily replicated by user
space tools (minus the problem of patching out references to not
hashable data, which must be zeroed).

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld  wrote:
> As a first step, I'm considering adding a patch to move halfmd4.c
> inside the ext4 domain, or at the very least, simply remove it from
> linux/cryptohash.h. That'll then leave the handful of bizarre sha1
> usages to consider.

Specifically something like this:

https://git.zx2c4.com/linux-dev/commit/?h=siphash=978213351f9633bd1e3d1fdc3f19d28e36eeac90

That only leaves two more uses of "cryptohash" to consider, but they
require a bit of help. First, sha_transform in net/ipv6/addrconf.c.
That might be a straight-forward conversion to SipHash, but perhaps
not; I need to look closely and think about it. The next is
sha_transform in kernel/bpf/core.c. I really have no idea what's going
on with the eBPF stuff, so that will take a bit longer to study. Maybe
sha1 is fine in the end there? I'm not sure yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 4:05 PM, Hannes Frederic Sowa
 wrote:
> This change would need a new version of the ext4 super block, because
> you should not change existing file systems.

Right.

As a first step, I'm considering adding a patch to move halfmd4.c
inside the ext4 domain, or at the very least, simply remove it from
linux/cryptohash.h. That'll then leave the handful of bizarre sha1
usages to consider.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
On 22.12.2016 14:10, Jason A. Donenfeld wrote:
> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
>  wrote:
>> following up on what appears to be a random subject: ;)
>>
>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
>> in the htree. siphash seems to fit this use case pretty good.
> 
> I saw this too. I'll try to address it in v8 of this series.

This change would need a new version of the ext4 super block, because
you should not change existing file systems.


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
 wrote:
> following up on what appears to be a random subject: ;)
>
> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> in the htree. siphash seems to fit this use case pretty good.

I saw this too. I'll try to address it in v8 of this series.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-22 Thread Hannes Frederic Sowa
Hi Ted,

On Thu, 2016-12-22 at 00:41 -0500, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> > 
> > Funny -- while you guys were sending this back & forth, I was writing
> > my reply to Andy which essentially arrives at the same conclusion.
> > Given that we're all arriving to the same thing, and that Ted shot in
> > this direction long before we all did, I'm leaning toward abandoning
> > SipHash for the de-MD5-ification of get_random_int/long, and working
> > on polishing Ted's idea into something shiny for this patchset.
> 
> here are my numbers comparing siphash (using the first three patches
> of the v7 siphash patches) with my batched chacha20 implementation.
> The results are taken by running get_random_* 1 times, and then
> dividing the numbers by 1 to get the average number of cycles for
> the call.  I compiled 32-bit and 64-bit kernels, and ran the results
> using kvm:
> 
>siphashbatched chacha20
>  get_random_int  get_random_long   get_random_int   get_random_long   
> 
> 32-bit270  278 114146
> 64-bit 75   75 106186
> 
> > I did have two objections to this. The first was that my SipHash
> > construction is faster.
> 
> Well, it's faster on everything except 32-bit x86.  :-P
> 
> > The second, and the more
> > important one, was that batching entropy up like this means that 32
> > calls will be really fast, and then the 33rd will be slow, since it
> > has to do a whole ChaCha round, because get_random_bytes must be
> > called to refill the batch.
> 
> ... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
> 32-bit x86.  Which on a 2.3 GHz processor, is just under a
> microsecond.  As far as being inconsistent on process startup, I very
> much doubt a microsecond is really going to be visible to the user.  :-)
> 
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference
> 
> I think the main argument for using the batched getrandom approach is
> that it, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus complexity of having to adding siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

following up on what appears to be a random subject: ;)

IIRC, ext4 code by default still uses half_md4 for hashing of filenames
in the htree. siphash seems to fit this use case pretty good.

xfs could also need an update, as they don't seed the directory hash
tables at all (but not sure if they are vulnerable). I should improve
[1] a bit.

[1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blo
b;f=src/dirhash_collide.c;h=55cec872d5061ac2ca0f56d1f11e6bf349d5bb97;hb
=HEAD

Bye,
Hannes

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Ted,

On Thu, Dec 22, 2016 at 6:41 AM, Theodore Ts'o  wrote:
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference

This is hilarious, thanks for the laugh. I believe you're right about this...

>
> I think the main argument for using the batched getrandom approach is
> that it, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus complexity of having to adding siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

I find this compelling. We'll have one csprng for both
get_random_int/long and for /dev/urandom, and we'll be able to update
that silly warning on the comment of get_random_int/long to read
"gives output of either rdrand quality or of /dev/urandom quality",
which makes it more useful for other things. It introduces less error
prone code, and it lets the RNG analysis be spent on just one RNG, not
two.

So, with your blessing, I'm going to move ahead with implementing a
pretty version of this for v8.

Regards,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Theodore Ts'o
On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> 
> Funny -- while you guys were sending this back & forth, I was writing
> my reply to Andy which essentially arrives at the same conclusion.
> Given that we're all arriving to the same thing, and that Ted shot in
> this direction long before we all did, I'm leaning toward abandoning
> SipHash for the de-MD5-ification of get_random_int/long, and working
> on polishing Ted's idea into something shiny for this patchset.

here are my numbers comparing siphash (using the first three patches
of the v7 siphash patches) with my batched chacha20 implementation.
The results are taken by running get_random_* 1 times, and then
dividing the numbers by 1 to get the average number of cycles for
the call.  I compiled 32-bit and 64-bit kernels, and ran the results
using kvm:

   siphashbatched chacha20
 get_random_int  get_random_long   get_random_int   get_random_long   

32-bit270  278 114146
64-bit 75   75 106186

> I did have two objections to this. The first was that my SipHash
> construction is faster.

Well, it's faster on everything except 32-bit x86.  :-P

> The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch.

... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
32-bit x86.  Which on a 2.3 GHz processor, is just under a
microsecond.  As far as being inconsistent on process startup, I very
much doubt a microsecond is really going to be visible to the user.  :-)

The bottom line is that I think we're really "pixel peeping" at this
point --- which is what obsessed digital photographers will do when
debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
a thousand times, and then trying to claim that this is visible to the
human eye.  Or people who obsessing over the frequency response curves
of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
likely that in a blind head-to-head comparison, most people wouldn't
be able to tell the difference

I think the main argument for using the batched getrandom approach is
that it, I would argue, simpler than introducing siphash into the
picture.  On 64-bit platforms it is faster and more consistent, so
it's basically that versus complexity of having to adding siphash to
the things that people have to analyze when considering random number
security on Linux.   But it's a close call either way, I think.

  - Ted

P.S.  My benchmarking code

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a51f0ff43f00..41860864b775 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1682,6 +1682,55 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+static unsigned long get_random_long_new(void);
+
+#define NUM_CYCLES 1
+#define AVG(finish, start) ((unsigned int)(finish - start + NUM_CYCLES/2) / 
NUM_CYCLES)
+
+static int rand_benchmark(void)
+{
+   cycles_t start,finish;
+   int i, out;
+
+   pr_crit("random benchmark!!\n");
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_int();}
+   finish = get_cycles();
+   pr_err("get_random_int # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_int_new();
+   }
+   finish = get_cycles();
+   pr_err("get_random_int_new (batched chacha20) # cycles: %u\n", 
AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_long();
+   }
+   finish = get_cycles();
+   pr_err("get_random_long # cycles: %u\n", AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_long_new();
+   }
+   finish = get_cycles();
+   pr_err("get_random_long_new (batched chacha20) # cycles: %u\n", 
AVG(finish, start));
+
+   start = get_cycles();
+   for (i = 0; i < NUM_CYCLES; i++) {
+   get_random_bytes(, sizeof(out));
+   }
+   finish = get_cycles();
+   pr_err("get_random_bytes # cycles: %u\n", AVG(finish, start));
+   return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2113,10 @@ unsigned int get_random_int(void)
unsigned int ret;
u64 *chaining;
 
+#if 0  // force slow path
if (arch_get_random_int())
return ret;
+#endif
 
chaining = _cpu_var(get_random_int_chaining);
ret = *chaining = 

Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
On Thu, Dec 22, 2016 at 3:49 AM, Jason A. Donenfeld  wrote:
> I did have two objections to this. The first was that my SipHash
> construction is faster. But in any case, they're both faster than the
> current MD5, so it's just extra rice. The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch. Since get_random_long is called for every
> process startup, I didn't really like there being inconsistent
> performance on process startup. And I'm pretty sure that one ChaCha
> whole block is slower than computing MD5, even though it lasts 32
> times as long, though I need to measure this. But maybe that's dumb in
> the end? Are these concerns that should point us toward the
> determinism (and speed) of SipHash? Are these concerns that don't
> matter and so we should roll with the simplicity of reusing ChaCha?

I ran some measurements in order to quantify what I'm talking about.
Repeatedly running md5_transform is about 2.3 times faster than
repeatedly running extract_crng. What does this mean?

One call to extract_crng gives us 32 times as many longs as one call
to md5_transform. This means that spread over 32 process creations,
chacha will be 13.9 times faster. However, every 32nd process will
take 2.3 times as long to generate its ASLR value as it would with the
old md5_transform code.

Personally, I don't think that 2.3 is a big deal. And I really like
how much this simplifies the analysis.
But if it's a big deal to you, then we can continue to discuss my
SipHash construction, which gives faster and more consistent
performance, at the cost of a more complicated and probably less
impressive security analysis.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Andy & Hannes,

On Thu, Dec 22, 2016 at 3:07 AM, Hannes Frederic Sowa
 wrote:
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide cprng alike properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.

On Thu, Dec 22, 2016 at 3:09 AM, Andy Lutomirski  wrote:
> Unless I've misunderstood it, Ted's proposal causes get_random_int()
> to return bytes straight from urandom (effectively), which should make
> it very strong.  And if urandom is competitively fast now, I don't see
> the problem.  ChaCha20 is designed for speed, after all.

Funny -- while you guys were sending this back & forth, I was writing
my reply to Andy which essentially arrives at the same conclusion.
Given that we're all arriving to the same thing, and that Ted shot in
this direction long before we all did, I'm leaning toward abandoning
SipHash for the de-MD5-ification of get_random_int/long, and working
on polishing Ted's idea into something shiny for this patchset.

I did have two objections to this. The first was that my SipHash
construction is faster. But in any case, they're both faster than the
current MD5, so it's just extra rice. The second, and the more
important one, was that batching entropy up like this means that 32
calls will be really fast, and then the 33rd will be slow, since it
has to do a whole ChaCha round, because get_random_bytes must be
called to refill the batch. Since get_random_long is called for every
process startup, I didn't really like there being inconsistent
performance on process startup. And I'm pretty sure that one ChaCha
whole block is slower than computing MD5, even though it lasts 32
times as long, though I need to measure this. But maybe that's dumb in
the end? Are these concerns that should point us toward the
determinism (and speed) of SipHash? Are these concerns that don't
matter and so we should roll with the simplicity of reusing ChaCha?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Andy,

On Thu, Dec 22, 2016 at 12:42 AM, Andy Lutomirski  wrote:
> So this is probably good enough, and making it better is hard.  Changing it 
> to:
>
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
>
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.

Ahh, so that's the reasoning behind a similar suggestion of yours in a
previous email. Makes sense to me. I'll include this in the next merge
if we don't come up with a different idea before then. Your reasoning
seems good for it.

Part of what makes this process a bit goofy is that it's not all
together clear what the design goals are. Right now we're going for
"not worse than before", which we've nearly achieved. How good of an
RNG do we want? I'm willing to examine and analyze the security and
performance of all constructions we can come up with. One thing I
don't want to do, however, is start tweaking the primitives themselves
in ways not endorsed by the designers. So, I believe that precludes
things like carrying over SipHash's internal state (like what was done
with MD5), because there hasn't been a formal security analysis of
this like there has with other uses of SipHash. I also don't want to
change any internals of how SipHash actually works. I mention that
because of some of the suggestions on other threads, which make me
rather uneasy.

So with that said, while writing this reply to you, I was
simultaneously reading some other crypto code and was reminded that
there's a variant of SipHash which outputs an additional 64-bits; it's
part of the siphash reference code, which they call the "128-bit
mode". It has the benefit that we can return 64-bits to the caller and
save 64-bits for the chaining key. That way there's no correlation
between the returned secret and the chaining key, which I think would
completely alleviate all of your concerns, and simplify the analysis a
bit.

Here's what it looks like:
https://git.zx2c4.com/linux-dev/commit/?h=siphash=46fbe5b408e66b2d16b4447860f8083480e1c08d

The downside is that it takes 4 extra Sip rounds. This puts the
performance still better than MD5, though, and likely still better
than the other batched entropy solution. We could optimize this, I
suppose, by giving it only two parameters -- chaining,
jiffies+entropy+pid -- instead of the current three -- chaining,
jiffies, entropy+pid -- which would then shave off 2 Sip rounds. But I
liked the idea of having a bit more spread in the entropy input field.

Anyway, with this in mind, we now have three possibilities:

1. result = siphash(chaining, entropy, key); chaining += result + entropy
2. result = siphash_extra_output(chaining, entropy, key, );
3. Ted's batched entropy idea using chacha20

The more I think about this, the more I suspect that we should just
use chacha20. It will still be faster than MD5. I don't like the
non-determinism of it (some processes will start slower than others,
if the batched entropy has run out and ASLR demands more), but I guess
I can live with that. But, most importantly, it greatly simplifies
both the security analysis and what we can promise to callers about
the function. Right now in the comment documentation, we're coy with
callers about the security of the RNG. If we moved to a known
construction like chacha20/get_random_bytes_batched, then we could
just be straight up with a promise that the numbers it returns are
high quality.

Thoughts on 2 and 3, and on 1 vs 2 vs 3?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 6:07 PM, Hannes Frederic Sowa
 wrote:
> On 22.12.2016 00:42, Andy Lutomirski wrote:
>> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>>>  unsigned int get_random_int(void)
>>>  {
>>> -   __u32 *hash;
>>> -   unsigned int ret;
>>> -
>>> -   if (arch_get_random_int())
>>> -   return ret;
>>> -
>>> -   hash = get_cpu_var(get_random_int_hash);
>>> -
>>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>>> -   md5_transform(hash, random_int_secret);
>>> -   ret = hash[0];
>>> -   put_cpu_var(get_random_int_hash);
>>> -
>>> -   return ret;
>>> +   unsigned int arch_result;
>>> +   u64 result;
>>> +   struct random_int_secret *secret;
>>> +
>>> +   if (arch_get_random_int(_result))
>>> +   return arch_result;
>>> +
>>> +   secret = get_random_int_secret();
>>> +   result = siphash_3u64(secret->chaining, jiffies,
>>> + (u64)random_get_entropy() + current->pid,
>>> + secret->secret);
>>> +   secret->chaining += result;
>>> +   put_cpu_var(secret);
>>> +   return result;
>>>  }
>>>  EXPORT_SYMBOL(get_random_int);
>>
>> Hmm.  I haven't tried to prove anything for real.  But here goes (in
>> the random oracle model):
>>
>> Suppose I'm an attacker and I don't know the secret or the chaining
>> value.  Then, regardless of what the entropy is, I can't predict the
>> numbers.
>>
>> Now suppose I do know the secret and the chaining value due to some
>> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
>> to find a value "result" such that prev_chaining + result = chaining
>> and result = H(prev_chaining, ..., secret);.  I don't think this can
>> be done efficiently in the random oracle model regardless of what the
>> "..." is.
>>
>> But, if I know the secret and chaining value, I can predict the next
>> output assuming I can guess the entropy.  What's worse is that, even
>> if I can't guess the entropy, if I *observe* the next output then I
>> can calculate the next chaining value.
>>
>> So this is probably good enough, and making it better is hard.  Changing it 
>> to:
>>
>> u64 entropy = (u64)random_get_entropy() + current->pid;
>> result = siphash(..., entropy, ...);
>> secret->chaining += result + entropy;
>>
>> would reduce this problem by forcing an attacker to brute-force the
>> entropy on each iteration, which is probably an improvement.
>>
>> To fully fix it, something like "catastrophic reseeding" would be
>> needed, but that's hard to get right.
>
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide cprng alike properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.
>

Unless I've misunderstood it, Ted's proposal causes get_random_int()
to return bytes straight from urandom (effectively), which should make
it very strong.  And if urandom is competitively fast now, I don't see
the problem.  ChaCha20 is designed for speed, after all.
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Hannes Frederic Sowa
On 22.12.2016 00:42, Andy Lutomirski wrote:
> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>>  unsigned int get_random_int(void)
>>  {
>> -   __u32 *hash;
>> -   unsigned int ret;
>> -
>> -   if (arch_get_random_int())
>> -   return ret;
>> -
>> -   hash = get_cpu_var(get_random_int_hash);
>> -
>> -   hash[0] += current->pid + jiffies + random_get_entropy();
>> -   md5_transform(hash, random_int_secret);
>> -   ret = hash[0];
>> -   put_cpu_var(get_random_int_hash);
>> -
>> -   return ret;
>> +   unsigned int arch_result;
>> +   u64 result;
>> +   struct random_int_secret *secret;
>> +
>> +   if (arch_get_random_int(_result))
>> +   return arch_result;
>> +
>> +   secret = get_random_int_secret();
>> +   result = siphash_3u64(secret->chaining, jiffies,
>> + (u64)random_get_entropy() + current->pid,
>> + secret->secret);
>> +   secret->chaining += result;
>> +   put_cpu_var(secret);
>> +   return result;
>>  }
>>  EXPORT_SYMBOL(get_random_int);
> 
> Hmm.  I haven't tried to prove anything for real.  But here goes (in
> the random oracle model):
> 
> Suppose I'm an attacker and I don't know the secret or the chaining
> value.  Then, regardless of what the entropy is, I can't predict the
> numbers.
> 
> Now suppose I do know the secret and the chaining value due to some
> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
> to find a value "result" such that prev_chaining + result = chaining
> and result = H(prev_chaining, ..., secret);.  I don't think this can
> be done efficiently in the random oracle model regardless of what the
> "..." is.
> 
> But, if I know the secret and chaining value, I can predict the next
> output assuming I can guess the entropy.  What's worse is that, even
> if I can't guess the entropy, if I *observe* the next output then I
> can calculate the next chaining value.
> 
> So this is probably good enough, and making it better is hard.  Changing it 
> to:
> 
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
> 
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.
> 
> To fully fix it, something like "catastrophic reseeding" would be
> needed, but that's hard to get right.

I wonder if Ted's proposal was analyzed further in terms of performance
if get_random_int should provide cprng alike properties?

For reference: https://lkml.org/lkml/2016/12/14/351

The proposal made sense to me and would completely solve the above
mentioned problem on the cost of repeatedly reseeding from the crng.

Bye,
Hannes


--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Andy Lutomirski
On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld  wrote:
>  unsigned int get_random_int(void)
>  {
> -   __u32 *hash;
> -   unsigned int ret;
> -
> -   if (arch_get_random_int())
> -   return ret;
> -
> -   hash = get_cpu_var(get_random_int_hash);
> -
> -   hash[0] += current->pid + jiffies + random_get_entropy();
> -   md5_transform(hash, random_int_secret);
> -   ret = hash[0];
> -   put_cpu_var(get_random_int_hash);
> -
> -   return ret;
> +   unsigned int arch_result;
> +   u64 result;
> +   struct random_int_secret *secret;
> +
> +   if (arch_get_random_int(_result))
> +   return arch_result;
> +
> +   secret = get_random_int_secret();
> +   result = siphash_3u64(secret->chaining, jiffies,
> + (u64)random_get_entropy() + current->pid,
> + secret->secret);
> +   secret->chaining += result;
> +   put_cpu_var(secret);
> +   return result;
>  }
>  EXPORT_SYMBOL(get_random_int);

Hmm.  I haven't tried to prove anything for real.  But here goes (in
the random oracle model):

Suppose I'm an attacker and I don't know the secret or the chaining
value.  Then, regardless of what the entropy is, I can't predict the
numbers.

Now suppose I do know the secret and the chaining value due to some
leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
to find a value "result" such that prev_chaining + result = chaining
and result = H(prev_chaining, ..., secret);.  I don't think this can
be done efficiently in the random oracle model regardless of what the
"..." is.

But, if I know the secret and chaining value, I can predict the next
output assuming I can guess the entropy.  What's worse is that, even
if I can't guess the entropy, if I *observe* the next output then I
can calculate the next chaining value.

So this is probably good enough, and making it better is hard.  Changing it to:

u64 entropy = (u64)random_get_entropy() + current->pid;
result = siphash(..., entropy, ...);
secret->chaining += result + entropy;

would reduce this problem by forcing an attacker to brute-force the
entropy on each iteration, which is probably an improvement.

To fully fix it, something like "catastrophic reseeding" would be
needed, but that's hard to get right.

(An aside: on x86 at least, using two percpu variables is faster
because directly percpu access is essentially free, whereas getting
the address of a percpu variable is not free.)
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
Hi Ted,

On Thu, Dec 22, 2016 at 12:02 AM, Jason A. Donenfeld  wrote:
> This duplicates the current algorithm for get_random_int/long

I should have mentioned this directly in the commit message, which I
forgot to update: this v7 adds the time-based key rotation, which,
while not strictly necessary for ensuring the security of the RNG,
might help alleviate some concerns, as we talked about. Performance is
quite good on both 32-bit and 64-bit -- better than MD5 in both cases.

If you like this, terrific. If not, I'm happy to take this in whatever
direction you prefer, and implement whatever construction you think
best. There's been a lot of noise on this list about it; we can
continue to discuss more, or you can just tell me whatever you want to
do, and I'll implement it and that'll be the end of it. As you said,
we can always get something decent now and improve it later.

Alternatively, if you've decided in the end you prefer your batched
entropy approach using chacha, I'm happy to implement a polished
version of that here in this patch series (so that we can keep the `rm
lib/md5.c` commit.)

Just let me know how you'd like to proceed.

Thanks,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 3/6] random: use SipHash in place of MD5

2016-12-21 Thread Jason A. Donenfeld
This duplicates the current algorithm for get_random_int/long, but uses
siphash instead. This comes with several benefits. It's certainly
faster and more cryptographically secure than MD5. This patch also
separates hashed fields into three values instead of one, in order to
increase diffusion.

The previous MD5 algorithm used a per-cpu MD5 state, which caused
successive calls to the function to chain upon each other. While it's
not entirely clear that this kind of chaining is absolutely necessary
when using a secure PRF like siphash, it can't hurt, and the timing of
the call chain does add a degree of natural entropy. So, in keeping with
this design, instead of the massive per-cpu 64-byte MD5 state, there is
instead a per-cpu previously returned value for chaining.

The speed benefits are substantial:

| siphash | md5| speedup |
--
get_random_long | 137130  | 415983 | 3.03x   |
get_random_int  | 86384   | 343323 | 3.97x   |

Signed-off-by: Jason A. Donenfeld 
Cc: Jean-Philippe Aumasson 
Cc: Ted Tso 
---
 drivers/char/random.c  | 84 +-
 include/linux/random.h |  1 -
 init/main.c|  1 -
 3 files changed, 49 insertions(+), 37 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d6876d506220..ea9858d9d8b4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -262,6 +262,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -2042,17 +2043,31 @@ struct ctl_table random_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
-static u32 random_int_secret[MD5_MESSAGE_BYTES / 4] cacheline_aligned;
 
-int random_int_secret_init(void)
+struct random_int_secret {
+   siphash_key_t secret;
+   u64 chaining;
+   unsigned long birthdate;
+   bool initialized;
+};
+static DEFINE_PER_CPU(struct random_int_secret, random_int_secret);
+
+enum {
+   SECRET_ROTATION_TIME = HZ * 60 * 5
+};
+
+static struct random_int_secret *get_random_int_secret(void)
 {
-   get_random_bytes(random_int_secret, sizeof(random_int_secret));
-   return 0;
+   struct random_int_secret *secret = _cpu_var(random_int_secret);
+   if (unlikely(!secret->initialized ||
+!time_is_after_jiffies(secret->birthdate + 
SECRET_ROTATION_TIME))) {
+   secret->initialized = true;
+   secret->birthdate = jiffies;
+   get_random_bytes(secret->secret, sizeof(secret->secret));
+   }
+   return secret;
 }
 
-static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash)
-   __aligned(sizeof(unsigned long));
-
 /*
  * Get a random word for internal kernel use only. Similar to urandom but
  * with the goal of minimal entropy pool depletion. As a result, the random
@@ -2061,20 +2076,20 @@ static DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], 
get_random_int_hash)
  */
 unsigned int get_random_int(void)
 {
-   __u32 *hash;
-   unsigned int ret;
-
-   if (arch_get_random_int())
-   return ret;
-
-   hash = get_cpu_var(get_random_int_hash);
-
-   hash[0] += current->pid + jiffies + random_get_entropy();
-   md5_transform(hash, random_int_secret);
-   ret = hash[0];
-   put_cpu_var(get_random_int_hash);
-
-   return ret;
+   unsigned int arch_result;
+   u64 result;
+   struct random_int_secret *secret;
+
+   if (arch_get_random_int(_result))
+   return arch_result;
+
+   secret = get_random_int_secret();
+   result = siphash_3u64(secret->chaining, jiffies,
+ (u64)random_get_entropy() + current->pid,
+ secret->secret);
+   secret->chaining += result;
+   put_cpu_var(secret);
+   return result;
 }
 EXPORT_SYMBOL(get_random_int);
 
@@ -2083,20 +2098,19 @@ EXPORT_SYMBOL(get_random_int);
  */
 unsigned long get_random_long(void)
 {
-   __u32 *hash;
-   unsigned long ret;
-
-   if (arch_get_random_long())
-   return ret;
-
-   hash = get_cpu_var(get_random_int_hash);
-
-   hash[0] += current->pid + jiffies + random_get_entropy();
-   md5_transform(hash, random_int_secret);
-   ret = *(unsigned long *)hash;
-   put_cpu_var(get_random_int_hash);
-
-   return ret;
+   unsigned long arch_result;
+   u64 result;
+   struct random_int_secret *secret;
+
+   if (arch_get_random_long(_result))
+   return arch_result;
+
+   secret = get_random_int_secret();
+   result = siphash_3u64(secret->chaining, jiffies, random_get_entropy() +
+ current->pid, secret->secret);
+   secret->chaining += result;
+   put_cpu_var(secret);
+   return result;
 }
 EXPORT_SYMBOL(get_random_long);
 
diff --git a/include/linux/random.h b/include/linux/random.h
index