Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Fri, Dec 23, 2016 at 7:19 PM, Hannes Frederic Sowawrote: > Factoring out sha3 Per the other thread, you probably don't actually want SHA3, because it's slow in software. You want SHA2. If you want something faster and better, then Blake2 is most certainly the way to go. I'll be submitting some patches in a month or so adding Blake2 for future use in the tree. Jason
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 23.12.2016 17:42, Andy Lutomirski wrote: > On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirskiwrote: >> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann >> wrote: >>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote: On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote: > > On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: >> >> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >>> >>> [...] >>> >> The hashing is not a proper sha1 neither, unfortunately. I think that >> is why it will have a custom implementation in iproute2? > > > Still trying to catch up on this admittedly bit confusing thread. I > did run automated tests over couple of days comparing the data I got > from fdinfo with the one from af_alg and found no mismatch on the test > cases varying from min to max possible program sizes. In the process > of testing, as you might have seen on netdev, I found couple of other > bugs in bpf code along the way and fixed them up as well. So my question, > do you or Andy or anyone participating in claiming this have any > concrete data or test cases that suggests something different? If yes, > I'm very curious to hear about it and willing fix it up, of course. > When I'm back from pto I'll prep and cook up my test suite to be > included into the selftests/bpf/, should have done this initially, > sorry about that. I'll also post something to expose the alg, that > sounds fine to me. Looking into your code closer, I noticed that you indeed seem to do the finalization of sha-1 by hand by aligning and padding the buffer accordingly and also patching in the necessary payload length. Apologies for my side for claiming that this is not correct sha1 output, I was only looking at sha_transform and its implementation and couldn't see the padding and finalization round with embedding the data length in there and hadn't thought of it being done manually. Anyway, is it difficult to get the sha finalization into some common code library? It is not very bpf specific and crypto code reviewers won't find it there at all. >>> >>> >>> Yes, sure, I'll rework it that way (early next year when I'm back if >>> that's fine with you). >> >> Can we make it SHA-256 before 4.10 comes out, though? This really >> looks like it will be used in situations where collisions matter and >> it will be exposed to malicious programs, and SHA-1 should not be used >> for new designs for this purpose because it simply isn't long enough. >> >> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very >> misleading. That should be u8 or, at the very least, __be32. >> >> I realize that there isn't a sha-256 implementation in lib, but would >> it really be so bad to make the bpf digest only work (for now) when >> crypto is enabled? I would *love* to see the crypto core learn how to >> export simple primitives for direct use without needing the whole >> crypto core, and this doesn't seem particularly hard to do, but I >> don't think that's 4.10 material. > > I'm going to try to send out RFC patches for all of this today or > tomorrow. It doesn't look bad at all. Factoring out sha3 to lib/ and use it as standalone and in crypto api doesn't seem hard, yep. I also proposed this to Daniel offlist. Bye, Hannes
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirskiwrote: > On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann wrote: >> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote: >>> >>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote: On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: > > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >> >> [...] >> > The hashing is not a proper sha1 neither, unfortunately. I think that > is why it will have a custom implementation in iproute2? Still trying to catch up on this admittedly bit confusing thread. I did run automated tests over couple of days comparing the data I got from fdinfo with the one from af_alg and found no mismatch on the test cases varying from min to max possible program sizes. In the process of testing, as you might have seen on netdev, I found couple of other bugs in bpf code along the way and fixed them up as well. So my question, do you or Andy or anyone participating in claiming this have any concrete data or test cases that suggests something different? If yes, I'm very curious to hear about it and willing fix it up, of course. When I'm back from pto I'll prep and cook up my test suite to be included into the selftests/bpf/, should have done this initially, sorry about that. I'll also post something to expose the alg, that sounds fine to me. >>> >>> >>> Looking into your code closer, I noticed that you indeed seem to do the >>> finalization of sha-1 by hand by aligning and padding the buffer >>> accordingly and also patching in the necessary payload length. >>> >>> Apologies for my side for claiming that this is not correct sha1 >>> output, I was only looking at sha_transform and its implementation and >>> couldn't see the padding and finalization round with embedding the data >>> length in there and hadn't thought of it being done manually. >>> >>> Anyway, is it difficult to get the sha finalization into some common >>> code library? It is not very bpf specific and crypto code reviewers >>> won't find it there at all. >> >> >> Yes, sure, I'll rework it that way (early next year when I'm back if >> that's fine with you). > > Can we make it SHA-256 before 4.10 comes out, though? This really > looks like it will be used in situations where collisions matter and > it will be exposed to malicious programs, and SHA-1 should not be used > for new designs for this purpose because it simply isn't long enough. > > Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very > misleading. That should be u8 or, at the very least, __be32. > > I realize that there isn't a sha-256 implementation in lib, but would > it really be so bad to make the bpf digest only work (for now) when > crypto is enabled? I would *love* to see the crypto core learn how to > export simple primitives for direct use without needing the whole > crypto core, and this doesn't seem particularly hard to do, but I > don't think that's 4.10 material. I'm going to try to send out RFC patches for all of this today or tomorrow. It doesn't look bad at all. --Andy
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmannwrote: > On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote: >> >> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote: >>> >>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > > [...] > The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2? >>> >>> >>> Still trying to catch up on this admittedly bit confusing thread. I >>> did run automated tests over couple of days comparing the data I got >>> from fdinfo with the one from af_alg and found no mismatch on the test >>> cases varying from min to max possible program sizes. In the process >>> of testing, as you might have seen on netdev, I found couple of other >>> bugs in bpf code along the way and fixed them up as well. So my question, >>> do you or Andy or anyone participating in claiming this have any >>> concrete data or test cases that suggests something different? If yes, >>> I'm very curious to hear about it and willing fix it up, of course. >>> When I'm back from pto I'll prep and cook up my test suite to be >>> included into the selftests/bpf/, should have done this initially, >>> sorry about that. I'll also post something to expose the alg, that >>> sounds fine to me. >> >> >> Looking into your code closer, I noticed that you indeed seem to do the >> finalization of sha-1 by hand by aligning and padding the buffer >> accordingly and also patching in the necessary payload length. >> >> Apologies for my side for claiming that this is not correct sha1 >> output, I was only looking at sha_transform and its implementation and >> couldn't see the padding and finalization round with embedding the data >> length in there and hadn't thought of it being done manually. >> >> Anyway, is it difficult to get the sha finalization into some common >> code library? It is not very bpf specific and crypto code reviewers >> won't find it there at all. > > > Yes, sure, I'll rework it that way (early next year when I'm back if > that's fine with you). Can we make it SHA-256 before 4.10 comes out, though? This really looks like it will be used in situations where collisions matter and it will be exposed to malicious programs, and SHA-1 should not be used for new designs for this purpose because it simply isn't long enough. Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very misleading. That should be u8 or, at the very least, __be32. I realize that there isn't a sha-256 implementation in lib, but would it really be so bad to make the bpf digest only work (for now) when crypto is enabled? I would *love* to see the crypto core learn how to export simple primitives for direct use without needing the whole crypto core, and this doesn't seem particularly hard to do, but I don't think that's 4.10 material. --Andy
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote: On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote: On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: [...] The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2? Still trying to catch up on this admittedly bit confusing thread. I did run automated tests over couple of days comparing the data I got from fdinfo with the one from af_alg and found no mismatch on the test cases varying from min to max possible program sizes. In the process of testing, as you might have seen on netdev, I found couple of other bugs in bpf code along the way and fixed them up as well. So my question, do you or Andy or anyone participating in claiming this have any concrete data or test cases that suggests something different? If yes, I'm very curious to hear about it and willing fix it up, of course. When I'm back from pto I'll prep and cook up my test suite to be included into the selftests/bpf/, should have done this initially, sorry about that. I'll also post something to expose the alg, that sounds fine to me. Looking into your code closer, I noticed that you indeed seem to do the finalization of sha-1 by hand by aligning and padding the buffer accordingly and also patching in the necessary payload length. Apologies for my side for claiming that this is not correct sha1 output, I was only looking at sha_transform and its implementation and couldn't see the padding and finalization round with embedding the data length in there and hadn't thought of it being done manually. Anyway, is it difficult to get the sha finalization into some common code library? It is not very bpf specific and crypto code reviewers won't find it there at all. Yes, sure, I'll rework it that way (early next year when I'm back if that's fine with you). Thanks, Daniel
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote: > On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: > > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > > > On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa > > >wrote: > > > > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: > > > > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa > > > > > wrote: > > > > > > IPv6 you cannot touch anymore. The hashing algorithm is part of > > > > > > uAPI. > > > > > > You don't want to give people new IPv6 addresses with the same > > > > > > stable > > > > > > secret (across reboots) after a kernel upgrade. Maybe they lose > > > > > > connectivity then and it is extra work? > > > > > > > > > > Ahh, too bad. So it goes. > > > > > > > > If no other users survive we can put it into the ipv6 module. > > > > > > > > > > The bpf hash stuff can be changed during this merge window, as it is > > > > > > not yet in a released kernel. Albeit I would probably have preferred > > > > > > something like sha256 here, which can be easily replicated by user > > > > > > space tools (minus the problem of patching out references to not > > > > > > hashable data, which must be zeroed). > > > > > > > > > > Oh, interesting, so time is of the essence then. Do you want to handle > > > > > changing the new eBPF code to something not-SHA1 before it's too late, > > > > > as part of a ne > > > > > > w patchset that can fast track itself to David? And > > > > > then I can preserve my large series for the next merge window. > > > > > > > > This algorithm should be a non-seeded algorithm, because the hashes > > > > should be stable and verifiable by user space tooling. Thus this would > > > > need a hashing algorithm that is hardened against pre-image > > > > attacks/collision resistance, which siphash is not. I would prefer some > > > > higher order SHA algorithm for that actually. > > > > > > > > > > You mean: > > > > > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae > > > Author: Daniel Borkmann > > > Date: Sun Dec 4 23:19:41 2016 +0100 > > > > > > bpf: add prog_digest and expose it via fdinfo/netlink > > > > > > > > > Yes, please! This actually matters for security -- imagine a > > > malicious program brute-forcing a collision so that it gets loaded > > > wrong. And this is IMO a use case for SHA-256 or SHA-512/256 > > > (preferably the latter). Speed basically doesn't matter here and > > > Blake2 is both less stable (didn't they slightly change it recently?) > > > and much less well studied. > > > > We don't prevent ebpf programs being loaded based on the digest but > > just to uniquely identify loaded programs from user space and match up > > with their source. > > > > There have been talks about signing bpf programs, thus this would > > probably need another digest algorithm additionally to that one, > > wasting probably instructions. Probably going somewhere in direction of > > PKCS#7 might be the thing to do (which leads to the problem to make > > PKCS#7 attackable by ordinary unpriv users, hmpf). > > > > > My inclination would have been to seed them with something that isn't > > > exposed to userspace for the precise reason that it would prevent user > > > code from making assumptions about what's in the hash. But if there's > > > a use case for why user code needs to be able to calculate the hash on > > > its own, then that's fine. But perhaps the actual fdinfo string > > > should be "sha256:abcd1234..." to give some flexibility down the road. > > > > > > Also: > > > > > > + result = (__force __be32 *)fp->digest; > > > + for (i = 0; i < SHA_DIGEST_WORDS; i++) > > > + result[i] = cpu_to_be32(fp->digest[i]); > > > > > > Everyone, please, please, please don't open-code crypto primitives. > > > Is this and the code above it even correct? It might be but on a very > > > brief glance it looks wrong to me. If you're doing this to avoid > > > depending on crypto, then fix crypto so you can pull in the algorithm > > > without pulling in the whole crypto core. > > > > The hashing is not a proper sha1 neither, unfortunately. I think that > > is why it will have a custom implementation in iproute2? > > Still trying to catch up on this admittedly bit confusing thread. I > did run automated tests over couple of days comparing the data I got > from fdinfo with the one from af_alg and found no mismatch on the test > cases varying from min to max possible program sizes. In the process > of testing, as you might have seen on netdev, I found couple of other > bugs in bpf code along the way and fixed them up as well. So my question, > do you or Andy or anyone participating in claiming this have any > concrete data or test cases that suggests something different? If yes, > I'm very curious to hear about it and willing fix it up, of course. > When I'm back from pto I'll prep
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/22/2016 06:25 PM, Andy Lutomirski wrote: On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa [...] I wondered if bpf program loading should have used the module loading infrastructure from the beginning... That would be way too complicated and would be nasty for the unprivileged cases. Also, there are users be it privileged or not that don't require to have a full obj loader from user space, but are fine with just hard-coding parts or all of the insns in their application. Back then we settled with using fds based on Andy's suggestion, it has both ups and downs as we saw along the way but worked okay thus far.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote: On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowawrote: On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa wrote: IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. You don't want to give people new IPv6 addresses with the same stable secret (across reboots) after a kernel upgrade. Maybe they lose connectivity then and it is extra work? Ahh, too bad. So it goes. If no other users survive we can put it into the ipv6 module. The bpf hash stuff can be changed during this merge window, as it is not yet in a released kernel. Albeit I would probably have preferred something like sha256 here, which can be easily replicated by user space tools (minus the problem of patching out references to not hashable data, which must be zeroed). Oh, interesting, so time is of the essence then. Do you want to handle changing the new eBPF code to something not-SHA1 before it's too late, as part of a ne w patchset that can fast track itself to David? And then I can preserve my large series for the next merge window. This algorithm should be a non-seeded algorithm, because the hashes should be stable and verifiable by user space tooling. Thus this would need a hashing algorithm that is hardened against pre-image attacks/collision resistance, which siphash is not. I would prefer some higher order SHA algorithm for that actually. You mean: commit 7bd509e311f408f7a5132fcdde2069af65fa05ae Author: Daniel Borkmann Date: Sun Dec 4 23:19:41 2016 +0100 bpf: add prog_digest and expose it via fdinfo/netlink Yes, please! This actually matters for security -- imagine a malicious program brute-forcing a collision so that it gets loaded wrong. And this is IMO a use case for SHA-256 or SHA-512/256 (preferably the latter). Speed basically doesn't matter here and Blake2 is both less stable (didn't they slightly change it recently?) and much less well studied. We don't prevent ebpf programs being loaded based on the digest but just to uniquely identify loaded programs from user space and match up with their source. There have been talks about signing bpf programs, thus this would probably need another digest algorithm additionally to that one, wasting probably instructions. Probably going somewhere in direction of PKCS#7 might be the thing to do (which leads to the problem to make PKCS#7 attackable by ordinary unpriv users, hmpf). My inclination would have been to seed them with something that isn't exposed to userspace for the precise reason that it would prevent user code from making assumptions about what's in the hash. But if there's a use case for why user code needs to be able to calculate the hash on its own, then that's fine. But perhaps the actual fdinfo string should be "sha256:abcd1234..." to give some flexibility down the road. Also: + result = (__force __be32 *)fp->digest; + for (i = 0; i < SHA_DIGEST_WORDS; i++) + result[i] = cpu_to_be32(fp->digest[i]); Everyone, please, please, please don't open-code crypto primitives. Is this and the code above it even correct? It might be but on a very brief glance it looks wrong to me. If you're doing this to avoid depending on crypto, then fix crypto so you can pull in the algorithm without pulling in the whole crypto core. The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2? Still trying to catch up on this admittedly bit confusing thread. I did run automated tests over couple of days comparing the data I got from fdinfo with the one from af_alg and found no mismatch on the test cases varying from min to max possible program sizes. In the process of testing, as you might have seen on netdev, I found couple of other bugs in bpf code along the way and fixed them up as well. So my question, do you or Andy or anyone participating in claiming this have any concrete data or test cases that suggests something different? If yes, I'm very curious to hear about it and willing fix it up, of course. When I'm back from pto I'll prep and cook up my test suite to be included into the selftests/bpf/, should have done this initially, sorry about that. I'll also post something to expose the alg, that sounds fine to me. I wondered if bpf program loading should have used the module loading infrastructure from the beginning... At the very least, there should be a separate function that calculates the hash of a buffer and that function should explicitly run itself against test vectors of various lengths to make sure that it's calculating what it claims to be calculating. And it doesn't look like the userspace code has landed, so, if this thing isn't calculating SHA1 correctly, it's
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 22.12.2016 20:56, Andy Lutomirski wrote: > It's also not quite clear to me why userspace needs to be able to > calculate the digest on its own. A bpf(BPF_CALC_PROGRAM_DIGEST) > command that takes a BPF program as input and hashes it would seem to > serve the same purpose, and that would allow the kernel to key the > digest and change the algorithm down the road without breaking things. I think that people expect digests of BPF programs to be stable over time and reboots.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitovwrote: > On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski wrote: >> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa >> wrote: >>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >>> >>> We don't prevent ebpf programs being loaded based on the digest but >>> just to uniquely identify loaded programs from user space and match up >>> with their source. >> >> The commit log talks about using the hash to see if the program has >> already been compiled and JITted. If that's done, then a collision >> will directly cause the kernel to malfunction. > > Andy, please read the code. > we could have used jhash there just as well. > Collisions are fine. There's relevant in the code to read yet AFAICS. The code exports it via fdinfo, and userspace is expected to do something with it. The commit message says: When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. I assume this means that a userspace component is expected to compare the digest of a loaded program to a digest of a program it wants to load and to use the result of the comparison to decide whether the programs are the same. If that's indeed the case (and it sure sounds like it, and I fully expect CRIU to do very similar things when support is added), then malicious collisions do matter. It's also not quite clear to me why userspace needs to be able to calculate the digest on its own. A bpf(BPF_CALC_PROGRAM_DIGEST) command that takes a BPF program as input and hashes it would seem to serve the same purpose, and that would allow the kernel to key the digest and change the algorithm down the road without breaking things. Regardless, adding a new hash algorithm that is almost-but-not-quite SHA-1 and making it a stable interface to userspace is not a good thing. --Andy
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirskiwrote: > On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa > wrote: >> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >> >> We don't prevent ebpf programs being loaded based on the digest but >> just to uniquely identify loaded programs from user space and match up >> with their source. > > The commit log talks about using the hash to see if the program has > already been compiled and JITted. If that's done, then a collision > will directly cause the kernel to malfunction. Andy, please read the code. we could have used jhash there just as well. Collisions are fine.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowawrote: > We don't prevent ebpf programs being loaded based on the digest but > just to uniquely identify loaded programs from user space and match up > with their source. Okay, so in that case, a weak hashing function like SHA1 could result in a real vulnerability. Therefore, this SHA1 stuff needs to be reverted immediately, pending a different implementation. If this has ever shipped in a kernel version, it could even deserve a CVE. No SHA1! > The hashing is not a proper sha1 neither, unfortunately. I think that > is why it will have a custom implementation in iproute2? Jeepers creepers. So for some ungodly reason, LKML has invented yet another homebrewed crypto primitive. This story really gets more horrifying every day. No bueno. So yea, let's revert and re-commit (repeal and replace? just kidding...). Out with SHA-1, in with Blake2 or SHA2. Jason
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote: > On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa >wrote: > > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > > > > > > You mean: > > > > > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae > > > Author: Daniel Borkmann > > > Date: Sun Dec 4 23:19:41 2016 +0100 > > > > > > bpf: add prog_digest and expose it via fdinfo/netlink > > > > > > > > > Yes, please! This actually matters for security -- imagine a > > > malicious program brute-forcing a collision so that it gets loaded > > > wrong. And this is IMO a use case for SHA-256 or SHA-512/256 > > > (preferably the latter). Speed basically doesn't matter here and > > > Blake2 is both less stable (didn't they slightly change it recently?) > > > and much less well studied. > > > > We don't prevent ebpf programs being loaded based on the digest but > > just to uniquely identify loaded programs from user space and match up > > with their source. > > The commit log talks about using the hash to see if the program has > already been compiled and JITted. If that's done, then a collision > will directly cause the kernel to malfunction. Yeah, it still shouldn't crash the kernel but it could cause malfunctions because assumptions are not met from user space thus it could act in a strange way: My personal biggest concern is that users of this API will at some point in time assume this digist is unique (as a key itself for a hashtables f.e.), while it is actually not (and not enforced so by the kernel). If you can get an unpriv ebpf program inserted to the kernel with the same weak hash, a controller daemon could pick it up and bind it to another ebpf hook, probably outside of the unpriv realm the user was in before. Only the sorting matters, which might be unstable and is not guaranteed by anything in most hash table based data structures. The API seems flawed to me. > > > My inclination would have been to seed them with something that isn't > > > exposed to userspace for the precise reason that it would prevent user > > > code from making assumptions about what's in the hash. But if there's > > > a use case for why user code needs to be able to calculate the hash on > > > its own, then that's fine. But perhaps the actual fdinfo string > > > should be "sha256:abcd1234..." to give some flexibility down the road. To add to this, I am very much in favor of that. Right now it doesn't have a name because it is a custom algorithm. ;) > > > > > > Also: > > > > > > + result = (__force __be32 *)fp->digest; > > > + for (i = 0; i < SHA_DIGEST_WORDS; i++) > > > + result[i] = cpu_to_be32(fp->digest[i]); > > > > > > Everyone, please, please, please don't open-code crypto primitives. > > > Is this and the code above it even correct? It might be but on a very > > > brief glance it looks wrong to me. If you're doing this to avoid > > > depending on crypto, then fix crypto so you can pull in the algorithm > > > without pulling in the whole crypto core. > > > > The hashing is not a proper sha1 neither, unfortunately. I think that > > is why it will have a custom implementation in iproute2? > > Putting on crypto hat: > > NAK NAK NAK NAK NAK. "The Linux kernel invented a new primitive in > 2016 when people know better and is going to handle it by porting that > new primitive to userspace" is not a particularly good argument. > > Okay, crypto hack back off. > > > > > I wondered if bpf program loading should have used the module loading > > infrastructure from the beginning... > > That would be way too complicated and would be nasty for the unprivileged > cases. I was more or less just thinking about using the syscalls and user space representation not the generic infrastructure, as it is anyway too much concerned with real kernel modules (would probably also solve your cgroup-ebpf thread, as it can be passed by unique name or name and hash ;) ). Anyway... > > > At the very least, there should be a separate function that calculates > > > the hash of a buffer and that function should explicitly run itself > > > against test vectors of various lengths to make sure that it's > > > calculating what it claims to be calculating. And it doesn't look > > > like the userspace code has landed, so, if this thing isn't > > > calculating SHA1 correctly, it's plausible that no one has noticed. > > > > I hope this was known from the beginning, this is not sha1 unfortunately. > > > > But ebpf elf programs also need preprocessing to get rid of some > > embedded load-depending data, so maybe it was considered to be just > > enough? > > I suspect it was actually an accident. Maybe, I don't know. Bye, Hannes
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowawrote: > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa >> wrote: >> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: >> > > Hi Hannes, >> > > >> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa >> > > wrote: >> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. >> > > > You don't want to give people new IPv6 addresses with the same stable >> > > > secret (across reboots) after a kernel upgrade. Maybe they lose >> > > > connectivity then and it is extra work? >> > > >> > > Ahh, too bad. So it goes. >> > >> > If no other users survive we can put it into the ipv6 module. >> > >> > > > The bpf hash stuff can be changed during this merge window, as it is >> > > > not yet in a released kernel. Albeit I would probably have preferred >> > > > something like sha256 here, which can be easily replicated by user >> > > > space tools (minus the problem of patching out references to not >> > > > hashable data, which must be zeroed). >> > > >> > > Oh, interesting, so time is of the essence then. Do you want to handle >> > > changing the new eBPF code to something not-SHA1 before it's too late, >> > > as part of a ne >> >> w patchset that can fast track itself to David? And >> > > then I can preserve my large series for the next merge window. >> > >> > This algorithm should be a non-seeded algorithm, because the hashes >> > should be stable and verifiable by user space tooling. Thus this would >> > need a hashing algorithm that is hardened against pre-image >> > attacks/collision resistance, which siphash is not. I would prefer some >> > higher order SHA algorithm for that actually. >> > >> >> You mean: >> >> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae >> Author: Daniel Borkmann >> Date: Sun Dec 4 23:19:41 2016 +0100 >> >> bpf: add prog_digest and expose it via fdinfo/netlink >> >> >> Yes, please! This actually matters for security -- imagine a >> malicious program brute-forcing a collision so that it gets loaded >> wrong. And this is IMO a use case for SHA-256 or SHA-512/256 >> (preferably the latter). Speed basically doesn't matter here and >> Blake2 is both less stable (didn't they slightly change it recently?) >> and much less well studied. > > We don't prevent ebpf programs being loaded based on the digest but > just to uniquely identify loaded programs from user space and match up > with their source. The commit log talks about using the hash to see if the program has already been compiled and JITted. If that's done, then a collision will directly cause the kernel to malfunction. >> My inclination would have been to seed them with something that isn't >> exposed to userspace for the precise reason that it would prevent user >> code from making assumptions about what's in the hash. But if there's >> a use case for why user code needs to be able to calculate the hash on >> its own, then that's fine. But perhaps the actual fdinfo string >> should be "sha256:abcd1234..." to give some flexibility down the road. >> >> Also: >> >> + result = (__force __be32 *)fp->digest; >> + for (i = 0; i < SHA_DIGEST_WORDS; i++) >> + result[i] = cpu_to_be32(fp->digest[i]); >> >> Everyone, please, please, please don't open-code crypto primitives. >> Is this and the code above it even correct? It might be but on a very >> brief glance it looks wrong to me. If you're doing this to avoid >> depending on crypto, then fix crypto so you can pull in the algorithm >> without pulling in the whole crypto core. > > The hashing is not a proper sha1 neither, unfortunately. I think that > is why it will have a custom implementation in iproute2? Putting on crypto hat: NAK NAK NAK NAK NAK. "The Linux kernel invented a new primitive in 2016 when people know better and is going to handle it by porting that new primitive to userspace" is not a particularly good argument. Okay, crypto hack back off. > > I wondered if bpf program loading should have used the module loading > infrastructure from the beginning... That would be way too complicated and would be nasty for the unprivileged cases. > >> At the very least, there should be a separate function that calculates >> the hash of a buffer and that function should explicitly run itself >> against test vectors of various lengths to make sure that it's >> calculating what it claims to be calculating. And it doesn't look >> like the userspace code has landed, so, if this thing isn't >> calculating SHA1 correctly, it's plausible that no one has noticed. > > I hope this was known from the beginning, this is not sha1 unfortunately. > > But ebpf elf programs also need preprocessing to get rid of some > embedded load-depending data, so maybe it was considered to be just > enough? I
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa >wrote: > > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: > > > Hi Hannes, > > > > > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa > > > wrote: > > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. > > > > You don't want to give people new IPv6 addresses with the same stable > > > > secret (across reboots) after a kernel upgrade. Maybe they lose > > > > connectivity then and it is extra work? > > > > > > Ahh, too bad. So it goes. > > > > If no other users survive we can put it into the ipv6 module. > > > > > > The bpf hash stuff can be changed during this merge window, as it is > > > > not yet in a released kernel. Albeit I would probably have preferred > > > > something like sha256 here, which can be easily replicated by user > > > > space tools (minus the problem of patching out references to not > > > > hashable data, which must be zeroed). > > > > > > Oh, interesting, so time is of the essence then. Do you want to handle > > > changing the new eBPF code to something not-SHA1 before it's too late, > > > as part of a ne > > w patchset that can fast track itself to David? And > > > then I can preserve my large series for the next merge window. > > > > This algorithm should be a non-seeded algorithm, because the hashes > > should be stable and verifiable by user space tooling. Thus this would > > need a hashing algorithm that is hardened against pre-image > > attacks/collision resistance, which siphash is not. I would prefer some > > higher order SHA algorithm for that actually. > > > > You mean: > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae > Author: Daniel Borkmann > Date: Sun Dec 4 23:19:41 2016 +0100 > > bpf: add prog_digest and expose it via fdinfo/netlink > > > Yes, please! This actually matters for security -- imagine a > malicious program brute-forcing a collision so that it gets loaded > wrong. And this is IMO a use case for SHA-256 or SHA-512/256 > (preferably the latter). Speed basically doesn't matter here and > Blake2 is both less stable (didn't they slightly change it recently?) > and much less well studied. We don't prevent ebpf programs being loaded based on the digest but just to uniquely identify loaded programs from user space and match up with their source. There have been talks about signing bpf programs, thus this would probably need another digest algorithm additionally to that one, wasting probably instructions. Probably going somewhere in direction of PKCS#7 might be the thing to do (which leads to the problem to make PKCS#7 attackable by ordinary unpriv users, hmpf). > My inclination would have been to seed them with something that isn't > exposed to userspace for the precise reason that it would prevent user > code from making assumptions about what's in the hash. But if there's > a use case for why user code needs to be able to calculate the hash on > its own, then that's fine. But perhaps the actual fdinfo string > should be "sha256:abcd1234..." to give some flexibility down the road. > > Also: > > + result = (__force __be32 *)fp->digest; > + for (i = 0; i < SHA_DIGEST_WORDS; i++) > + result[i] = cpu_to_be32(fp->digest[i]); > > Everyone, please, please, please don't open-code crypto primitives. > Is this and the code above it even correct? It might be but on a very > brief glance it looks wrong to me. If you're doing this to avoid > depending on crypto, then fix crypto so you can pull in the algorithm > without pulling in the whole crypto core. The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2? I wondered if bpf program loading should have used the module loading infrastructure from the beginning... > At the very least, there should be a separate function that calculates > the hash of a buffer and that function should explicitly run itself > against test vectors of various lengths to make sure that it's > calculating what it claims to be calculating. And it doesn't look > like the userspace code has landed, so, if this thing isn't > calculating SHA1 correctly, it's plausible that no one has noticed. I hope this was known from the beginning, this is not sha1 unfortunately. But ebpf elf programs also need preprocessing to get rid of some embedded load-depending data, so maybe it was considered to be just enough? Bye, Hannes
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeldwrote: > Hi all, > > I don't know what your design requirements are for this. It looks like > you're generating some kind of crypto digest of a program, and you > need to avoid collisions. If you'd like to go with a PRF (keyed hash > function) that uses some kernel secret key, then I'd strongly suggest > using Keyed-Blake2. Alternatively, if you need for userspace to be > able to calculate the same hash, and don't want to use some kernel > secret, then I'd still suggest using Blake2, which will be fast and > secure. > > If you can wait until January, I'll work on a commit adding the > primitive to the tree. I've already written it and I just need to get > things cleaned up. > >> Blake2 is both less stable (didn't they slightly change it recently?) > > No, Blake2 is very stable. It's also extremely secure and has been > extensively studied. Not to mention it's faster than SHA2. And if you > want to use it as a PRF, it's obvious better suited and faster to use > Blake2's keyed PRF mode than HMAC-SHA2. > > If you don't care about performance, and you don't want to use a PRF, > then just use SHA2-256. If you're particularly concerned about certain > types of attacks, you could go with SHA2-512 truncated to 256 bytes, > but somehow I doubt you need this. I don't think this cares about performance. (Well, it cares about performance, but the verifier will likely dominiate the cost by such a large margin that the hash algo doesn't matter.) And staying FIPS-compliant-ish is worth a little bit, so I'd advocate for something in the SHA2 family. > If userspace hasn't landed, can we get away with changing this code > after 4.10? Or should we just fix it before 4.10? Or should we revert > it before 4.10? Development-policy-things like this I have zero clue > about, so I heed to your guidance. I think it should be fixed or reverted before 4.10. --Andy
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
Hi all, I don't know what your design requirements are for this. It looks like you're generating some kind of crypto digest of a program, and you need to avoid collisions. If you'd like to go with a PRF (keyed hash function) that uses some kernel secret key, then I'd strongly suggest using Keyed-Blake2. Alternatively, if you need for userspace to be able to calculate the same hash, and don't want to use some kernel secret, then I'd still suggest using Blake2, which will be fast and secure. If you can wait until January, I'll work on a commit adding the primitive to the tree. I've already written it and I just need to get things cleaned up. > Blake2 is both less stable (didn't they slightly change it recently?) No, Blake2 is very stable. It's also extremely secure and has been extensively studied. Not to mention it's faster than SHA2. And if you want to use it as a PRF, it's obvious better suited and faster to use Blake2's keyed PRF mode than HMAC-SHA2. If you don't care about performance, and you don't want to use a PRF, then just use SHA2-256. If you're particularly concerned about certain types of attacks, you could go with SHA2-512 truncated to 256 bytes, but somehow I doubt you need this. Anyway, the take away is: stop using SHA1. It's time has come. > Everyone, please, please, please don't open-code crypto primitives. > Is this and the code above it even correct? It might be but on a very I shuttered looking at this too. How did this possibly make it past review? The attitude toward crypto here is generally *terrifying*. Unless you're a professional cryptographer, the only correct attitude to have is a careful and conservative one. > At the very least, there should be a separate function that calculates > the hash of a buffer and that function should explicitly run itself > against test vectors of various lengths to make sure that it's > calculating what it claims to be calculating. And it doesn't look > like the userspace code has landed, so, if this thing isn't > calculating SHA1 correctly, it's plausible that no one has noticed. If userspace hasn't landed, can we get away with changing this code after 4.10? Or should we just fix it before 4.10? Or should we revert it before 4.10? Development-policy-things like this I have zero clue about, so I heed to your guidance. Jason