> Date: Sat, 15 Aug 2020 10:23:02 +0000
> From: nia <n...@netbsd.org>
>
> Obviously, I disagree with core's decision, but let's try to be
> productive about this.
>
> I'm happy to have getrandom in NetBSD, it's a good thing. But not with
> this behaviour.
>
> 1) Adopting getrandom for compatibility does not make sense.
>
> NetBSD's behaviour for getrandom(x, y, 0) is incompatible with Linux
> and FreeBSD _at least_ - they will unblock after the kernel receives
> an arbitrary amount of "random-ish" data. NetBSD will block forever
> until the sysadmin intervenes (by writing to /dev/random or attaching
> a forensically analyzed HWRNG, or rebooting with a seed file).

- The behaviour is compatible in the sense that the getrandom calls
  that _can_ lead to blocking are the same: a getrandom call that may
  block on NetBSD may also block on Linux/FreeBSD/&c.; a getrandom
  call that is guaranteed never to block on Linux/FreeBSD/&c. is
  guaranteed never to block on NetBSD.

- The behaviour is compatible in the sense that if getrandom blocks,
  then it unblocks when the operating system has decided there is
  adequate entropy.

- The behaviour is incompatible only in the sense that NetBSD's idea
  of `adequate entropy' is stronger than FreeBSD's or Linux's, so
  blocking is _more likely_ on NetBSD than on FreeBSD or Linux.

The difference manifests in a user-visible way primarily on systems
where users are actually in danger of not having adequate entropy --
in other words, on systems where the signal of an alarm might
actually matter.

I would like to put effort toward addressing that by making it easier
to provide adequate entropy rather than by papering over the alarm.

> NetBSD's behaviour for GRND_RANDOM is incompatible with FreeBSD,
> which treats it the same as getrandom(x, y, 0).

Why do you say this is incompatible?  getrandom(...,GRND_RANDOM) just
makes fewer promises than getrandom(...,0) as a portable API -- it
_may_ block more often and it _may_ return short.  In practice, on
NetBSD it only blocks when getrandom(...,0) would block too.

If FreeBSD makes _more_ promises, fine, but the GRND_RANDOM flag is a
silly API that exists only for Linux source compatibility and that
very few reasonable applications use.  So I don't see why it's
important to put any attention on it or make any stronger promises
about it than portable applications can rely on -- that's why, e.g.,
the man page I wrote specifically calls it out as silly, not
recommended, for Linux source compatibility only, and with no usage
examples.

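For a caller that wants to rely only on what the portable API promises,
the `may block' and `may return short' cases can both be handled with
a small loop.  A minimal sketch in C, assuming <sys/random.h> declares
getrandom as it does on NetBSD and on Linux with glibc; the helper
name fill_random is made up for illustration and is not part of any
system library:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Hypothetical helper: fill buf with exactly n bytes from getrandom.
 * Loops over short returns (which GRND_RANDOM is allowed to produce)
 * and retries if a signal interrupts a blocked call.
 * Returns 0 on success, -1 on any other error with errno set.
 */
int
fill_random(void *buf, size_t n, unsigned int flags)
{
	unsigned char *p = buf;

	assert(buf != NULL || n == 0);

	while (n > 0) {
		ssize_t k = getrandom(p, n, flags);

		if (k == -1) {
			if (errno == EINTR)
				continue;	/* interrupted while blocked */
			return -1;
		}
		p += k;			/* advance past the bytes we got */
		n -= (size_t)k;
	}
	return 0;
}
```

With flags set to 0 the call may block until the system considers
itself seeded, which is the behaviour under discussion; code written
this way does not depend on any stronger promise that a particular
operating system happens to make about GRND_RANDOM.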
> 2) The main problem raised with getentropy is that Solaris has a buggy
> implementation that projects such as Python were seeking to avoid
> (because it blocked a lot, and they preferred something that
> wouldn't).

The main problem raised with getentropy is that between four different
operating systems (OpenBSD, Linux, FreeBSD, Solaris) there seemed to
be three different behaviours around blocking (block never, block at
boot, block often).  That's not good for a portable API, particularly
one which was originally defined never to block, period.

I'm not saying I disagree with adopting getentropy.  I'm just saying
that _as a portable API_ its semantics is murkier than getrandom's,
despite the additional complexity of flags in getrandom.

Indeed, I made an argument, based on a survey of how entropy pool
initialization and unblocking works across different operating
systems, for adopting getentropy(p,n) == getrandom(p,n,GRND_INSECURE)
as you're suggesting:

https://mail-index.netbsd.org/tech-userlevel/2020/05/09/msg012390.html

But there are reasonable counterarguments too, as gson raised:

https://mail-index.netbsd.org/tech-userlevel/2020/05/10/msg012397.html

So my somewhat elaborate argument isn't strong enough for me to want
to push for it one way or another.  Sure would be nice if every
computer just had a reliable HWRNG!  But alas.

> 4) The original argument that we need the getrandom(x, y, 0) behaviour
> to please Rust does not make sense, since Rust's randomness library
> now uses never-blocking APIs on both OpenBSD and NetBSD. Same for
> OpenSSL.

The Rust API specifically describes getrandom(p,n,0) semantics:

https://docs.rs/rand/0.7.3/rand/rngs/struct.OsRng.html

`It is possible that when used during early boot the first call to
 OsRng will block until the system's RNG is initialised.  It is also
 possible (though highly unlikely) for OsRng to fail on some
 platforms, most likely due to system mis-configuration.

`After the first successful call, it is highly unlikely that failures
 or significant delays will occur (although performance should be
 expected to be much slower than a user-space PRNG).'

Obviously we can patch OpenSSL in base however we like, but at least
one OpenSSL developer reported being uncomfortable with having
getrandom(p,n,GRND_INSECURE) semantics for getentropy if used
upstream:

https://mail-index.netbsd.org/tech-userlevel/2020/05/02/msg012334.html

`If you make getentropy the insecure version, I will need to modify
 OpenSSL to switch to getrandom() on NetBSD.'

To be clear, I'm not saying that getrandom(p,n,GRND_INSECURE)
semantics is necessarily _wrong_ for these libraries, absent further
context -- just that getrandom(p,n,0) semantics more clearly meets the
security expectations of cryptography engineers.

> So, I suggest:
>
> 1) Make the RANDOM case for getrandom an alias for the default behaviour,
> as FreeBSD also does. It's just a nail for unsuspecting software to
> step on, and we shouldn't be copying bad ideas from Linux into our
> own syscalls.

Can you identify an existing application that actually behaves badly
with GRND_RANDOM as currently implemented, but reliably behaves well
on other systems?  I expect most applications just don't bother with
GRND_RANDOM but I haven't surveyed.

> 2) Add a sysctl knob to disable getrandom's blocking behaviour, for
> systems without a forensically analyzed HWRNG. This provides an
> obvious way to ensure the system doesn't block after entropy is
> consolidated after we enter userland. Writing to /dev/random is
> non-obvious.

What's the benefit of writing

    sysctl -w kern.entropy.dontblock=1

vs

    dd if=/dev/urandom of=/dev/random bs=32 count=1

in an rc script?  What would make one more obvious than the other, if
they can be written in the same place in documentation?

I think we already have too many knobs and bells and whistles here,
and I would like to limit new ones to those with really good
justification.

> On these systems losing the on-disk seed file is a
> critical error case that will cause blocking, currently.

Yes.  Is the seed file getting lost in practice?  If yes, that's a
real security problem -- blocking is a symptom of the real security
problem which is lacking underlying entropy, and I would like to focus
effort on fixing the problem rather than just the symptom.

So if you've seen it get lost -- do you know how it might have been
lost?  (I realize on netbsd<=8 any crash or unclean shutdown would
lose it, but we fixed that before netbsd-9 was released.)
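
One way a daemon or boot-time check could surface the underlying
problem, rather than hang on the symptom, is to probe with
GRND_NONBLOCK and report when the system does not yet consider itself
seeded.  A rough sketch in C, assuming getrandom and GRND_NONBLOCK
from <sys/random.h>; the function name entropy_ready is hypothetical
and the one-byte probe is only an illustration, not an established
interface:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Hypothetical probe: returns 1 if getrandom(...,0) would currently
 * succeed without blocking, 0 if it would block (EAGAIN), and -1 on
 * any other error with errno set.
 */
int
entropy_ready(void)
{
	unsigned char probe[1];
	ssize_t k;

	do {
		k = getrandom(probe, sizeof probe, GRND_NONBLOCK);
	} while (k == -1 && errno == EINTR);

	if (k == -1)
		return errno == EAGAIN ? 0 : -1;

	/* A one-byte request that succeeds returns exactly one byte. */
	assert(k == (ssize_t)sizeof probe);
	return 1;
}
```

A return of 0 here is the alarm itself: it says the seed file was not
loaded and no trusted source has been attached, which is the condition
worth logging and fixing.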