Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 17 Oct 2020, at 6:24, Jason A. Donenfeld wrote:
> There are a few design goals of notifying userspace: it should be fast, because people who are using userspace RNGs are usually doing so in the first place to completely avoid syscall overhead for whatever high performance application they have - e.g. I recall conversations with Colm about his TLS implementation needing to make random IVs _really_ fast.

That’s our old friend TLS 1.1 in CBC mode, which needs a random explicit IV for every record sent. Speed is still a reason at the margins in cases like that, but getrandom() is really fast. A stickier problem is that getrandom() is not certified for use under every compliance standard, and those standards often dictate precise use of some NIST DRBG or NRBG construction. That keeps people proliferating user-space RNGs even when speed isn’t as important.

> It should also happen as early as possible, with no race or as minimal as possible race window, so that userspace doesn't begin using old randomness and then switch over after the damage is already done.

+1 to this, and I’d add that anyone making VM snapshots that they plan to restore from multiple times really needs to think this through top to bottom. The system would likely need to be put into some kind of quiescent state when the snapshot is taken.

> So, anyway, here are a few options with some pros and cons for the kernel notifying userspace that its RNG should reseed.
> 1. SIGRND - a new signal. Lol.
> 2. Userspace opens a file descriptor that it can epoll on. Pros are that many notification mechanisms already use this. Cons are that this requires a syscall and might be more racy than we want. Another con is that this is a new thing for userspace programs to do.

A library like OpenSSL or BoringSSL also has to account for running inside a chroot, which also makes this hard.

> Any thoughts on 4c? Is that utterly insane, or does that actually get us somewhere close to what we want?
I still like 4c; as a user-space crypto person and a VM person, I find it has a lot of appeal. Alex’s and Adrian’s replies get into some of the sufficiency challenges. But for user-space libraries like the *SSLs, the JVMs, and other runtimes where RNGs show up, it could plug in easily enough.

- Colm
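Colm's point that getrandom() is already fast enough for per-record explicit IVs can be made concrete. The following is a minimal sketch, assuming Linux with glibc 2.25 or later (for `<sys/random.h>`); the function name `fill_explicit_iv` and the constant `TLS11_CBC_IV_LEN` are hypothetical. With flags set to 0, getrandom() draws from the same source as /dev/urandom and blocks only until that source has been initialized.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/random.h>
#include <sys/types.h>

/* Hypothetical constant: an AES-CBC explicit IV is one 16-byte AES block. */
#define TLS11_CBC_IV_LEN 16

/*
 * Hypothetical helper: fill iv[] with fresh bytes from the kernel RNG.
 * Returns 0 on success, -1 on failure (errno is set by getrandom()).
 * Retries on EINTR and keeps reading until the whole IV is filled.
 */
static int fill_explicit_iv(uint8_t iv[TLS11_CBC_IV_LEN])
{
    size_t done = 0;
    while (done < TLS11_CBC_IV_LEN) {
        ssize_t n = getrandom(iv + done, TLS11_CBC_IV_LEN - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}
```

This costs one syscall per record. It only addresses speed; it does nothing for the compliance concern above, where a specific NIST DRBG construction may be mandated.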
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 21:02, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 5:36 AM Willy Tarreau wrote:
> But in userspace, we just need a simple counter. There's no need for us to worry about anything else, like timestamps or whatever. If we repeatedly fork a paused VM, the forked VMs will see the same counter value, but that's totally fine, because the only thing that matters to userspace is that the counter changes when the VM is forked.

For user-space, even a single bit would do. We added MADV_WIPEONFORK so that userspace libraries can detect fork()/clone() robustly, for the same reasons. It just wipes a page as the indicator, which is effectively a single-bit signal, and it works well. On the user-space side of this, I’m keen to find a solution like that which we can use fairly easily inside portable libraries and applications. The “have I forked?” checks do end up in hot paths, so it’s nice if they can be CPU-cache friendly. Comparing a whole 128-bit value wouldn’t be my favorite.

> And actually, since the value is a cryptographically random 128-bit value, I think that we should definitely use it to help reseed the kernel's RNG, and keep it secret from userspace. That way, even if the VM image is public, we can ensure that going forward, the kernel RNG will return securely random data.

If the image is public, you need some extra new raw entropy from somewhere. The gen-id could be mixed in; that can’t do any harm as long as rigorous cryptographic mixing with the prior state is used. But if that’s all you do, then the final state is still deterministic and non-secret. The kernel would need to use the change as a trigger to measure some entropy (e.g. interrupts and RDRAND, or whatever). Or just define the machine contract as “this has to be unique random data, and if it’s not unique, or if it’s public, you’re toast”.

- Colm
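For readers unfamiliar with MADV_WIPEONFORK, the single-bit detection Colm describes looks roughly like this. It is a minimal sketch, assuming Linux 4.14 or later and a glibc that defines MADV_WIPEONFORK; all function and variable names are hypothetical, and the actual reseed step is omitted. The kernel zero-fills the marked page in a child process created by fork(), so a byte that was set to non-zero and now reads back as zero means the process has forked since it last checked.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker: one private anonymous page, wiped in fork() children. */
static volatile uint8_t *fork_marker;

/* Map the marker page and arm it. Returns 0 on success, -1 on failure. */
static int fork_detect_init(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (madvise(p, page, MADV_WIPEONFORK) != 0) {   /* needs Linux >= 4.14 */
        munmap(p, page);
        return -1;
    }
    fork_marker = p;
    fork_marker[0] = 1;   /* non-zero: no fork since the last reseed */
    return 0;
}

/* Hot-path check: a single byte load. Returns 1 (and re-arms) if the page
 * was wiped, i.e. we are in a forked child that has not reseeded yet. */
static int fork_detect_check_and_rearm(void)
{
    if (fork_marker[0] != 0)
        return 0;
    /* A real library would reseed its userspace RNG here (omitted). */
    fork_marker[0] = 1;
    return 1;
}

/* Demo: returns 1 if a forked child saw the wiped marker, 0 if not, -1 on error. */
static int demo_child_sees_fork(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fork_detect_check_and_rearm() == 1 ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The parent's copy of the page is untouched, so only the child reports a fork. This detects process forks only: a cloned or restored VM carries the page contents over unchanged, which is why the thread is looking for a separate VM-level signal.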
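Colm's argument about mixing in the generation ID can be written out. As an illustration only, assume the RNG state is updated as below, where $H$ is some cryptographic hash or extractor, $s$ is the RNG state, $g$ is the 128-bit VM Generation ID, $e$ is any freshly measured entropy, and $\|$ denotes concatenation. This notation is assumed for the example and is not taken from the patch.

```latex
s_{\text{new}} = H\left(s_{\text{old}} \,\|\, g \,\|\, e\right)
```

If $s_{\text{old}}$ can be recovered from a public image, $g$ is known to an observer, and $e$ is empty, then anyone can compute $s_{\text{new}}$: the result is deterministic and non-secret. The new state is secret only if at least one of $g$ or $e$ is unpredictable to the attacker, which is why Jann's proposal keeps $g$ hidden from userspace and why Colm suggests also gathering fresh entropy when the ID changes.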
Re: [PATCH] drivers/virt: vmgenid: add vm generation id driver
On 16 Oct 2020, at 22:01, Jann Horn wrote:
> On Sat, Oct 17, 2020 at 6:34 AM Colm MacCarthaigh wrote:
>> For user-space, even a single bit would do. We added MADV_WIPEONFORK so that userspace libraries can detect fork()/clone() robustly, for the same reasons. It just wipes a page as the indicator, which is effectively a single-bit signal, and it works well. On the user-space side of this, I’m keen to find a solution like that which we can use fairly easily inside portable libraries and applications. The “have I forked?” checks do end up in hot paths, so it’s nice if they can be CPU-cache friendly. Comparing a whole 128-bit value wouldn’t be my favorite.
> I'm pretty sure a single bit is not enough if you want to have a single page, shared across the entire system, that stores the VM forking state; you need a counter for that.

You’re right. WIPEONFORK is more like a single bit per use. If it’s something system-wide, then a counter is better.

> So the RNG state after mixing in the new VM Generation ID would contain 128 bits of secret entropy not known to anyone else, including people with access to the VM image. Now, 128 bits of cryptographically random data aren't _optimal_; I think something on the order of 256 bits would be nicer from a theoretical standpoint. But in practice I think we'll be good with the 128 bits we're getting (since the number of users who fork a VM image is probably not going to be so large that worst-case collision probabilities matter).

This reminds me of key/IV usage limits for AES encryption, where the same birthday bounds apply: even though 256 bits would be better, we routinely make 128-bit birthday bounds work for massively scalable systems.

>> The kernel would need to use the change as a trigger to measure some entropy (e.g. interrupts and RDRAND, or whatever). Or just define the machine contract as “this has to be unique random data, and if it’s not unique, or if it’s public, you’re toast”.
> As far as I can tell from Microsoft's spec, that is a guarantee we're already getting.

Neat.

- Colm
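Jann's point that a system-wide page needs a counter rather than a bit can be seen in a short sketch: a single shared bit cannot be re-armed for one consumer without hiding the change from every other consumer, whereas a counter lets each consumer compare against its own cached value. This is a minimal sketch under stated assumptions: the kernel-provided shared page is hypothetical and is simulated here by a process-local atomic variable, and all names are invented for illustration. The hot-path cost is one 64-bit load and compare rather than a 128-bit comparison.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a system-wide, read-only page holding a VM
 * generation counter; here it is just a process-local atomic variable. */
static _Atomic uint64_t vm_gen_counter_page;

/* Simulates the kernel bumping the counter when the VM is forked/restored. */
static void simulate_vm_fork(void)
{
    atomic_fetch_add_explicit(&vm_gen_counter_page, 1, memory_order_release);
}

/* Per-consumer state: each library or RNG instance caches what it last saw. */
struct rng_ctx {
    uint64_t seen_gen;   /* counter value at the last reseed */
    unsigned reseeds;    /* number of reseeds, for illustration only */
};

static void rng_init(struct rng_ctx *ctx)
{
    ctx->seen_gen = atomic_load_explicit(&vm_gen_counter_page,
                                         memory_order_acquire);
    ctx->reseeds = 0;
}

/* Hot-path check before producing output: one load and one compare. */
static void rng_maybe_reseed(struct rng_ctx *ctx)
{
    uint64_t now = atomic_load_explicit(&vm_gen_counter_page,
                                        memory_order_acquire);
    if (now != ctx->seen_gen) {
        /* A real implementation would reseed from fresh kernel entropy here. */
        ctx->seen_gen = now;
        ctx->reseeds++;
    }
}
```

Two independent `rng_ctx` instances both observe a single increment, which a shared bit could not guarantee. A check-then-use window still remains between the load and the RNG output, so the thread's concern about minimizing that race applies to this sketch as well.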
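The birthday bound behind the 128-bit versus 256-bit discussion can be worked out directly. For $n$ independent, uniformly random $b$-bit generation IDs, there are $n(n-1)/2$ pairs, each colliding with probability $2^{-b}$, so the probability that any two collide is approximately:

```latex
p_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n^2}{2^{b+1}} \qquad (n \ll 2^{b/2})
```

Taking $n = 2^{32}$ VM clones as an illustrative figure (not one from the thread), $b = 128$ gives $p \approx 2^{64}/2^{129} = 2^{-65}$, while $b = 256$ gives $p \approx 2^{64}/2^{257} = 2^{-193}$. This is the same style of calculation Colm refers to for AES key/IV usage limits.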