Re: lack of boot-time entropy on arm64 ec2 instances
On Fri, Jan 17, 2020 at 02:32:22PM -0500, Noah Meyerhans wrote:
> On Thu, Jan 09, 2020 at 05:22:17PM -0500, Noah Meyerhans wrote:
> > I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
> > claims to have entropy at boot:
> >
> > admin@ip-172-31-49-239:~$ cloud-init analyze blame
> > -- Boot Record 01 --
> > 02.88900s (init-network/config-ssh)
> > ...
> >
> > The change applies cleanly to our kernel tree, so this would appear to
> > be a possible solution.
> >
> > I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
> > against the kernel to discuss the entropy issue in general, and will
> > follow up there with a proposal for getting this change backported.
>
> The kernel team would prefer that any backport of 50ee7529ec45 to stable
> branches happen upstream, which is sensible. I'll follow up with the
> stable kernel maintainers to see about making this happen, if they're
> willing.
>
> In the meantime, regardless of where the backport happens, there's no
> possibility of getting this kernel change into 10.3. So, I'd like to
> revisit my original proposal of adding haveged to the arm64 EC2 image
> configuration. Haveged is used in debian-installer for buster (but not
> bullseye+, see below), so there is precedent for its use within Debian.
> IMO, this is the best option available in the short term. It results in
> a far better user experience on the instances in question, and is a
> fairly unintrusive change to make.
>
> Background on haveged in d-i:
> Haveged was added to d-i in commit c47000192 ("Add haveged-udeb [linux]
> to the pkg-lists/base") in response to bug #923675 and is used in
> buster. More recently, with the addition of the in-kernel entropy
> collection mechanisms we've been discussing here, the removal of haveged
> has been proposed for bullseye.
> https://lists.debian.org/debian-boot/2019/11/msg00077.html It has not
> yet been removed, though.
>
> Similarly, I would expect that we would remove haveged from the
> generated buster images once the kernel's jitter-entropy collector is
> available for buster.

Thank you for the legwork on this. I agree that haveged is the way to
proceed at this point.

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 05:22:17PM -0500, Noah Meyerhans wrote:
> I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
> claims to have entropy at boot:
>
> admin@ip-172-31-49-239:~$ cloud-init analyze blame
> -- Boot Record 01 --
> 02.88900s (init-network/config-ssh)
> ...
>
> The change applies cleanly to our kernel tree, so this would appear to
> be a possible solution.
>
> I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
> against the kernel to discuss the entropy issue in general, and will
> follow up there with a proposal for getting this change backported.

The kernel team would prefer that any backport of 50ee7529ec45 to stable
branches happen upstream, which is sensible. I'll follow up with the
stable kernel maintainers to see about making this happen, if they're
willing.

In the meantime, regardless of where the backport happens, there's no
possibility of getting this kernel change into 10.3. So, I'd like to
revisit my original proposal of adding haveged to the arm64 EC2 image
configuration. Haveged is used in debian-installer for buster (but not
bullseye+, see below), so there is precedent for its use within Debian.
IMO, this is the best option available in the short term. It results in
a far better user experience on the instances in question, and is a
fairly unintrusive change to make.

Background on haveged in d-i:
Haveged was added to d-i in commit c47000192 ("Add haveged-udeb [linux]
to the pkg-lists/base") in response to bug #923675 and is used in
buster. More recently, with the addition of the in-kernel entropy
collection mechanisms we've been discussing here, the removal of haveged
has been proposed for bullseye.
https://lists.debian.org/debian-boot/2019/11/msg00077.html It has not
yet been removed, though.

Similarly, I would expect that we would remove haveged from the
generated buster images once the kernel's jitter-entropy collector is
available for buster.

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On Tue, Jan 14, 2020 at 03:01:23PM +, Luca Filipozzi wrote:
> > If we want to extend the cloud kernel to support other services, we need
> > to do more than just enable virtio-rng. Somebody needs to come up with a
> > complete list of devices that are needed for the service in question,
> > and work with the kernel team to ensure that support for all of them is
> > enabled in the cloud kernel.
>
> Folks working on the CCP, etc.: is it of interest to you to use the same
> cloud kernel? Does this improve our users' experience to have the same
> kernel across the different providers?

At present the cloud kernel's only optimizations consist of disabling
device drivers that are highly unlikely to be seen in a cloud
environment. So the user experience with the standard kernel is the
same, except for the larger /lib/modules/$(uname -r) directory and the
larger initramfs image. The size of the initramfs does, of course,
contribute to boot latency by taking longer to uncompress, but I haven't
quantified the difference yet. So for now, the standard kernel is the
conservative choice, in that we know it will work and the drawbacks of
using it are fairly minor.

There is talk of making some additional changes. Bug #947759 contains a
decent summary of things that are being considered. There is also
#941284, but my inclination is to not implement that suggestion. In any
case, we'll need to consider the impact of any proposed changes on the
user experience in the supported clouds on an ongoing basis.

In an ideal world, we might be able to provide distinct flavours for
each cloud, since e.g. it makes no sense to enable the Amazon ENA
ethernet driver on kernels targeting environments other than AWS, but
that would require more resources for diminishing returns.

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On Fri, Jan 10, 2020 at 01:33:12PM -0500, Noah Meyerhans wrote:
> On Fri, Jan 10, 2020 at 03:52:53AM +, Luca Filipozzi wrote:
> > Two questions (pretend i'm 6yo):
> >
> > (1) why can't AWS offer virtio-rng support (other than "we already offer
> > a RDRAND on amd64") and should Debian actively encourage their adding
> > this support?
>
> We can certainly ask. However, it is very clear that EC2 is well aware
> of the existence of virtio-rng (just look at who wrote the QEMU
> virtio-rng implementation, for example), so, without wanting to
> speculate too much, I'm going to guess that the decision to not offer it
> is an intentional one, rather than an oversight. If I learn more, and
> the organization is willing to share it publicly, I'll pass it along.

Thanks! It'd be very interesting to know the reasoning.

> > (2) what prevents our image having virtio-rng support (if it doesn't
> > already)?
>
> The cloud kernel flavour currently only targets AWS and Azure, because
> people have put effort into making it support those services. The
> images that we generate for those services use that kernel. The images
> that we generate for other cloud services use the standard kernel, which
> does have virtio-rng support.
>
> If we want to extend the cloud kernel to support other services, we need
> to do more than just enable virtio-rng. Somebody needs to come up with a
> complete list of devices that are needed for the service in question,
> and work with the kernel team to ensure that support for all of them is
> enabled in the cloud kernel.

Folks working on the CCP, etc.: is it of interest to you to use the same
cloud kernel? Does this improve our users' experience to have the same
kernel across the different providers?

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Fri, Jan 10, 2020 at 03:52:53AM +, Luca Filipozzi wrote:
> Two questions (pretend i'm 6yo):
>
> (1) why can't AWS offer virtio-rng support (other than "we already offer
> a RDRAND on amd64") and should Debian actively encourage their adding
> this support?

We can certainly ask. However, it is very clear that EC2 is well aware
of the existence of virtio-rng (just look at who wrote the QEMU
virtio-rng implementation, for example), so, without wanting to
speculate too much, I'm going to guess that the decision to not offer it
is an intentional one, rather than an oversight. If I learn more, and
the organization is willing to share it publicly, I'll pass it along.

> (2) what prevents our image having virtio-rng support (if it doesn't
> already)?

The cloud kernel flavour currently only targets AWS and Azure, because
people have put effort into making it support those services. The
images that we generate for those services use that kernel. The images
that we generate for other cloud services use the standard kernel, which
does have virtio-rng support.

If we want to extend the cloud kernel to support other services, we need
to do more than just enable virtio-rng. Somebody needs to come up with a
complete list of devices that are needed for the service in question,
and work with the kernel team to ensure that support for all of them is
enabled in the cloud kernel.

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 04:56:58PM -0500, Theodore Y. Ts'o wrote:
> On Thu, Jan 09, 2020 at 07:15:20PM +, Jeremy Stanley wrote:
> > On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
> > [...]
> > > I wonder if the correct criterion for the cloud image is
> > > compatibility with AWS and GCP only. I suppose a large number of
> > > deployments are based on private cloud environments (OpenStack
> > > etc.).
> > [...]
> >
> > Setting aside for the moment that there are plenty of
> > OpenStack-based public cloud providers too (at last count, far more
> > than there are proprietary cloud providers because, hey, free
> > software!), the vast majority of OpenStack deployments rely on KVM
> > for their hypervisor layer which has had VirtIO-RNG for ages.
> > Works just fine for OpenStack as long as the administrator turns it
> > on.
>
> More to the point, a lot of enterprise customers have demanded, and
> most/all of the cloud companies have responded to that demand with,
> product offerings which support hybrid cloud approaches. And it's very
> likely that those on-prem VMs will be using KVM as their hypervisor.
>
> That aside, if the cloud image is supposed to be compatible with GCP,
> then that would be a good enough reason on its own to support
> virtio-rng, since GCP supports virtio-rng today.

Two questions (pretend i'm 6yo):

(1) why can't AWS offer virtio-rng support (other than "we already offer
a RDRAND on amd64") and should Debian actively encourage their adding
this support?

(2) what prevents our image having virtio-rng support (if it doesn't
already)?

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 01:22:30PM -0500, Noah Meyerhans wrote:
> Our 5.4 kernel in sid does not suffer from a lack of entropy at boot on
> arm64 EC2 instances. I guess it could be due to the "random: try to
> actively add entropy rather than passively wait for it" that tytso
> mentioned earlier. I'm going to try to cherry-pick that into 4.19 and
> see if things speed up. Since we're already running it in 5.4, I guess
> it's safe...

I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
claims to have entropy at boot:

admin@ip-172-31-49-239:~$ cloud-init analyze blame
-- Boot Record 01 --
02.88900s (init-network/config-ssh)
...

The change applies cleanly to our kernel tree, so this would appear to
be a possible solution.

I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
against the kernel to discuss the entropy issue in general, and will
follow up there with a proposal for getting this change backported.

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 07:15:20PM +, Jeremy Stanley wrote:
> On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
> [...]
> > I wonder if the correct criterion for the cloud image is
> > compatibility with AWS and GCP only. I suppose a large number of
> > deployments are based on private cloud environments (OpenStack
> > etc.).
> [...]
>
> Setting aside for the moment that there are plenty of
> OpenStack-based public cloud providers too (at last count, far more
> than there are proprietary cloud providers because, hey, free
> software!), the vast majority of OpenStack deployments rely on KVM
> for their hypervisor layer which has had VirtIO-RNG for ages.
> Works just fine for OpenStack as long as the administrator turns it
> on.

More to the point, a lot of enterprise customers have demanded, and
most/all of the cloud companies have responded to that demand with,
product offerings which support hybrid cloud approaches. And it's very
likely that those on-prem VMs will be using KVM as their hypervisor.

That aside, if the cloud image is supposed to be compatible with GCP,
then that would be a good enough reason on its own to support
virtio-rng, since GCP supports virtio-rng today.

- Ted
Re: lack of boot-time entropy on arm64 ec2 instances
On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
[...]
> I wonder if the correct criterion for the cloud image is
> compatibility with AWS and GCP only. I suppose a large number of
> deployments are based on private cloud environments (OpenStack
> etc.).
[...]

Setting aside for the moment that there are plenty of
OpenStack-based public cloud providers too (at last count, far more
than there are proprietary cloud providers because, hey, free
software!), the vast majority of OpenStack deployments rely on KVM
for their hypervisor layer which has had VirtIO-RNG for ages.
Works just fine for OpenStack as long as the administrator turns it
on.

https://docs.openstack.org/security-guide/instance-management/security-services-for-instances.html#entropy-to-instances

--
Jeremy Stanley
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 04:57:24PM +, Luca Filipozzi wrote:
> > > >> I'd encourage those of you who are in position to make Amazon listen
> > > >> to get with the program and support virtio-rng. :-)
> > > > Noah: chances of AWS supporting virtio-rng?
> > > I wonder if the correct criterion for the cloud image is compatibility
> > > with AWS and GCP only. I suppose a large number of deployments are based
> > > on private cloud environments (OpenStack etc.). In addition to AWS and
> > > GCP, there is also Azure, which is based on Hyper-V, which has a low
> > > chance of getting support for virtio-rng for obvious reasons.
> >
> > The cloud kernel flavour currently targets AWS and Azure only. Hence
> > the lack of support for virtio-rng.
>
> How is entropy starvation at boot solved for x86-64 in AWS / Azure?

RDRAND is available on amd64, and contributes early entropy.

Our 5.4 kernel in sid does not suffer from a lack of entropy at boot on
arm64 EC2 instances. I guess it could be due to the "random: try to
actively add entropy rather than passively wait for it" that tytso
mentioned earlier. I'm going to try to cherry-pick that into 4.19 and
see if things speed up. Since we're already running it in 5.4, I guess
it's safe...

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 07:54:14AM -0500, Noah Meyerhans wrote:
> On Thu, Jan 09, 2020 at 01:18:24PM +0100, Adam Dobrawy wrote:
> > >> I'd encourage those of you who are in position to make Amazon listen
> > >> to get with the program and support virtio-rng. :-)
> > > Noah: chances of AWS supporting virtio-rng?
> > I wonder if the correct criterion for the cloud image is compatibility
> > with AWS and GCP only. I suppose a large number of deployments are based
> > on private cloud environments (OpenStack etc.). In addition to AWS and
> > GCP, there is also Azure, which is based on Hyper-V, which has a low
> > chance of getting support for virtio-rng for obvious reasons.
>
> The cloud kernel flavour currently targets AWS and Azure only. Hence
> the lack of support for virtio-rng.

How is entropy starvation at boot solved for x86-64 in AWS / Azure?

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 01:18:24PM +0100, Adam Dobrawy wrote:
> >> I'd encourage those of you who are in position to make Amazon listen
> >> to get with the program and support virtio-rng. :-)
> > Noah: chances of AWS supporting virtio-rng?
> I wonder if the correct criterion for the cloud image is compatibility
> with AWS and GCP only. I suppose a large number of deployments are based
> on private cloud environments (OpenStack etc.). In addition to AWS and
> GCP, there is also Azure, which is based on Hyper-V, which has a low
> chance of getting support for virtio-rng for obvious reasons.

The cloud kernel flavour currently targets AWS and Azure only. Hence
the lack of support for virtio-rng.

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On 09.01.2020 at 06:47, Luca Filipozzi wrote:
>> I'd encourage those of you who are in position to make Amazon listen
>> to get with the program and support virtio-rng. :-)
> Noah: chances of AWS supporting virtio-rng?

I wonder if the correct criterion for the cloud image is compatibility
with AWS and GCP only. I suppose a large number of deployments are based
on private cloud environments (OpenStack etc.). In addition to AWS and
GCP, there is also Azure, which is based on Hyper-V, which has a low
chance of getting support for virtio-rng for obvious reasons.
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 11:25:34PM -0500, Theodore Y. Ts'o wrote:
> On Thu, Jan 09, 2020 at 01:11:41AM +, Luca Filipozzi wrote:
> >
> > (It's not like RNG quality is a new problem... why didn't
> > virtualization approaches include host-to-guest RNG passthrough from the
> > beginning?)
>
> Virtio-rng has been around since 2008 (over a decade), and it provides
> specifically the host-to-guest RNG passthrough that you've mentioned.
> QEMU supports it, as does GCE. I'm a little surprised to find out
> that AWS doesn't support virtio-rng; I thought it did, but I just ran
> a quick experiment, and it appears I was wrong.

Thank you for the very informative reply. I really appreciate it.

> I'd encourage those of you who are in position to make Amazon listen
> to get with the program and support virtio-rng. :-)

Noah: chances of AWS supporting virtio-rng?

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 01:11:41AM +, Luca Filipozzi wrote:
>
> (It's not like RNG quality is a new problem... why didn't
> virtualization approaches include host-to-guest RNG passthrough from the
> beginning?)

Virtio-rng has been around since 2008 (over a decade), and it provides
specifically the host-to-guest RNG passthrough that you've mentioned.
QEMU supports it, as does GCE. I'm a little surprised to find out
that AWS doesn't support virtio-rng; I thought it did, but I just ran
a quick experiment, and it appears I was wrong.

The Debian cloud kernel doesn't appear to enable CONFIG_HW_RANDOM or
CONFIG_HW_RANDOM_VIRTIO --- boo, hiss --- but the Ubuntu kernel does,
and so I booted an AWS VM with Ubuntu. I tried loading the virtio-rng
module, and it didn't show up in
/sys/class/misc/hw_random/rng_available.

Here is what you will find on a GCE VM if you have a Linux kernel
configured correctly to support virtio-rng:

root@xfstests:~# dmidecode -s system-product-name
Google Compute Engine
root@xfstests:~# cat /sys/class/misc/hw_random/rng_available
virtio_rng.0 tpm-rng-0
root@xfstests:~# cat /sys/class/misc/hw_random/rng_current
virtio_rng.0

With newer kernels, virtio-rng will automatically be used to initialize
the CRNG, as well as provide continuous entropy to /dev/random, for
those people, or companies, or Payment Card Industry (PCI) compliance
labs, who have some irrational need for "True Randomness" (whatever the
hell that means).

Now, I happen to work at Google (in fact, I was one of the people who
pushed for virtio-rng in GCE), so the argument can be made that I'm
being biased, but QEMU's support of virtio-rng predates GCE's support
of virtio-rng by many, many years.

I'd encourage those of you who are in position to make Amazon listen
to get with the program and support virtio-rng. :-)

- Ted

P.S. The above experiment in GCE was done using a kernel built using
[1] as a defconfig for 5.4+ kernels (copy to .config and run "make
olddefconfig"). For kernels between 4.19 and 5.3 inclusive, use [2].
These kernel configs are minimal configs optimized for file system
testing using gce-xfstests[3] and kvm-xfstests, but some folks might
find them useful. The kvm-xfstests framework is also useful for testing
kernel configs for randomness. (Compare "kvm-xfstests shell" with and
without "--no-virtio-rng".)

[1] https://github.com/tytso/xfstests-bld/blob/master/kernel-configs/x86_64-config-5.4
[2] https://github.com/tytso/xfstests-bld/blob/master/kernel-configs/x86_64-config-4.19
[3] https://thunk.org/gce-xfstests
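The sysfs check shown in the transcript above can be scripted. The following is only a sketch: the helper names hwrng_status and has_virtio_rng are made up for illustration, and treating a literal "none" in rng_current as "no RNG selected" is an assumption about what the kernel reports there. The two file names are the ones used in the message.

```python
from pathlib import Path

# Directory used in the transcript above.
HW_RANDOM_DIR = Path("/sys/class/misc/hw_random")


def hwrng_status(base=HW_RANDOM_DIR):
    """Return (available, current) hardware RNG names.

    Yields ([], None) when the files are missing, e.g. on a kernel
    built without CONFIG_HW_RANDOM.
    """
    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return ""

    available = read("rng_available").split()
    current = read("rng_current")
    # Assumption: "none" (or an empty file) means no RNG is selected.
    if current in ("", "none"):
        current = None
    return available, current


def has_virtio_rng(base=HW_RANDOM_DIR):
    """True if any available RNG name starts with "virtio_rng"."""
    available, _ = hwrng_status(base)
    return any(name.startswith("virtio_rng") for name in available)


if __name__ == "__main__":
    available, current = hwrng_status()
    print("available:", available or "(none)")
    print("current:  ", current or "(none)")
    print("virtio-rng present:", has_virtio_rng())
```

The base parameter exists only so the helpers can be pointed at a scratch directory for testing; on a real guest the default path is the one to use.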
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 07:18:33PM -0500, Theodore Y. Ts'o wrote:
> Another approach would be to cherry pick 50ee7529ec45 ("random: try to
> actively add entropy rather than passively wait for it"). I'm pretty
> confident that it's probably fine ("it's fine. it's fine. Really,
> it's fine") for x86. In particular, at least x86 has RDRAND, so even
> if it's utterly predictable to someone who has detailed information
> about the CPU's microarchitecture, it probably won't be a disaster.

Of course, another possibility would be to use the 5.4 kernel from
buster-backports, once it's uploaded, since it'll contain 50ee7529ec45
already. I can confirm that ssh host key generation under Linux 5.4
does not block for lack of entropy. We'll also at that point have the
option of using the cloud kernel flavour, when that's available.

I don't really like the idea of using something that doesn't get support
from the security team, and I'd probably want to switch to the
buster-backports kernel for amd64 as well, if we were to do this. It's
not what I prefer, but it is an option worth mentioning.

noah
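One way to see whether a freshly booted guest would still block on entropy is to ask the kernel for a single random byte in non-blocking mode. This is a rough sketch rather than anything used in the thread: crng_ready is a hypothetical helper, and it relies on Python's os.getrandom() with os.GRND_NONBLOCK, which is only available on Linux.

```python
import os


def crng_ready():
    """Report whether getrandom() would return without blocking.

    Returns True if the kernel CRNG is initialized, False if a blocking
    read would currently stall, and None if the platform or kernel does
    not offer getrandom() at all.
    """
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        # The pool is not initialized yet; a blocking caller such as
        # ssh host key generation would wait here.
        return False
    except OSError:
        return None


if __name__ == "__main__":
    print("CRNG ready:", crng_ready())
```

Run early in boot (for example from a unit ordered before ssh key generation), a False result would point to the stall described in this thread.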
Re: lack of boot-time entropy on arm64 ec2 instances
On Thu, Jan 09, 2020 at 12:41:28AM +, Luca Filipozzi wrote:
> On Wed, Jan 08, 2020 at 04:29:35PM -0500, Noah Meyerhans wrote:
> > If the kernel team is supportive of the
> > EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in
> > favor of enabling haveged temporarily, until kernel support is
> > available, or is it better to avoid it completely?
>
> I prefer passing through hrng but would find haveged acceptable. Other
> distros ship with haveged enabled for the same reason as we are debating
> here.

That said, the concern is the quality of the entropy since it will be
used for the generation of long-lived ssh host keys.

I use terraform to instantiate instances and I precompute ssh host keys
(RSA only but I could do the others, I suppose) and install them with
cloud-init. I did this primarily so that I could generate a known_hosts
file that contains the public keys of the instances and thereby avoid
ssh unknown host warnings. I suppose there's this added benefit that the
quality of the ssh host key is not in question since it's using the
entropy of my management machine (where I'm not using haveged).

(It's not like RNG quality is a new problem... why didn't
virtualization approaches include host-to-guest RNG passthrough from the
beginning?)

--
Luca Filipozzi
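The known_hosts part of that workflow is easy to sketch. What follows is not Luca's actual tooling: known_hosts_lines is a hypothetical helper, the host names and key strings in the example are placeholders, and it assumes each public key is in the usual one-line OpenSSH form ("type base64 [comment]") as found in a .pub file.

```python
def known_hosts_lines(host_keys):
    """Build known_hosts entries from precomputed public host keys.

    host_keys maps a host name (or address) to a list of public key
    strings in one-line OpenSSH form: "type base64 [comment]".
    The comment, if any, is dropped.
    """
    lines = []
    for host in sorted(host_keys):
        for public_key in host_keys[host]:
            fields = public_key.split()
            if len(fields) < 2:
                raise ValueError(f"malformed public key for {host!r}")
            key_type, key_data = fields[0], fields[1]
            lines.append(f"{host} {key_type} {key_data}")
    return lines


if __name__ == "__main__":
    # Placeholder data for illustration only; these are not real keys.
    example = {
        "web1.example.org": ["ssh-rsa AAAAB3NzaC1yc2E= root@mgmt"],
        "db1.example.org": ["ssh-rsa AAAAB3NzaC1yc2F= root@mgmt"],
    }
    print("\n".join(known_hosts_lines(example)))
```

Because the keys are generated on the management machine, the same data can feed both the instance configuration and the known_hosts file, which is what avoids the unknown-host prompt.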
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 04:29:35PM -0500, Noah Meyerhans wrote:
> If the kernel team is supportive of the
> EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in
> favor of enabling haveged temporarily, until kernel support is
> available, or is it better to avoid it completely?

I prefer passing through hrng but would find haveged acceptable. Other
distros ship with haveged enabled for the same reason as we are debating
here.

Ted provides another viewpoint in a separate reply to this thread that
also merits consideration.

--
Luca Filipozzi
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 07:18:33PM -0500, Theodore Y. Ts'o wrote:
> I was under the impression that Amazon provided virtio-rng support for
> its VMs? Or does that not apply for their arm64 VMs? If they do
> support virtio-rng, it might just be an issue of building the cloud
> kernel with that option enabled.

RDRAND is used for amd64, via the RANDOM_TRUST_CPU kernel config option.
That is not available for arm64. The rough equivalent there is
apparently RANDOM_TRUST_BOOTLOADER, which uses the EFI_RNG protocol.
It's only available in Linux 5.4 at the moment, and not currently
supported on EC2. It seems like we should consider backporting this.

> Another approach would be to cherry pick 50ee7529ec45 ("random: try to
> actively add entropy rather than passively wait for it"). I'm pretty
> confident that it's probably fine ("it's fine. it's fine. Really,
> it's fine") for x86. In particular, at least x86 has RDRAND, so even
> if it's utterly predictable to someone who has detailed information
> about the CPU's microarchitecture, it probably won't be a disaster.

Thanks, this is worth looking at, at least in the absence of
RANDOM_TRUST_BOOTLOADER.

> Upstream, it's enabled for all architectures, because Linus thinks
> hanging at boot is a worse problem than an insufficiently initialized
> CRNG. I'm not at all convinced that it's safe for all ARM and RISC-V
> CPUs. On the other hand, I don't think it's going to be any worse
> than haveged (which I don't really trust on all architectures either),
> and it has the advantage of not requiring any additional userspace
> packages.

...Although this really isn't a ringing endorsement. :(

noah
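To see which of the options named in this thread a given kernel was built with, one can scan its config file. This is a minimal sketch: config_values is a hypothetical helper, the option list is simply the set mentioned in the discussion, and reading /boot/config-<release> assumes the distribution installs the kernel config there (Debian kernel packages do).

```python
import platform

# Options mentioned in this thread.
OPTIONS = (
    "CONFIG_HW_RANDOM",
    "CONFIG_HW_RANDOM_VIRTIO",
    "CONFIG_RANDOM_TRUST_CPU",
    "CONFIG_RANDOM_TRUST_BOOTLOADER",
)


def config_values(config_text, options=OPTIONS):
    """Map each option to "y", "m", another value, "n", or None.

    "n" means the config says "# CONFIG_X is not set"; None means the
    option does not appear at all (e.g. the kernel predates it).
    """
    values = {name: None for name in options}
    suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# ") and line.endswith(suffix):
            name = line[2:-len(suffix)]
            if name in values:
                values[name] = "n"
        elif "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            if name in values:
                values[name] = value
    return values


if __name__ == "__main__":
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as handle:
            text = handle.read()
    except OSError as err:
        print(f"cannot read {path}: {err}")
    else:
        for name, value in config_values(text).items():
            print(f"{name}: {value if value is not None else 'absent'}")
```

Distinguishing "not set" from "absent" matters here, since RANDOM_TRUST_BOOTLOADER simply does not exist in a 4.19 config.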
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> The buster arm64 images on Amazon EC2 appear to have insufficient
> entropy at boot, and thus take several minutes to complete the boot
> process.
>
> There are a couple of potential fixes (or at least workarounds) for this
> problem, but none is entirely perfect.
>
> ...
>
> I'm not aware of any other options. Given the above, it seems that
> haveged is the only really feasible choice right now. Does anyone
> disagree with that assessment? Are there options I've missed?

I was under the impression that Amazon provided virtio-rng support for
its VMs? Or does that not apply for their arm64 VMs? If they do
support virtio-rng, it might just be an issue of building the cloud
kernel with that option enabled.

Another approach would be to cherry pick 50ee7529ec45 ("random: try to
actively add entropy rather than passively wait for it"). I'm pretty
confident that it's probably fine ("it's fine. it's fine. Really,
it's fine") for x86. In particular, at least x86 has RDRAND, so even
if it's utterly predictable to someone who has detailed information
about the CPU's microarchitecture, it probably won't be a disaster.

Upstream, it's enabled for all architectures, because Linus thinks
hanging at boot is a worse problem than an insufficiently initialized
CRNG. I'm not at all convinced that it's safe for all ARM and RISC-V
CPUs. On the other hand, I don't think it's going to be any worse
than haveged (which I don't really trust on all architectures either),
and it has the advantage of not requiring any additional userspace
packages.

- Ted
Re: lack of boot-time entropy on arm64 ec2 instances
JFTR https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html -- regards Thomas
Re: lack of boot-time entropy on arm64 ec2 instances
> On Wed, 8 Jan 2020 16:40:33 -0500, Noah Meyerhans said:
> To be clear, the problem isn't a failure to boot, but rather a several
> minute pause during boot.

For me such a delay is a kind of failure. And as Daniel wrote in his
blog, "unverified entropy is better than no entropy."

--
regards Thomas
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 09:24:25PM +, Jeremy Stanley wrote:
> > I've seen reactions like this, but never an explanation. Has anyone
> > written up the issues? Given that "fail to boot" isn't a workable
> > outcome, it'd be useful to know exactly what risks one accepts when
> > using haveged.
>
> While you're at it, defining "fail to boot" would be nice. Just
> because sshd won't start, it doesn't necessarily mean the machine
> isn't "booted" in some sense, only that maybe you can't log into it
> (substitute httpd and inability to browse the Web sites served from
> it, or whatever you prefer).

To be clear, the problem isn't a failure to boot, but rather a several
minute pause during boot. In the default images, the pause occurs
during ssh host key generation, but it's possible that other services
would be impacted in actual production scenarios, particularly since
user-provided cloud-config would not be processed until after the
config-ssh module completes.

For reference, here's the "systemd-analyze blame" and "cloud-init
analyze blame" output showing the delay:

admin@ip-10-0-1-42:~$ systemd-analyze blame
2min 27.763s cloud-init.service
     26.080s cloud-final.service
      2.774s networking.service
      2.065s cloud-init-local.service
      1.554s cloud-config.service
...

admin@ip-10-0-1-42:~$ cloud-init analyze blame
-- Boot Record 01 --
145.79700s (init-network/config-ssh)
25.26800s (modules-final/config-scripts-user)
00.62600s (modules-config/config-grub-dpkg)
00.49900s (init-local/search-Ec2Local)
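For anyone who wants to watch for this regression automatically, the "cloud-init analyze blame" output above is simple to parse. This is a sketch only: slow_stages and the 10 second threshold are made up for illustration, and the line format is inferred from the output quoted in this message rather than from any cloud-init specification.

```python
import re

# Matches lines such as "145.79700s (init-network/config-ssh)".
BLAME_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s \(([^)]+)\)\s*$")


def slow_stages(blame_text, threshold_s=10.0):
    """Return (seconds, stage) pairs at or above threshold_s, slowest first."""
    stages = []
    for line in blame_text.splitlines():
        match = BLAME_LINE.match(line)
        if match:
            stages.append((float(match.group(1)), match.group(2)))
    return sorted((s for s in stages if s[0] >= threshold_s), reverse=True)


# Values taken from the blame output quoted above.
SAMPLE = """\
-- Boot Record 01 --
145.79700s (init-network/config-ssh)
25.26800s (modules-final/config-scripts-user)
00.62600s (modules-config/config-grub-dpkg)
00.49900s (init-local/search-Ec2Local)
"""

if __name__ == "__main__":
    for seconds, stage in slow_stages(SAMPLE):
        print(f"{seconds:8.3f}s  {stage}")
```

With the sample above, config-ssh stands out at roughly 146 seconds, which matches the pause attributed to ssh host key generation.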
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 12:50:04PM -0800, Ross Vandegrift wrote:
> I know of two other options:
> - pollinate
> - jitterentropy-rngd
>
> pollinate downloads seeds remotely, which feels wrong - and itself may
> require random numbers. I've never tried jitterentropy.

IMO these are roughly equivalent to haveged, in that they're userspace
accumulators of entropy that try to seed the kernel. I think I prefer
haveged's approach, but I'm really not qualified to judge.
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> Every time I propose the use of haveged to resolve entropy starvation, I
> get reactions from crypto folks saying that it's not a valid solution.
> They invariably suggest that passing hardware RNG through to the VM is
> the appropriate choice.
>
> The latest such reaction being from mjg59. See:
> https://twitter.com/mjg59/status/1181423056268349441
> https://twitter.com/LucaFilipozzi/status/1181426253636755457

Yeah, this is my understanding as well. But it's not like the haveged
developers are clueless, either, and there's a decent amount of research
behind their approach. I can't pretend to understand the details of it,
though.

Even if passing entropy from the host to the VM is the right approach,
it's not something we can take advantage of today, due to lack of
support both within EC2 and within Debian. I'll follow up with the
kernel team to gauge their level of support for enabling
CONFIG_RANDOM_TRUST_BOOTLOADER (and backporting it to buster).

If the kernel team is supportive of the
EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in
favor of enabling haveged temporarily, until kernel support is
available, or is it better to avoid it completely?

noah
Re: lack of boot-time entropy on arm64 ec2 instances
On 2020-01-08 13:04:42 -0800 (-0800), Ross Vandegrift wrote:
> On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> > On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> > > We add haveged to the arm64 EC2 AMI. This appears to work, and is
> > > something we can do today. The debian-installer has previously used
> > > haveged to ensure reasonable entropy during installation, so there is
> > > some precedent for this.
> >
> > Every time I propose the use of haveged to resolve entropy starvation, I
> > get reactions from crypto folks saying that it's not a valid solution.
> > They invariably suggest that passing hardware RNG through to the VM is
> > the appropriate choice.
> >
> > The latest such reaction being from mjg59. See:
> > https://twitter.com/mjg59/status/1181423056268349441
> > https://twitter.com/LucaFilipozzi/status/1181426253636755457
>
> I've seen reactions like this, but never an explanation. Has anyone
> written up the issues? Given that "fail to boot" isn't a workable
> outcome, it'd be useful to know exactly what risks one accepts when
> using haveged.

While you're at it, defining "fail to boot" would be nice. Just
because sshd won't start, it doesn't necessarily mean the machine
isn't "booted" in some sense, only that maybe you can't log into it
(substitute httpd and inability to browse the Web sites served from
it, or whatever you prefer).

--
Jeremy Stanley
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> > We add haveged to the arm64 EC2 AMI. This appears to work, and is
> > something we can do today. The debian-installer has previously used
> > haveged to ensure reasonable entropy during installation, so there is
> > some precedent for this.
>
> Every time I propose the use of haveged to resolve entropy starvation, I
> get reactions from crypto folks saying that it's not a valid solution.
> They invariably suggest that passing hardware RNG through to the VM is
> the appropriate choice.
>
> The latest such reaction being from mjg59. See:
> https://twitter.com/mjg59/status/1181423056268349441
> https://twitter.com/LucaFilipozzi/status/1181426253636755457

I've seen reactions like this, but never an explanation. Has anyone
written up the issues? Given that "fail to boot" isn't a workable
outcome, it'd be useful to know exactly what risks one accepts when
using haveged.

Ross
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> Option 1:
>
> We add haveged to the arm64 EC2 AMI. This appears to work, and is
> something we can do today. The debian-installer has previously used
> haveged to ensure reasonable entropy during installation, so there is
> some precedent for this.
>
> Option 2:
>
> There is a mechanism by which the VM host can pass entropy to the guest
> at boot time using the EFI_RNG protocol. This won't require any
> additional software in our images, but it has a couple of other notable
> drawbacks:
[snip]
> I'm not aware of any other options. Given the above, it seems that
> haveged is the only really feasible choice right now. Does anyone
> disagree with that assessment? Are there options I've missed?

I know of two other options:
- pollinate
- jitterentropy-rngd

pollinate downloads seeds remotely, which feels wrong, and may itself
require random numbers. I've never tried jitterentropy.

Ross
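Whichever of these options is tried, it helps to compare the kernel's entropy estimate with and without the daemon running. The following is a rough sketch for doing that. It assumes the standard Linux files under /proc/sys/kernel/random; the helper names read_int and entropy_report are made up for illustration and nothing here is specific to haveged, pollinate, or jitterentropy-rngd.

```python
import os

# Standard Linux location of the kernel's entropy accounting files.
RANDOM_DIR = "/proc/sys/kernel/random"


def read_int(path):
    """Return the integer stored in a one-line file, or None if the
    file is missing, unreadable, or does not contain a number."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def entropy_report(random_dir=RANDOM_DIR):
    """Collect the kernel's entropy estimate and pool size, in bits."""
    return {
        "entropy_avail": read_int(os.path.join(random_dir, "entropy_avail")),
        "poolsize": read_int(os.path.join(random_dir, "poolsize")),
    }


if __name__ == "__main__":
    for name, value in entropy_report().items():
        print(name, "=", value)
```

Sampling this shortly after boot on an affected instance, once with the stock image and once with a candidate daemon installed, gives a simple before-and-after comparison. It says nothing about the quality of the added entropy, which is the concern raised elsewhere in the thread.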
Re: lack of boot-time entropy on arm64 ec2 instances
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> We add haveged to the arm64 EC2 AMI. This appears to work, and is
> something we can do today. The debian-installer has previously used
> haveged to ensure reasonable entropy during installation, so there is
> some precedent for this.

Every time I propose the use of haveged to resolve entropy starvation, I
get reactions from crypto folks saying that it's not a valid solution.
They invariably suggest that passing hardware RNG through to the VM is
the appropriate choice.

The latest such reaction being from mjg59. See:
https://twitter.com/mjg59/status/1181423056268349441
https://twitter.com/LucaFilipozzi/status/1181426253636755457

-- 
Luca Filipozzi