Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-17 Thread Luca Filipozzi
On Fri, Jan 17, 2020 at 02:32:22PM -0500, Noah Meyerhans wrote:
> On Thu, Jan 09, 2020 at 05:22:17PM -0500, Noah Meyerhans wrote:
> > I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
> > claims to have entropy at boot:
> > 
> > admin@ip-172-31-49-239:~$ cloud-init analyze blame
> > -- Boot Record 01 --
> >  02.88900s (init-network/config-ssh)
> >  ...
> > 
> > The change applies cleanly to our kernel tree, so this would appear to
> > be a possible solution.
> > 
> > I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
> > against the kernel to discuss the entropy issue in general, and will follow
> > up there with a proposal for getting this change backported.
> 
> The kernel team would prefer that any backport of 50ee7529ec45 to stable
> branches happen upstream, which is sensible.  I'll follow up with the
> stable kernel maintainers to see about making this happen, if they're
> willing.
> 
> In the meantime, regardless of where the backport happens, there's no
> possibility of getting this kernel change into 10.3.  So, I'd like to
> revisit my original proposal of adding haveged to the arm64 EC2 image
> configuration.  Haveged is used in debian-installer for buster (but not
> bullseye+, see below), so there is precedent for its use within Debian.
> IMO, this is the best option available in the short term.  It results in
> a far better user experience on the instances in question, and is a
> fairly unintrusive change to make.
> 
> Background on haveged in d-i:
> Haveged was added to d-i in commit c47000192 ("Add haveged-udeb [linux]
> to the pkg-lists/base") in response to bug #923675 and is used in
> buster.  More recently, with the addition of the in-kernel entropy
> collection mechanisms we've been discussing here, the removal of haveged
> has been proposed for bullseye.
> https://lists.debian.org/debian-boot/2019/11/msg00077.html  It has not
> yet been removed, though.
> 
> Similarly, I would expect that we would remove haveged from the
> generated buster images once the kernel's jitter-entropy
> collector is available for buster.

Thank you for the legwork on this. I agree that haveged is the way to
proceed at this point.

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-17 Thread Noah Meyerhans
On Thu, Jan 09, 2020 at 05:22:17PM -0500, Noah Meyerhans wrote:
> I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
> claims to have entropy at boot:
> 
> admin@ip-172-31-49-239:~$ cloud-init analyze blame
> -- Boot Record 01 --
>  02.88900s (init-network/config-ssh)
>  ...
> 
> The change applies cleanly to our kernel tree, so this would appear to
> be a possible solution.
> 
> I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
> against the kernel to discuss the entropy issue in general, and will follow
> up there with a proposal for getting this change backported.

The kernel team would prefer that any backport of 50ee7529ec45 to stable
branches happen upstream, which is sensible.  I'll follow up with the
stable kernel maintainers to see about making this happen, if they're
willing.

In the meantime, regardless of where the backport happens, there's no
possibility of getting this kernel change into 10.3.  So, I'd like to
revisit my original proposal of adding haveged to the arm64 EC2 image
configuration.  Haveged is used in debian-installer for buster (but not
bullseye+, see below), so there is precedent for its use within Debian.
IMO, this is the best option available in the short term.  It results in
a far better user experience on the instances in question, and is a
fairly unintrusive change to make.

Background on haveged in d-i:
Haveged was added to d-i in commit c47000192 ("Add haveged-udeb [linux]
to the pkg-lists/base") in response to bug #923675 and is used in
buster.  More recently, with the addition of the in-kernel entropy
collection mechanisms we've been discussing here, the removal of haveged
has been proposed for bullseye.
https://lists.debian.org/debian-boot/2019/11/msg00077.html  It has not
yet been removed, though.

Similarly, I would expect that we would remove haveged from the
generated buster images once the kernel's jitter-entropy
collector is available for buster.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-14 Thread Noah Meyerhans
On Tue, Jan 14, 2020 at 03:01:23PM +, Luca Filipozzi wrote:
> > If we want to extend the cloud kernel to support other services, we need
> > to do more than just enable virtio-rng.  Somebody need to come up with a
> > complete list of devices that are needed for the service in question,
> > and work with the kernel team ensure that support for all of them is
> > enabled in the cloud kernel.
> 
> Folks working on the CCP, etc.: is it of interest to you to use the same
> cloud kernel? Does this improve our users' experience to have the same
> kernel across the different providers?

At present the cloud kernel's only optimizations consist of disabling
device drivers that are highly unlikely to be seen in a cloud
environment.  So the user experience is the same, except for the larger
/lib/modules/$(uname -r) directory and the larger initramfs image.  The
size of the initramfs does, of course, contribute to boot latency by
taking longer to uncompress, but I haven't quantified the difference
yet.  So for now, the cloud flavour is the conservative choice, in that
we know it will work and the drawbacks of using it are fairly minor.

There is talk of making some additional changes.  Bug #947759 contains a
decent summary of things that are being considered.  There is also
#941284, but my inclination is to not implement that suggestion.  In any
case, we'll need to consider the impact of any proposed changes on the
user experience in the supported clouds on an ongoing basis.

In an ideal world, we might be able to provide distinct flavours for
each cloud, since e.g. it makes no sense to enable the Amazon ENA
ethernet driver on kernels targeting environments other than AWS, but
that would require more resources for diminishing returns.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-14 Thread Luca Filipozzi
On Fri, Jan 10, 2020 at 01:33:12PM -0500, Noah Meyerhans wrote:
> On Fri, Jan 10, 2020 at 03:52:53AM +, Luca Filipozzi wrote:
> > Two questions (pretend i'm 6yo):
> > 
> > (1) why can't AWS offer virtio-rng support (other than "we already offer
> > a RDRAND on amd64") and should Debian actively encourage their adding
> > this support?
> 
> We can certainly ask.  However, it is very clear that EC2 is well aware
> of the existence of virtio-rng (just look at who wrote the QEMU
> virtio-rng implementation, for example), so, without wanting to
> speculate too much, I'm going to guess that the decision to not offer it
> is an intentional one, rather than an oversight.  If I learn more, and
> the organization is willing to share it publicly, I'll pass it along.

Thanks! It'd be very interesting to know the reasoning.

> > (2) what prevents our image having virtio-rng support (if it doesn't
> > already)?
> 
> The cloud kernel flavour currently only targets AWS and Azure, because
> people have put effort into making it support those services.  The
> images that we generate for those services use that kernel.  The images
> that we generate for other cloud services use the standard kernel, which
> does have virtio-rng support.
> 
> If we want to extend the cloud kernel to support other services, we need
> to do more than just enable virtio-rng.  Somebody needs to come up with a
> complete list of devices that are needed for the service in question,
> and work with the kernel team to ensure that support for all of them is
> enabled in the cloud kernel.

Folks working on the CCP, etc.: is it of interest to you to use the same
cloud kernel? Does this improve our users' experience to have the same
kernel across the different providers?

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-10 Thread Noah Meyerhans
On Fri, Jan 10, 2020 at 03:52:53AM +, Luca Filipozzi wrote:
> Two questions (pretend i'm 6yo):
> 
> (1) why can't AWS offer virtio-rng support (other than "we already offer
> a RDRAND on amd64") and should Debian actively encourage their adding
> this support?

We can certainly ask.  However, it is very clear that EC2 is well aware
of the existence of virtio-rng (just look at who wrote the QEMU
virtio-rng implementation, for example), so, without wanting to
speculate too much, I'm going to guess that the decision to not offer it
is an intentional one, rather than an oversight.  If I learn more, and
the organization is willing to share it publicly, I'll pass it along.

> (2) what prevents our image having virtio-rng support (if it doesn't
> already)?

The cloud kernel flavour currently only targets AWS and Azure, because
people have put effort into making it support those services.  The
images that we generate for those services use that kernel.  The images
that we generate for other cloud services use the standard kernel, which
does have virtio-rng support.
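To see which situation a given instance is in, one can look at the running kernel's release string (Debian's cloud flavour has `-cloud-` in it, e.g. `4.19.0-6-cloud-amd64`) and check whether a `virtio-rng` module exists in its module tree. A minimal sketch, assuming the usual `/lib/modules/<release>/kernel/drivers/char/hw_random/` layout; the function name `virtio_rng_module` is hypothetical, and a missing module file can also mean the driver is built in rather than absent:

```python
import platform
from pathlib import Path
from typing import Optional


def virtio_rng_module(release: Optional[str] = None,
                      modules_root: Path = Path("/lib/modules")) -> Optional[Path]:
    """Return the path of the virtio-rng module for a kernel release, if present.

    None means no module file was found; the driver may still be built in
    (=y), so confirm against the kernel config before drawing conclusions.
    """
    release = release or platform.release()
    base = modules_root / release / "kernel" / "drivers" / "char" / "hw_random"
    # Assumption: modules are either uncompressed or xz-compressed.
    for name in ("virtio-rng.ko", "virtio-rng.ko.xz"):
        candidate = base / name
        if candidate.exists():
            return candidate
    return None


if __name__ == "__main__":
    rel = platform.release()
    print(rel, "(cloud flavour)" if "-cloud-" in rel else "(standard flavour)")
    print("virtio-rng module:", virtio_rng_module(rel))
```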

If we want to extend the cloud kernel to support other services, we need
to do more than just enable virtio-rng.  Somebody needs to come up with a
complete list of devices that are needed for the service in question,
and work with the kernel team to ensure that support for all of them is
enabled in the cloud kernel.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Luca Filipozzi
On Thu, Jan 09, 2020 at 04:56:58PM -0500, Theodore Y. Ts'o wrote:
> On Thu, Jan 09, 2020 at 07:15:20PM +, Jeremy Stanley wrote:
> > On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
> > [...]
> > > I wonder if the correct criterion for the cloud image is
> > > compatibility with AWS and GCP only. I suppose a large number of
> > > deployments are based on private cloud environments (OpenStack
> > > etc.).
> > [...]
> > 
> > Setting aside for the moment that there are plenty of
> > OpenStack-based public cloud providers too (at last count, far more
> > than there are proprietary cloud providers because, hey, free
> > software!), the vast majority of OpenStack deployments rely on KVM
> > for their hypervisor layer which has had VirtIO-RNG for ages.
> > Works just fine for OpenStack as long as the administrator turns it
> > on.
> 
> More to the point, a lot of enterprise customers have demanded, and
> most/all of the cloud companies have responded with, product offerings
> which support hybrid cloud approaches.  And it's very likely that those
> on-prem VM's will be using KVM as their hypervisor.
> 
> That aside, if the cloud image is supposed to be compatible with GCP,
> then that would be a good enough reason on its own to support
> virtio-rng, since GCP supports virtio-rng today.

Two questions (pretend i'm 6yo):

(1) why can't AWS offer virtio-rng support (other than "we already offer
a RDRAND on amd64") and should Debian actively encourage their adding
this support?

(2) what prevents our image having virtio-rng support (if it doesn't
already)?

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Noah Meyerhans
On Thu, Jan 09, 2020 at 01:22:30PM -0500, Noah Meyerhans wrote:
> Our 5.4 kernel in sid does not suffer from a lack of entropy at boot on
> arm64 EC2 instances.  I guess it could be due to the "random: try to
> actively add entropy rather than passively wait for it" that tytso
> mentioned earlier.  I'm going to try to cherry-pick that into 4.19 and
> see if things speed up.  Since we're already running it in 5.4, I guess
> it's safe...

I've confirmed that 4.19.87 with changes cherry-picked from 50ee7529ec45
claims to have entropy at boot:

admin@ip-172-31-49-239:~$ cloud-init analyze blame
-- Boot Record 01 --
 02.88900s (init-network/config-ssh)
 ...

The change applies cleanly to our kernel tree, so this would appear to
be a possible solution.

I've opened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=948519
against the kernel discuss the entropy issue in general, and will follow
up there with a proposal for getting this change backported.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Theodore Y. Ts'o
On Thu, Jan 09, 2020 at 07:15:20PM +, Jeremy Stanley wrote:
> On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
> [...]
> > I wonder if the correct criterion for the cloud image is
> > compatibility with AWS and GCP only. I suppose a large number of
> > deployments are based on private cloud environments (OpenStack
> > etc.).
> [...]
> 
> Setting aside for the moment that there are plenty of
> OpenStack-based public cloud providers too (at last count, far more
> than there are proprietary cloud providers because, hey, free
> software!), the vast majority of OpenStack deployments rely on KVM
> for their hypervisor layer which has had VirtIO-RNG for ages.
> Works just fine for OpenStack as long as the administrator turns it
> on.

More to the point, a lot of enterprise customers have demanded, and
most/all of the cloud companies have responded with, product offerings
which support hybrid cloud approaches.  And it's very likely that those
on-prem VM's will be using KVM as their hypervisor.

That aside, if the cloud image is supposed to be compatible with GCP,
then that would be a good enough reason on its own to support
virtio-rng, since GCP supports virtio-rng today.

- Ted



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Jeremy Stanley
On 2020-01-09 13:18:24 +0100 (+0100), Adam Dobrawy wrote:
[...]
> I wonder if the correct criterion for the cloud image is
> compatibility with AWS and GCP only. I suppose a large number of
> deployments are based on private cloud environments (OpenStack
> etc.).
[...]

Setting aside for the moment that there are plenty of
OpenStack-based public cloud providers too (at last count, far more
than there are proprietary cloud providers because, hey, free
software!), the vast majority of OpenStack deployments rely on KVM
for their hypervisor layer which has had VirtIO-RNG for ages.
Works just fine for OpenStack as long as the administrator turns it
on.

https://docs.openstack.org/security-guide/instance-management/security-services-for-instances.html#entropy-to-instances

-- 
Jeremy Stanley




Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Noah Meyerhans
On Thu, Jan 09, 2020 at 04:57:24PM +, Luca Filipozzi wrote:
> > > >> I'd encourage those of you who are in position to make Amazon listen
> > > >> to get with the program and support virtio-rng.  :-)
> > > > Noah: chances of AWS supporting virtio-rng?
> > > I wonder if the correct criterion for the cloud image is compatibility
> > > with AWS and GCP only. I suppose a large number of deployments are based
> > > on private cloud environments (OpenStack etc.). In addition to AWS and
> > > GCP, there is also Azure, which is based on Hyper-V, which has a low
> > > chance of getting support for virtio-rng for obvious reasons.
> > 
> > The cloud kernel flavour currently targets AWS and Azure only.  Hence
> > the lack of support for virtio-rng.
> 
> How is entropy starvation at boot solved for x86-64 in AWS / Azure?

RDRAND is available on amd64, and contributes early entropy.
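Whether a CPU advertises RDRAND can be read from the `flags` lines of `/proc/cpuinfo`. A minimal sketch (the function name `cpu_has_rdrand` is hypothetical); on arm64 the equivalent lines are labelled `Features` and never list `rdrand`, so it returns False there. Note that the flag only says the instruction exists; whether the kernel credits it as early entropy still depends on its configuration:

```python
from pathlib import Path


def cpu_has_rdrand(cpuinfo_text: str) -> bool:
    """True if any 'flags' line in /proc/cpuinfo text lists rdrand (x86)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "rdrand" in value.split():
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("rdrand available:", cpu_has_rdrand(path.read_text()))
```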

Our 5.4 kernel in sid does not suffer from a lack of entropy at boot on
arm64 EC2 instances.  I guess it could be due to the "random: try to
actively add entropy rather than passively wait for it" that tytso
mentioned earlier.  I'm going to try to cherry-pick that into 4.19 and
see if things speed up.  Since we're already running it in 5.4, I guess
it's safe...

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Luca Filipozzi
On Thu, Jan 09, 2020 at 07:54:14AM -0500, Noah Meyerhans wrote:
> On Thu, Jan 09, 2020 at 01:18:24PM +0100, Adam Dobrawy wrote:
> > >> I'd encourage those of you who are in position to make Amazon listen
> > >> to get with the program and support virtio-rng.  :-)
> > > Noah: chances of AWS supporting virtio-rng?
> > I wonder if the correct criterion for the cloud image is compatibility
> > with AWS and GCP only. I suppose a large number of deployments are based
> > on private cloud environments (OpenStack etc.). In addition to AWS and
> > GCP, there is also Azure, which is based on Hyper-V, which has a low
> > chance of getting support for virtio-rng for obvious reasons.
> 
> The cloud kernel flavour currently targets AWS and Azure only.  Hence
> the lack of support for virtio-rng.

How is entropy starvation at boot solved for x86-64 in AWS / Azure?

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Noah Meyerhans
On Thu, Jan 09, 2020 at 01:18:24PM +0100, Adam Dobrawy wrote:
> >> I'd encourage those of you who are in position to make Amazon listen
> >> to get with the program and support virtio-rng.  :-)
> > Noah: chances of AWS supporting virtio-rng?
> I wonder if the correct criterion for the cloud image is compatibility
> with AWS and GCP only. I suppose a large number of deployments are based
> on private cloud environments (OpenStack etc.). In addition to AWS and
> GCP, there is also Azure, which is based on Hyper-V, which has a low
> chance of getting support for virtio-rng for obvious reasons.

The cloud kernel flavour currently targets AWS and Azure only.  Hence
the lack of support for virtio-rng.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-09 Thread Adam Dobrawy


W dniu 09.01.2020 o 06:47, Luca Filipozzi pisze:
>> I'd encourage those of you who are in position to make Amazon listen
>> to get with the program and support virtio-rng.  :-)
> Noah: chances of AWS supporting virtio-rng?
I wonder if the correct criterion for the cloud image is compatibility
with AWS and GCP only. I suppose a large number of deployments are based
on private cloud environments (OpenStack etc.). In addition to AWS and
GCP, there is also Azure, which is based on Hyper-V, which has a low
chance of getting support for virtio-rng for obvious reasons.



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Luca Filipozzi
On Wed, Jan 08, 2020 at 11:25:34PM -0500, Theodore Y. Ts'o wrote:
> On Thu, Jan 09, 2020 at 01:11:41AM +, Luca Filipozzi wrote:
> > 
> > (It's not like RNG quality is a new problem... why didn't
> > virtualization approaches include host-to-guest RNG passthrough from the
> > beginning?)
> 
> Virtio-rng has been around since 2008 (over a decade), and it provides
> specifically the host-to-guest RNG passthrough that you've mentioned.
> Qemu supports it, as does GCE.  I'm a little surprised to find out
> that AWS doesn't support virtio-rng; I thought it did, but I just ran
> a quick experiment, and it appears I was wrong.

Thank you for the very informative reply. I really appreciate it.

> I'd encourage those of you who are in position to make Amazon listen
> to get with the program and support virtio-rng.  :-)

Noah: chances of AWS supporting virtio-rng?

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Theodore Y. Ts'o
On Thu, Jan 09, 2020 at 01:11:41AM +, Luca Filipozzi wrote:
> 
> (It's not like RNG quality is a new problem... why didn't
> virtualization approaches include host-to-guest RNG passthrough from the
> beginning?)

Virtio-rng has been around since 2008 (over a decade), and it provides
specifically the host-to-guest RNG passthrough that you've mentioned.
Qemu supports it, as does GCE.  I'm a little surprised to find out
that AWS doesn't support virtio-rng; I thought it did, but I just ran
a quick experiment, and it appears I was wrong.  The Debian cloud
kernel doesn't appear to enable CONFIG_HW_RANDOM or
CONFIG_HW_RANDOM_VIRTIO --- boo, hiss --- but the Ubuntu kernel does,
and so I booted an AWS VM with Ubuntu.  I tried loading the virtio-rng
module, and it didn't show up in /sys/class/misc/hw_random/rng_available.
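A quick way to check whether a given kernel build enables the two options mentioned above is to inspect its config file. A minimal sketch, assuming the config is installed at `/boot/config-<release>` as Debian kernel packages normally do; the helper name `hwrng_config_status` is hypothetical:

```python
import platform
from pathlib import Path
from typing import Dict, Iterable

WANTED = ("CONFIG_HW_RANDOM", "CONFIG_HW_RANDOM_VIRTIO")


def hwrng_config_status(config_text: str,
                        names: Iterable[str] = WANTED) -> Dict[str, str]:
    """Map each option to its value ('y', 'm', ...) or 'not set'."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
        # Disabled options appear as "# CONFIG_FOO is not set" and are
        # treated the same as options that are absent entirely.
    return {name: values.get(name, "not set") for name in names}


if __name__ == "__main__":
    path = Path("/boot") / f"config-{platform.release()}"
    if path.exists():
        for name, value in hwrng_config_status(path.read_text()).items():
            print(f"{name}={value}")
    else:
        print(f"{path} not found")
```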

Here is what you will find on a GCE VM if you have a Linux kernel
configured correctly to support virtio-rng:

root@xfstests:~# dmidecode -s system-product-name
Google Compute Engine
root@xfstests:~# cat /sys/class/misc/hw_random/rng_available 
virtio_rng.0 tpm-rng-0 
root@xfstests:~# cat /sys/class/misc/hw_random/rng_current
virtio_rng.0
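The same two sysfs files can be read programmatically, for example from an image self-test. A minimal sketch (function names are hypothetical) that reports the available and current hwrng sources and whether a `virtio_rng` device is among them:

```python
from pathlib import Path
from typing import List, Optional

HWRNG_DIR = Path("/sys/class/misc/hw_random")


def parse_rng_available(text: str) -> List[str]:
    """Split the space-separated rng_available contents into names."""
    return text.split()


def virtio_rng_status(sysfs_dir: Path = HWRNG_DIR) -> Optional[dict]:
    """Return available/current hwrng sources, or None if sysfs lacks them."""
    available = sysfs_dir / "rng_available"
    current = sysfs_dir / "rng_current"
    if not available.exists():
        return None  # hw_random core not built, or its module not loaded
    names = parse_rng_available(available.read_text())
    return {
        "available": names,
        "current": current.read_text().strip() if current.exists() else "",
        "has_virtio": any(n.startswith("virtio_rng") for n in names),
    }


if __name__ == "__main__":
    print(virtio_rng_status())
```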

With newer kernels, virtio-rng will automatically be used to
initialize the CRNG, as well as provide continuous entropy to
/dev/random, for those people, or companies, or Payment Card Industry
(PCI) compliance labs, who have some irrational need for "True
Randomness" (whatever the hell that means).

Now, I happen to work at Google (in fact, I was one of the people who
pushed for virtio-rng in GCE), so the argument can be made that I'm
being biased, but QEMU's support of virtio-rng long predates
GCE's support of virtio-rng by many, many years.  I'd encourage those
of you who are in position to make Amazon listen to get with the
program and support virtio-rng.  :-)

- Ted

P.S.  The above experiment in GCE was done using a kernel built from [1],
a defconfig for 5.4+ kernels (copy it to .config and run "make
olddefconfig").  For kernels between 4.19 and 5.3 inclusive, use [2].
These kernel configs are minimal configs optimized for file system
testing using gce-xfstests[3] and kvm-xfstests, but some folks might
find them useful.  The kvm-xfstests framework is also useful for testing
kernel configs for randomness.  (Compare "kvm-xfstests shell" with and
without "--no-virtio-rng".)

[1] https://github.com/tytso/xfstests-bld/blob/master/kernel-configs/x86_64-config-5.4
[2] https://github.com/tytso/xfstests-bld/blob/master/kernel-configs/x86_64-config-4.19
[3] https://thunk.org/gce-xfstests



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Noah Meyerhans
On Wed, Jan 08, 2020 at 07:18:33PM -0500, Theodore Y. Ts'o wrote:
> Another approach would be to cherry pick 50ee7529ec45 ("random: try to
> actively add entropy rather than passively wait for it").  I'm pretty
> confident that it's probably fine ("it's fine.  it's fine.  Really,
> it's fine") for x86.  In particular, at least x86 has RDRAND, so even
> if it's utterly predictable to someone who has detailed information
> about the CPU's microarchitecture, it probably won't be a disaster.

Of course, another possibility would be to use the 5.4 kernel from
buster-backports, once it's uploaded, since it'll contain 50ee7529ec45
already.  I can confirm that ssh host key generation under Linux 5.4
does not block for lack of entropy.  We'll also at that point have the
option of using the cloud kernel flavour, when that's available.  I
don't really like the idea of using something that doesn't get support
from the security team, and I'd probably want to switch to the
buster-backports kernel for amd64 as well, if we were to do this.  It's
not what I prefer, but it is an option worth mentioning.

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Luca Filipozzi
On Thu, Jan 09, 2020 at 12:41:28AM +, Luca Filipozzi wrote:
> On Wed, Jan 08, 2020 at 04:29:35PM -0500, Noah Meyerhans wrote:
> > If the kernel team is supportive of the
> > EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in
> > favor of enabling haveged temporarily, until kernel support is
> > available, or is it better to avoid it completely?
> 
> I prefer passing through the hardware RNG but would find haveged acceptable. Other
> distros ship with haveged enabled for the same reason as we are debating
> here.

That said, the concern is the quality of the entropy since it will be
used for the generation of long-lived ssh host keys.

I use terraform to instantiate instances and I precompute ssh host
keys (RSA only but I could do the others, I suppose) and install them
with cloud-init. I did this primarily so that I could generate a
known_hosts file that contains the public keys of the instances and
thereby avoid ssh unknown host warnings. I suppose there's this added
benefit that the quality of the ssh host key is not in question since
it's using the entropy of my management machine (where I'm not using
haveged).
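The workflow described above can be sketched in Python: given a host key pair generated on the management machine (for example with ssh-keygen), build the matching known_hosts line and a `#cloud-config` user-data document that uses cloud-init's `ssh_keys` setting (`rsa_private` / `rsa_public`). The helper names are hypothetical, and emitting the cloud-config as JSON relies on the assumption that cloud-init's YAML loader accepts JSON-style syntax; verify that before relying on it. The key string in the usage line is a truncated placeholder, not a real key:

```python
import json
from typing import Iterable


def known_hosts_line(hostnames: Iterable[str], public_key: str) -> str:
    """Format one known_hosts entry: comma-separated names, then the key.

    public_key is the contents of e.g. ssh_host_rsa_key.pub
    ("ssh-rsa AAAA... comment"); the trailing comment is dropped.
    """
    key_type, key_b64 = public_key.split()[:2]
    return f"{','.join(hostnames)} {key_type} {key_b64}"


def cloud_config_with_host_keys(rsa_private: str, rsa_public: str) -> str:
    """Build #cloud-config user-data installing a pre-generated RSA host key.

    Assumption: cloud-init parses the JSON body (JSON is largely valid YAML).
    """
    config = {"ssh_keys": {"rsa_private": rsa_private,
                           "rsa_public": rsa_public.strip()}}
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    pub = "ssh-rsa AAAAB3NzaC1yc2E placeholder@mgmt"  # truncated placeholder
    print(known_hosts_line(["web1", "10.0.1.42"], pub))
```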

(It's not like RNG quality is a new problem... why didn't
virtualization approaches include host-to-guest RNG passthrough from the
beginning?)

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Luca Filipozzi
On Wed, Jan 08, 2020 at 04:29:35PM -0500, Noah Meyerhans wrote:
> If the kernel team is supportive of the
> EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in
> favor of enabling haveged temporarily, until kernel support is
> available, or is it better to avoid it completely?

I prefer passing through the hardware RNG but would find haveged acceptable. Other
distros ship with haveged enabled for the same reason as we are debating
here.

Ted provides another viewpoint in a separate reply to this thread that
also merits consideration.

-- 
Luca Filipozzi



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Noah Meyerhans
On Wed, Jan 08, 2020 at 07:18:33PM -0500, Theodore Y. Ts'o wrote:
> I was under the impression that Amazon provided virtio-rng support for
> its VM's?  Or does that not apply for their arm64 VM's?  If they do
> support virtio-rng, it might just be an issue of building the cloud
> kernel with that option enabled.

RDRAND is used for amd64, via the RANDOM_TRUST_CPU kernel config option.
That is not available for arm64.  The rough equivalent there is
apparently RANDOM_TRUST_BOOTLOADER, which uses the EFI_RNG protocol.
It's only available in Linux 5.4 at the moment, and not currently
supported on EC2.  It seems like we should consider backporting this.
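On amd64, RANDOM_TRUST_CPU only sets the default; since 4.19 the kernel also accepts a `random.trust_cpu=` boot parameter that overrides it. A minimal sketch of that precedence (the function name is hypothetical; the kernel accepts other boolean spellings as well, which this sketch ignores, and the config default must be supplied by the caller):

```python
from typing import Optional


def effective_trust_cpu(cmdline: str, config_default: bool) -> bool:
    """Return whether the kernel will credit CPU RNG (e.g. RDRAND) output.

    CONFIG_RANDOM_TRUST_CPU sets the default; random.trust_cpu=on|off on
    the kernel command line overrides it, with the last occurrence winning.
    """
    override: Optional[bool] = None
    for arg in cmdline.split():
        if arg == "random.trust_cpu=on":
            override = True
        elif arg == "random.trust_cpu=off":
            override = False
    return config_default if override is None else override


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""
    # Placeholder: substitute the CONFIG_RANDOM_TRUST_CPU value from the
    # kernel config file.
    print(effective_trust_cpu(cmdline, config_default=False))
```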

> Another approach would be to cherry pick 50ee7529ec45 ("random: try to
> actively add entropy rather than passively wait for it").  I'm pretty
> confident that it's probably fine ("it's fine.  it's fine.  Really,
> it's fine") for x86.  In particular, at least x86 has RDRAND, so even
> if it's utterly predictable to someone who has detailed information
> about the CPU's microarchitecture, it probably won't be a disaster.

Thanks, this is worth looking at, at least in the absence of
RANDOM_TRUST_BOOTLOADER.

> Upstream, it's enabled for all architectures, because Linus thinks
> hanging at boot is a worse problem than an insufficiently initialized
> CRNG.  I'm not at all convinced that it's safe for all ARM and RISC-V
> CPU's.  On the other hand, I don't think it's going to be any worse
> than haveged (which I don't really trust on all architectures either),
> and it has the advantage of not requiring any additional userspace
> packages.

...Although this really isn't a ringing endorsement. :(

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Theodore Y. Ts'o
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> The buster arm64 images on Amazon EC2 appear to have insufficient
> entropy at boot, and thus take several minutes to complete the boot
> process.
> 
> There are a couple of potential fixes (or at least workarounds) for this
> problem, but none is entirely perfect.
>
> ...
> 
> I'm not aware of any other options.  Given the above, it seems that
> haveged is the only really feasible choice right now.  Does anyone
> disagree with that assessment?  Are there options I've missed?

I was under the impression that Amazon provided virtio-rng support for
its VM's?  Or does that not apply for their arm64 VM's?  If they do
support virtio-rng, it might just be an issue of building the cloud
kernel with that option enabled.

Another approach would be to cherry pick 50ee7529ec45 ("random: try to
actively add entropy rather than passively wait for it").  I'm pretty
confident that it's probably fine ("it's fine.  it's fine.  Really,
it's fine") for x86.  In particular, at least x86 has RDRAND, so even
if it's utterly predictable to someone who has detailed information
about the CPU's microarchitecture, it probably won't be a disaster.

Upstream, it's enabled for all architectures, because Linus thinks
hanging at boot is a worse problem than an insufficiently initialized
CRNG.  I'm not at all convinced that it's safe for all ARM and RISC-V
CPU's.  On the other hand, I don't think it's going to be any worse
than haveged (which I don't really trust on all architectures either),
and it has the advantage of not requiring any additional userspace
packages.
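To make the idea behind 50ee7529ec45 concrete: the patch repeatedly samples the CPU cycle counter while a timer fires, mixing each sample into the input pool, on the bet that execution-time jitter is unpredictable. The Python toy below imitates that shape in userspace with `time.perf_counter_ns()` and SHA-256. It is an illustration of the concept only, not the kernel's algorithm, and its output must not be used as key material:

```python
import hashlib
import time


def toy_jitter_seed(samples: int = 4096) -> bytes:
    """Hash the low byte of timing deltas from a tight loop (toy only).

    Nothing here estimates how much real entropy the deltas contain,
    which is precisely the open question on some ARM and RISC-V CPUs.
    """
    h = hashlib.sha256()
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        # Keep only the least significant byte of each delta, where any
        # jitter from caches, interrupts, and scheduling would show up.
        h.update(((now - prev) & 0xFF).to_bytes(1, "little"))
        prev = now
    return h.digest()


if __name__ == "__main__":
    print(toy_jitter_seed().hex())
```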

- Ted



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Thomas Lange
JFTR

https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html

-- 
regards Thomas



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Thomas Lange
> On Wed, 8 Jan 2020 16:40:33 -0500, Noah Meyerhans  said:

> To be clear, the problem isn't a failure to boot, but rather a several
> minute pause during boot.
For me, such a delay is a kind of failure. And as Daniel wrote in his
blog "unverified entropy is better than no entropy."

-- 
regards Thomas



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Noah Meyerhans
On Wed, Jan 08, 2020 at 09:24:25PM +, Jeremy Stanley wrote:
> > I've seen reactions like this, but never an explanation.  Has anyone
> > written up the issues?  Given that "fail to boot" isn't a workable
> > outcome, it'd be useful to know exactly what risks one accepts when
> > using haveged.
> 
> While you're at it, defining "fail to boot" would be nice. Just
> because sshd won't start, it doesn't necessarily mean the machine
> isn't "booted" in some sense, only that maybe you can't log into it
> (substitute httpd and inability to browse the Web sites served from
> it, or whatever you prefer).

To be clear, the problem isn't a failure to boot, but rather a several
minute pause during boot.  In the default images, the pause occurs
during ssh host key generation, but it's possible that other services
would be impacted in actual production scenarios, particularly since
user-provided cloud-config would not be processed until after the
config-ssh module completes.
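The condition behind this pause can be probed without blocking: getrandom(2) with GRND_NONBLOCK fails with EAGAIN while the kernel CRNG is still uninitialized, which is the state a key generator reading from the kernel RNG typically waits on. A minimal Linux-only sketch (Python 3.6+; the function name `crng_ready` is hypothetical):

```python
import os


def crng_ready() -> bool:
    """Return True if the kernel CRNG is initialized.

    os.getrandom with GRND_NONBLOCK raises BlockingIOError (EAGAIN)
    instead of blocking while the CRNG is not yet initialized.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


if __name__ == "__main__":
    print("CRNG ready:", crng_ready())
```

Run early in boot (for example from a oneshot unit ordered before cloud-init), this distinguishes "waiting for entropy" from other causes of a slow config-ssh stage.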

For reference, here's the "systemd-analyze blame" and "cloud-init
analyze blame" output showing the delay:

admin@ip-10-0-1-42:~$ systemd-analyze blame
2min 27.763s cloud-init.service
 26.080s cloud-final.service
  2.774s networking.service
  2.065s cloud-init-local.service
  1.554s cloud-config.service
  ...

admin@ip-10-0-1-42:~$ cloud-init analyze blame
-- Boot Record 01 --
 25.26800s (modules-final/config-scripts-user)
 145.79700s (init-network/config-ssh)
 00.62600s (modules-config/config-grub-dpkg)
 00.49900s (init-local/search-Ec2Local)
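The raw listing above is not strictly in descending order, so when comparing boots it helps to sort the entries numerically. A minimal sketch that parses `cloud-init analyze blame` text (the function name `slowest_modules` is hypothetical, not part of cloud-init):

```python
import re
from typing import List, Tuple

# Matches lines like " 145.79700s (init-network/config-ssh)"
_BLAME_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+\((\S+)\)\s*$")


def slowest_modules(blame_text: str) -> List[Tuple[float, str]]:
    """Return (seconds, module) pairs sorted slowest first.

    Header lines such as "-- Boot Record 01 --" and elisions such as
    "..." are skipped because they do not match the pattern.
    """
    entries = []
    for line in blame_text.splitlines():
        m = _BLAME_LINE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return sorted(entries, reverse=True)


if __name__ == "__main__":
    sample = """-- Boot Record 01 --
 25.26800s (modules-final/config-scripts-user)
 145.79700s (init-network/config-ssh)
 00.62600s (modules-config/config-grub-dpkg)
 00.49900s (init-local/search-Ec2Local)
"""
    for seconds, module in slowest_modules(sample):
        print(f"{seconds:9.3f}s  {module}")
```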



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Noah Meyerhans
On Wed, Jan 08, 2020 at 12:50:04PM -0800, Ross Vandegrift wrote:
> I know of two other options:
> - pollinate
> - jitterentropy-rngd
> 
> pollinate downloads seeds remotely, which feels wrong - and itself may
> require random numbers.  I've never tried jitterentropy.

IMO these are roughly equivalent to haveged, in that they're userspace
accumulators of entropy that try to seed the kernel.  I think I prefer
haveged's approach, but I'm really not qualified to judge.
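Mechanically, "seeding the kernel" differs from simply writing to /dev/random: a plain write mixes bytes into the pool without crediting any entropy, whereas daemons such as haveged and jitterentropy-rngd use the RNDADDENTROPY ioctl, which also credits a caller-specified number of bits and requires CAP_SYS_ADMIN. The sketch below only builds the `struct rand_pool_info` payload and derives the ioctl number from the generic `_IOW` encoding used on x86 and arm64; it deliberately does not issue the ioctl, and the helper names are hypothetical:

```python
import struct

# Generic Linux ioctl encoding (asm-generic/ioctl.h), as used on x86 and arm64.
_IOC_WRITE = 1
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30


def _iow(type_char: str, nr: int, size: int) -> int:
    return ((_IOC_WRITE << _IOC_DIRSHIFT) | (size << _IOC_SIZESHIFT)
            | (ord(type_char) << _IOC_TYPESHIFT) | (nr << _IOC_NRSHIFT))


# linux/random.h: #define RNDADDENTROPY _IOW('R', 0x03, int [2])
RNDADDENTROPY = _iow("R", 0x03, struct.calcsize("ii"))


def rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }.

    entropy_count is in bits; buf_size is the buffer length in bytes.
    """
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("cannot credit more bits than the buffer holds")
    return struct.pack("ii", entropy_bits, len(data)) + data


if __name__ == "__main__":
    payload = rand_pool_info(b"\x00" * 32, 0)
    print(hex(RNDADDENTROPY), len(payload))
    # A real daemon would then call, with CAP_SYS_ADMIN:
    #   fcntl.ioctl(fd_of_dev_random, RNDADDENTROPY, payload)
```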



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Noah Meyerhans
On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> Every time I propose the use of haveged to resolve entropy starvation, I
> get reactions from crypto folks saying that it's not a valid solution.
> They invariably suggest that passing hardware RNG through to the VM is
> the appropriate choice.
> 
> The latest such reaction being from mjg59. See:
> https://twitter.com/mjg59/status/1181423056268349441
> https://twitter.com/LucaFilipozzi/status/1181426253636755457

Yeah, this is my understanding as well.  But it's not like the haveged
developers are clueless, either, and there's a decent amount of research
behind their approach.  I can't pretend to understand the details of it,
though.

Even if passing entropy from the host to the VM is the right approach,
it's not something we can take advantage of today, due to lack of
support both within EC2 and within Debian.  I'll follow up with the
kernel team to gauge their level of support for enabling
CONFIG_RANDOM_TRUST_BOOTLOADER (and backporting it to buster).
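You can check whether a given kernel was built with this option from its build configuration, which Debian kernels ship as /boot/config-<release>. Here is a minimal Python sketch. It assumes that file location and the usual Kconfig text format, where an option appears as "CONFIG_FOO=y" or as a "# CONFIG_FOO is not set" comment. The helper names are illustrative.

```python
import platform
from pathlib import Path


def kconfig_value(config_text, option):
    """Return the value of `option` ('y', 'm', a string, ...), or None
    if it is unset or absent from the config text."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Covers both "# CONFIG_FOO is not set" and a missing option.
    return None


def running_kernel_config():
    """Read /boot/config-$(uname -r); return None if it can't be read."""
    path = Path("/boot") / f"config-{platform.release()}"
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    text = running_kernel_config()
    if text is None:
        print("no readable /boot/config for the running kernel")
    else:
        value = kconfig_value(text, "CONFIG_RANDOM_TRUST_BOOTLOADER")
        print("CONFIG_RANDOM_TRUST_BOOTLOADER =", value or "not set")
```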

If the kernel team is supportive of the
EFI_RNG+CONFIG_RANDOM_TRUST_BOOTLOADER approach, would folks be in favor
of enabling haveged temporarily, until kernel support is available, or
is it better to avoid it completely?

noah



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Jeremy Stanley
On 2020-01-08 13:04:42 -0800 (-0800), Ross Vandegrift wrote:
> On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> > On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> > > We add haveged to the arm64 EC2 AMI.  This appears to work, and is
> > > something we can do today.  The debian-installer has previously used
> > > haveged to ensure reasonable entropy during installation, so there is
> > > some precedent for this.
> > 
> > Every time I propose the use of haveged to resolve entropy starvation, I
> > get reactions from crypto folks saying that it's not a valid solution.
> > They invariably suggest that passing hardware RNG through to the VM is
> > the appropriate choice.
> > 
> > The latest such reaction being from mjg59. See:
> > https://twitter.com/mjg59/status/1181423056268349441
> > https://twitter.com/LucaFilipozzi/status/1181426253636755457
> 
> I've seen reactions like this, but never an explanation.  Has anyone
> written up the issues?  Given that "fail to boot" isn't a workable
> outcome, it'd be useful to know exactly what risks one accepts when
> using haveged.

While you're at it, defining "fail to boot" would be nice. Just
because sshd won't start, it doesn't necessarily mean the machine
isn't "booted" in some sense, only that maybe you can't log into it
(substitute httpd and inability to browse the Web sites served from
it, or whatever you prefer).
-- 
Jeremy Stanley




Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Ross Vandegrift
On Wed, Jan 08, 2020 at 08:17:13PM +, Luca Filipozzi wrote:
> On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> > We add haveged to the arm64 EC2 AMI.  This appears to work, and is
> > something we can do today.  The debian-installer has previously used
> > haveged to ensure reasonable entropy during installation, so there is
> > some precedent for this.
> 
> Every time I propose the use of haveged to resolve entropy starvation, I
> get reactions from crypto folks saying that it's not a valid solution.
> They invariably suggest that passing hardware RNG through to the VM is
> the appropriate choice.
> 
> The latest such reaction being from mjg59. See:
> https://twitter.com/mjg59/status/1181423056268349441
> https://twitter.com/LucaFilipozzi/status/1181426253636755457

I've seen reactions like this, but never an explanation.  Has anyone
written up the issues?  Given that "fail to boot" isn't a workable
outcome, it'd be useful to know exactly what risks one accepts when
using haveged.

Ross



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Ross Vandegrift
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> Option 1:
> 
> We add haveged to the arm64 EC2 AMI.  This appears to work, and is
> something we can do today.  The debian-installer has previously used
> haveged to ensure reasonable entropy during installation, so there is
> some precedent for this.
> 
> Option 2:
> 
> There is a mechanism by which the VM host can pass entropy to the guest
> at boot time using the EFI_RNG protocol.  This won't require any
> additional software in our images, but it has a couple of other notable
> drawbacks:
[snip]
> I'm not aware of any other options.  Given the above, it seems that
> haveged is the only really feasible choice right now.  Does anyone
> disagree with that assessment?  Are there options I've missed?

I know of two other options:
- pollinate
- jitterentropy-rngd

pollinate downloads seeds remotely, which feels wrong - and itself may
require random numbers.  I've never tried jitterentropy.

Ross



Re: lack of boot-time entropy on arm64 ec2 instances

2020-01-08 Thread Luca Filipozzi
On Wed, Jan 08, 2020 at 02:48:13PM -0500, Noah Meyerhans wrote:
> We add haveged to the arm64 EC2 AMI.  This appears to work, and is
> something we can do today.  The debian-installer has previously used
> haveged to ensure reasonable entropy during installation, so there is
> some precedent for this.

Every time I propose the use of haveged to resolve entropy starvation, I
get reactions from crypto folks saying that it's not a valid solution.
They invariably suggest that passing hardware RNG through to the VM is
the appropriate choice.

The latest such reaction being from mjg59. See:
https://twitter.com/mjg59/status/1181423056268349441
https://twitter.com/LucaFilipozzi/status/1181426253636755457

-- 
Luca Filipozzi