Re: Handling of entropy during boot

2019-09-11 Thread Ben Hutchings
On Wed, 2019-09-11 at 15:52 -0400, Paul Thomas wrote:
> Hi All,
> 
> First off, I want to acknowledge what a great system Debian is; very nice work!
> 
> I know the issue with Entropy Starvation is understood, and I
> understand the security concern:
> https://wiki.debian.org/BoottimeEntropyStarvation
> https://daniel-lange.com/archives/152-hello-buster.html
> 
> However, I would just like to indicate how nuts this has been driving
> me. First, with a new Buster install it took me a little while just to
> figure out what was going on with sshd. I did install haveged, and
> this helps for general cases. But then I have corner cases, such as a
> read-only root filesystem, where haveged doesn't work.
> 
> I'm not using ancient hardware; I'm on a modern arm64 processor, but it
> is an embedded environment with no keyboard or mouse.

And no hardware RNG?

Ben.

-- 
Ben Hutchings
Unix is many things to many people,
but it's never been everything to anybody.






Re: Handling of entropy during boot

2019-09-11 Thread Paul Thomas
Hi All,

First off, I want to acknowledge what a great system Debian is; very nice work!

I know the issue with Entropy Starvation is understood, and I
understand the security concern:
https://wiki.debian.org/BoottimeEntropyStarvation
https://daniel-lange.com/archives/152-hello-buster.html

However, I would just like to indicate how nuts this has been driving
me. First, with a new Buster install it took me a little while just to
figure out what was going on with sshd. I did install haveged, and
this helps for general cases. But then I have corner cases, such as a
read-only root filesystem, where haveged doesn't work.

I'm not using ancient hardware; I'm on a modern arm64 processor, but it
is an embedded environment with no keyboard or mouse.

-Paul



Re: Handling of entropy during boot

2019-02-12 Thread Ben Hutchings
On Mon, 2019-01-21 at 21:46 +, Ben Hutchings wrote:
> On Mon, 2019-01-21 at 20:49 +, Andy Simpkins wrote:
> [...]
> > Should we add to or change the possible entropy sources?
> [...]
> 
> Yes, we should (by default) enable use of available hardware RNGs to
> produce entropy and if none is available then we should (by default)
> install one of the various software entropy gathering daemons.

In linux version 4.19.20-1 I've enabled CONFIG_RANDOM_TRUST_CPU.
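Upstream kernels with this option (4.19 onwards) also accept a random.trust_cpu= boot parameter (on/off) that overrides the compiled-in default, which is one way users who distrust the CPU source can disable it. A minimal Python sketch, assuming only the documented on/off spellings and that the last occurrence wins, of how the effective setting could be worked out; the function names are hypothetical, and on Debian the inputs would typically come from /boot/config-$(uname -r) and /proc/cmdline:

```python
from typing import Optional


def config_trusts_cpu(kconfig_text: str) -> bool:
    """True if the kernel .config enables CONFIG_RANDOM_TRUST_CPU."""
    return any(line.strip() == "CONFIG_RANDOM_TRUST_CPU=y"
               for line in kconfig_text.splitlines())


def cmdline_trust_cpu(cmdline: str) -> Optional[bool]:
    """Parse random.trust_cpu=on|off; None if absent or unrecognized.

    Assumption of this sketch: only "on"/"off" are handled, and the
    last occurrence on the command line wins.
    """
    value: Optional[bool] = None
    for token in cmdline.split():
        key, sep, arg = token.partition("=")
        if sep and key == "random.trust_cpu":
            if arg.lower() == "on":
                value = True
            elif arg.lower() == "off":
                value = False
    return value


def effective_trust_cpu(kconfig_text: str, cmdline: str) -> bool:
    """The boot parameter, if given, overrides the compiled-in default."""
    override = cmdline_trust_cpu(cmdline)
    return config_trusts_cpu(kconfig_text) if override is None else override
```

Even when trusted, the option only matters on CPUs that actually provide such an instruction (e.g. RDRAND on x86); this sketch does not check for that.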

Ben.

> We should also document this so that users that distrust certain
> entropy sources will know how to disable them.
> 
> Ben.
> 
-- 
Ben Hutchings
The world is coming to an end.  Please log off.





Re: Handling of entropy during boot

2019-01-21 Thread Ben Hutchings
On Mon, 2019-01-21 at 20:49 +, Andy Simpkins wrote:
[...]
> Should we add to or change the possible entropy sources?
[...]

Yes, we should (by default) enable use of available hardware RNGs to
produce entropy and if none is available then we should (by default)
install one of the various software entropy gathering daemons.

We should also document this so that users that distrust certain
entropy sources will know how to disable them.

Ben.

-- 
Ben Hutchings
Klipstein's 4th Law of Prototyping and Production:
   A fail-safe circuit will destroy others.






Re: Handling of entropy during boot

2019-01-21 Thread Andy Simpkins
Hi,

This thread seems to have gone quiet for some time.  Re-reading the
thread, I don't see any solutions being proposed that will truly suit
everyone.

If I have correctly understood the problem, we are seeing a change from a
more open and trusting software environment to one with more emphasis on
security that is also less trusting:
* More packages are requiring the use of the kernel's high quality
entropy pool (including aspects of the kernel itself)
* At the same time questions are being asked over how much we can trust
our entropy sources. There is no agreement on which sources we should
trust; this appears to be based upon cultural perspective rather than
on evidence.
* Different platforms may have different entropy sources available to
them (think desktops, mobile devices, headless servers, small IoT
devices & virtualised instances)

What does this mean for Buster?

Some services may take a long time to start.  I am not talking about a
few seconds here, but instead minutes or even hours.  I myself see sshd
timing out and being restarted by systemd several times before finally
starting some 7 min after the rest of the system on my ARM64 Mustang
platform.  I have seen reports of it taking literally several hours for all
services to start on some NAS boxes.

Unfortunately some services fail to start at all, while others are
terminated and restarted without limit.

In all the cases that I have seen, there is no mention that the service
failed to start because insufficient entropy was available.
This in itself is a bug, whatever your view on how to address the lack
of available entropy during start-up.

We should at the very least state the reason a service has not started.
I believe that systemd has the ability to start services only when a
given event has happened (e.g. wait for network).  Should we be asking
for a “wait for entropy pool > x bytes” condition before starting a given service?
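As a rough illustration of the kind of check asked about here, the Python sketch below polls the kernel's entropy estimate until it reaches a threshold or a timeout expires. Note that /proc/sys/kernel/random/entropy_avail reports bits, not bytes, and that getrandom() actually waits for the kernel CRNG to be initialized rather than for any particular entropy_avail value, so a threshold like this is only a heuristic. The function names and the polling approach are hypothetical and not part of systemd:

```python
import time
from typing import Callable

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_AVAIL) -> int:
    """Kernel's current entropy estimate for the input pool, in bits."""
    with open(path) as f:
        return int(f.read().strip())


def wait_for_entropy(threshold_bits: int, timeout_s: float,
                     read_avail: Callable[[], int] = read_entropy_avail,
                     poll_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until the estimate reaches threshold_bits.

    Returns True on success and False if timeout_s elapses first, so a
    caller (e.g. an ExecStartPre= helper) can log *why* it gave up.
    """
    deadline = clock() + timeout_s
    while True:
        if read_avail() >= threshold_bits:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A caller that exits non-zero with a message such as "insufficient entropy" on a False result would at least make the failure reason visible in the journal.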



Should we add to or change the possible entropy sources?

Increasing the number of different sources of entropy may well reduce
the time waiting for sufficient entropy, (although this is not an excuse
not to explain why a service has failed to start).

There has been some discussion about adding further possible entropy
sources, and whether or not each source should be enabled by default.
In general nobody appears to be arguing against having the ability to
use additional entropy sources; the only debate is over which should be
enabled by default within Debian.

This debate appears to boil down to ‘do I trust this source?’, and it is
accepted that this is very much dependent upon what the installation is
going to be used for AND your geo-political leanings.  e.g. you may well
trust an HRNG in an Intel device if you are American, but be less
inclined to trust one from China, and vice versa.

I don't think that we can OR SHOULD make a single decision for an
out-of-the-box experience that will be suitable for all users.
Perhaps instead we should consider a tool (to be included in d-i as well
as in the archive) that can present the different options and allow
the user to decide?

If this is the way we as a project decide to go, I would very much like
to be involved in this new package.  Such a tool is probably beyond my
ability to write, however I would be very happy to work on the design,
UI and testing.

Is this the right approach to take?

Best regards

Andy



Re: Handling of entropy during boot

2019-01-16 Thread Marco d'Itri
On Jan 16, Guido Günther  wrote:

> There's also jitterentropy-rngd which does the trick but I haven't
> looked at the security implications.
Nowadays rngd collects jitter entropy, so I would not use something 
else.
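Jitter-entropy collectors (jitterentropy-rngd, and the rngd internal source mentioned here) rest on the observation that executing the same code takes slightly different amounts of time because of caches, pipelines and interrupts, which is also why a high-resolution timer matters. The toy Python sketch below only illustrates that idea; it is an assumption-laden simplification and not the algorithm those tools implement (they add health tests and entropy estimation, which this lacks):

```python
import hashlib
import time
from typing import List


def timing_deltas(n: int, work: int = 200) -> List[int]:
    """Time the same small loop n times and return the durations in ns."""
    deltas = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        x = 0
        for i in range(work):  # identical work each time; duration still varies
            x ^= i * i
        deltas.append(time.perf_counter_ns() - t0)
    return deltas


def toy_jitter_bytes(n_samples: int = 4096) -> bytes:
    """Hash the low byte of each timing delta into a 32-byte digest.

    Toy only: there are no health tests and no entropy estimate, so the
    output must not be used to seed anything security-relevant.
    """
    h = hashlib.sha256()
    for d in timing_deltas(n_samples):
        h.update(bytes([d & 0xFF]))
    return h.digest()
```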

-- 
ciao,
Marco




Re: Handling of entropy during boot

2019-01-16 Thread Luca Boccassi
On Wed, 2019-01-16 at 11:05 +0100, Guido Günther wrote:
> Hi,
> On Mon, Jan 14, 2019 at 05:56:20PM +0100, W. Martin Borgert wrote:
> > Quoting Michael Stone :
> > > Unless the cpu supports rdrand/rdseed, installing rng-tools5 won't
> > > really change anything. If it does support those, it probably makes more
> > > sense going forward to just enable CONFIG_RANDOM_TRUST_CPU rather than
> > > installing another package.
> > 
> > This option is only available for some architectures (X86, S390, PPC)?
> > What about the others?
> 
> There's also jitterentropy-rngd which does the trick but I haven't
> looked at the security implications.
>  -- Guido

FWIW I've been using jitterentropy-rngd and rng-tools in production for
years, in Azure/VMWare/AWS x86 VMs, exactly for this problem. Haven't
been hacked so far... as far as I know :-)

-- 
Kind regards,
Luca Boccassi



Re: Handling of entropy during boot

2019-01-16 Thread Guido Günther
Hi,
On Mon, Jan 14, 2019 at 05:56:20PM +0100, W. Martin Borgert wrote:
> Quoting Michael Stone :
> > Unless the cpu supports rdrand/rdseed, installing rng-tools5 won't
> > really change anything. If it does support those, it probably makes more
> > sense going forward to just enable CONFIG_RANDOM_TRUST_CPU rather than
> > installing another package.
> 
> This option is only available for some architectures (X86, S390, PPC)?
> What about the others?

There's also jitterentropy-rngd which does the trick but I haven't
looked at the security implications.
 -- Guido



Re: Handling of entropy during boot

2019-01-15 Thread Anthony DeRobertis

On 1/14/19 7:07 AM, Thomas Goirand wrote:
> On 12/18/18 8:11 PM, Theodore Y. Ts'o wrote:
> > If you are firmly convinced that there is a good
> > chance that the NSA has suborned Intel in putting a backdoor into
> > RDRAND, you won't want to use that boot option.
> 
> I have read numerous times that some people trust this or that part of
> the instruction set, and I always found it silly. Why should some
> instruction or part of the Intel CPU be more trusted? To me, either you
> trust the entire CPU, or you just don't trust it at all and consider
> using other CPU brands. Am I wrong with this reasoning?


I think the idea behind that is that the rest of the CPU has defined, 
verifiable behaviors. If NSA makes 1+1 sometimes equal 3, then that's 
detectable. So it'd be a fairly risky attack, someone might notice it. 
It also risks that other countries' NSA-equivalents make use of the 
backdoor.


OTOH, the RNG is not verifiable. It's supposed to take two entropy 
sources and apply AES to them to combine them. But how do you know it 
actually did that? You can't tell what the input to AES was, at least as 
long as AES remains secure. It could well be giving you the equivalent 
of 1, 2, 3, 4, etc. encrypted with a key known only to NSA. And there is 
much less risk of another country taking advantage, as the numbers still 
look like fully secure CSPRNG output to everyone but NSA.


(Also, see Dual_EC_DRBG)
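To make the unverifiability argument concrete: a "random" stream that is just a keyed cipher applied to a counter passes naive statistical tests, yet is completely predictable to whoever holds the key. The Python sketch below is a hypothetical illustration, not a model of how RDRAND actually works; HMAC-SHA256 stands in for AES because Python's standard library has no AES, and the argument holds for any secure keyed function:

```python
import hashlib
import hmac


def backdoored_rng(key: bytes, start: int, n_blocks: int) -> bytes:
    """Hypothetical 'RNG' that is just a keyed PRF over a counter."""
    out = bytearray()
    for counter in range(start, start + n_blocks):
        out += hmac.new(key, counter.to_bytes(16, "big"), hashlib.sha256).digest()
    return bytes(out)


def ones_fraction(data: bytes) -> float:
    """Fraction of 1 bits: a naive statistical test an outsider might run."""
    ones = sum(bin(b).count("1") for b in data)
    return ones / (8 * len(data))


SECRET = b"known only to the attacker"
stream = backdoored_rng(SECRET, start=1000, n_blocks=64)

# To an outsider the output looks balanced, like any good CSPRNG ...
print(f"fraction of 1 bits: {ones_fraction(stream):.3f}")

# ... but the key holder can reproduce every byte once the counter is known.
predicted = backdoored_rng(SECRET, start=1000, n_blocks=64)
print("key holder predicts stream:", predicted == stream)
```

Without the key, no test on the output alone can distinguish this from honest hardware, which is the asymmetry with ordinary instructions like ADD described above.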



Re: Handling of entropy during boot

2019-01-14 Thread Michael Stone
On January 14, 2019 11:56:20 AM EST, "W. Martin Borgert"  wrote:
>Quoting Michael Stone :
>> Unless the cpu supports rdrand/rdseed, installing rng-tools5 won't  
>> really change anything. If it does support those, it probably makes  
>> more sense going forward to just enable CONFIG_RANDOM_TRUST_CPU  
>> rather than installing another package.
>
>This option is only available for some architectures (X86, S390, PPC)?
>What about the others?

I'm not aware of a good general solution for them.
-- 
Michael Stone
(From phone, please excuse typos)



Re: Handling of entropy during boot

2019-01-14 Thread W. Martin Borgert

Quoting Michael Stone :
> Unless the cpu supports rdrand/rdseed, installing rng-tools5 won't
> really change anything. If it does support those, it probably makes
> more sense going forward to just enable CONFIG_RANDOM_TRUST_CPU
> rather than installing another package.


This option is only available for some architectures (X86, S390, PPC)?
What about the others?



Re: Re: Handling of entropy during boot

2019-01-14 Thread Alexander E. Patrakov

Sam Hartman wrote:
> "Marco" == Marco d'Itri  writes:
> 
> Marco> online.  Is it enough to feed the host side of virtio-rng
> Marco> with /dev/random or should everybody who has virtual machines
> Marco> also install rngd in the host?  Is rngd to be preferred to
> Marco> haveged?
> 
> I'd also like to point out that virtio-rng is only a solution for KVM.
> I recently discovered that VMware appears to have no virtual RNG
> available to the guest at all.
> 
> A buster VMware guest will boot but will be unable to start sshd because
> of lack of entropy for typically five minutes or so.
> A lot of stuff breaks in that configuration.
> virtio-rng doesn't help at all.
> 
> You can claim that VMware is broken all you want, but a lot of people use
> it, and we really should produce an operating system that you can ssh
> into when you boot a bunch of instances in a virtual environment.


Another data point: there exist high-profile KVM-based cloud providers 
that don't give their customers a virtio RNG device in the guest. One 
particular example is AliYun, also known as Alibaba Cloud. Note that in 
some locations they provide Xen, not KVM, instances, so try Shanghai if 
you want to confirm my statement.


--
Alexander E. Patrakov





Re: Handling of entropy during boot

2019-01-14 Thread Michael Stone

On Mon, Jan 14, 2019 at 12:55:09PM +0100, Marco d'Itri wrote:
> Agreed. I think that d-i should install rngd (or haveged? And why?) if
> it detects a virtualized environment without virtio-rng.


Unless the cpu supports rdrand/rdseed, installing rng-tools5 won't 
really change anything. If it does support those, it probably makes more 
sense going forward to just enable CONFIG_RANDOM_TRUST_CPU rather than 
installing another package.
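Whether rngd or CONFIG_RANDOM_TRUST_CPU can help at all therefore depends on the CPU advertising RDRAND/RDSEED, and in a VM on the hypervisor passing those CPUID bits through. A minimal Python sketch of that check, assuming x86-style "flags" lines in /proc/cpuinfo (arm64 reports its features differently, which this sketch ignores); the function names are hypothetical:

```python
from typing import Set


def cpu_rng_flags(cpuinfo_text: str) -> Set[str]:
    """Return which of rdrand/rdseed appear on the x86 'flags' lines."""
    wanted = {"rdrand", "rdseed"}
    found: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= wanted & set(value.split())
    return found


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    with open(path) as f:
        return f.read()
```

On a running system, cpu_rng_flags(read_cpuinfo()) returning an empty set suggests that neither rng-tools5 nor the kernel option will change anything there.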


As far as haveged, it's not clear how much better that is than the old 
practice of having rngd read from /dev/urandom.


Mike Stone



Re: Handling of entropy during boot

2019-01-14 Thread Thomas Goirand
On 12/18/18 8:11 PM, Theodore Y. Ts'o wrote:
> If you are firmly convinced that there is a good
> chance that the NSA has suborned Intel in putting a backdoor into
> RDRAND, you won't want to use that boot option.

I have read numerous times that some people trust this or that part of
the instruction set, and I always found it silly. Why should some
instruction or part of the Intel CPU be more trusted? To me, either you
trust the entire CPU, or you just don't trust it at all and consider
using other CPU brands. Am I wrong with this reasoning?

Cheers,

Thomas Goirand (zigo)



Re: Handling of entropy during boot

2019-01-14 Thread Marco d'Itri
On Jan 13, Sam Hartman  wrote:

> I recently discovered that VMware appears to have no virtual RNG
> available to the guest at all.
AFAIK you are right.

> A buster VMware guest will boot but will be unable to start sshd because
> of lack of entropy for typically five minutes or so.
> A lot of stuff breaks in that configuration.
> virtio-rng doesn't help at all.
> 
> You can claim that VMware is broken all you want, but a lot of people use
> it, and we really should produce an operating system that you can ssh
> into when you boot a bunch of instances in a virtual environment.
Agreed. I think that d-i should install rngd (or haveged? And why?) if 
it detects a virtualized environment without virtio-rng.
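A minimal Python sketch of the detection heuristic proposed here. It assumes that `systemd-detect-virt --vm` names the hypervisor (printing "none" on bare metal), that /sys/class/misc/hw_random/rng_available lists registered hardware RNGs, and that virtio-rng devices register under names beginning with "virtio_rng"; the policy function is hypothetical and not existing d-i code:

```python
import subprocess

RNG_AVAILABLE = "/sys/class/misc/hw_random/rng_available"


def detect_vm() -> str:
    """Hypervisor name from systemd-detect-virt --vm ('none' on bare metal)."""
    result = subprocess.run(["systemd-detect-virt", "--vm"],
                            capture_output=True, text=True)
    return result.stdout.strip() or "none"


def read_rng_available(path: str = RNG_AVAILABLE) -> str:
    """Space-separated hwrng device names, or '' if the file is missing."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return ""


def needs_software_rng(vm: str, rng_available: str) -> bool:
    """Hypothetical installer policy: a VM with no virtio-rng device."""
    if vm in ("", "none"):
        return False
    return not any(name.startswith("virtio_rng")
                   for name in rng_available.split())
```

On a live system the decision would be needs_software_rng(detect_vm(), read_rng_available()); a VMware guest with no hwrng registered would return True.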

-- 
ciao,
Marco




Re: Handling of entropy during boot

2019-01-13 Thread Sam Hartman
> "Marco" == Marco d'Itri  writes:

Marco> online.  Is it enough to feed the host side of virtio-rng
Marco> with /dev/random or should everybody who has virtual machines
Marco> also install rngd in the host?  Is rngd to be preferred to
Marco> haveged?

I'd also like to point out that virtio-rng is only a solution for KVM.
I recently discovered that VMware appears to have no virtual RNG
available to the guest at all.

A buster VMware guest will boot but will be unable to start sshd because
of lack of entropy for typically five minutes or so.
A lot of stuff breaks in that configuration.
virtio-rng doesn't help at all.

You can claim that VMware is broken all you want, but a lot of people use
it, and we really should produce an operating system that you can ssh
into when you boot a bunch of instances in a virtual environment.

--Sam



Re: Handling of entropy during boot

2019-01-13 Thread Marco d'Itri
On Jan 09, "Theodore Y. Ts'o"  wrote:

> x86 systems have a high resolution timer; Raspberry Pis don't.
> Furthermore, if libvirt is misconfigured, it should just be fixed (and
> better yet, it should be configured to enable virtio-rng, which is
> *not* hard).
Can you clarify what is the best practice here? I am finding a lot of 
conflicting and often obviously clueless advice online.
Is it enough to feed the host side of virtio-rng with /dev/random or 
should everybody who has virtual machines also install rngd in the host?
Is rngd to be preferred to haveged?

Data points: none of my current virtualization hosts (very new HPE 
Gen10 and Cisco UCS M5 blades) have a hardware RNG available to the 
kernel, at least with RHEL 7.
When rngd is installed it reports RDRAND and jitter entropy (the rngd
internal source, not the kernel module) to be available.

-- 
ciao,
Marco




Re: Handling of entropy during boot

2019-01-10 Thread Michael Stone

On Thu, Jan 10, 2019 at 03:57:00PM +0100, Michael Biebl wrote:
> with possible solutions like installing haveged


It still isn't clear to me that this is actually secure, so I'm not sure 
we should be telling people to do it in release notes.


Mike Stone



Re: Handling of entropy during boot

2019-01-10 Thread Stefan Fritsch
On Thu, 10 Jan 2019, Michael Biebl wrote:

> Am 10.01.19 um 15:51 schrieb Stefan Fritsch:
> > On Thu, 10 Jan 2019, Michael Biebl wrote:
> >>> ACK, we also had to do the same in Grml[.org] and our latest release
> >>> (2018.12). Now we automatically enable haveged when users boot using
> >>> the ssh boot option (which is something Grml specific, taking care
> >>> of setting user password and invoking the ssh service).
> >>
> >> And this is a perfect example why crediting the seed file (#914297) is
> >> not a solution to this problem.
> > 
> > While I still think this case should be handled by documentation, let's 
> > try to find a way forward that we can agree upon.
> > 
> > I think the absolute minimum we need is something that prints a big fat 
> > warning during boot if the RNG is not yet initialized, points out that 
> > further services may block and that the admin should add entropy sources 
> > like virtio-rng or rdrand. The time when this warning should be printed 
> > should probably be before network is started, because if the admin has 
> > configured vpn services in /etc/network/interfaces, those will already 
> > block because of lack of entropy.
> > 
> > A second thing we need is a service that finishes when the RNG is 
> > initialized and that has a suitably large timeout for starting (maybe one 
> > day?). Services that need randomness can then depend on that service and 
> > don't need to set their own timeout to huge values. Also it is a lot 
> > easier to see what's wrong if the "wait for RNG" service is blocking than 
> > if some random network service is blocking.
> > 
> > More things should be done but maybe we can figure those out while we 
> > implement the above two things. Can we agree on this?
> > 
> 
> I'd prefer having this documented in the release notes:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916690
> with possible solutions like installing haveged, configuring virtio-rng,
> etc. depending on the situation.

That would be an extremely user-unfriendly "solution" and would lead to 
countless hours of debugging and useless bug reports.



Re: Handling of entropy during boot

2019-01-10 Thread Michael Biebl
Am 10.01.19 um 15:51 schrieb Stefan Fritsch:
> On Thu, 10 Jan 2019, Michael Biebl wrote:
>>> ACK, we also had to do the same in Grml[.org] and our latest release
>>> (2018.12). Now we automatically enable haveged when users boot using
>>> the ssh boot option (which is something Grml specific, taking care
>>> of setting user password and invoking the ssh service).
>>
>> And this is a perfect example why crediting the seed file (#914297) is
>> not a solution to this problem.
> 
> While I still think this case should be handled by documentation, let's 
> try to find a way forward that we can agree upon.
> 
> I think the absolute minimum we need is something that prints a big fat 
> warning during boot if the RNG is not yet initialized, points out that 
> further services may block and that the admin should add entropy sources 
> like virtio-rng or rdrand. The time when this warning should be printed 
> should probably be before network is started, because if the admin has 
> configured vpn services in /etc/network/interfaces, those will already 
> block because of lack of entropy.
> 
> A second thing we need is a service that finishes when the RNG is 
> initialized and that has a suitably large timeout for starting (maybe one 
> day?). Services that need randomness can then depend on that service and 
> don't need to set their own timeout to huge values. Also it is a lot 
> easier to see what's wrong if the "wait for RNG" service is blocking than 
> if some random network service is blocking.
> 
> More things should be done but maybe we can figure those out while we 
> implement the above two things. Can we agree on this?
> 

I'd prefer having this documented in the release notes:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=916690
with possible solutions like installing haveged, configuring virtio-rng,
etc. depending on the situation.

Michael

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Re: Handling of entropy during boot

2019-01-10 Thread Stefan Fritsch
On Thu, 10 Jan 2019, Michael Biebl wrote:
> > ACK, we also had to do the same in Grml[.org] and our latest release
> > (2018.12). Now we automatically enable haveged when users boot using
> > the ssh boot option (which is something Grml specific, taking care
> > of setting user password and invoking the ssh service).
> 
> And this is a perfect example why crediting the seed file (#914297) is
> not a solution to this problem.

While I still think this case should be handled by documentation, let's 
try to find a way forward that we can agree upon.

I think the absolute minimum we need is something that prints a big fat 
warning during boot if the RNG is not yet initialized, points out that 
further services may block and that the admin should add entropy sources 
like virtio-rng or rdrand. The time when this warning should be printed 
should probably be before network is started, because if the admin has 
configured vpn services in /etc/network/interfaces, those will already 
block because of lack of entropy.

A second thing we need is a service that finishes when the RNG is 
initialized and that has a suitably large timeout for starting (maybe one 
day?). Services that need randomness can then depend on that service and 
don't need to set their own timeout to huge values. Also it is a lot 
easier to see what's wrong if the "wait for RNG" service is blocking than 
if some random network service is blocking.

More things should be done but maybe we can figure those out while we 
implement the above two things. Can we agree on this?
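The two pieces proposed above (a boot-time warning, and a "wait for RNG" step with a very large timeout) could look roughly like the Python sketch below. It relies on getrandom() with GRND_NONBLOCK failing with EAGAIN (BlockingIOError in Python) while the kernel RNG is uninitialized, and it polls rather than blocks so the timeout is easy to enforce. The names, the warning text and the one-day default are assumptions; a hypothetical wait-for-rng.service with Type=oneshot and a large TimeoutStartSec= could run it, with services that need randomness ordered After= it:

```python
import os
import sys
import time
from typing import Callable, Optional

# Linux value of GRND_NONBLOCK; os exposes it on Linux builds of Python.
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)


def crng_ready(getrandom: Optional[Callable[[int, int], bytes]] = None) -> bool:
    """True once getrandom() would return without blocking."""
    fn = getrandom or os.getrandom
    try:
        fn(1, GRND_NONBLOCK)
        return True
    except BlockingIOError:  # EAGAIN: RNG not initialized yet
        return False


def wait_for_rng(timeout_s: float = 86400.0, poll_s: float = 1.0,
                 ready: Callable[[], bool] = crng_ready,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Warn once if the RNG is not ready, then wait up to timeout_s."""
    if ready():
        return True
    print("WARNING: kernel RNG not initialized; services that need randomness "
          "(sshd, TLS, VPNs) may block. Consider adding an entropy source such "
          "as virtio-rng or RDRAND.", file=sys.stderr)
    deadline = clock() + timeout_s
    while not ready():
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True
```

Keeping the long wait in one place means only this step shows up as "blocking" at boot, rather than some random network service.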


Now, in which packages should those services be shipped? Should they be 
part of the individual init system packages, or go into some central 
package like initscripts? Any opinions?



Re: Handling of entropy during boot

2019-01-10 Thread Stefan Fritsch
On Wed, 9 Jan 2019, Theodore Y. Ts'o wrote:

> On Wed, Jan 09, 2019 at 09:58:22AM +0100, Stefan Fritsch wrote:
> > 
> > There have been a number of bug reports and blog posts about this, despite 
> > buster not being released yet. So it's not that uncommon.
> 
> Pointers, please?  Let's see them and investigate.  The primary issue
> I've been aware of to date has been on Fedora systems, and it's due to
> some Red Hat specific changes that they made for FEDRAMP compliance
> --- and Red Hat has dealt with those issues.
> 
> If there are problems for people using Debian Testing, we should
> investigate them and understand what is going on.

Some other people already have sent you a few pointers (thanks!). The 
reason why I am looking into this is that it affects apache2 (see 
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914297 ). Apache does 
not call getrandom itself but libssl does, and it definitely needs secure 
randomness for Diffie-Hellman. So there is nothing that can or should be 
fixed in apache.

More links are at the end of 
https://lists.debian.org/debian-devel/2018/12/msg00184.html

Also, the thread on debian-kernel pointed to by Ben Hutchings is an 
interesting read, I had not noticed that before.


> > No, that's utterly wrong. If it's a hassle to use good entropy, people 
> > will use gettimeofday() for getting "entropy" and they will use it for 
> > security relevant purposes. In this way, you would achieve exactly the 
> > opposite of what you want.
> 
> If *users* do this, then if they end up releasing credit card numbers
> or PII or violate their customers' privacy which brings the EU's GDPR
> enforcers down on them, it's on *their* heads.  If *Debian* makes a
> local Debian-specific change which causes these really bad outcomes,
> then it's on *ours*.

Since many users and developers will take the shortest path to a "working" 
service, we must make sure that the secure way just works.
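To illustrate why clock-derived "entropy" is dangerous for security purposes: if a token comes from a PRNG seeded with the current time, an attacker who can guess roughly when it was made can simply try every plausible seed. The Python sketch below uses the non-cryptographic random module and a seconds-resolution seed as a simplifying assumption; the issue time is hypothetical:

```python
import random


def weak_token(seed: int) -> str:
    """The 'shortest path': a 128-bit token from a clock-seeded PRNG."""
    rng = random.Random(seed)
    return "%032x" % rng.getrandbits(128)


def recover_seed(token: str, guess: int, window: int = 3600):
    """Attacker: try every second within +/- window of a guessed time."""
    for seed in range(guess - window, guess + window + 1):
        if weak_token(seed) == token:
            return seed
    return None


issued_at = 1547100000          # hypothetical issue time (Unix seconds)
token = weak_token(issued_at)
print(recover_seed(token, guess=issued_at + 120) == issued_at)
```

The attacker needs only a few thousand guesses. The secure path is to draw from the kernel, e.g. secrets.token_hex(16), which is exactly why that path has to "just work" at boot.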

> > Any program that does secure network connections needs entropy for 
> > Diffie-Hellman. And even seeds for hash buckets can be security relevant. 
> > You really don't want that people need to distinguish between 
> > security-critical and stupid uses of entropy, because they WILL get it 
> > wrong.
> 
> Sure, this is why developers need to investigate the bugs.  You said
> you provided links, but I couldn't find any in your e-mail messages or
> earlier ones on this thread.  Perhaps I missed them; in which case, my
> apologies.   Can you please send/resend those links?
> 
> Can you please prioritize reports from people running Debian Unstable
> or Debian Testing?  As I said above, these issues tend to be very
> distro specific, especially when distros are messing around with
> crypto-related libraries in order to keep the US Government happy.

As far as I can see, all reports are from unstable/testing only, because 
stable does not cause getrandom() to block (see 
https://lists.debian.org/debian-release/2018/05/msg00130.html ).



Re: Handling of entropy during boot

2019-01-10 Thread Michael Biebl
Am 10.01.19 um 14:23 schrieb Michael Prokop:
> * Raphael Hertzog [Thu Jan 10, 2019 at 12:24:45PM +0100]:
>> On Wed, 09 Jan 2019, Theodore Y. Ts'o wrote:
> 
>>> Pointers, please?  Let's see them and investigate.  The primary issue
>>> I've been aware of to date has been on Fedora systems, and it's due to
>>> some Red Hat specific changes that they made for FEDRAMP compliance
>>> --- and Red Hat has dealt with those issues.
> 
>> In Kali I had to install haveged by default due to this problem.
>> We got reports of users having to wait up to 5 minutes to get to their desktop.
>> We got reports of sshd not working on first boot (in fact just taking too
>> long to start).
> 
> ACK, we also had to do the same in Grml[.org] and our latest release
> (2018.12). Now we automatically enable haveged when users boot using
> the ssh boot option (which is something Grml specific, taking care
> of setting user password and invoking the ssh service).

And this is a perfect example why crediting the seed file (#914297) is
not a solution to this problem.


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Re: Handling of entropy during boot

2019-01-10 Thread Michael Prokop
* Raphael Hertzog [Thu Jan 10, 2019 at 12:24:45PM +0100]:
> On Wed, 09 Jan 2019, Theodore Y. Ts'o wrote:

> > Pointers, please?  Let's see them and investigate.  The primary issue
> > I've been aware of to date has been on Fedora systems, and it's due to
> > some Red Hat specific changes that they made for FEDRAMP compliance
> > --- and Red Hat has dealt with those issues.

> In Kali I had to install haveged by default due to this problem.
> We got reports of users having to wait up to 5 minutes to get to their desktop.
> We got reports of sshd not working on first boot (in fact just taking too
> long to start).

ACK, we also had to do the same in Grml[.org] and our latest release
(2018.12). Now we automatically enable haveged when users boot using
the ssh boot option (which is something Grml specific, taking care
of setting user password and invoking the ssh service).

We saw exactly what Daniel documented at
https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html

regards,
-mika-
-- 
https://michael-prokop.at/  || https://adminzen.org/
https://grml-solutions.com/ || https://grml.org/




Re: Handling of entropy during boot

2019-01-10 Thread Raphael Hertzog
Hi,

On Wed, 09 Jan 2019, Theodore Y. Ts'o wrote:
> Pointers, please?  Let's see them and investigate.  The primary issue
> I've been aware of to date has been on Fedora systems, and it's due to
> some Red Hat specific changes that they made for FEDRAMP compliance
> --- and Red Hat has dealt with those issues.

In Kali I had to install haveged by default due to this problem.
We got reports of users having to wait up to 5 minutes to get to their desktop.
We got reports of sshd not working on first boot (in fact just taking too
long to start).

https://bugs.kali.org/view.php?id=5124
https://bugs.kali.org/view.php?id=4994
https://bugs.kali.org/view.php?id=5011

I haven't looked, but it seems likely that thin.service is trying to
generate some keys on initial startup, which explains why it gets stalled.

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Support Debian LTS: https://www.freexian.com/services/debian-lts.html
Learn to master Debian: https://debian-handbook.info/get/



Re: Handling of entropy during boot

2019-01-09 Thread Theodore Y. Ts'o
On Tue, Jan 08, 2019 at 10:41:55AM +0100, Stefan Fritsch wrote:
> 
> If the security issue only affects a small percentage of the installations 
> and fixing it means breaking many other installations, then there has to 
> be a discussion about whether we really want to fix the issue or whether a "don't do that" 
> documentation is the better choice.

One of the questions which needs to be answered is exactly how many
installations are actually broken.  I don't think it's going to be as bad
as you suspect.

> Raspberry Pis were only an example. There are also other systems, including 
> old x86 systems, that don't have a HWRNG. Also, there are probably a load 
> of x86 VMs that emulate an older CPU due to libvirt misconfiguration and 
> don't expose the rdrand cpuid bit.

x86 systems have a high resolution timer; Raspberry Pis don't.
Furthermore, if libvirt is misconfigured, it should just be fixed (and
better yet, it should be configured to enable virtio-rng, which is
*not* hard).

> Systems that don't suffer from blocking on entropy because they have other 
> sources of entropy (hwrng, ...) won't have their security reduced because 
> the good entropy will still be added to the pool, regardless of the seed 
> file being credited or not.

The question is how long they have to block.  *Very* unfortunately,
there's a lot of busted software that tries to generate
security-critical keys when the system is first booted, which is when
the least entropy is available.  Such packages include ssh
and various packages which call openssl (such as CUPS) which are
visible to the internet.

And if the system doesn't have good sources of entropy, and doesn't have
sufficient interrupts to initialize the entropy pool, the question is
what should we do?  Should we just blindly proceed and let them
generate insecure keys?  At least, if the system blocks, they'll know
something is wrong, and they can fix the problem (for example, by
*fixing* their libvirt configuration).

Ultimately, I don't think it's a big problem, primarily because I'm
not hearing a lot of yelling from Debian users.  It may be annoying
for your Raspberry Pi system, but the question is whether that is a
common case or an isolated case.

> So, how could we go forward from here. Maybe we could limit the wait for 
> entropy to some reasonable value (1 minute? 5 minutes?). This could be 
> done by creating a program that does a blocking getrandom but with a 
> timeout. If the timeout expires and the seed file has been read 
> successfully before, it would then credit the read entropy. This program 
> would be added as a systemd unit so that services that need entropy can
> depend on it and don't get killed with a timeout. Is this a reasonable 
> approach? Or do you (or anyone else) have any better suggestions?
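The helper Stefan proposes here (wait for getrandom() with a timeout, then credit the seed only if it was read) might look roughly like the following. This is a minimal Python sketch under stated assumptions, not a Debian or systemd tool: the function names are hypothetical, it polls getrandom() with GRND_NONBLOCK instead of doing a blocking call with a real timeout, and the crediting step itself (which needs root) is not shown.

```python
import os
import time
from typing import Callable


def crng_ready() -> bool:
    """Return True if the kernel CRNG is initialized (Linux, Python 3.6+).

    getrandom(2) with GRND_NONBLOCK fails with EAGAIN while the pool is
    still uninitialized; Python raises that as BlockingIOError.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def wait_for_entropy(timeout: float,
                     probe: Callable[[], bool] = crng_ready,
                     poll_interval: float = 0.5) -> bool:
    """Poll `probe` until it reports ready or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def should_credit_seed(ready: bool, seed_was_loaded: bool) -> bool:
    """Hypothetical policy from the proposal: credit the saved seed only if
    the wait timed out AND the seed file was read earlier in this boot."""
    return (not ready) and seed_was_loaded
```

The probe is injectable so the policy can be exercised without an uninitialized pool; Ted's reply below argues against crediting at all when the waiting service is security-critical.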

My suggestion is to try and figure out *what* is blocking, and *why*.  If
it's because it's something security-critical, such as generating ssh
keys, letting things continue even though we don't have secure entropy
is a bad, bad, BAD idea.  If it's for something stupid, like
generating seeds for Python dictionaries (just as an example; that one
has been fixed) then the application should be fixed not to request
secure randomness in the first place.

That's the correct fix, as opposed to a shortcut that might leave us
in a worse place, from a security perspective.

- Ted



Re: Handling of entropy during boot

2019-01-09 Thread Ben Hutchings
On Wed, 2019-01-09 at 11:40 -0500, Theodore Y. Ts'o wrote:
> On Wed, Jan 09, 2019 at 09:58:22AM +0100, Stefan Fritsch wrote:
[...]
> > No, that's utterly wrong. If it's a hassle to use good entropy, people 
> > will use gettimeofday() for getting "entropy" and they will use it for 
> > security relevant purposes. In this way, you would achieve exactly the 
> > opposite of what you want.
> 
> If *users* do this, then if they end up releasing credit card numbers
> or PII or violate their customers privacy which brings the EU's GDPR
> enforcers down on them, it's on *their* heads.  If *Debian* makes a
> local Debian-specific change which causes these really bad outcomes,
> then it's on *ours*.
> 
> We tried this ten years ago, when well-meaning Debian
> Developers tried to "fix" OpenSSL's random number library, and it
> turned out to be a disaster[1].  So let's be careful not to replicate
> past mistakes, eh?

It's a bit late for that:
https://lists.debian.org/debian-release/2018/05/msg00130.html

[...]
> Sure, this is why developers need to investigate the bugs.  You said
> you provided links, but I couldn't find any in your e-mail messages or
> earlier ones on this thread.  Perhaps I missed them; in which case, my
> apologies.   Can you please send/resend those links?
[...]

I sent you a bunch of bug links in message
 in
August.

Ben.

-- 
Ben Hutchings
Every program is either trivial or else contains at least one bug





Re: Handling of entropy during boot

2019-01-09 Thread Matt Zagrabelny
On Wed, Jan 9, 2019 at 12:13 PM Theodore Y. Ts'o  wrote:

> On Wed, Jan 09, 2019 at 09:58:22AM +0100, Stefan Fritsch wrote:
> >
> > There have been a number of bug reports and blog posts about this, despite
> > buster not being released yet. So it's not that uncommon.
>
> Pointers, please?  Let's see them and investigate.


https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912616

There's lots of chatter in the systemd GitHub issues, too.

I've been bitten by both ssh taking forever and puppet timing out on VMs.
I'll need to investigate about virtio-rng.

I've got an embedded x86-64 system where lightdm starts quickly when I am
plugged into an ethernet connection, but takes about 8 minutes when the
ethernet is disconnected. I am very suspicious of the low entropy in this
case, too.

-m


Re: Handling of entropy during boot

2019-01-09 Thread Theodore Y. Ts'o
On Wed, Jan 09, 2019 at 09:58:22AM +0100, Stefan Fritsch wrote:
> 
> There have been a number of bug reports and blog posts about this, despite 
> buster not being released yet. So it's not that uncommon.

Pointers, please?  Let's see them and investigate.  The primary issue
I've been aware of to date has been on Fedora systems, and it's due to
some Red Hat-specific changes that they made for FedRAMP compliance
--- and Red Hat has dealt with those issues.

If there are problems for people using Debian Testing, we should
investigate them and understand what is going on.

> > My suggestion is to try and figure out *what* is blocking, and *why*.  If
> > it's because it's something security-critical, such as generating ssh
> > keys, letting things continue even though we don't have secure entropy
> > is a bad, bad, BAD idea.  If it's for something stupid, like
> > generating seeds for Python dictionaries (just as an example; that one
> > has been fixed) then the application should be fixed not to request
> > secure randomness in the first place.
> 
> No, that's utterly wrong. If it's a hassle to use good entropy, people 
> will use gettimeofday() for getting "entropy" and they will use it for 
> security relevant purposes. In this way, you would achieve exactly the 
> opposite of what you want.

If *users* do this, then if they end up releasing credit card numbers
or PII or violate their customers privacy which brings the EU's GDPR
enforcers down on them, it's on *their* heads.  If *Debian* makes a
local Debian-specific change which causes these really bad outcomes,
then it's on *ours*.

We tried this ten years ago, when well-meaning Debian
Developers tried to "fix" OpenSSL's random number library, and it
turned out to be a disaster[1].  So let's be careful not to replicate
past mistakes, eh?

[1] https://www.schneier.com/blog/archives/2008/05/random_number_b.html

> Any program that does secure network connections needs entropy for 
> Diffie-Hellman. And even seeds for hash buckets can be security relevant. 
> You really don't want people to have to distinguish between
> security-critical and stupid uses of entropy, because they WILL get it 
> wrong.

Sure, this is why developers need to investigate the bugs.  You said
you provided links, but I couldn't find any in your e-mail messages or
earlier ones on this thread.  Perhaps I missed them; in which case, my
apologies.   Can you please send/resend those links?

Can you please prioritize reports from people running Debian Unstable
or Debian Testing?  As I said above, these issues tend to be very
distro specific, especially when distros are messing around with
crypto-related libraries in order to keep the US Government happy.

- Ted



Re: Handling of entropy during boot

2019-01-09 Thread Stefan Fritsch
On Tue, 8 Jan 2019, Theodore Y. Ts'o wrote:

> On Tue, Jan 08, 2019 at 10:41:55AM +0100, Stefan Fritsch wrote:
> > 
> > If the security issue only affects a small percentage of the installations 
> > and fixing it means breaking many other installations, then there has to 
> > be a discussion if we really want to fix the issue or if a "don't do that"
> > documentation is the better choice.
> 
> One of the questions which needs to be answered is exactly how many
> installations are actually broken.  I don't think it's going to be as bad
> as you suspect.

There have been a number of bug reports and blog posts about this, despite 
buster not being released yet. So it's not that uncommon.

> 
> > Raspberry Pis were only an example. There are also other systems, including
> > old x86 systems, that don't have a HWRNG. Also, there are probably a load 
> > of x86 VMs that emulate an older CPU due to libvirt misconfiguration and 
> > don't expose the rdrand cpuid bit.
> 
> x86 systems have a high-resolution timer; Raspberry Pis don't.
> Furthermore, if libvirt is misconfigured, it should just be fixed (and
> better yet, it should be configured to enable virtio-rng, which is
> *not* hard).

It can be very hard if the VM host is not under your control.

> > Systems that don't suffer from blocking on entropy because they have other 
> > sources of entropy (hwrng, ...) won't have their security reduced because 
> > the good entropy will still be added to the pool, regardless of the seed 
> > file being credited or not.
> 
> The question is how long they have to block.  *Very* unfortunately,
> there's a lot of busted software that tries to generate
> security-critical keys when the system is first booted, which is when
> the least entropy is available.  Such packages include ssh
> and various packages which call openssl (such as CUPS) which are
> visible to the internet.
> 
> And if the system doesn't have good sources of entropy, and doesn't have
> sufficient interrupts to initialize the entropy pool, the question is
> what should we do?  Should we just blindly proceed and let them
> generate insecure keys?  At least, if the system blocks, they'll know
> something is wrong, and they can fix the problem (for example, by
> *fixing* their libvirt configuration).

At the very least, there must be a clear message what the problem is. 
People having to use strace to figure out what is broken is just not 
acceptable.

> Ultimately, I don't think it's a big problem, primarily because I'm
> not hearing a lot of yelling from Debian users.

I think the amount of yelling is already quite high, considering that it's 
only for testing and the vast majority of large deployments only use 
stable. I have included some links in my first mail.

> It may be annoying
> for your Raspberry Pi system, but the question is whether that is a
> common case or an isolated case.


> > So, how could we go forward from here. Maybe we could limit the wait for 
> > entropy to some reasonable value (1 minute? 5 minutes?). This could be 
> > done by creating a program that does a blocking getrandom but with a 
> > timeout. If the timeout expires and the seed file has been read 
> > successfully before, it would then credit the read entropy. This program 
> > would be added as a systemd unit so that services that need entropy can
> > depend on it and don't get killed with a timeout. Is this a reasonable 
> > approach? Or do you (or anyone else) have any better suggestions?
> 
> My suggestion is to try and figure out *what* is blocking, and *why*.  If
> it's because it's something security-critical, such as generating ssh
> keys, letting things continue even though we don't have secure entropy
> is a bad, bad, BAD idea.  If it's for something stupid, like
> generating seeds for Python dictionaries (just as an example; that one
> has been fixed) then the application should be fixed not to request
> secure randomness in the first place.

No, that's utterly wrong. If it's a hassle to use good entropy, people 
will use gettimeofday() for getting "entropy" and they will use it for 
security relevant purposes. In this way, you would achieve exactly the 
opposite of what you want.
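To make the gettimeofday() point concrete: a value derived from the clock can be brute-forced by anyone who knows roughly when it was generated. This is a minimal Python sketch; `weak_token` and `recover_seed` are hypothetical names, and Python's non-cryptographic `random` module stands in for whatever PRNG an application might seed from the time of day. The ten-minute uncertainty window is an illustrative assumption.

```python
import random
import time


def weak_token(seed: int) -> int:
    """The anti-pattern: a 'secret' drawn from a clock-seeded PRNG."""
    return random.Random(seed).getrandbits(64)


def recover_seed(token: int, start: int, end: int):
    """Attacker side: try every whole-second timestamp in [start, end)."""
    for candidate in range(start, end):
        if weak_token(candidate) == token:
            return candidate
    return None


boot_time = int(time.time())       # seconds since the epoch, as from gettimeofday()
token = weak_token(boot_time)
# Knowing the boot time to within +/- 10 minutes leaves only 1200 guesses.
found = recover_seed(token, boot_time - 600, boot_time + 600)
print(found == boot_time)
```

Each guess costs microseconds, so even a much wider window (a whole day is 86400 candidates) is trivial to search.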

Any program that does secure network connections needs entropy for 
Diffie-Hellman. And even seeds for hash buckets can be security relevant. 
You really don't want people to have to distinguish between
security-critical and stupid uses of entropy, because they WILL get it 
wrong.

For the most part, daemons block during startup because openssl decides it 
wants entropy for something. This is really difficult to change without 
creating other security issues.

> That's the correct fix, as opposed to a shortcut that might leave us
> in a worse place, from a security perspective.

We already were there with the random() library call, and that was not a 
good situation. People used it for everything, including security-critical 
stuff. Now people have been educated to use good entrop

Re: Handling of entropy during boot

2019-01-08 Thread Stefan Fritsch
On Sun, 23 Dec 2018, Theodore Y. Ts'o wrote:

> On Sun, Dec 23, 2018 at 05:52:31PM +0100, Stefan Fritsch wrote:
> > I think some other questions should be considered first. Did Debian protect
> > from these attacks in the past? The answer is clearly no. Now, should we break
> > the systems of those people who keep their random-seed file secret and don't
> > clone their OS image, in order to offer some protection to other people? This
> > is really what we need to answer first, and in my opinion, we should try very
> > hard not to break the systems of those users. And I see no other way than to
> > credit the random seed file with entropy.
> 
> I don't think this line of reasoning is valid.  Suppose there was a
> horrific security hole, such that 10% of publicly available SSH
> hosts had insecure, shared public keys that were vulnerable to
> being guessed[1].  Clearly, in the past (before we knew about such a
> vulnerability) we did not protect those systems against this attack.
> Does this mean we shouldn't in the future?  I don't think that
> follows!

If the security issue only affects a small percentage of the installations 
and fixing it means breaking many other installations, then there has to 
be a discussion if we really want to fix the issue or if a "don't do that"
documentation is the better choice.

> [1] Mining your p's and q's: Widespread Weak Keys in Network Devices.
> https://factorable.net

> There is a balancing test that has to go on here.  And quite frankly
> Raspberry Pis are extremely problematic devices from a security
> perspective.  They use a coarse-grained clock, so it's very hard to
> get good entropy out of timing events, and the hardware that they
> have on them is such that there aren't many events that we can use to
> generate entropy in the first place.

Raspberry Pis were only an example. There are also other systems, including
old x86 systems, that don't have a HWRNG. Also, there are probably a load 
of x86 VMs that emulate an older CPU due to libvirt misconfiguration and 
don't expose the rdrand cpuid bit. Will the Linux kernel try to detect 
rdrand by detecting the UD exception or does it trust the cpuid bit?

> I'm not sure that it's a great idea to weaken *all* Debian systems to
> the security of Raspberry Pis, including x86 servers and laptops, just
> because one platform has crappy hardware with respect to getting
> secure random numbers.

Systems that don't suffer from blocking on entropy because they have other 
sources of entropy (hwrng, ...) won't have their security reduced because 
the good entropy will still be added to the pool, regardless of the seed 
file being credited or not.
 
> So perhaps the right answer is we have one default value for certain
> architectures, or maybe classes of devices (e.g., a server-class ARM64
> device is very different from an IoT-style ARM platform).
> 
> > 
> > One could also make it harder for an attacker to regenerate key material from
> > a system where he knows the seed file. For example, if there is an RTC one could
> > put the boot time and all serial numbers / MAC addresses that one can find into
> > an expensive function like PBKDF2 or bcrypt and feed the result to the random
> > seed. This way, even if the attacker has an approximate knowledge of most of
> > that information, he would still need to spend quite a bit of computing power
> > to get all the possible random seeds that could be used.
> 
> We mix things like serial numbers and MAC addresses into the random
> pool already.  Unfortunately, if the attacker can snoop the
> random-seed file, it's likely he or she can simply obtain the MAC
> addresses or serial numbers of the device.

Including the boot time would help, if this was done with sufficient 
granularity, but the boot time can probably leak by stuff like tcp 
timestamps, too. Still, making it more expensive for an attacker to try 
all possible values may still be a good idea.
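Stefan's key-stretching idea, including the "as many rounds as possible in 1 second" variant, could be sketched as follows. This is a minimal Python illustration under assumptions: the function names, the salt, and the iteration count are hypothetical, and the output would only be mixed into the pool, not credited. As Ted notes, an attacker who can read the seed file can probably learn the MAC addresses and serial numbers too, so this raises the attacker's cost rather than adding real entropy.

```python
import hashlib
import time
from typing import List, Tuple


def stretched_seed(boot_time: str, identifiers: List[str], salt: bytes,
                   iterations: int = 200_000) -> bytes:
    """Run the boot time plus device identifiers through PBKDF2-HMAC-SHA256.

    Identifiers are sorted so discovery order does not change the result.
    """
    material = "|".join([boot_time] + sorted(identifiers)).encode()
    return hashlib.pbkdf2_hmac("sha256", material, salt, iterations)


def timed_hash_chain(material: bytes, budget_s: float = 1.0) -> Tuple[bytes, int]:
    """Iterate SHA-256 until the time budget is spent.

    The round count depends on CPU speed, load and throttling, so an
    attacker must also guess how many rounds were run.
    """
    digest = hashlib.sha256(material).digest()
    rounds = 1
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        digest = hashlib.sha256(digest).digest()
        rounds += 1
    return digest, rounds
```

The fixed-iteration PBKDF2 form is reproducible and easy to reason about. The timed chain adds uncertainty only to the extent that the round count is genuinely unpredictable to the attacker.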


> > If the number of rounds in the function depends on timing, like do
> > as many rounds as possible in 1 second, things like the load of the
> > VM host and the temperature of the CPU will also play a role in the
> > result. A sha sum of dmesg would probably also help, because it
> > contains a lot of timings that also depend on the load of the VM
> > host.
> 
> We are already mixing timing information into the entropy pool, and to
> the extent that there is randomness there, it is credited
> appropriately.  The problem is that the Raspberry Pi doesn't have a
> fine-grained clock, and there is a lot less entropy from timing events
> than most people might suppose.
> 
> As I said, though; it's one thing for this to be added to the entropy
> pool.  It's quite another for it to be reflected in the random seed
> file.  Today, if the system was booted a year ago, the random seed
> file will not have been updated for the past 12 months.  The last time
> it 

Re: Handling of entropy during boot

2018-12-24 Thread Theodore Y. Ts'o
On Sun, Dec 23, 2018 at 05:52:31PM +0100, Stefan Fritsch wrote:
> I think some other questions should be considered first. Did Debian protect
> from these attacks in the past? The answer is clearly no. Now, should we break
> the systems of those people who keep their random-seed file secret and don't
> clone their OS image, in order to offer some protection to other people? This
> is really what we need to answer first, and in my opinion, we should try very
> hard not to break the systems of those users. And I see no other way than to
> credit the random seed file with entropy.

I don't think this line of reasoning is valid.  Suppose there was a
horrific security hole, such that 10% of publicly available SSH
hosts had insecure, shared public keys that were vulnerable to
being guessed[1].  Clearly, in the past (before we knew about such a
vulnerability) we did not protect those systems against this attack.
Does this mean we shouldn't in the future?  I don't think that
follows!

[1] Mining your p's and q's: Widespread Weak Keys in Network Devices.
https://factorable.net
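The attack behind [1] shows why low entropy at first boot matters: if two devices generate RSA keys from similar pool states and end up sharing one prime, anyone can factor both public moduli with a single GCD. The researchers used a batch-GCD algorithm to do this at Internet scale; the toy-sized Python sketch below uses illustrative primes, not data from the paper.

```python
from math import gcd

# Toy primes; real RSA primes are 1024 bits or more.
p, q1, q2 = 1000003, 1000033, 1000037
n1 = p * q1   # public modulus of host A
n2 = p * q2   # public modulus of host B, which drew the same p

shared = gcd(n1, n2)   # Euclid's algorithm: fast even for 2048-bit moduli
print(shared == p)                              # True
print(n1 // shared == q1, n2 // shared == q2)   # True True
```

No private key material is needed: both moduli are public, and once `shared` is known, both private keys follow directly.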

There is a balancing test that has to go on here.  And quite frankly
Raspberry Pis are extremely problematic devices from a security
perspective.  They use a coarse-grained clock, so it's very hard to
get good entropy out of timing events, and the hardware that they
have on them is such that there aren't many events that we can use to
generate entropy in the first place.

I'm not sure that it's a great idea to weaken *all* Debian systems to
the security of Raspberry Pis, including x86 servers and laptops, just
because one platform has crappy hardware with respect to getting
secure random numbers.

So perhaps the right answer is we have one default value for certain
architectures, or maybe classes of devices (e.g., a server-class ARM64
device is very different from an IoT-style ARM platform).

> 
> One could also make it harder for an attacker to regenerate key material from
> a system where he knows the seed file. For example, if there is an RTC one could
> put the boot time and all serial numbers / MAC addresses that one can find into
> an expensive function like PBKDF2 or bcrypt and feed the result to the random
> seed. This way, even if the attacker has an approximate knowledge of most of
> that information, he would still need to spend quite a bit of computing power
> to get all the possible random seeds that could be used.

We mix things like serial numbers and MAC addresses into the random
pool already.  Unfortunately, if the attacker can snoop the
random-seed file, it's likely he or she can simply obtain the MAC
addresses or serial numbers of the device.

> If the number of rounds in the function depends on timing, like do
> as many rounds as possible in 1 second, things like the load of the
> VM host and the temperature of the CPU will also play a role in the
> result. A sha sum of dmesg would probably also help, because it
> contains a lot of timings that also depend on the load of the VM
> host.

We are already mixing timing information into the entropy pool, and to
the extent that there is randomness there, it is credited
appropriately.  The problem is that the Raspberry Pi doesn't have a
fine-grained clock, and there is a lot less entropy from timing events
than most people might suppose.

As I said, though; it's one thing for this to be added to the entropy
pool.  It's quite another for it to be reflected in the random seed
file.  Today, if the system was booted a year ago, the random seed
file will not have been updated for the past 12 months.  The last time
it would have been updated is shortly after the system was first
booted.  This is *terrible* if you want to assume that we should give
full credit to the random-seed file --- because entropy means, "not
known to the adversary".  The adversary can have access to it, and
also to timing information, such as when Ethernet interrupts may have
caused timing events.  Because the Raspberry Pi only keeps time to
100Hz granularity, and an outside attacker can look at the external
timing of packets on the network, it is not clear that the timing of
network interrupts is actually contributing any entropy.

I understand that having Raspberry Pis take a long time to boot
because they don't have entropy is frustrating.  But is silently
assuming they have entropy, when someone really determined could reverse
engineer the state of the pool, a preferable alternative?

If someone is using it as a prototype or IoT device (remember: the 'S' in IoT
stands for security), maybe it's fine, since IoT devices are
generally wide open to security problems anyway, so what's one more?
Just don't put them on *my* home network.  :-)

But is that *really* the best answer for Debian?   My opinion is "no".

At least, let's please not make the security for x86 servers and
desktops worse just to please Raspberry Pi IoT developers.

 - Ted



Re: Handling of entropy during boot

2018-12-23 Thread Stefan Fritsch
On Tuesday, 18 December 2018 20:11:58 CET you wrote:
> On Mon, Dec 17, 2018 at 09:46:42PM +0100, Stefan Fritsch wrote:
> > There is a random seed file stored by systemd-random-seed.service that
> > saves entropy from one boot and loads it again after the next reboot. The
> > random seed file is re-written immediately after the file is read, so the
> > system not properly shutting down won't cause the same seed file to be
> > used again. The problem is that systemd (and probably
> > /etc/init.d/urandom, too) does not set the flag that allows the kernel to
> > credit the randomness and so the kernel does not know about the entropy
> > contained in that file. Systemd upstream argues that this is supposed to
> > protect against the same OS image being used many times [3]. (More links
> > to more discussion can be found at [4]).
> 
> This is an issue which Debian should be deciding more than systemd,
> since the issues involved concern how the entire OS is packaged and
> installed. 

I definitely agree with that.

> That being said, the issues involved are subtle.

> 
> The decision to not credit any randomness for the contents of
> /var/lib/systemd/random-seed is definitely the conservative thing to
> do.  One of the issues is indeed what happens if the OS image gets
> reused.  And it's not just for Virtual Machines, but it can also be an
> issue any time an image is cloned --- for example, in some kind of
> consumer electronic device.  Another question that has to be
> considered is whether you trust that the random-seed file hasn't been
> tampered with or read between when it was written and when the system is
> next booted.  For example, if the "Tailored Access Operations" unit at
> NSA, or its equivalent at German BND, or Chinese MSS, etc., were to
> intercept a specific device, and read the random-seed file, they
> wouldn't need to make any changes to the devices (which might, after
> all, be detectable).  If the OS were to blindly trust the random-seed
> file as having entropy that can't be guessed by an adversary, this
> kind of attack becomes possible.
> 
> Now, should Debian care about this particular attack? 

I think some other questions should be considered first. Did Debian protect 
from these attacks in the past? The answer is clearly no. Now, should we break 
the systems of those people who keep their random-seed file secret and don't 
clone their OS image, in order to offer some protection to other people? This 
is really what we need to answer first, and in my opinion, we should try very 
hard not to break the systems of those users. And I see no other way than to 
credit the random seed file with entropy.

> If the kernel is only going to be used by a VM, you have to trust the
> Host OS provider, and if you're paranoid enough that you doubt Intel's
> ability to resist being suborned by the NSA, you're probably going to
> be even more concerned about the hosting/cloud provider being in bed
> with its local government authorities.  So what the default should
> be for Google's "Cloud Optimized OS" is pretty obvious.  The COS
> kernel trusts RDRAND, and this avoids any delays in the boot process
> waiting for the random number generator to be securely initialized --- because
> we trust RDRAND.

RDRAND is not the answer here, simply because not all architectures have it. 
Do Raspberry Pis have a HW-RNG? I am pretty sure that they don't. My 
cubietruck definitely does not. Therefore the question what to do with RDRAND 
is not related to the question above, as it does not prevent breaking people's 
systems.


> That being said, there are some things we can do that can help
> regardless of what the default ends up being, and how we enable users
> or image installers to change the default.  For example, at least
> every day, or perhaps sooner (and maybe once an hour if the device is
> powered by the AC mains) the contents of the random-seed file should
> be refreshed.  The reason for that is that if the system has been up
> for weeks or months, and the user reboots the system by forcing power
> down or if the kernel crashes, or if the user is in too much of a
> hurry to wait for a clean shutdown sequence, and runs something like
> "echo b > /proc/sysrq-trigger", there is an increased chance that the
> random-seed file may have been snooped sometime in the past
> week/month/quarter.

One could also make it harder for an attacker to regenerate key material from 
a system where he knows the seed file. For example, if there is an RTC one could
put the boot time and all serial numbers / MAC addresses that one can find into 
an expensive function like PBKDF2 or bcrypt and feed the result to the random 
seed. This way, even if the attacker has an approximate knowledge of most of 
that information, he would still need to spend quite a bit of computing power 
to get all the possible random seeds that could be used. If the number of 
rounds in the function depends on timing, like do as many rounds as possible 
in 1 second, t

Re: Handling of entropy during boot

2018-12-18 Thread Theodore Y. Ts'o
On Mon, Dec 17, 2018 at 09:46:42PM +0100, Stefan Fritsch wrote:
> 
> There is a random seed file stored by systemd-random-seed.service that saves
> entropy from one boot and loads it again after the next reboot. The random
> seed file is re-written immediately after the file is read, so the system not
> properly shutting down won't cause the same seed file to be used again. The
> problem is that systemd (and probably /etc/init.d/urandom, too) does not set
> the flag that allows the kernel to credit the randomness and so the kernel does
> not know about the entropy contained in that file. Systemd upstream argues that
> this is supposed to protect against the same OS image being used many times
> [3]. (More links to more discussion can be found at [4]).

This is an issue which Debian should be deciding more than systemd,
since the issues involved concern how the entire OS is packaged and
installed.  That being said, the issues involved are subtle.

The decision to not credit any randomness for the contents of
/var/lib/systemd/random-seed is definitely the conservative thing to
do.  One of the issues is indeed what happens if the OS image gets
reused.  And it's not just for Virtual Machines, but it can also be an
issue any time an image is cloned --- for example, in some kind of
consumer electronic device.  Another question that has to be
considered is whether you trust that the random-seed file hasn't been
tampered with or read between when it was written and when the system is
next booted.  For example, if the "Tailored Access Operations" unit at
NSA, or its equivalent at German BND, or Chinese MSS, etc., were to
intercept a specific device, and read the random-seed file, they
wouldn't need to make any changes to the devices (which might, after
all, be detectable).  If the OS were to blindly trust the random-seed
file as having entropy that can't be guessed by an adversary, this
kind of attack becomes possible.

Now, should Debian care about this particular attack?  I suspect
people of good will could very well disagree.  There is a similar
issue with newer kernels which support the boot-command-line option
random.trust_cpu=on.  If you are firmly convinced that there is a good
chance that the NSA has suborned Intel in putting a backdoor into
RDRAND, you won't want to use that boot option.  But from the
perspective of the distro, especially one that is striving to be a
"Universal OS", how should you set this default?
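Whether a given boot actually trusts RDRAND depends on both the command line and the build-time default. A minimal Python sketch of checking the command-line half; the function name is hypothetical, the accepted spellings are an assumption modelled on the kernel's usual boolean parameter parsing, and letting the last occurrence win is this sketch's choice.

```python
from typing import Optional

_TRUE = ("on", "1", "y")
_FALSE = ("off", "0", "n")


def trust_cpu_from_cmdline(cmdline: str) -> Optional[bool]:
    """Return the value of random.trust_cpu= on a kernel command line.

    None means the option is absent, so the build-time default
    (CONFIG_RANDOM_TRUST_CPU) applies.  The last occurrence wins.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("random.trust_cpu="):
            arg = token.split("=", 1)[1].lower()
            if arg in _TRUE:
                value = True
            elif arg in _FALSE:
                value = False
    return value
```

On a running Linux system the input would be the contents of /proc/cmdline.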

If the kernel is only going to be used by a VM, you have to trust the
Host OS provider, and if you're paranoid enough that you doubt Intel's
ability to resist being suborned by the NSA, you're probably going to
be even more concerned about the hosting/cloud provider being in bed
with its local government authorities.  So what the default should
be for Google's "Cloud Optimized OS" is pretty obvious.  The COS
kernel trusts RDRAND, and this avoids any delays in the boot process
waiting for the random number generator to be securely initialized --- because
we trust RDRAND.

But for the Universal OS, the answer to whether we should blindly trust
the random-seed or RDRAND is not so easy.  I can construct scenarios
where we should obviously trust random-seed --- and scenarios where we
shouldn't.  And we could throw it up to the user, and ask them to
answer a question at installation time --- but most users probably
won't be equipped to be able to answer the question with full
understanding of the consequences one way or another.

That being said, there are some things we can do that can help
regardless of what the default ends up being, and how we enable users
or image installers to change the default.  For example, at least
every day, or perhaps sooner (and maybe once an hour if the device is
powered by the AC mains) the contents of the random-seed file should
be refreshed.  The reason for that is that if the system has been up
for weeks or months, and the user reboots the system by forcing power
down or if the kernel crashes, or if the user is in too much of a
hurry to wait for a clean shutdown sequence, and runs something like
"echo b > /proc/sysrq-trigger", there is an increased chance that the
random-seed file may have been snooped sometime in the past
week/month/quarter.
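Ted's periodic-refresh suggestion could be sketched as follows. This is a minimal Python illustration under assumptions: the function names are hypothetical, detecting mains power is left to the caller, 512 bytes is an arbitrary seed size, and scheduling (cron, a systemd timer, or similar) is not shown. The write goes to a temporary file that is then renamed over the old seed, so a crash mid-write never leaves a truncated seed behind.

```python
import os
import tempfile


def refresh_interval_s(on_ac_power: bool) -> int:
    """Hourly on mains power, at least daily otherwise."""
    return 3600 if on_ac_power else 86400


def refresh_seed_file(path: str, nbytes: int = 512) -> None:
    """Replace the seed file with fresh bytes from the kernel CSPRNG."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable by the owner only.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(nbytes))
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)   # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp)
        raise
```

On a systemd system the real file would be /var/lib/systemd/random-seed, normally managed by systemd-random-seed.service itself.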

> A refinement of the random seed handling could be to check if the hostname/
> virtual machine-id is the same when saving the seed, and only credit the 
> entropy if it is unchanged since the last boot.

This is a good idea, but how you set the virtual machine-id is
very cloud/hosting provider specific.  Also, very often, in many cloud
environments, the hostname is not set until after the network is
brought up, since they end up querying the hostname for the VM via the
metadata server.
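Stefan's refinement (credit the seed only if the machine identity matches the one recorded when the seed was saved) might look like the following minimal Python sketch. The function names and the tag format are hypothetical. It only helps if cloned images get a fresh /etc/machine-id, and, per Ted's caveat, the hostname may not be known yet when the seed is loaded on many cloud VMs.

```python
import hashlib
from typing import Optional


def identity_tag(machine_id: str, hostname: str) -> str:
    """Fixed-length fingerprint of the machine identity, saved beside the seed."""
    material = f"{machine_id.strip()}\n{hostname.strip()}".encode()
    return hashlib.sha256(material).hexdigest()


def credit_allowed(saved_tag: Optional[str], machine_id: str, hostname: str) -> bool:
    """Allow crediting only if the seed was saved under the same identity.

    A missing tag (first boot, or a seed written by an older tool) means
    no credit.
    """
    if saved_tag is None:
        return False
    return saved_tag == identity_tag(machine_id, hostname)
```

Falling back to "no credit" whenever the identity is unknown keeps today's conservative behaviour as the default.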

Also, for a kernel meant for a virtualization or cloud environment, my
recommendation is to use random.trust_cpu=on, or compile the kernel
with CONFIG_RANDOM_TRUST_CPU, which makes random.trust_cpu default
to on.  Trusting RDRAND in a

Handling of entropy during boot

2018-12-17 Thread Stefan Fritsch
Hi,

since the getrandom() system call is used more and more, there have been bugs 
where services that use it block for a long time at startup and/or get killed 
by systemd because they don't start fast enough [1, 2].

There is a random seed file stored by systemd-random-seed.service that saves 
entropy from one boot and loads it again after the next reboot. The random 
seed file is re-written immediately after the file is read, so an unclean 
shutdown won't cause the same seed file to be used again. The 
problem is that systemd (and probably /etc/init.d/urandom, too) does not set 
the flag that allows the kernel to credit the randomness and so the kernel does 
not know about the entropy contained in that file. Systemd upstream argues that 
this is supposed to protect against the same OS image being used many times 
[3]. (More links to more discussion can be found at [4]).
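The difference between loading the seed and crediting it can be made concrete. Writing the seed to /dev/urandom mixes it into the pool but credits no entropy; crediting needs the RNDADDENTROPY ioctl from <linux/random.h>, which takes a struct rand_pool_info (an int entropy_count in bits, an int buf_size in bytes, then the bytes themselves) and requires CAP_SYS_ADMIN. The sketch below assumes the generic Linux ioctl encoding used on x86 and arm64 (a few architectures, such as powerpc and mips, encode `_IOW` differently), and the helper names are hypothetical.

```python
import fcntl
import os
import struct

# _IOW('R', 0x03, int[2]) under the generic Linux ioctl encoding
# (x86, arm64); other architectures may need a different value.
_IOC_WRITE = 1
RNDADDENTROPY = (_IOC_WRITE << 30) | (8 << 16) | (ord("R") << 8) | 0x03


def pack_rand_pool_info(seed: bytes, entropy_bits: int) -> bytes:
    """Build struct rand_pool_info: entropy_count, buf_size, buf[]."""
    if entropy_bits < 0 or entropy_bits > 8 * len(seed):
        raise ValueError("cannot credit more bits than the seed contains")
    return struct.pack("ii", entropy_bits, len(seed)) + seed


def credit_seed(seed: bytes, entropy_bits: int) -> None:
    """Mix `seed` into the pool AND credit `entropy_bits` (needs root).

    fcntl.ioctl copies an immutable argument into a 1024-byte buffer,
    so the seed plus the 8-byte header must stay under that size.
    """
    fd = os.open("/dev/urandom", os.O_WRONLY)
    try:
        fcntl.ioctl(fd, RNDADDENTROPY, pack_rand_pool_info(seed, entropy_bits))
    finally:
        os.close(fd)


def mix_seed_only(seed: bytes) -> None:
    """A plain write: the bytes are mixed in, but nothing is credited."""
    with open("/dev/urandom", "wb") as f:
        f.write(seed)
```

The sketch only shows the mechanism; whether a loaded seed deserves to be credited is exactly the policy question being argued here.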

But an identical OS image needs to be modified anyway in order to be secure 
(re-create ssh host keys, change root password, re-create ssl-cert's private 
keys, etc.). Injecting some entropy in some way is just another task that 
needs to be done for that use case.  So basically the current implementation 
of systemd-random-seed.service breaks stuff for everyone while not fixing the 
thing they are claiming to fix. Also, the breakage will cause people to invent 
their own workarounds which will probably create more security issues than 
those that are fixed by the systemd behavior. Therefore I think it should be 
the default to credit the entropy of the saved random seed when loading it, 
and the special needs of identical OS images used many times should be 
documented in the release notes. 

A refinement of the random seed handling could be to check if the hostname/
virtual machine-id is the same when saving the seed, and only credit the 
entropy if it is unchanged since the last boot.
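The machine-id refinement could be sketched like this. It assumes a hypothetical layout in which a copy of /etc/machine-id is saved next to the seed whenever the seed is written (systemd does not do this); `save_id_with_seed`, `should_credit_seed` and `read_id` are hypothetical names. Credit is granted only when both identifiers exist and match.

```python
import os
from typing import Optional

MACHINE_ID_PATH = "/etc/machine-id"  # systemd's machine ID file


def read_id(path: str) -> Optional[str]:
    """Return the stripped contents of an ID file, or None if unusable."""
    try:
        with open(path, encoding="ascii") as f:
            value = f.read().strip()
    except (OSError, UnicodeDecodeError):
        return None
    return value or None


def save_id_with_seed(saved_id_path: str,
                      current_id_path: str = MACHINE_ID_PATH) -> None:
    """At seed-save time, record which machine wrote the seed."""
    current = read_id(current_id_path)
    if current is not None:
        with open(saved_id_path, "w", encoding="ascii") as f:
            f.write(current + "\n")


def should_credit_seed(saved_id_path: str,
                       current_id_path: str = MACHINE_ID_PATH) -> bool:
    """At boot, credit the seed only if it was saved on this machine."""
    saved = read_id(saved_id_path)
    current = read_id(current_id_path)
    return saved is not None and saved == current
```

A missing or unreadable ID falls back to not crediting, so the failure mode is the current behaviour rather than crediting a possibly cloned seed.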

In case the random seed file is not present (or the hostname/machine-id 
check fails), services may still block for a long time until they start. To 
keep them from being killed by systemd because of timeouts, there should be a 
oneshot service that waits for getrandom to unblock and that other services 
can use as a dependency. (This is not necessary with /etc/init.d/urandom 
because there are no timeouts.)
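The program behind such a oneshot service could be as small as the following minimal sketch, assuming Linux and Python 3.6+ (for `os.getrandom`). It polls getrandom() with GRND_NONBLOCK, which fails with EAGAIN (BlockingIOError in Python) until the pool is initialized. A unit wrapping it would presumably use Type=oneshot with a disabled start timeout (TimeoutStartSec=infinity), and dependent services would order themselves After= it; the unit and function names are hypothetical.

```python
import os
import time
from typing import Optional


def crng_ready() -> bool:
    """True once a blocking getrandom() would no longer block."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:  # EAGAIN: pool not yet initialized
        return False
    return True


def wait_for_crng(timeout: Optional[float] = None,
                  interval: float = 0.5) -> bool:
    """Poll until the kernel CRNG is initialized.

    Returns True when ready, or False if `timeout` seconds pass first.
    With timeout=None it waits indefinitely, like a blocking getrandom().
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not crng_ready():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A script entry point would call `wait_for_crng()` and exit with status 0 once it returns True; polling rather than a single blocking call just leaves room for logging or an optional deadline.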

The systemd maintainers argue that individual services should handle this 
problem [1,2]. But this does not scale and the whole point of the getrandom() 
syscall is that it cannot fail and that its users do not need fallback code 
that is not well-tested and probably buggy. [5]

Cheers,
Stefan

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914297
[3] https://github.com/systemd/systemd/issues/4271
[4] https://daniel-lange.com/archives/152-Openssh-taking-minutes-to-become-available,-booting-takes-half-an-hour-...-because-your-server-waits-for-a-few-bytes-of-randomness.html
[5] https://lwn.net/Articles/605828/