Bug#897572: urandom hang in early boot

2018-05-09 Thread Yves-Alexis Perez
On Tue, 2018-05-08 at 13:23 +0200, Bjørn Mork wrote:
> And sd_id128_randomize() is called from all over the place.  I haven't
> bothered looking at all the call sites, but would be surprised if not at
> least one of them is unconditionally called at boot.
> 
> If I am correct, then I guess this is a systemd bug?

It might be (I took the liberty to add your findings to
https://github.com/systemd/systemd/issues/4167).

In our case, as far as I can tell, since we're still in the initramfs, systemd
is not yet PID 1, but we do use udev, which might have the same issue.

Regards,
-- 
Yves-Alexis



Bug#897572: urandom hang in early boot

2018-05-08 Thread Bjørn Mork
Ben Hutchings writes:
> On Tue, 2018-05-08 at 11:12 +1200, Ben Caradoc-Davies wrote:
>> On 08/05/18 05:34, Laurent Bigonville wrote:
>> > Apparently it's also happening for other applications that are starting 
>> > later during the boot like GDM.
>> > Somebody has reported an issue on IRC where GDM was taking up to 8 
>> > minutes to start (dmesg was showing several "random: systemd: 
>> > uninitialized urandom read (16 bytes read)" during boot)
>> > That problem might impact a lot of people I'm afraid.
>> 
>> systemd is the underlying cause: plymouthd uses libudev1, which expects 
>> getrandom/urandom(?) to never block:
>> https://github.com/systemd/systemd/blob/master/src/basic/random-util.c#L34
>> 
>> See discussion here about systemd usage of random numbers:
>> systemd reads from urandom before initialization
>> https://github.com/systemd/systemd/issues/4167
>> 
>> The new problem is that 43838a23a05f ("random: fix crng_ready() test") 
>> turns an ugly warning and cryptographic weakness into an indefinite 
>> hang. Security achieved!
>
> You keep saying this, but based on my reading of the code I don't see
> how reads from /dev/urandom can end up blocking.

It's a bit convoluted, but if I read the code correctly then
acquire_random_bytes() falls back to busy-loop reading from /dev/urandom
until it has the requested number of bytes if 'high_quality_required' is
true.

There aren't more than two such calls, but one of them is
sd_id128_randomize() which calls acquire_random_bytes(, sizeof t, true).
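
For illustration only, a fallback of that shape could look roughly like
the sketch below. This is not systemd's code: the name
read_exact_urandom() and its error handling are assumptions made for the
example. It only shows the "keep reading /dev/urandom until the requested
number of bytes has arrived" pattern.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from systemd): keep reading from
 * /dev/urandom until exactly n bytes have been collected.
 * Returns 0 on success or a negative errno value on failure.
 */
static int read_exact_urandom(void *buf, size_t n)
{
	unsigned char *p = buf;
	int fd;

	assert(buf != NULL || n == 0);

	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0)
		return -errno;

	while (n > 0) {
		ssize_t k = read(fd, p, n);

		if (k < 0) {
			int err = errno;

			if (err == EINTR)
				continue;	/* interrupted by a signal, retry */
			close(fd);
			return -err;
		}
		if (k == 0) {			/* unexpected end of file */
			close(fd);
			return -EIO;
		}
		p += k;				/* short read: advance and loop */
		n -= (size_t)k;
	}

	close(fd);
	return 0;
}
```

A caller would use it as read_exact_urandom(buf, sizeof buf) and treat any
negative return as failure.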

And sd_id128_randomize() is called from all over the place.  I haven't
bothered looking at all the call sites, but would be surprised if not at
least one of them is unconditionally called at boot.

If I am correct, then I guess this is a systemd bug?


Bjørn



Bug#897572: urandom hang in early boot

2018-05-07 Thread Ben Caradoc-Davies

On 08/05/18 15:55, Ben Caradoc-Davies wrote:
> If something calls getrandom without GRND_NONBLOCK while crng_init==1 
> (during early boot)

I now have conclusive evidence that this is the cause of the hang. If I 
add a printk:


diff --git a/drivers/char/random.c b/drivers/char/random.c
index cd888d4ee605..b7358cc32f42 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2021,6 +2021,9 @@ SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count,
 	if (!crng_ready()) {
 		if (flags & GRND_NONBLOCK)
 			return -EAGAIN;
+		printk(KERN_NOTICE "random: %s: getrandom without "
+		       "GRND_NONBLOCK while crng not ready\n",
+		       current->comm);
 		ret = wait_for_random_bytes();
 		if (unlikely(ret))
 			return ret;

I get these at the console just before the hang:

random: plymouthd: uninitialized urandom read (8 bytes read)
random: plymouthd: uninitialized urandom read (8 bytes read)
random: plymouthd: getrandom without GRND_NONBLOCK while crng not ready

Kind regards,

--
Ben Caradoc-Davies 
Director
Transient Software Limited 
New Zealand



Bug#897572: urandom hang in early boot

2018-05-07 Thread Ben Caradoc-Davies

On 08/05/18 14:00, Ben Hutchings wrote:

> You keep saying this, but based on my reading of the code I don't see
> how reads from /dev/urandom can end up blocking.


Ben, I think you are right. I have picked through the code in detail and 
none of the changes affect any substantive logic (except logging). I do 
not think urandom_read can ever block. The urandom warning may be from a 
previous read before the hang: related, but a red herring.


The *one* substantive change that is affected is getrandom:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c#n2007

If something calls getrandom without GRND_NONBLOCK while crng_init==1 
(during early boot):


- Before 43838a23a05f ("random: fix crng_ready() test"), this just falls 
through to urandom_read and everything seems to work (but is not 
cryptographically secure).


- After 43838a23a05f ("random: fix crng_ready() test"), this will call 
wait_for_random_bytes and hang waiting on mouse wiggles 
(cryptographically secure).
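
From userspace the difference between the two cases can be probed with a
small helper like the one below. It is only a sketch: the name
try_getrandom_nonblock() is made up for this mail, and it assumes a libc
that ships <sys/random.h> (glibc 2.25 or later, as far as I know). With
GRND_NONBLOCK the kernel returns EAGAIN instead of waiting when the crng
is not ready, so the caller can decide what to do rather than hang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical probe (name invented for this example).
 * Returns 1 if n random bytes were stored in buf,
 *         0 if the kernel reported EAGAIN (crng not ready yet),
 *         a negative errno value on any other failure.
 */
static int try_getrandom_nonblock(void *buf, size_t n)
{
	ssize_t k;

	/* Requests of up to 256 bytes are not split by the kernel. */
	assert(buf != NULL && n > 0 && n <= 256);

	do {
		k = getrandom(buf, n, GRND_NONBLOCK);
	} while (k < 0 && errno == EINTR);

	if (k == (ssize_t)n)
		return 1;
	if (k < 0 && errno == EAGAIN)
		return 0;		/* would have blocked without the flag */
	return k < 0 ? -errno : -EIO;
}
```

A return of 0 corresponds to the "return -EAGAIN" branch in getrandom;
the same call without GRND_NONBLOCK would take the wait_for_random_bytes
path instead.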


But what is calling getrandom without GRND_NONBLOCK? I could find 
nothing in the plymouth or systemd/udev codebase. Or is it something 
they spawn? I even read the plymouth softwaves.script.


Kind regards,

--
Ben Caradoc-Davies 
Director
Transient Software Limited 
New Zealand



Bug#897572: urandom hang in early boot

2018-05-07 Thread Ben Hutchings
On Tue, 2018-05-08 at 11:12 +1200, Ben Caradoc-Davies wrote:
> On 08/05/18 05:34, Laurent Bigonville wrote:
> > Apparently it's also happening for other applications that are starting 
> > later during the boot like GDM.
> > Somebody has reported an issue on IRC where GDM was taking up to 8 
> > minutes to start (dmesg was showing several "random: systemd: 
> > uninitialized urandom read (16 bytes read)" during boot)
> > That problem might impact a lot of people I'm afraid.
> 
> systemd is the underlying cause: plymouthd uses libudev1, which expects 
> getrandom/urandom(?) to never block:
> https://github.com/systemd/systemd/blob/master/src/basic/random-util.c#L34
> 
> See discussion here about systemd usage of random numbers:
> systemd reads from urandom before initialization
> https://github.com/systemd/systemd/issues/4167
> 
> The new problem is that 43838a23a05f ("random: fix crng_ready() test") 
> turns an ugly warning and cryptographic weakness into an indefinite 
> hang. Security achieved!

You keep saying this, but based on my reading of the code I don't see
how reads from /dev/urandom can end up blocking.

(For the time being I've concentrated on fixing stretch, so I haven't
done substantial testing in unstable.)

Ben.

-- 
Ben Hutchings
Never attribute to conspiracy what can adequately be explained
by stupidity.





Bug#897572: urandom hang in early boot

2018-05-07 Thread Ben Caradoc-Davies

On 08/05/18 05:34, Laurent Bigonville wrote:
> Apparently it's also happening for other applications that are starting 
> later during the boot like GDM.
> Somebody has reported an issue on IRC where GDM was taking up to 8 
> minutes to start (dmesg was showing several "random: systemd: 
> uninitialized urandom read (16 bytes read)" during boot)
>
> That problem might impact a lot of people I'm afraid.


systemd is the underlying cause: plymouthd uses libudev1, which expects 
getrandom/urandom(?) to never block:

https://github.com/systemd/systemd/blob/master/src/basic/random-util.c#L34

See discussion here about systemd usage of random numbers:
systemd reads from urandom before initialization
https://github.com/systemd/systemd/issues/4167

The new problem is that 43838a23a05f ("random: fix crng_ready() test") 
turns an ugly warning and cryptographic weakness into an indefinite 
hang. Security achieved!


Kind regards,

--
Ben Caradoc-Davies 
Director
Transient Software Limited 
New Zealand



Bug#897572: urandom hang in early boot

2018-05-07 Thread Laurent Bigonville

Hello,

Apparently it's also happening for other applications that are starting 
later during the boot like GDM.


Somebody has reported an issue on IRC where GDM was taking up to 8 
minutes to start (dmesg was showing several "random: systemd: 
uninitialized urandom read (16 bytes read)" during boot)


That problem might impact a lot of people I'm afraid.

Installing rng-tools5 seems to help in that case.