Re: [Security-sig] Can /dev/urandom ever revert from the "good" to the "bad" state?

Donald Stufft Wed, 22 Jun 2016 20:03:04 -0700

> On Jun 22, 2016, at 10:40 PM, Tim Peters <[email protected]> wrote:
> 
> [Guido]
>> Before I can possibly start thinking about what to do when the system's
>> CSPRNG isn't initialized, I need to understand more about how it works.
>> Apparently there's a possible transition from the "not ready yet" ("bad")
>> state to "ready" ("good"), and all it takes is usually waiting for a second
>> or two. But is this a wait that only gets incurred once, somewhere early
>> after a boot, or is this something that can happen at any time?
> 
> [Donald Stufft]
>> Once, only after boot. On most (all?) modern Linux systems there’s even part
>> of the boot process that attempts to seed the CSPRNG using random values
>> stored during a previous boot to shorten the time window between when it’s
>> ready and when it’s not yet initialized. However, once it is initialized it
>> will never block (or EAGAIN) again.
> 
> Donald, at the end you're talking about how getrandom() behaves -
> /dev/urandom on Linux never blocks, as I understand it (but there's no
> advertised way to tell when /dev/urandom enters the "good" state).


Yes, sorry. Guido asked about the system CSPRNG; on Linux there are three
(previously two) basic interfaces to the same CSPRNG:

/dev/urandom
  - This will never block, but until the kernel has gathered enough entropy
    during boot it will silently return data that is not cryptographically
    secure. The output is, in effect, predictable; how predictable depends on
    a lot of factors. As far as I am aware, there is no practical way to
    determine “given a read of /dev/urandom, did I get ‘good’ or ‘bad’ data
    out of it”.

/dev/random
   - This will randomly block whenever the kernel thinks that the entropy is
     “running low”. All security experts I’m aware of with maybe the exception
     of Ted (I don’t know how he feels about this) believe that this action of
     counting entropy is pure bollocks and that /dev/random randomly blocking
     because it thinks the entropy is low achieves nothing except to hurt the
     performance of things that need randomness at runtime.

And on newer kernels there is the getrandom() syscall, which has flags that
enable three different modes of operation:

getrandom(0)
   - This will block until the same “pool” of entropy that /dev/urandom uses
     has been initialized once, at boot, and then it will never block again.

getrandom(GRND_NONBLOCK)
   - This will return a -1 and set errno to EAGAIN if the same pool of entropy
     that /dev/urandom uses has not been initialized, and will otherwise always
     return data. This is essentially the same as getrandom(0) except instead
     of blocking it returns an error.

getrandom(GRND_RANDOM)
   - This is basically just a syscall interface to /dev/random and it doesn’t
     meaningfully deviate from what /dev/random does, except that it does not
     require a file descriptor.

This getrandom() interface is the newer way to access these two types of
random, and I think it is important to notice that this newer interface does
*not* have a way to get “sometimes a CSPRNG, sometimes not” data out of it
like /dev/urandom does. This newer interface promises that you’ll always get
cryptographically secure random: it will either block until it can do that,
or fail with EAGAIN to let you take some other action instead of relying on a
CSPRNG, if that suits your application.
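
As a rough illustration of the getrandom(0) and getrandom(GRND_NONBLOCK) modes
described above, here is a minimal Python sketch. It assumes a Python that
exposes os.getrandom() (3.6+ on Linux); the helper names csprng_ready() and
secure_bytes() are made up for this example and are not part of any library.

```python
import os


def csprng_ready():
    """Probe the urandom pool without blocking.

    Returns True if the pool is initialized, False if the kernel reported
    EAGAIN, and None if the question cannot be answered on this platform.
    """
    if not hasattr(os, "getrandom"):
        return None  # no getrandom() wrapper in this Python / on this OS
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False  # EAGAIN: pool not initialized yet
    except OSError:
        return None  # e.g. ENOSYS on a kernel without the syscall
    return True


def secure_bytes(n):
    """Return n bytes, waiting for the pool to be initialized if needed."""
    if hasattr(os, "getrandom"):
        try:
            buf = b""
            while len(buf) < n:
                # flags=0: blocks only until the pool has been initialized once
                buf += os.getrandom(n - len(buf))
            return buf
        except OSError:
            pass  # syscall unavailable; fall through to os.urandom()
    return os.urandom(n)
```

A caller that must not block could check csprng_ready() first and take some
other action when it returns False; a caller that needs secure data
unconditionally would just call secure_bytes().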



> 
> 
> [Guido]
>> Then shouldn't it be the responsibility of the boot sequence rather than
>> of the Python stdlib to wait for that event? IIUC that's what OS X
>> does (I think someone described that it even kernel-panics when it can't
>> enter the "good" state).
> 
> The rub is that sometimes Python is running soooo early in the boot
> sequence in these rare Linux cases.  That's said to be impossible on
> OS X (or Windows).

Yes, once the system has booted and the pool is initialized, all forms of
accessing the /dev/urandom pool (/dev/urandom, getrandom(0),
getrandom(GRND_NONBLOCK)) function basically the same (plus or minus a file
descriptor). The problem comes in a few flavors, but really they all boil down
to the same thing: code that calls os.urandom() before the /dev/urandom CSPRNG
has been initialized.

The primary case where this happens is code that is called early in the boot
sequence, before pid 1 has initialized the urandom CSPRNG from random data
saved during the previous boot [1]. There are other cases where this could
happen, though, like embedded Linux systems, Raspberry Pis, or the like that
don’t have great sources of hardware entropy, so initializing the CSPRNG takes
longer. This is particularly true on systems that don’t (currently) have an
active network connection, since networking is one of the better sources of
randomness the kernel can use to seed the pool.

[1] This is basically what caused the initial report: systemd-cron was a Python
    script, and the SipHash-based dictionary hash randomization was calling
    os.urandom to seed itself. However, nobody is asking for this particular
    use to be made blocking (or an error). As far as I know, most everyone
    agrees that for SipHash’s purpose it’s reasonably fine to fall back to an
    insecure source of random if a secure source isn’t available at the moment.
    What the security side wants is for people explicitly calling os.urandom
    (directly or indirectly) as part of the execution of their Python program
    to always get secure random if the platform we are on provides a reasonable
    interface to get access to it (e.g. /dev/random is not a reasonable
    interface, but getrandom() is).
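
The "never block, accept a weak fallback" policy that the footnote describes
for hash seeding might look roughly like the sketch below. This is only an
illustration, not how CPython actually seeds SipHash: hash_seed() is a
hypothetical helper, and the time/pid-based fallback merely stands in for
"some insecure source".

```python
import os
import random
import time


def hash_seed(n=16):
    """Return n seed bytes for hash randomization without ever blocking.

    Assumes n <= 256, so that a successful getrandom() call returns the
    full request. Falls back to an INSECURE seed if the pool is not ready.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        return os.urandom(n)  # no getrandom() wrapper available
    try:
        return getrandom(n, os.GRND_NONBLOCK)
    except BlockingIOError:
        pass  # EAGAIN: pool not initialized, use the weak fallback below
    except OSError:
        return os.urandom(n)  # e.g. kernel without the syscall
    # Insecure fallback: predictable, acceptable only for hash seeding.
    weak = random.Random(time.time_ns() ^ os.getpid())
    return bytes(weak.getrandbits(8) for _ in range(n))
```

Code that explicitly asks for secure random would not use a helper like this;
per the footnote, it should get secure data or wait for it.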

—
Donald Stufft



_______________________________________________
Security-SIG mailing list
[email protected]
https://mail.python.org/mailman/listinfo/security-sig
