Nick expressed:

The *actual bug* that triggered this latest firestorm of commentary
(from experts and non-experts alike) had *nothing* to do with user
code calling os.urandom, and instead was a combination of:

- CPython startup requesting cryptographically secure randomness when
it didn't need it
- a systemd init script written in Python running before the kernel
RNG was fully initialised

That created a deadlock between CPython startup and the rest of the
Linux init process, so the latter only continued when the systemd
watchdog timed out and killed the offending script. As others have
noted, this kind of deadlock scenario is generally impossible on other
operating systems, as the operating system doesn't provide a way to
run Python code before the random number generator is ready.

The change Victor made in 3.5.2 to fall back to reading /dev/urandom
directly if the getrandom() syscall returns EAGAIN (effectively
reverting to the Python 3.4 behaviour) was the simplest possible fix
for that problem (and an approach I thoroughly endorse, both for 3.5.2
and for the life of the 3.5 series), but that doesn't make it the
right answer for 3.6+.

To repeat: the problem encountered was NOT due to user code calling
os.urandom(), but rather due to the way CPython initialises its own
internal hash algorithm at interpreter startup. However, due to the
way CPython is currently implemented, fixing the regression in that
not only changed the behaviour of CPython startup, it *also* changed
the behaviour of every call to os.urandom() in Python 3.5.2+.

For 3.6+, we can instead make it so that the only things that actually
rely on cryptographic quality randomness being available are:

- calling a secrets module API
- calling a random.SystemRandom method
- calling os.urandom directly

These are all APIs that were either created specifically for use in
security sensitive situations (secrets module), or have long been
documented (both within our own documentation, and in third party
documentation, books and Q&A sites) as being an appropriate choice for
use in security sensitive situations (os.urandom and
random.SystemRandom).

However, we don't need to make those block waiting for randomness to
be available - we can update them to raise BlockingIOError instead
(which makes it trivial for people to decide for themselves how they
want to handle that case).

Along with that change, we can make it so that starting the
interpreter will never block waiting for cryptographic randomness to
be available (since it doesn't need it), and importing the random
module won't block waiting for it either.

To the best of our knowledge, on all operating systems other than
Linux, encountering the new exception will still be impossible in
practice, as there is no known opportunity to run Python code before
the kernel random number generator is ready.

On Linux, init scripts may still run before the kernel random number
generator is ready, but will now throw an immediate BlockingIOError if
they access an API that relies on crytographic randomness being
available, rather than potentially deadlocking the init process. Folks
encountering that situation will then need to make an explicit
decision:

- loop until the exception is no longer thrown
- switch to reading from /dev/urandom directly instead of calling os.urandom()
- switch to using a cross-platform non-cryptographic API (probably the
random module)

Victor has some additional technical details written up at
http://haypo-notes.readthedocs.io/pep_random.html and I'd be happy to
formalise this proposed approach as a PEP (the current reference is
http://bugs.python.org/issue27282 )

and Nathaniel added:

I'd make two additional suggestions:

- one person did chime in on the thread to say that they've used
os.urandom for non-security-sensitive purposes, simply because it
provided a convenient "give me a random byte-string" API that is
missing from random. I think we should go ahead and add a .randbytes
method to random.Random that simply returns a random bytestring using
the regular RNG, to give these users a nice drop-in replacement for
os.urandom.

Rationale: I don't think the existence of these users should block
making os.urandom appropriate for generating secrets, because (1) a
glance at github shows that this is very unusual -- if you skim
through this search you get page after page of functions with names
like "generate_secret_key"

  
https://github.com/search?l=python&p=2&q=urandom&ref=searchresults&type=Code&utf8=%E2%9C%93

and (2) for the minority of people who are using os.urandom for
non-security-sensitive purposes, if they find os.urandom raising an
error, then this is just a regular bug that they will notice
immediately and fix, and anyway it's basically never going to happen.
(As far as we can tell, this has never yet happened in the wild, even
once.) OTOH if os.urandom is allowed to fail silently, then people who
are using it to generate secrets will get silent catastrophic
failures, plus those users can't assume it will never happen because
they have to worry about active attackers trying to drive systems into
unusual states. So I'd much rather ask the non-security-sensitive
users to switch to using something in random, than force the
cryptographic users to switch to using secrets. But it does seem like
it would be good to give those non-security-sensitive users something
to switch to .

- It's not exactly true that the Python interpreter doesn't need
cryptographic randomness to initialize SipHash -- it's more that
*some* Python invocations need unguessable randomness (to first
approximation: all those which are exposed to hostile input), and some
don't. And since the Python interpreter has no idea which case it's
in, and since it's unacceptable for it to break invocations that don't
need unguessable hashes, then it has to err on the side of continuing
without randomness. All that's fine.

But, given that the interpreter doesn't know which state it's in,
there's also the possibility that this invocation *will* be exposed to
hostile input, and the 3.5.2+ behavior gives absolutely no warning
that this is what's happening. So instead of letting this potential
error pass silently, I propose that if SipHash fails to acquire real
randomness at startup, then it should issue a warning. In practice,
this will almost never happen. But in the rare cases it does, it at
least gives the user a fighting chance to realize that their system is
in a potentially dangerous state. And by using the warnings module, we
automatically get quite a bit of flexibility. If some particular
invocation (e.g. systemd-cron) has audited their code and decided that
they don't care about this issue, they can make the message go away:

   PYTHONWARNINGS=ignore::NoEntropyAtStartupWarning

OTOH if some particular invocation knows that they do process
potentially hostile input early on (e.g. cloud-init, maybe?), then
they can explicitly promote the warning to an error:

  PYTHONWARNINGS=error::NoEntropyAtStartupWarning

(I guess the way to implement this would be for the SipHash
initialization code -- which runs very early -- to set some flag, and
then we expose that flag in sys._something, and later in the startup
sequence check for it after the warnings module is functional.
Exposing the flag at the Python level would also make it possible for
code like cloud-init to do its own explicit check and respond
appropriately.)

Victor, does your PEP differ from these proposals? (my apologies for my lack of time at the moment).

--
~Ethan~
_______________________________________________
Security-SIG mailing list
Security-SIG@python.org
https://mail.python.org/mailman/listinfo/security-sig

Reply via email to