Hi,

I completed my PEP. Here is a second version of my PEP. Changes:

* I added new sections:

  - The bug
  - Use Cases
  - Fix system urandom
  - Denial-of-service when reading random

* I added alternatives:

  - Leave os.urandom() unchanged, add os.getrandom()
  - Raise BlockingIOError in os.urandom()
  - Add an optional block parameter to os.urandom()

I added 3 sections to try to describe the context of "the bug". For
example, I think that it's important to mention that all operating
systems load entropy from disk at boot.

For me, the last tricky question is use case 2 (run a web server)
on a VM or embedded device when system urandom is not initialized yet
and there is no entropy on disk yet (e.g. first boot, or maybe second
boot, of a VM).

I read quickly that a VM connected to a network should be able to
quickly initialize the system urandom. So I'm not sure that use
case 2 (web server) is really an issue in practice.

Victor


HTML version:
https://haypo-notes.readthedocs.io/pep_random.html



++++++++++++++++++++++++++++++++++++++++
PEP: Make os.urandom() blocking on Linux
++++++++++++++++++++++++++++++++++++++++

Headers::

    PEP: xxx
    Title: Make os.urandom() blocking on Linux
    Version: $Revision$
    Last-Modified: $Date$
    Author: Victor Stinner <victor.stin...@gmail.com>
    Status: Draft
    Type: Standards Track
    Content-Type: text/x-rst
    Created: 20-June-2016
    Python-Version: 3.6


Abstract
========

Modify ``os.urandom()`` to block on Linux 3.17 and newer until the OS
urandom is initialized.


The bug
=======

Python 3.5.0 was enhanced to use the new ``getrandom()`` syscall
introduced in Linux 3.17 and Solaris 11.3. The problem is that users
started to complain that Python 3.5 blocks at startup on Linux in
virtual machines and embedded devices: see issues `#25420
<http://bugs.python.org/issue25420>`_ and `#26839
<http://bugs.python.org/issue26839>`_.

On Linux, ``getrandom(0)`` blocks until the kernel has initialized
urandom with 128 bits of entropy. Issue #25420 describes a Linux build
platform blocking at ``import random``. Issue #26839 describes a short
Python script from systemd-cron, used to compute an MD5 hash and called
very early in the init process. The system initialization blocks on
this script, which itself blocks on ``getrandom(0)`` while Python is
initializing.

The Python initialization requires random bytes to implement a
countermeasure against hash denial-of-service (hash DoS) attacks, see:

* `Issue #13703: Hash collision security issue
  <http://bugs.python.org/issue13703>`_
* `PEP 456: Secure and interchangeable hash algorithm
  <https://www.python.org/dev/peps/pep-0456/>`_
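
To illustrate why the interpreter needs random bytes before any user
code runs, here is a minimal sketch (an illustration only, not part of
the proposal). It starts child interpreters with the documented
``PYTHONHASHSEED`` environment variable; the helper name
``str_hash_in_child()`` is hypothetical. With a fixed seed the string
hash is reproducible, whereas with ``random`` the hash secret is drawn
from the system RNG at startup.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed):
    """Return hash('abc') as computed by a fresh interpreter.

    `seed` is passed through the PYTHONHASHSEED environment variable:
    an integer string fixes the hash secret, "random" lets the
    interpreter draw it from the system RNG at startup.
    """
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env)
    return int(out)


# Same fixed seed: both child processes compute the same hash.
fixed_a = str_hash_in_child("1")
fixed_b = str_hash_in_child("1")
```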

Importing the ``random`` module creates an instance of
``random.Random``: ``random._inst``. On Python 3.5, the
``random.Random`` constructor reads 2500 bytes from ``os.urandom()`` to
seed a Mersenne Twister RNG (random number generator).
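
As a rough illustration of that seeding step (a sketch that
approximates the behaviour described above, not CPython's actual
code), the following creates a Mersenne Twister instance from 2500
bytes of ``os.urandom()``. The helper name ``new_seeded_random()`` and
the ``SEED_BYTES`` constant are hypothetical.

```python
import os
import random

# 2500 bytes = 20,000 bits, slightly more than the 19,937 bits of
# Mersenne Twister state.
SEED_BYTES = 2500


def new_seeded_random():
    """Create a random.Random instance seeded from os.urandom()."""
    seed = int.from_bytes(os.urandom(SEED_BYTES), "big")
    return random.Random(seed)


rng = new_seeded_random()
sample = rng.random()
```

This ``os.urandom()`` call is what blocks at ``import random`` when it
is backed by a blocking ``getrandom()``.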

Other platforms may be affected by this bug, but in practice, only Linux
systems use Python scripts to initialize the system.


Use Cases
=========

The following use cases help to choose the right compromise between
security and practicability.


Use Case 1: init script
-----------------------

Use a Python 3 script to initialize the system, like systemd-cron. If
the script blocks, the system initialization is stuck too.

Issue #26839 is a good example of this use case.


Use Case 2: web server
----------------------

Run a Python 3 web server serving web pages using HTTP and HTTPS
protocols. The server is started as soon as possible.

The first targets of the hash DoS attack were web servers: it's
important that the hash secret cannot be easily guessed by an attacker.

If serving a web page needs a secret to create a cookie, create an
encryption key, etc., the secret must be created with good entropy:
again, it must be hard to guess the secret.

A web server requires security. If a choice must be made between
security and running the server with weak entropy, security is more
important. If there is no good entropy, the server must block or fail
with an error.

The question is whether it makes sense to start a web server on a host
before system urandom is initialized.

Issues #25420 and #26839 are restricted to the Python startup; they are
not about generating a secret before the system urandom is initialized.


Fix system urandom
==================

Load entropy from disk at boot
-------------------------------

Collecting entropy can take several minutes. To accelerate the system
initialization, operating systems store entropy on disk at shutdown, and
then reload entropy from disk at boot.

If a system collects enough entropy at least once, the system urandom
will be initialized quickly, as soon as the entropy is reloaded from
disk.


Virtual machines
----------------

Virtual machines don't have direct access to the hardware and so have
fewer sources of entropy than bare metal. A solution is to add a
`virtio-rng device
<https://fedoraproject.org/wiki/Features/Virtio_RNG>`_ to pass entropy
from the host to the virtual machine.


Embedded devices
----------------

A solution for embedded devices is to plug in a hardware RNG.

For example, the Raspberry Pi has a hardware RNG but it's not used by
default. See: `Hardware RNG on Raspberry Pi
<http://fios.sector16.net/hardware-rng-on-raspberry-pi/>`_.



Denial-of-service when reading random
=====================================

The ``/dev/random`` device should only be used for very specific use
cases. Reading from ``/dev/random`` on Linux is likely to block. Users
don't like it when an application blocks longer than 5 seconds to
generate a secret. Blocking is only expected in specific cases, like
explicitly generating an encryption key.

When the system has no available entropy, choosing between blocking
until entropy is available or falling back on lower quality entropy is a
matter of compromise between security and practicability. The choice
depends on the use case.

On Linux, ``/dev/urandom`` is secure and should be used instead of
``/dev/random``:

* `Myths about /dev/urandom <http://www.2uo.de/myths-about-urandom/>`_
  by Thomas Hühn: "Fact: /dev/urandom is the preferred source of
  cryptographic randomness on UNIX-like systems"



Rationale
=========

On Linux, reading ``/dev/urandom`` can return "weak" entropy before
urandom is fully initialized, that is, before the kernel has collected
128 bits of entropy. Linux 3.17 adds a new ``getrandom()`` syscall which
makes it possible to block until urandom is initialized.

On Python 3.5.2, ``os.urandom()`` uses ``getrandom(GRND_NONBLOCK)``, but
falls back on reading the non-blocking ``/dev/urandom`` if
``getrandom(GRND_NONBLOCK)`` fails with ``EAGAIN``.

Security experts promote ``os.urandom()`` to generate cryptographic
keys. By the way, ``os.urandom()`` is preferred over
``ssl.RAND_bytes()`` for different reasons.

This PEP proposes to modify ``os.urandom()`` to use ``getrandom()`` in
blocking mode so that it does not return weak entropy, while also
ensuring that Python will not block at startup.


Changes
=======

All changes described in this section are specific to the Linux
platform.

* Initialize hash secret from non-blocking system urandom
* Initialize ``random._inst`` with non-blocking system urandom
* Modify os.urandom() to block (until system urandom is initialized)

A new private C function ``_PyOS_URandom_Nonblocking()`` is added: it
tries to call ``getrandom(GRND_NONBLOCK)``, but falls back on reading
``/dev/urandom`` if the call fails with ``EAGAIN``.
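
The fallback logic can be sketched in Python as follows. This is only
an illustration of the idea, not the proposed C implementation: the
function name ``urandom_nonblocking()`` is hypothetical, the first
branch relies on ``os.getrandom()`` and ``os.GRND_NONBLOCK`` (available
on Linux since Python 3.6), and the last branch is an assumption added
so that the sketch also runs on platforms without ``/dev/urandom``.

```python
import os


def urandom_nonblocking(size):
    """Return `size` random bytes without waiting for the entropy pool."""
    if hasattr(os, "getrandom"):
        try:
            data = os.getrandom(size, os.GRND_NONBLOCK)
            if len(data) == size:
                return data
            # Short read: fall through to the device below.
        except BlockingIOError:
            # EAGAIN: the kernel has not initialized urandom yet.
            pass
    if os.path.exists("/dev/urandom"):
        # Reading the device never blocks on Linux, but the bytes may
        # be "weak" while urandom is not initialized.
        with open("/dev/urandom", "rb") as dev:
            return dev.read(size)
    # Assumption for other platforms: use whatever os.urandom() provides.
    return os.urandom(size)


startup_bytes = urandom_nonblocking(16)
```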

``_PyRandom_Init()`` is modified to call
``_PyOS_URandom_Nonblocking()``.  Moreover, a new ``random_inst_seed``
field is added to the ``_Py_HashSecret_t`` structure.

``random._inst`` (an instance of ``random.Random``) is initialized with
the new ``random_inst_seed`` secret. A ("fuse") flag is used to ensure
that this secret is only used once.

If a second instance of random.Random is created, blocking
``os.urandom()`` is used.
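
A minimal sketch of the "fuse" idea, assuming a one-shot seed object.
The ``StartupSeed`` class, the ``new_random()`` helper and the 32-byte
seed size are hypothetical stand-ins for the ``random_inst_seed`` field
and the C code that would consume it.

```python
import os
import random


class StartupSeed:
    """One-shot seed: stand-in for the proposed random_inst_seed field."""

    def __init__(self, seed_bytes):
        self._seed = seed_bytes
        self._used = False  # the "fuse"

    def take(self):
        """Return the seed the first time, None afterwards."""
        if self._used:
            return None
        self._used = True
        seed, self._seed = self._seed, None
        return seed


def new_random(startup_seed):
    """Build a random.Random, using the startup seed at most once."""
    seed = startup_seed.take()
    if seed is None:
        # Fuse already blown: use (possibly blocking) os.urandom().
        seed = os.urandom(2500)
    return random.Random(int.from_bytes(seed, "big"))


startup = StartupSeed(os.urandom(32))  # 32 bytes is an arbitrary size
first = new_random(startup)    # consumes the startup seed
second = new_random(startup)   # falls back on os.urandom()
```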

``os.urandom()`` (C function ``_PyOS_URandom()``) is modified to always
call ``getrandom(0)`` (blocking mode).



Alternatives
============

Never use blocking urandom in the random module
-----------------------------------------------

The random module can use ``random_inst_seed`` as a seed, but add other
sources of entropy like the process identifier (``os.getpid()``), the
current time (``time.time()``), memory addresses, etc.

Reading 2500 bytes from ``os.urandom()`` to initialize the Mersenne
Twister RNG in ``random.Random`` is a deliberate choice to get access to
the full range of the RNG. This PEP is a compromise between "security"
and "feature". Python should not block at startup before the OS has
collected enough entropy. But in the regular use case (system urandom
initialized), the random module should continue to use its current code
to initialize the seed.

Python 3.5.0 was blocked on ``import random``, not on building a second
instance of ``random.Random``.


Leave os.urandom() unchanged, add os.getrandom()
------------------------------------------------

os.urandom() remains unchanged: never block, but it can return weak
entropy if system urandom is not initialized yet.

A new ``os.getrandom()`` function is added: a thin wrapper around the
``getrandom()`` syscall.

Expected usage to write portable code::

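    import os

    def my_random(n):
        # Prefer the blocking getrandom() syscall where Python exposes it.
        if hasattr(os, 'getrandom'):
            return os.getrandom(n, 0)
        # Other platforms: fall back on os.urandom().
        return os.urandom(n)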

The problem with this change is that it expects users to understand
security well and to know each platform well. Python has a tradition of
hiding "implementation details". For example, ``os.urandom()`` is not a
thin wrapper around the ``/dev/urandom`` device: it uses
``CryptGenRandom()`` on Windows, it uses ``getentropy()`` on OpenBSD, it
tries ``getrandom()`` on Linux and Solaris or falls back on reading
``/dev/urandom``. Python already uses the best available system RNG
depending on the platform.

This PEP does not change the API, which has not changed since the
creation of Python:

* ``os.urandom()``, ``random.SystemRandom`` and ``secrets`` for security
* ``random`` module (except ``random.SystemRandom``) for all other usages


Raise BlockingIOError in os.urandom()
-------------------------------------

This idea was proposed as a compromise to let developers decide
themselves how to handle the case:

* catch the exception and use another, weaker entropy source: read
  ``/dev/urandom`` on Linux, the Python ``random`` module (which is not
  secure at all), time, process identifier, etc.
* don't catch the error: the whole program fails with this fatal
  exception
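
The two options can be sketched as follows. This assumes the
hypothetical behaviour of this alternative, where ``os.urandom()``
raises ``BlockingIOError`` while system urandom is not initialized.
The names ``session_secret()`` and ``not_ready()`` are invented for the
illustration, and the all-zero fallback is a placeholder that is
deliberately not secure.

```python
import os


def session_secret(size, urandom=os.urandom, weak_fallback=None):
    """Return `size` secret bytes, optionally accepting weak entropy."""
    try:
        return urandom(size)
    except BlockingIOError:
        if weak_fallback is None:
            raise  # second option: let the program fail
        return weak_fallback(size)  # first option: weaker entropy source


def not_ready(size):
    """Simulate os.urandom() on a host whose urandom is uninitialized."""
    raise BlockingIOError("system urandom is not initialized")


# Placeholder fallback returning zero bytes: NOT secure, illustration only.
weak = session_secret(8, urandom=not_ready, weak_fallback=lambda n: bytes(n))
```

Every caller has to make this choice explicitly, which is the extra
complexity discussed below.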

First of all, no user has complained yet that ``os.urandom()`` blocks.
This point is currently theoretical. The Python issues #25420 and #26839
were restricted to the Python startup: users complained that Python was
blocked at startup.

Even if reading /dev/urandom blocks on OpenBSD, FreeBSD, Mac OS X, etc.
until urandom is initialized, no user has complained yet because Python
is not used in the process initializing the system and /dev/urandom is
quickly initialized.  It looks like only Linux users hit the problem on
virtual machines or embedded devices, and only in some short Python
scripts used to initialize the system. Again, ``os.urandom()`` is not
used in such scripts (at least, not yet).

As with `Leave os.urandom() unchanged, add os.getrandom()`_, the problem
is that it makes the API more complex and so more error-prone.


Add an optional block parameter to os.urandom()
-----------------------------------------------

Add an optional block parameter to os.urandom(). The default value may
be ``True`` (block by default) or ``False`` (non-blocking).

The first technical issue is to implement ``os.urandom(block=False)`` on
all platforms. Linux 3.17 and newer has a well-defined non-blocking
API.
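
A sketch of what such a parameter could look like, under stated
assumptions: the function name ``urandom_block()`` is hypothetical, the
mapping of ``block`` to ``getrandom()`` flags is one possible design,
and raising ``NotImplementedError`` where ``os.getrandom()`` is missing
is an arbitrary choice that only illustrates the portability problem.

```python
import os


def urandom_block(size, block=True):
    """Hypothetical os.urandom(size, block=...) built on os.getrandom()."""
    if not hasattr(os, "getrandom"):
        if not block:
            # No well-defined non-blocking API on this platform.
            raise NotImplementedError("block=False is not supported here")
        return os.urandom(size)
    flags = 0 if block else os.GRND_NONBLOCK
    data = b""
    while len(data) < size:
        # With GRND_NONBLOCK, BlockingIOError propagates to the caller
        # while urandom is not initialized.
        data += os.getrandom(size - len(data), flags)
    return data


blocking_bytes = urandom_block(16)
```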

See the `issue #27250: Add os.urandom_block()
<http://bugs.python.org/issue27250>`_.

As with `Raise BlockingIOError in os.urandom()`_, it doesn't seem worth
it to make the API more complex for a theoretical (or at least very
rare) use case.

As with `Leave os.urandom() unchanged, add os.getrandom()`_, the problem
is that it makes the API more complex and so more error-prone.





Annexes
=======

Operating system random functions
---------------------------------

``os.urandom()`` uses the following functions:

* OpenBSD: `getentropy()
  <http://man.openbsd.org/OpenBSD-current/man2/getentropy.2>`_
  (OpenBSD 5.6)
* Linux: `getrandom()
  <http://man7.org/linux/man-pages/man2/getrandom.2.html>`_ (Linux 3.17)
  -- see also `A system call for random numbers: getrandom()
  <https://lwn.net/Articles/606141/>`_
* Solaris: `getentropy()
  <https://docs.oracle.com/cd/E53394_01/html/E54765/getentropy-2.html#scrolltoc>`_,
  `getrandom()
  <https://docs.oracle.com/cd/E53394_01/html/E54765/getrandom-2.html>`_
  (both need Solaris 11.3)
* Windows: `CryptGenRandom()
  <https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942%28v=vs.85%29.aspx>`_
  (Windows XP)
* UNIX, BSD: /dev/urandom, /dev/random
* OpenBSD: /dev/srandom

On Linux, commands to get the status of ``/dev/random`` (results are
numbers of bits)::

    $ cat /proc/sys/kernel/random/entropy_avail
    2850
    $ cat /proc/sys/kernel/random/poolsize
    4096
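
The same two files can be read from Python. A minimal sketch, assuming
the Linux-specific ``/proc`` paths shown above; the helper name
``read_entropy_status()`` is hypothetical, and it returns ``None`` on
systems where the files do not exist.

```python
from pathlib import Path

RANDOM_SYSCTL = Path("/proc/sys/kernel/random")


def read_entropy_status():
    """Return (entropy_avail, poolsize) as integers, or None when the
    Linux-specific /proc files are not available."""
    try:
        avail = int((RANDOM_SYSCTL / "entropy_avail").read_text())
        size = int((RANDOM_SYSCTL / "poolsize").read_text())
    except (OSError, ValueError):
        return None
    return avail, size


status = read_entropy_status()
```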

Why use os.urandom()?
---------------------

Since ``os.urandom()`` is implemented in the kernel, it doesn't have
some of the issues of user-space RNGs. For example, it is much harder to
get its state. It is usually built on a CSPRNG, so even if its state is
obtained, it is hard to compute previously generated numbers. The kernel
has a good knowledge of entropy sources and regularly feeds the entropy
pool.


Links
=====

* `Cryptographically secure pseudo-random number generator (CSPRNG)
  <https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator>`_


Copyright
=========

This document has been placed in the public domain.