Larry Hastings added the comment:

Everybody: let's drop discussing "hashlib" unless someone says it actually is a 
problem.  I think it was always, as we say in English, a "red herring".


> The secret for SipHash is composed of two 64bit integers. The entire 
> _Py_HashSecret_t struct is 24 bytes. The remaining 8 bytes are used for XML 
> hash randomization of libexpat. Only the manual seed with PYTHONHASHSEED is a 
> 32bit integer which is stretched to 24 bytes with a LCG.

Okay, I have misunderstood the code.  Have I misunderstood the strength of 
SipHash?  Is it regarded as "cryptographically secure"?
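
(For anyone following along, here is a rough sketch of the 24-byte layout Christian 
describes.  It's illustrative only -- the real definition lives in Include/pyhash.h 
and the field names here are mine -- but it shows the two 64-bit SipHash keys plus 
the trailing 8 bytes handed to libexpat:)

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-in for _Py_HashSecret_t, not the literal definition. */
typedef union {
    unsigned char uc[24];          /* raw bytes filled from the OS RNG     */
    struct {
        uint64_t k0;               /* SipHash key, first 64-bit half       */
        uint64_t k1;               /* SipHash key, second 64-bit half      */
    } siphash;
    struct {
        unsigned char padding[16]; /* overlays the two SipHash keys        */
        uint64_t hashsalt;         /* remaining 8 bytes: libexpat XML salt */
    } expat;
} hash_secret_sketch;

int main(void)
{
    printf("%zu bytes\n", sizeof(hash_secret_sketch));  /* prints "24 bytes" */
    return 0;
}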

Making the hash function unpredictable on web servers was the original use case 
for the "hash seed"; I remember a demonstration of an attack where the attacker 
produced pathologically bad hash behavior on a Python-based web server with 
very little data.  So web servers running on cloud instances are exactly the 
sort of use case where we'd want less-predictable hashing.

--

Nevertheless, a 90-second startup time is simply unacceptable.  I am officially 
making a pronouncement as Release Manager: Python 3.5 *must not* take 90 
seconds to start up under *any* circumstances.  I view this as a performance 
regression, and it is and will remain a release blocker for 3.5.2.

Python *must not* require special command-line flags to avoid a 90-second 
startup time.  Python *must not* require a special environment variable to 
avoid a 90-second startup time.  This is no longer open to debate, and I will 
only be overruled by Guido.

--

If I understand the technical issues correctly, here's how I expect it to work. 
 For seeding the hash randomization, and seeding the _inst in the random 
module, we will use getrandom() in a non-blocking way (GRND_NONBLOCK?).  If it 
succeeds, we use those bits.  If it fails because it would have blocked 
(EAGAIN?), we fall back to a less-random source of random bits.  Under no 
circumstances will Python block when seeding the hash randomization function or 
seeding the MT for the random module.
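
Here is a minimal sketch of that policy in C (assuming a libc that exposes 
getrandom() in <sys/random.h>; older systems would go through 
syscall(SYS_getrandom, ...), and the time/PID fallback below is just a stand-in 
for whatever weaker source we actually pick):

#include <errno.h>
#include <stdio.h>
#include <sys/random.h>
#include <time.h>
#include <unistd.h>

static void fill_secret(unsigned char *buf, size_t len)
{
    /* Ask the kernel for entropy without blocking.  For a request this
     * small, getrandom() either fills the whole buffer or fails. */
    ssize_t n = getrandom(buf, len, GRND_NONBLOCK);
    if (n == (ssize_t)len)
        return;                      /* pool initialized: use the good bits */

    if (n < 0 && errno != EAGAIN)
        perror("getrandom");         /* unexpected failure; still don't block */

    /* EAGAIN: the pool isn't initialized yet (early boot).  Fall back to
     * lower-quality bits rather than stalling interpreter startup. */
    unsigned long mix = (unsigned long)time(NULL) ^ (unsigned long)getpid();
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(mix >> ((i % sizeof mix) * 8));
}

int main(void)
{
    unsigned char secret[24];
    fill_secret(secret, sizeof secret);
    for (size_t i = 0; i < sizeof secret; i++)
        printf("%02x", secret[i]);
    putchar('\n');
    return 0;
}

The important property is the last branch: when the pool isn't ready, we degrade 
the quality of the seed, never the startup time.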

This means cloud instances may inadvertently use lower-quality hash 
randomization seeds.  I judge this as obviously better than cloud instances 
taking 90 seconds to start up.  Also, as Christian points out, the people 
running these cloud instances should be managing their entropy pools anyway.  
Additionally, many uses of cloud instances never see the attacker-supplied 
data that these predictable-hash attacks require.

--

As a final note, let me steer you towards this comment in Python/random.c:

/* Issue #25003: Don't use getentropy() on Solaris (available since
 * Solaris 11.3), it is blocking whereas os.urandom() should not block. */

Yes: we already had this discussion for Solaris, nine months ago, on issue 
#25003.  Both Guido and Tim Peters were involved in the discussion.  The 
decision there: use lower-quality random bits to seed the MT when importing the 
random module.  Keeping the slowdown was so obviously wrong it wasn't even 
debated.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26839>
_______________________________________