On 1/13/2012 5:35 PM, Victor Stinner wrote:
- Glenn Linderman proposes to fix the vulnerability by adding a new
"safe" dict type (only accepting string keys). His proof-of-concept
(SafeDict.py) uses a secret of 64 random bits and uses it to compute
the hash of a key.
We could mix Marc's collision counter with SafeDict idea (being able
to use a different secret for each dict): use hash(key, secret)
(simple example: hash(secret+key)) instead of hash(key) in dict (and
set), and change the secret if we have more than N collisions. But it
would slow down all dict lookup (dict creation, get, set, del, ...).
And getting new random data can also be slow.

SafeDict and hash(secret+key) lose the benefit of the cached hash
result. Because the hash result depends on an argument, we cannot cache
the result anymore, and we have to recompute the hash for each lookup
(even if you look up the same key twice or more).

Victor
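
To make the hash(secret+key) point concrete, here is a tiny illustrative sketch (not SafeDict.py itself, and not CPython code; the names are mine):

import binascii, os

_secret = binascii.hexlify(os.urandom(8)).decode('ascii')   # 64 random bits

def salted_hash(key):
    # The result depends on _secret, so the hash value cached inside the
    # str object cannot be reused: every lookup has to concatenate and
    # rehash the key from scratch.
    return hash(_secret + key)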

So integrating SafeDict into dict, so that a dict could be converted automatically, would mean changing the data structures underneath dict. Given that, a hash-caching technique could be created that isn't quite as good as the one in place, but may be less expensive than not caching the hashes at all. It would also take more space: a second dict internally, as well as the secret.

So once the collision counter reaches some threshold (since there would be a functional fallback, it could be much lower than 1000), the secret is obtained and the keys are rehashed using hash(secret+key). When lookups occur, the object id of the key together with the hash of the key is used as the cache index, and hash(secret+key) is stored as the cached value. This would only benefit lookups by the same key object; other key objects with the same value would have their hashes recalculated (at least the first time). Some limit on the number of cached values would probably be appropriate. This would add complexity, of course, in trying to save time.
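
Roughly, and purely as a pure-Python sketch (all names, thresholds and the storage layout below are invented for illustration, not CPython's dictobject.c):

import binascii, os

class ConvertingDict:
    COLLISION_THRESHOLD = 50        # could be much lower than 1000

    def __init__(self):
        self._store = {}            # plain dict until conversion
        self._secret = None
        self._hash_cache = {}       # (id(key), hash(key)) -> hash(secret+key)
        self._collisions = 0        # counting real probe collisions needs C-level access

    def _maybe_convert(self):
        if self._secret is None and self._collisions > self.COLLISION_THRESHOLD:
            self._secret = binascii.hexlify(os.urandom(8)).decode('ascii')
            old, self._store = self._store, {}
            for key, value in old.items():          # rehash every existing key
                self._store[self._salted(key)] = value

    def _salted(self, key):
        if self._secret is None:
            return key
        cache_key = (id(key), hash(key))            # only helps the same key object
        salted = self._hash_cache.get(cache_key)
        if salted is None:
            salted = hash(self._secret + key)       # assumes str keys, as in SafeDict
            if len(self._hash_cache) < 1024:        # some limit on cached values
                self._hash_cache[cache_key] = salted
        return (salted, key)                        # key kept for equality checks

    def __setitem__(self, key, value):
        self._maybe_convert()
        self._store[self._salted(key)] = value

    def __getitem__(self, key):
        return self._store[self._salted(key)]

The hash cache is the second internal dict mentioned above; a real version would also have to deal with id() values being reused after a key object is freed.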

An alternate solution would be to convert a dict to a tree once the number of collisions starts producing poor performance. Converting to a tree would give O(log N) lookups instead of O(1), but that is better than the degenerate O(N) case produced by the excessive collisions an attack creates. This would require new tree code to be included in the core, of course, probably a red-black tree, which stays balanced.
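
As a stand-in for that fallback (not a real red-black tree, and not core code), a sorted key list searched with bisect gives the same O(log N) lookup, though insertion stays O(N) here:

import bisect

class TreeFallbackMap:
    # Comparison-based mapping for keys that defeat hashing; the keys must
    # be mutually orderable.  A balanced tree would make insertion O(log N)
    # as well; the sorted list just keeps this sketch short.
    def __init__(self):
        self._keys = []
        self._values = []

    def __setitem__(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def __getitem__(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)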

In either of these cases the conversion is expensive, because a collision threshold must first be reached to determine the need for it, so the dict could already contain a lot of data. If the conversion were too expensive, the attack could still be effective.

Another solution would be to change the collision handling code, so that colliding keys don't produce O(N) behavior but something better. Each entry that accumulates collisions could be converted into a tree of entries, perhaps. This would require no conversion of "bad dicts" as a whole, and an attack could at worst degrade O(1) performance to O(log N).
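
A toy version of that idea, with a separately chained table whose long buckets are kept sorted and searched by bisection in place of a real per-bucket tree (keys that land in the same bucket have to be mutually orderable; everything here is invented for illustration):

import bisect

TREEIFY_AT = 8      # invented threshold

class ChainedTable:
    def __init__(self, nbuckets=64):
        # each bucket is a list of (key, value) pairs, kept sorted by key
        self._buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        i = bisect.bisect_left(bucket, (key,))
        if i < len(bucket) and bucket[i][0] == key:
            bucket[i] = (key, value)
        else:
            bucket.insert(i, (key, value))

    def __getitem__(self, key):
        bucket = self._bucket(key)
        if len(bucket) < TREEIFY_AT:                # short chain: linear scan
            for k, v in bucket:
                if k == key:
                    return v
        else:                                       # long (attacked) chain: O(log N)
            i = bisect.bisect_left(bucket, (key,))
            if i < len(bucket) and bucket[i][0] == key:
                return bucket[i][1]
        raise KeyError(key)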

Clearly these ideas are more complex than adding randomization, but adding randomization doesn't seem to produce immunity from attack when data about the randomness is leaked.