Re: [Python-Dev] Hash collision security issue (now public)

Victor Stinner Fri, 30 Dec 2011 18:24:24 -0800

On 29/12/2011 02:28, Michael Foord wrote:

A paper (well, presentation) has been published highlighting security problems 
with the hashing algorithm (exploiting collisions) in many programming 
languages, Python included:

http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

This PDF doesn't explain exactly what the problem is or how it can be solved. Let's try to summarize this "vulnerability".

The creation of a Python dictionary with n keys has a complexity of O(n) in most cases, but O(n^2) in the *worst* case. The attack tries to trigger that worst case. It requires computing a set of N keys that all have the same hash value (hash(key1) == hash(key2) == ... == hash(keyN)). The attacker only has to compute these keys once. It looks like it is now cheap enough in practice to compute this dataset for Python (and other languages).
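To make the worst case concrete, here is a minimal sketch. It does not compute real colliding strings; it assumes a hypothetical key class, CollidingKey, whose __hash__ always returns the same value, and counts how many __eq__ calls the dict needs while it is being built.

```python
class CollidingKey:
    """Hypothetical key type: every instance has the same hash value."""

    comparisons = 0  # class-wide counter of __eq__ calls

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # constant on purpose, so all keys collide

    def __eq__(self, other):
        CollidingKey.comparisons += 1
        return isinstance(other, CollidingKey) and self.value == other.value


def count_comparisons(n):
    """Build a dict of n colliding keys and return the number of __eq__ calls."""
    CollidingKey.comparisons = 0
    table = {CollidingKey(i): None for i in range(n)}
    assert len(table) == n  # all keys are distinct, only their hashes collide
    return CollidingKey.comparisons
```

Each new key has to be compared against the colliding keys already stored, so doubling n should roughly quadruple the value returned by count_comparisons(n). With ordinary keys such as range(n), the same loop needs almost no comparisons.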

A countermeasure would be to check that we don't have more than X keys with the same hash value. But in practice, we don't know in advance how the data will be processed, and there are too many input vectors in various formats.
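For illustration only, such a check on one batch of input keys could look like the sketch below. The names check_collisions, max_collisions, hash_func and TooManyCollisions are made up for this example; nothing like this exists in the standard library, and it only covers inputs that the application remembers to pass through it.

```python
from collections import Counter


class TooManyCollisions(ValueError):
    """Raised when too many keys in one batch share a hash value."""


def check_collisions(keys, max_collisions, hash_func=hash):
    """Return the largest number of keys sharing one hash value.

    Raises TooManyCollisions if that number exceeds max_collisions (the "X"
    in the text). hash_func is a parameter only so the check can be
    exercised with a deliberately bad hash function.
    """
    counts = Counter(hash_func(key) for key in keys)
    worst = max(counts.values(), default=0)
    if worst > max_collisions:
        raise TooManyCollisions(
            "%d keys share one hash value (limit %d)" % (worst, max_collisions)
        )
    return worst
```

A caller would run check_collisions(form_keys, 100) before building a dict from untrusted keys, where form_keys is whatever the parser produced. The weakness is the one described above: every parser and every data format would need its own call.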

If we want to fix something, it should be done in the implementation of the dict type or in the hash algorithm. We can implement dict differently to avoid this issue, using a binary tree for example. Because dict is a fundamental type in Python, I don't think that we can change its implementation (without breaking backward compatibility and so applications in production). A possibility would be to add a *new* type, but all libraries and applications would need to be changed to fix the vulnerability.

The last choice is to change the hash algorithm. The *idea* is the same as adding salt to a hashed password (in practice it will be a little bit different): if a pseudo-random salt is added, the attacker cannot prepare a single dataset; he/she will have to generate a new dataset for each possible salt value. If the salt is big enough (size in bits), the attacker will need too much CPU to generate the dataset (compute N keys with the same hash value). Basically, it slows down the attack by a factor of 2^(size of the salt in bits).
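A rough sketch of the salting idea follows. It uses a toy FNV-1a-style loop, not Python's real string hash, and the names salted_hash and PROCESS_SALT are invented for the example. The only point is that the hash value of a key now depends on a secret chosen when the process starts.

```python
import secrets

MASK64 = (1 << 64) - 1
FNV_PRIME = 0x100000001B3  # the 64-bit FNV prime


def salted_hash(key, salt):
    """Toy FNV-1a-style hash of a str, seeded with salt instead of a fixed offset."""
    h = salt & MASK64
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


# Assumption for this sketch: one 64-bit salt is drawn once per process.
PROCESS_SALT = secrets.randbits(64)


def process_hash(key):
    """Hash a key with the salt of the current process."""
    return salted_hash(key, PROCESS_SALT)
```

For a fixed key, two different 64-bit salts always give different values here, because every step of the loop is invertible. That alone does not prove that collisions are hard to find for an unknown salt; a real fix would need a hash function analysed for exactly that property.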

--

Another possibility would be to replace our fast hash function with a stronger hash function like MD5 or SHA1 (so that creating the dataset would be too slow, and therefore too expensive, in practice), but cryptographic hash functions are much slower (and so would slow down Python too much).
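The speed difference can be measured with a small sketch like the one below. The helper name time_hashes and its default sizes are arbitrary choices, and the numbers depend heavily on the machine and the Python version, so none are quoted here.

```python
import hashlib
import timeit


def time_hashes(size=1024, repeats=2000):
    """Return (builtin_seconds, sha1_seconds) for hashing size bytes repeats times."""
    data = bytes(range(256)) * (size // 256 + 1)  # always longer than size

    def fresh():
        # A strict slice yields a new bytes object each time, so hash()
        # cannot simply return a value cached on the object.
        return data[:size]

    builtin = timeit.timeit(lambda: hash(fresh()), number=repeats)
    sha1 = timeit.timeit(lambda: hashlib.sha1(fresh()).digest(), number=repeats)
    return builtin, sha1


if __name__ == "__main__":
    builtin, sha1 = time_hashes()
    print("built-in hash: %.6f s, SHA1: %.6f s" % (builtin, sha1))
```

Both timings include the cost of the slice, so the ratio understates the gap between the two hash functions themselves.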

Limiting the size of the POST data doesn't solve the problem because there are many other input vectors and data formats. It may block the simplest attacks because the attacker cannot inject enough keys to slow down your CPU.


Victor