On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner < victor.stin...@haypocalc.com> wrote:
> Hi, > > I'm working on the hash collision issue since 2 or 3 weeks. I > evaluated all solutions and I think that I have now a good knowledge > of the problem and how it should be solved. The major issue is to have > a minor or no impact on applications (don't break backward > compatibility). I saw three major solutions: > > - use a randomized hash > - use two hashes, a randomized hash and the actual hash kept for > backward compatibility > - count collisions on dictionary lookup > > Using a randomized hash does break a lot of tests (e.g. tests relying > on the representation of a dictionary). The patch is huge, too big to > backport it directly on stable versions. Using a randomized hash may > also break (indirectly) real applications because the application > output is also somehow "randomized". For example, in the Django test > suite, the HTML output is different at each run. Web browsers may > render the web page differently, or crash, or ... I don't think that > Django would like to sort attributes of each HTML tag, just because we > wanted to fix a vulnerability. > > Randomized hash has also a major issue: if the attacker is able to > compute the secret, (s)he can easily compute collisions and exploit > the hash collision vulnerability again. I don't know exactly how > complex it is to compute the secret, but our hash function is weak (it > is far from being cryptographic, it is really simple to run it > backward). If someone writes a fast function to compute the secret, we > will go back to the same point. > > IMO using two hashes has the same disavantages of the randomized hash > solution, whereas it is more complex to implement. > > The last solution is very simple: count collision and raise an > exception if it hits a limit. The path is something like 10 lines > whereas the randomized hash is more close to 500 lines, add a new > file, change Visual Studio project file, etc. First I thaught that it > would break more applications than the randomized hash, but I tried on > Django: the test suite fails with a limit of 20 collisions, but not > with a limit of 50 collisions, whereas the patch uses a limit of 1000 > collisions. According to my basic tests, a limit of 35 collisions > requires a dictionary with more than 10,000,000 integer keys to raise > an error. I am not talking about the attack, but valid data. > > More details about my tests on the Django test suite: > http://bugs.python.org/issue13703#msg151620 > > -- > > I propose to solve the hash collision vulnerability by counting > collisions because it does fix the vulnerability with a minor or no > impact on applications or backward compatibility. I don't see why we > should use a different fix for Python 3.3. If counting collisons > solves the issue for stable versions, it is also enough for Python > 3.3. We now know all issues of the randomized hash solution, and I > think that there are more drawbacks than advantages. IMO the > randomized hash is overkill to fix the hash collision issue. > +1 > I just have some requests on Marc Andre Lemburg patch: > > - the limit should be configurable: a new function in the sys module > should be enough. It may be private (or replaced by an environment > variable?) in stable versions > - the set type should also be patched (I didn't check if it is > vulnerable or not using the patch) > - the patch has no test! (a class with a fixed hash should be enough > to write a test) > - the limit must be documented somwhere > - the exception type should be different than KeyError > > Victor > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (python.org/~guido)
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com