On 14/01/12 12:58, Gregory P. Smith wrote:
> I do like *randomly seeding the hash*. *+1*. This is easy. It can easily be
> back ported to any Python version.
> It is perfectly okay to break existing users who had anything depending on
> ordering of internal hash tables. Their code was already broken.
For the record:
steve@runes:~$ python -c "print(hash('spam ham'))"
-376510515
steve@runes:~$ jython -c "print(hash('spam ham'))"
2054637885
So it is already the case that Python code that assumes stable hashing is
broken.
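
As a rough illustration of what "not assuming stable hashing" looks like in practice, here is a minimal sketch. The function name canonical_query and the sample parameters are hypothetical, made up for this example. The ordering of the output comes from sorting the keys, so it does not depend on the hash values any particular interpreter produces.

```python
def canonical_query(params):
    """Join key=value pairs in sorted key order (hypothetical helper).

    Sorting the keys means the result does not depend on how the
    interpreter happens to order its internal hash table.
    """
    return "&".join("%s=%s" % (key, params[key]) for key in sorted(params))


if __name__ == "__main__":
    print(canonical_query({"spam": 1, "ham": 2}))
```

Code that instead iterates over the dict or set directly and expects a particular order is relying on exactly the accident of implementation discussed here.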
For what it's worth, I'm not convinced that we should be overly concerned about
"poor saps" (Guido's words) who rely on accidents of implementation regarding
hash. We shouldn't break their code unless we have a good reason, but this
strikes me as a good reason. The documentation for hash certainly makes no
promise about stability, and relying on it strikes me as about as sensible as
relying on the stability of error messages.
I'm also not convinced that the option to raise an exception after 1000
collisions actually solves the problem. That relies on the application being
re-written to catch the exception and recover from it (how?). Otherwise, all
it does is change the attack vector from "cause an indefinite number of hash
collisions" to "cause 999 hash collisions followed by crashing the application
with an exception", which doesn't strike me as much of an improvement.
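
To make the objection concrete, here is a toy sketch, not the proposed patch. LimitedTable, TooManyCollisions, Colliding and handle_request are all hypothetical names invented for this illustration, and the table is a simple chained hash table rather than CPython's dict. It shows that a handler written without knowledge of the new exception simply lets it propagate.

```python
class TooManyCollisions(Exception):
    """Hypothetical error raised when one bucket grows past the limit."""


class LimitedTable:
    """Toy chained hash table that refuses to grow a bucket past `limit`."""

    def __init__(self, limit=1000, buckets=8):
        self.limit = limit
        self.buckets = [[] for _ in range(buckets)]

    def insert(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        if len(bucket) >= self.limit:
            raise TooManyCollisions(repr(key))
        bucket.append((key, value))


class Colliding:
    """Stand-in for attacker-chosen keys: every instance hashes to 0."""

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.n == other.n


def handle_request(keys, limit=1000):
    """A handler that never anticipated the exception, so it propagates."""
    table = LimitedTable(limit=limit)
    for key in keys:
        table.insert(key, None)
    return table
```

In this sketch, a request carrying `limit` colliding keys is still processed in full (slowly), and one more key aborts the request with an exception the application was never written to handle.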
+1 on random seeding. Default to on in 3.3+ and default to off in older
versions, which allows people to avoid breaking their code until they're ready
for it to be broken.
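
For readers who want to see the effect of seeding, here is a small sketch. It assumes a CPython release (3.3 or later) that honours the PYTHONHASHSEED environment variable, which postdates this message. The helper name hash_in_fresh_interpreter is hypothetical. It asks a fresh interpreter for the hash of a string under a given seed.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter run with the
    given PYTHONHASHSEED (assumes CPython 3.3 or later)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    for seed in (1, 2):
        print(seed, hash_in_fresh_interpreter("spam ham", seed))
```

The same seed reproduces the same value, while different seeds are expected to give different values, which is the property that makes precomputed collision sets much less useful to an attacker.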
--
Steven