On Tue, 17 Jan 2012 21:59:28 +0100 "Martin v. Löwis" <mar...@v.loewis.de> wrote: > I'd like to propose a different approach to seeding the string hashes: > only do so for dictionaries involving only strings, and leave the > tp_hash slot of strings unchanged.
I think Python 3 would be better with a clean fix (all hashes randomized). Now for Python 2... The problem with this idea is that it only addresses str dicts. Unicode dicts, and any other dicts, are left vulnerable. Unicode dicts are quite likely in Web frameworks/applications and other places which have well-thought text semantics. That said, here's a suggestion to squeeze those bits: > 1. add an additional field to all string objects, to cache the second > hash value. > a) variant: in 3.3, drop the extra field, and declare that hashes > may change across runs In 2.7, a string object has the following fields: long ob_shash; int ob_sstate; Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits could cache a "hash perturbation" computed from the string and the random bits: - hash() would use ob_shash - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3)) This way, you cache almost all computations, adding only a computation and a couple logical ops when looking up a string in a dict. Regards Antoine. _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com