I'd like to propose a different approach to seeding the string hashes: only do so for dictionaries involving only strings, and leave the tp_hash slot of strings unchanged.
Each string would get two hashes: the "public" hash, which is constant across runs and bugfix releases, and the dict-hash, which is only used by the dictionary implementation, and only if all keys to the dict are strings. In order to allow caching of the hash, all dicts should use the same hash (if caching wasn't necessary, each dict could use its own seed). There are several variants of that approach wrt. caching of the hash 1. add an additional field to all string objects, to cache the second hash value. a) variant: in 3.3, drop the extra field, and declare that hashes may change across runs 2. only cache the dict-hash, recomputing the public hash each time 3. on a per-string choice, cache either the dict-hash or the public hash, depending on which one gets computed first, and recompute the other one every time it's needed. As you can see, 1 vs. 2/3 is a classical time-space-tradeoff. What do you think? Regards, Martin _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com