I'd like Tim Peters's input on this before we change it. I seem to recall that there's an aspect of non-randomness to the existing hash function that's important when you hash many closely related strings, e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been so long that I may misremember this, and perhaps it was related to the dictionary implementation.
In any case we need to see the code as a patch, of course. On 9/7/07, Gregory P. Smith <[EMAIL PROTECTED]> wrote: > On 9/4/07, Thomas Hunger <[EMAIL PROTECTED]> wrote: > > > > Hello, > > > > I don't know much about python internals, so the following might be > > bogus: > > > > I replaced unicode_hash and string_hash with the hash function from > > here: http://www.azillionmonkeys.com/qed/hash.html. > > > > Then I ran the following micro-benchmark : > > > > $ time ./python bench.py > > > > where bech.py is: > > > > f = dict((line, nr) for nr, line > > in enumerate(open('/usr/share/dict/words', > > > encoding='latin1').readlines())) > > > > Python3k original hash: real 0m2.210s > > new hash: real 0m1.842s > > > > So maybe this is an interesting hash function? > > > > Tom > > Sounds like a great idea to me. Can you submit it as a patch? > > We should run some more realistic perf tests and profiles but I imagine the > impact will only be good. > > -gps > > _______________________________________________ > Python-3000 mailing list > Python-3000@python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) _______________________________________________ Python-3000 mailing list Python-3000@python.org http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com