On 10 Sep 2007, at 01:58, Jim Jewett wrote:
> To spell this out a bit more:
> ...
> When adding four entries to an 8-slot table, a truly random hash would
> have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4 of the
> time. As expected, the proposed hash does have a collision for those
> fo
[Larry Hastings]
> I see--it's avoiding the Birthday Paradox.
It /tends/ to, yes. This wasn't a design goal of the string hash,
it's just a property observed after it was adopted, and appreciated
much later ;-)
It's much clearer for Python's small-int hash, where hash(i) == i for
i != -1. That
On 9/8/07, Tim Peters <[EMAIL PROTECTED]> wrote:
> in comments in dictobject.c. As it notes there, hashing the strings
> "namea", "nameb", "namec", and "named" currently produces (on a
> sizeof(long) == 4 box):
> -1658398457
> -1658398460
> -1658398459
> -1658398462
> That the hash codes are ve
Thomas Wouters wrote:
Because (relatively) small dicts with (broadly speaking) similar keys
are quite common in Python. Module and class and instance __dict__s,
for instance ;) As Tim mentioned, the dict implementation only looks
at part of the actual hash value (depending on the size of the di
On 9/9/07, Larry Hastings <[EMAIL PROTECTED]> wrote:
> One goal of Jenkin's hashes is uniform distribution, so these functions
> presumably lack the serendipitous "similar inputs hash to similar values"
> behavior of Python's current hash function. But why is that a feature?
> (Not that I doubt T
If the Python community is just noticing the Hsieh hash, that implies
that the Bob Jenkins hashes are probably unknown as well. Behold:
http://burtleburtle.net/bob/hash/doobs.html
To save you a little head-scratching, the functions you want to play
with are hashlittle()/hashlittle2() in "l
[Guido]
> I'd like Tim Peters's input on this before we change it. I seem to
> recall that there's an aspect of non-randomness to the existing hash
> function that's important when you hash many closely related strings,
> e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
> so l
I'd like Tim Peters's input on this before we change it. I seem to
recall that there's an aspect of non-randomness to the existing hash
function that's important when you hash many closely related strings,
e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
so long that I may mis
On 9/4/07, Thomas Hunger <[EMAIL PROTECTED]> wrote:
>
>
> Hello,
>
> I don't know much about python internals, so the following might be
> bogus:
>
> I replaced unicode_hash and string_hash with the hash function from
> here: http://www.azillionmonkeys.com/qed/hash.html.
>
> Then I ran the followin