On 10 Sep 2007, at 01:58, Jim Jewett wrote:
> To spell this out a bit more:
> ...
> When adding four entries to an 8-slot table, a truly random hash would
> have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4 of the
> time. As expected, the proposed hash does have a collision for those
> fo
[Larry Hastings]
> I see--it's avoiding the Birthday Paradox.
It /tends/ to, yes. This wasn't a design goal of the string hash,
it's just a property observed after it was adopted, and appreciated
much later ;-)
It's much clearer for Python's small-int hash, where hash(i) == i for
i != -1. That
On 9/8/07, Tim Peters <[EMAIL PROTECTED]> wrote:
> in comments in dictobject.c. As it notes there, hashing the strings
> "namea", "nameb", "namec", and "named" currently produces (on a
> sizeof(long) == 4 box):
> -1658398457
> -1658398460
> -1658398459
> -1658398462
> That the hash codes are ve
Thomas Wouters wrote:
Because (relatively) small dicts with (broadly speaking) similar keys
are quite common in Python. Module and class and instance __dict__s,
for instance ;) As Tim mentioned, the dict implementation only looks
at part of the actual hash value (depending on the size of the di
On 9/9/07, Larry Hastings <[EMAIL PROTECTED]> wrote:
> One goal of Jenkin's hashes is uniform distribution, so these functions
> presumably lack the serendipitous "similar inputs hash to similar values"
> behavior of Python's current hash function. But why is that a feature?
> (Not that I doubt T
If the Python community is just noticing the Hsieh hash, that implies
that the Bob Jenkins hashes are probably unknown as well. Behold:
http://burtleburtle.net/bob/hash/doobs.html
To save you a little head-scratching, the functions you want to play
with are hashlittle()/hashlittle2() in "l
[Guido]
> I'd like Tim Peters's input on this before we change it. I seem to
> recall that there's an aspect of non-randomness to the existing hash
> function that's important when you hash many closely related strings,
> e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
> so l
I'd like Tim Peters's input on this before we change it. I seem to
recall that there's an aspect of non-randomness to the existing hash
function that's important when you hash many closely related strings,
e.g. "0001", "0002", "0003", etc., into a dictionary. Though it's been
so long that I may mis
On 9/4/07, Thomas Hunger <[EMAIL PROTECTED]> wrote:
>
>
> Hello,
>
> I don't know much about python internals, so the following might be
> bogus:
>
> I replaced unicode_hash and string_hash with the hash function from
> here: http://www.azillionmonkeys.com/qed/hash.html.
>
> Then I ran the followin
On 9/3/07, Nicholas Bastin <[EMAIL PROTECTED]> wrote:
> NOTE: This data is time sampling, not call graph. Added time could
> come from either more calls, or longer calls.
>
> +312.9% PyDict_GetItem
I've finally managed to get call graph data and it's fairly
interesting for this call. I try to f
> I've been doing some profiling of 3.0 vs. 2.6 release builds on
> Windows XP for the purpose of hopefully closing the performance
> gap. This data is very preliminary, but I thought I'd throw it out
> here in case someone else also wanted to look into this. Also,
> possibly useful for comparing
On 9/3/07, Nicholas Bastin <[EMAIL PROTECTED]> wrote:
> I've been doing some profiling of 3.0 vs. 2.6 release builds on
> Windows XP for the purpose of hopefully closing the performance gap.
> This data is very preliminary, but I thought I'd throw it out here in
> case someone else also wanted to l
Interesting! Thanks for doing this. We'll need a lot of this over the
coming year.
I read in this that the increased cost is largely due to using unicode
strings for all variable and attribute names. So the next step might
be to optimize the snot out of unicode hashing and introduce the
unicode eq
I've been doing some profiling of 3.0 vs. 2.6 release builds on
Windows XP for the purpose of hopefully closing the performance gap.
This data is very preliminary, but I thought I'd throw it out here in
case someone else also wanted to look into this. Also, possibly
useful for comparing against pr
14 matches
Mail list logo