Tim Peters <t...@python.org> added the comment:

Following up, at least under Visual Studio for x86 "it's free" to change the 
code to add in `perturb` shifted left.  The C source:

    perturb >>= PERTURB_SHIFT;
    i = (i*5 + (perturb << 1) + 1) & mask;

compiles to this, where I added comments relating the instructions to the 
source code:

    lea         rax,[r14+r14*4]  # rax = i*5
    shr         r12,5            # r12(perturb) >>= 5
    lea         r14,[r12*2+1]    # r14(i) = (perturb << 1) + 1
    add         r14,rax          # r14(i) += i*5
    and         r14,rbx          # r14(i) &= mask

Good job!  There are actually fewer machine instructions than C operators.

As noted, this change would eliminate the possibility of fixed points from the 
start, so would eliminate the symptoms in this specific bug report.

Which I don't much care about ;-)  But I _do_ care about that, in general, the 
early stages of collision resolution for small tables can "frequently" visit 
slots more than once.  This is true even if the hashes aren't identical.  And 
it would remain true even with the change (cycles can still exist - that there 
are no fixed points just means there are no more cycles of the minimum possible 
length 1).

Improving that is worth some effort, but not much.  In general, hash codes 
aren't identical, and in small tables redundant slot visits just cost the time 
to read the hash code from cache and re-deduce that it's not equal to what 
we're looking for (__eq__ isn't typically called).

In larger tables redundant slot visits in the early stages are too rare to care 

In small tables, it's worth something to guarantee that the first probe after a 
collision can't hit exactly the same slot again (in a random model, there's a 
12.5% chance of that happening now in the smallest tables - 12.5% is 1 in 8 - 
and this bug report contrived cases where the chance is 100% across a dozen 

Anyone up a for a world of tedious benchmarking? ;-)


Python tracker <rep...@bugs.python.org>
Python-bugs-list mailing list

Reply via email to