[Tim]
>... I haven't yet thought of a cheap way to compute an
> `inc` for double-hashing that isn't vulnerable to bad behavior for
> _some_ easily constructed set of int keys. If you forget "cheap",
> it's easy; e.g.,
>
> random.seed(h)
> inc = random.choice(range(1, mask + 1, 2))
Heh.
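For concreteness, the quoted "forget cheap" idea can be fleshed out into a complete probe-sequence generator. This is a sketch, not code from the thread; the name `probes_random_inc` is mine, and it assumes the table size is `mask + 1`, a power of 2:

```python
import random

def probes_random_inc(h, mask):
    """Double-hashing probe sequence for hash h over a table of
    mask + 1 slots (a power of 2), using a pseudo-random odd
    increment seeded from h, per the quoted idea."""
    random.seed(h)
    # Any odd increment is coprime to a power-of-2 table size, so the
    # sequence visits every slot exactly once before repeating.
    inc = random.choice(range(1, mask + 1, 2))
    i = h & mask
    for _ in range(mask + 1):
        yield i
        i = (i + inc) & mask
```

Since the increment is odd, the sequence is a full permutation of the slots; the cost is reseeding the PRNG on every lookup, which is exactly why it fails the "cheap" test.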
[MRAB]
> If the current scheme suffers only for small tables, couldn't you use an
> alternative scheme only for small tables?
Sure. But whether that's desirable partly depends on timing actual C
code. Try it ;-) For maintenance sanity, it's obviously better to
have just one scheme.
On 2017-06-26 19:21, Tim Peters wrote:
Some explanations and cautions. An advantage of sticking with pure
Python instead of C is that spare moments are devoted to investigating
the problem instead of thrashing with micro-optimizations ;-)
Why does the current scheme suffer for small tables?
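For context (my addition, not part of the thread): CPython's current scheme, from the comments in `Objects/dictobject.c`, folds high hash bits into the probe sequence via `perturb`. A pure-Python sketch, with details that vary slightly across versions:

```python
from itertools import islice

PERTURB_SHIFT = 5

def probes_current(h, mask):
    """Sketch of CPython's probe sequence (dictobject.c).  Once
    perturb decays to 0, this becomes the pure recurrence
    i = (5*i + 1) & mask -- the same cycle for every key, which is
    one reason keys that collide in a small table keep colliding."""
    perturb = h
    i = h & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = (5 * i + 1 + perturb) & mask

# The first few probes for hash 12345 in an 8-slot table:
print(list(islice(probes_current(12345, 7), 4)))  # [1, 7, 0, 1]
```

In a small table, `perturb` is exhausted after only a few probes, so the high hash bits stop helping almost immediately.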
Two bits of new info: first, it's possible to get the performance of
`double` without division, at least in this way:
"""
# Double hashing using Fibonacci multiplication for the increment. This
# does about as well as `double`, but doesn't require division.
#
# The multiplicative constant
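The snippet is cut off above, so here is an independent sketch of the same idea rather than the original code: derive the increment by Fibonacci (multiplicative) hashing instead of by division. The `>> 32` shift and the function name are my assumptions; the constant is the odd value near `2**64 / phi` widely used for Fibonacci hashing:

```python
# Odd constant near 2**64 / ((1 + 5**0.5) / 2), the 64-bit
# Fibonacci-hashing multiplier.
M = 0x9E3779B97F4A7C15

def probes_fib_double(h, mask):
    """Double hashing over mask + 1 slots (a power of 2), with the
    increment derived by Fibonacci multiplication -- no division."""
    i = h & mask
    # Multiplying by M scrambles all bits of h; taking middle bits and
    # forcing the result odd gives an increment coprime to the table
    # size, so the probe sequence is a full permutation of the slots.
    inc = (((h * M) >> 32) & mask) | 1
    for _ in range(mask + 1):
        yield i
        i = (i + inc) & mask
```

A multiply and a shift replace the division, which is the whole attraction on hardware where integer division is slow.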
Short course: the average number of probes needed when searching
small dicts/sets can be reduced, in both successful ("found") and
failing ("not found") cases.
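The claim can be checked empirically; this harness and its helper names are mine, not from the thread. It fills a power-of-2 table with random int keys and averages the probes needed for successful lookups under a simple double-hashing scheme:

```python
import random

def probes_double(h, mask):
    """Simple double hashing: an odd increment taken from higher hash
    bits; any odd increment gives a full-cycle probe sequence."""
    i = h & mask
    inc = ((h >> 3) & mask) | 1
    while True:
        yield i
        i = (i + inc) & mask

def avg_found_probes(probe_fn, size, n_keys, trials=200, seed=12345):
    """Average number of probes for a successful lookup in a
    size-slot open-addressing table holding n_keys distinct random
    int keys (requires n_keys < size)."""
    mask = size - 1
    rng = random.Random(seed)
    total = lookups = 0
    for _ in range(trials):
        keys = rng.sample(range(10**9), n_keys)
        table = {}  # slot index -> key
        for k in keys:
            for slot in probe_fn(k, mask):
                if slot not in table:
                    table[slot] = k
                    break
        for k in keys:
            for n, slot in enumerate(probe_fn(k, mask), 1):
                if table[slot] == k:
                    total += n
                    lookups += 1
                    break
    return total / lookups
```

Swapping other probe generators into `avg_found_probes` lets the schemes discussed in this thread be compared directly at any table size and load.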
But I'm not going to pursue this. This is a brain dump for someone
who's willing to endure the interminable pain of arguing about it.