Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-26 Thread Tim Peters
[Tim]
> ... I haven't yet thought of a cheap way to compute an
> `inc` for double-hashing that isn't vulnerable to bad behavior for
> _some_ easily constructed set of int keys. If you forget "cheap",
> it's easy; e.g.,
>
> random.seed(h)
> inc = random.choice(range(1, mask + 1, 2))

Heh.
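Made runnable, the quoted approach might look like the sketch below; the
generator wrapper and the name `probes_random_inc` are mine, and only the
two `random` lines come from the post:

    import random

    def probes_random_inc(h, mask):
        """Probe indices for hash h in a table of mask+1 (power of 2) slots.

        The "not cheap" approach quoted above: seed a PRNG with the hash
        and draw a random odd increment. Any odd step is coprime to the
        power-of-2 table size, so every slot is eventually probed, and an
        attacker can't construct colliding int keys without also
        predicting the PRNG. Reseeding on every lookup is, of course,
        far too slow for real use.
        """
        random.seed(h)
        inc = random.choice(range(1, mask + 1, 2))
        i = h & mask
        while True:
            yield i
            i = (i + inc) & mask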

Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-26 Thread Tim Peters
[MRAB]
> If the current scheme suffers only for small tables, couldn't you use an
> alternative scheme only for small tables?

Sure. But whether that's desirable partly depends on timing actual C
code. Try it ;-) For maintenance sanity, it's obviously better to have
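As a sketch of MRAB's suggestion (the cutoff and all names here are
hypothetical; picking a real cutoff is exactly the C-timing question
above):

    SMALL_MASK = 31  # hypothetical cutoff; a real one needs C-level timings

    def make_dispatching_probes(small_scheme, big_scheme, cutoff=SMALL_MASK):
        """Route small tables to an alternative probe scheme while big
        tables keep the current one: two code paths to maintain, which
        is the maintenance-sanity concern above."""
        def probes(h, mask):
            return (small_scheme if mask <= cutoff else big_scheme)(h, mask)
        return probes

Any pair of probe-sequence generators with the same (h, mask) signature,
like those sketched further down this page, could be plugged in.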

Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-26 Thread MRAB
On 2017-06-26 19:21, Tim Peters wrote:
> Some explanations and cautions. An advantage of sticking with pure
> Python instead of C is that spare moments are devoted to investigating
> the problem instead of thrashing with micro-optimizations ;-)

Why does the current scheme suffer for small tables?
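For reference, the scheme in question, as a pure-Python rendering of
CPython's probing loop in Objects/dictobject.c (the demo keys and the
printed illustration are my own):

    from itertools import islice

    def probes_current(h, mask):
        """Start at the low hash bits, fold 5 more bits of the hash in on
        each collision via `perturb`, and, once perturb is exhausted, fall
        back to the fixed full-period recurrence i = 5*i + 1 (mod size)."""
        perturb = h & ((1 << 64) - 1)  # treat the hash as unsigned, as C does
        i = h & mask
        while True:
            yield i
            perturb >>= 5
            i = (i * 5 + perturb + 1) & mask

    # In an 8-slot table only 3 bits pick a slot, and perturb is shifted
    # to zero after one step, so these distinct hashes share a probe path:
    print(list(islice(probes_current(8, 7), 5)))   # [0, 1, 6, 7, 4]
    print(list(islice(probes_current(16, 7), 5)))  # [0, 1, 6, 7, 4]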

Re: [Python-ideas] Reducing collisions in small dicts/sets

2017-06-25 Thread Tim Peters
Two bits of new info: first, it's possible to get the performance of
"double" without division, at least via this way:

"""
# Double hashing using Fibonacci multiplication for the increment. This
# does about as well as `double`, but doesn't require division.
#
# The multiplicative constant
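The preview cuts off at the constant; a reconstruction of the idea (my
own sketch for a 64-bit word, not the message's exact code):

    # A common 64-bit Fibonacci-hashing multiplier: an odd integer close
    # to 2**64 / golden_ratio.
    M64 = 0x9E3779B97F4A7C15

    def probes_fib(h, mask):
        """Double hashing where the increment comes from Fibonacci
        multiplication instead of division: multiply the hash by M64,
        keep the top bits of the 64-bit product, and force the result
        odd so it stays coprime to the power-of-2 table size."""
        nbits = (mask + 1).bit_length() - 1    # log2 of the table size
        prod = (h * M64) & ((1 << 64) - 1)     # 64-bit wrapping multiply
        inc = (prod >> (64 - nbits)) | 1       # high bits, forced odd
        i = h & mask
        while True:
            yield i
            i = (i + inc) & mask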

[Python-ideas] Reducing collisions in small dicts/sets

2017-06-24 Thread Tim Peters
Short course: the average number of probes needed when searching small dicts/sets can be reduced, in both successful ("found") and failing ("not found") cases. But I'm not going to pursue this. This is a brain dump for someone who's willing to endure the interminable pain of arguing about
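The claim is easy to check in pure Python. Below is a minimal harness
(my own sketch; the increment formula in `probes_double` is an
assumption, not the thread's exact code) that measures probes per
insertion, which equals the cost of a failing search just before the
insert and of a successful search for that key afterward:

    import statistics

    def probes_double(h, mask):
        """Plain double hashing: an odd, hash-derived increment."""
        inc = (h % mask) | 1  # odd, hence coprime to the power-of-2 size
        i = h & mask
        while True:
            yield i
            i = (i + inc) & mask

    def avg_insert_probes(keys, probe_fn, mask):
        """Insert keys into an empty open-addressed table with mask+1
        slots; return the mean number of slots inspected per insert."""
        table = [None] * (mask + 1)
        counts = []
        for key in keys:
            for n, j in enumerate(probe_fn(hash(key), mask), start=1):
                if table[j] is None:
                    table[j] = key
                    counts.append(n)
                    break
        return statistics.mean(counts)

    # Ints spaced 8 apart all land in slot 0 of an 8-slot table, so the
    # probe scheme alone decides the cost:
    print(avg_insert_probes(range(0, 40, 8), probes_double, mask=7))  # 2.0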