On Mon, 29 Nov 2010 13:32:27 +0100
Joerg Sonnenberger <[email protected]> wrote:

> On Mon, Nov 29, 2010 at 09:31:37AM +0000, Sad Clouds wrote:
> > Joerg I was specifically talking about particular sequences of numbers
> > that result in high collision rates, not random numbers. For example,
> > if you take the following:
> > 
> > 4, 8, 12, 16, 20 ...
> > 
> > Every integer is a multiple of 4. Can you give me a full example of a
> > hash function for power of 2 hash table, that will perform as well as
> > a simple:
> > 
> > interger % prime
> 
> I asked the Python Oracle for a 32bit number and got 3063404725. This
> results in hash(x,n) = ((3063404725ULL * x) >> 32) % n as function.
> For n=32 and 32 keys, I get 8 single collisions. For n=256 and 256 keys,
> I get 39 single collisions. Depending on the choosen random number, the
> collision rate will be higher or lower, but if it is too high, you can
> always just pick a different one and rehash. Theory says that the result
> is probalistically near a random distribution *independent* of the input.
> 
> Joerg

I'm sorry, but this hash function results in too many collisions for power 2 
hash table:

nbuckets = 33554432 /* power of 2 */
nitems = 18000000
hash_function = ((3063404725ULL * num) >> 32) % nbuckets

Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 5161419

Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0

Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 908960


This time with hash table that is prime:

nbuckets = 33554467 /* prime */
nitems = 18000000
hash_function = num % nbuckets

Integer sequence: 2, 3, 4, 5, 6 ...
Collisions: 0

Integer sequence: 2, 4, 6, 8, 10 ...
Collisions: 0

Integer sequence: 4, 8, 12, 16, 20 ...
Collisions: 0

I've tried many different hash functions and from my experiments, for hashing 
integers, it's very hard to beat 'number % prime'. There is no magic number 
that you need to derive for every different set of integers, it just works for 
all sorts of integers and gives very good results.

Reply via email to