On Mon, 29 Nov 2010 13:32:27 +0100 Joerg Sonnenberger <[email protected]> wrote:
> On Mon, Nov 29, 2010 at 09:31:37AM +0000, Sad Clouds wrote: > > Joerg I was specifically talking about particular sequences of numbers > > that result in high collision rates, not random numbers. For example, > > if you take the following: > > > > 4, 8, 12, 16, 20 ... > > > > Every integer is a multiple of 4. Can you give me a full example of a > > hash function for power of 2 hash table, that will perform as well as > > a simple: > > > > interger % prime > > I asked the Python Oracle for a 32bit number and got 3063404725. This > results in hash(x,n) = ((3063404725ULL * x) >> 32) % n as function. > For n=32 and 32 keys, I get 8 single collisions. For n=256 and 256 keys, > I get 39 single collisions. Depending on the choosen random number, the > collision rate will be higher or lower, but if it is too high, you can > always just pick a different one and rehash. Theory says that the result > is probalistically near a random distribution *independent* of the input. > > Joerg I'm sorry, but this hash function results in too many collisions for power 2 hash table: nbuckets = 33554432 /* power of 2 */ nitems = 18000000 hash_function = ((3063404725ULL * num) >> 32) % nbuckets Integer sequence: 2, 3, 4, 5, 6 ... Collisions: 5161419 Integer sequence: 2, 4, 6, 8, 10 ... Collisions: 0 Integer sequence: 4, 8, 12, 16, 20 ... Collisions: 908960 This time with hash table that is prime: nbuckets = 33554467 /* prime */ nitems = 18000000 hash_function = num % nbuckets Integer sequence: 2, 3, 4, 5, 6 ... Collisions: 0 Integer sequence: 2, 4, 6, 8, 10 ... Collisions: 0 Integer sequence: 4, 8, 12, 16, 20 ... Collisions: 0 I've tried many different hash functions and from my experiments, for hashing integers, it's very hard to beat 'number % prime'. There is no magic number that you need to derive for every different set of integers, it just works for all sorts of integers and gives very good results.
