Dennis Bjorklund <[EMAIL PROTECTED]> writes: > On Sun, 18 Apr 2004, Bruno Wolff III wrote: > > > Another option would be to put the numbers into two int4s. For int4 or > > smaller types one of these would be zero. int8s would be split between > > the two. The hash function would then be defined on the two int4s. > > Sure, this is an internal calculation in the hash function. The only > important thing is that the number 7 (for example) gives the same hash > value no matter if it is an int2 or an int8 and that the hash function > works well also for int8 numbers (which is does not today).

What's missing here is that the actual API for hash functions is that the data type provides a function that hashes to 32 bit integers. Then the hash code uses the 32 bit integer to crunch down to the actual number of buckets (using mod). The choice of 32 bit integers is purely arbitrary. As long as it's larger than than the number of buckets in any sane hash table it's fine. 32 bits is plenty. I question the use of mod to crunch the hash value down though. In the case of int4 the mapping to 32 bits is simply the identity. So the entire hash function ends up being simply "input mod #buckets". It seems way too easy to find real world data sets where many numbers will all be multiples of some number. If that common divisor shares any factors with the number of buckets, then the distribution will be very far from even with many empty buckets. If the hash tables were made a power of two then it would be possible to mix the bits of the 32 bit value and just mask off the unneeded bits. I've found one page via google that mentions mixing bits in a hash function, but I would look for a more serious treatment somewhere. http://burtleburtle.net/bob/hash/doobs.html Incidentally, this text claims mod is extremely slow compared to bit manipulations. I don't know that that kind of cycle counting is really is a factor for postgres though. Also, incidentally, this text is interesting: http://www.isthe.com/chongo/tech/comp/fnv/ -- greg ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend