"Dave Sill" <[EMAIL PROTECTED]> writes:
> Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>
> >If the input numbers are truly random, then a modulos hash will
> >distribute well whether or not the hash size is prime.
> >
> >However, if the input numbers are not truly random, then a modulos
> >hash may pick out some regularity in the input, and preferentially
> >hash to a given set of buckets.
>
> If the input numbers are not fairly random, then a modulo hash is not
> a choice.
Sure it is. A modulos hash works fine if the input numbers are fairly
random with respect to the hash size. That doesn't imply that the
input numbers are random in any real size.
> >For a trivial example, if the numbers
> >tend to be even, then an even modulos hash will tend toward using the
> >even numbered buckets.
>
> Which, unfortunately, wouldn't be helped by a prime table size.
I guess one of us misunderstands the other. My point is that if the
input numbers and the table size tend to have a divisor greater than
one, then the distribution will be skewed. Using a prime number
minimizes the number of divisors greater than one which may be shared
by the table size and the input numbers.
> >A prime modulos hash minimizes the types of
> >regularity which will lead to a poor hash distribution.
>
> Exactly how does a prime modulus help? Can you give an example?
Suppose the input numbers are 2 4 6 8 10 12. Suppose the hash size is
8. Then the buckets are 2 4 6 0 2 4. Note the bad distribution.
Suppose the hash size is 7. Then the buckers are 2 4 6 1 3 5. Note
the good distribution.
Here's another way to say it. For any given hash size, call a series
of inputs which lead to a bad distribution B. Consider the set of all
bad inputs {B}. A prime hash size minimizes the number of elements in
{B}. Therefore, if you don't know anything about the inputs--in
particular, if you don't know if they are random--it is better to
choose a prime hash size.
Ian