Greg Stark <gsst...@mit.edu> writes:
> I suppose you could use CRC, since our interface does allow incremental use.

Hm, that's a thought.  I'm not sure, though, whether CRC really has the
behavior you want from a hash function, in particular that all the bits
are independent (i.e., that taking the N low-order bits is a reasonable
way to cut the hash down to a suitable bucket index).  Anybody know?
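For reference, the property in question is what lets the bucket code just
mask off the low-order bits.  A minimal sketch, assuming a power-of-two
number of buckets (the names are illustrative, not actual code):

#include <stdint.h>

/*
 * Keep only the low-order bits of the hash as the bucket index.
 * This distributes well only if those bits are as well-mixed as the
 * rest, which is exactly the bit-independence property at issue.
 * nbuckets is assumed to be a power of two.
 */
static inline uint32_t
hash_to_bucket(uint32_t hash, uint32_t nbuckets)
{
    return hash & (nbuckets - 1);
}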

> I'm not really familiar enough with the math to know whether CRC is
> any better than your XOR mixing. I suspect they're pretty similar. I
> suppose if you have arrays of various lengths containing repetitions
> of a single value, then the non-CRC would produce a hash of 0 for all
> arrays with a length that's a multiple of 32. I'm not sure what CRC
> would do.

Some quick experimenting suggests that you get -1 from an array of 32 of
the same value, then 0 from 64 of the same, etc.  This doesn't bother me
particularly though.  There are always going to be some situations that
produce degenerate hash values.
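To make the degenerate case concrete, here's a standalone sketch of a
rotate-and-XOR combiner of the general sort being discussed (illustrative
only, not the exact code in the patch):

#include <stdio.h>
#include <stdint.h>

/*
 * Illustrative rotate-left-by-one-and-XOR combiner of per-element hashes;
 * just enough to show the degenerate case under discussion.
 */
static uint32_t
combine_rotxor(const uint32_t *elt_hashes, int n)
{
    uint32_t    result = 0;
    int         i;

    for (i = 0; i < n; i++)
    {
        result = (result << 1) | (result >> 31);    /* rotate left 1 bit */
        result ^= elt_hashes[i];
    }
    return result;
}

int
main(void)
{
    uint32_t    elt[64];
    int         i, n;

    /*
     * Repetitions of one arbitrary element hash.  0xDEADBEEF has an even
     * number of set bits, so the 32-element case collapses to zero; an
     * odd-parity value would collapse to all-ones there instead.  At 64
     * elements the rotations have cycled twice and cancel regardless.
     */
    for (i = 0; i < 64; i++)
        elt[i] = 0xDEADBEEF;

    for (n = 8; n <= 64; n += 8)
        printf("n = %2d -> %08x\n", n, combine_rotxor(elt, n));

    return 0;
}

Running that prints zero at 32 and 64 for this particular value, much like
the pattern the CRC experiment shows at 32 and 64 elements.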

> Oh, one question that occurred to me: you didn't mention including the
> bounds of the array or the null bitmap in the hash function.  I suppose
> it's correct with or without them (or, in the case of the null bitmap,
> another way to put it is that the choice of the hash value to represent
> null is arbitrary).

Yeah.  I have it coded to treat a null element as if it had a hash value
of zero.  I don't see a great need to consider the array bounds per se.
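For illustration, that convention amounts to something like this (a sketch
only; the rotate-and-XOR step is a stand-in for whatever mixing the real
code uses, and the names are made up):

#include <stdbool.h>
#include <stdint.h>

/*
 * Combine per-element hashes, treating a NULL element as if its hash
 * were zero.  Only the NULL convention is the point here.
 */
static uint32_t
hash_array_elements(const uint32_t *elt_hashes, const bool *isnull, int n)
{
    uint32_t    result = 0;
    int         i;

    for (i = 0; i < n; i++)
    {
        uint32_t    h = isnull[i] ? 0 : elt_hashes[i];

        result = (result << 1) | (result >> 31);    /* rotate left 1 bit */
        result ^= h;
    }
    return result;
}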

                        regards, tom lane
