On Tue, Nov 2, 2010 at 12:34 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmh...@gmail.com> writes:
>>>> The goal is to make those hard to predict, though.
>>>
>>> Really? I think "I don't understand when this fails" isn't
>>> obviously better than being able to predict when it fails ...
>>
>> Isn't that the whole point of hash functions? Collisions are
>> inevitable, but you want them to be unpredictable.
>
> Are you sure you're not confusing attributes of a good random
> number generator with hash generation? (They have much in common,
> but I've only seen concern with "hard to predict" when working on
> random number algorithms, as for financial audits or jury
> selection.)
>
>> If you want a hash function with predictable collision behavior,
>> just hash the first element. It'll be highly predictable AND
>> wicked fast.
>
> You're confusing "unpredictable" with "widely distributed", I
> think. There's no reason that the hash value of an integer of the
> same size as the hash shouldn't be the integer itself, for example.
> It's hard to get more predictable than that, yet it works well for
> hash lookups.

Well, no, not really. For example, you may have a hash table with N
buckets and keys of the form N+k, 2N+k, 3N+k, ... in which case
everything will collide. If you do some mixing of the bits, that
pathological case is far less likely to occur in practice. See, for
example, http://www.azillionmonkeys.com/qed/hash.html
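To make that concrete, here is a quick standalone sketch -- plain C,
nothing to do with the actual patch; the bucket count, stride, and
mixing constants are just picked for illustration (the mixer is
MurmurHash3's 32-bit finalizer):

#include <stdio.h>
#include <stdint.h>

#define NBUCKETS 64             /* table size; also the stride of the keys */
#define NKEYS    16

/* Identity "hash": fine for well-spread keys, degenerate for strided ones. */
static uint32_t
hash_identity(uint32_t x)
{
    return x;
}

/*
 * A simple bit-mixing finalizer.  The shift/multiply constants are
 * borrowed from MurmurHash3 purely for illustration; see the
 * azillionmonkeys article above for a fuller treatment.
 */
static uint32_t
hash_mix(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    x *= 0xc2b2ae35;
    x ^= x >> 16;
    return x;
}

int
main(void)
{
    int i;

    /* Keys of the form i*NBUCKETS + 7: all equal modulo NBUCKETS. */
    for (i = 1; i <= NKEYS; i++)
    {
        uint32_t key = (uint32_t) (i * NBUCKETS + 7);

        printf("key %5u -> identity bucket %2u, mixed bucket %2u\n",
               (unsigned) key,
               (unsigned) (hash_identity(key) % NBUCKETS),
               (unsigned) (hash_mix(key) % NBUCKETS));
    }
    return 0;
}

With the identity hash every key lands in bucket 7; after mixing, the
same keys scatter across the table.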
> I know that for a hash of a character string, the total = (total *
> 31) + nextcharacter approach has been shown to be effective. That's
> hardly random or hard to predict, but it does tend to spread out the
> hash values well. Whether it is as good for arrays, I don't know.
> It seems like a reasonable place to start, though, since a character
> string is rather like an array of characters.

That was my thought.

>>> What concerns me about that is that it tends to push the bits to
>>> the left --- I think the effects of the earlier inputs are going
>>> to overflow out as you incorporate a lot of newer inputs. What
>>> you want is a scheme where every input item affects all the bits
>>> of the final result about equally, and my gut feeling is this
>>> doesn't provide that.
>>
>> I don't think so. That would be a problem if you multiplied by an
>> even number. You can see that you don't get dups if you just add
>> in the same value over and over:
>
> Yeah, that 1 in the low order position ensures that the impact of
> any value which is ever added into the total is never entirely lost.

Right...
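For what it's worth, that low-order-bit point is easy to check in
isolation. A minimal standalone sketch (again just illustrative C
with an arbitrary input value, not anything from the patch):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint32_t total = 0;
    uint32_t value = 1;         /* arbitrary input, fed in repeatedly */
    int i;

    /*
     * Because 31 is odd, total -> total * 31 + value is a bijection on
     * 32-bit states, so no duplicate can appear until the cycle wraps.
     * With an even multiplier the map is not invertible: the state
     * converges to a fixed point within at most 32 steps, after which
     * every further input is lost.
     */
    for (i = 1; i <= 10; i++)
    {
        total = total * 31 + value;
        printf("after %2d inputs: %10u\n", i, (unsigned) total);
    }
    return 0;
}

The totals stay distinct for as long as you care to run it; with an
even multiplier such as 32, the earlier inputs would shift out the
top and the sequence would collapse after a handful of steps.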