On Tue, Nov 2, 2010 at 12:34 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> Robert Haas <robertmh...@gmail.com> wrote:
>> Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmh...@gmail.com> writes:
>>>> The goal is to make those hard to predict, though.
>>>
>>> Really? I think "I don't understand when this fails" isn't
>>> obviously better than being able to predict when it fails ...
>>
>> Isn't that the whole point of hash functions? Collisions are
>> inevitable, but you want them to be unpredictable.
>
> Are you sure you're not confusing attributes of a good random
> number generator with hash generation? (They have much in common,
> but I've only seen concern with "hard to predict" when working on
> random number algorithms, as for financial audits or jury
> selection.)
>
>> If you want a hash function with predictable collision behavior,
>> just hash the first element. It'll be highly predictable AND
>> wicked fast.
>
> You're confusing "unpredictable" with "widely distributed", I
> think. There's no reason that the hash value of an integer of the
> same size as the hash shouldn't be the integer itself, for example.
> It's hard to get more predictable than that, yet it works well for
> hash lookups.

Well, no, not really. For example, you may have a hash table with N
buckets and keys of the form N+k, 2N+k, 3N+k, ... in which case
everything will collide. If you do some mixing of the bits, that
pathological case is far less likely to occur in practice. See, for
example, http://www.azillionmonkeys.com/qed/hash.html
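To make that concrete, here is a quick standalone sketch -- plain C,
nothing to do with the actual patch; the bucket count, stride, and
mixing constants are just picked for illustration (the mixer is
MurmurHash3's 32-bit finalizer):

#include <stdio.h>
#include <stdint.h>

#define NBUCKETS 64             /* table size; also the stride of the keys */
#define NKEYS    16

/* Identity "hash": fine for well-spread keys, degenerate for strided ones. */
static uint32_t
hash_identity(uint32_t x)
{
    return x;
}

/*
 * A simple bit-mixing finalizer.  The shift/multiply constants are
 * borrowed from MurmurHash3 purely for illustration; see the
 * azillionmonkeys article above for a fuller treatment.
 */
static uint32_t
hash_mix(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6b;
    x ^= x >> 13;
    x *= 0xc2b2ae35;
    x ^= x >> 16;
    return x;
}

int
main(void)
{
    int i;

    /* Keys of the form i*NBUCKETS + 7: all equal modulo NBUCKETS. */
    for (i = 1; i <= NKEYS; i++)
    {
        uint32_t key = (uint32_t) (i * NBUCKETS + 7);

        printf("key %5u -> identity bucket %2u, mixed bucket %2u\n",
               (unsigned) key,
               (unsigned) (hash_identity(key) % NBUCKETS),
               (unsigned) (hash_mix(key) % NBUCKETS));
    }
    return 0;
}

With the identity hash every key lands in bucket 7; after mixing, the
same keys scatter across the table.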
> I know that for a hash of a character string, the total = (total *
> 31) + nextcharacter approach has been shown to be effective. That's
> hardly random or hard to predict, but it does tend to spread out the
> hash values well. Whether it is as good for arrays, I don't know.
> It seems like a reasonable place to start, though, since a character
> string is rather like an array of characters.

That was my thought.

>>> What concerns me about that is that it tends to push the bits to
>>> the left --- I think the effects of the earlier inputs are going
>>> to overflow out as you incorporate a lot of newer inputs. What
>>> you want is a scheme where every input item affects all the bits
>>> of the final result about equally, and my gut feeling is this
>>> doesn't provide that.
>>
>> I don't think so. That would be a problem if you multiplied by an
>> even number. You can see that you don't get dups if you just add
>> in the same value over and over:
>
> Yeah, that 1 in the low order position ensures that the impact of
> any value which is ever added into the total is never entirely lost.

Right...
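For what it's worth, that low-order-bit point is easy to check in
isolation. A minimal standalone sketch (again just illustrative C
with an arbitrary input value, not anything from the patch):

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint32_t total = 0;
    uint32_t value = 1;         /* arbitrary input, fed in repeatedly */
    int i;

    /*
     * Because 31 is odd, total -> total * 31 + value is a bijection on
     * 32-bit states, so no duplicate can appear until the cycle wraps.
     * With an even multiplier the map is not invertible: the state
     * converges to a fixed point within at most 32 steps, after which
     * every further input is lost.
     */
    for (i = 1; i <= 10; i++)
    {
        total = total * 31 + value;
        printf("after %2d inputs: %10u\n", i, (unsigned) total);
    }
    return 0;
}

The totals stay distinct for as long as you care to run it; with an
even multiplier such as 32, the earlier inputs would shift out the
top and the sequence would collapse after a handful of steps.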