On 19 July 2012 23:20, Steven D'Aprano <st...@pearwood.info> wrote: > Selby Rowley-Cannon wrote: >> >> I am using a hash table in a small randomization program. I know that some >> hash functions can be prone to collisions, so I need a way to detect >> collisions.
> This entire question seems like a remarkable case of premature optimization. > Start with demonstrating that collisions are an actual problem that need > fixing. Unless you know all the keys in advance and have consructed a proven perfect hash, potential collisions are something that *any* hash table implementation fundamentally *has* to deal with, no matter how small the probability of a collision. (If there's a non-zero probability of collision you *have* to deal with collisions.) > Unless you have profiled your application and proven that hash collisions is > a real problem -- and unless you are hashing thousands of float NANs, that > is almost certainly not the case -- you are just wasting your time and > making your code slower rather than faster -- a pessimation, not > optimization. > > And if it *is* a problem, then the solution is to fix your data so that its > __hash__ method is less likely to collide. If you are rolling your own hash > method, instead of using one of Python's, that's your first problem. Unfortunately you usually can't prescribe the data to be stored in a hash table to accomodate a given hashing function. ;) You can however of course improve your hashing function, or (within reason) increase the hash table size. But yes, rolling your own is not a good idea given highly optimized solutions already exist inside the Python language (dict) Except of course if you're studying hash tables as part of some course... Selby, can you clarify please, are you doing this as part of an assignment or some coursework? I can understand your question in the context of learning/computer studies, but otherwise you should just use Python's dict. In any case, to somewhat answer you question, dealing with collisions in hash tables typically involves some variant of one of the following 2 approaches: 1) Each slot in the hash table is a "bucket", usually implemented as a list (but might be something else), so you do a search through the hash bucket to see if the key you'r looking for is in fact in the "bucket" or 2) You have each slot in the hash table be a single key/value pair and then once you've hashed you look to see if the key at the entry identified is the key you're looking for. If not (collision), you hash ("probe") again by some factor (modulo the size of you hash table) and look at the next slot identified etc until you either find the key you're looking for, or an empty slot. There many variations one could come up with. Have a read of this page on wikipedia: http://en.wikipedia.org/wiki/Hash_table Walter _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor