2009/10/22 Martin McClure <[email protected]>: > Igor Stasenko wrote: >> Well, you can always attack the problem from different side - create a >> Set and Dictionary >> which using a perfect hash algorythms. >> As you may know, a perfect hash is a function which producing unique >> values for a limited set of elements. >> So, that in sets hashed using perfect hash function, no collisions >> possible, and hence the access time to any element is O(1). >> >> The drawback of such approach, that finding a perfect hash function >> for a set of elements is time consuming (best ones, AFAIK, consuming >> O(k*n) time, where n is number of elements), so >> you would never want to use a perfect hashing for collections which >> tend to grow or shrink often. >> >> But for collections, which populated once, and then used for read-only >> access this is worth doing. > > > Hi Igor, > > I'm afraid I don't see the usefulness. What kind of read accesses will > you be doing on this hashed collection? I can think of only two kinds of > access that make sense: The 'is this object in this collection' > question, and iterating over the collection to do some operation on all > objects therein. > > For iteration, you might as well use an Array, and forget hashing > altogether. > > For the 'is this object in this collection' question, you might more > straightforwardly replace the perfectHash instvar with a Boolean > isInTheCollection instvar. Saves space and time; you don't need the > collection at all. > > Perfect hashing sounds cool, but is of quite limited usefulness. I can > see it being useful when *all* of the following are true: > > * There are a relatively small number of possible objects in the > universe that is being hashed. Then each object can be given a unique > (thus "perfect") hash value. > > * There are multiple collections that can contain exactly this universe > of objects, and no others. > > * Membership in each collection by a given object is not easily > predictable, or might vary over time. > >> And its quite easy to modify current classes to use a perfect hash: >> just add a perfectHash ivar to it, and if its not nil , then use it >> to find an index of item. >> And if perfectHash is nil, then we just use the current implementation. >> >> And of course, you need to implement a perfect hash search, which >> standing aside :) >> But then you could just do: >> >> set := something collect: [... ] asSet rehashWithPerfectHash. > > I don't think you can easily use class Set, or any existing > general-purpose collection class, to get a perfect hash. Existing > Smalltalk collection classes combine two functions sequentially to form > the hash function: The first function is a property of the object being > stored (typically implemented as #hash or #identityHash), and the second > function is a property of the collection itself (typically mod with the > current prime table size). > > If your object knows its perfect hash with respect to a given > collection, you don't want that collection adding anything to that > function. In particular, you don't want the collection to take a modulo, > because then your hash function is no longer perfect and collisions > become quite likely again, defeating the entire purpose. > > In general, you'd set it up so that the perfect hash value is exactly > the table index reserved for that object in the collection. Then that > slot either contains a reference to that object, or nil. In any > perfect-hash situation, you need a table the same size as your universe > in every collection. This means that the universe cannot change size > easily, and once it gets very large the space costs are no longer worth > the time savings. >
Err, your thoughts went in wrong direction. Let me simplify the goal of perfect hashing: - suppose you having N unique integer numbers, without strictly specified range (any number could lie in range 0..infinity). - now we need to find a function such that, for each element 'x' in the above finite set, following conditions met: 0 <= f(x) < k*N and for each pair of two different elements x , y in original set f(x) <> f(y) and k < infinity As you can see , this function maps an arbitrary integer values without specific range, to an integer values in range [0..k*N], (k>=1). Given that we need to deal with a limited number of elemens for each particular set of numbers, such function can be found. We just need to search for an optimal (minimal) k, which is < 2 and in perfect case == 1. > Regards, > > -Martin > > _______________________________________________ > Pharo-project mailing list > [email protected] > http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project > -- Best regards, Igor Stasenko AKA sig. _______________________________________________ Pharo-project mailing list [email protected] http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
