Igor Stasenko wrote:
> Well, you can always attack the problem from different side - create a
> Set and Dictionary
> which using a perfect hash algorythms.
> As you may know, a perfect hash is a function which producing unique
> values for a limited set of elements.
> So, that in sets hashed using perfect hash function, no collisions
> possible, and hence the access time to any element is O(1).
> 
> The drawback of such approach, that finding a perfect hash function
> for a set of elements is time consuming (best ones, AFAIK, consuming
> O(k*n) time, where n is number of elements), so
> you would never want to use a perfect hashing for collections which
> tend to grow or shrink often.
> 
> But for collections, which populated once, and then used for read-only
> access this is worth doing.


Hi Igor,

I'm afraid I don't see the usefulness. What kind of read accesses will 
you be doing on this hashed collection? I can think of only two kinds of 
access that make sense: The 'is this object in this collection' 
question, and iterating over the collection to do some operation on all 
objects therein.

For iteration, you might as well use an Array, and forget hashing 
altogether.

For the 'is this object in this collection' question, you might more 
straightforwardly replace the perfectHash instvar with a Boolean 
isInTheCollection instvar. Saves space and time; you don't need the 
collection at all.

Perfect hashing sounds cool, but is of quite limited usefulness. I can 
see it being useful when *all* of the following are true:

* There are a relatively small number of possible objects in the 
universe that is being hashed. Then each object can be given a unique 
(thus "perfect") hash value.

* There are multiple collections that can contain exactly this universe 
of objects, and no others.

* Membership in each collection by a given object is not easily 
predictable, or might vary over time.

> And its quite easy to modify current classes to use a perfect hash:
>  just add a perfectHash ivar to it, and if its not nil , then use it
> to find an index of item.
> And if perfectHash is nil, then we just use the current implementation.
> 
> And of course, you need to implement a perfect hash search, which
> standing aside :)
> But then you could just do:
> 
> set := something collect: [... ] asSet rehashWithPerfectHash.

I don't think you can easily use class Set, or any existing 
general-purpose collection class, to get a perfect hash. Existing 
Smalltalk collection classes combine two functions sequentially to form 
the hash function: The first function is a property of the object being 
stored (typically implemented as #hash or #identityHash), and the second 
function is a property of the collection itself (typically mod with the 
current prime table size).

If your object knows its perfect hash with respect to a given 
collection, you don't want that collection adding anything to that 
function. In particular, you don't want the collection to take a modulo, 
because then your hash function is no longer perfect and collisions 
become quite likely again, defeating the entire purpose.

In general, you'd set it up so that the perfect hash value is exactly 
the table index reserved for that object in the collection. Then that 
slot either contains a reference to that object, or nil. In any 
perfect-hash situation, you need a table the same size as your universe 
in every collection. This means that the universe cannot change size 
easily, and once it gets very large the space costs are no longer worth 
the time savings.

Regards,

-Martin

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Reply via email to