Re: [Pharo-project] Perfect hash [was: Re: HashTable vs Dictionary]

Igor Stasenko Wed, 21 Oct 2009 19:25:05 -0700

2009/10/22 Martin McClure <[email protected]>:
> Igor Stasenko wrote:
>> Well, you can always attack the problem from different side - create a
>> Set and Dictionary
>> which using a perfect hash algorythms.
>> As you may know, a perfect hash is a function which producing unique
>> values for a limited set of elements.
>> So, that in sets hashed using perfect hash function, no collisions
>> possible, and hence the access time to any element is O(1).
>>
>> The drawback of such approach, that finding a perfect hash function
>> for a set of elements is time consuming (best ones, AFAIK, consuming
>> O(k*n) time, where n is number of elements), so
>> you would never want to use a perfect hashing for collections which
>> tend to grow or shrink often.
>>
>> But for collections, which populated once, and then used for read-only
>> access this is worth doing.
>
>
> Hi Igor,
>
> I'm afraid I don't see the usefulness. What kind of read accesses will
> you be doing on this hashed collection? I can think of only two kinds of
> access that make sense: The 'is this object in this collection'
> question, and iterating over the collection to do some operation on all
> objects therein.
>
> For iteration, you might as well use an Array, and forget hashing
> altogether.
>
> For the 'is this object in this collection' question, you might more
> straightforwardly replace the perfectHash instvar with a Boolean
> isInTheCollection instvar. Saves space and time; you don't need the
> collection at all.
>
> Perfect hashing sounds cool, but is of quite limited usefulness. I can
> see it being useful when *all* of the following are true:
>
> * There are a relatively small number of possible objects in the
> universe that is being hashed. Then each object can be given a unique
> (thus "perfect") hash value.
>
> * There are multiple collections that can contain exactly this universe
> of objects, and no others.
>
> * Membership in each collection by a given object is not easily
> predictable, or might vary over time.
>
>> And its quite easy to modify current classes to use a perfect hash:
>>  just add a perfectHash ivar to it, and if its not nil , then use it
>> to find an index of item.
>> And if perfectHash is nil, then we just use the current implementation.
>>
>> And of course, you need to implement a perfect hash search, which
>> standing aside :)
>> But then you could just do:
>>
>> set := something collect: [... ] asSet rehashWithPerfectHash.
>
> I don't think you can easily use class Set, or any existing
> general-purpose collection class, to get a perfect hash. Existing
> Smalltalk collection classes combine two functions sequentially to form
> the hash function: The first function is a property of the object being
> stored (typically implemented as #hash or #identityHash), and the second
> function is a property of the collection itself (typically mod with the
> current prime table size).
>
> If your object knows its perfect hash with respect to a given
> collection, you don't want that collection adding anything to that
> function. In particular, you don't want the collection to take a modulo,
> because then your hash function is no longer perfect and collisions
> become quite likely again, defeating the entire purpose.
>
> In general, you'd set it up so that the perfect hash value is exactly
> the table index reserved for that object in the collection. Then that
> slot either contains a reference to that object, or nil. In any
> perfect-hash situation, you need a table the same size as your universe
> in every collection. This means that the universe cannot change size
> easily, and once it gets very large the space costs are no longer worth
> the time savings.
>


Err, your thoughts went in wrong direction.
Let me simplify the goal of perfect hashing:
- suppose you having N unique integer numbers, without strictly
specified range (any number could lie in range 0..infinity).
- now we need to find a function such that, for each element 'x' in
the above finite set, following conditions met:

0 <= f(x) < k*N

and for each pair of two different elements x , y in original set
  f(x) <> f(y)

and k < infinity

As you can see , this function maps an arbitrary integer values
without specific range, to an
integer values in range [0..k*N], (k>=1).
Given that we need to deal with a limited number of elemens for each
particular set of numbers, such function can be found.
We just need to search for an optimal (minimal) k, which is < 2 and in
perfect case == 1.

> Regards,
>
> -Martin
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



-- 
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Re: [Pharo-project] Perfect hash [was: Re: HashTable vs Dictionary]

Reply via email to