2009/10/22 Andres Valloud <[email protected]>:
> My point would be that either:
>
> * the hashed storage is always of size 1000 (or greater), so if you
> store a few objects in the hashed collection it wastes a lot of space;
>
> * the hashed storage changes size to match the objects stored in it, but
> looking at f(x) mod p risks reintroducing the collisions you wanted to
> avoid in the first place.
>

a different hash function picked if size of input set is changed.
So, there is no chance to have storage of size 1000 , holding few
elements (unless your implementation really suck :).
That's why you can always be sure that your storage is good enough to
store all of the elements and
with minimal possible chances of collision.
The only problem here, as i noted, that you have to rehash the storage
each time you modifying
the input set, because you changing the hash function which is
'optimal' for new input set.
And that's why this approach is bad for sets which changing often.


> So, to not waste space in this particular scenario, then either N should
> be small or basically all N objects should be stored in the hashed
> collection.  If N is small, then why not set f(x) to produce values
> between 1 and N, and then use an array?  And if N is large and the
> utilization of the hashed collection is close to 100% of N, then why not
> also use an array in that case?  In the end, really nice perfect hash
> functions point the finger at alternatives that may obviate using hashed
> collections.  Of course, your mileage may vary...
>
perfect hash is just about that: find a function which having minimal
, or no hash collisions and
with storage size very close to the number of elements in input set.

> Andres.
>
>
> Martin McClure wrote:
>> Andres Valloud wrote:
>>
>>> I thought I'd point out a subtle potential problem.  Assume N = 1000,
>>> f(x_j) = j-1, and so k = 1 because 0 <= f(x_j) < N because 0 < j <=
>>> 1000.  Now, store x_{7i} in a Set, for all i such that 1 <= 7i <= 1000.
>>> Clearly, f(x) is a perfect hash function as defined above.  However, if
>>> the size of the table used by the set is a multiple of 7, performance
>>> will be terrible.
>>>
>>>
>>
>> Huh? If your hash function produces integers between 0 and 999, your
>> table size must be at least 1000, otherwise you store past the end of
>> your table.
>>
>> Regards,
>>
>> -Martin
>>
>>
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



-- 
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

Reply via email to