Part of why those attacks are so easy is that the factor for the
multiplicative hash function is very poor (i.e.: 33, 31, etc). There is
nothing spectacular about this reasoning, those "cool looking" factors
chosen using the digital (finger) method should have been stamped out a
long time ago. The hashing book I wrote has a ton of additional details
on how to choose a factor properly if you want to use multiplicative
hash functions, see here:
http://www.lulu.com/spotlight/avsmalltalkbooks
Of course you can avoid all that trouble by using a crypto hash, at the
cost of the (mostly unnecessary) crypto hashing.
The authors suggest using a randomized hash function, but how do
persistent hash tables survive across invocations of the randomizer?
Moreover, I suspect the following:
1. The multiplicative hash function they show as "improved" in Perl
5.8.1 seems to use effectively 1025 as a factor which is generally
terrible (slide 77).
2. How is the seed determined? What criteria are used to guarantee it
is a good seed value?
3. It is likely you can still abuse the hash function with series of
null bytes, or by constructing strings that abuse that poor hash
function into producing tons of collisions mod 2^x, x < 32.
To sum it up...
1. If you use multiplicative hash functions, then you must choose the
factor properly. Go read the literature.
2. If you are expecting attackers, consider making it difficult to find
collisions in your hash function. A way to make it REALLY hard is to
use a crypto hash. But the applicability depends on the application.
Generally speaking, making something like a crypto hash function the
default hash function is a bad idea.
3. Just "randomizing" a hash function can be as useful as "randomizing"
a random number generator. Obfuscation is not equivalent to a fix.
Without a clear understanding of why the results have to be good,
chances are the results will be poor.
On 12/30/11 7:58 , Philippe Marschall wrote:
Hi
As you probably noted string hash collisions are all the rage these days
[1]. Has anybody looked into whether this applies to Pharo as well?
[1] http://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html
Cheers
Philippe