> I found some strange behaviour in the hash function.
> When applied to numbers, it works ok, but when applied
> to strings, it leads to a huge number of collisions.
> # uniq hashed values, using 50000 different numbers: 50000
> # uniq hashed values, using 50000 different strings: 10271
> # ==========================================================
You are right. This is not optimal. Thanks for the hint!
The reason is the initSeed()/initSeedE_E function in
It uses numbers directly, and in the case of symbols it uses their names
(which are technically also numbers). For symbols, however, these numbers
have less entropy, because their bit patterns are not as densely packed as
those of pure numbers (basically what Oskar Wieland points out in his reply).
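To make the entropy argument concrete, here is a rough Python sketch. It
assumes a hypothetical packing of a name's ASCII bytes into one 64-bit word
(first byte in the lowest bits) and a hash that only looks at the low 16 bits
of the seed. Neither assumption is the real name layout or the real hash. For
decimal strings, the low 16 bits see only the first two characters, so 50000
strings collapse onto 100 values, while 50000 consecutive numbers stay distinct.

```python
# Hypothetical packing: the bytes of a short name go into one 64-bit
# word, first byte in the lowest 8 bits. This only approximates how a
# symbol name might be stored; it is not the actual layout.
def pack_name(s: str) -> int:
    return int.from_bytes(s.encode("ascii")[:8], "little")

def distinct_low_bits(keys, bits=16):
    """Count distinct values among the low `bits` bits of each key."""
    mask = (1 << bits) - 1
    return len({k & mask for k in keys})

numbers = range(50000)
strings = [pack_name(str(i)) for i in range(50000)]

print("numbers:", distinct_low_bits(numbers))  # 50000
print("strings:", distinct_low_bits(strings))  # 100
```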
initSeed() should be improved by doing more than simply adding up the
32-bit or 64-bit "digits", at least in the case of symbols. Any proposals?
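One possible direction, sketched in Python under the same kind of hypothetical
little-endian packing: plain addition is order-insensitive, so names whose
64-bit digits are merely permuted get the same seed, and carries discard bits.
Re-mixing the state after each digit avoids this. The mixer below is the
splitmix64 finalizer, swapped in here purely as an example. `digits`,
`seed_by_sum` and `seed_by_mixing` are illustrative names, not existing functions.

```python
MASK64 = (1 << 64) - 1

def digits(s: str):
    """Split a name into 64-bit 'digits' (hypothetical little-endian packing)."""
    b = s.encode("utf-8")
    return [int.from_bytes(b[i:i + 8], "little") for i in range(0, len(b), 8)]

def seed_by_sum(ds):
    # Plain addition of digits: order-insensitive, so permuted digits collide.
    return sum(ds) & MASK64

def mix64(z):
    # splitmix64 finalizer: a bijection that spreads each input bit
    # over all 64 output bits.
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def seed_by_mixing(ds):
    # Fold each digit into the running state and re-mix after every step,
    # so both the value and the position of each digit matter.
    h = 0
    for d in ds:
        h = mix64(h ^ d)
    return h

a = digits("abcdefghABCDEFGH")
b = digits("ABCDEFGHabcdefgh")
print(seed_by_sum(a) == seed_by_sum(b))  # True
print(seed_by_mixing(a) == seed_by_mixing(b))
```

The cost is two multiplications per digit instead of one addition; whether
that is acceptable inside initSeed() would have to be measured.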