On Sat, Aug 22, 2009 at 10:27:58PM +0100, Nicholas Clark wrote:
> Perl 5's hash tables consist of an array of linked lists. The list elements
> consist of a pointer to the next element, pointer to the value stored, and
> pointer to a structure, usually shared, containing the key (length, octets,
> hash value, flags).
> A 32 bit hash value is computed from the string key:
> The bottom m bits of the hash value are used to index into the array
> and then the linked list is walked looking for the item.
> There is no direct relationship between n, the number of elements, and m, the
> number of bits used. The initial array size is 8, and the array is doubled in
> size if the number of keys becomes too large, or any linked list becomes too
> large (hsplit() is the splitting function):
> [Whilst I have refactored the implementation, I didn't write the original,
> so I have to work out the algorithm from it, and may be wrong]
And I forgot the bit of the implementation I *did* write.
Since perl 5.8.2 there is adaptive behaviour to defeat algorithmic complexity
attacks, whilst in the general case still allowing pre-computed 32 bit hash
values passed into the API to be used.
If the hash splitting routine detects that doubling the bucket array *didn't*
split an overlong linked list, then it enters hash randomisation mode. The
hashing algorithm is changed to
one that uses a per-process random 32 bit seed, so that the distribution of
elements across the buckets changes completely. Pre-computed 32 bit hash
values passed into the API are now ignored. In the code, HvREHASH() is
flipped from false to true.
I was inspired by the Introspective Sort:
Crosby and Wallach's paper on Algorithmic Complexity Attacks is here: