On Sat, Aug 22, 2009 at 10:27:58PM +0100, Nicholas Clark wrote: > Perl 5's hash tables consist of an array of linked lists. The list elements > consist of a pointer to the next element, pointer to the value stored, and > pointer to a structure, usually shared, containing the key (length, octets, > hash value, flags). > > A 32 bit hash value is computed from the string key: > > http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.h#l120 > > The bottom m bits of the hash value are used to index into the array > > http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l789 > > and then the linked list is walked looking for the item. > > There is no direct relationship between n, the number of elements, and m, the > number of bits used. The initial array size is 8, and the array is doubled in > size if the number of keys becomes too large, or any linked list becomes to > large (hsplit() is the splitting function): > > http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l815 > > [Whilst I have refactored the implementation, I didn't write the original, > so I have to work out the algorithm from it, and may be wrong]
And I forgot the bit of the implementation I *did* write. Since perl 5.8.2 there is adaptative behaviour to defeat algorithmic complexity attacks, whilst in the general case allowing pre-computed 32 bit hash values passed into the API to be used. If the hash splitting routine detects that doubling the bucket size *didn't* split an overlong linked list: http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l1182 then it enters hash randomisation mode. The hashing algorithm is changed to one that uses a per-process random 32 bit seed, so that the distribution of elements across the buckets changes completely. Pre-computed 32 bit hash values passed into the API are now ignored. In the code, HvREHASH() is flipped from false to true. I was inspired by the Introspective Sort: http://en.wikipedia.org/wiki/Introsort Crosby and Wallach's paper on Algorithmic Complexity Attacks is here: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/ Nicholas Clark