Re: time complexity of searching elements in hash tables in Perl

Nicholas Clark Sat, 22 Aug 2009 14:48:04 -0700

On Sat, Aug 22, 2009 at 10:27:58PM +0100, Nicholas Clark wrote:

> Perl 5's hash tables consist of an array of linked lists. The list elements
> consist of a pointer to the next element, pointer to the value stored, and
> pointer to a structure, usually shared, containing the key (length, octets,
> hash value, flags).
> 
> A 32 bit hash value is computed from the string key:
> 
> http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.h#l120
> 
> The bottom m bits of the hash value are used to index into the array
> 
> http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l789
> 
> and then the linked list is walked looking for the item.
> 
> There is no direct relationship between n, the number of elements, and m, the
> number of bits used. The initial array size is 8, and the array is doubled in
> size if the number of keys becomes too large, or any linked list becomes to
> large (hsplit() is the splitting function):
> 
> http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l815
> 
> [Whilst I have refactored the implementation, I didn't write the original,
> so I have to work out the algorithm from it, and may be wrong]


And I forgot the bit of the implementation I *did* write.

Since perl 5.8.2 there is adaptative behaviour to defeat algorithmic complexity
attacks, whilst in the general case allowing pre-computed 32 bit hash values
passed into the API to be used.

If the hash splitting routine detects that doubling the bucket size *didn't*
split an overlong linked list:

http://perl5.git.perl.org/perl.git/blob/c8e503bfe4136506c95da755:/hv.c#l1182

then it enters hash randomisation mode. The hashing algorithm is changed to
one that uses a per-process random 32 bit seed, so that the distribution of
elements across the buckets changes completely. Pre-computed 32 bit hash
values passed into the API are now ignored. In the code, HvREHASH() is
flipped from false to true.

I was inspired by the Introspective Sort:
http://en.wikipedia.org/wiki/Introsort

Crosby and Wallach's paper on Algorithmic Complexity Attacks is here:
http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec2003/

Nicholas Clark

Re: time complexity of searching elements in hash tables in Perl

Reply via email to