Hello WebKittens, I noticed the other day that our hashing method in wtf/HashFunctions.h for hashing of pairs can be improved.
We have a hash method (WTF::intHash) for a single integer that takes 32 bits as input and produces a key of 32 bits. This works well enough for hashing an integer, as it is a reversible function. To hash a pair (A, B) of 32 bits, we hash each of A and B with the above method, to produce (A', B'). Then concatenate the two and hash the result to produce a 64 bit hash code. This whole process is reversible. However, as a final step we truncate the hash code to 32 bits, which destroys the properties of the hashing method. The result is a hash function that collides more than it should for typical inputs. I am proposing a patch [1] to improve the hash method for keys composed of pairs. The new method [2][3][4] is both almost-universal and efficient for machines to compute. To demonstrate the difference I inserted pairs of integers into a WTF::HashMap. A good hash method will populate the HashMap more densely. Whereas a bad hash method does not and will cause the HashMap to require more space and more calls to realloc(), which is a costly function and will be slower in the end. In testing, the new hash method reduced the time to insert integer pair keys into a WTF::HashMap by between 23% and 49% for different sized input sets. For example, with various input set sizes, the time to insert into the WTF::HashMap, across multiple runs, was reduced by: 8192 elements: 32% 31% 30% 34% 34% 33% 4096 elements: 47% 48% 48% 47% 47% 49% 1024 elements: 43% 43% 39% 42% 35% 256 elements: 25% 39% 33% 41% 23% Lookup times into an empty HashMap were also improved, incidentally. For example, with the old intHash method, making 8192 queries requires approximately 129ms on my machine while the new method takes less than 1ms. All testing was done on a 64-bit x86 machine. The tests I ran are attached to the bug [1] for those who may be interested in repeating the experiment [5]. I want to repeat that the reason the new hash method is faster is because of better hashing behaviour - ie, fewer collisions and a denser resulting hash table. I am not comparing the time to do the hash() itself. Any feedback would be welcome on the bug! Thanks, Dana [1] https://bugs.webkit.org/show_bug.cgi?id=96022 [2] http://opendatastructures.org/versions/edition-0.1d/ods-java/node33.html#SECTION00832000000000000000 [3] http://pages.cpsc.ucalgary.ca/~woelfel/paper/efficient_strongly/efficient_strongly.pdf [4] http://pages.cpsc.ucalgary.ca/~woelfel/paper/diss/diss.pdf [5] https://bugs.webkit.org/attachment.cgi?id=162804 _______________________________________________ webkit-dev mailing list [email protected] http://lists.webkit.org/mailman/listinfo/webkit-dev

