List,
I tried to benchmark Nicholas' work on shrinking HVs, SVs et al. At
the end of the day I'm no longer sure what I've shown. At first I
thought I saw a spectacular improvement, but that turned out to be
because I was using a crappy distro perl, with threading and the
kitchen sink compiled in. When I turned to perls I had compiled
myself, the results were less clear.
I took my Regexp::Assemble module, because I know it is a heavy consumer
of hashes and arrays. For each pattern added, an array is created and
inserted into a structure, which involves either replacing an array
element with a hash pointing to two arrays, or adding a key to an
existing hash to point to an array. In the former case, one of the two
arrays is a slice of the original array and the other is the tail of
the added array.
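The array-to-hash replacement might be sketched roughly like this (a
hypothetical illustration only, not Regexp::Assemble's actual code;
fork_at and the token layout are my own invention):

```perl
use strict;
use warnings;

# Replace the element of @$path at $pos with a hash pointing to two
# arrays: the tail sliced off the original path, and the tail of the
# newly added path.
sub fork_at {
    my ($path, $pos, $new_tail) = @_;
    my @old_tail = splice @$path, $pos;     # slice of the original array
    my %node = (
        $old_tail[0]   => \@old_tail,       # keyed by first token
        $new_tail->[0] => $new_tail,
    );
    push @$path, \%node;                    # hash replaces the element
    return $path;
}

# 'cat' and 'cart' share the prefix 'ca', then diverge.
my $path = ['c', 'a', 't'];
fork_at($path, 2, ['r', 't']);
# $path is now ['c', 'a', { t => ['t'], r => ['r', 't'] }]
```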
After all the patterns have been loaded, a reduction step takes place.
This involves a depth-first scan through the structure and will make a
number of adjustments to it, but on the whole it is fairly read-only. I
patched the module to load Time::HiRes and print out the elapsed time
taken to insert all the patterns into the structure, and the elapsed
time taken to reduce it.
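The instrumentation amounts to something like the following (a sketch
of the technique only; the actual patch may differ, and the add/reduce
calls are placeholder comments):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
# ... load phase: add() each pattern read from the file ...
printf "insert: %0.3fs\n", tv_interval($t0);

$t0 = [gettimeofday];
# ... reduction phase: depth-first scan and adjustment ...
printf "reduce: %0.3fs\n", tv_interval($t0);
```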
I ran the test on two files. The first contains about 3000 patterns
totalling about 100Kb; the second is a /usr/dict/words file containing
about 230000 lines. I ran the tests with 5.005_04, 5.6.2, 5.8.7
and blead (patch 24800).
The first file is more real-world, in that its order is not at all
congruent with the structure, so the insertions land all over the
place, which means the structure gets trampled on repeatedly. The
file is, however, barely large enough to really get the module cooking.
The second file, being in alphabetical order, is a bit more contrived,
as once a hash has been hammered for a while during the load phase, it
will be left alone and won't be touched again until the reduction phase.
I suppose I should prepare a shuffled version of the file.
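A shuffled copy is easy enough to produce; something like this should
do (assuming a List::Util recent enough to export shuffle):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @lines = <>;        # slurp the wordlist from stdin
print shuffle @lines;  # same lines, random order
```

Run as: perl shuffle.pl < /usr/dict/words > words.rand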
The results are as follows (I fear this will wrap terribly):
...uh, no, it does wrap terribly. Take a look instead at
http://www.landgren.net/perl/blead/20050611.txt
The run.sh script cats the input file twice to /dev/null in the naive
hope that this will help warm the cache, and the fastest and slowest
results for each file on each perl are thrown away.
Executive summary: it looks like 5.8.7 is doing a better job than blead
at the moment. I haven't looked closely at the configuration options I
chose for the different builds, but I doubt anything is radically
different: I tend to build perls the same way (notably, using perl's
malloc).
If anyone has suggestions on how to tighten up the test environment I'm
all ears.
Thanks,
David