List,

I tried to benchmark Nicholas' work on shrinking HVs, SVs et al. At the end of the day I'm no longer sure what I've shown. At first I thought I saw a spectacular improvement, but that turned out to be because I was using a crappy distro perl, with threading and the kitchen sink compiled in. When I switched to perls I had compiled myself, the results were less clear.

I took my Regexp::Assemble module, because I know it is a heavy consumer of hashes and arrays. For each pattern added, an array is created and inserted into a structure; the insertion either replaces an array element with a hash pointing to two arrays, or adds a key to an existing hash to point to an array. In the former case, one of the two new arrays is a slice of the original array and the other is the tail of the added array.
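Roughly, the insert step looks something like this. This is a toy sketch of the idea described above, not Regexp::Assemble's actual code; the `insert` routine and its token handling are my own invention for illustration:

```perl
use strict;
use warnings;

# Hypothetical sketch: walk along the stored path until the new pattern
# diverges, then replace the element at the divergence point with a hash
# keyed on the next token, each key pointing to a tail array.
sub insert {
    my ($path, @tokens) = @_;
    for my $i (0 .. $#$path) {
        my $node = $path->[$i];
        if (ref $node eq 'HASH') {
            # existing branch point: descend into a matching tail, or add a key
            my $key = @tokens ? $tokens[0] : '';
            return insert($node->{$key}, @tokens) if exists $node->{$key};
            $node->{$key} = [@tokens];
            return;
        }
        if (!@tokens or $node ne $tokens[0]) {
            # divergence: replace this element with a hash of two tails,
            # one a slice of the original array, one the tail of the new one
            my @old_tail = splice @$path, $i;
            push @$path, {
                (@old_tail ? $old_tail[0] : '') => \@old_tail,
                (@tokens   ? $tokens[0]   : '') => [@tokens],
            };
            return;
        }
        shift @tokens;
    }
    # new pattern extends beyond the stored one
    push @$path, { '' => [], $tokens[0] => [@tokens] } if @tokens;
}

my $trie = [split //, 'cat'];
insert($trie, split //, 'car');
insert($trie, split //, 'cod');
```

Each `insert` either trampling an array into a hash or hammering keys into an existing hash is what makes the module such a good workout for HV/AV changes.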

After all the patterns have been loaded, a reduction step takes place. This involves a depth-first scan through the structure and will make a number of adjustments to it, but on the whole it is fairly read-only. I patched the module to load Time::HiRes and print out the elapsed time to insert all the patterns into the structure, and to print out the elapsed time taken to reduce the structure.
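The timing patch amounts to something like the following. This is a sketch of the approach, not the actual diff; the `$ra->add`/`$ra->reduce` calls stand in for the real load and reduction phases:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Bracket each phase with Time::HiRes and report elapsed wall-clock time.
my $t0 = [gettimeofday];
# $ra->add($_) for @patterns;    # insert phase (stubbed out here)
my $insert_elapsed = tv_interval($t0);

$t0 = [gettimeofday];
# $ra->reduce;                   # reduction phase (stubbed out here)
my $reduce_elapsed = tv_interval($t0);

printf "insert: %.3fs  reduce: %.3fs\n", $insert_elapsed, $reduce_elapsed;
```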

I ran the test on two files. The first contains about 3000 patterns totalling about 100Kb; the second is a /usr/dict/words file of about 230000 lines. I ran the tests with 5.005_04, 5.6.2, 5.8.7 and blead (patch 24800).

The first file is more real-world, in that its order is not at all congruent with the structure, so the insertions land all over the place and the structure gets trampled all over. The file is, however, barely large enough to really get the module cooking.

The second file, being in alphabetical order, is a bit more contrived, as once a hash has been hammered for a while during the load phase, it will be left alone and won't be touched again until the reduction phase. I suppose I should prepare a shuffled version of the file.
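Preparing the shuffled version is a one-off along these lines (List::Util's shuffle is in core as of 5.8; the script name and output file are just for illustration):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: read a wordlist and return its lines in random order.
#   perl shuffle.pl /usr/dict/words > words.shuffled
sub shuffle_file {
    my ($in) = @_;
    open my $fh, '<', $in or die "open $in: $!";
    return shuffle <$fh>;
}

print shuffle_file($ARGV[0]) if @ARGV;
```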

The results are as follows (I fear this will wrap terribly):

uh no, it does wrap terribly. Take a look at

  http://www.landgren.net/perl/blead/20050611.txt

The run.sh cats the input file twice to /dev/null in the naive hope that it will help warm the cache, and the fastest and slowest results for each file on each perl are thrown away.
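The throwing-away step is just a trimmed mean; a sketch of what run.sh effectively computes (my own helper name, not the script's actual code):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Drop the fastest and slowest of N timings and average the rest,
# to damp cache-warmth and scheduling noise at the extremes.
sub trimmed_mean {
    my @t = sort { $a <=> $b } @_;
    shift @t;    # discard fastest
    pop @t;      # discard slowest
    return sum(@t) / @t;
}
```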

Executive summary: it looks like 5.8.7 is doing a better job than blead at the moment. I haven't looked closely at what Configure options I chose in the different builds, but I doubt anything is radically different: I tend to build perls the same way (notably: using perl's malloc).

If anyone has suggestions on how to tighten up the test environment I'm all ears.

Thanks,
David
