List,
I tried to benchmark Nicholas' work on shrinking HVs, SVs et al. At
the end of the day I'm no longer sure what I've shown. At first I
thought I saw a spectacular improvement, but that turned out to be
because I was using a crappy distro perl, with threading and the
kitchen sink compiled in. When I turned to perls I had compiled
myself, the results were less clear.
I took my Regexp::Assemble module, because I know it is a heavy consumer
of hashes and arrays. For each pattern added, an array is created and
inserted into a structure, which involves either replacing an array
element with a hash pointing to two arrays, or adding a key to an
existing hash to point to an array. In the former case, one of the two
arrays is a slice of the original array and the other is the tail of
the added array.
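The array-to-hash replacement might be sketched roughly like this (a
hypothetical illustration only, not Regexp::Assemble's actual code;
fork_at and the token layout are my own invention):

```perl
use strict;
use warnings;

# Replace the element of @$path at $pos with a hash pointing to two
# arrays: the tail sliced off the original path, and the tail of the
# newly added path.
sub fork_at {
    my ($path, $pos, $new_tail) = @_;
    my @old_tail = splice @$path, $pos;     # slice of the original array
    my %node = (
        $old_tail[0]   => \@old_tail,       # keyed by first token
        $new_tail->[0] => $new_tail,
    );
    push @$path, \%node;                    # hash replaces the element
    return $path;
}

# 'cat' and 'cart' share the prefix 'ca', then diverge.
my $path = ['c', 'a', 't'];
fork_at($path, 2, ['r', 't']);
# $path is now ['c', 'a', { t => ['t'], r => ['r', 't'] }]
```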
After all the patterns have been loaded, a reduction step takes place.
This involves a depth-first scan through the structure and will make a
number of adjustments to it, but on the whole it is fairly read-only. I
patched the module to load Time::HiRes and print out the elapsed time
taken to insert all the patterns into the structure, and the elapsed
time taken to reduce it.
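The instrumentation amounts to something like the following (a sketch
of the technique only; the actual patch may differ, and the add/reduce
calls are placeholder comments):

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
# ... load phase: add() each pattern read from the file ...
printf "insert: %0.3fs\n", tv_interval($t0);

$t0 = [gettimeofday];
# ... reduction phase: depth-first scan and adjustment ...
printf "reduce: %0.3fs\n", tv_interval($t0);
```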
I ran the test on two files. The first contains about 3000 patterns
totalling about 100Kb; the second is a /usr/dict/words file containing
about 230000 lines. I ran the tests with 5.005_04, 5.6.2, 5.8.7
and blead (patch 24800).
The first file is more real-world, in that its order is not at all
congruent with the structure, so the insertions land all over the
place, which means the structure gets trampled on repeatedly. The
file is, however, barely large enough to really get the module cooking.
The second file, being in alphabetical order, is a bit more contrived,
as once a hash has been hammered for a while during the load phase, it
will be left alone and won't be touched again until the reduction phase.
I suppose I should prepare a shuffled version of the file.
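A shuffled copy is easy enough to produce; something like this should
do (assuming a List::Util recent enough to export shuffle):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @lines = <>;        # slurp the wordlist from stdin
print shuffle @lines;  # same lines, random order
```

Run as: perl shuffle.pl < /usr/dict/words > words.rand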
The results are as follows (I fear this will wrap terribly):
...uh, no, it does wrap terribly. Take a look instead at
http://www.landgren.net/perl/blead/20050611.txt
The run.sh script cats the input file twice to /dev/null in the naive
hope that this will help warm the cache, and the fastest and slowest
results for each file on each perl are thrown away.
Executive summary: it looks like 5.8.7 is doing a better job than blead
at the moment. I haven't looked closely at the configuration options I
chose for the different builds, but I doubt anything is radically
different: I tend to build perls the same way (notably, using perl's
malloc).
If anyone has suggestions on how to tighten up the test environment I'm
all ears.
Thanks,
David