-----BEGIN PGP SIGNED MESSAGE-----
Moin,
[Warning! Long mail!]
I am currently tracking down why Benchmark.pm takes so long for fast
operations:
Modified Benchmrk.pm to print out the n's and timecounts:
te@null:~ > cat time.pl
#!/usr/bin/perl -w
use lib '.';
use Benchmark;
$| = 1;
$a = 0; $b = 9; $c = 4;
timethese ( shift || -3,
{
add => sub { $a = $b + $c; },
} ),
te@null:~ > time perl time.pl -2
Benchmark: running add for at least 2 CPU seconds...
Sun Mar 3 13:15:06 2002 at 1
Sun Mar 3 13:15:06 2002 at 2
Sun Mar 3 13:15:06 2002 at 4
Sun Mar 3 13:15:06 2002 at 8
Sun Mar 3 13:15:06 2002 at 16
Sun Mar 3 13:15:06 2002 at 32
Sun Mar 3 13:15:06 2002 at 64
Sun Mar 3 13:15:06 2002 at 128
Sun Mar 3 13:15:06 2002 at 256
Sun Mar 3 13:15:06 2002 at 512
Sun Mar 3 13:15:06 2002 at 1024
Sun Mar 3 13:15:06 2002 at 2048
Sun Mar 3 13:15:06 2002 at 4096
Sun Mar 3 13:15:06 2002 at 8192
Sun Mar 3 13:15:06 2002 at 16384
Sun Mar 3 13:15:06 2002 at 32768
Sun Mar 3 13:15:06 2002 at 65536
Sun Mar 3 13:15:07 2002 at 131072
Sun Mar 3 13:15:09 2002 at tc 0.11
Sun Mar 3 13:15:09 2002 final tc 0.21
add: 3 wallclock secs ( 2.17 usr + 0.00 sys = 2.17 CPU) @
2306248.85/s (n=5004560)
real 0m16.035s
user 0m15.900s
sys 0m0.030s
te@null:~ > time perl time.pl -3
Benchmark: running add for at least 3 CPU seconds...
Sun Mar 3 13:15:37 2002 at 1
Sun Mar 3 13:15:37 2002 at 2
Sun Mar 3 13:15:37 2002 at 4
Sun Mar 3 13:15:37 2002 at 8
Sun Mar 3 13:15:37 2002 at 16
Sun Mar 3 13:15:37 2002 at 32
Sun Mar 3 13:15:37 2002 at 64
Sun Mar 3 13:15:37 2002 at 128
Sun Mar 3 13:15:37 2002 at 256
Sun Mar 3 13:15:37 2002 at 512
Sun Mar 3 13:15:37 2002 at 1024
Sun Mar 3 13:15:37 2002 at 2048
Sun Mar 3 13:15:37 2002 at 4096
Sun Mar 3 13:15:37 2002 at 8192
Sun Mar 3 13:15:37 2002 at 16384
Sun Mar 3 13:15:38 2002 at 32768
Sun Mar 3 13:15:38 2002 at 65536
Sun Mar 3 13:15:38 2002 at 131072
Sun Mar 3 13:15:39 2002 at 262144
Sun Mar 3 13:15:42 2002 at tc 0.23
Sun Mar 3 13:15:42 2002 final tc 0.32
add: 3 wallclock secs ( 3.06 usr + -0.00 sys = 3.06 CPU) @
2309890.52/s (n=7068265)
real 0m23.416s
user 0m23.260s
sys 0m0.030s
te@null:~ >
- -4 takes about 30 seconds.
te@null:~ > time perl time.pl -.2
Benchmark: running add for at least 0.2 CPU seconds...
Sun Mar 3 13:35:16 2002 at 1
Sun Mar 3 13:35:16 2002 at 2
Sun Mar 3 13:35:16 2002 at 4
Sun Mar 3 13:35:16 2002 at 8
Sun Mar 3 13:35:16 2002 at 16
Sun Mar 3 13:35:16 2002 at 32
Sun Mar 3 13:35:16 2002 at 64
Sun Mar 3 13:35:16 2002 at 128
Sun Mar 3 13:35:16 2002 at 256
Sun Mar 3 13:35:16 2002 at 512
Sun Mar 3 13:35:16 2002 at 1024
Sun Mar 3 13:35:16 2002 at 2048
Sun Mar 3 13:35:16 2002 at 4096
Sun Mar 3 13:35:16 2002 at 8192
Sun Mar 3 13:35:16 2002 at 16384
Sun Mar 3 13:35:17 2002 at 32768
Sun Mar 3 13:35:17 2002 at 65536
Sun Mar 3 13:35:17 2002 at 131072
Sun Mar 3 13:35:18 2002 at 262144
Sun Mar 3 13:35:19 2002 final tc 0.23
add: -1 wallclock secs ( 0.20 usr + 0.00 sys = 0.20 CPU) @
2393485.00/s (n=478697)
(warning: too few iterations for a reliable count)
real 0m4.089s
user 0m4.070s
sys 0m0.000s
te@null:~ >te@null:~ > time perl time.pl -.5
Benchmark: running add for at least 0.5 CPU seconds...
Sun Mar 3 13:35:33 2002 at 1
Sun Mar 3 13:35:33 2002 at 2
Sun Mar 3 13:35:33 2002 at 4
Sun Mar 3 13:35:33 2002 at 8
Sun Mar 3 13:35:33 2002 at 16
Sun Mar 3 13:35:33 2002 at 32
Sun Mar 3 13:35:33 2002 at 64
Sun Mar 3 13:35:33 2002 at 128
Sun Mar 3 13:35:33 2002 at 256
Sun Mar 3 13:35:33 2002 at 512
Sun Mar 3 13:35:33 2002 at 1024
Sun Mar 3 13:35:33 2002 at 2048
Sun Mar 3 13:35:33 2002 at 4096
Sun Mar 3 13:35:33 2002 at 8192
Sun Mar 3 13:35:33 2002 at 16384
Sun Mar 3 13:35:33 2002 at 32768
Sun Mar 3 13:35:34 2002 at 65536
Sun Mar 3 13:35:34 2002 at 131072
Sun Mar 3 13:35:35 2002 final tc 0.11
add: 1 wallclock secs ( 0.56 usr + -0.01 sys = 0.55 CPU) @
2274801.82/s (n=1251141)
real 0m4.755s
user 0m4.750s
sys 0m0.010s
The "a bit more" stated in the doc is quite ..untrue ;)
It seems that the method of benchmarking takes much longer than the
benchmark in itself, for instance (example!)
6000000 empty loops = 20 seconds
6000000 code loops = 23 seconds
That will then determine that it takes 3 seconds for 6000000 ops and print
out the result of 2000000 ops/s. Side effect is that it still takes 23
seconds (plus the time to determine n and the first count) to find that out.
I have only some hazard ideas how to make it faster:
* Instead of doubling $n, multiply by 4. Since the worst case 0.1 second * 4
= 0.4, this is not much worse than now (0.1*2 = 0.2 seconds), but it
halves the steps to reach a first $n with high counts.
* The first tc could probably also improved, if we let it running for at
least one second, so gunning for 10% (0.4 for four seconds) can probably
improved. Maybe that step can be eliminated.
After changing $n *= 2 to $n *= 4 around line 591:
te@null:~ > time perl time.pl -.2
Benchmark: running add for at least 0.2 CPU seconds...
Sun Mar 3 13:37:11 2002 at 1
Sun Mar 3 13:37:11 2002 at 4
Sun Mar 3 13:37:11 2002 at 16
Sun Mar 3 13:37:11 2002 at 64
Sun Mar 3 13:37:11 2002 at 256
Sun Mar 3 13:37:11 2002 at 1024
Sun Mar 3 13:37:11 2002 at 4096
Sun Mar 3 13:37:11 2002 at 16384
Sun Mar 3 13:37:11 2002 at 65536
Sun Mar 3 13:37:12 2002 final tc 0.12
Sun Mar 3 13:37:13 2002 Timing count n=262144
add: 1 wallclock secs ( 0.29 usr + 0.00 sys = 0.29 CPU) @
2485848.28/s (n=720896)
(warning: too few iterations for a reliable count)
real 0m2.874s
user 0m2.860s
sys 0m0.000s
te@null:~ >
te@null:~ > time perl time.pl -.5
Benchmark: running add for at least 0.5 CPU seconds...
Sun Mar 3 13:37:59 2002 at 1
Sun Mar 3 13:37:59 2002 at 4
Sun Mar 3 13:37:59 2002 at 16
Sun Mar 3 13:37:59 2002 at 64
Sun Mar 3 13:37:59 2002 at 256
Sun Mar 3 13:37:59 2002 at 1024
Sun Mar 3 13:37:59 2002 at 4096
Sun Mar 3 13:37:59 2002 at 16384
Sun Mar 3 13:37:59 2002 at 65536
Sun Mar 3 13:38:00 2002 final tc 0.11
add: 1 wallclock secs ( 0.54 usr + 0.00 sys = 0.54 CPU) @
2316927.78/s (n=1251141)
real 0m4.263s
user 0m4.240s
sys 0m0.020s
te@null:~ >
te@null:~ > time perl time.pl -3
Benchmark: running add for at least 3 CPU seconds...
Sun Mar 3 13:38:23 2002 at 1
Sun Mar 3 13:38:23 2002 at 4
Sun Mar 3 13:38:23 2002 at 16
Sun Mar 3 13:38:23 2002 at 64
Sun Mar 3 13:38:23 2002 at 256
Sun Mar 3 13:38:23 2002 at 1024
Sun Mar 3 13:38:23 2002 at 4096
Sun Mar 3 13:38:23 2002 at 16384
Sun Mar 3 13:38:23 2002 at 65536
Sun Mar 3 13:38:26 2002 at tc 0.13
Sun Mar 3 13:38:28 2002 at tc 0.26
Sun Mar 3 13:38:28 2002 final tc 0.34
add: 2 wallclock secs ( 3.06 usr + 0.00 sys = 3.06 CPU) @
2329991.50/s (n=7129774)
real 0m23.373s
user 0m23.370s
sys 0m0.010s
te@null:~ >
Seems to help only for > -0.5 cases and even this might not be the case
everytime (if the ops/s is even higher than 2 millions, it might help).
* Optimize the benchmark loop:
my ($subcode, $subref);
if (ref $c eq 'CODE') {
$subcode = "sub { for (1 .. $n) { local \$_; package $pack; &\$c; }
}";
$subref = eval $subcode;
}
else {
$subcode = "sub { for (1 .. $n) { local \$_; package $pack; $c;} }";
$subref = _doeval($subcode);
}
Maybe something along these lines (untested);
if (ref $c eq 'CODE') {
if ($n > 116) {
my $r = $n % 16; my $m = int($n / 16);
$subcode = "sub { for (1 .. $m) {"
. "local \$_; package $pack; "
. "&\$c; &\$c; &\$c; &\$c; "
. "&\$c; &\$c; &\$c; &\$c; "
. "&\$c; &\$c; &\$c; &\$c; "
. "&\$c; &\$c; &\$c; &\$c; } ";
$subcode .= " for (1 .. $r) { local \$_; package $pack; &\$c; }"
if $r != 0;
$subcode .= " } ";
}
}
else
{
$subcode = "sub { for (1 .. $n) { local \$_; package $pack;
&\$c; } }"
}
$subref = eval $subcode;
}
Basically, reduce the loopcount by some factor. Or the "local \$_; package
$pack" maybe can to be done only once, not for every loop?"
The empty loop would then be adjusted accordingly.
A quick hack reducing the loop-count by factor 16, without adjusting the
empty loop (perhaps the same code is used to construct the empty loop, I
didn't look):
te@null:~ > time perl time.pl -3
Benchmark: running add for at least 3 CPU seconds...
Sun Mar 3 13:47:01 2002 at 1
Sun Mar 3 13:47:01 2002 at 4
Sun Mar 3 13:47:01 2002 at 16
Sun Mar 3 13:47:01 2002 at 64
Sun Mar 3 13:47:01 2002 at 256
Sun Mar 3 13:47:01 2002 at 1024
Sun Mar 3 13:47:01 2002 at 4096
Sun Mar 3 13:47:01 2002 at 16384
Sun Mar 3 13:47:01 2002 at 65536
Sun Mar 3 13:47:03 2002 at tc 0.12
Sun Mar 3 13:47:04 2002 at tc 0.28
Sun Mar 3 13:47:04 2002 final tc 0.336
Sun Mar 3 13:47:16 2002 Timing count n=355267
Sun Mar 3 13:47:17 2002 Timing count n=262144
add: 5 wallclock secs ( 3.10 usr + 0.01 sys = 3.11 CPU) @
2532157.88/s (n=7875011)
real 0m16.297s
user 0m16.250s
sys 0m0.020s
Better, but not good enough.
The problems seem only to arise when the code to bench starts to getting
near or faster than the loop-overhead to do the benchmark. The main
question is, of course: Does it make sense to use "timethese ( -3, ..." or
is "timethese ( -0.5" enough? Will a longer runtime improve the accuracy?
Seeing how wild the values vary in even subsequent runs, that sounds
questionable ;)
What do the benchmark experts (e.g. the original authors) say to this
matter?
Best regards,
Tels
PS: I'll also provide some doc patch fixing some spelling and mentioning
Devel::Cover, which can not only do timings, but looks more promising than
DProf, which starts to get more and more useless results (like: combined
percent of time uses exceeds 100% in quite some large ways etc).
- --
perl -MMath::String -e 'print \
Math::String->from_number("215960156869840440586892398248"),"\n"'
http://bloodgate.com/perl My current Perl projects
PGP key available on http://bloodgate.com/tels.asc or via email.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
iQEVAwUBPIIcpHcLPEOTuEwVAQGsfwf6A6xBn6Hn8WJgjDKGZLcTz6RHCtaJyFsj
WkcPQqKWB7dMuu1TFt8acTd0r26OpR/xKiDVTbmFvUHGqzbSuNuV+NmN5wRmfW5R
Kn+oexc60WhHoEfm9ATzTNnnxNftn8xo1ye736yYS50Dbpx7+CmUtlzcnhrN9AcK
OQmp0K8J8ZbQsJ0x3rZcB2uhidm+2oFzvnTIxUNv9so/TG9HP6SbFXQV2BQanm/1
8N5DY7vi+0xWFtPAyiXBu72ouGmSRIyfGBgko6NOfUMkplRYkLN6t+9tL5go8S58
o3M2D0MWtoGzHkbVdaAC5K+3YJkfP3PrgspeYJPveIZjoNpQ32ylcQ==
=G6B8
-----END PGP SIGNATURE-----