I was pondering this perl-core todo item:

    =head2 A decent benchmark
    C<perlbench> seems impervious to any recent changes made to the
    perl core.  It would be useful to have a reasonable general
    benchmarking suite that roughly represented what current perl
    programs do, and measurably reported whether tweaks to the core
    improve, degrade or don't really affect performance, to guide
    people attempting to optimise the guts of perl.  Gisle would
    welcome new tests for perlbench.

IMO, the strategy there is:

 - ghetto-ize the problem - leave it to perlbench
 - tighten up the measurements, including an empty-loop baseline /
   calibration (also done by Benchmark)
 - hammer specific language constructs, hoping to stress specific
   code paths

What's notable in its absence is any *real* use of perl-dist's tests,
i.e. the core test suite itself.  I dug into the code and found that
this already works:

 $> HARNESS_TIMER=1 make test

 ext/threads/t/end.............................................ok      60 ms
 ext/threads/t/err.............................................ok    1090 ms
 ext/threads/t/exit............................................ok     365 ms
 ext/threads/t/free2...........................................ok    2367 ms
 ext/threads/t/free............................................ok   12383 ms

so clearly:

 - somebody thought of this already ;-)
 - does Test::* support this feature?
 - t/harness doesn't, but HARNESS_OPTIONS could allow a 't'
 - it wouldn't be too hard to collect this data
 - it could be on by default for -Dusedevel builds, if t/TEST (etc.)
   saved the data persistently - Storable, SQLite files, etc.

For example:

 $data = {
     "./perl -Ilib -V" => `./perl -Ilib -V`,
     ok_times  => { ... },   # successful tests => times
     nok_tests => [ ... ],   # for completeness
     platform_static => { "/proc/cpuinfo" => `cat /proc/cpuinfo` },
     platform_load   => { "uptime" => [ `uptime && sleep 5` ] },  # notionally, sampled repeatedly over the run
 };

... and collected over time, say into
~/.perl-core-hacking/blead-run-times.stor.  Each bleader (assuming
this is on by default in -Dusedevel builds) would:

 - collect significant longitudinal data on their favorite build
   (for example: -des)
 - have the same data (just less of it) for other systematic and/or
   ad-hoc configs

If we then integrate that data, there'd be enough info (eventually) to:

 - see statistical groupings around important build options;
   -D/-Uusethreads and -U/-DDEBUGGING are 2 obvious ones
 - see statistical groupings around platforms
 - see the fuzz due to system-load noise, etc. (finally satisfying,
   not frustrating)
 - see which tests are more/less load- and build-option-dependent
 - etc.

This kind of thing could be useful for CPAN modules too, presumably
through some optional keyword-ish:

 perf_test_results_to => foo-module-p...@cpan.org,

So, I'm attaching a strawman patch to t/TEST which collects the
timing info.
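The patch below only captures per-run numbers; the longitudinal part
is still notional.  As a minimal sketch of what the accumulation step
could look like (the add_run name, file layout, and append-a-hashref
policy are all my assumptions, not part of the patch):

 use strict;
 use warnings;
 use Storable qw(retrieve store);

 # hypothetical helper -- append one run's results (a hashref shaped
 # like $data above) to the longitudinal store
 sub add_run {
     my ($run) = @_;
     my $file  = "$ENV{HOME}/.perl-core-hacking/blead-run-times.stor";
     my $runs  = -e $file ? retrieve($file) : [];
     push @$runs, $run;    # one entry per make-test run
     store($runs, $file);
 }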
Running it a bunch of times and averaging the results, I get data
like this:

 run/switchF.t    19  avg: 0.005  sdev: 0.000  fuzz: 0.08
 run/switchF1.t   19  avg: 0.016  sdev: 0.009  fuzz: 0.54
 run/switchI.t    19  avg: 0.047  sdev: 0.013  fuzz: 0.28
 run/switcha.t    19  avg: 0.006  sdev: 0.003  fuzz: 0.42
 run/switchd.t    19  avg: 0.075  sdev: 0.027  fuzz: 0.36
 run/switches.t   19  avg: 0.555  sdev: 0.048  fuzz: 0.09
 run/switchn.t    19  avg: 0.007  sdev: 0.006  fuzz: 0.84
 run/switchp.t    19  avg: 0.007  sdev: 0.004  fuzz: 0.53
 run/switcht.t    19  avg: 0.057  sdev: 0.018  fuzz: 0.31
 run/switchx.t    19  avg: 0.062  sdev: 0.020  fuzz: 0.33
 uni/cache.t      19  avg: 0.115  sdev: 0.055  fuzz: 0.48
 uni/chomp.t      19  avg: 0.837  sdev: 0.261  fuzz: 0.31
 uni/chr.t        19  avg: 0.382  sdev: 0.168  fuzz: 0.44
 uni/class.t      19  avg: 4.028  sdev: 3.601  fuzz: 0.89
 uni/fold.t       19  avg: 0.807  sdev: 0.369  fuzz: 0.46
 uni/greek.t      19  avg: 0.247  sdev: 0.127  fuzz: 0.51
 uni/latin2.t     19  avg: 0.164  sdev: 0.032  fuzz: 0.19
 uni/lex_utf8.t   19  avg: 0.112  sdev: 0.067  fuzz: 0.60
 uni/lower.t      19  avg: 0.386  sdev: 0.065  fuzz: 0.17
 uni/overload.t   19  avg: 0.617  sdev: 0.321  fuzz: 0.52
 uni/sprintf.t    19  avg: 0.035  sdev: 0.019  fuzz: 0.55
 uni/tie.t        19  avg: 0.075  sdev: 0.072  fuzz: 0.96
 uni/title.t      19  avg: 0.269  sdev: 0.069  fuzz: 0.26
 uni/tr_7jis.t    19  avg: 0.135  sdev: 0.078  fuzz: 0.58
 uni/tr_eucjp.t   19  avg: 0.117  sdev: 0.037  fuzz: 0.31

(19 make-test runs --> 19 elapsed-time samples per test)

"fuzz" is standard-deviation / avg, i.e. the coefficient of variation
-- low fuzz means the test's runtimes are quite consistent across the
sample set (see the sketch after the list below).  The low-fuzz tests
are the good ones for detecting actual runtime improvements, since
changes there are less hidden in the noise.

I'd like to discuss all of this, towards:

 - devising an acceptable interface to use in t/TEST, presumably a
   require plus a fully-qualified function call
 - fitting the same into Test::* && TAP, as applicable
 - whether the body of data we could get is useful - if we collect
   it, will they come?
 - how this fits with other plans for blead, Test::, etc.
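As referenced above, a minimal standalone sketch of how the
avg/sdev/fuzz columns are computed (the sample values and test name
are illustrative only, not real measurements):

 use strict;
 use warnings;

 # avg (mean), sdev (population standard deviation), and
 # fuzz = sdev / avg, for one test's elapsed times in seconds
 sub stats {
     my @t = @_;
     my $n = @t or die "no samples\n";
     my $avg = 0;
     $avg += $_ / $n for @t;
     my $var = 0;
     $var += ($_ - $avg) ** 2 / $n for @t;
     my $sdev = sqrt($var);
     my $fuzz = $avg ? $sdev / $avg : 0;
     return ($avg, $sdev, $fuzz);
 }

 # illustrative -- in practice, 19 values per test, one per run
 my @samples = (0.115, 0.098, 0.171, 0.089, 0.120);
 printf "some/test.t  %2d  avg: %.3f  sdev: %.3f  fuzz: %.2f\n",
        scalar(@samples), stats(@samples);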
diff --git a/t/TEST b/t/TEST
index 2634485..e75c7bb 100755
--- a/t/TEST
+++ b/t/TEST
@@ -19,6 +19,7 @@ delete $ENV{PERL5LIB};
 # remove empty elements due to insertion of empty symbols via "''p1'" syntax
 @ARGV = grep($_,@ARGV) if $^O eq 'VMS';
 our $show_elapsed_time = $ENV{HARNESS_TIMER} || 0;
+our %elapsed; # collect actuals
 
 # Cheesy version of Getopt::Std. Maybe we should replace it with that.
 {
@@ -32,6 +33,7 @@ our $show_elapsed_time = $ENV{HARNESS_TIMER} || 0;
     $::with_utf8 = 1 if $1 eq 'utf8';
     $::with_utf16 = 1 if $1 eq 'utf16';
     $::taintwarn = 1 if $1 eq 'taintwarn';
+    $::show_elapsed_time = 1 if $1 eq 'times'; # alt for HARNESS_TIMER
     $ENV{PERL_CORE_MINITEST} = 1 if $1 eq 'minitest';
     if ($1 =~ /^deparse(,.+)?$/) {
         $::deparse = 1;
@@ -70,6 +72,25 @@ $ENV{EMXSHELL} = 'sh';	# For OS/2
 use TestInit;
 use File::Spec;
 if ($show_elapsed_time) { require Time::HiRes }
+END { # maybe collect runtime data
+    return unless -d "$ENV{HOME}/.p5p"; # no dest-dir
+    # collect additional ID-ish info: branch/HEAD, build-conf, etc
+    # require Devel::Test
+    # $elapsed{platform} = Devel::Test::platform_info();
+    # write it if desired format is given
+    if ($show_elapsed_time =~ /stor/i) {
+        require Storable;
+        Storable::store(\%elapsed,
+                        "$ENV{HOME}/.p5p/mktest-perf.$$.stor");
+    }
+    elsif ($show_elapsed_time =~ /Dumper/i) {
+        require Data::Dumper;
+        $Data::Dumper::Sortkeys = 1;
+        open(my $rtdata, ">$ENV{HOME}/.p5p/mktest-perf.$$")
+            or die "can't write perf-data: $!\n";
+        print $rtdata Data::Dumper::Dumper(\%elapsed);
+    }
+}
 
 my $curdir = File::Spec->curdir;
 my $updir = File::Spec->updir;
@@ -508,12 +529,11 @@ EOT
     }
     else {
         if ($max) {
-            my $elapsed;
+            my $elapsed = "";
             if ( $show_elapsed_time ) {
-                $elapsed = sprintf( " %8.0f ms", (Time::HiRes::time() - $test_start_time) * 1000 );
-            }
-            else {
-                $elapsed = "";
+                my $et = Time::HiRes::time() - $test_start_time;
+                $elapsed = sprintf( " %8.0f ms", $et * 1000 );
+                $elapsed{$test} = $et;
             }
             print "${te}ok$elapsed\n";
             $good_files++;
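For clarity, this is how I've been driving the patch (reading its
regexes literally: any HARNESS_TIMER value matching /stor/i selects
the Storable output path, and ~/.p5p must already exist):

 $> mkdir -p ~/.p5p
 $> HARNESS_TIMER=stor make test
 # each run leaves a ~/.p5p/mktest-perf.<pid>.stor file

and a one-liner to eyeball a collected file (dumps the first matching
file; the pid in the name varies per run):

 $> perl -MStorable -MData::Dumper -e 'print Dumper(Storable::retrieve(shift))' ~/.p5p/mktest-perf.*.stor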