On Thu, 05 Oct 2006 11:32:37 -0700, Carl Worth wrote:
>
> Another tool that will be helpful to have is something for doing
> historical comparison over several runs. The simplest tool, and very
> useful, would be a "performance diff" that takes two runs and reports
> the difference, (perhaps only showing tests where the results differ
> more than a single standard deviation). I would use that kind of tool
> constantly to ensure that submitted patches to provide desired
> performance improvements.

I've written that program now. It's called cairo-perf-diff and it's
built in the cairo/perf directory. The Makefile won't install it or
anything, as I figure it's easy enough for interested people to just
manually copy it to ~/bin or whatever.
The interesting part of making this program work well is in what it
_doesn't_ show. Currently it discards as uninteresting any change
for which the mean values are not separated by more than 3 standard
deviations of each.
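
For anyone who wants to experiment with the rule, here is a rough
Python sketch of that test. It is not the code in cairo-perf-diff; the
function name is made up, and it assumes "separated" means that the
two intervals mean +/- 3 standard deviations do not overlap.

```python
def is_separated(mean_a, stddev_a, mean_b, stddev_b, k=3.0):
    """Return True when two results look like a real change.

    Assumption: a change is interesting only if the intervals
    [mean - k*stddev, mean + k*stddev] of the two runs do not overlap.
    All four inputs must be in the same unit (for example ticks).
    """
    return abs(mean_a - mean_b) > k * (stddev_a + stddev_b)


# Hypothetical numbers: the means differ by 10 in both cases.
print(is_separated(100.0, 1.0, 110.0, 1.0))  # intervals 97..103 and 107..113
print(is_separated(100.0, 2.0, 110.0, 2.0))  # intervals 94..106 and 104..116
```

With this reading, the first pair would be reported and the second
would be discarded as noise.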
Ideally, that's the only kind of discarding we would do, but it's not
quite working well enough yet. So, currently I'm also discarding any
changes below a given threshold, (5% by default, but can also be
specified as the third argument on the command line).
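
A sketch of that second filter, again in Python and again only an
illustration. It assumes the change is measured as the ratio of the
slower mean to the faster one, minus one, which is what the "1.05x"
style of the output suggests; the function name is hypothetical.

```python
def exceeds_threshold(old_mean, new_mean, min_change=0.05):
    """Return True when the relative change is at least min_change.

    Assumption: the change is (slower / faster) - 1, so a result that
    is 1.10x faster or 1.10x slower counts as a 10% change either way.
    """
    if old_mean <= 0 or new_mean <= 0:
        raise ValueError("means must be positive")
    ratio = max(old_mean, new_mean) / min(old_mean, new_mean)
    return ratio - 1.0 >= min_change


print(exceeds_threshold(100.0, 110.0))       # 10% change, kept by default
print(exceeds_threshold(100.0, 102.0))       # 2% change, dropped by default
print(exceeds_threshold(100.0, 100.5, 0.0))  # threshold 0.0 keeps everything
```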
Even then, it's still not discarding all the noise. There's a really
easy test for this. Just run cairo-perf twice (saving the output from
each run into first.perf and second.perf) and then run:
cairo-perf-diff first.perf second.perf 0.0
(That third argument of 0.0 forces it to discard only on the basis of
overlapping probability distributions, using the 3 standard deviations,
and not to discard things because the percentage change is too small.)
If everything were working correctly, the output from the above would
be empty, since there should be no interesting changes in the
performance results, (and any variation should be captured by the
reported standard deviations). But the results aren't empty yet.
I did some things to attempt to improve this already. For example, I've
made cairo-perf output the number of ticks it measures in addition to
the time in milliseconds it estimates, (based on an estimate of the CPU
frequency that it measures). So cairo-perf-diff computes only on the
ticks columns, (but puts the time in its output for readability).
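
To see why comparing ticks is more stable than comparing milliseconds,
consider this small Python sketch. The conversion function and the
numbers are made up for illustration; the point is only that the same
tick count turns into different times when the frequency estimate
differs between runs.

```python
def ticks_to_ms(ticks, ticks_per_ms):
    """Convert a tick count to milliseconds using an estimated tick rate."""
    return ticks / ticks_per_ms


ticks = 3_000_000                 # identical measurement in both runs
run_1 = ticks_to_ms(ticks, 3000.0)  # first run's frequency estimate
run_2 = ticks_to_ms(ticks, 2985.0)  # second run estimates slightly lower

# The tick counts are equal, but the derived times are not.
print(run_1, run_2)
```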
I think other problems are the fixed-percentage outlier elimination
and early bailout based on a stably low standard deviation. I think
these prevent the standard deviation from capturing the true amount of
variation. I started some work to eliminate the early bailout and to
do adaptive outlier detection, (based on the conventional "1.5 times
the interquartile range above the third quartile or below the first
quartile" http://mathworld.wolfram.com/Outlier.html ).
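
The interquartile-range rule itself is easy to sketch. The following
Python is only an illustration of the conventional rule, not the work
in progress mentioned above; the function name and the sample values
are invented, and the choice of quartile method is an assumption.

```python
import statistics


def remove_outliers(samples):
    """Drop samples more than 1.5 * IQR outside the first/third quartile."""
    q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return [s for s in samples if low <= s <= high]


# Hypothetical timings with one obviously bad sample.
timings = [10, 11, 12, 11, 10, 12, 11, 50]
print(remove_outliers(timings))
```

Unlike a fixed-percentage cut, this removes nothing at all when the
samples are tightly clustered.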
I haven't succeeded at making great improvements along those lines,
(particularly since removing the early bailout
slows things down a lot). And I really need to start using this tool
to land cairo patches rather than develop it. So if anyone else wants
to improve things to try to get the command above to report nothing,
then that would be greatly appreciated.
In the meantime, here's a sample showing what the output can look
like. Here's what cairo-perf-diff gives me when I give it the results
of cairo-perf before and after the patch that Monty provided for
fixing the subimage_copy performance bug in cairo:
-Carl
Speedups
========
xlib-rgba subimage_copy-512 3.93 2.46% -> 0.07 2.71%: 52.91x
faster
███████████████████████████████████████████████████▉
xlib-rgb subimage_copy-512 4.03 1.97% -> 0.09 2.61%: 44.74x
faster
███████████████████████████████████████████▊
xlib-rgba subimage_copy-256 1.02 2.25% -> 0.07 0.56%: 14.42x
faster
█████████████▍
xlib-rgba text_image_rgb_over-256 63.21 1.53% -> 11.87 2.17%: 5.33x
faster
████▍
xlib-rgba text_image_rgba_over-256 62.31 0.72% -> 11.87 2.82%: 5.25x
faster
████▎
xlib-rgba text_image_rgba_source-256 67.97 0.85% -> 16.48 2.23%: 4.13x
faster
███▏
xlib-rgba text_image_rgb_source-256 68.82 0.55% -> 16.93 2.10%: 4.07x
faster
███▏
xlib-rgba subimage_copy-128 0.19 1.72% -> 0.06 0.85%: 3.10x
faster
██▏
xlib-rgb text_image_rgb_over-256 108.22 0.40% -> 57.47 0.37%: 1.88x
faster
▉
xlib-rgb text_image_rgba_over-256 107.32 0.59% -> 57.32 0.78%: 1.87x
faster
▉
xlib-rgb text_image_rgb_source-256 114.92 0.44% -> 61.73 0.79%: 1.86x
faster
▉
xlib-rgb text_image_rgba_source-256 114.01 0.51% -> 61.69 0.51%: 1.85x
faster
▉
xlib-rgba subimage_copy-64 0.11 2.24% -> 0.06 0.73%: 1.83x
faster
▉
xlib-rgb subimage_copy-256 2.81 1.57% -> 1.65 1.19%: 1.71x
faster
▊
xlib-rgba text_image_rgb_over-128 4.78 2.22% -> 2.85 1.06%: 1.68x
faster
▋
xlib-rgba text_image_rgba_over-128 4.72 1.38% -> 2.83 0.92%: 1.67x
faster
▋
xlib-rgba text_image_rgb_source-128 5.82 0.22% -> 3.92 0.57%: 1.48x
faster
▌
xlib-rgba text_image_rgba_source-128 5.79 0.25% -> 3.93 1.56%: 1.47x
faster
▌
xlib-rgba text_image_rgba_over-64 1.53 1.03% -> 1.13 0.42%: 1.35x
faster
▍
xlib-rgba text_image_rgb_over-64 1.52 0.45% -> 1.13 1.15%: 1.34x
faster
▍
xlib-rgb subimage_copy-64 0.25 1.04% -> 0.19 2.61%: 1.34x
faster
▍
xlib-rgb subimage_copy-128 0.64 1.65% -> 0.50 1.09%: 1.27x
faster
▎
xlib-rgba fill_radial_rgba_over-256 9.75 0.95% -> 7.81 2.55%: 1.25x
faster
▎
xlib-rgba fill_image_rgb_over-256 2.56 0.77% -> 2.07 1.49%: 1.24x
faster
▎
xlib-rgba fill_image_rgba_over-256 2.55 0.41% -> 2.06 1.01%: 1.23x
faster
▎
xlib-rgba text_image_rgb_source-64 2.27 0.91% -> 1.88 0.20%: 1.21x
faster
▎
xlib-rgba fill_radial_rgb_over-256 9.68 0.60% -> 8.17 0.51%: 1.18x
faster
▏
xlib-rgba fill_image_rgba_source-256 3.95 2.11% -> 3.35 1.51%: 1.18x
faster
▏
xlib-rgba subimage_copy-32 0.07 1.57% -> 0.06 0.91%: 1.17x
faster
▏
xlib-rgba text_image_rgba_source-64 2.25 0.28% -> 1.92 1.57%: 1.17x
faster
▏
xlib-rgba fill_image_rgb_source-256 3.85 0.39% -> 3.32 1.20%: 1.16x
faster
▏
xlib-rgb text_image_rgb_over-64 4.60 2.34% -> 4.06 0.51%: 1.13x
faster
▏
xlib-rgb text_image_rgb_over-128 16.05 1.57% -> 14.24 1.86%: 1.13x
faster
▏
xlib-rgb text_image_rgb_source-128 17.20 2.02% -> 15.32 1.76%: 1.12x
faster
▏
xlib-rgb text_image_rgba_over-64 4.54 0.71% -> 4.11 1.08%: 1.10x
faster
▏
xlib-rgb text_image_rgb_source-64 5.03 0.35% -> 4.59 0.16%: 1.10x
faster
▏
xlib-rgba fill_image_rgba_over-64 0.36 1.78% -> 0.33 0.61%: 1.09x
faster
▏
xlib-rgb text_image_rgba_source-64 4.99 0.20% -> 4.61 0.49%: 1.08x
faster
▏
xlib-rgb subimage_copy-32 0.11 1.24% -> 0.10 1.13%: 1.07x
faster
▏
xlib-rgba fill_radial_rgb_source-128 2.54 0.44% -> 2.38 0.31%: 1.07x
faster
▏
xlib-rgba fill_image_rgba_source-64 0.48 0.65% -> 0.45 0.58%: 1.07x
faster
▏
xlib-rgba fill_radial_rgb_over-128 2.19 0.33% -> 2.06 1.00%: 1.06x
faster
▏
xlib-rgba fill_image_rgb_source-64 0.48 0.60% -> 0.45 0.72%: 1.06x
faster
Slowdowns
=========
xlib-rgba paint_similar_rgba_source-256 0.12 2.52% -> 0.16 2.81%: 1.33x
slower
▍
image-rgba paint_image_rgba_source-256 0.08 0.39% -> 0.10 2.45%: 1.25x
slower
▎
image-rgba paint_similar_rgba_source-256 0.09 0.38% -> 0.10 2.35%: 1.20x
slower
▎
image-rgb paint_solid_rgb_over-512 0.64 1.12% -> 0.74 1.57%: 1.17x
slower
▏
image-rgb paint_solid_rgba_source-512 0.64 1.21% -> 0.74 0.44%: 1.17x
slower
▏
image-rgb paint_solid_rgb_source-512 0.64 0.93% -> 0.74 0.59%: 1.16x
slower
▏
image-rgb paint_radial_rgb_source-512 53.05 2.18% -> 60.76 2.07%: 1.15x
slower
▏
xlib-rgba text_radial_rgb_over-64 3.95 0.57% -> 4.48 1.09%: 1.14x
slower
▏
image-rgba paint_solid_rgba_source-512 0.66 1.65% -> 0.73 1.10%: 1.12x
slower
▏
image-rgba paint_solid_rgb_source-512 0.66 1.90% -> 0.73 0.74%: 1.11x
slower
▏
image-rgb paint_similar_rgba_source-256 0.26 1.09% -> 0.29 0.98%: 1.11x
slower
▏
image-rgb fill_radial_rgba_over-256 5.57 0.30% -> 6.11 0.24%: 1.10x
slower
▏
image-rgb paint_radial_rgba_over-512 55.79 1.42% -> 60.80 0.68%: 1.09x
slower
▏
image-rgb fill_radial_rgba_source-128 1.64 0.20% -> 1.78 0.15%: 1.09x
slower
▏
image-rgb fill_radial_rgba_source-256 6.02 0.49% -> 6.55 0.26%: 1.09x
slower
▏
image-rgb fill_radial_rgba_over-128 1.54 1.08% -> 1.66 0.15%: 1.07x
slower
▏
image-rgb fill_radial_rgba_source-64 0.56 0.47% -> 0.60 0.46%: 1.07x
slower
▏
image-rgb paint_image_rgb_source-256 0.08 0.39% -> 0.09 0.78%: 1.06x
slower
image-rgb fill_radial_rgba_over-64 0.53 0.14% -> 0.56 0.47%: 1.06x
slower
xlib-rgba fill_radial_rgb_source-64 0.83 0.38% -> 0.88 0.46%: 1.05x
slower
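
The bars in the output above appear to be one character cell per 1.0x
of change beyond 1.0, drawn to the nearest eighth of a cell with
Unicode block characters. That is inferred from the sample output, not
taken from the cairo-perf-diff source, so treat this Python sketch
(with a made-up function name) as a guess at the scheme.

```python
FULL_BLOCK = "█"
# Partial blocks indexed by the number of eighths (0 through 7).
PARTIAL_BLOCKS = ["", "▏", "▎", "▍", "▌", "▋", "▊", "▉"]


def change_bar(factor):
    """Render a speedup/slowdown factor (>= 1.0) as a bar of blocks.

    Assumption: the bar length is (factor - 1.0) cells, rounded to the
    nearest eighth of a cell.
    """
    eighths = max(0, round((factor - 1.0) * 8))
    full, rest = divmod(eighths, 8)
    return FULL_BLOCK * full + PARTIAL_BLOCKS[rest]


for factor in (5.25, 1.88, 1.25, 1.05):
    print(f"{factor:5.2f}x {change_bar(factor)}")
```

A change of 1.05x rounds to zero eighths, which would explain why the
smallest entries in the listing have no bar at all.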
