The ones I remember are SMark [1] and CalipeL [2].

Cheers,
[1] http://www.smalltalkhub.com/#!/~StefanMarr/SMark
[2] https://bitbucket.org/janvrany/jv-calipel

On Wed, Jul 19, 2017 at 4:17 AM, Luke Gorrie <[email protected]> wrote:

> Hi Evan,
>
> I am also really interested in this topic and have been doing a bunch of
> work on automating statistical benchmarks. I don't have a background in
> statistics or formal QA, but I am learning as I go along :).
>
> The tools I'm building are outside Smalltalk. Our full performance test
> suite takes about a week of machine time to run because it tests ~15,000
> QEMU VMs with different software versions / configurations / workloads.
> There is a CI server that runs all those tests, getting pretty fast
> turnarounds by distributing across a cluster of servers and reusing
> results from unmodified software branches, and it spits out a CSV with
> one row per test result (giving the benchmark score and the parameters
> of the test).
>
> Then what to do with that ~15,000-line CSV file? Just now I run Rmarkdown
> to make a report on the distribution of results and then manually inspect
> that to check for interesting differences. I lump all of the different
> configurations together and treat them as one population at the moment.
> Here is an example report:
> https://hydra.snabb.co/build/1604171/download/2/report.html
>
> It's a bit primitive, but it is getting the job done for release
> engineering. I'm reasonably confident that new software releases don't
> break or slow down in obscure configurations. We are building network
> equipment, and performance regressions are generally not acceptable.
>
> I'm looking into more clever ways to automatically interpret the results,
> e.g. fumbling around at
> https://stats.stackexchange.com/questions/288416/non-parametric-test-if-two-samples-are-drawn-from-the-same-distribution
>
> Could this relate to your ambitions somehow?
>
>
> On 19 July 2017 at 02:00, Evan Donahue <[email protected]> wrote:
>
>> Hi,
>>
>> I've been doing a lot of performance testing lately, and I've found
>> myself wanting to upgrade my methods from ad hoc use of bench and
>> message tally. Is there any kind of framework for, like, statistically
>> comparing improvements in performance benchmarks across different
>> versions of code, or anything that generally helps manage the
>> test-tweak-test loop? Just curious what's out there before I go writing
>> something. Too many useful little libraries to keep track of!
>>
>> Evan
>>

--
Mariano
http://marianopeck.wordpress.com
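
P.S. For the kind of automated comparison both mails are circling around
(did the score distribution for a benchmark change between two software
versions?), a rough, untested sketch in Python with pandas and scipy might
look like the code below. The column names "benchmark", "version" and
"score" are my guesses at the CSV layout, not the real one, and the
Mann-Whitney U test is just one of the non-parametric options discussed at
the StackExchange link above.

# Rough, untested sketch: flag benchmarks whose score distribution changed
# between two software versions, using a non-parametric Mann-Whitney U test.
# Column names ("benchmark", "version", "score") are assumptions about the
# CSV layout, not the actual one.
import pandas as pd
from scipy.stats import mannwhitneyu

def compare_versions(csv_path, baseline, candidate, alpha=0.01):
    df = pd.read_csv(csv_path)
    for bench, group in df.groupby("benchmark"):
        old = group.loc[group["version"] == baseline, "score"]
        new = group.loc[group["version"] == candidate, "score"]
        if len(old) < 2 or len(new) < 2:
            continue  # not enough samples to compare
        stat, p = mannwhitneyu(old, new, alternative="two-sided")
        if p < alpha:
            # assuming a higher score is better (e.g. throughput)
            direction = "regressed" if new.median() < old.median() else "improved"
            print(f"{bench}: {direction}  p={p:.4g}  "
                  f"median {old.median():.2f} -> {new.median():.2f}")

# placeholder file name and version labels, just to show the call shape
compare_versions("results.csv", baseline="v2017.06", candidate="v2017.07")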
