The ones I remember are SMark [1] and CalipeL [2].

Cheers,

[1] http://www.smalltalkhub.com/#!/~StefanMarr/SMark
[2] https://bitbucket.org/janvrany/jv-calipel

On Wed, Jul 19, 2017 at 4:17 AM, Luke Gorrie <[email protected]> wrote:

> Hi Evan,
>
> I am also really interested in this topic and have been doing a bunch of
> work on automating statistical benchmarks. I don't have a background in
> statistics or formal QA, but I am learning as I go along :).
>
> The tools I'm building are outside Smalltalk. Our full performance test
> suite takes about a week of machine time to run because it tests ~15,000
> QEMU VMs with different software versions / configurations / workloads.
> There is a CI server that runs all those tests, getting pretty fast
> turnarounds by distributing them across a cluster of servers and reusing
> results from unmodified software branches, and it spits out a CSV with one
> row per test result (giving the benchmark score and the parameters of the test).
>
> Then what to do with that ~15,000-line CSV file? Right now I run Rmarkdown
> to make a report on the distribution of results and then manually inspect
> it to check for interesting differences. At the moment I lump all of the
> different configurations together and treat them as one population.
> Here is an example report:
> https://hydra.snabb.co/build/1604171/download/2/report.html
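>
> To give a feel for the kind of summary that report contains, here is a
> minimal sketch in Python/pandas (untested; the file name and the
> "benchmark" / "score" column names are made up for illustration):
>
>     import pandas as pd
>
>     # One row per test result, as spat out by the CI server.
>     results = pd.read_csv("results.csv")
>
>     # Summarise the score distribution per benchmark, lumping all
>     # configurations together as one population.
>     summary = results.groupby("benchmark")["score"].describe(
>         percentiles=[0.05, 0.5, 0.95])
>     print(summary)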
>
> It's a bit primitive but it is getting the job done for release
> engineering. I'm reasonably confident that new software releases don't
> break or slow down in obscure configurations. We are building network
> equipment and performance regressions are generally not acceptable.
>
> I'm looking into more clever ways to automatically interpret the results,
> e.g. fumbling around at
> https://stats.stackexchange.com/questions/288416/non-parametric-test-if-two-samples-are-drawn-from-the-same-distribution
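>
> As a rough sketch of the kind of check I mean, here is a two-sample
> Kolmogorov-Smirnov test plus a Mann-Whitney U test in SciPy (untested; the
> file name, the "version" / "score" columns, and the "old" / "new" labels
> are all made up for illustration):
>
>     import pandas as pd
>     from scipy.stats import ks_2samp, mannwhitneyu
>
>     results = pd.read_csv("results.csv")
>     baseline = results[results["version"] == "old"]["score"]
>     candidate = results[results["version"] == "new"]["score"]
>
>     # Kolmogorov-Smirnov: could both score samples plausibly be drawn
>     # from the same distribution?
>     stat, p = ks_2samp(baseline, candidate)
>     print("KS statistic=%.3f, p-value=%.3f" % (stat, p))
>
>     # Mann-Whitney U as a cross-check for a shift in location.
>     stat, p = mannwhitneyu(baseline, candidate, alternative="two-sided")
>     print("Mann-Whitney U=%.1f, p-value=%.3f" % (stat, p))
>
> A small p-value would just flag that pair of samples for manual inspection,
> not give an automatic verdict.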
>
> Could this relate to your ambitions somehow?
>
>
> On 19 July 2017 at 02:00, Evan Donahue <[email protected]> wrote:
>
>> Hi,
>>
>> I've been doing a lot of performance testing lately, and I've found
>> myself wanting to upgrade my methods from ad hoc use of bench and message
>> tally. Is there any kind of framework for statistically comparing
>> improvements in performance benchmarks across different versions of code,
>> or anything that generally helps manage the test-tweak-test loop? Just
>> curious what's out there before I go writing something. Too many useful
>> little libraries to keep track of!
>>
>> Evan
>>
>
>


-- 
Mariano
http://marianopeck.wordpress.com
