> On 01 Nov 2015, at 23:45, Jan Vrany <[email protected]> wrote:
> 
> Hi Max,
> 
> I looked at some version of SMark years ago and never used
> it extensively, so I might be wrong, but:
> 
> * SMark's executor does some magic with numbers. It tries to
> calculate the number of iterations to run in order to get
> "statistically meaningful results". Maybe it's just me, but
> I could not fully understand what it does and why it does so.
> CalipeL does no magic - it gives you raw numbers (no average,
> no mean - rather a sequence of measurements). It's up to the one
> who processes and interprets the data to use whatever method she
> likes (and whichever gives the numbers she'd like to see :-)
> This transparency was important for our needs.
> 
> * SMark, IIRC, requires benchmarks to inherit from some base class
> (like SUnit). Also, I'm not sure whether SMark allows you to specify
> a warmup phase (handy, for example, to measure peak performance when
> caches are filled and so on).
> CalipeL, OTOH, uses method annotations to describe the benchmark,
> so one can turn a regular SUnit test method into a benchmark simply
> by annotating it with <benchmark>. A warmup method and setup/teardown
> methods can be specified per benchmark.
> 
> * SMark has no support for parametrization.
> In CalipeL, support for benchmark parameters was one of the
> requirements from the very beginning. A little example:
> I had to optimize the performance of the Object>>perform: family of
> methods because they were thought to be slowish. I came up with
> several variants of a "better" implementation, not knowing which one
> was the best. How does each of them behave under different workloads?
> For instance, how does the number of distinct receiver classes affect
> performance? How does the number of distinct selectors affect it?
> Is the performance different when receiver classes are distributed
> uniformly rather than normally (which seems to be the more common
> case)? Same for selectors? Is a 256-row, 2-way associative cache
> better than a 128-row, 4-way associative one?
> You have a number of parameters; for each parameter you define
> a set of values, and CalipeL works out all possible combinations
> and runs the benchmark with each of them. Without parametrization,
> the number of benchmark methods would grow exponentially, making it
> hard to experiment with different setups. For me, this is one of
> the key things.
> 
> * SMark measures time only.
> CalipeL measures time, too, but it also lets you provide a
> user-defined "measurement instrument", which can be anything
> that can be measured. For example, for some web application the
> execution time might not be that useful; perhaps the number of SQL
> queries it makes is more important. No problem: define your own
> measurement instrument and tell CalipeL to use it in addition to
> time, number of GCs, you name it. The results of all instruments
> are part of the machine-readable report, of course.
> 
> * SMark has no support for "system" profilers and similar tools.
> CalipeL integrates with systemtap/dtrace and cachegrind, so one
> can get a full profile, including VM code, and see things like
> L1/L2 I/D cache misses and mispredicted branches, or count events
> like context switches, monitor signaling and context evacuation.
> Useful only for VM engineers, I think, but I cannot imagine doing
> my work without this. It is available only for Smalltalk/X, but it
> should not be a big deal to add this to Pharo (a simple plugin
> would do it, IMO).
> 
> * Finally, SMark spits out a report and that's it.
> CalipeL, OTOH, goes beyond that.
> It tries to provide tools to gather, store and query results in a
> centralised way so nothing is forgotten.
> (No more: hmm, where are the results of the #perform: benchmarks
> I ran three months ago? Is it this file? Or that file? Or did I
> delete them when my laptop ran out of disk space?)
> And yes, I know that in this area there's a lot of room for
> improvement. What we have now is certainly not ideal, to put
> it mildly :-)
> 
> 
> Hope that gives you an idea.
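
Just to check that I understand the annotation and parameter support:
would a benchmark for one of our Fuel tests look roughly like the sketch
below? Only the bare <benchmark> pragma is taken from your description;
the class name, the helper methods and the <setup:> and <parameter:values:>
pragma names are pure guesses on my part.

    FLSerializationBenchmark >> benchSerializeGraph
        "Hypothetical sketch: measure serialization of an object graph
         whose size is driven by a benchmark parameter."
        <benchmark>
        <setup: #setUpGraphOfSize>                          "guessed pragma name"
        <parameter: #graphSize values: #(100 1000 10000)>   "guessed pragma name"

        self serializeCurrentGraph    "placeholder for the actual Fuel call"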
Thanks Jan! That was quite thorough. I’ll have to take a look at CalipeL
sometime. Sure sounds great :)

Cheers,
Max

> 
> Jan
> 
> 
> On Sun, 2015-11-01 at 12:11 +0100, Max Leske wrote:
>> Hi Jan,
>> 
>> That looks pretty cool!
>> We use SMark (http://smalltalkhub.com/#!/~PharoExtras/SMark) for
>> benchmarking and CI integration for Fuel. If you know SMark, could
>> you give me an idea of what the differences are?
>> 
>> Cheers,
>> Max
>> 
>> 
>>> On 23 Oct 2015, at 10:47, Jan Vrany <[email protected]> wrote:
>>> 
>>> Hi there,
>>> 
>>> After more than 2 years of on-and-off development and about as
>>> much time of use, I'd like to announce CalipeL, a tool for
>>> benchmarking and monitoring performance regressions.
>>> 
>>> The basic ideas that drove the development:
>>> 
>>> * Benchmarking and (especially) interpreting benchmark results
>>> is always monkey business. The tool should produce raw numbers,
>>> letting the user use whichever statistics she needs to make up
>>> the (desired) results.
>>> * Benchmark results should be kept and managed in a single place,
>>> so one can view and retrieve all past benchmark results pretty
>>> much the same way one can view and retrieve past versions of
>>> the software from a source code management tool.
>>> 
>>> Features:
>>> 
>>> - simple - creating a benchmark is as simple as writing a method
>>> in a class
>>> - flexible - special set-up and/or warm-up routines can be
>>> specified at the benchmark level, as well as a set of parameters
>>> to allow fine-grained measurements under different conditions
>>> - batch runner - contains a batch runner that allows one to run
>>> benchmarks from the command line or on CI servers such as Jenkins
>>> - web - comes with a simple web interface to gather and process
>>> benchmark results. However, the web application could use some
>>> more work.
>>> 
>>> Repository:
>>> 
>>> https://bitbucket.org/janvrany/jv-calipel
>>> 
>>> http://smalltalkhub.com/#!/~JanVrany/CalipeL-S (read-only export
>>> from the above and Pharo-specific code)
>>> 
>>> More information:
>>> 
>>> https://bitbucket.org/janvrany/jv-calipel/wiki/Home
>>> 
>>> I have been using CalipeL for benchmarking and keeping track of
>>> the performance of the Smalltalk/X VM, STX:LIBJAVA, a PetitParser
>>> compiler and other code I have worked on over time.
>>> 
>>> Finally, I'd like to thank Marcel Hlopko for his work on the
>>> web application and Jan Kurs for his comments.
>>> 
>>> I hope some of you may find it useful. If you have any comments
>>> or questions, do not hesitate to let me know!
>>> 
>>> Regards, Jan
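
P.S. One thing I really like is getting the raw sequence of measurements
rather than a pre-digested average. Post-processing needs nothing
CalipeL-specific; for example, taking a robust median over whatever
timings you exported is plain Smalltalk (the numbers below are made up):

    | timings sorted median |
    timings := #(121 118 124 119 350 120 122).   "raw runs, e.g. in microseconds"
    sorted := timings asSortedCollection asArray.
    median := sorted size odd
        ifTrue: [ sorted at: sorted size + 1 // 2 ]
        ifFalse: [ ((sorted at: sorted size // 2) + (sorted at: sorted size // 2 + 1)) / 2 ].
    Transcript show: 'median: ', median printString; cr
    "the 350 outlier barely moves the median, but it would skew a plain average"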
