> On 01 Nov 2015, at 23:45, Jan Vrany <[email protected]> wrote:
> 
> Hi Max,
> 
> I looked at some version of SMark years ago and never used
> it extensively, so I might be wrong, but:
> 
> * SMark's executor does some magic with numbers. It tries to
> calculate the number of iterations to run in order to get
> "statistically meaningful results". Maybe it's just me, but
> I could not fully understand what it does and why it does so.
> CalipeL does no magic - it gives you raw numbers (no average,
> no mean - rather a sequence of measurements). It's up to the one
> who processes and interprets the data to use whatever method she
> likes (and whichever gives the numbers she'd like to see :-)
> This transparency was important for our needs.
> 
> * SMark, IIRC, requires benchmarks to inherit from some base class
> (like SUnit). Also, I'm not sure whether SMark allows you to specify
> a warmup phase (handy, for example, to measure peak performance when
> caches are filled and so on).
> CalipeL, OTOH, uses method annotations to describe the benchmark,
> so one can turn a regular SUnit test method into a benchmark simply
> by annotating it with <benchmark>. A warmup method and setup/teardown
> methods can be specified per benchmark.
> 
> * SMark has no support for parametrization.
> In CalipeL, support for benchmark parameters was one of the
> requirements from the very beginning. A little example:
> I had to optimize the performance of the Object>>perform: family of
> methods because they were thought to be slowish. I came up with
> several variants of a "better" implementation, not knowing which one
> was the best. How does each of them behave under different workloads?
> For instance, how does the number of distinct receiver classes affect
> performance? How does the number of distinct selectors affect it?
> Is the performance different when receiver classes are distributed
> uniformly rather than normally (which seems to be the more common
> case)? Same for selectors? Is a 256-row, 2-way associative cache
> better than a 128-row, 4-way associative one?
> You have a number of parameters; for each parameter you define
> a set of values, and CalipeL works out all possible combinations
> and runs the benchmark with each of them. Without parametrization,
> the number of benchmark methods would grow exponentially, making it
> hard to experiment with different setups. For me, this is one of
> the key things.
> 
> * SMark measures time only.
> CalipeL measures time, too, but it also lets you provide a
> user-defined "measurement instrument", which can be anything
> that can be measured. For example, for some web application the
> execution time might not be that useful; perhaps the number of SQL
> queries it makes is more important. No problem: define your own
> measurement instrument and tell CalipeL to use it in addition to
> time, number of GCs, you name it. The results of all instruments
> are part of the machine-readable report, of course.
> 
> * SMark has no support for "system" profilers and similar tools.
> CalipeL integrates with systemtap/dtrace and cachegrind, so one
> can get a full profile, including VM code, and see things like
> L1/L2 I/D cache misses and mispredicted branches, or count events
> like context switches, monitor signaling and context evacuation.
> Useful only for VM engineers, I think, but I cannot imagine doing
> my work without this. It is available only for Smalltalk/X, but it
> should not be a big deal to add this to Pharo (a simple plugin
> would do it, IMO).
> 
> * Finally, SMark spits out a report and that's it.
> CalipeL, OTOH, goes beyond that.
> It tries to provide tools to gather, store and query results in a
> centralised way so nothing is forgotten.
> (No more: hmm, where are the results of the #perform: benchmarks
> I ran three months ago? Is it this file? Or that file? Or did I
> delete them when my laptop ran out of disk space?)
> And yes, I know that in this area there's a lot of room for
> improvement. What we have now is certainly not ideal, to put
> it mildly :-)
> 
> 
> Hope that gives you an idea.
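
Just to check that I understand the annotation and parameter support:
would a benchmark for one of our Fuel tests look roughly like the sketch
below? Only the bare <benchmark> pragma is taken from your description;
the class name, the helper methods and the <setup:> and <parameter:values:>
pragma names are pure guesses on my part.

    FLSerializationBenchmark >> benchSerializeGraph
        "Hypothetical sketch: measure serialization of an object graph
         whose size is driven by a benchmark parameter."
        <benchmark>
        <setup: #setUpGraphOfSize>                          "guessed pragma name"
        <parameter: #graphSize values: #(100 1000 10000)>   "guessed pragma name"

        self serializeCurrentGraph    "placeholder for the actual Fuel call"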
Thanks Jan! That was quite thorough. I’ll have to take a look at CalipeL
sometime. Sure sounds great :)

Cheers,
Max

> 
> Jan
> 
> 
> On Sun, 2015-11-01 at 12:11 +0100, Max Leske wrote:
>> Hi Jan,
>> 
>> That looks pretty cool!
>> We use SMark (http://smalltalkhub.com/#!/~PharoExtras/SMark) for
>> benchmarking and CI integration for Fuel. If you know SMark, could
>> you give me an idea of what the differences are?
>> 
>> Cheers,
>> Max
>> 
>> 
>>> On 23 Oct 2015, at 10:47, Jan Vrany <[email protected]> wrote:
>>> 
>>> Hi there,
>>> 
>>> After more than 2 years of on-and-off development and about as
>>> much time of use, I'd like to announce CalipeL, a tool for
>>> benchmarking and monitoring performance regressions.
>>> 
>>> The basic ideas that drove the development:
>>> 
>>> * Benchmarking and (especially) interpreting benchmark results
>>> is always monkey business. The tool should produce raw numbers,
>>> letting the user use whichever statistics she needs to make up
>>> the (desired) results.
>>> * Benchmark results should be kept and managed in a single place,
>>> so one can view and retrieve all past benchmark results pretty
>>> much the same way one can view and retrieve past versions of
>>> the software from a source code management tool.
>>> 
>>> Features:
>>> 
>>> - simple - creating a benchmark is as simple as writing a method
>>> in a class
>>> - flexible - special set-up and/or warm-up routines can be
>>> specified at the benchmark level, as well as a set of parameters
>>> to allow fine-grained measurements under different conditions
>>> - batch runner - contains a batch runner that allows one to run
>>> benchmarks from the command line or on CI servers such as Jenkins
>>> - web - comes with a simple web interface to gather and process
>>> benchmark results. However, the web application could use some
>>> more work.
>>> 
>>> Repository:
>>> 
>>> https://bitbucket.org/janvrany/jv-calipel
>>> 
>>> http://smalltalkhub.com/#!/~JanVrany/CalipeL-S (read-only export
>>> from the above and Pharo-specific code)
>>> 
>>> More information:
>>> 
>>> https://bitbucket.org/janvrany/jv-calipel/wiki/Home
>>> 
>>> I have been using CalipeL for benchmarking and keeping track of
>>> the performance of the Smalltalk/X VM, STX:LIBJAVA, a PetitParser
>>> compiler and other code I have worked on over time.
>>> 
>>> Finally, I'd like to thank Marcel Hlopko for his work on the
>>> web application and Jan Kurs for his comments.
>>> 
>>> I hope some of you may find it useful. If you have any comments
>>> or questions, do not hesitate to let me know!
>>> 
>>> Regards, Jan
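
P.S. One thing I really like is getting the raw sequence of measurements
rather than a pre-digested average. Post-processing needs nothing
CalipeL-specific; for example, taking a robust median over whatever
timings you exported is plain Smalltalk (the numbers below are made up):

    | timings sorted median |
    timings := #(121 118 124 119 350 120 122).   "raw runs, e.g. in microseconds"
    sorted := timings asSortedCollection asArray.
    median := sorted size odd
        ifTrue: [ sorted at: sorted size + 1 // 2 ]
        ifFalse: [ ((sorted at: sorted size // 2) + (sorted at: sorted size // 2 + 1)) / 2 ].
    Transcript show: 'median: ', median printString; cr
    "the 350 outlier barely moves the median, but it would skew a plain average"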
