Hi Max,

I looked at some version of SMark years ago and never used 
it extensively, so I might be wrong, but: 

* The SMark executor does some magic with numbers. It tries to
  calculate the number of iterations to run in order to get
  "statistically meaningful results". Maybe it's just me, but
  I could never fully understand what it does and why it does
  it that way.
  CalipeL does no magic - it gives you raw numbers (no averages,
  no means, just a sequence of measurements). It is up to whoever
  processes and interprets the data to use whatever method she
  likes (and whichever gives the numbers she'd like to see :-)
  This transparency was important for our needs.
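  For illustration, once CalipeL hands you the raw sequence, picking
  your own statistic is a couple of lines of plain Smalltalk (nothing
  CalipeL-specific in this sketch; the numbers are made up):

      | timings sorted |
      timings := #(12.1 11.8 12.4 11.9 35.0 12.0).  "raw measurements, e.g. milliseconds"
      sorted := timings asSortedCollection asArray.
      sorted at: (sorted size + 1) // 2   "median: 12.0 - robust against the 35.0 outlier"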

* SMark, IIRC, requires benchmarks to inherit from some base class
  (as SUnit does). Also, I am not sure whether SMark lets you specify
  a warmup phase (handy, for example, to measure peak performance
  once caches are filled and so on).
  CalipeL, OTOH, uses method annotations to describe the benchmark,
  so one can turn a regular SUnit test method into a benchmark simply
  by annotating it with <benchmark>. A warmup method and setup/teardown
  methods can be specified per benchmark.
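  For example, something along these lines (the class and selector
  names are made up; only the <benchmark> pragma itself comes from
  the description above - the warmup/setup pragma spellings are
  documented in the wiki):

      MyCollectionTest >> testDictionaryAtPut
          "A regular SUnit test method; the pragma alone makes
           CalipeL pick it up as a benchmark as well."
          <benchmark>
          | d |
          d := Dictionary new.
          1 to: 10000 do: [:i | d at: i put: i * i].
          self assert: d size = 10000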

* SMark has no support for parametrization.
  In CalipeL, support for benchmark parameters was one of the
  requirements from the very beginning. A little example:
  I had to optimize the performance of the Object>>perform: family
  of methods, as they were thought to be slowish. I came up with
  several variants of a "better" implementation, not knowing which
  one was best. How does each of them behave under different
  workloads? Like - how does the number of distinct receiver classes
  affect the performance? How does the number of distinct selectors
  affect it? Is the performance different when receiver classes are
  distributed uniformly or normally (which seems to be the more
  common case)? Same for selectors? Is a 256-row, 2-way associative
  cache better than a 128-row, 4-way associative one?
  You have a number of parameters; for each parameter you define
  a set of values and CalipeL works out all possible combinations
  and runs the benchmark with each. Without parametrization, the
  number of benchmark methods would grow exponentially, making it
  hard to experiment with different setups. For me, this is one of
  the key things.
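  To make the idea concrete, a parametrized benchmark might look
  roughly like the sketch below. The <parameter:values:> spelling is
  my guess at the shape of the annotation, not CalipeL's actual
  syntax (the wiki has the real one), and the body is simplified:

      PerformBenchmarks >> benchmarkPerform
          "Hypothetical sketch: CalipeL runs this once per combination
           of the declared values (2 x 3 = 6 runs here); 'receivers'
           would be filled by a per-benchmark setup method according
           to the current parameter values."
          <benchmark>
          <parameter: #numberOfClasses values: #(16 256)>
          <parameter: #numberOfSelectors values: #(1 16 256)>
          receivers do: [:each | each perform: #yourself]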

* SMark measures time only.
  CalipeL measures time, too, but it also lets you provide a
  user-defined "measurement instrument", which can measure anything
  that can be measured. For example, for some web application the
  execution time might not be that useful; perhaps the number of SQL
  queries it makes is more important. No problem: define your own
  measurement instrument and tell CalipeL to use it in addition to
  time, number of GCs, you name it. All results of all instruments
  are part of the machine-readable report, of course.
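  Purely to illustrate the idea (the class name, the hypothetical
  MyConnectionPool counter and the start/stop protocol below are NOT
  CalipeL's actual instrument interface - see the wiki for that):
  such an instrument boils down to a number you read before and
  after a run.

      Object subclass: #SQLQueryCountInstrument
          instanceVariableNames: 'queriesBefore'
          classVariableNames: ''
          poolDictionaries: ''
          category: 'MyBenchmarks'

      SQLQueryCountInstrument >> start
          "remember the counter before the benchmarked block runs"
          queriesBefore := MyConnectionPool totalQueryCount

      SQLQueryCountInstrument >> stop
          "answer the number of queries issued during the run"
          ^ MyConnectionPool totalQueryCount - queriesBefore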

* SMark has no support for "system" profilers and the like.
  CalipeL integrates with systemtap/dtrace and cachegrind, so one
  can get a full profile, including VM code, and see things like
  L1/L2 I/D cache misses and mispredicted branches, or count events
  like context switches, monitor signaling and context evacuation.
  Useful only for VM engineers, I think, but I cannot imagine doing
  my work without this. It is available only for Smalltalk/X, but it
  should not be a big deal to add it to Pharo (a simple plugin would
  do it, IMO).

* Finally, SMark spits out a report and that's it.
  CalipeL, OTOH, goes beyond that. It tries to provide tools
  to gather, store and query results in a centralised way so
  nothing is forgotten.
  (No more: hmm, where are the results of the #perform: benchmarks
  I ran three months ago? Are they in this file? Or that one? Or did
  I delete them when my laptop ran out of disk space?)
  And yes, I know that in this area there is a lot of room for
  improvement. What we have now is certainly not ideal, to put
  it mildly :-)


Hope that gives you the idea. 

Jan


On Sun, 2015-11-01 at 12:11 +0100, Max Leske wrote:
> Hi Jan,
> 
> That looks pretty cool!
> We use SMark (http://smalltalkhub.com/#!/~PharoExtras/SMark) for
> benchmarking and CI integration for Fuel. If you know SMark, could
> you give me an idea of what the differences are?
> 
> Cheers,
> Max
> 
> 
> > On 23 Oct 2015, at 10:47, Jan Vrany <[email protected]> wrote:
> > 
> > Hi there,
> > 
> > After more than 2 years of (on-and-off) development and about
> > as much time of use, I'd like to announce CalipeL, a tool for
> > benchmarking and monitoring performance regressions.
> > 
> > The basic ideas that drove the development:
> > 
> > * Benchmarking and (especially) interpreting benchmark results
> >   is always monkey business. The tool should produce raw numbers,
> >   letting the user apply whichever statistics she needs to make up
> >   the (desired) results.
> > * Benchmark results should be kept and managed at a single place so
> >   one can view and retrieve all past benchmark results pretty much 
> >   the same way as one can view and retrieve past versions of 
> >   the software from a source code management tool.
> > 
> > Features:
> > 
> > - simple - creating a benchmark is as simple as writing a method 
> >   in a class
> > - flexible - special set-up and/or warm-up routines can be
> >   specified at benchmark level, as well as a set of parameters
> >   to allow fine-grained measurements under different conditions
> > - batch runner - contains a batch runner allowing one to run
> >   benchmarks from the command line or on CI servers such as Jenkins.
> > - web - comes with a simple web interface to gather and process
> >   benchmark results. However, the web application could use
> >   some more work.
> > 
> > Repository: 
> > 
> >   https://bitbucket.org/janvrany/jv-calipel
> > 
> >   http://smalltalkhub.com/#!/~JanVrany/CalipeL-S (read-only export
> >   from the above and Pharo-specific code)
> > 
> > More information: 
> > 
> >   https://bitbucket.org/janvrany/jv-calipel/wiki/Home
> > 
> > I have been using CalipeL for benchmarking and keeping track of
> > the performance of the Smalltalk/X VM, STX:LIBJAVA, a PetitParser
> > compiler and other code I have been working on over time.
> > 
> > Finally, I'd like to thank Marcel Hlopko for his work on the 
> > web application and Jan Kurs for his comments.
> > 
> > I hope some of you may find it useful. If you have any comments 
> > or questions, do not hesitate and let me know!
> > 
> > Regards, Jan
> > 
> 
> 
> 
