Re: [Pharo-project] Cog VM -- Thanks and Performance / Optimization Questions

Schwab,Wilhelm K Thu, 17 Feb 2011 07:06:25 -0800

A nice mix is to do memory management and logic (e.g. deciding when to stop
iterating) in Smalltalk, and to have C-callable "primitives" for the heavy
loops.  A great way to get the latter is to declare the functions extern
"C"; then you can use C++ features (streams, templates) in the function
bodies.  IMHO, C++ with some suitable operator overloading does a fairly nice
job of formula translation, and it is a good fit for fixed-size arithmetic.

If Cog can make the above optional, so much the better.  
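A minimal sketch of the extern "C" pattern described above. It assumes the Smalltalk side hands over a pointer to a contiguous array of doubles plus its length through whatever FFI mechanism is in use. The names sum_of_squares and accumulate_squares are hypothetical, made up for illustration. The template stays internal to C++, and only the extern "C" wrapper is exposed with an unmangled, C-callable symbol.

```cpp
#include <cstddef>

// Internal C++ helper: templates are fine here because this function is
// never called directly from Smalltalk. (Hypothetical name.)
template <typename T>
T accumulate_squares(const T* data, std::size_t n) {
    T total = T(0);
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i] * data[i];  // the heavy inner loop lives in C++
    }
    return total;
}

// extern "C" gives this entry point C linkage (no name mangling), so an FFI
// can bind to it by its plain name. (Hypothetical name and signature.)
extern "C" double sum_of_squares(const double* data, std::size_t n) {
    return accumulate_squares(data, n);
}
```

Called with {1.0, 2.0, 3.0} and n = 3 it returns 14.0. The Smalltalk side keeps ownership of the buffer and decides when to call again.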

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of John B Thiel 
[[email protected]]
Sent: Thursday, February 17, 2011 9:21 AM
To: [email protected]
Subject: [Pharo-project] Cog VM -- Thanks and Performance / Optimization Questions

Cog VM -- Thanks and Performance / Optimization Questions

To everyone, thanks for your great work on Pharo and Squeak. And to
Eliot Miranda, Ian Piumarta, and all the VM/JIT gurus, special thanks
for the Squeak VM Cog and its precursors. I had been keenly
anticipating it for a decade or so, and it is really hitting its stride
with the latest builds.

I like to code with performance in mind.  Can you give me, or
point me to, some performance and efficiency tips for Cog and the
Squeak compiler: details on which methods are inlined, which is best
among alternatives, etc.?  For example, I understand #to:do: is inlined;
what about #to:by:do:, #timesRepeat:, and #repeat?  Basically, I
would like to read a full overview of which core methods are specially
optimized (or planned to be).
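To make "inlined" concrete, here is a rough C++ analogy (not how the Squeak compiler is written). When the compiler inlines #to:do: with a literal block, the block body is compiled straight into a counted loop. A selector that is not inlined is an ordinary message send whose argument is a real closure, invoked once per iteration. The helper names are hypothetical, and a C++ optimizer may still manage to inline the std::function call, so this shows the shape of the code rather than guaranteeing a timing difference.

```cpp
#include <functional>

// Like a real message send of to:do: with a block argument: the body is a
// closure object, and every iteration pays for an indirect call.
void to_do_sent(long from, long to, const std::function<void(long)>& body) {
    for (long i = from; i <= to; ++i) body(i);
}

// Sum 1..n by going through the closure-based helper.
long sum_sent(long n) {
    long total = 0;
    to_do_sent(1, n, [&total](long i) { total += i; });
    return total;
}

// What an inlining compiler produces instead: the block body is spliced
// into a plain counted loop, with no closure object and no call.
long sum_inlined(long n) {
    long total = 0;
    for (long i = 1; i <= n; ++i) total += i;
    return total;
}
```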

I know about the list of NoLookup primitives in Object
class>>howToModifyPrimitives; is that still valid?

What do you think is a reasonable speed factor for number-crunching
Squeak code vs. C?  I am seeing it about 20x slower at the semi-large
scale, which surprised me a bit, because I got about 10x on smaller
tests, and a simple fib: with the beautiful Cog is now only about 3x
slower (wow!).  That range, from 3x for a tiny tight loop to 20x for
general multi-class computation, seems a bit wide; is it about what
you would expect?
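For the C side of such a ratio, a minimal C++ baseline for the naive doubly recursive Fibonacci could look like this, assuming the Smalltalk version uses the same algorithm and argument. The function names are hypothetical, and the slowdown factor is simply Smalltalk seconds divided by C++ seconds. Compile with optimization on (e.g. -O2), and note that an optimizing compiler may transform the recursion, so treat the resulting factor as rough.

```cpp
#include <chrono>

// Naive doubly recursive Fibonacci: fib(0) = 0, fib(1) = 1.
long fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Run fib(n) once and return the elapsed time in seconds; the computed
// value is written to result so the call has an observable effect.
double time_fib_seconds(int n, long& result) {
    auto start = std::chrono::steady_clock::now();
    result = fib(n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Slowdown factor of the Smalltalk run relative to the C++ baseline.
double slowdown(double smalltalk_seconds, double cpp_seconds) {
    return smalltalk_seconds / cpp_seconds;
}
```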

My profiling does not reveal any hotspots as such; it's basically
2, 3, 5% scattered around, so I suspect this is just the general
VM/JIT overhead as things scale up: referencing distant objects and
slots, dispatch lookups, more cache misses, etc.  But maybe I am
generally using some backwater loop/control methods or techniques that
could be tuned up.  For example, I seem to recall a trace at some point
showing #timesRepeat: taking 10% of the time (?!).  I also recall
reading about an anomaly with BlockClosures, something like their being
rebuilt every time through the loop; has that been fixed?  Any other
gotchas to watch for currently?
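Whether current Cog still re-creates such blocks is exactly what is being asked here. As an illustration of the cost being described (a C++ analogy, not Squeak's actual closure implementation), the two hypothetical functions compute the same sum. The first constructs its closure object anew on every iteration; the second hoists the loop-invariant closure out of the loop and builds it once.

```cpp
#include <functional>
#include <vector>

// Closure built inside the loop: a fresh std::function (which may allocate
// for its captured state) is constructed on every iteration, analogous to
// a block object being re-created each time through a Smalltalk loop.
double scaled_sum_rebuilt(const std::vector<double>& xs, double k) {
    double total = 0.0;
    for (double x : xs) {
        std::function<double(double)> scale = [k](double v) { return v * k; };
        total += scale(x);
    }
    return total;
}

// Same computation with the loop-invariant closure hoisted out of the loop:
// it is built once and reused for every element.
double scaled_sum_hoisted(const std::vector<double>& xs, double k) {
    std::function<double(double)> scale = [k](double v) { return v * k; };
    double total = 0.0;
    for (double x : xs) {
        total += scale(x);
    }
    return total;
}
```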

(Also, are there any notoriously slow subsystems?  For example, Transcript
writing is glacial.)

The Squeak bytecode compiler looks fairly straightforward and
non-optimizing: just statement-by-statement translation.  So it misses,
e.g., chances to store and reuse a value instead of popping it, and I see
lots of redundant sequences emitted.  Are those kinds of things now
optimized out by Cog, or would tighter bytecode be another potential
optimization path?  (Is that what the Opal project is targeting?)

-- jbthiel
