Hi John, have a look at the MessageNode class-side methods; you will see the list of messages that are inlined.
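For reference, this can be inspected directly in a Workspace. A minimal sketch (the exact class-side method names vary between Squeak and Pharo versions, so browse around):

```smalltalk
"Open a browser on MessageNode and switch to the class side; the
compiler's inlined (macro) selectors -- #ifTrue:ifFalse:, #and:, #or:,
#whileTrue:, #to:do:, etc. -- are listed in its class-side
initialization code."
MessageNode browse.
```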
Stef

On Feb 17, 2011, at 3:21 PM, John B Thiel wrote:

> Cog VM -- Thanks and Performance / Optimization Questions
>
> To Everyone, thanks for your great work on Pharo and Squeak, and to
> Eliot Miranda, Ian Piumarta, and all VM/JIT gurus, especially thanks
> for the Squeak VM Cog and its precursors, which I was keenly
> anticipating for a decade or so, and which is really going into stride
> with the latest builds.
>
> I like to code with awareness of performance issues. Can you tell me,
> or point me to, some performance and efficiency tips for Cog and the
> Squeak compiler -- detail on which methods are inlined, which are best
> among alternatives, etc.? For example, I understand #to:do: is
> inlined -- what about #to:by:do:, #timesRepeat:, and #repeat?
> Basically, I would like to read a full overview of which core methods
> are specially optimized (or planned to be).
>
> I know about the list of NoLookup primitives, as per Object
> class>>howToModifyPrimitives -- supposing that is still valid?
>
> What do you think is a reasonable speed factor for number-crunching
> Squeak code vs C? I am seeing about 20x slower at the semi-large
> scale, which surprised me a bit because I got about 10x on smaller
> tests, and a simple fib: with beautiful Cog is now about 3x (wow!).
> That range, 3x for a tiny tight loop to 20x for general multi-class
> computation, seems a bit wide -- is it about expected?
>
> My profiling does not reveal any hotspots as such -- it's basically
> 2, 3, 5% scattered around, so I envision this is just the general
> VM/JIT overhead as you scale up: referencing distant objects, slots,
> dispatch lookups, more cache misses, etc. But maybe I am generally
> using some backwater loop/control methods, techniques, etc. that
> could be tuned up; e.g., I seem to recall a trace at some point
> showing #timesRepeat: taking 10% of the time (?!). Also, I recall
> reading about an anomaly with BlockClosures -- something like being
> rebuilt every time through the loop -- has that been fixed?
> Any other gotchas to watch for currently?
>
> (Also, any notoriously slow subsystems? For example, Transcript
> writing is glacial.)
>
> The Squeak bytecode compiler looks fairly straightforward and
> non-optimizing -- just statement-by-statement translation. So it
> misses, e.g., chances to store and reuse instead of pop, etc. I see
> lots of redundant sequences emitted. Are those kinds of things now
> optimized out by Cog, or would tighter bytecode be another potential
> optimization path? (Is that what the Opal project is targeting?)
>
> -- jbthiel
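The loop-overhead question above can be probed with a quick Workspace micro-benchmark. A sketch only: `timeToRun` answers milliseconds in Squeak (a Duration in recent Pharo), and absolute numbers depend entirely on the VM and image in use.

```smalltalk
"Compare the compiler-inlined #to:do: loop against #timesRepeat:,
which is an ordinary message send that evaluates a block on each
iteration."
| n |
n := 10000000.
Transcript
	showAll: 'to:do: ', [ 1 to: n do: [ :i | ] ] timeToRun printString; cr;
	showAll: 'timesRepeat: ', [ n timesRepeat: [ ] ] timeToRun printString; cr.
```

Under Cog the inlined form is typically noticeably faster, which is consistent with the #timesRepeat: trace mentioned in the message.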
