Ok, I will do so (read the f-ing paper); so far I have only read the blog post.
I just realized that I actually made a mistake in my mental model of
your model. See! It's complex!
So I realized that getting to the remotes is exactly as fast as going to
the parent or outer context.
That makes it as fast as a method context with at most 2 nested
contexts (3 blocks nested), and faster than deeper nestings. How often
does deeper nesting actually occur in Pharo? Is it worthwhile to
introduce the remote arrays just for those cases?
Is the copying really worthwhile to make those cases faster?
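For reference, here is what I understand the remote-array transformation
to do for a temp that is assigned to from inside a block (a sketch with
made-up names, not actual compiler output):

    counter
        "The assigned-to temp is hoisted into a heap-allocated one-slot
         array that the method and the block share; every access goes
         through that array."
        | remote |
        remote := Array new: 1.
        remote at: 1 put: 0.
        ^ [ remote at: 1 put: (remote at: 1) + 1.
            remote at: 1 ]

Each #value send of the returned block then bumps the shared slot, so
reading or writing the captured temp always costs one indirection,
independent of nesting depth. That's the cost I'm comparing against
walking the context chain.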
My biggest problem until now is: why wouldn't you be able to do
everything you do with the remote arrays directly with the context
frames? Why limit it to only the part that is being closed over? The
naive implementation that just extends Squeak with proper closure links
will obviously be slow; I agree that you need a stack. Now I'd just like
to read why you chose to take only a part of the frame (the remote
array) rather than the whole frame. That would avoid the copyTemps thing...
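To make the question concrete: as far as I understand it, the scheme
splits a frame's temps roughly like this (my reading, so correct me if
I'm wrong):

    example
        | copied shared |
        copied := 42.   "never assigned after capture: can simply be copied into the closure"
        shared := 0.    "assigned inside the block: has to live in the remote array"
        ^ [ shared := shared + copied.
            shared ]

Keeping the whole frame reachable instead would treat both temps
uniformly and avoid the copying, at the price of retaining everything
else the frame holds. That trade-off is exactly what I'd like to see
spelled out.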
But then, I guess I should go off and read the f-ing paper. I hope that
particular thing is described there, since it's basically the piece I'm
missing.
Also I don't exactly know what Peter Deutsch did, but if it was the
straightforward implementation then it seems obvious you get such a
speedup. Implementing it is less obvious, naturally ;)
These responses are exactly why I posed the question here... I'd like to
understand why. No offense.
cheers,
Toon
On 03/25/2011 02:22 AM, Eliot Miranda wrote:
Toon,
what you describe is how Peter Deutsch designed closures for
ObjectWorks 2.4 & ObjectWorks 2.5, whose virtual machine and bytecode
set served all the way through VisualWorks 3.0. If you read the
context management paper you'll understand why this is a really slow
design for a JIT. When I replaced that scheme with one essentially
isomorphic to the Squeak one, the VM became substantially faster; for
example, factors of two and three in exception-delivery performance.
The description of the problem and the performance numbers are all in
the paper. There are two main optimizations I performed on the
VisualWorks VM: one is the closure scheme and the other is PICs
(polymorphic inline caches). Together they sped up what was the fastest
commercial Smalltalk implementation by a factor of two on most
platforms and a factor of three on Windows.
I'm sorry it's complex, but if one wants good performance it's a price
well worth paying. After all, I was able to implement the compiler and
decompiler within a month, and Jorge proved at INRIA-Lille that I'm
far from the only person on the planet who understands it. Lispers
have understood the scheme for a long time now.
best,
Eliot
On Thu, Mar 24, 2011 at 6:01 PM, Toon Verwaest
<[email protected]> wrote:
I can't say that I clearly understood your concept, but if it will
simplify the implementation without apparent speed loss, I am all ears :)
test
    | b |
    [ | a |
      a + b ]
Suppose you can't compile anything away; then you get

|==============
| MethodContext
|
| b := ...
|==============
      ^
      |
|==============
| BlockContext
|
| a := ...
|==============
And you just look up starting at the current context and go up, except
if the var is from the homeContext, in which case you directly follow
the home-context pointer.
Since all contexts link to the home context, that makes it 1 pointer
indirection to get to the method's context, and 1 for the parent
context. So it is only 2 indirections starting from the 3rd nested
block (i.e. when you have [ ... [ ... [ ... ] ... ] ... ], where all of
the blocks are required for storing captured data).
ifTrue:ifFalse: etc. blocks obviously don't count, and blocks without
shared locals could be left out (although we might not do that, for
debugging reasons).
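Concretely, the deepest case I'm talking about looks like this (a
sketch with made-up temp names):

    test
        "Three nested blocks that all capture something. From the
         innermost block, m is 1 indirection away (home pointer),
         y is 1 (parent), and x is 2 (parent's parent); never more
         than 2 in total."
        | m |
        m := 1.
        ^ [ | x |
            x := 2.
            [ | y |
              y := 3.
              [ m + x + y ] value ] value ] value

That answers 6, and even this worst case is bounded by 2 hops in the
chain.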
Hope that helps.
cheers,
Toon