Ok, I will do so (read the f-ing paper). Until now I had only read the blog post.

I just realized that I actually made a mistake in my mental model of your design. See! It's complex! So getting to the remote arrays is exactly as fast as going to the parent or outer context.

This makes it as fast as a method context with at most 2 nested contexts (3 nested blocks), and faster than chained lookup for deeper nestings. How often does deeper nesting actually occur in Pharo? Is it worthwhile to introduce the remote arrays just for those cases?
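
For concreteness, here is a minimal sketch of what I understand the remote-array scheme to do with a shared temp, written as plain Smalltalk rather than bytecode (the names are mine, not from the paper):

    counter
        "Source form:  | n |  n := 0.  ^ [ n := n + 1 ]"
        | remote |
        remote := Array new: 1.    "the remote array holding the shared temp n"
        remote at: 1 put: 0.
        ^ [ remote at: 1 put: (remote at: 1) + 1 ]
          "both the method and the block reach n through one indirection"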

Is the copying really worthwhile to make those cases faster?
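
If I understand the Squeak scheme correctly, the copying is what lets most blocks avoid the remote array altogether: a temp that is never assigned after the closure is created can simply have its value copied into the closure. A sketch:

    copiedValueExample
        | k |
        k := 10.
        ^ [ :x | x + k ]
          "k is never written after the block is created, so its value
          can be copied into the closure instead of being shared"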

My biggest problem until now is... why wouldn't you be able to do everything you do with the remote arrays directly with the context frames? Why limit it to only the part that is being closed over? The naive implementation that just extends Squeak with proper closure links will obviously be slow; I agree that you need a stack. Now I'd just like to read why you chose to take only part of the frame (the remote array) rather than the whole frame. That would avoid the copyTemps thing...
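
To make the question concrete, here is how I picture the two designs, as methods on a hypothetical block class (outerContext and remoteArray are assumed instance variables, and the indices are made up):

    "Design A, what I'm asking about: close over the whole frame and read
    the temp through the enclosing context."
    readBViaFrame
        ^ outerContext tempAt: 1

    "Design B, yours as I understand it: close over a remote array holding
    only the temps that are actually shared."
    readBViaRemoteArray
        ^ remoteArray at: 1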

But then, I guess I should go off and read the f-ing paper. I hope that particular point is covered there, since it's basically the piece I'm missing.

Also, I don't know exactly what Peter Deutsch did, but if it was the straightforward implementation then it seems obvious that you'd get such a speedup. Implementing it is less obvious, naturally ;) These responses are exactly why I posed the question here: I'd like to understand why. No offense.

cheers,
Toon


On 03/25/2011 02:22 AM, Eliot Miranda wrote:
Toon,

what you describe is how Peter Deutsch designed closures for ObjectWorks 2.4 & ObjectWorks 2.5, whose virtual machine and bytecode set served all the way through VisualWorks 3.0. If you read the context management paper you'll understand why this is a really slow design for a JIT. When I replaced that scheme with one essentially isomorphic to the Squeak one, the VM became substantially faster; for example, factors of two and three in exception-delivery performance. The description of the problem and the performance numbers are all in the paper. There are two main optimizations I performed on the VisualWorks VM: one is the closure scheme and the other is PICs (polymorphic inline caches). Together those sped up what was then the fastest commercial Smalltalk implementation by a factor of two on most platforms and a factor of three on Windows.

I'm sorry it's complex, but if one wants good performance it's a price well worth paying. After all, I was able to implement the compiler and decompiler within a month, and Jorge proved at INRIA-Lille that I'm far from the only person on the planet who understands it. Lispers have understood the scheme for a long time now.

best,
Eliot

On Thu, Mar 24, 2011 at 6:01 PM, Toon Verwaest <[email protected]> wrote:


        I can't say that I clearly understood your concept. But if it will
        simplify the implementation without a noticeable speed loss, I am
        all ears :)


    test
        | b |
        [ | a |
          a + b ]

    Suppose you can't compile anything away; then you get:

    |==============
    |MethodContext
    |
    |b := ...
    |==============
        ^
        |
    |==============
    |BlockContext
    |
    |a := ...
    |==============

    And you just look up the variable starting at the current context
    and walk up, except when the variable belongs to the home context:
    then you follow the home-context pointer directly. Since all
    contexts link to the home context, that makes it 1 pointer
    indirection to get to the method's context, and 1 for the parent
    context. So that makes at most 2 indirections starting from the
    third nested block (i.e. when you have [ ... [ ... [ ... ] ... ] ... ],
    where all of the blocks capture data). Inlined blocks such as the
    arguments of ifTrue:ifFalse: obviously don't count, and blocks
    without shared locals could be left out of the chain (although we
    might not do that, for debugging reasons).
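
    In code, the lookup I mean is roughly this (a sketch; homeContext,
    parentContext and lookupTemp:from: are names I made up, while
    tempAt: exists on contexts in Squeak/Pharo):

        lookupTemp: index from: definingContext
            | ctx |
            "Temps of the home context are always one hop away."
            definingContext == self homeContext
                ifTrue: [ ^ definingContext tempAt: index ].
            "Everything else: walk the chain of lexical parents."
            ctx := self.
            [ ctx == definingContext ]
                whileFalse: [ ctx := ctx parentContext ].
            ^ ctx tempAt: index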

    Hope that helps.

    cheers,
    Toon


