On 25 March 2011 01:44, Toon Verwaest <[email protected]> wrote:

> No I can't. Since I did it, I naturally think it's a good idea. Perhaps, instead of denigrating it without substantiating your claims you could propose (and then implement, and then get adopted) a better idea?

> Sure. My own VM will take a lot longer to get done! ;) I don't want to blemish any of your credit for building a cool VM. I was rather just wondering why you decided to go for this particular implementation, which seems unobvious to me. Hence the question. I guess I should've formulated it slightly differently :) More info below.

>> I can see why it would pay off for Lisp programmers to have closures that run like the Pharo closures, since they have O(1) access performance. However, this performance boost only starts paying off once you have at least more than 4 levels of nested closures, something which, unlike in Lisp, almost never happens in Pharo. Or at least shouldn't happen (if it does, it's probably ok to punish the people by giving them slower performance).

> Slower performance than what? BTW, I think you have things backwards. I modelled the Pharo closure implementation on Lisp closures, not the other way around.

> This is exactly what I meant. The closures seem like a very good idea for languages with very deeply nested closures. Lisp is such a language, with all the macros ... I don't really see this being so in Pharo.

>> This implementation is pretty hard to understand, and it makes decompilation semi-impossible unless you make very strong assumptions about how the bytecodes are used. This then again reduces the reusability of the new bytecodes, and probably of the decompiler, once people start actually using the pushNewArray: bytecodes.

> Um, the decompiler works, and in fact works better now than it did a couple of years ago. So how does your claim stand up?

> For example, when I just use the InstructionClient I get pushNewArray: and then later popIntoTemp. This combination is supposed to make clear that you are storing a remote array. This is not what the bytecode says, however. And this bytecode can easily be reused for something else; what if I use the bytecode to make my own arrays? What if this array is created in a different way? I can think of a lot of ways the temp array could come to be, using lots of variations of bytecodes, from which I would never (...) be able to figure out that it's actually making the temp vector. Somehow I just feel there's a bigger disconnect between the bytecodes and the Smalltalk code, and I'm unsure if this isn't harmful.

> But ok, I am working on the Opal decompiler, of course. Are you building an IR with your decompiler? If so, I'd like to have a look, since I'm already spending the whole day trying to get the Opal compiler to somehow do what I want... getting one that works and builds a reusable IR would be useful. (I'm implementing your field-index-updating through bytecode transformation, btw.)

>> You might save a teeny tiny bit of memory by having stuff garbage collected when it's not needed anymore ... but I doubt that the whole design is based on that? Especially since it just penalizes the performance in almost all possible ways for standard methods. And it even wastes memory in general cases. I don't get it.

> What has garbage collection got to do with anything? What precisely are you talking about? Indirection vectors?
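A minimal Pharo workspace sketch (all names made up for illustration) of the situation the exchange above is about: a temp that is assigned inside a block cannot stay in a plain stack slot, and in the closure scheme being debated it ends up in a small heap Array, the "remote array" Toon describes the pushNewArray:/popIntoTemp pair storing, i.e. an indirection vector.

    | makeCounter counter |
    makeCounter := [ :start |
        | count |
        count := start.
        "count is assigned inside the inner block, so it cannot live in a
         plain stack slot; in the scheme discussed here it is hoisted into a
         one-element heap Array (the indirection vector) that the enclosing
         activation and the inner closure both reference."
        [ count := count + 1 ] ].
    counter := makeCounter value: 0.
    counter value.    "1"
    counter value.    "2 - the hoisted slot survives across activations"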
> To understand the rationale for indirection vectors you have to understand the rationale for implementing closures on a conventional machine stack. For Lisp that's clear: compile to a conventional stack as that's an easy model, in which case one has to store values that outlive LIFO discipline on the heap, hence indirection vectors. Why you might want to do that in a Smalltalk implementation, when you could just access the outer context directly, has a lot to do with VM internals. Basically it's the same argument. If one can map Smalltalk execution to a conventional stack organization then the JIT can produce a more efficient execution engine. Not doing this causes significant problems in context management.

> With the garbage collection I meant the fact that you can already collect part of the stack frames and leave other parts (the remote temps), and only get them GCd later on when possible.

> I do understand why you want to keep them on the stack as long as possible. The stack-frame marriage stuff for optimizations is very neat indeed. What I'm more worried about myself is the fact that stack frames aren't just linked to each other, sharing memory that way. That would mean you only have 1 indirection to access the method frame (via the homeContext), and 1 for the outer context. You can directly access yourself. So only the 4th context would have 2 indirections (what all contexts have now for remotes). From the 5th on it gets worse... but I can't really see this happening in real-world situations.

> Then you have the problem that, since you don't just link the frames and don't look up values via the frames, you have to copy over part of your frame for activation. This isn't necessarily -that- slow (although it is overhead); but it's slightly clumsy and uses more memory. And that's where my problem lies, I guess... There's such a straightforward implementation possible, by just linking up stack frames (well... they are already linked up anyway) and traversing them. You'll have to do some rewriting whenever you leave a context that's still needed, but you do that anyway for the remote temps, right?
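A rough workspace-level model of the two access schemes being weighed here, with made-up names and plain Arrays/Dictionaries standing in for real frames and contexts: with an indirection vector the shared temp is a direct indexed load at any nesting depth, while with contexts that merely point to their outer context a read has to walk the chain, which is the indirection count Toon is tallying per nesting level above.

    "Toy model only: real contexts and stack frames are VM structures, not Arrays."
    | sharedSlot viaVector outer middle inner viaChain |

    "Indirection-vector scheme: the temp lives in a heap Array that every
     closure using it references, so a read is an indexed load into that
     Array no matter how deeply the blocks are nested."
    sharedSlot := Array with: 42.
    viaVector := [ sharedSlot at: 1 ].

    "Linked-context scheme: each activation holds its temps plus a pointer
     to its lexically enclosing activation; a read follows those links
     until it finds the name (assumed to be present somewhere in the chain)."
    outer  := { Dictionary newFrom: { #x -> 42 }. nil }.
    middle := { Dictionary new. outer }.
    inner  := { Dictionary new. middle }.
    viaChain := [ :activation :name |
        | a |
        a := activation.
        [ a first includesKey: name ] whileFalse: [ a := a second ].
        a first at: name ].

    viaVector value.                    "42, via the vector"
    viaChain value: inner value: #x.    "42, after following two outer links"

In the real scheme a remote read is roughly the vector fetch plus the index, the flat 2 indirections Toon refers to for remotes, whereas the chained read costs as many hops as the lexical distance, which is why he only sees the linked model losing from the 4th nesting level on.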
> The explanation is all on my blog and in my Context Management in VisualWorks 5i paper. But does a bright guy like yourself find this /really/ hard to understand? It's not that hard a transformation, and compared to what goes on in the JIT (e.g. in bytecode to machine-code pc mapping) it's pretty trivial.

> I guess I just like to really see what's going on by having a decent model around. When I look at the bytecodes, in the end I can reconstruct what it's doing ... as long as they are aligned in the way that the compiler currently generates them. But I can easily see how slight permutations would already throw me off completely.

>> But probably I'm missing something?

> It's me who's missing something. I did the simplest thing I knew could possibly work re getting an efficient JIT and a Squeak with closures (there's huge similarity between the above scheme and the changes I made to VisualWorks that resulted in a VM that was 2 to 3 times faster, depending on platform, than VW 1.0). But you can see a far more efficient and simple scheme. What is it?

> Basically, my scheme isn't necessarily far more efficient. It's just more understandable, I think. I can understand scopes that point to their outer scope, and I can follow these scopes to see how the lookup works. And the fact that it does somewhat less pointer dereferencing and copying of data is just something that makes me think it wouldn't be less efficient than what you have now. My problem is not that your implementation is slow, rather that it's complex. And I don't really see why this complexity is needed.

> Obviously, playing on my ego by telling me I should be clever enough to understand it makes me say I do! But I feel it's not the easiest; and probably fewer people understand this than the general model of just linking contexts together.

I can't say that I clearly understood your concept. But if it will simplify the implementation without apparent speed loss, I am all ears :)

> best,
> Eliot

> cheers,
> Toon

--
Best regards,
Igor Stasenko AKA sig.
