cotto++, bacek++, and I had a very edifying discussion in #parrot from around 01:00 to 01:30 PDT on July 5th. It centered on some difficulty bacek was having in coming up with a workable plan for gradually migrating to L1 as the base on which Ops and PMCs are built.

<bacek> btw, I spent few hours trying to understand how to replace op/vtable with L1 based one.
<bacek> guess what?
<cotto> it's hard?
<bacek> cotto: may be not, but I can't figure out how to do it...
<cotto> Yeah. There are some steps in the L1 conversion where all I can see are big question marks.
<cotto> I don't doubt that they're feasible, but I'd rather build up the momentum and figure out the details jit.
<cotto> Where'd you get stuck?
<bacek> erm. Tough questions.
<bacek> Consider Integer.add, op add, and L1.
<cotto> I'm considering them.
<bacek> currently "op add" is "shortcut" for VTABLE_add(...),
<bacek> if Integer.add is some kind of bytecode segment
<bacek> how "op add" should be implemented?
<bacek> Additional "l1vtable" in PMC?
<bacek> to choose between "old" vtable and "l1vtable"

This is about where I became interested: I noticed that they were working under the assumption that the best interim migration path would be to compile L1 to C and then call that C. That struck me as slightly backwards.

<bacek> or we have to generate C version of Integer.add which will call PCCINVOKE.
<bacek> ?
<bacek> Same for ops. If we have op foo implemented in L1 bytecode how we adjust dispatch to handle it?
<cotto> I don't see the problem. If we're calling VTABLE_add on two PMCs (we get that info for free), we just build a vtable call to the first's add, then that add vtable function dtrt.
<bacek> VTABLE_add is "C". And we try to avoid it
<cotto> No, VTABLE functions in C don't cost us anything because we don't have to mess around with pcc to use them.
<cotto> I don't think so, at least.
<bacek> consider PCCINVOKE call from VTABLE_add
<cotto> That's an implementation detail of the PMC.
<bacek> erm... We are working on implementation details!
<bacek> :)
<cotto> yes. Go ahead.
<bacek> I can't... I'm stuck...
<bacek> Calling PCCINVOKE from automatically generated C stub will work.
<cotto> So you don't see how we'd do the equivalent of a PCCINVOKE call from L1?
<bacek> But it will cause big slowdown.
<bacek> no-no-no.
<bacek> Current op dispatcher is pure C
<cotto> so far, so good
<bacek> if some of ops are in L1 we need smarter dispatcher.
<bacek> or use PCCINVOKE form auto-generated C stubs for L1-based ops.

I then piped up, suggesting that perhaps we could have a 'runcore' that runs L1 and can call into C as needed. bacek pointed out some issues, chiefly speed of execution (since that's one problem we already have with PCC).

<eternaleye> bacek: Why not 'C dispatcher dispatches L1, which dispatches PIR/PBC/PASM'
<bacek> and using PCCINVOKE will be slow.
<cotto> ah
<eternaleye> Then "C dispatcher" can be cgoto, jit, etc
<bacek> eternaleye: We can't replace whole PIR ops with L1 based in single step
<cotto> basically, how do we get C and L1-based opcodes to play nice
<cotto> From my understanding, we don't have to.
<bacek> cotto: indeed
<eternaleye> bacek: Have the C PIR/PASM ops dispatcher call into an L1 dispatcher?
<bacek> eternaleye: it's single dispatcher
<eternaleye> bacek: But does it have to be?
<cotto> Until everything is L1-capable, we just convert L1 to C.
<bacek> eternaleye: but some of ops isn't implemented in C
<cotto> (automatically)
<eternaleye> bacek: I'm saying you need to take the microcode analogy a bit further
<eternaleye> x86 cpus contain a risc core that executes the microcode. That microcode runs x86 ASM.
<cotto> Part of what L1 will need to do is be capable of emitting C that's functionally equivalent to the C we've got now.
<bacek> eternaleye: it's ultimate goal. But for time being we'll have mixed environment. And this is hardest part...
<eternaleye> bacek: We already can call from PIR to C and back. What part of that equation changes when s/PIR/L1/ ?
<bacek> eternaleye: speed... We usually doesn't call from C to PIR in ops

I then produced a counterpoint: the only reason the L1 dispatcher needs to call into C (well, aside from NCI) is the existence of non-L1-based ops and PMCs, which is an _inherently_ temporary condition. (I also produced a stupidly optimistic estimate, but oh well.)

<eternaleye> bacek: But if the stage where we have both types of ops is temporary, the speed loss is also temporary
<eternaleye> Since later on, we won't _need_ to switch control back and fort
<bacek> eternaleye: of course... But it's still slowdown.
<eternaleye> bacek: Premature optimization is etc. etc.
<eternaleye> If it's possible to do it in ~1 month, then the slowdown won't even be in a release
<bacek> eternaleye: oh... 1 month... You are way too optimistic...
<cotto> eternaleye, the expected plan is to do s/C/L1/ for a bunch of code, only going to the next step once all {ops|pmcs} are converted.

Then an idea hit me: if runcores are pluggable, why not do what I described... as a new runcore? That way, only people already following the L1 development process would encounter it (or any slowness therein).

<eternaleye> What if we implement it as a runcore? That allows doing everything except actually translating PMCs/ops without anything changing unless you use -R l1core
<cotto> during the that transition time, L1 will essentially be a different C-like language
<bacek> cotto: no-no-no. Some HighLevelLanguageWhichEasyToCompileToL1AndC
<eternaleye> Then, right after a release, we can switch to l1core and immediately translate ops/etc. If the majority (or most-used) get translated, the frequency of calling between C and L1 is minimized
<eternaleye> Thus, little performance lost
<cotto> bacek, In general I mean "anything that compiles to L1" by "L1"
<bacek> cotto: ok :)
<cotto> I need a word for that. that's the second time that's caused confusion.
<cotto> L1-capable?
<eternaleye> L1-directed?
<cotto> That works.
<cotto> It's so many more letters than "L1", though. :(
<eternaleye> So call it L1-t for L1-targeting
<bacek> actually, for this "language" we need only byte munging and "if"
<bacek> Level One Language :)
<cotto> eternaleye, there's actually a plan to implement L1 opcodes as dynops.
<eternaleye> But honestly, if the option is to temporarily give up some speed, in order to permanently improve the architecture...
<cotto> It'd be very slow, but it'd let us see them in action.
<cotto> eternaleye, you're saying that while we're switching ops to L1
<cotto> (which is emitting C), we should also work on making those ops directly runnable?
<bacek> 1. Implement some very-very tiny language which an emit L1 bytecode and C
<bacek> 2. Implement opsc which can emit L1 bytecode and C stubs with PCCINVOKE
<bacek> 3. Patch imcc to emit L1 bytecode for L1 reimplemented ops

I next tried to make sure I had an accurate picture of the end goal, since my next point depended on it. Then, I laid out the barest beginnings of a rough idea for a possible migration path with little, if any, disruption.

<eternaleye> cotto: IIUC, the plan is not to compile L1 to C, but to compile everything to L1 and make L1 the bytecode language of the virtual machine
<cotto> long-term, that's pretty accurate
<cotto> but L1 -> C is a short-term way to keep Parrot working while only a subset of the ops have been rewritten
<eternaleye> cotto: Then why go about it backwards? If the plan is to VM L1, then why compile L1 to c and run that? Why not VM the L1, and call into what residual C is needed? It puts us on a direct path (rewrite each C op and you gain more speed since it's all L1 now) to the end goal (which would be achieved immediately when the last op is translated)
<bacek> eternaleye: 4 cores stay on this path

At this point, I realized I should probably know more about ops than I had been running on so far.
This indicates a potentially workable solution:

<eternaleye> As is, the ops need to be implemented for each runcore, right?
<cotto> eternaleye, not currently.
<cotto> They're in src/ops/foo.ops, which ops2c mangles into the various runcores
<eternaleye> Ah
<bacek> eternaleye: no, Ops2c will generate everything required.
<eternaleye> Are the ops pretty much static these days?
<cotto> eternaleye, mostly yes.
<eternaleye> Then why not make an L1core, that prefers L1ops and can call into Cops, and rewrite the ops into L1 _for_that_core_? Then, when they're all written (or enough that speed is no longer a problem), make L1core the default
<eternaleye> If the ops were still frequently changing it'd be infeasible, but as they're mostly static...
<bacek> but...
<bacek> but...
<bacek> Wow
<bacek> It's very good point!
<bacek> Lets steal C generation for current ops from Ops2c
<bacek> L1 still able to call C function directly
<cotto> eternaleye, I think you may have a good idea. (I need to process it more fully and I'm sleepy.) We should probably put an L1 roadmap on the wiki so these kinds of suggestions can be added.

cotto then suggested that I post a message to the list detailing the results of this conversation.

Questions? Answers? Gigantic black beasts of Aaaaaaaarggghhhh....
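
P.S. To make the "L1core prefers L1 ops and can call into C ops" idea a little more concrete, here is a toy sketch in Python. It is not Parrot code and does not reflect any real Parrot API: the names L1Core, C_OPS, L1_OPS, the "iadd" primitive, and the register layout are all hypothetical, and the Python functions merely stand in for ops that would really be implemented in C. It only illustrates the dispatch rule: use the L1 body when one exists, otherwise fall back to the C implementation, and count how often the fallback happens.

```python
from typing import Callable, Dict, List, Tuple

# Everything below is a hypothetical stand-in, not Parrot's real data model.
Registers = Dict[str, int]
COp = Callable[[Registers, List[str]], None]   # plays the role of a C-implemented op
L1Instr = Tuple[str, int, int, int]            # (primitive, dst, lhs, rhs) as argument positions


def c_add(regs: Registers, args: List[str]) -> None:
    """Stand-in for the existing C implementation of 'add'."""
    dst, lhs, rhs = args
    regs[dst] = regs[lhs] + regs[rhs]


def c_mul(regs: Registers, args: List[str]) -> None:
    """Stand-in for the existing C implementation of 'mul'."""
    dst, lhs, rhs = args
    regs[dst] = regs[lhs] * regs[rhs]


class L1Core:
    """Toy runcore: prefers L1 op bodies, falls back to C ops."""

    def __init__(self, c_ops: Dict[str, COp], l1_ops: Dict[str, List[L1Instr]]):
        self.c_ops = dict(c_ops)
        self.l1_ops = dict(l1_ops)
        self.c_calls = 0  # number of times we had to cross into "C"

    def _run_l1(self, body: List[L1Instr], regs: Registers, args: List[str]) -> None:
        # Interpret the L1 body; only one assumed primitive is modelled here.
        for prim, dst, lhs, rhs in body:
            if prim == "iadd":
                regs[args[dst]] = regs[args[lhs]] + regs[args[rhs]]
            else:
                raise ValueError(f"unknown L1 primitive: {prim}")

    def run(self, program: List[Tuple[str, List[str]]], regs: Registers) -> Registers:
        for name, args in program:
            body = self.l1_ops.get(name)
            if body is not None:
                self._run_l1(body, regs, args)   # op already translated to L1
            else:
                self.c_calls += 1                # residual C op: call across the boundary
                self.c_ops[name](regs, args)
        return regs


C_OPS = {"add": c_add, "mul": c_mul}
L1_OPS = {"add": [("iadd", 0, 1, 2)]}  # only "add" has been translated so far

program = [("add", ["r2", "r0", "r1"]), ("mul", ["r3", "r2", "r2"])]
core = L1Core(C_OPS, L1_OPS)
result = core.run(program, {"r0": 2, "r1": 3, "r2": 0, "r3": 0})
print(result["r3"], core.c_calls)  # prints "25 1"
```

In this toy model, every op moved from C_OPS into L1_OPS lowers c_calls for the same program, which is the sense in which the cost of switching between C and L1 would shrink as translation proceeds.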
