Paths towards implementing L1

Alex Elsayed Sun, 05 Jul 2009 01:55:46 -0700

cotto++, bacek++ and I had a very edifying discussion in #parrot from around 
01:00 to 01:30 PDT on July 5th. It focused on the difficulty bacek was 
having in coming up with a workable plan for gradually migrating to L1 as 
the base on which Ops and PMCs are built.


<bacek> btw, I spent few hours trying to understand how to replace op/vtable 
with L1 based one.
<bacek> guess what?
<cotto> it's hard?
<bacek> cotto: may be not, but I can't figure out how to do it...
<cotto> Yeah.  There are some steps in the L1 conversion where all I can see 
are big question marks.
<cotto> I don't doubt that they're feasible, but I'd rather build up the 
momentum and figure out the details jit.
<cotto> Where'd you get stuck?
<bacek> erm. Tough questions.
<bacek> Consider Integer.add, op add, and L1.
<cotto> I'm considering them.
<bacek> currently "op add" is "shortcut" for VTABLE_add(...),
<bacek> if Integer.add is some kind of bytecode segment
<bacek> how "op add" should be implemented?
<bacek> Additional "l1vtable" in PMC?
<bacek> to choose between "old" vtable and "l1vtable"
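
To make bacek's "l1vtable" question concrete: one option is for each PMC to carry both its existing C vtable slot and an optional L1 body for the same method, with "op add" checking which one exists. The following is a minimal C sketch under that assumption only. Every name in it (`PMC`, `L1Op`, `L1Method`, `run_l1`, `vtable_add`, `integer_add_c`) is hypothetical and does not match Parrot's real headers, and `run_l1` is a toy stand-in for an L1 interpreter that understands a single opcode.

```c
#include <stddef.h>

/* Hypothetical L1 bytecode: a tiny array of opcodes ending in L1_END. */
typedef enum { L1_ADD_INTS, L1_END } L1Op;

typedef struct PMC PMC;

/* Hypothetical "l1vtable" slot: NULL until the method is rewritten in L1. */
typedef struct {
    const L1Op *code;
} L1Method;

/* The "old" C vtable function signature for add. */
typedef long (*CAddFn)(PMC *self, PMC *other);

struct PMC {
    long value;        /* payload of an Integer-like PMC */
    CAddFn c_add;      /* existing C implementation */
    L1Method l1_add;   /* optional L1 implementation */
};

/* Toy L1 interpreter: knows just enough to add two integer PMCs. */
static long run_l1(const L1Op *code, PMC *self, PMC *other) {
    long result = 0;
    for (size_t pc = 0; code[pc] != L1_END; pc++) {
        switch (code[pc]) {
        case L1_ADD_INTS:
            result = self->value + other->value;
            break;
        default:
            break;
        }
    }
    return result;
}

/* What "op add" would do: prefer the L1 body, otherwise fall back to C. */
static long vtable_add(PMC *self, PMC *other) {
    if (self->l1_add.code != NULL)
        return run_l1(self->l1_add.code, self, other);
    return self->c_add(self, other);
}

/* The existing C version of Integer.add in this sketch. */
static long integer_add_c(PMC *self, PMC *other) {
    return self->value + other->value;
}
```

With `l1_add.code` left `NULL`, `vtable_add` takes the C path; pointing it at `{ L1_ADD_INTS, L1_END }` routes the same call through `run_l1`. The extra branch on every call is the kind of overhead bacek is worried about.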

What follows is roughly where I became interested. I noticed that they 
were assuming the best interim migration path would be to compile L1 to C 
and then call that C. That struck me as slightly backwards.

<bacek> or we have to generate C version of  Integer.add which will call 
PCCINVOKE.
<bacek> ?
<bacek> Same for ops. If we have op foo implemented in L1 bytecode how we 
adjust dispatch to handle it?
<cotto> I don't see the problem.  If we're calling VTABLE_add on two PMCs 
(we get that info for free), we just build a vtable call to the first's add, 
then that add vtable function dtrt.
<bacek> VTABLE_add is "C". And we try to avoid it
<cotto> No, VTABLE functions in C don't cost us anything because we don't 
have to mess around with pcc to use them.
<cotto> I don't think so, at least.
<bacek> consider PCCINVOKE call from VTABLE_add
<cotto> That's an implementation detail of the PMC.
<bacek> erm... We are working on implementation details!
<bacek> :)
<cotto> yes.  Go ahead.
<bacek> I can't... I'm stuck...
<bacek> Calling PCCINVOKE from automatically generated C stub will work.
<cotto> So you don't see how we'd do the equivalent of a PCCINVOKE call from 
L1?
<bacek> But it will cause big slowdown.
<bacek> no-no-no.
<bacek> Current op dispatcher is pure C
<cotto> so far, so good
<bacek> if some of ops are in L1 we need smarter dispatcher.
<bacek> or use PCCINVOKE from auto-generated C stubs for L1-based ops.
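
bacek's other option keeps the pure-C op dispatcher unchanged and hides each L1-implemented op behind an auto-generated C stub with the same signature as a native op. A minimal sketch, assuming ops are C functions that take the current position in the instruction stream and return the next one (loosely modeled on generated op functions, not copied from Parrot). `Interp`, `OpFunc`, `l1_invoke`, `op_add_l1_stub` and `l1_crossings` are hypothetical; `l1_invoke` stands in for the PCCINVOKE-style call and does not implement Parrot's calling conventions.

```c
#include <stddef.h>

/* Hypothetical interpreter state: just an integer register file. */
typedef struct {
    long regs[8];
} Interp;

/* Each op reads its operands after the opcode and returns the next pc. */
typedef const int *(*OpFunc)(const int *pc, Interp *interp);

/* Counts how often control crosses from the C dispatcher into L1. */
static int l1_crossings = 0;

/* Stand-in for a PCCINVOKE-style call into L1. A real one would marshal
   arguments and run L1 bytecode; this one only counts the crossing and
   returns the sum the L1 body would have computed. */
static long l1_invoke(const char *l1_name, long a, long b) {
    (void)l1_name;
    l1_crossings++;
    return a + b;
}

/* Native C op: regs[dst] = regs[a] + regs[b]. */
static const int *op_add_c(const int *pc, Interp *interp) {
    interp->regs[pc[1]] = interp->regs[pc[2]] + interp->regs[pc[3]];
    return pc + 4;
}

/* Auto-generated stub: same signature as a C op, but delegates to L1. */
static const int *op_add_l1_stub(const int *pc, Interp *interp) {
    interp->regs[pc[1]] = l1_invoke("add", interp->regs[pc[2]], interp->regs[pc[3]]);
    return pc + 4;
}

enum { OP_END, OP_ADD_C, OP_ADD_L1 };

/* The pure-C dispatcher cannot tell which ops are really L1. */
static const OpFunc op_table[] = { NULL, op_add_c, op_add_l1_stub };

static void run(const int *pc, Interp *interp) {
    while (*pc != OP_END)
        pc = op_table[*pc](pc, interp);
}
```

Running `{ OP_ADD_C, 0, 1, 2, OP_ADD_L1, 3, 0, 1, OP_END }` with `regs[1] = 2` and `regs[2] = 3` leaves 5 in `regs[0]` and 7 in `regs[3]`, with one crossing into L1. The dispatcher never changes, but every stubbed op pays for the round trip through `l1_invoke`, which is where bacek expects the slowdown.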

I then piped up, suggesting that perhaps we could have a 'runcore' that runs 
L1 and can call into C as needed. bacek pointed out some issues, chiefly 
speed of execution (since that is already a problem we have with PCC).

<eternaleye> bacek: Why not 'C dispatcher dispatches L1, which dispatches 
PIR/PBC/PASM'
<bacek> and using PCCINVOKE will be slow.
<cotto> ah
<eternaleye> Then "C dispatcher" can be cgoto, jit, etc
<bacek> eternaleye: We can't replace whole PIR ops with L1 based in single 
step
<cotto> basically, how do we get C and L1-based opcodes to play nice
<cotto> From my understanding, we don't have to.
<bacek> cotto: indeed
<eternaleye> bacek: Have the C PIR/PASM ops dispatcher call into an L1 
dispatcher?
<bacek> eternaleye: it's single dispatcher
<eternaleye> bacek: But does it have to be?
<cotto> Until everything is L1-capable, we just convert L1 to C.
<bacek> eternaleye: but some of ops isn't implemented in C
<cotto> (automatically)
<eternaleye> bacek: I'm saying you need to take the microcode analogy a bit 
further
<eternaleye> x86 cpus contain a risc core that executes the microcode. That 
microcode runs x86 ASM.
<cotto> Part of what L1 will need to do is be capable of emitting C that's 
functionally equivalent to the C we've got now.
<bacek> eternaleye: it's ultimate goal. But for time being we'll have mixed 
environment. And this is hardest part...
<eternaleye> bacek: We already can call from PIR to C and back. What part of 
that equation changes when s/PIR/L1/ ?
<bacek> eternaleye: speed... We usually doesn't call from C to PIR in ops
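
eternaleye's microcode analogy can be pictured as two nested loops: an outer C dispatcher that walks high-level (PIR/PBC-level) ops, and an inner loop that executes each high-level op as a short sequence of L1 micro-ops. This is an illustration only: the micro-op set (`U_LOAD_A`, `U_ADD`, `U_STORE` and the rest), the accumulator model, and all function names are invented for the example and are not the actual L1 design.

```c
#include <stddef.h>

/* Hypothetical L1 micro-ops working on an accumulator and a scratch value. */
typedef enum { U_LOAD_A, U_LOAD_B, U_ADD, U_MUL, U_STORE, U_DONE } MicroOp;

typedef struct {
    MicroOp op;
    int operand;   /* index into the high-level op's argument list */
} MicroInsn;

/* "Microcode" for two high-level ops: dst = a + b and dst = a * b. */
static const MicroInsn ucode_add[] = {
    { U_LOAD_A, 1 }, { U_LOAD_B, 2 }, { U_ADD, 0 }, { U_STORE, 0 }, { U_DONE, 0 }
};
static const MicroInsn ucode_mul[] = {
    { U_LOAD_A, 1 }, { U_LOAD_B, 2 }, { U_MUL, 0 }, { U_STORE, 0 }, { U_DONE, 0 }
};

/* High-level opcode -> its L1 microcode. */
enum { HL_END, HL_ADD, HL_MUL };
static const MicroInsn *const ucode_table[] = { NULL, ucode_add, ucode_mul };

/* Inner loop: run one high-level op's microcode against the registers. */
static void run_microcode(const MicroInsn *u, const int *args, long *regs) {
    long acc = 0, tmp = 0;
    for (; u->op != U_DONE; u++) {
        switch (u->op) {
        case U_LOAD_A: acc = regs[args[u->operand]]; break;
        case U_LOAD_B: tmp = regs[args[u->operand]]; break;
        case U_ADD:    acc += tmp; break;
        case U_MUL:    acc *= tmp; break;
        case U_STORE:  regs[args[u->operand]] = acc; break;
        case U_DONE:   break;
        }
    }
}

/* Outer loop: the C dispatcher walks high-level ops, each with 3 register args. */
static void run_hl(const int *pc, long *regs) {
    while (*pc != HL_END) {
        run_microcode(ucode_table[*pc], pc + 1, regs);
        pc += 4;
    }
}
```

With `regs = { 0, 6, 7, 0 }`, the program `{ HL_ADD, 0, 1, 2, HL_MUL, 3, 0, 1, HL_END }` computes 13 into `regs[0]` and 78 into `regs[3]`. Adding a high-level op means writing its microcode, not touching either loop, which is the property the analogy is after.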

I then produced a counterpoint: the only reason the L1 dispatcher needs to 
call into C (well, aside from NCI) is the existence of non-L1-based ops and 
PMCs, an _inherently_ temporary condition. (I also produced a stupidly 
optimistic estimate, but oh well.)

<eternaleye> bacek: But if the stage where we have both types of ops is 
temporary, the speed loss is also temporary
<eternaleye> Since later on, we won't _need_ to switch control back and forth
<bacek> eternaleye: of course... But it's still slowdown.
<eternaleye> bacek: Premature optimization is etc. etc.
<eternaleye> If it's possible to do it in ~1 month, then the slowdown won't 
even be in a release
<bacek> eternaleye: oh... 1 month... You are way too optimistic...
<cotto> eternaleye, the expected plan is to do s/C/L1/ for a bunch of code, 
only going to the next step once all {ops|pmcs} are converted.

Then an idea hit me: if runcores are pluggable, why not implement what I 
described as a new runcore? Then nobody who wasn't already following the 
L1 development process would encounter it (or any slowness therein).
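
The appeal of a runcore is isolation: if the interpreter picks its run loop from a table by name, an experimental L1 core is only reached when someone explicitly asks for it, as with eternaleye's `-R l1core`. A minimal sketch, assuming a simple name-to-function registry; `RunCore`, `select_runcore`, the `"fast"` default and the placeholder core functions are hypothetical and are not Parrot's actual runcore API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical run loop signature; returns an id so callers can tell cores apart. */
typedef int (*RunCoreFn)(void);

static int run_fast(void)   { return 1; }   /* placeholder for the existing C core */
static int run_l1core(void) { return 2; }   /* placeholder for the experimental L1 core */

typedef struct {
    const char *name;
    RunCoreFn run;
} RunCore;

/* The first entry is the default core, untouched by the L1 work. */
static const RunCore runcores[] = {
    { "fast",   run_fast },
    { "l1core", run_l1core },
};

/* Look up a core by name; NULL or unknown names fall back to the default. */
static RunCoreFn select_runcore(const char *name) {
    if (name != NULL) {
        for (size_t i = 0; i < sizeof runcores / sizeof runcores[0]; i++) {
            if (strcmp(runcores[i].name, name) == 0)
                return runcores[i].run;
        }
    }
    return runcores[0].run;
}
```

`select_runcore(NULL)` and unknown names return the default core, so users who never pass `l1core` never run the experimental code.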

<eternaleye> What if we implement it as a runcore? That allows doing 
everything except actually translating PMCs/ops without anything changing 
unless you use -R l1core
<cotto> during that transition time, L1 will essentially be a different 
C-like language
<bacek> cotto: no-no-no. Some HighLevelLanguageWhichEasyToCompileToL1AndC
<eternaleye> Then, right after a release, we can switch to l1core and 
immediately translate ops/etc. If the majority (or most-used) get 
translated, the frequency of calling between C and L1 is minimized
<eternaleye> Thus, little performance lost
<cotto> bacek, In general I mean "anything that compiles to L1" by "L1"
<bacek> cotto: ok :)
<cotto> I need a word for that.  that's the second time that's caused 
confusion.
<cotto> L1-capable?
<eternaleye> L1-directed?
<cotto> That works.
<cotto> It's so many more letters than "L1", though. :(
<eternaleye> So call it L1-t for L1-targeting
<bacek> actually, for this "language" we need only byte munging and "if"
<bacek> Level One Language :)
<cotto> eternaleye, there's actually a plan to implement L1 opcodes as 
dynops.
<eternaleye> But honestly, if the option is to temporarily give up some 
speed, in order to permanently improve the architecture...
<cotto> It'd be very slow, but it'd let us see them in action.
<cotto> eternaleye, you're saying that while we're switching ops to L1
<cotto> (which is emitting C), we should also work on making those ops 
directly runnable?
<bacek> 1. Implement some very-very tiny language which can emit L1 bytecode 
and C
<bacek> 2. Implement opsc which can emit L1 bytecode and C stubs with 
PCCINVOKE
<bacek> 3. Patch imcc to emit L1 bytecode for L1 reimplemented ops

I next tried to make sure I had an accurate picture of the end goal, since 
my next point depended on it. Then I laid out the barest beginnings of a 
rough idea for a migration path with little, if any, disruption.

<eternaleye> cotto: IIUC, the plan is not to compile L1 to C, but to compile 
everything to L1 and make L1 the bytecode language of the virtual machine
<cotto> long-term, that's pretty accurate
<cotto> but L1 -> C is a short-term way to keep Parrot working while only a 
subset of the ops have been rewritten
<eternaleye> cotto: Then why go about it backwards? If the plan is to VM L1, 
then why compile L1 to c and run that? Why not VM the L1, and call into what 
residual C is needed? It puts us on a direct path (rewrite each C op and you 
gain more speed since it's all L1 now) to the end goal (which would be 
achieved immediately when the last op is translated)
<bacek> eternaleye: 4 cores stay on this path

At this point, I realized I should probably know more about how ops are 
built than I had been assuming so far. The exchange that followed points to 
a potentially workable solution.

<eternaleye> As is, the ops need to be implemented for each runcore, right?
<cotto> eternaleye, not currently.
<cotto> They're in src/ops/foo.ops, which ops2c mangles into the various 
runcores
<eternaleye> Ah
<bacek> eternaleye: no, Ops2c will generate everything required.
<eternaleye> Are the ops pretty much static these days?
<cotto> eternaleye, mostly yes.
<eternaleye> Then why not make an L1core, that prefers L1ops and can call 
into Cops, and rewrite the ops into L1 _for_that_core_? Then, when they're 
all written (or enough that speed is no longer a problem), make L1core the 
default
<eternaleye> If the ops were still frequently changing it'd be infeasible, 
but as they're mostly static...
<bacek> but...
<bacek> but...
<bacek> Wow
<bacek> It's very good point!
<bacek> Let's steal C generation for current ops from Ops2c
<bacek> L1 still able to call C function directly
<cotto> eternaleye, I think you may have a good idea.  (I need to process it 
more fully and I'm sleepy.)  We should probably put an L1 roadmap on the 
wiki so these kinds of suggestions can be added.
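
The L1core idea inverts the stub approach: the L1 interpreter owns the run loop and prefers an op's L1 body, calling into the residual C op only when no L1 version exists yet. A minimal sketch under that assumption; `OpEntry`, `L1Body`, `l1core_run` and `c_calls` are hypothetical, and the "L1 bodies" are inlined `switch` cases standing in for real L1 bytecode. The `c_calls` counter shows the number of C crossings falling as ops are translated, which is eternaleye's argument that the mixed-mode slowdown shrinks and eventually disappears.

```c
#include <stddef.h>

/* Residual C op: operates on registers given three register indices. */
typedef void (*COp)(long *regs, const int *args);

/* Which L1 body an op has been translated to; L1_NONE means "still C only". */
typedef enum { L1_NONE, L1_ADD, L1_SUB } L1Body;

typedef struct {
    L1Body l1;   /* preferred implementation */
    COp    c;    /* fallback, used until the op is translated */
} OpEntry;

static void c_add(long *regs, const int *args) { regs[args[0]] = regs[args[1]] + regs[args[2]]; }
static void c_sub(long *regs, const int *args) { regs[args[0]] = regs[args[1]] - regs[args[2]]; }

enum { OP_END, OP_ADD, OP_SUB };

/* Counts calls that leave the L1 loop for C. */
static int c_calls = 0;

/* The L1 core is the run loop; C ops are the exception it calls into. */
static void l1core_run(const OpEntry *ops, const int *pc, long *regs) {
    while (*pc != OP_END) {
        const OpEntry *e = &ops[*pc];
        const int *args = pc + 1;
        switch (e->l1) {
        case L1_ADD: regs[args[0]] = regs[args[1]] + regs[args[2]]; break;  /* translated */
        case L1_SUB: regs[args[0]] = regs[args[1]] - regs[args[2]]; break;  /* translated */
        case L1_NONE:
            c_calls++;
            e->c(regs, args);
            break;
        }
        pc += 4;
    }
}
```

With both ops still C-only, one run of `{ OP_ADD, 0, 1, 2, OP_SUB, 3, 0, 1, OP_END }` makes two C calls; after setting `ops[OP_ADD].l1 = L1_ADD`, the same program makes one. Translating the last op would bring the count to zero, at which point the C fallback table could go away.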

cotto then suggested that I post a message to the list detailing the results 
of this conversation.

Questions? Answers? Gigantic black beasts of Aaaaaaaarggghhhh....


_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
