WRT to PIL and compilation and all that, I think it's time to think about how the linker might look.
As I see it, the compilation chain, with the user typing this at the prompt:

    perl6 foo.pl

perl6 is a compiled perl 6 script that takes an input file, compiles it, and then passes the compiled unit to the default runtime (e.g. parrot).

perl6 creates a new instance of the perl compiler (presumably an object). The compiler will only compile the actual file 'foo.pl', and disregard any 'require', 'use', or 'eval' statements. The compiler produces an object representing a linkable unit, which will be discussed later.

At this point the runtime kicks in. The runtime really runs compiled byte code for the runtime linker, which takes the compiled unit that the compiler emitted and prepares it for execution.

The runtime linker checks whether any inclusions of outside code have been made, and if so, invokes a search routine with the foreign module plugin responsible. For example, 'use python:Numerical' will use the python module plugin to produce a linkable unit. A given module plugin will normally traverse a search path, find some source code, check whether there is a valid cached version of that source code, and if needed, recompile the source code into another linkable unit.

As the linker gets new linkable units it checks whether they have any dependencies of their own, and eventually resolves all the data and code that modules take from one another. The resulting mess has only one requirement: that it can be run by the runtime - that is, byte code can be extracted out of it.

If the modules expose more than just byte code with resolved dependencies - for example type annotations, serialized PIL, serialized perl 6 code, and so forth - the linker may, at this point, do any amount of static analysis as it pleases: recompiling, relinking, optimizing, inlining, performing early resolution (especially of MMD), and otherwise modifying code (provided it was asked to do this by the user). The optimization premise is: by the time it's linked it probably won't change too much.
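To make the link step above concrete, here is a rough sketch (in Python, since the linker itself could be written in anything; all names here are hypothetical, not a proposed API) of the runtime linker's resolution loop - take the unit the compiler emitted, and keep pulling in linkable units for outside inclusions until every dependency is resolved:

```python
# Hypothetical sketch of the runtime linker's resolution loop.

class LinkableUnit:
    def __init__(self, name, dependencies, bytecode):
        self.name = name
        self.dependencies = dependencies  # names of required units
        self.bytecode = bytecode

def load_unit(name):
    # Stand-in for the search routine / foreign module plugins; a real
    # linker would traverse a search path, check caches, recompile, etc.
    return LinkableUnit(name, [], "<bytecode for %s>" % name)

def link(root):
    """Resolve all dependencies, returning every reachable unit by name."""
    resolved = {root.name: root}
    pending = list(root.dependencies)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue
        unit = load_unit(name)
        resolved[unit.name] = unit
        # new units may have dependencies of their own
        pending.extend(unit.dependencies)
    return resolved

foo = LinkableUnit("foo.pl", ["python:Numerical"], "<bytecode for foo.pl>")
units = link(foo)
print(sorted(units))  # ['foo.pl', 'python:Numerical']
```

The interesting part is that load_unit is pluggable - that is where 'use python:Numerical' would hand off to the python module plugin instead of the perl 6 one.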
Link time is a magic time for resolving calls, inlining values, folding newly discovered constants, and so forth. Furthermore, a linker may cache the link between two modules, regardless of the calling script, so that optimization does not have to be repeated. The result is still the same: code that can be executed by the runtime. It just might be more efficient. The linker must also always be able to get the original version of the linked byte code back, either by reversing some changes, or by keeping the original.

At this point the runtime's runloop kicks in, starting at the start point in the byte code, and doing its thing.

Runtime loading of code (e.g. eval 'sub foo { }') simply reiterates the above procedure: 'sub foo { }' is compiled by the compiler, creating a linkable unit (that can give 'sub foo { }' to the world). The runtime linker gets a fault saying "byte code state is changing, $compiled_code_from_eval is being amended to $linked_byte_code_in_runtime_loop". The linker must then link the running code to the result of eval. To do this it may need to undo optimizations of its own that assumed there was no sub foo. For example, if there is a call to 'foo()' somewhere in foo.pl, the linker may have just inlined a 'die "no such sub foo()"' error instead of the call. Another linker may have put in code to do a runtime search for the '&foo' symbol and apply it. The linker that did a runtime search that may fail doesn't need to change anything, but the linker which inlined a fatal error must undo that optimization now that things have changed.

The behavior of the linker WRT such things depends on the deployment setting. In a long running mod_perl application there may even be a linker that optimizes code as time goes by, slowly changing things to be more and more static. As the process progresses through time, the probability of new code being introduced is lower, so the CPU time is invested better.
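The undoable-optimization idea can be sketched like so (again in Python, all names invented for illustration): the linker inlines a fatal error for an unresolvable call, but keeps the original instruction in an undo log so it can restore it when eval later defines the sub:

```python
# Hypothetical sketch: a linker whose optimizations are reversible.

class Linker:
    def __init__(self, code, symbols):
        self.code = code          # list of instructions (the "byte code")
        self.symbols = symbols    # set of known sub names
        self.undo_log = {}        # (call site, name) -> original instruction

    def optimize_call(self, site, name):
        if name not in self.symbols:
            # optimize: inline a fatal error instead of the unresolvable
            # call, remembering what used to be there
            self.undo_log[(site, name)] = self.code[site]
            self.code[site] = 'die "no such sub %s()"' % name

    def define(self, name):
        # eval introduced a new sub: byte code state is changing, so undo
        # every optimization that assumed this sub was missing
        self.symbols.add(name)
        for (site, sym) in list(self.undo_log):
            if sym == name:
                self.code[site] = self.undo_log.pop((site, sym))

code = ['call foo']
linker = Linker(code, symbols=set())
linker.optimize_call(0, "foo")
print(code[0])        # die "no such sub foo()"
linker.define("foo")  # eval 'sub foo { }' happened
print(code[0])        # call foo
```

A linker that instead emitted a runtime symbol search at the call site would have an empty undo log here - which is exactly the trade-off between the two linkers described above.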
Furthermore, latency is not hindered, and startup is fast, because the linker doesn't do any optimizations in the beginning. This is part of the proposed optimizer chain, as brought up on p6l a month or so ago.

Anyway, back to runtime linking. Once the code is consistent again, e.g. calls to foo() will now work as expected, eval gets the compiled code, and runs it. It just happens that 'sub foo { }' has no runtime effects, so eval returns, and normal execution is resumed.

To get the semantics of 'perl -c' you force the linker to resolve everything, but don't actually go to the runloop.

Linkable units are first class objects, and may be of different classes. This has merits when, for example, a linkable unit is implemented by an FFI wrapper. The FFI wrapper determines at link time what the foreign interface looks like, and then behaves like the linkable unit one might expect if it were a native interface. It can generate bytecode to call the foreign functions on demand at link time. This should simplify the link process.

Linkable units have an implementation class that determines their behavior for producing byte code. Linkable unit classes do any number of roles the implementor chose to add. The linker searches for roles it is interested in. For example, LinkableUnit::WithPIL is the interface that linkable units that have PIL code expose.

Furthermore, different types of bytecode formats are also roles. For example, here is a wrapper linkable unit that exposes PBC for various versions of parrot PIR linkable units:

    class LinkableUnit::PBC {
        has LinkableUnit $.linkable_unit handles <*>;

        submethod BUILD ($.linkable_unit) {
            die "i wrap around PIR linkables"
                unless $.linkable_unit ~~ LinkableUnit::Emits::PIR;

            given $.linkable_unit.pir_version {
                when ... {
                    # use correct version of parrot to compile PIR
                    # to bytecode
                }
            }
        }
    }

    # when the Perl6::Compiler emits PIR
    LinkableUnit::PBC.new(
        :linkable_unit(Perl6::Compiler.new(:string<sub foo { }>))
    );

This approach should encourage each runtime to have a tightly coupled linker that looks for a specific bytecode role, but allow this linker to share the maximum amount of code with a linker for another system.

Furthermore, link level translations of interfaces with compatible bytecode underneath - for example runtime loading of x86/ELF vs. x86/Mach-O - can be implemented on top of the same runtime engine for x86 machine code, with possible reuse for the two different binary formats. Furthermore, ELF and Mach-O for other archs can be reused. The way this is done is that the link format LinkableUnit classes expose a consistent symbol interface, and the x86<->native runtime translation link layer wraps over that. Either of the two (the native code translator and the link format reader) can be exchanged.

Some possible linkable unit roles:

* Embedded source (knows to map bytecode back to source code)
* Source reference annotations (line numbers and such, but no complete code)
* Emit::PIL (PIL version of the code can be extracted)
* Emit::PBC (parrot byte code version of the code can be extracted)
* Emit::...
* Rich type/value annotations (full partially resolved type/value inferencing trees for symbols, including constantness, return values, and so forth)

PHEW, THAT WAS LONG. Sorry!

-- 
 ()  Yuval Kogman <[EMAIL PROTECTED]> 0xEBD27418  perl hacker &
 /\  kung foo master: /me sushi-spin-kicks : neeyah!!!!!!!!!!!!!!!!!!!!