vm status update

2009-01-11 Thread Andy Wingo
Hey hackers,

I just finished up a lot of typing at the manual, and I hope I'm done
with that. The net result is that the VM is documented quite thoroughly,
and the compiler as well. I'll send those documents to the list in
separate mails for inline comments.

Otherwise, in the course of documentation, I've made a few minor
cleanups, some internal name changes and such. No sense polishing a
turd, they say.

I had an idea regarding unit tests recently: since GHIL and GLIL now
have (documented!) S-expression representations, we should be able to
easily and expressively test individual compiler passes. Looking forward
to that.

I also had another realization, that now that VM frames go into stack
structures, that statprof should work with the VM. Have yet to check
though.

Anyway, just some babblings. I'll probably switch to benchmarking and
profiling sometime soon. I also need to merge in master to vm, it's been
a while and there are probably some conflicts.

So that's my status. Happy hacking!

Andy
-- 
http://wingolog.org/




vm.texi: A virtual machine for guile

2009-01-11 Thread Andy Wingo
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=doc/ref/vm.texi;hb=refs/heads/vm

@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  2008,2009
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node A Virtual Machine for Guile
@section A Virtual Machine for Guile

Guile has both an interpreter and a compiler. To a user, the
difference is largely transparent -- interpreted and compiled
procedures can call each other as they please.

The difference is that the compiler creates and interprets bytecode
for a custom virtual machine, instead of interpreting the
S-expressions directly. Running compiled code is faster than running
interpreted code.

The virtual machine that does the bytecode interpretation is a part of
Guile itself. This section describes the nature of Guile's virtual
machine.

@menu
* Why a VM?::   
* VM Concepts:: 
* Stack Layout::
* Variables and the VM::   
* VM Programs:: 
* Instruction Set::
@end menu

@node Why a VM?
@subsection Why a VM?

For a long time, Guile only had an interpreter, called the evaluator.
Guile's evaluator operates directly on the S-expression representation
of Scheme source code.

But while the evaluator is highly optimized and hand-tuned, and
contains some extensive speed trickery (@pxref{Memoization}), it still
performs many needless computations during the course of evaluating an
expression. For example, application of a function to arguments
needlessly conses up the arguments in a list. Evaluation of an
expression always has to figure out what the car of the expression is
-- a procedure, a memoized form, or something else. All values have to
be allocated on the heap. Et cetera.

The solution to this problem is to compile the higher-level language,
Scheme, into a lower-level language for which all of the checks and
dispatching have already been done -- the code is instead stripped to
the bare minimum needed to ``do the job''.

The question becomes then, what low-level language to choose? There
are many options. We could compile to native code directly, but that
poses portability problems for Guile, as it is a highly cross-platform
project.

So we want the performance gains that compilation provides, but we
also want to maintain the portability benefits of a single code path.
The obvious solution is to compile to a virtual machine that is
present on all Guile installations.

The easiest (and most fun) way to depend on a virtual machine is to
implement the virtual machine within Guile itself. This way the
virtual machine provides what Scheme needs (tail calls, multiple
values, call/cc) and can provide optimized inline instructions for
Guile (cons, struct-ref, etc.).

So this is what Guile does. The rest of this section describes that VM
that Guile implements, and the compiled procedures that run on it.

Note that this decision to implement a bytecode compiler does not
preclude native compilation. We can compile from bytecode to native
code at runtime, or even do ahead of time compilation. More
possibilities are discussed in @xref{Extending the Compiler}.

@node VM Concepts
@subsection VM Concepts

A virtual machine (VM) is a Scheme object. Users may create virtual
machines using the standard procedures described later in this manual,
but that is usually unnecessary, as Guile ensures that there is one
virtual machine per thread. When a VM-compiled procedure is run, Guile
looks up the virtual machine for the current thread and executes the
procedure using that VM.

Guile's virtual machine is a stack machine -- that is, it has few
registers, and the instructions defined in the VM operate by pushing
and popping values from a stack.

Stack memory is exclusive to the virtual machine that owns it. In
addition to their stacks, virtual machines also have access to the
global memory (modules, global bindings, etc) that is shared among
other parts of Guile, including other VMs.

A VM has generic instructions, such as those to reference local
variables, and instructions designed to support Guile's languages --
mathematical instructions that support the entire numerical tower, an
inlined implementation of @code{cons}, etc.

The registers that a VM has are as follows:

@itemize
@item ip - Instruction pointer
@item sp - Stack pointer
@item fp - Frame pointer
@end itemize

In other architectures, the instruction pointer is sometimes called
the ``program counter'' (pc). This set of registers is pretty typical
for stack machines; their exact meanings in the context of Guile's VM
is described in the next section.

A virtual machine executes by loading a compiled procedure, and
executing the object code associated with that procedure. Of course,
that procedure may call other procedures, tail-call others, ad
infinitum -- indeed, within a guile whose modules have all been
compiled to object code, one might never leave the virtual machine.

compiler.texi: Compiling to the virtual machine

2009-01-11 Thread Andy Wingo
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=doc/ref/compiler.texi;hb=refs/heads/vm

@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  2008
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein's monster. However, this magic is perceived
by many to be impenetrable.

This section aims to pay attention to the small man behind the
curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your .scm file.

@menu
* Compiler Tower::   
* The Scheme Compiler::   
* GHIL:: 
* GLIL::
* Object Code::   
* Extending the Compiler::
@end menu

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module, @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title version reader printer @
[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
Define a language.

This syntax defines a @code{#language} object, bound to @var{name}
in the current environment. In addition, the language will be added to
the global language set. For example, this is the language definition
for Scheme:

@example
(define-language scheme
  #:title   Guile Scheme
  #:version 0.5
  #:reader  read
  #:read-file   read-file
  #:compilers   `((,ghil . ,compile-ghil))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer write)
@end example

In this example, from @code{(language scheme spec)}, @code{read-file}
reads expressions from a port and wraps them in a @code{begin} block.
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
$ guile
Guile Scheme interpreter 0.5 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
scheme@@(guile-user) ,language ghil
Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
ghil@@(guile-user) 
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

The language object will be returned, or @code{#f} if there does not
exist a language with that name.
@end deffn

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and return the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
@end deffn

There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The runtime compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme, which we know and love
@item Guile High Intermediate Language (GHIL)
@item Guile Low Intermediate Language (GLIL)
@item Object code
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at
runtime, you want a Scheme value, e.g. a compiled procedure. For this
reason, so as not to break the abstraction, Guile defines a fake
language, @code{value}. Compiling to @code{value} loads the 

someone please implement a lua language

2009-01-11 Thread Andy Wingo
Hello.

Lua gets a fair amount of press, and is fine in its way. People like it
for the same reason that people liked Tcl: Lua is simple, embeddable,
and has the mainstream, Algol-like syntax. Also, it has a reasonably
fast implementation.

That's cool! It would be interesting to enhance Lua with the rich
runtime of Guile -- all of POSIX, pthreads, and all of Guile's excellent
libraries.

Lua is *really* simple. See http://www.lua.org/manual/5.1/manual.html#8.
Does someone want to write a simple Lua parser (ideally finding an EBNF
parser first) and compile that to GHIL? That would be a great project.

Andy
-- 
http://wingolog.org/