Register VM WIP

2012-05-11 Thread Andy Wingo
Hi all,

This mail announces some very early work on a register VM.  The code is
in wip-rtl (work in progress, register transfer language.  The latter
bit is something of a misnomer.).  There is not much there yet:
basically just the VM, an assembler, and a disassembler.  Still, it's
interesting, and I thought people might want to hear more about it.

So, the deal: why is it interesting to switch from a stack VM, which is
what we have, to a register VM?  There are three overriding
disadvantages to the current VM.

  1) With our stack VM, instructions are smaller.  They do less, so you
 need more of them.  This increases dispatch cost, which is the
 largest cost of a VM.

  2) On a stack VM, there is a penalty to naming values.  Since the only
 values that are accessible to an instruction are the ones on the
 top of the stack, whenever you want to use more names, you end up
 doing a lot of local-ref / local-set operations.  In contrast an
 instruction for a register VM can address many more operands, so
 there is much less penalty to storing something on the stack.  (The
 penalty is not so much in the storage, but in the pushing and
 popping needed to access it.)

  3) Our stack VM has variable-sized stack frames, so we need to check
 for overflow every time we push a value on the stack.  This is
 quite costly.

The WIP register VM fixes all of these issues.

The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
variable-length instructions, maximum of 24-bit register addressing, and
static, relocatable allocation of constants.

Also, with the wip-rtl VM there is no stack pointer: locals are
addressed directly via the frame pointer, and the call frame for a
function is of a fixed size.  Indeed the IP and FP are the only state
variables of the VM, which makes it much easier to think about native
compilation, given the scarcity of CPU registers on some architectures.

See vm-engine.c from around line 1000 for a commented set of
instructions.  It's messy in many ways now, but hey.

As far as performance goes, we won't know yet.  But at least for a
simple loop, counting down from a billion, the register VM is a few
times faster than the stack VM.  Still, I would be happy if the general
speedup were on the order of 40%.  We'll see.

Here's that loop in rtl VM:

   (use-modules (system vm rtl))

   (assemble-rtl-program
 0
 '((assert-nargs-ee/locals 1 2)
   (br fix-body)
   loop-head
   (make-short-immediate 2 0)
   (br-if-= 1 2 out)
   (sub1 1 1)
   (br loop-head)
   fix-body
   (mov 1 0)
   (br loop-head)
   out
   (make-short-immediate 0 #t)
   (return 0)))

There are various ways to improve this, but its structure is like what
the stack VM produces.

Compare to the current opcode:

scheme@(guile-user) (define (countdown n) (let lp ((n n)) (or (zero? n) 
(lp (1- n)
scheme@(guile-user) ,x countdown
Disassembly of #procedure countdown (n):

   0(assert-nargs-ee/locals 17) ;; 1 arg, 2 locals
   2(br :L186)  ;; - 30
   6(local-ref 1)   ;; `n'
   8(make-int8:0)   ;; 0
   9(ee?)   
  10(local-set 2)   ;; `t'
  12(local-ref 2)   ;; `t'
  14(br-if-not :L187)   ;; - 21
  18(local-ref 2)   ;; `t'
  20(return)
  21(local-ref 1)   ;; `n'
  23(sub1)  
  24(local-set 1)   ;; `n'
  26(br :L188)  ;; - 6
  30(local-ref 0)   ;; `n'
  32(local-set 1)   
  34(br :L188)  ;; - 6

OK, time to set down the keyboard; been working far too much on this in
recent days.  I still need to adapt the compiler to produce RTL
bytecode.  I am going to let it sit for a week or two before touching it
again.  Comments welcome.

Regards,

Andy
-- 
http://wingolog.org/



Wish List

2012-05-11 Thread Sjoerd van Leent
Hi all,

As I have been playing around with the sources of Guile, I looked at
the bug section on Savannah. I couldn't see much bugs and even less
bugs which are not pending already. Can anyone hint to some things
that are easy enough to pick up so I can start contributing something
sensible?

Regards,
Sjoerd



Re: Register VM WIP

2012-05-11 Thread Stefan Israelsson Tampe
Hi,

This looks very good. i like the hole approach and this approach has the
potential to address most of the issues I have seen when disassembling
guile-2.0 output. A few notes.

1. What about growing stacks any coments if they will be easier to manage
for this setup. Can one copy the C stack logic?

2. Is there an instruction that does what call does but can be used for
tail call's
when it needs it e.g. the code
 for (n = 0; n  nargs; n++)
LOCAL_SET (n, old_fp[ip[4 + n]]);
that is missing for the tail code

3. I would appriciate if the frame is always below say 256 SCM:s of the fp
stack limit
that way when preparing tail calling one doesn't usally need to check if
the argument fit's
when issuing a tail call. If you compile a function that tail call more
then 254 (?) arguments
then you can as well check because then be free relative the argument
handling.

4. I think the logic code hook I recently investigated could easily fit
into this VM engine with
using similar techniques as I described in previous mails.

Thanks for your work on this
Stefan


On Fri, May 11, 2012 at 6:19 PM, Andy Wingo wi...@pobox.com wrote:

 Hi all,

 This mail announces some very early work on a register VM.  The code is
 in wip-rtl (work in progress, register transfer language.  The latter
 bit is something of a misnomer.).  There is not much there yet:
 basically just the VM, an assembler, and a disassembler.  Still, it's
 interesting, and I thought people might want to hear more about it.

 So, the deal: why is it interesting to switch from a stack VM, which is
 what we have, to a register VM?  There are three overriding
 disadvantages to the current VM.

  1) With our stack VM, instructions are smaller.  They do less, so you
 need more of them.  This increases dispatch cost, which is the
 largest cost of a VM.

  2) On a stack VM, there is a penalty to naming values.  Since the only
 values that are accessible to an instruction are the ones on the
 top of the stack, whenever you want to use more names, you end up
 doing a lot of local-ref / local-set operations.  In contrast an
 instruction for a register VM can address many more operands, so
 there is much less penalty to storing something on the stack.  (The
 penalty is not so much in the storage, but in the pushing and
 popping needed to access it.)

  3) Our stack VM has variable-sized stack frames, so we need to check
 for overflow every time we push a value on the stack.  This is
 quite costly.

 The WIP register VM fixes all of these issues.

 The basic design of the VM is: 32-bit instruction words, 8-bit opcodes,
 variable-length instructions, maximum of 24-bit register addressing, and
 static, relocatable allocation of constants.

 Also, with the wip-rtl VM there is no stack pointer: locals are
 addressed directly via the frame pointer, and the call frame for a
 function is of a fixed size.  Indeed the IP and FP are the only state
 variables of the VM, which makes it much easier to think about native
 compilation, given the scarcity of CPU registers on some architectures.

 See vm-engine.c from around line 1000 for a commented set of
 instructions.  It's messy in many ways now, but hey.

 As far as performance goes, we won't know yet.  But at least for a
 simple loop, counting down from a billion, the register VM is a few
 times faster than the stack VM.  Still, I would be happy if the general
 speedup were on the order of 40%.  We'll see.

 Here's that loop in rtl VM:

   (use-modules (system vm rtl))

   (assemble-rtl-program
 0
 '((assert-nargs-ee/locals 1 2)
   (br fix-body)
   loop-head
   (make-short-immediate 2 0)
   (br-if-= 1 2 out)
   (sub1 1 1)
   (br loop-head)
   fix-body
   (mov 1 0)
   (br loop-head)
   out
   (make-short-immediate 0 #t)
   (return 0)))

 There are various ways to improve this, but its structure is like what
 the stack VM produces.

 Compare to the current opcode:

scheme@(guile-user) (define (countdown n) (let lp ((n n)) (or (zero?
 n) (lp (1- n)
scheme@(guile-user) ,x countdown
Disassembly of #procedure countdown (n):

   0(assert-nargs-ee/locals 17) ;; 1 arg, 2 locals
   2(br :L186)  ;; - 30
   6(local-ref 1)   ;; `n'
   8(make-int8:0)   ;; 0
   9(ee?)
  10(local-set 2)   ;; `t'
  12(local-ref 2)   ;; `t'
  14(br-if-not :L187)   ;; - 21
  18(local-ref 2)   ;; `t'
  20(return)
  21(local-ref 1)   ;; `n'
  23(sub1)
  24(local-set 1)   ;; `n'
  26(br :L188)  ;; - 6
  30(local-ref 0)   ;; `n'
  32(local-set 1)
  34(br :L188)  ;; - 6

 OK, time to set down the keyboard; been 

Re: Wish List

2012-05-11 Thread Ian Price
Sjoerd van Leent svanle...@gmail.com writes:

 As I have been playing around with the sources of Guile, I looked at
 the bug section on Savannah. I couldn't see much bugs and even less
 bugs which are not pending already.

The savannah bug tracker isn't really used anymore, instead there is
http://debbugs.gnu.org/cgi/pkgreport.cgi?pkg=guile.

-- 
Ian Price

Programming is like pinball. The reward for doing it well is
the opportunity to do it again - from The Wizardy Compiled