vm ops and threading discussion for guile-log

Stefan Israelsson Tampe Tue, 24 Apr 2012 13:09:24 -0700

Hi,

i hope that I do not bore you with continuing talking about guile-log.


But one of the things brought up by Mark, and rightly so, are that having
the fundaments
of logic programming based on a stack compared to assq lists or similar more
scalable functional data structures makes it complicated to take advantage
of multiple
cores and threads.

To answer this I have now enabled the possibility to use multiple stacks by
blending the two
techniques together. The reason to take such a complicated approach are
threefold.

1. Stack based techniques are faster and quite much so if we can take
advantage of special
vm operation or are able to compile the code. I find that assq based one
will saturate and we
would for the uniprocessor case get a difference around 10 in speed.

2. Stack based approaches can take advantage of the fact that it is a stack
and implement
variables that behaves like global variables but also like stack variables
and can be reconstruted
from a stored state. Now these data-structures look nonfunctional but they
have similarities
to variables living on the stack e.g. in order for reentrance to work they
need to reference functional
datastructures. Anyway I find that with them one can formulate many problem
in a natural way
which a functional approach cannot mach hence make sure that we have an
expressive framework
that preserves a functional behavior e.g. running stored code that does not
reference global variables always return
the same and also we are able to put all code in tail position and enable
user space code to yield a return
value simply by returning a value in the user space function.

3. We are able to trace rebuilding unwinding, this is nice with respect to
debugging but also is a algorithmic
enabler because you can design functionality taking advantage of this fact.

Well there are some drawbacks.
1. Things are a little more complicated.
2. There might be some overhead in some situation, althogh selecting the
right mix of
a stack based approach and assq list based one will lead to good
performance in most cases.
-----------------------------------------------------------------------------------------------------------------------------------------

How the simple multithreading works:
Each thread has a stack of their own and each created variable has an id
indicating to which stack they belong.
Now if a thread tries to set a variable from another thread then it will do
this by consing on an assq list in stead
of modifying the other stack and simply mimic kanrens strategy.
------------------------------------------------------------------------------------------------

VM OPS.
--------------
Here one can work on different levels. But I choose to look into just
reducing the overhead for function call's
out into C land. You might write a direct op code that calls that function,
but this means a lot of VM op's.
And I therefore settled in a smaller set of op-codes that used function
pointers stored in a array reachable
form the VM. And then used op-codes like

(fast-call-0 index)
(fast-call-1 index a1)
(fast-call-2 index a2)

Then when we load the c functions like gp-car form the shared library I add
a stub like,

(fast-call-set! gp-car 2 5)
(define-inlinable (gp-car x s)
  (fast-call-2 5 x s))

The first is a vm-op that transferes gp-car function pointer to the
function pointer array at arguments 2,  index 5.
and then I overwrite the gp-car! with a suitable inline funciton.

With this techniqe I can increase the speed of the stack based logic code
by a factor 2-3. The assq based one is only marginally
faster.

I think I could win a factor of 2 in the number of vm ops if I could design
into this framework the possibility to return several
variables. e.g.

Consider destructuring a cons cell in a car and a cdr. Currently this is
done like,

(let ((s (gp-pair!? x s)))
  (if s
    (let ((x-car  (gp-car x s))
          (x-cdr  (gp-cdr x s)))
       ...

I would like to be able to write

(let-values (((s x-car x-cdr) (gp-pair-fast!? x s)))
   (if s ...))

And the op-codes would align quite nicely and reduce the op count.

There is a further improvements that could be made but then everything is
special and special vm op-codes would be needed and that bloat that should
be handled
by implementing A JIT engine or naitive compilation in stead.

My conclusion is that it's possible to implement a tool in guile that would
make it quite
easy to improve the speed of vm based guile by a factor of 2-5 by using C
based elements
without the need to be a VM wizard and it would be nice to have something
along the lines
discussed above but with better integration into guile.

I did also redesign the code so currently I'm at 80ms (for stack based
guile-log) for the einstein case that basically tests
the speed fo destructuring a pair in it's elements.

Cheers
Stefan

vm ops and threading discussion for guile-log

Reply via email to