Hi, i hope that I do not bore you with continuing talking about guile-log.
But one of the things brought up by Mark, and rightly so, are that having the fundaments of logic programming based on a stack compared to assq lists or similar more scalable functional data structures makes it complicated to take advantage of multiple cores and threads. To answer this I have now enabled the possibility to use multiple stacks by blending the two techniques together. The reason to take such a complicated approach are threefold. 1. Stack based techniques are faster and quite much so if we can take advantage of special vm operation or are able to compile the code. I find that assq based one will saturate and we would for the uniprocessor case get a difference around 10 in speed. 2. Stack based approaches can take advantage of the fact that it is a stack and implement variables that behaves like global variables but also like stack variables and can be reconstruted from a stored state. Now these data-structures look nonfunctional but they have similarities to variables living on the stack e.g. in order for reentrance to work they need to reference functional datastructures. Anyway I find that with them one can formulate many problem in a natural way which a functional approach cannot mach hence make sure that we have an expressive framework that preserves a functional behavior e.g. running stored code that does not reference global variables always return the same and also we are able to put all code in tail position and enable user space code to yield a return value simply by returning a value in the user space function. 3. We are able to trace rebuilding unwinding, this is nice with respect to debugging but also is a algorithmic enabler because you can design functionality taking advantage of this fact. Well there are some drawbacks. 1. Things are a little more complicated. 2. There might be some overhead in some situation, althogh selecting the right mix of a stack based approach and assq list based one will lead to good performance in most cases. ----------------------------------------------------------------------------------------------------------------------------------------- How the simple multithreading works: Each thread has a stack of their own and each created variable has an id indicating to which stack they belong. Now if a thread tries to set a variable from another thread then it will do this by consing on an assq list in stead of modifying the other stack and simply mimic kanrens strategy. ------------------------------------------------------------------------------------------------ VM OPS. -------------- Here one can work on different levels. But I choose to look into just reducing the overhead for function call's out into C land. You might write a direct op code that calls that function, but this means a lot of VM op's. And I therefore settled in a smaller set of op-codes that used function pointers stored in a array reachable form the VM. And then used op-codes like (fast-call-0 index) (fast-call-1 index a1) (fast-call-2 index a2) Then when we load the c functions like gp-car form the shared library I add a stub like, (fast-call-set! gp-car 2 5) (define-inlinable (gp-car x s) (fast-call-2 5 x s)) The first is a vm-op that transferes gp-car function pointer to the function pointer array at arguments 2, index 5. and then I overwrite the gp-car! with a suitable inline funciton. With this techniqe I can increase the speed of the stack based logic code by a factor 2-3. The assq based one is only marginally faster. I think I could win a factor of 2 in the number of vm ops if I could design into this framework the possibility to return several variables. e.g. Consider destructuring a cons cell in a car and a cdr. Currently this is done like, (let ((s (gp-pair!? x s))) (if s (let ((x-car (gp-car x s)) (x-cdr (gp-cdr x s))) ... I would like to be able to write (let-values (((s x-car x-cdr) (gp-pair-fast!? x s))) (if s ...)) And the op-codes would align quite nicely and reduce the op count. There is a further improvements that could be made but then everything is special and special vm op-codes would be needed and that bloat that should be handled by implementing A JIT engine or naitive compilation in stead. My conclusion is that it's possible to implement a tool in guile that would make it quite easy to improve the speed of vm based guile by a factor of 2-5 by using C based elements without the need to be a VM wizard and it would be nice to have something along the lines discussed above but with better integration into guile. I did also redesign the code so currently I'm at 80ms (for stack based guile-log) for the einstein case that basically tests the speed fo destructuring a pair in it's elements. Cheers Stefan