Hey, I hope you don't mind my replying in digest form.
First off, I guess I should be a little clearer as to what VPython is and what it does. VPython is essentially a set of patches for CPython (it touches only three files; diff -b is about 800 lines IIRC, plus the switch statement in ceval.c's EvalFrameEx()).

The main change is moving the VM instruction implementations (in CPython, the blocks of code following each case label) into a separate file, adding Vmgen stack comments, and removing the explicit stack manipulation code. There are also some minor modifications, like renaming variables to work with Vmgen's type prefixes and adding labels to enable the generation of superinstructions. Vmgen parses the stack comments and prints some macros around the provided instruction code (incidentally, this reduced the 1500 line switch body to about 1000 lines). Interested parties should consult ceval.vmg and ceval-vm.i.

The nice thing about this is that:

a) It's fairly easy to implement different types of dispatch, simply by changing a few macros (and while I haven't done this, it shouldn't be a problem to add some switch dispatch #ifdefs for non-GCC platforms). In particular, direct threaded code leads to less horrible branch prediction than switch dispatch on many machines (exactly how pronounced this effect is depends heavily on the specific architecture).

b) Vmgen can generate superinstructions. A quick primer: a sequence of code such as LOAD_CONST LOAD_FAST BINARY_ADD will, in CPython, push some constant onto the stack, push some local onto the stack, then pop both off the stack, add them and push the result back onto the stack. Turning this into a superinstruction means inlining LOAD_CONST and LOAD_FAST, modifying them to store the values they'd otherwise push onto the stack in local variables, and adding a version of BINARY_ADD which reads its arguments from those local variables rather than the stack (this reduces dispatch time in addition to saving the pops and pushes).
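To make the superinstruction primer a bit more concrete, here is a toy sketch in Python. It is not VPython's code (that is C generated by Vmgen); the function names run() and fuse(), the tuple encoding of instructions, and the fused opcode name are all made up for illustration, and the toy ignores jump targets entirely. It only shows the idea: the fused instruction keeps both operands in local variables and costs one trip through the dispatch loop instead of three.

```python
# Toy stack VM. Opcode names are borrowed from CPython; the encoding,
# run(), fuse() and the fused opcode name are hypothetical.

def run(code, consts, fast_locals):
    """Execute (opcode, *args) tuples; return (result, dispatch_count)."""
    stack = []
    dispatches = 0
    for instr in code:
        dispatches += 1  # one trip through the dispatch loop per instruction
        op = instr[0]
        if op == "LOAD_CONST":
            stack.append(consts[instr[1]])
        elif op == "LOAD_FAST":
            stack.append(fast_locals[instr[1]])
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "LOAD_CONST_LOAD_FAST_BINARY_ADD":
            # Superinstruction: the operands live in local variables and
            # never touch the stack; only the result is pushed.
            left = consts[instr[1]]
            right = fast_locals[instr[2]]
            stack.append(left + right)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop(), dispatches


def fuse(code):
    """Replace each LOAD_CONST, LOAD_FAST, BINARY_ADD triple with one
    superinstruction. Assumes no jumps land inside a triple."""
    out = []
    i = 0
    while i < len(code):
        window = [ins[0] for ins in code[i:i + 3]]
        if window == ["LOAD_CONST", "LOAD_FAST", "BINARY_ADD"]:
            out.append(("LOAD_CONST_LOAD_FAST_BINARY_ADD",
                        code[i][1], code[i + 1][1]))
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out


plain = [("LOAD_CONST", 0), ("LOAD_FAST", 0), ("BINARY_ADD",)]
consts = [40]
fast_locals = [2]

plain_result = run(plain, consts, fast_locals)        # three dispatches
fused_result = run(fuse(plain), consts, fast_locals)  # one dispatch
```

Both runs compute the same sum; the only difference is how many times the loop dispatches and how much stack traffic there is.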
David Gregg (and friends) recently published a paper comparing stack based and register based VMs for Java and found that register based VMs were substantially faster. The main reason for this appears to be the absence of the various LOAD_ instructions in a register VM. They looked at mitigating this using superinstructions, but Java would have required (again, IIRC) about 1000 of them (which leads to substantial code growth). Since Python doesn't have multiple (typed) versions of every instruction (Java has iadd and so on), far fewer superinstructions are necessary. On my system, superinstructions account for about 10% of the 30% performance gain.

As for limitations, as the README points out, currently 2 cases in test_doctest fail, as well as 1 case each in test_hotshot, test_inspect, and test_subprocess, and most of the cases in test_trace. The reason for this is, I suspect, that I removed the line tracing code from ceval.c (I didn't want to look at it in detail, and it doesn't seem to affect anything else). I expect this would be a bit of work to fix but I don't see it as a huge problem (in fact, if you don't use settrace(?) it shouldn't affect you?).

Stack caching: a previous version of VPython supported this, but the performance gain was minimal (maybe 1-2%, though if done really well (e.g. using x as the top of stack cache), who knows, more may be possible). Also, it led to some problems with the garbage collector seeing an out-of-date stack_pointer[-1].

``Cell'' is, unfortunately, hardcoded into Vmgen. I could either patch that or run ceval-vm.i through sed or something.

Finally, to the people who pointed out that VPython (the name) is already taken: Darn! I really should have checked that! Will call it something else in the future.
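Going back to the stack caching point for a moment, here is a loose Python analogy of why a cached top of stack can confuse anything that scans the in-memory stack. This is not how the earlier VPython version did it; the CachedStack class and its flush() method are hypothetical, and the "observer" is just a copy of the list taken from outside. The point is only that the cached value is invisible in memory until it is written back.

```python
class CachedStack:
    """Operand stack that keeps its top element in a separate variable.
    Hypothetical illustration, not VPython's implementation."""

    _EMPTY = object()  # sentinel: nothing is cached right now

    def __init__(self):
        self.memory = []        # what an outside observer would scan
        self.top = self._EMPTY  # cached top of stack

    def push(self, value):
        if self.top is not self._EMPTY:
            self.memory.append(self.top)  # spill the old top to memory
        self.top = value

    def pop(self):
        if self.top is not self._EMPTY:
            value = self.top
            self.top = self._EMPTY
            return value
        return self.memory.pop()  # cache empty: fall back to memory

    def flush(self):
        """Write the cached top back so memory holds the whole stack."""
        if self.top is not self._EMPTY:
            self.memory.append(self.top)
            self.top = self._EMPTY


stack = CachedStack()
stack.push("a")
stack.push("b")
seen_before_flush = list(stack.memory)  # "b" is only in the cache
stack.flush()
seen_after_flush = list(stack.memory)   # now "b" is visible too
```

Anything that inspects memory between the second push and the flush misses the live top element, so a cache like this has to be written back before such an observer runs.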
Anyway, HTH,
-jakob