I don't know if this topic has already been beaten to death, or is otherwise not very interesting or relevant here, but alas...

it is a question, though, what the "ideal" level of abstraction (and "generality") for a VM is.


for example, "LLVM" is fairly low level (using a statically-typed SSA-form as an IR, and IIRC a partially-decomposed type-system). the JVM is a little higher level, being a statically-typed stack machine (using "primitive" types for stack elements and operations), with an abstracted notion of in-memory class layout; MSIL/CIL is a little higher still, abstracting the types out of the stack elements (all operations work against inferred types, and unlike the JVM there is no notion of "long and double take 2 stack slots", ...).

both the JVM and MSIL tend to declare types "from the POV of their point of use", rather than from their point of declaration. hence, the "load" or "call" operations directly reference a location giving the type of the variable.

similarly, things like loads / stores / method calls / dispatch / ... are resolved prior to emitting the bytecode.


in my VMs, I have tended to leave the types "at the point of declaration", hence all the general load/store/call operations "merely" link to a symbolic reference.
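as a rough illustration of the difference (hypothetical opcode names, not the actual JVM/CIL encodings nor my own format): in a "point of use" design each load opcode carries its type, whereas in a "point of declaration" design a single generic load merely names a declaration, and the decoder has to chase that reference to learn the type.

/* point-of-use vs point-of-declaration, illustrative only */
#include <stdio.h>

/* JVM/CIL-style "point of use": separate, typed load opcodes */
enum { OP_ILOAD = 1, OP_FLOAD, OP_ALOAD };

/* "point of declaration": one generic load plus a symbolic reference */
enum { OP_LOAD = 1 };
typedef struct { const char *name; const char *type; } Decl;

static Decl decls[] = { { "x", "int" }, { "y", "float" } };

int main(void)
{
    /* decoding "OP_LOAD 1" means chasing the reference to learn the type */
    int opcode = OP_LOAD, operand = 1;
    if (opcode == OP_LOAD)
        printf("load %s : %s\n", decls[operand].name, decls[operand].type);
    return 0;
}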

one of my attempts (a VM which never got fully implemented) would have pre-resolved all scoping (like the JVM or .NET do), but it ran into problems WRT a complex scoping model, and I have not generally done this.

my current VM only does so for the lexical scope, which is treated conceptually as a stack: all variable declarations are "pushed" onto the lexical environment, and "popped" when a given frame exits. technically, function arguments are pushed in left-to-right order, meaning that (counter-intuitively) their index numbers are the reverse of their argument positions. unlike in JBC or MSIL, the index does not directly reference a declared variable's declaration, merely its relative stack position, so it is also necessary to "infer" the declaration. note that its being (conceptually) a stack does not imply that it is physically represented as a stack.
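a minimal sketch of the idea (illustrative C, not how my VM actually represents it), showing how left-to-right pushes give reversed indices:

#include <stdio.h>
#include <string.h>

#define MAX_ENV 64

/* the lexical environment, treated (conceptually) as a stack of bindings */
typedef struct {
    const char *name[MAX_ENV];
    int depth;
} LexEnv;

static void env_push(LexEnv *env, const char *name)
{
    env->name[env->depth++] = name;
}

/* index 0 = most recently pushed binding, counting back down the stack */
static int env_index(LexEnv *env, const char *name)
{
    for (int i = 0; i < env->depth; i++)
        if (!strcmp(env->name[env->depth - 1 - i], name))
            return i;
    return -1;
}

int main(void)
{
    LexEnv env = { { 0 }, 0 };

    /* function arguments pushed in left-to-right order... */
    env_push(&env, "x");    /* argument 0 */
    env_push(&env, "y");    /* argument 1 */
    env_push(&env, "z");    /* argument 2 */

    /* ...so their indices come out reversed relative to argument position */
    printf("x -> %d, y -> %d, z -> %d\n",
           env_index(&env, "x"),
           env_index(&env, "y"),
           env_index(&env, "z"));    /* prints: x -> 2, y -> 1, z -> 0 */
    return 0;
}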

hence, in the above case, the bytecode is not too far removed from the source code.


I guess one can argue that, as one moves up the abstraction ladder, the amount of work needed to make the VM becomes larger (it deals with far more semantic issues, and is arguably more specific to the particular languages in use, ...).

I suspect it is much less clear-cut than this, though. for example, targeting a dynamic language (such as Scheme or JavaScript) at a VM such as LLVM or JBC (pre-JDK7) essentially requires implementing much of the VM within the VM, and may ultimately reduce how effectively the host VM can optimize the code (rather than merely dealing with the construct, the host VM now also has to deal with how the guest VM's constructs were implemented on top of it).

a secondary issue is when the restrictions of such a VM (particularly the JVM) impede what can be effectively expressed within the VM, running counter to the notion that higher abstraction necessarily equates to greater semantic restrictions.


the few cases I can think of where the argument does make a difference include:

the behavior of variable scoping (mostly moot for the JVM, which pretty much hard-codes this); the effects of declaration modifiers (moot for the JVM and .NET, which manage modifiers internally).

the "shape" of the type-system and numeric tower (likewise as the above, although neither "enforces" a particular type-system, neither gives much room for it to be effectively done much differently, likewise in LLVM and ASM one is confined to whatever is provided by the HW).

the behavior of specific operators as applied to specific types. this is arguably a merit of the JVM and .NET vs my own VMs: since both VMs only perform operations directly against primitive types, the behavior of mixed-type cases is de-facto left to the language and compiler. this may ultimately be a moot point, as manual type-coercion or scope-qualified operator overloading could achieve the same ends. similarly, a high-level VM could simply discard the notion of built-in/hard-coded operator+type semantics, and instead expect the compiled code to either overload operators or import a namespace containing the desired semantics (say, built-in or library-supplied overloaded operators). moreover, unlike the JVM and .NET strategies, this does not mandate static typing (prior to emitting bytecode) in order to achieve language-specific type-semantics.

in the above case (operators being the result of an implicit import), if Language-A disallows "string+int", Language-B interprets it as "append string(a) with int::toString(b)", and Language-C as "offset the string by int chars", then each language can do so without interfering with the others.
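a crude sketch of what this could look like (hypothetical handler names and table layout, purely for illustration): the generic "add" consults the current language's imported operator handlers, keyed by the operand types:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_INT, T_STR } Tag;
typedef struct { Tag tag; long i; char s[64]; } Value;

typedef Value (*OpFn)(Value a, Value b);

/* Language-B: "str + int" appends the int's decimal form */
static Value b_str_add_int(Value a, Value b) {
    Value r = { T_STR };
    snprintf(r.s, sizeof r.s, "%s%ld", a.s, b.i);
    return r;
}

/* Language-C: "str + int" offsets into the string by int chars */
static Value c_str_add_int(Value a, Value b) {
    Value r = { T_STR };
    snprintf(r.s, sizeof r.s, "%s", a.s + b.i);
    return r;
}

/* per-language "+" handlers, as if imported from each language's namespace;
 * Language-A imports no str+int handler, so for it the case is just an error */
typedef struct { const char *lang; Tag lhs, rhs; OpFn fn; } AddEntry;
static AddEntry add_table[] = {
    { "LangB", T_STR, T_INT, b_str_add_int },
    { "LangC", T_STR, T_INT, c_str_add_int },
};

static Value vm_add(const char *lang, Value a, Value b) {
    for (size_t k = 0; k < sizeof add_table / sizeof add_table[0]; k++)
        if (!strcmp(add_table[k].lang, lang) &&
            add_table[k].lhs == a.tag && add_table[k].rhs == b.tag)
            return add_table[k].fn(a, b);
    fprintf(stderr, "%s: no '+' defined for these types\n", lang);
    exit(1);
}

int main(void) {
    Value s = { T_STR, 0, "abcdef" }, n = { T_INT, 3 };
    Value r1 = vm_add("LangB", s, n);
    Value r2 = vm_add("LangC", s, n);
    printf("LangB: %s\n", r1.s);    /* abcdef3 */
    printf("LangC: %s\n", r2.s);    /* def */
    return 0;
}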

...


or, in effect, I see fairly little compelling reason (apart from simplicity of the VM runtime) why a lower-level VM representation would necessarily be preferable to a higher-level one.

one may almost as well just make a VM represent a distilled-down version of C++-style semantics (probably with some extensions and omissions, and represented as bytecode), and make the requirement for other languages "figure out how to compile your code into working with C++-like semantics...". this is in contrast to the alternative trend, which is to make VMs which look (more or less) like brain-damaged versions of assembler.


so, probable features of such a VM architecture:
probably bytecode, and probably a stack machine (any "good" reason to do otherwise?);
probably type-inferred values;
types are not declared at their point of use (they are declared at their point of declaration);
avoidance of explicitly generated type coercions (this will be left to the VM);
avoidance of manual handling of method dispatches (again, left to the VM; the bytecode will merely give the VM its argument list on the stack, likely using a "mark", as in the sketch following this list);
operators are mapped fairly directly;
built-in notions for: properties, operator overloading, typedefs, delegates/function-pointers, scope-delegation (can be used for Prototype-OO, implementing namespaces/import, ...), ...
...
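as a sketch of the "mark" idea mentioned in the list above (hypothetical, nothing here is taken from an actual VM): the bytecode pushes a mark, then the arguments, then a single generic call; the VM recovers the argument list by scanning down to the mark, so the bytecode never needs to encode arity or dispatch details.

#include <stdio.h>

typedef enum { V_MARK, V_INT } Tag;
typedef struct { Tag tag; long i; } Value;

#define STACK_MAX 64
static Value stack[STACK_MAX];
static int sp;

static void push_mark(void)  { stack[sp++] = (Value){ V_MARK, 0 }; }
static void push_int(long i) { stack[sp++] = (Value){ V_INT, i }; }

/* generic call: walk down to the mark, hand the args to the callee,
 * pop the args and the mark, push the result */
static void vm_call(long (*fn)(Value *args, int nargs))
{
    int base = sp;
    while (base > 0 && stack[base - 1].tag != V_MARK)
        base--;
    long r = fn(&stack[base], sp - base);
    sp = base - 1;
    push_int(r);
}

static long sum_args(Value *args, int nargs)
{
    long s = 0;
    for (int i = 0; i < nargs; i++)
        s += args[i].i;
    return s;
}

int main(void)
{
    /* bytecode for "sum(1, 2, 3)": mark; load 1; load 2; load 3; call */
    push_mark();
    push_int(1);
    push_int(2);
    push_int(3);
    vm_call(sum_args);

    printf("result: %ld\n", stack[sp - 1].i);    /* result: 6 */
    return 0;
}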

so, in such a case, "a+b" always compiles to, say, "load a; load b; add;", regardless of the types in use, and probably even regardless of whether or not the operator takes its arguments by-reference (although this would imply that all "loads" are "load-by-reference" rather than the more conventional "load-by-value").

note: one could also make "load-by-reference" semantics canonical, namely that the stack is not a stack of values, but rather a stack of value-references. say, when an addition operator is applied to a pair of references, it results in a third reference (to "somewhere") which in turn holds the result value (a real VM would likely, however, optimize away most such needless references).
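a minimal sketch of such a reference stack (hypothetical, with a toy numeric tower of just int and double): the operand stack holds references to boxed values, and the generic "add" inspects the runtime tags and pushes a reference to a freshly boxed result:

#include <stdio.h>
#include <stdlib.h>

typedef enum { T_INT, T_FLT } Tag;
typedef struct { Tag tag; union { long i; double f; } u; } Box;

#define STACK_MAX 64
static Box *stack[STACK_MAX];    /* a stack of references, not of values */
static int sp;

/* boxes are simply leaked here; a real VM would GC them, or optimize
 * most such needless references away */
static Box *box_int(long i)   { Box *b = malloc(sizeof *b); b->tag = T_INT; b->u.i = i; return b; }
static Box *box_flt(double f) { Box *b = malloc(sizeof *b); b->tag = T_FLT; b->u.f = f; return b; }

static void load(Box *ref) { stack[sp++] = ref; }    /* load-by-reference */

/* one generic "add" opcode; operand types are inspected at runtime */
static void add(void)
{
    Box *b = stack[--sp], *a = stack[--sp];
    if (a->tag == T_INT && b->tag == T_INT)
        stack[sp++] = box_int(a->u.i + b->u.i);
    else    /* mixed or float case: coerce to double */
        stack[sp++] = box_flt((a->tag == T_INT ? (double)a->u.i : a->u.f) +
                              (b->tag == T_INT ? (double)b->u.i : b->u.f));
}

int main(void)
{
    Box *x = box_int(2), *y = box_flt(1.5);

    /* "x+y" compiles to the same three opcodes regardless of the types */
    load(x); load(y); add();

    printf("%g\n", stack[sp - 1]->u.f);    /* 3.5 */
    return 0;
}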


granted, not all languages look like C-like languages:
some don't have explicit operators, handling such operations instead via function or method calls, ...
(I am thinking here of Scheme and Smalltalk).

but, this shouldn't be a huge issue: they will map their constructs instead to whatever they do use, and probably supply the relevant namespaces to implement their semantics. if done well, this still shouldn't result in big ugly language-walls.

another key may be to largely separate "interface from implementation" for many types, so that two languages can see the same type but present different local interfaces for it (the opposite direction from the "everything is an object" concept, which seeks to assume that "int" is an implementation of some "Integer" class). instead, a language could alias operations onto the type (as an extension of good old operator overloading), whereby things like method calls on an integer can also be intercepted/overloaded, essentially allowing the language to itself implement the mapping between "int" and "some class named Integer".

another possible formalization would be, of course, that there are two such "classes", and that all operations are implicitly method calls into one of them (the language's "view" of the type), with a second class representing "the type as seen aliased to a class". of course, for this to work would require a notion of "class" somewhat different from the "traditional" OO notion of a class (a single-point definition which may only be extended via overloading), conceiving of it instead as "the aggregation of all operations visible for the class type" (essentially, the "methods" would be conceptually closer to "named overloaded operators" than to traditional "virtual class methods").

I think it would be easier just to have free-standing "named overloaded operators accepting any number of arguments", which mostly boils down to "overloaded functions" (operator overloading is, in effect, merely a special case of a plain overloaded function). (the "add" opcode in such a case could be seen more as shorthand for a call to such a function.)
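roughly, a sketch of the idea (hypothetical names, not my actual implementation): both "a.toString()" and "a+b" resolve through the same table of free-standing named operators, keyed (here, for brevity) by the name and the first argument's type:

#include <stdio.h>
#include <string.h>

typedef enum { T_INT, T_STR } Tag;
typedef struct { Tag tag; long i; const char *s; } Value;

typedef Value (*Handler)(Value *args, int nargs);

static Value int_tostring(Value *args, int nargs)
{
    static char buf[32];    /* static buffer; fine for a sketch */
    (void)nargs;
    snprintf(buf, sizeof buf, "%ld", args[0].i);
    return (Value){ T_STR, 0, buf };
}

static Value int_add_int(Value *args, int nargs)
{
    (void)nargs;
    return (Value){ T_INT, args[0].i + args[1].i, NULL };
}

/* one table of free-standing "named operators" (i.e., overloaded functions);
 * methods on "int" and the "+" operator live in the same place */
typedef struct { const char *name; Tag arg0; Handler fn; } Entry;
static Entry table[] = {
    { "toString", T_INT, int_tostring },
    { "+",        T_INT, int_add_int  },
};

static Value dispatch(const char *name, Value *args, int nargs)
{
    for (size_t k = 0; k < sizeof table / sizeof table[0]; k++)
        if (!strcmp(table[k].name, name) && table[k].arg0 == args[0].tag)
            return table[k].fn(args, nargs);
    return (Value){ T_STR, 0, "<no such operator>" };
}

int main(void)
{
    Value a = { T_INT, 3, NULL }, b = { T_INT, 4, NULL };
    Value args1[] = { a };        /* "a.toString()" */
    Value args2[] = { a, b };     /* "a + b"        */

    printf("%s\n", dispatch("toString", args1, 1).s);    /* 3 */
    printf("%ld\n", dispatch("+", args2, 2).i);          /* 7 */
    return 0;
}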

admittedly, this is basically how I had implemented operator overloading anyways (granted, my VM is not nearly so generic, so many operators are hard-coded, and most operator handlers are "global").

(I could write more, but I am getting burnt out on writing this at the moment...).


or such...

