Hi all,
I've started to implement a prototype (far from finished) of the parametric VM 
based on John's position paper.

  https://github.com/forax/civilizer


Most of the design is great (even really great) but I think it goes too deep 
and there is a minimal parametric VM hidden inside it.

By Minimal Parametric VM, or MPVM, I mean a barebones design which is just 
enough to be able to specialize parametric classes and parametric methods, so 
that a List<Complex> uses an array of Complex instead of an array of Object, 
which seems a nice intermediate goal.

So I propose to simplify the design as an intermediate step, with the explicit 
goal that the MPVM should be able to specialize generics over value types and 
nothing more.
The main difference is that the MPVM does not need to deal with subtyping of 
parametrized classes, so the opcodes checkcast and instanceof do not need to 
be specialized, and calling methods on a parametric class does not require the 
owner+type parameters to be reified in the bytecode.
 

- The Parametric attribute:
 
A class or a method is declared parametric if it has a Parametric attribute 
(a class attribute or a method attribute, respectively).
There can be at most one Parametric attribute per class/method.

  Parametric_attribute {
    u2 attribute_name_index;  // Parametric
    u4 attribute_length;
    u2 anchor_index;
  }
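
To make the layout concrete, here is a minimal sketch of how a tool could 
decode such an attribute from raw bytes. The class and record names are 
hypothetical (they are not part of the prototype), and it assumes the usual 
class file convention that attribute_length counts only the bytes that follow 
it, so it is 2 here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ParametricAttributeDemo {
  // Hypothetical in-memory view of a Parametric attribute.
  record ParametricAttribute(int attributeNameIndex, long attributeLength, int anchorIndex) {}

  // Decodes the three fields in the order given by the structure above.
  static ParametricAttribute read(byte[] bytes) {
    try (var in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      int nameIndex = in.readUnsignedShort();              // u2 attribute_name_index
      long length = Integer.toUnsignedLong(in.readInt());  // u4 attribute_length
      if (length != 2) {
        // Assumption: the body is exactly one u2 (anchor_index).
        throw new IllegalArgumentException("unexpected attribute_length " + length);
      }
      int anchorIndex = in.readUnsignedShort();            // u2 anchor_index
      return new ParametricAttribute(nameIndex, length, anchorIndex);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Constant pool indices 12 and 21 are arbitrary sample values.
    System.out.println(read(new byte[] {0, 12, 0, 0, 0, 2, 0, 21}));
  }
}
```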

A Parametric attribute references a CONSTANT_Anchor_info that, after 
resolution, stores a couple of Objects: the first one is the class parameter, 
the second one is the method parameter. The constant is defined as follows:

CONSTANT_Anchor_info {
  u1 tag; // CONSTANT_Anchor = 21
  u2 bootstrap_method_attr_index;  // at runtime, CallSite.target: MH (Anchor)Anchor
}

When a parametric class/parametric method is instantiated with a parameter, the 
VM creates an Anchor object containing the parameter. The bootstrap method of 
the CONSTANT_Anchor_info is called to get a method handle (that takes an Anchor 
and returns an Anchor). The target of the BSM is then called with the anchor 
created by the VM; this is where the JDK code can erase the parameter or do 
whatever should be done. The resulting Anchor is stored as the resolved value 
of the constant pool entry (it becomes a loadable constant that can be 
referenced by ldc or bootstrap method constants).

The Anchor object is a value record:

  value record Anchor(Object parameter) {}
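
The resolution steps above can be simulated in plain Java. This is only a 
sketch under assumptions: a regular record stands in for the value record, and 
the names erase, bootstrap and resolve are hypothetical. The erasure policy 
(keep primitive classes, erase everything else to Object) is just one example 
of what the JDK code could decide to do.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class AnchorDemo {
  // Plain record standing in for the proposed "value record Anchor".
  record Anchor(Object parameter) {}

  // Hypothetical JDK-side policy: keep primitive classes, erase the rest.
  static Anchor erase(Anchor anchor) {
    if (anchor.parameter() instanceof Class<?> c && c.isPrimitive()) {
      return anchor;
    }
    return new Anchor(Object.class);
  }

  // Hypothetical bootstrap method: returns a method handle of type (Anchor)Anchor.
  static MethodHandle bootstrap(MethodHandles.Lookup lookup) throws ReflectiveOperationException {
    return lookup.findStatic(AnchorDemo.class, "erase",
        MethodType.methodType(Anchor.class, Anchor.class));
  }

  // Simulates the VM side: wrap the parameter, call the BSM, call its target.
  static Anchor resolve(Object parameter) {
    try {
      MethodHandle target = bootstrap(MethodHandles.lookup());
      return (Anchor) target.invokeExact(new Anchor(parameter));
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }

  public static void main(String[] args) {
    System.out.println(resolve(int.class));
    System.out.println(resolve(String.class));
  }
}
```

In a real VM the resolved Anchor would be cached in the constant pool entry, 
so the BSM runs at most once per instantiation.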

 

- Parametrized opcodes

The opcodes new, aconst_init, anewarray, invokestatic, invokevirtual, 
invokeinterface and invokespecial can specify a parameter.
For that, instead of referencing a CONSTANT_Class_info or an XMethodref, they 
reference a CONSTANT_Linkage_info that itself references the right constant:

  CONSTANT_Linkage_info {
    u1 tag; // JVM_CONSTANT_Linkage = 22
    u2 parameter_index;
    u2 reference_index;  // CONSTANT_Class_info or XMethodref
  }

The parameter_index references a loadable constant (the usual ones + 
CONSTANT_Anchor_info). The reference_index references either a 
CONSTANT_Class_info or an XMethodref depending on the opcode.
At runtime, the constant referenced by the parameter_index is a Species object 
for new, aconst_init and anewarray and a Linkage object for the invoke* opcodes.

  value record Species(Class<?> raw, Object parameters) {}
  value record Linkage(Object parameters) {}

A Species object is defined by a runtime class (so it can represent classes 
that are only available at runtime, like the secondary type of a zero-default 
value class) and a parameter. A Linkage object only stores a parameter.



Chain of constants and runtime representation depending on the opcode:
    new (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
    aconst_init (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
    anewarray (CONSTANT_Linkage_info -> CONSTANT_Class_info), at runtime Species
    invokestatic, invokevirtual, invokeinterface, invokespecial (CONSTANT_Linkage_info -> XMethodref), at runtime Linkage(parameters)
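
The mapping from opcode to runtime representation can be summarized with a 
small Java model. This is a sketch, not the prototype's code: plain records 
stand in for the value records, and the Opcode enum and the resolve method are 
hypothetical names introduced only for illustration.

```java
final class LinkageDemo {
  // Plain records standing in for the proposed value records.
  record Species(Class<?> raw, Object parameters) {}
  record Linkage(Object parameters) {}

  // The opcodes that may reference a CONSTANT_Linkage_info.
  enum Opcode {
    NEW, ACONST_INIT, ANEWARRAY,
    INVOKESTATIC, INVOKEVIRTUAL, INVOKEINTERFACE, INVOKESPECIAL
  }

  // Hypothetical resolution: class-based opcodes get a Species,
  // invoke* opcodes get a Linkage that only carries the parameter.
  static Object resolve(Opcode opcode, Class<?> raw, Object parameters) {
    return switch (opcode) {
      case NEW, ACONST_INIT, ANEWARRAY -> new Species(raw, parameters);
      case INVOKESTATIC, INVOKEVIRTUAL, INVOKEINTERFACE, INVOKESPECIAL ->
          new Linkage(parameters);
    };
  }

  public static void main(String[] args) {
    System.out.println(resolve(Opcode.NEW, java.util.ArrayList.class, int.class));
    System.out.println(resolve(Opcode.INVOKEVIRTUAL, java.util.ArrayList.class, int.class));
  }
}
```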


At runtime, when one of the opcodes new, aconst_init and anewarray is first 
executed, the VM checks that the raw class of the species is parametric, then 
parameter_index is resolved, then the VM calls the BSM of the anchor and 
creates a parametric version of the class with the parameter of the Anchor if 
it does not already exist. This parametric class is stored as the class of the 
instance created.

At runtime, when one of the invoke* opcodes is first executed, the VM checks 
that the raw class of the species is parametric, then parameter_index is 
resolved, then the VM calls the BSM of the anchor and creates a parametric 
version of the method with the parameter of the Anchor if it does not already 
exist.
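
The "if it does not already exist" part is essentially a cache keyed by the 
raw class and the parameter. Here is a minimal sketch of that idea; the 
Specialization class is a hypothetical stand-in for whatever the VM creates, 
and a real implementation would live inside the VM rather than in a Java map.

```java
import java.util.concurrent.ConcurrentHashMap;

final class SpecializationCacheDemo {
  // Plain record standing in for the proposed value record.
  record Species(Class<?> raw, Object parameters) {}

  // Hypothetical stand-in for a specialized class (or method).
  static final class Specialization {
    final Species species;

    Specialization(Species species) {
      this.species = species;
    }
  }

  private static final ConcurrentHashMap<Species, Specialization> CACHE =
      new ConcurrentHashMap<>();

  // Creates the specialization on first use, then always returns the same one.
  static Specialization specialize(Species species) {
    return CACHE.computeIfAbsent(species, Specialization::new);
  }

  public static void main(String[] args) {
    var first = specialize(new Species(java.util.ArrayList.class, int.class));
    var second = specialize(new Species(java.util.ArrayList.class, int.class));
    System.out.println(first == second);
  }
}
```

Because Species is a record, two species with the same raw class and the same 
parameter are equal, so they share one specialization.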


- Constant pool segregation

Because the Anchors are the roots of the constant dynamic trees, the VM can 
segregate the constant pool items as described in John's paper.
 

- Class that inherits/implements parametric class

A class (parametric or not) can reference parametric classes/interfaces, so the 
superclass name and the interfaces of the class header may reference a 
CONSTANT_Linkage_info (that itself references a CONSTANT_Class_info), resolved 
as a Species at runtime.


- Type Restriction

To prevent type pollution from propagating, fields and methods can define the 
TypeRestriction attribute, which defines restrictions (Classes at runtime) on 
the field or on the method parameters.

  TypeRestriction_attribute {
    u2 attribute_name_index;  // TypeRestriction
    u4 attribute_length;
    u2 restrictions_count;
    u2 restrictions[restrictions_count];  // at runtime Class
  }

(Note: the MPVM does not need to validate the return value, but the class 
corresponding to the return type can be present.)
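
As an illustration, checking the restrictions of a method could look like the 
sketch below. It assumes one restriction per method parameter, in order, and 
that null never satisfies a restriction; both are my assumptions, the 
attribute above does not say so. The names are hypothetical.

```java
final class TypeRestrictionDemo {
  // Returns true if every argument is an instance of the matching restriction.
  // Assumption: restrictions[i] applies to args[i], and null is rejected.
  static boolean accepts(Class<?>[] restrictions, Object... args) {
    if (restrictions.length != args.length) {
      return false;
    }
    for (int i = 0; i < restrictions.length; i++) {
      if (!restrictions[i].isInstance(args[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Class<?>[] restrictions = {Integer.class, String.class};
    System.out.println(accepts(restrictions, 42, "ok"));
    System.out.println(accepts(restrictions, "polluted", "ok"));
  }
}
```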


- Comparison with John's vision

It's the cheap version. It still requires a lot of work, but it has the 
advantage of being simpler: fewer opcodes to change, subtyping is not changed, 
and the callee does not do more validation. In my opinion it is a good first 
step.


Rémi
