Reified generics - shadow class edition.

I believe that trying to make method descriptors variant is a bad idea. It comes
from the model 1...3 experimentation, but it is an artifact of those
implementations, not a concept.
Here I describe a way to keep generics erased even if they are reified.

If the descriptor is erased, we need a way to get the reified type argument at
runtime, so that 'checkcast' can verify that the arguments whose type is a
type parameter have the right type.
For example:

  class Holder<any E> {
    E element;

    <when E != void>  // can be specialized to throw a NoSuchMethodError if E is void
    E get() {
      return element;
    }

    <when E != void>
    void set(E element) {
      this.element = element;
    }
  }
will be translated into
  class Holder<any E> {
    Object element;

    Object get() {
      return element;
    }
    void set(Object element) {
      // verify that element is an instance of the reified type argument of E
      element checkcast E
      this.element = element;
    }
  }
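The translated Holder can be approximated in today's Java, assuming the
reified type argument is made available to the instance as a Class object.
This is only a sketch: ErasedHolder and its typeArg field are hypothetical
names, and Class.cast stands in for the "checkcast E" of the translation.

```java
import java.util.Objects;

// Hypothetical simulation of the erased Holder: the type argument of E is
// passed explicitly as a Class object because there is no VM support yet.
final class ErasedHolder {
  private final Class<?> typeArg;  // stands in for the reified type argument of E
  private Object element;          // erased field, as in the translated class

  ErasedHolder(Class<?> typeArg) {
    this.typeArg = Objects.requireNonNull(typeArg);
  }

  Object get() {
    return element;                // no check needed: set() already checked it
  }

  void set(Object element) {
    // equivalent of "checkcast E": throws ClassCastException on a mismatch
    this.element = typeArg.cast(element);
  }
}
```

With new ErasedHolder(String.class), set("ok") succeeds and set(42) throws a
ClassCastException, leaving the previous element in place.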

Now, to bridge the gap, we also need:
- a way to tell the VM at runtime that the field 'element' is specialized
(if the type argument is a value type)
- a way to tell the VM at runtime that the methods get and set have different
implementations

For that, I proposed a new mechanism in the VM called master class/shadow class,
which is a way to define a specialized class, the shadow class, from a template
class, the master class. In my example, Holder is the master class, and
Holder<Complex>, with Complex a value type, is a shadow class 'derived' at
runtime from the master class.

This mechanism is more general than just supporting type specialization in the
VM because
- we do not want to inject the Java generics semantics, the Scala semantics,
the Kotlin semantics, etc. into the VM;
- we can support more use cases, so other languages can, for example, associate
a constant such as an int with a class, as in C++.

So the idea is to introduce two things that work together:
1) implement in the VM a mechanism that allows adding constant objects as
supplementary values (class data) when defining a class
2) use a bootstrap method (to "go meta", as John said) to specialize the
fields and methods of such a class

Those two features may be cleanly separated in the future, but I'm not sure how
to do that, so for now let's say they are two parts of the same feature, the
master class/shadow class feature.

For (1), we need a class file attribute that describes the class of each class
data. We don't need to name them; they can be positional (for Java generics we
may introduce another attribute, or reuse an existing one, to find the names of
the class data if they are type parameters).
For (2), we need to specify a bootstrap method that will be called to describe
how the specialization should be done.

Considering (1) and (2) as a single feature means the same class attribute can
define both the class data and the bootstrap method.

  The MasterClass attribute

  MasterClass_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_class_data;
    {
      u2 descriptor;
      u2 default_value;
    } class_data[number_of_class_data];
    u2 bootstrap_method_attr_index;
    u2 name_and_type_index;
  }
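To make the layout concrete, here is a sketch of a reader for this attribute,
assuming big-endian u2/u4 values as in the rest of the class file format and
assuming attribute_length counts the bytes after the 6-byte header. The record
names are hypothetical; this attribute is a proposal, not part of the JVMS.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical model of one class_data entry: two constant pool indexes.
record ClassDataEntry(int descriptorIndex, int defaultValueIndex) {}

// Hypothetical model of the proposed MasterClass attribute.
record MasterClassAttribute(int attributeNameIndex, ClassDataEntry[] classData,
                            int bootstrapMethodAttrIndex, int nameAndTypeIndex) {

  static MasterClassAttribute parse(byte[] bytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    int nameIndex = in.readUnsignedShort();                   // u2 attribute_name_index
    long length = Integer.toUnsignedLong(in.readInt());       // u4 attribute_length
    int count = in.readUnsignedShort();                       // u2 number_of_class_data
    // assumption: length = 2 (count) + 4 per entry + 2 + 2 (trailing indexes)
    if (length != 2L + 4L * count + 4L) {
      throw new IOException("bad attribute_length " + length);
    }
    ClassDataEntry[] entries = new ClassDataEntry[count];
    for (int i = 0; i < count; i++) {
      entries[i] = new ClassDataEntry(in.readUnsignedShort(), in.readUnsignedShort());
    }
    int bootstrapIndex = in.readUnsignedShort();              // u2 bootstrap_method_attr_index
    int nameAndTypeIndex = in.readUnsignedShort();            // u2 name_and_type_index
    return new MasterClassAttribute(nameIndex, entries, bootstrapIndex, nameAndTypeIndex);
  }
}
```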

 The class data descriptor is a field descriptor that describes the class of
the class data; it should be one of int, long, float, double, String,
MethodType and MethodHandle, i.e. the types of the constants that can appear in
the constant pool.
 The default value is a constant pool item that defines the value that will be
used if the shadow class is created with no class data.
 The bootstrap method is called to derive a shadow class from a master class if
the shadow class has not been created yet. The bootstrap method takes as
parameters a Lookup configured on the master class, a name, a Class (the type of
the name_and_type) and an array of Object containing the class data (plus any
optional bootstrap arguments), and returns an instance of
java.lang.invoke.Classy.
 The type of the name_and_type has to be a subtype of java.lang.invoke.Classy.
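The "derive once, then reuse" rule can be modelled as a cache keyed by the
class data tuple, falling back to the default values when no class data is
given. This is a sketch under assumptions: ShadowRegistry is a hypothetical
name, a Function stands in for the bootstrap method, and a real VM would keep
this in its own runtime structures.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical model: one shadow per distinct class data tuple of a master class.
final class ShadowRegistry<S> {
  private final Object[] defaults;                  // the default_value entries
  private final Function<Object[], S> bootstrap;    // stands in for the bootstrap method
  private final Map<List<Object>, S> shadows = new ConcurrentHashMap<>();

  ShadowRegistry(Object[] defaults, Function<Object[], S> bootstrap) {
    this.defaults = defaults.clone();
    this.bootstrap = bootstrap;
  }

  S shadowOf(Object... classData) {
    // no class data: the shadow class is created from the default values
    Object[] data = classData.length == 0 ? defaults : classData;
    if (data.length != defaults.length) {
      throw new IllegalArgumentException("wrong number of class data");
    }
    // class data are constants, so non-null (List.of rejects null);
    // the bootstrap runs only the first time a given tuple is seen
    return shadows.computeIfAbsent(List.of(data), key -> bootstrap.apply(key.toArray()));
  }
}
```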

The interface java.lang.invoke.Classy describes how to specialize a shadow 
class from a master class.
  interface Classy {
    Class<?> superclass();
    Class<?>[] interfaces();
    String fieldDescriptor(String field, String descriptor);
    MethodHandle method(String name, String descriptor);
  }

superclass() returns the superclass of the shadow class; it has to be a
specialization of the master class superclass (a subtype of the master class
superclass) or the master class superclass itself.
interfaces() returns the interfaces of the shadow class; each interface has to
be a subtype of the corresponding master class interface or that interface
itself.
fieldDescriptor() is called for each field of the master class, with the field
name and the field descriptor from the master class. It returns the field
descriptor of the corresponding field in the shadow class, which must be a
subtype of the master class field type. If null is returned, the field doesn't
exist and a NoSuchFieldError will be thrown upon access.
method() is called for each method of the master class, with the method name
and its method descriptor. It returns a method handle corresponding to the
specialization of the master class method in the shadow class. The method
handle type has to be exactly the same as the descriptor sent as parameter. If
null is returned, the method doesn't exist and a NoSuchMethodError will
be thrown upon access.
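A Classy for the Holder example could look like the sketch below. Assumptions:
Classy is redeclared locally because it is a proposed type, not a JDK one;
classData[0] is the field descriptor of E, with "V" standing for void; the
receiver is the leading parameter of the returned handles; and the handles
simply reuse the master implementations, where a real specialization would
return handles to specialized code. HolderClassy and bootstrap are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for the proposed java.lang.invoke.Classy.
interface Classy {
  Class<?> superclass();
  Class<?>[] interfaces();
  String fieldDescriptor(String field, String descriptor);
  MethodHandle method(String name, String descriptor);
}

// The erased master class produced by the translation of Holder<any E>.
class Holder {
  Object element;
  Object get() { return element; }
  void set(Object element) { this.element = element; }
}

final class HolderClassy implements Classy {
  private final String typeArg;  // field descriptor of E, "V" for void

  HolderClassy(Object[] classData) {
    this.typeArg = (String) classData[0];
  }

  // Possible bootstrap method shape: (Lookup, String, Class, Object[]) -> Classy
  static Classy bootstrap(MethodHandles.Lookup lookup, String name, Class<?> type,
                          Object[] classData) {
    return new HolderClassy(classData);
  }

  @Override public Class<?> superclass() { return Object.class; }
  @Override public Class<?>[] interfaces() { return new Class<?>[0]; }

  @Override public String fieldDescriptor(String field, String descriptor) {
    if (!field.equals("element")) return descriptor;
    // assumption: Holder<void> has no element field (NoSuchFieldError on access)
    return typeArg.equals("V") ? null : typeArg;
  }

  @Override public MethodHandle method(String name, String descriptor) {
    // <when E != void>: get and set do not exist in Holder<void>
    if (typeArg.equals("V") && (name.equals("get") || name.equals("set"))) return null;
    try {
      MethodType type = MethodType.fromMethodDescriptorString(
          descriptor, Holder.class.getClassLoader());
      // findVirtual prepends the receiver: the handle type is (Holder, params...)ret
      return MethodHandles.lookup().findVirtual(Holder.class, name, type);
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }
}
```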

The idea here is that a shadow class is a covariant variant of the master
class: a field can be replaced by a subtype, and a method can be replaced by a
specialized variant with the same parameter types. This allows any shadow class
to be accessed using any opcode that takes the master class as owner: getfield,
putfield and all invoke* opcodes. For getfield, the VM can buffer a value type
as Object or as an interface. For putfield, the VM has to perform an extra
check at runtime (like the extra check done by aastore, because arrays are
covariant).

The VM can use the Classy interface at any point in time, so calls to method()
can be lazy or not (the other methods provide information needed to determine
the layout of the class, so they cannot be called lazily).

At runtime, for the VM, a shadow class is a subtype of its master
class.

The fact that the shadow class is a subtype of the master class allows Java
wildcards to be desugared as the master class.
A shadow class has no special encoding in the bytecode; it only has a
representation in the runtime data structures of the VM.

In order to be backward compatible, java.lang.Class is extended to also
represent shadow classes, with the following methods:
- Class<?> withClassData(Object... data) returns the shadow class of a
master class for the given class data.
- Object[] getClassData() returns the class data of a shadow class, or null.
- boolean isMasterClass() returns true if the current class is a master class.
- Class<?> getMasterClass() returns the master class of a shadow class, or
the current class otherwise (a classical class is its own master class).
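The four methods above can be modelled on a stand-in type. This is a sketch:
RtClass is a hypothetical name, not java.lang.Class, and two behaviours the
text leaves open are assumptions here: every non-shadow class counts as a
master class, and withClassData() called on a shadow class derives from its
master.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the proposed java.lang.Class additions.
final class RtClass {
  private final String name;
  private final RtClass master;       // null for a classical class
  private final Object[] classData;   // null for a classical class
  private final Map<List<Object>, RtClass> shadows = new ConcurrentHashMap<>();

  private RtClass(String name, RtClass master, Object[] classData) {
    this.name = name;
    this.master = master;
    this.classData = classData;
  }

  static RtClass of(String name) {
    return new RtClass(name, null, null);
  }

  RtClass withClassData(Object... data) {
    if (master != null) return master.withClassData(data);  // assumption
    // one shadow per distinct class data tuple
    return shadows.computeIfAbsent(List.of(data), key -> new RtClass(name, this, key.toArray()));
  }

  Object[] getClassData() {
    return classData == null ? null : classData.clone();
  }

  boolean isMasterClass() {
    return master == null;            // assumption: non-shadow classes are masters
  }

  RtClass getMasterClass() {
    return master == null ? this : master;  // a classical class is its own master
  }

  @Override public String toString() {
    return classData == null ? name : name + Arrays.toString(classData);
  }
}
```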

Reusing java.lang.Class to represent shadow classes at runtime is important
because it allows reflection and java.lang.invoke to work seamlessly with
shadow classes: from a user point of view, classical classes and shadow
classes are all instances of java.lang.Class.

There is a compatibility issue with Object.getClass(), isInstance, instanceof
and checkcast: they cannot return/use the shadow class, because code like
o.getClass() == ArrayList.class or o instanceof ArrayList would not work if the
comparison used the shadow class. This means that getClass(), instanceof and
checkcast need to use the master class of the shadow class instead of the
shadow class itself.
Note that this problem is not inherent to the shadow class; it's an artifact of
the fact that the type argument is reified.

This means that we have to introduce at least one supplementary method
alongside getClass(); a static method in Class, Class.getTheTrueRealClass(Object o),
is enough. It also means that if we want to allow reified cast/instanceof in
Java (including the .class notation), they will have to be implemented using
invokedynamic/condy (again, to avoid bolting the Java generics semantics into
the VM). We may also choose not to support reified cast/instanceof in Java,
given that being able to specialize fields/methods is more important in terms
of performance and that we will not support reified generics of objects
anyway.
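The compatibility rule can be summarized by a small model in which getClass()
and instanceof see the master class, while only the new static method exposes
the shadow class. RtType, Instance and CompatRules are hypothetical names, and
the instanceof check ignores superclasses and interfaces for brevity.

```java
// Hypothetical runtime class: master is null for a master (classical) class.
record RtType(String name, RtType master) {
  RtType masterClass() {
    return master == null ? this : master;
  }
}

// Hypothetical object header pointing at its shadow class.
record Instance(RtType shadowClass) {}

final class CompatRules {
  // models Object.getClass(): still the master, so o.getClass() == ArrayList.class holds
  static RtType getClassOf(Instance o) {
    return o.shadowClass().masterClass();
  }

  // models the proposed Class.getTheTrueRealClass(Object)
  static RtType getTheTrueRealClass(Instance o) {
    return o.shadowClass();
  }

  // models instanceof/checkcast: compare master classes only (no supertypes here)
  static boolean isInstance(RtType type, Instance o) {
    return getClassOf(o) == type.masterClass();
  }
}
```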
 

The fact that a shadow class has no representation in the classfile means that
we are losing information: if ArrayList is anyfied, in
  ArrayList<String> list = ...
  list.get(3)
list.get() is encoded in the bytecode as a call to the master class ArrayList,
not a call to the shadow class ArrayList<String>, so a call to an anyfied
generic is still erased. But given that this information is available at
runtime (the inline cache stores the shadow class), a JIT can easily inline the
call.
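Why the erased call site is not a problem for the JIT can be illustrated by a
monomorphic inline cache that remembers the last shadow class it saw and the
specialized target resolved for it. This is a simplified, hypothetical model
(InlineCache and the resolve function are illustrative names), not how any
particular JIT is implemented.

```java
import java.util.function.Function;

// Hypothetical monomorphic inline cache for an erased call site like list.get(3).
final class InlineCache<R> {
  private final Function<Object, Function<Object, R>> resolve;  // shadow class -> target
  private Object cachedShadow;
  private Function<Object, R> cachedTarget;
  private int misses;

  InlineCache(Function<Object, Function<Object, R>> resolve) {
    this.resolve = resolve;
  }

  R invoke(Object shadowClass, Object receiver) {
    if (cachedTarget == null || shadowClass != cachedShadow) {
      // miss: record the shadow class observed at runtime and its specialized target
      cachedShadow = shadowClass;
      cachedTarget = resolve.apply(shadowClass);
      misses++;
    }
    // hit: the target is known for this shadow class, so a JIT could inline it
    return cachedTarget.apply(receiver);
  }

  int misses() {
    return misses;
  }
}
```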

With the classfile only containing classical descriptors, in terms of opcodes
we only need to add support for a few operations:
- new on an anyfied class
- new on an anyfied array
- invocation of an anyfied method.
For all these operations, the idea is to send the class data (or method data)
by storing them on the stack and having a bytecode that describes them as class
data/method data.
We also need a way to get the method data, inside the method, onto the stack.

I propose to introduce two new opcodes, dataload and datastore:
- dataload is parameterized by a concatenation of field descriptors (or a
method descriptor without the parentheses and return type); it takes all the
corresponding values on the stack and stores them in a side channel.
- datastore also takes a concatenation of field descriptors as parameter and
extracts the data from the side channel onto the stack.

dataload is used as a prefix of new and anewarray to pass the class data that
will be used to build the shadow class (if not already created).
dataload is used as a prefix of all invoke* bytecodes to pass the method data.

We also need a special reflection method in Thread, getMethodData(), that
returns the method data associated with the current method as an array, or
null if no method data was passed when the method was called.
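The side channel can be modelled with thread-local state: dataload parks the
values, the following invoke turns them into the callee's method data, and a
getMethodData() equivalent reads them back (datastore would push the same
values onto the operand stack instead of returning an array). SideChannel and
its methods are hypothetical names; nothing here exists in the JVM today.

```java
import java.util.function.Supplier;

// Hypothetical model of the proposed dataload/datastore side channel.
final class SideChannel {
  private static final ThreadLocal<Object[]> pending = new ThreadLocal<>();  // set by dataload
  private static final ThreadLocal<Object[]> current = new ThreadLocal<>();  // running method's data

  // dataload: move the values from the operand stack into the side channel
  static void dataload(Object... values) {
    pending.set(values.clone());
  }

  // the invoke* following a dataload consumes the pending data as method data
  static <T> T invoke(Supplier<T> body) {
    Object[] data = pending.get();
    pending.remove();
    Object[] saved = current.get();   // caller's method data, restored on return
    current.set(data);
    try {
      return body.get();
    } finally {
      current.set(saved);
    }
  }

  // models Thread.getMethodData(): the current method data, or null
  static Object[] getMethodData() {
    Object[] data = current.get();
    return data == null ? null : data.clone();
  }
}
```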

Note that when invokedynamic is prefixed by a dataload, the bootstrap method has
no access to the data; only the target of the call site will see the method data.


To summarize, I propose to implement reified generics in the VM by introducing
the notion of shadow class, a class only available at runtime that has
associated class data and a user-defined way to specialize fields and methods
at runtime. The main advantage of this solution is that old classes will not
only be able to use anyfied generics, but old code will also be optimized by
JITs as if it were new code.


regards,
Rémi
