Reified generics - shadow class edition.
I believe that trying to make method descriptors variant is a bad idea; it comes
from the model 1...3 experimentation, but it's an artifact of those
implementations, not a concept.
Here I describe a way to keep generics erased even if they are reified.
If the descriptor is erased, we need a way to get the reified type argument at
runtime so that 'checkcast' can be used to verify that the arguments that are
parameterized have the right type.
For example,
class Holder<any E> {
  E element;
  <when E != void>  // can be specialized to throw a NoSuchMethodError if E is void
  E get() {
    return element;
  }
  <when E != void>
  void set(E element) {
    this.element = element;
  }
}
will be translated into
class Holder<any E> {
  Object element;
  Object get() {
    return element;
  }
  void set(Object element) {
    element checkcast E
    // verify that element is an instance of the type argument of E here
    this.element = element;
  }
}
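To make the translated form concrete, here is a minimal sketch in today's Java.
It assumes the reified type argument is available as a plain Class object; the
class name ErasedHolder, the field typeArgument and the helper rejects() are
hypothetical, and in the proposal the VM would supply the type argument through
class data rather than a constructor. Class.cast plays the role of
'checkcast E'.

```java
public class ErasedHolder {
    // Hypothetical stand-in for the reified type argument of E.
    private final Class<?> typeArgument;
    // Erased field, as in the translated class.
    private Object element;

    public ErasedHolder(Class<?> typeArgument) {
        this.typeArgument = typeArgument;
    }

    public Object get() {
        return element;
    }

    public void set(Object element) {
        // Plays the role of 'checkcast E': throws ClassCastException on mismatch.
        typeArgument.cast(element);
        this.element = element;
    }

    // Returns true if storing the value is rejected by the runtime check.
    public static boolean rejects(ErasedHolder holder, Object value) {
        try {
            holder.set(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

The descriptors of get and set stay erased (Object), and the check happens only
inside set, before the field is written.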
Now, to bridge the gap, we also need:
- a way to explain to the VM at runtime that the field 'element' is specialized
(if it's a value type)
- a way to explain to the VM at runtime that the methods get and set have
different implementations
For that I propose a new mechanism in the VM called master class/shadow class,
which is a way to define a specialized class, the shadow class, from a template
class, the master class. In my example, Holder is the master class and
Holder<Complex>, with Complex a value type, is a shadow class 'derived' at
runtime from the master class.
This mechanism is more general than just supporting type specialization in the
VM because
- we do not want to inject the Java generics semantics into the VM, or the Scala
semantics, or the Kotlin semantics, etc.
- we can support more use cases, so other languages can, for example, associate
a constant such as an int with a class, like in C++.
So the idea is to introduce two things that work together:
1) implement in the VM a mechanism that allows constant objects to be added as
supplementary values (class data) when defining a class
2) use a bootstrap method (to "go meta" as John said) to allow the fields and
methods of such a class to be specialized
Those two features may be cleanly separated in the future, but I'm not sure how
to do that, so for now, let's say they are two parts of the same feature, the
master class/shadow class feature.
For (1), we need a class file attribute that describes the class of each class
data; we don't need to name them, it can be positional (for Java generics we
may introduce another attribute or re-use an existing one to find the names of
the class data if they are type parameters).
For (2), we need to specify a bootstrap method that will be called to describe
how the specialization should be done.
Considering (1) and (2) as a single feature means you can have the same class
attribute defining both the class data and the bootstrap method.
The MasterClass attribute
MasterClass_attribute {
  u2 attribute_name_index;
  u4 attribute_length;
  u2 number_of_class_data;
  {
    u2 descriptor;
    u2 default_value;
  } class_data[number_of_class_data];
  u2 bootstrap_method_attr_index;
  u2 name_and_type_index;
}
The class data descriptor is a field descriptor that describes the class of
the class_data; it should be one of int, long, float, double, String,
MethodType or MethodHandle, i.e. the types of the constants that can appear in
the constant pool.
The default value is a constant pool item that defines the value that will be
used if the shadow class is created with no class_data.
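As an illustration of positional class data with default values, here is a
small sketch in plain Java. The names ClassDataDemo, Slot and resolve are
hypothetical, and the sketch assumes that either all class data are supplied or
none are (the text above only covers the "no class_data" case).

```java
import java.util.List;

public class ClassDataDemo {
    // Hypothetical mirror of one class_data entry: a field descriptor
    // plus the default constant taken from the constant pool.
    public static final class Slot {
        final String descriptor;
        final Object defaultValue;

        public Slot(String descriptor, Object defaultValue) {
            this.descriptor = descriptor;
            this.defaultValue = defaultValue;
        }
    }

    // Returns the effective class data: the supplied values,
    // or the defaults when no value is given at all.
    public static Object[] resolve(List<Slot> slots, Object... data) {
        if (data.length == 0) {
            return slots.stream().map(slot -> slot.defaultValue).toArray();
        }
        if (data.length != slots.size()) {
            throw new IllegalArgumentException(
                "expected " + slots.size() + " class data, got " + data.length);
        }
        return data.clone();
    }
}
```

Because the slots are positional, nothing in this sketch needs a name for a
class data entry; only the order matters.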
The bootstrap method is called to derive a shadow class from a master class if
the shadow class has not been created yet. The bootstrap method takes a
Lookup configured on the master class, a name, a Class (the type of the
name_and_type) and an array of Object containing the class data as parameters
(plus any additional bootstrap arguments) and returns a reference to a
java.lang.invoke.Classy.
The type of the name_and_type has to be a subtype of java.lang.invoke.Classy.
The interface java.lang.invoke.Classy describes how to specialize a shadow
class from a master class.
interface Classy {
  Class<?> superclass();
  Class<?>[] interfaces();
  String fieldDescriptor(String field, String descriptor);
  MethodHandle method(String name, String descriptor);
}
superclass() returns the super-class of the shadow class; it has to be a
specialization of the master class super-class (a subtype of the master class
super-class) or the master class super-class itself.
interfaces() returns the interfaces of the shadow class; each interface has to
be a subtype of the corresponding master class interface or the corresponding
interface itself.
fieldDescriptor() is called for each field of the master class, with the field
name and the field descriptor of the master class; this method returns the
field descriptor of the corresponding field of the shadow class, which must
denote a subtype of the master class field type. If null is returned, it means
the field doesn't exist and a NoSuchFieldError will be thrown upon access.
method() is called for each method of the master class, with the method name
and its method descriptor; this method returns a method handle corresponding to
the specialization of the master class method in the shadow class. The method
handle type has to be exactly the same as the descriptor sent as parameter. If
null is returned, it means the method doesn't exist and a NoSuchMethodError
will be thrown upon access.
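To give an idea of what an implementation of Classy could look like, here is a
sketch written against today's java.lang.invoke API. java.lang.invoke.Classy
does not exist in any JDK, so the interface is redeclared locally; Holder and
StringHolderClassy are hypothetical names. One assumption is worth flagging:
the text does not say how the receiver is accounted for, so this sketch uses
findVirtual and treats the receiver as an extra leading parameter that is not
part of the descriptor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ClassyDemo {
    // Local copy of the proposed interface (not a real JDK type).
    public interface Classy {
        Class<?> superclass();
        Class<?>[] interfaces();
        String fieldDescriptor(String field, String descriptor);
        MethodHandle method(String name, String descriptor);
    }

    // Stand-in master class: the erased form of Holder.
    public static class Holder {
        Object element;
        public Object get() { return element; }
        public void set(Object element) { this.element = element; }
    }

    // Hypothetical specialization of Holder for a String type argument.
    public static class StringHolderClassy implements Classy {
        public Class<?> superclass() { return Object.class; }

        public Class<?>[] interfaces() { return new Class<?>[0]; }

        public String fieldDescriptor(String field, String descriptor) {
            // Narrow 'element' to a subtype; leave any other field unchanged.
            return field.equals("element") ? "Ljava/lang/String;" : descriptor;
        }

        public MethodHandle method(String name, String descriptor) {
            MethodType type = MethodType.fromMethodDescriptorString(
                descriptor, Holder.class.getClassLoader());
            try {
                // Assumption: the receiver is a leading parameter outside the descriptor.
                return MethodHandles.lookup().findVirtual(Holder.class, name, type);
            } catch (NoSuchMethodException | IllegalAccessException e) {
                // Per the text, null means NoSuchMethodError upon access.
                return null;
            }
        }
    }
}
```

Here the method handles simply point back at the master class methods; a real
specialization would return handles to specialized code instead.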
The idea here is that a shadow class is a covariant variant of the master
class: a field type can be replaced by a subtype, and a method can be replaced
by a specialized variant with the same parameter types. This allows any shadow
class to be accessed using any opcode that takes the master class as owner:
getfield, putfield, all invoke* opcodes. For getfield, a value type can be
buffered by the VM to Object/an interface. For putfield, the VM has to perform
an extra check at runtime (like there is an extra check for array stores
because arrays are covariant).
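The array analogy can be observed directly in today's Java: because a String[]
can be viewed as an Object[], every store of a reference into an array is
checked at runtime. The class and method names below are only for illustration.

```java
public class CovariantStoreDemo {
    // Returns true if the JVM rejects storing 'value' into 'array' at runtime.
    public static boolean storeRejected(Object[] array, Object value) {
        try {
            array[0] = value; // the store performs a dynamic type check
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }
}
```

Calling storeRejected(new String[1], Integer.valueOf(1)) hits the runtime
check, while storing a String succeeds. The putfield check on a shadow class
would play the same role for a field whose type has been narrowed.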
The interface Classy can be used by the VM at any point in time, so calls to
method() can be lazy or not (the other information is needed to determine the
layout of a class, so those methods cannot be called lazily).
At runtime, for the VM, a shadow class is a subtype of its master class.
The fact that the shadow class is a subtype of the master class allows
wildcards in Java to be desugared to the master class.
A shadow class has no special encoding in the bytecode, it only has a
representation in the runtime data structure of the VM.
In order to be backward compatible, java.lang.Class is extended to also
represent shadow classes; java.lang.Class is extended with the following
methods:
- Class<?> withClassData(Object... data) that returns the shadow class of a
master class.
- Object[] getClassData() that returns the class data of a shadow class or null.
- boolean isMasterClass() that returns whether the current class is a master
class.
- Class<?> getMasterClass() that returns the master class of a shadow class or
the current class otherwise (a classical class is its own master class).
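Since these methods do not exist on java.lang.Class, their intended behavior
can only be modeled in user code. The sketch below is such a model, with
hypothetical names (ShadowClassDemo, Shadow) and one assumption that the text
implies but does not state outright: asking twice for the same master class
with equal class data yields the same shadow class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ShadowClassDemo {
    // Hypothetical user-level model of a shadow class:
    // a master class plus its positional class data.
    public static final class Shadow {
        private final Class<?> master;
        private final List<Object> classData;

        Shadow(Class<?> master, List<Object> classData) {
            this.master = master;
            this.classData = classData;
        }

        public Class<?> getMasterClass() { return master; }

        public Object[] getClassData() { return classData.toArray(); }
    }

    // One shadow per (master class, class data) pair.
    private static final Map<List<Object>, Shadow> CACHE = new ConcurrentHashMap<>();

    // Models Class.withClassData: creates the shadow on first use, then reuses it.
    public static Shadow withClassData(Class<?> master, Object... data) {
        List<Object> classData = List.copyOf(Arrays.asList(data));
        List<Object> key = List.of(master, classData);
        return CACHE.computeIfAbsent(key, k -> new Shadow(master, classData));
    }
}
```

In the proposal the cache lookup and the creation step (the bootstrap method
call) would live inside the VM; the map here only stands in for that.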
Reusing java.lang.Class to represent shadow classes at runtime is important
because it allows reflection and java.lang.invoke to work seamlessly with
shadow classes: from a user point of view, a classical class and a shadow
class are both instances of java.lang.Class.
There is a compatibility issue with Object.getClass(), isInstance, instanceof
and checkcast: they cannot return/use the shadow class, because code like
o.getClass() == ArrayList.class or o instanceof ArrayList will not work if the
comparison uses the shadow class. This means that getClass(), instanceof and
checkcast need to check the master class of the shadow class instead of using
the shadow class directly.
Note that this problem is not inherent to the shadow class, it's an artifact of
the fact that the type argument is reified.
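For reference, the two idioms in question look like this in existing code; both
currently hold whatever the type argument is, and both would have to keep
holding for an instance of a shadow class. The class and method names are only
for illustration.

```java
import java.util.ArrayList;

public class MasterClassCheckDemo {
    // Exact class test, as commonly written in existing code.
    public static boolean isExactlyArrayList(Object o) {
        return o.getClass() == ArrayList.class;
    }

    // Subtype test, compiled to an instanceof against the erased class.
    public static boolean isArrayList(Object o) {
        return o instanceof ArrayList;
    }
}
```

If getClass() returned the shadow class, the first test would fail for a
specialized ArrayList even though the code was written against
ArrayList.class.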
This means that we have to introduce at least one supplementary method for
getClass(); a static method in Class, Class.getTheTrueRealClass(Object o), is
enough. It also means that if we want to allow reified cast/instanceof in
Java/.class notation, this will have to be implemented using
invokedynamic/condy (again, to avoid bolting the Java generics semantics into
the VM). We may also choose not to support reified cast/instanceof in Java,
given that being able to specialize fields/methods is more important in terms
of performance and that we will not support reified generics of objects
anyway.
The fact that a shadow class has no representation in the classfile means that
we are losing information, because if ArrayList is anyfied,
ArrayList<String> list = ...
list.get(3)
list.get(3) is encoded in the bytecode as a call to the master class ArrayList
and not as a call to the shadow class, so a call to an anyfied generic is
still erased; but given that this information is available at runtime (the
inline cache stores the shadow class), a JIT can easily inline the call.
With the classfile only containing classical descriptors, in terms of opcodes
we only need to add support for a few operations:
- new on an anyfied class
- new on an anyfied array
- invocation of an anyfied method.
For all these operations, the idea is to send the class data (method data) by
storing them on the stack and to have a bytecode that describes them as class
data/method data.
We also need a way to get the method data onto the stack inside the method.
I propose to introduce two new opcodes, dataload and datastore:
- dataload is constructed with a concatenation of field descriptors as
parameter (or a method descriptor with no parentheses and no return type); it
takes all the values on the stack and stores them in a side channel.
- datastore also takes a concatenation of field descriptors as parameter and
extracts the data from the side channel onto the stack.
dataload is used as a prefix of new and anewarray to pass the class data that
will be used to build the shadow class (if not already created).
dataload is also used as a prefix of all invoke* bytecodes to pass the method
data.
We also need a special reflection method in Thread, getMethodData(), that
returns the method data associated with the current method as an array, or null
if no method data was passed when the method was called.
Note that when invokedynamic is prefixed by a dataload, the bootstrap method
has no access to the data; only the target of the call site will see the method
data.
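The side channel can be modeled in plain Java with a ThreadLocal, which may help
to see the calling convention. This is only a user-level simulation: the class
MethodDataDemo and its static methods are hypothetical stand-ins for the
proposed opcodes and for Thread.getMethodData(), and it assumes the data is
consumed by the first callee that reads it.

```java
import java.util.Arrays;

public class MethodDataDemo {
    // Hypothetical per-thread side channel standing in for the VM mechanism.
    private static final ThreadLocal<Object[]> SIDE_CHANNEL = new ThreadLocal<>();

    // Models 'dataload': moves the values into the side channel before a call.
    public static void dataload(Object... values) {
        SIDE_CHANNEL.set(values.clone());
    }

    // Models 'datastore' / Thread.getMethodData(): takes the data out of the
    // side channel, or returns null if the caller passed no method data.
    public static Object[] getMethodData() {
        Object[] data = SIDE_CHANNEL.get();
        SIDE_CHANNEL.remove();
        return data;
    }

    // A callee that reads its method data on entry.
    public static String describe() {
        Object[] data = getMethodData();
        return data == null
            ? "no method data"
            : "method data: " + Arrays.toString(data);
    }
}
```

A caller that wants to pass method data runs dataload(...) immediately before
the call, mirroring the prefix position of the opcode; a caller that does not
simply calls describe() and the callee observes null.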
To summarize, I propose to implement reified generics in the VM by introducing
the notion of a shadow class, a class only available at runtime that has
associated class data and a user-defined way to specialize fields and methods
at runtime. The main advantage of this solution is that old classes will not
only be able to use anyfied generics, but old code will also be optimized by
JITs as if it were new code.
regards,
Rémi