Re: Evolving CONSTANT_Class

Dan Smith Thu, 04 Jun 2020 12:17:54 -0700

Some thoughts, working backwards from species, that may inform this decision.


A *species* is a specialization of a (generic) class or interface, where by 
"specialization" we mean the class/interface declaration interpreted in the 
context of a constant pool that has been modified by inserting certain resolved 
constants.

At the use site (think 'new'), we informally talk about a species like 
'List[Val]'. What this means is "the species produced by resolving 'List', 
resolving 'Val', and modifying the constant pool of 'List' with the resolved 
'Val'".

It will also be common to talk about species like 'List[T]', where 'T' is 
represented by a constant pool entry that will be filled in with a live 
constant.

This suggests that our representation of a species should combine 1) a pointer 
to a Class constant, and 2) pointers to other resolvable constants (typically, 
but maybe not exclusively, representing types).
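
To make that shape concrete, here is a minimal sketch in Java (17 or later). 
Every name in it (CpEntry, ClassEntry, SpeciesEntry, describe) is hypothetical 
and chosen only for illustration; it models the constant pool as a plain list 
and is not a proposal for the actual class file layout.

```java
import java.util.List;

// Hypothetical model of constant pool entries; names are illustrative only.
interface CpEntry {}

// A Class constant: names a class and can be resolved and cached on its own.
record ClassEntry(String name) implements CpEntry {}

// A species: 1) a pointer (pool index) to a Class constant, and
// 2) pointers (pool indices) to other resolvable constants.
record SpeciesEntry(int genericClassIndex, List<Integer> argumentIndices)
        implements CpEntry {}

final class SpeciesPoolSketch {
    // Render an entry informally, e.g. "List[Val]", by following its pointers.
    static String describe(List<CpEntry> pool, int index) {
        CpEntry e = pool.get(index);
        if (e instanceof ClassEntry c) {
            return c.name();
        }
        SpeciesEntry s = (SpeciesEntry) e;
        StringBuilder sb = new StringBuilder(describe(pool, s.genericClassIndex()));
        sb.append('[');
        for (int i = 0; i < s.argumentIndices().size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(describe(pool, s.argumentIndices().get(i)));
        }
        return sb.append(']').toString();
    }
}
```

In this sketch 'List' and 'Val' each keep their own entry, so each has its own 
place to cache a resolution result, and the species entry only refers to them.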

I think we intuitively want to encode a species with something like 
'Class("LList[QVal;];")', but this encoding is flawed:
- There's no constant pool entry to cache the resolution of List
- There's no constant pool entry to cache the resolution of Val
- There's no way to encode a live type argument (List[T]), so we'd need a 
separate encoding for that
- Depending on the domain of type arguments (can I use an integer?), there's no 
descriptor string encoding for many other type arguments; again, we'd need a 
separate encoding

I'm appealing here to a design principle that seems to have driven the original 
constant pool design: Class constants are for things that get resolved (and can 
be cached); descriptor strings are little more than fancy names. This principle 
doesn't always get followed: the verifier sometimes loads classes named by 
descriptors; array type class constants resolve their element types without a 
separate entry; more recently, StackMapTables use Class constants to represent 
types, and MethodTypes resolve method descriptors "as if" there were class 
constants for all of the parameter types. But I think these, especially the 
recent ones, are mistakes, and I still think the original notion is a useful 
separation of concerns that we should try to follow in our design.

Implications, if you buy this argument:

- There's got to be some sort of new CONSTANT_Species entry consisting of 
pointers to the generic class and the type arguments.

- For class-flavored references that allow species (super_class, interfaces, 
new, maybe this_class), either a Class can point to a Species, or a Species can 
appear as an alternative to a Class.

- For type-flavored references (Methodref, instanceof, anewarray), again we 
need either a Class/Type that can point to the Species, or we allow the Species 
as an alternative to be referenced directly. A distinct problem here is that we 
need a way to express whether the species type is an L type or a Q type. Maybe 
that's an extra layer, or maybe it's built into CONSTANT_Species. (This is 
really the same problem as what we do about L vs. Q class types, but without 
the legacy constraints.)

- For bare descriptors (type of a field), it's fine to use something like 
"LList[QVal;];". Or maybe it's useful to describe descriptors in terms of 
Class/Species constants. In any case, there's still a need to figure out how to 
parameterize a descriptor with live constants ("LList[$T];"), but I think this 
can be set aside as a separate problem.
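
The two choices for class-flavored references can be sketched side by side. 
This is only an illustration in Java (17 or later): the entry kinds 
(PlainClass, Species, ClassOfSpecies) and the helper target() are hypothetical 
names, and the L vs. Q question is deliberately left out.

```java
import java.util.List;

// Hypothetical entry kinds; none of these names come from a specification.
interface Entry {}
record PlainClass(String name) implements Entry {}
record Species(int classIndex, List<Integer> args) implements Entry {}

// Choice 1: a Class-like entry that points to a Species.
record ClassOfSpecies(int speciesIndex) implements Entry {}

final class UseSiteSketch {
    // Follow a class-flavored operand (think 'new') to the Species it
    // denotes, or return null when it names an ordinary class.
    static Species target(List<Entry> pool, int operand) {
        Entry e = pool.get(operand);
        if (e instanceof ClassOfSpecies c) {
            // Choice 1: one extra hop through the wrapping Class entry.
            return (Species) pool.get(c.speciesIndex());
        }
        if (e instanceof Species s) {
            // Choice 2: the Species appears directly in place of a Class.
            return s;
        }
        return null;
    }
}
```

Either way the use site reaches the same Species entry; the sketch differs 
only in whether the operand is allowed to be something other than a Class.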

-----

Bonus round: generic methods.

Generic methods work a lot like species—at the use site, we need to be able to 
refer to a method in the context of a constant pool that has been modified by 
inserting certain resolved constants. (We might even want to use the term 
"species" here, too. Or maybe it's "specialized method", where "specialized 
class" = "species".)

The existing representation of a method to be invoked is a Methodref, which has 
pointers to a Class constant, a name string, and a descriptor string.

So I think we need CONSTANT_SpecializedMethodref, which has 1) a pointer to a 
Methodref constant, and 2) pointers to some resolvable constants (typically, 
but maybe not exclusively, representing types). (Caveat: there are some details 
about the interaction between type arguments, overriding, and method resolution 
that I'm hand-waving about. Maybe the encoding will be stacked a little 
differently.)
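
As a rough sketch of that layering, assuming Java 17 or later: the record 
names, field names, and the dependencies() helper below are all hypothetical, 
the pool is just a list of objects, and the overriding and resolution details 
mentioned above are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Existing shape: pointers to a Class constant, a name, and a descriptor.
// (Hypothetical field names; strings are stored inline for brevity.)
record MethodrefSketch(int classIndex, String name, String descriptor) {}

// Sketched shape: 1) a pointer to a Methodref constant, and
// 2) pointers to some resolvable constants.
record SpecializedMethodrefSketch(int methodrefIndex,
                                  List<Integer> argumentIndices) {}

final class SpecializedMethodDemo {
    // Collect the pool indices reachable from a specialized reference:
    // the Methodref's class first, then each argument constant.
    static List<Integer> dependencies(List<Object> pool, int index) {
        SpecializedMethodrefSketch s =
                (SpecializedMethodrefSketch) pool.get(index);
        MethodrefSketch m = (MethodrefSketch) pool.get(s.methodrefIndex());
        List<Integer> deps = new ArrayList<>();
        deps.add(m.classIndex());
        deps.addAll(s.argumentIndices());
        return deps;
    }
}
```

The Methodref itself is left untouched here, so an unspecialized invocation of 
the same method could keep pointing at it.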

Again, we can either somehow wrap the SpecializedMethodref in a Methodref (this 
seems a lot more awkward than it does when wrapping a Species in a Class), or 
we can allow the use sites (invoke instructions, mostly) to point to either 
Methodrefs or SpecializedMethodrefs.

-----

Where this leaves me (acknowledging that I've made some leaps that some people 
might be more skeptical of) is pretty down on options (1) and (2). If we do 
(4), CONSTANT_Type is going to be heavily overloaded: it can refer to a 
descriptor, a SpeciesType, an ArrayType (for arrays of species types), a type 
variable, etc. Basically, the distinction between (3) and (4) amounts to 
whether outside references can point to one of many alternatives, or whether 
they're all routed through a CONSTANT_Type, which then points to one of the 
alternatives. I can imagine good arguments for both of those alternatives.
