Some thoughts, working backwards from species, that may inform this decision.
A *species* is a specialization of a (generic) class or interface, where by
"specialization" we mean the class/interface declaration interpreted in the
context of a constant pool that has been modified by inserting certain resolved
constants.
At the use site (think 'new'), we informally talk about a species like
'List[Val]'. What this means is "the species produced by resolving 'List',
resolving 'Val', and modifying the constant pool of 'List' with the resolved
'Val'".
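To make the "modified constant pool" reading concrete, here is a minimal Java 17 sketch. Everything in it is hypothetical: 'SpeciesSketch', the 'Symbolic'/'Hole'/'Resolved' entry kinds, and 'specialize' are illustrative names, not JVM data structures. It treats 'List' as a template pool with a hole for 'T', and treats 'List[Val]' as a copy of that pool with the hole replaced by the resolved 'Val'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a species is a class's constant pool with some entries
// replaced by resolved constants. Not the JVM's real data structures.
public class SpeciesSketch {
    sealed interface Entry permits Symbolic, Hole, Resolved {}
    /** An ordinary symbolic reference, resolved lazily as usual. */
    record Symbolic(String name) implements Entry {}
    /** A slot for a type parameter, to be filled in by specialization. */
    record Hole(String typeParam) implements Entry {}
    /** A constant that has already been resolved (e.g. the class Val). */
    record Resolved(String value) implements Entry {}

    record ClassTemplate(String name, List<Entry> pool) {}

    /** Returns a new pool in which every hole for typeParam holds arg; the template is unchanged. */
    static List<Entry> specialize(ClassTemplate template, String typeParam, Resolved arg) {
        List<Entry> result = new ArrayList<>(template.pool().size());
        for (Entry e : template.pool()) {
            boolean fill = e instanceof Hole h && h.typeParam().equals(typeParam);
            result.add(fill ? arg : e);
        }
        return List.copyOf(result);
    }

    public static void main(String[] args) {
        ClassTemplate list = new ClassTemplate("List",
                List.of(new Symbolic("java/lang/Object"), new Hole("T")));
        // 'List[Val]': resolve Val, then insert it into a copy of List's pool.
        List<Entry> listOfVal = specialize(list, "T", new Resolved("Val"));
        System.out.println(listOfVal);
    }
}
```

The copy-on-specialize choice is only one possibility; the point is that the species is defined by the template plus the inserted resolved constants.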
It will also be common to talk about species like 'List[T]', where 'T' is
represented by a constant pool entry that will be filled in with a live
constant.
This suggests that our representation of a species should combine 1) a pointer
to a Class constant, and 2) pointers to other resolvable constants (typically,
but maybe not exclusively, representing types).
I think we intuitively want to encode a species with something like
'Class("LList[QVal;];")', but this encoding is flawed:
- There's no constant pool entry to cache the resolution of List
- There's no constant pool entry to cache the resolution of Val
- There's no way to encode a live type argument (List[T]), so we'd need a
separate encoding for that
- Depending on the domain of type arguments (can I use an integer?), there's no
descriptor string encoding for many other type arguments; again, we'd need a
separate encoding
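A rough sketch of the alternative, assuming a hypothetical CONSTANT_Species-like entry that holds constant pool indices rather than a descriptor string. Because 'List' and 'Val' each get their own Class-like entry, their resolutions are cached there and shared by every species that points at them. A parameter entry covers the live 'List[T]' case without a second encoding. The names ('ClassConst', 'ParamConst', 'SpeciesConst', 'resolve', the class 'Foo') are invented for illustration, and "resolution" is just a string stand-in for loading a class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a CONSTANT_Species-like entry; not a real JVM format.
public class SpeciesEntrySketch {
    sealed interface Constant permits ClassConst, ParamConst, SpeciesConst {}

    /** Like CONSTANT_Class: names a class and caches its resolution. */
    static final class ClassConst implements Constant {
        final String name;
        String resolved; // cache slot, null until first resolution
        ClassConst(String name) { this.name = name; }
    }

    /** A live type argument such as T, supplied by the enclosing specialization. */
    record ParamConst(String name) implements Constant {}

    /** Points by index at a generic class entry and at its type-argument entries. */
    record SpeciesConst(int classIndex, int[] argIndices) implements Constant {}

    final List<Constant> pool = new ArrayList<>();
    int resolutions = 0; // how many uncached class resolutions happened

    int add(Constant c) {
        pool.add(c);
        return pool.size() - 1;
    }

    /** Resolve entry 'index'; liveArgs maps a type parameter name to its current value. */
    String resolve(int index, Function<String, String> liveArgs) {
        Constant c = pool.get(index);
        if (c instanceof ClassConst cc) {
            if (cc.resolved == null) {
                cc.resolved = "class " + cc.name; // stand-in for actually loading the class
                resolutions++;
            }
            return cc.resolved;
        }
        if (c instanceof ParamConst p) {
            return liveArgs.apply(p.name());
        }
        SpeciesConst s = (SpeciesConst) c;
        StringBuilder sb = new StringBuilder(resolve(s.classIndex(), liveArgs)).append('[');
        for (int i = 0; i < s.argIndices().length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(resolve(s.argIndices()[i], liveArgs));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        SpeciesEntrySketch cp = new SpeciesEntrySketch();
        int list = cp.add(new ClassConst("List"));
        int val = cp.add(new ClassConst("Val"));
        int t = cp.add(new ParamConst("T"));
        int listOfVal = cp.add(new SpeciesConst(list, new int[] {val}));
        int listOfT = cp.add(new SpeciesConst(list, new int[] {t}));
        System.out.println(cp.resolve(listOfVal, name -> "?"));       // class List[class Val]
        System.out.println(cp.resolve(listOfT, name -> "class Foo")); // class List[class Foo]
        System.out.println(cp.resolutions);                           // 2 (List resolved only once)
    }
}
```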
I'm appealing here to a design principle that seems to have driven the original
constant pool design: Class constants are for things that get resolved (and can
be cached); descriptor strings are little more than fancy names. This principle
doesn't always get followed: the verifier sometimes loads classes named by
descriptors; array type class constants resolve their element types without a
separate entry; more recently, StackMapTables use Class constants to represent
types, and MethodTypes resolve method descriptors "as if" there were class
constants for all of the parameter types. But I think these, especially the
recent ones, are mistakes, and I still think the original notion is a useful
separation of concerns that we should try to follow in our design.
Implications, if you buy this argument:
- There's got to be some sort of new CONSTANT_Species entry consisting of
pointers to the generic class and the type arguments.
- For class-flavored references that allow species (super_class, interfaces,
new, maybe this_class), either a Class can point to a Species, or a Species can
appear as an alternative to a Class.
- For type-flavored references (Methodref, instanceof, anewarray), again we
need either a Class/Type that can point to the Species, or we allow the Species
as an alternative to be referenced directly. A distinct problem here is that we
need a way to express whether the species type is an L type or a Q type. Maybe
that's an extra layer, or maybe it's built into CONSTANT_Species. (This is
really the same problem as what we do about L vs. Q class types, but without
the legacy constraints.)
- For bare descriptors (type of a field), it's fine to use something like
"LList[QVal;];". Or maybe it's useful to describe descriptors in terms of
Class/Species constants. In any case, there's still a need to figure out how to
parameterize a descriptor with live constants ("LList[$T];"), but I think this
can be set aside as a separate problem.
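The two wiring choices for class-flavored operands can be sketched as validity checks on a hypothetical 'new' operand. In option A a Class entry may wrap a Species; in option B a Species appears in place of a Class. 'TypeEntry' shows one way the L/Q distinction could be an extra layer rather than part of the Species itself. All entry kinds here are assumptions for illustration (Java 17), not proposed formats.

```java
import java.util.List;

// Hypothetical entry kinds for illustration only; not real JVM constant kinds.
public class OperandShapes {
    enum Kind { L, Q }

    sealed interface Entry permits Utf8, ClassEntry, SpeciesEntry, TypeEntry {}
    record Utf8(String text) implements Entry {}
    /** A Class-like entry; in option A it may point at a Species instead of a name. */
    record ClassEntry(Entry nameOrSpecies) implements Entry {}
    /** A Species-like entry: the generic class plus its type arguments. */
    record SpeciesEntry(ClassEntry generic, List<Entry> args) implements Entry {}
    /** One way to add the L/Q distinction as an extra layer for type-flavored uses. */
    record TypeEntry(Kind kind, Entry classOrSpecies) implements Entry {}

    /** Option A: 'new' takes only a Class entry, which may wrap a Species. */
    static boolean validNewOperandA(Entry e) {
        return e instanceof ClassEntry c
                && (c.nameOrSpecies() instanceof Utf8 || c.nameOrSpecies() instanceof SpeciesEntry);
    }

    /** Option B: 'new' takes a plain Class entry or a Species entry directly. */
    static boolean validNewOperandB(Entry e) {
        return (e instanceof ClassEntry c && c.nameOrSpecies() instanceof Utf8)
                || e instanceof SpeciesEntry;
    }

    public static void main(String[] args) {
        ClassEntry list = new ClassEntry(new Utf8("List"));
        ClassEntry val = new ClassEntry(new Utf8("Val"));
        SpeciesEntry listOfVal = new SpeciesEntry(list, List.of(val));
        System.out.println(validNewOperandA(new ClassEntry(listOfVal))); // true
        System.out.println(validNewOperandB(listOfVal));                 // true
        // Type-flavored use (e.g. instanceof) of List[Val] as a Q type:
        System.out.println(new TypeEntry(Kind.Q, listOfVal).kind());     // Q
    }
}
```

Either check is easy to write; the real difference is which entry kinds every consumer of the operand has to be prepared to see.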
-----
Bonus round: generic methods.
Generic methods work a lot like species—at the use site, we need to be able to
refer to a method in the context of a constant pool that has been modified by
inserting certain resolved constants. (We might even want to use the term
"species" here, too. Or maybe it's "specialized method", where "specialized
class" = "species".)
The existing representation of a method to be invoked is a Methodref, which has
pointers to a Class constant, a name string, and a descriptor string.
So I think we need CONSTANT_SpecializedMethodref, which has 1) a pointer to a
Methodref constant, and 2) pointers to some resolvable constants (typically,
but maybe not exclusively, representing types). (Caveat: there are some details
about the interaction between type arguments, overriding, and method resolution
that I'm hand-waving about. Maybe the encoding will be stacked a little
differently.)
Again, we can either somehow wrap the SpecializedMethodref in a Methodref (this
seems a lot more awkward than it does when wrapping a Species in a Class), or
we can allow the use sites (invoke instructions, mostly) to point to either
Methodrefs or SpecializedMethodrefs.
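A minimal Java 17 sketch of the second choice, where an invoke operand may name either a Methodref or a hypothetical SpecializedMethodref directly. Given the caveat above about type arguments, overriding, and method resolution, this stacking is only one guess. 'MethodrefSketch', 'linkRequest', and the 'Util.identity' method are made-up names; the sketch just shows that the use site has to dispatch on two operand kinds.

```java
import java.util.List;

// Hypothetical entry kinds; CONSTANT_SpecializedMethodref does not exist today.
public class MethodrefSketch {
    sealed interface InvokeOperand permits Methodref, SpecializedMethodref {}
    /** Like CONSTANT_Methodref: owning class, name, descriptor. */
    record Methodref(String owner, String name, String descriptor) implements InvokeOperand {}
    /** A Methodref plus the type arguments to insert at this use site. */
    record SpecializedMethodref(Methodref generic, List<String> typeArgs) implements InvokeOperand {}

    /** What an invoke instruction would ask the linker for, given either operand kind. */
    static String linkRequest(InvokeOperand op) {
        if (op instanceof SpecializedMethodref s) {
            Methodref m = s.generic();
            return m.owner() + "." + m.name() + "[" + String.join(", ", s.typeArgs()) + "]" + m.descriptor();
        }
        Methodref m = (Methodref) op;
        return m.owner() + "." + m.name() + m.descriptor();
    }

    public static void main(String[] args) {
        Methodref id = new Methodref("Util", "identity", "(Ljava/lang/Object;)Ljava/lang/Object;");
        System.out.println(linkRequest(id));
        System.out.println(linkRequest(new SpecializedMethodref(id, List.of("Val"))));
    }
}
```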
-----
Where this leaves me (acknowledging that I've made some leaps that some people
might be more skeptical of) is pretty down on options (1) and (2). If we do
(4), CONSTANT_Type is going to be heavily overloaded: it can refer to a
descriptor, a SpeciesType, an ArrayType (for arrays of species types), a type
variable, etc. Basically, the distinction between (3) and (4) amounts to
whether outside references can point to one of many alternatives, or whether
they're all routed through a CONSTANT_Type, which then points to one of the
alternatives. I can imagine good arguments for both of those alternatives.
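To make the (3)/(4) distinction concrete, here is a hypothetical Java 17 sketch in which the same set of alternatives ('Descriptor', 'SpeciesType', 'ArrayType', 'TypeVar') is reached either directly by the operand, style (3), or through a single CONSTANT_Type-like wrapper, style (4). The rendering borrows the informal 'List[T]' and '$T' notation from above. None of these names are real constant kinds.

```java
import java.util.List;

// Hypothetical entry kinds for comparing routing styles (3) and (4).
public class RoutingSketch {
    // The alternatives a type-flavored operand might need to denote.
    sealed interface Alt permits Descriptor, SpeciesType, ArrayType, TypeVar {}
    record Descriptor(String text) implements Alt {}
    record SpeciesType(String generic, List<Alt> args) implements Alt {}
    record ArrayType(Alt element) implements Alt {}
    record TypeVar(String name) implements Alt {}

    /** Style (4): the one wrapper kind that every use site references. */
    record TypeEntry(Alt target) {}

    /** Renders an alternative in the informal notation used in the text. */
    static String render(Alt a) {
        if (a instanceof Descriptor d) return d.text();
        if (a instanceof ArrayType arr) return render(arr.element()) + "[]";
        if (a instanceof TypeVar v) return "$" + v.name();
        SpeciesType s = (SpeciesType) a;
        StringBuilder sb = new StringBuilder(s.generic()).append('[');
        for (int i = 0; i < s.args().size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(render(s.args().get(i)));
        }
        return sb.append(']').toString();
    }

    /** Style (3): the operand is one of the alternatives itself. */
    static String operandStyle3(Alt operand) {
        return render(operand);
    }

    /** Style (4): the operand is always a TypeEntry, one hop away from the alternative. */
    static String operandStyle4(TypeEntry operand) {
        return render(operand.target());
    }

    public static void main(String[] args) {
        Alt listOfT = new SpeciesType("List", List.of(new TypeVar("T")));
        Alt arrayOfListOfT = new ArrayType(listOfT);
        System.out.println(operandStyle3(arrayOfListOfT));                // List[$T][]
        System.out.println(operandStyle4(new TypeEntry(arrayOfListOfT))); // List[$T][]
    }
}
```

Both styles end up dispatching over the same alternatives; (4) buys a single operand kind at the cost of an extra indirection, which is roughly the trade-off described above.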