This message needs an "Impractical Content" warning, but I want to log an
interesting line of thought raised today in our Valhalla meeting with IBM. It
actually is practical to think about, as a mental model.
We could, if we chose to do it at some far-future date, clarify the roles of CP
entries by splitting them as thoroughly as possible into functionally distinct
types.
Here's the idea:
Revamp CP to clearly distinguish (a) artifact references from (b) usage
requests.
Artifact references are named class-files and named parts thereof:
Class[ref], Fieldref, Methodref.
Usage requests are specific operations which may be performed on those named
entities: InvokeStatic, InvokeSpecial, InvokeVirtual, InvokeInterface,
InvokeOnValue, GetField, GetStatic, PutField, PutStatic, WithField.
Linking an artifact reference loads the artifact, which implements some usage
requests but not all.
Linking a usage request verifies that the usage is well-formed and caches (on
the usage CP entry, not the artifact CP entry) whatever bits help the
instruction go fast after linkage.
(Indy/Condy don't work directly on named artifacts, so they are off to the
side.)
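To make the split concrete, here is a rough Java sketch of the two entry kinds and their separate linkage steps. Everything in it is hypothetical: SplitCpSketch, ArtifactRef, UsageRequest, and the string stored as the "cache" are illustrative stand-ins, not anything a real JVM defines. "Loading" is faked by having the caller state which usages the artifact implements.

```java
import java.util.EnumSet;

// Hypothetical model of a split constant pool. None of these types exist in any JVM.
final class SplitCpSketch {
    // The usage request kinds listed above.
    enum Usage {
        INVOKE_STATIC, INVOKE_SPECIAL, INVOKE_VIRTUAL, INVOKE_INTERFACE, INVOKE_ON_VALUE,
        GET_FIELD, GET_STATIC, PUT_FIELD, PUT_STATIC, WITH_FIELD
    }

    // (a) Artifact reference: a named class-file part. Linking it "loads" the artifact.
    static final class ArtifactRef {
        final String owner, name, descriptor;
        private EnumSet<Usage> implemented;  // null until linked

        ArtifactRef(String owner, String name, String descriptor) {
            this.owner = owner;
            this.name = name;
            this.descriptor = descriptor;
        }

        // Stand-in for class loading: the caller states which usages the artifact implements.
        void link(EnumSet<Usage> usages) {
            implemented = EnumSet.copyOf(usages);
        }

        boolean implementsUsage(Usage u) {
            return implemented != null && implemented.contains(u);
        }
    }

    // (b) Usage request: one operation on one artifact, with its own linkage cache.
    static final class UsageRequest {
        final Usage kind;
        final ArtifactRef target;
        private String cache;  // stands in for a vtable index, field offset, and so on

        UsageRequest(Usage kind, ArtifactRef target) {
            this.kind = kind;
            this.target = target;
        }

        // Verify the usage is well-formed, then cache on this entry, not on the artifact.
        void link() {
            if (!target.implementsUsage(kind)) {
                throw new IncompatibleClassChangeError(
                        kind + " is not implemented by " + target.name);
            }
            cache = kind + " " + target.owner + "." + target.name + ":" + target.descriptor;
        }

        String cache() { return cache; }
    }
}
```

In this sketch two UsageRequest entries can point at the same ArtifactRef, and each carries its own cache, so the artifact entry itself never needs per-instruction status bits.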
It's safe to say we will never do all of this, but it helps, I think, as a
mental framework when considering all the funky overloading inside today's CP.
(But, those request types do look a lot like MethodHandle constants. Funny…)
The worst overloading in the CP is the need to store double resolution
information on an [Interface]Methodref in case it has to handle both
invokespecial and invoke[virtual,interface]. But there are also lots of little
status bits to support dynamic checks of {get,put}{static,field}. The J9 guys
commented that the double-resolution thing is familiar to them.
Lots of that implementation noise would drop away if there were enough CP
entries to go around for each distinct type of reference.
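A toy Java comparison of the two shapes, as a sketch only: OverloadedMethodref, SplitInvokeEntry, and the "direct:"/"vtable:" strings are made-up stand-ins for whatever a real JVM caches, and real implementations lay this state out quite differently.

```java
// Hypothetical sketch. Real JVMs keep resolution state in quite different shapes.
final class MethodrefResolutionSketch {
    enum InvokeKind { SPECIAL, VIRTUAL }

    // Lumped shape: one Methodref, one cache slot per way it might be invoked.
    static final class OverloadedMethodref {
        final String name;
        private String specialTarget;  // filled when invokespecial resolves the entry
        private String virtualTarget;  // filled when invokevirtual/invokeinterface does

        OverloadedMethodref(String name) { this.name = name; }

        String resolve(InvokeKind kind) {
            if (kind == InvokeKind.SPECIAL) {
                if (specialTarget == null) specialTarget = "direct:" + name;
                return specialTarget;
            }
            if (virtualTarget == null) virtualTarget = "vtable:" + name;
            return virtualTarget;
        }

        int slotsInUse() {
            return (specialTarget != null ? 1 : 0) + (virtualTarget != null ? 1 : 0);
        }
    }

    // Split shape: the entry is only ever used one way, so a single slot suffices.
    static final class SplitInvokeEntry {
        final InvokeKind kind;
        final String name;
        private String target;

        SplitInvokeEntry(InvokeKind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        String resolve() {
            if (target == null) {
                target = (kind == InvokeKind.SPECIAL ? "direct:" : "vtable:") + name;
            }
            return target;
        }
    }
}
```

The lumped entry has to reserve both slots and dispatch on the invoking instruction every time; the split entry knows its one use up front.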
Why would we even consider such a change? Because right now we have to add
more usage request types to cover value types, and eventually
templates/generics. In today's EG meeting I was advocating pushing harder on
the current model of fewer, more overloaded constants for Valhalla, because
that's what we do today. Then we collectively realized that constant
overloading has always been such a royal pain that nobody wants to keep doing
it. So, for the purpose of argument, we pivoted toward the other extreme, in a
brief discussion of a hyper-split CP with basically one constant per
instruction type.
(I think, as a matter of design esthetic, Java tends to lump more than it
splits. The original decision in CP design, to lump more functions onto fewer
CP entries, made a superficially simpler constant pool. It has been a burden
on JVM implementors, who agonize even to this day over how to make CP a random
access data structure with element types of widely varying size.)
This idea of CP splitting is food for thought which (I think) can help us
settle more confidently on a fair compromise in a real release.
This design approach prompts us to consider, in the nearer term, a few new CP
types, incrementally added to the current design. For example, QType[…] which
derives the Q-mode version from a class artifact:
ldc[ CONSTANT_QType[Class["Foo"]] ]
getfield[ CONSTANT_Fieldref[QType[Class["Foo"]], NameAndType["bar", "I"]] ]
invokespecial[ CONSTANT_Methodref[QType[Class["Foo"]], NameAndType["baz", "()F"]] ]
Or maybe the Q-mode-ness goes into the field or method reference:
ldc[ CONSTANT_Dynamic[[get the Q-type from the L-type], Class["Foo"]] ]
getfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
invokespecial[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]
In any case, mode information (Q vs. L vs. …) is incompressible. What I mean
is that the Q/L distinction has to go somewhere, either the instruction or the
symbolic reference stored in the constant pool. (Symbolic, not resolved, is an
important distinction here. The instruction can always resolve the CP
reference and dip into the runtime bits, but the verifier greatly prefers to
operate on the pre-resolved symbolic references.)
To avoid the extra CP types, we could squeeze all the mode-ish bits up into the
instructions as follows:
vldc[ Class["Foo"] ]
vgetfield[ CONSTANT_Fieldref[Class["Foo"], NameAndType["bar", "I"]] ]
vinvoke[ CONSTANT_Methodref[Class["Foo"], NameAndType["baz", "()F"]] ]
But the problem with modal-instructions-plus-nonmodal-constants is that each
constant has to be prepared to be resolved in several modes. (Lumping
constants means more resolution information per constant.) Splitting the
constants allows (though does not require) nonmodal constants.
As noted in the meeting, a possible simplification of Minimal Value Types is that we
don't need to overload Class since we could easily have different names running
around: Class["Foo"] vs. Class["Foo$DVT"] or the like. That means that we
could, for the moment, continue to overload Fieldref and Methodref, as long as
each is only used in exactly one mode (to be hashed out at link-time). But
that's only a short-term help, not a long term design.
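Here is one way the "exactly one mode, hashed out at link-time" rule might look, as a hedged Java sketch. The names SingleModeRefSketch, Mode, and linkAs are invented for illustration, and the "$DVT" suffix test simply assumes the naming convention floated above; the real convention could be something else entirely.

```java
// Hypothetical sketch. The "$DVT" suffix and all names here are illustrative assumptions.
final class SingleModeRefSketch {
    enum Mode { L, Q }

    // Assumed convention: the derived value type gets a distinct, suffixed class name.
    static Mode modeOfClassName(String className) {
        return className.endsWith("$DVT") ? Mode.Q : Mode.L;
    }

    // An overloaded Fieldref that tolerates exactly one mode, fixed at first linkage.
    static final class Fieldref {
        final String owner, name, descriptor;
        private Mode linkedMode;  // null until first linked

        Fieldref(String owner, String name, String descriptor) {
            this.owner = owner;
            this.name = name;
            this.descriptor = descriptor;
        }

        // The first linkage pins the mode; a later use in the other mode is rejected.
        void linkAs(Mode mode) {
            if (linkedMode == null) {
                linkedMode = mode;
            } else if (linkedMode != mode) {
                throw new IllegalStateException(
                        owner + "." + name + " already linked in mode " + linkedMode);
            }
        }

        Mode linkedMode() { return linkedMode; }
    }
}
```

Because the mode is pinned once, the entry needs only one set of resolution data, which is the short-term saving; the check itself is the price of keeping Fieldref overloaded.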
— John
P.S. We could also duplicate the mode information in both CP and instruction:
vgetfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
vinvoke[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]
…Thus starting down Overkill Road, which leads to Crazytown:
vgetfield[ CONSTANT_VFieldref[QType[Class[";QFoo;"]], VNameAndType["bar", "I"]] ]
P.P.S. If we go with modey CP constants, I think we need to admit, as a
concession to the legacies of history, that the CONSTANT_Class guy will forever
denote an L-mode type (unless we do LType[Class["Foo"]]?) and we will need a
different CP constant (and maybe even condy for ldc) to refer to its Q-type or
U-type.
P.P.P.S. If we tried to do all of the above CP splitting for real we'd be
breaking so much glass that we'd feel compelled to address other design points,
such as heterogeneous CPs (another not-so-good legacy), a limit of two
components per CP entry, and of course the 16-bit limit. Dealing with all of
that at once will be a tarpit, and we're already too busy doing important
stuff. So file this note also under "Hard decisions to make when our
grandchildren revamp the whole class-file format."