This message needs an "Impractical Content" warning, but I want to log an 
interesting line of thought raised today in our Valhalla meeting with IBM.  It 
actually is practical to think about, as a mental model.

We could, if we chose to do it at some far-future date, clarify the roles of CP 
entries by splitting them as thoroughly as possible into functionally distinct 
types.

Here's the idea:

Revamp CP to clearly distinguish (a) artifact references from (b) usage 
requests.

Artifact references are named class-files and named parts thereof:   
Class[ref], Fieldref, Methodref.

Usage requests are specific operations which may be performed on those named 
entities:  InvokeStatic, InvokeSpecial, InvokeVirtual, InvokeInterface, 
InvokeOnValue, GetField, GetStatic, PutField, PutStatic, WithField.

Linking an artifact reference loads the artifact, which implements some usage 
requests but not all.

Linking a usage request verifies that the usage is well-formed and caches (on 
the usage CP entry, not the artifact CP entry) whatever bits help the 
instruction go fast after linkage.

(Indy/Condy don't work directly on named artifacts, so they are off to the 
side.)
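
As a rough illustration of that split, here is a minimal Java model. Every name in it (SplitCpSketch, UsageRequest, link, and so on) is made up for this sketch and corresponds to nothing in the class-file format or in any JVM. The point it tries to show is only that the linkage cache sits on the usage entry, so two different usages of the same Methodref never share resolution state.

```java
/** Hypothetical model of a split constant pool; none of these names exist in the JVM spec. */
final class SplitCpSketch {
    /** (a) Artifact references: a named class file or a named part of one. */
    interface ArtifactRef {}
    record ClassRef(String name) implements ArtifactRef {}
    record FieldRef(ClassRef owner, String name, String descriptor) implements ArtifactRef {}
    record MethodRef(ClassRef owner, String name, String descriptor) implements ArtifactRef {}

    /** (b) Usage requests: one kind per operation listed in the note. */
    enum UsageKind {
        INVOKE_STATIC, INVOKE_SPECIAL, INVOKE_VIRTUAL, INVOKE_INTERFACE, INVOKE_ON_VALUE,
        GET_FIELD, GET_STATIC, PUT_FIELD, PUT_STATIC, WITH_FIELD;

        boolean isInvoke() { return name().startsWith("INVOKE_"); }
    }

    /** A usage request points at one artifact and owns its own linkage cache. */
    static final class UsageRequest {
        final UsageKind kind;
        final ArtifactRef target;
        private Object linkageCache;  // filled in once, on this entry only

        UsageRequest(UsageKind kind, ArtifactRef target) {
            // Well-formedness check: invokes need a method, field operations need a field.
            boolean ok = kind.isInvoke() ? target instanceof MethodRef : target instanceof FieldRef;
            if (!ok) throw new IllegalArgumentException(kind + " cannot use " + target);
            this.kind = kind;
            this.target = target;
        }

        /** Stand-in for linkage: caches a placeholder string, not a real vtable index or offset. */
        Object link() {
            if (linkageCache == null) linkageCache = "linked:" + kind + ":" + target;
            return linkageCache;
        }
    }
}
```

Building an INVOKE_SPECIAL request and an INVOKE_VIRTUAL request over the same MethodRef gives two entries with two independent caches, while the MethodRef itself stays free of per-usage state.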

It's safe to say we will never do all of this, but it helps, I think, as a 
mental framework when considering all the funky overloading inside today's CP.

(But, those request types do look a lot like MethodHandle constants.  Funny…)

The worst overloading in the CP is the need to store double resolution 
information on an [Interface]Methodref in case it has to handle both 
invokespecial and invoke[virtual,interface].  But there are also lots of little 
status bits to support dynamic checks of {get,put}{static,field}.  The J9 guys 
commented that the double-resolution thing is familiar to them.

Lots of that implementation noise would drop away if there were enough CP 
entries to go around for each distinct type of reference.
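
To make the double-resolution burden concrete, here is a small Java sketch of an overloaded Methodref entry. It is an assumption-laden toy: the class, its fields, and the "direct:"/"dispatched:" strings are invented for illustration and are not how HotSpot or J9 lay out their constant pool caches.

```java
/** Hypothetical illustration; names are invented, not taken from HotSpot or J9. */
final class OverloadedMethodrefSketch {
    /** One overloaded entry must hold a result for each way it may be used. */
    static final class OverloadedMethodref {
        final String owner;
        final String name;
        final String descriptor;
        private String specialTarget;  // result when reached via invokespecial
        private String virtualTarget;  // result when reached via invokevirtual/invokeinterface

        OverloadedMethodref(String owner, String name, String descriptor) {
            this.owner = owner;
            this.name = name;
            this.descriptor = descriptor;
        }

        /** The instruction kind decides which of the two slots is filled and read. */
        String resolve(boolean viaInvokespecial) {
            String member = owner + "." + name + descriptor;
            if (viaInvokespecial) {
                if (specialTarget == null) specialTarget = "direct:" + member;
                return specialTarget;
            }
            if (virtualTarget == null) virtualTarget = "dispatched:" + member;
            return virtualTarget;
        }

        /** Counts how many resolution slots this single entry currently carries. */
        int resolutionSlotsInUse() {
            return (specialTarget != null ? 1 : 0) + (virtualTarget != null ? 1 : 0);
        }
    }
}
```

With one CP entry per kind of usage, each entry would need only a single slot and the boolean parameter would disappear.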

Why would we even consider such a change?  Because right now we have to add 
more usage request types to cover value types, and eventually 
templates/generics.  In today's EG meeting I was advocating pushing harder on 
the current model of fewer, more overloaded constants for Valhalla, because 
that's what we do today.  Then we collectively realized that constant 
overloading has always been such a royal pain that nobody wants to keep doing 
it.  So, for the purpose of argument, we pivoted toward the other extreme, in a 
brief discussion of a hyper-split CP with basically one constant per 
instruction type.

(I think, as a matter of design esthetic, Java tends to lump more than it 
splits.  The original decision in CP design, to lump more functions onto fewer 
CP entries, made a superficially simpler constant pool.  It has been a burden 
on JVM implementors, who agonize to this day over how to make the CP a 
random-access data structure with element types of widely varying size.)

This idea of CP splitting is food for thought which (I think) can help us 
settle more confidently on a fair compromise in a real release.

This design approach prompts us to consider, in the nearer term, a few new CP 
types, incrementally added to the current design.  For example, QType[…] which 
derives the Q-mode version from a class artifact:

  ldc[ CONSTANT_QType[Class["Foo"]] ]
  getfield[ CONSTANT_Fieldref[QType[Class["Foo"]], NameAndType["bar", "I"]] ]
  invokespecial[ CONSTANT_Methodref[QType[Class["Foo"]], NameAndType["baz", "()F"]] ]

Or maybe the Q-mode-ness goes into the field or method reference:

  ldc[ CONSTANT_Dynamic[[get the Q-type from the L-type], Class["Foo"]] ]
  getfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
  invokespecial[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]

In any case, mode information (Q vs. L vs. …) is incompressible.  What I mean 
is that the Q/L distinction has to go somewhere, either the instruction or the 
symbolic reference stored in the constant pool.  (Symbolic, not resolved, is an 
important distinction here.  The instruction can always resolve the CP 
reference and dip into the runtime bits, but the verifier greatly prefers to 
operate on the pre-resolved symbolic references.)

To avoid the extra CP types, we could squeeze all the mode-ish bits up into the 
instructions as follows:

  vldc[ Class["Foo"] ]
  vgetfield[ CONSTANT_Fieldref[Class["Foo"], NameAndType["bar", "I"]] ]
  vinvoke[ CONSTANT_Methodref[Class["Foo"], NameAndType["baz", "()F"]] ]

But the problem with modal-instructions-plus-nonmodal-constants is that each 
constant has to be prepared to be resolved in several modes.  (Lumping 
constants means more resolution information per constant.)  Splitting the 
constants allows (though does not require) nonmodal constants.
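
One way to see that the mode bit is "incompressible" is to write down a function that recovers it from purely symbolic data, wherever it was stored. The Java sketch below does that for the three placements discussed above. The types, the boolean flags, and the modeOf helper are all hypothetical stand-ins for this note's notation (QType[...], CONSTANT_VFieldref, vgetfield), and the sketch assumes the modal instruction carries a plain, unwrapped Fieldref.

```java
/** Hypothetical sketch; the flags below stand in for this note's notation only. */
final class ModePlacementSketch {
    enum Mode { L, Q }

    /** Symbolic class reference; qWrapped models a QType[Class[...]] wrapper. */
    record ClassSym(String name, boolean qWrapped) {}

    /** Symbolic field reference; vRef models a CONSTANT_VFieldref. */
    record FieldSym(ClassSym owner, String name, String descriptor, boolean vRef) {}

    /** Reads the mode from symbolic data only; nothing is resolved or loaded. */
    static Mode modeOf(String opcode, FieldSym ref) {
        boolean inInstruction = opcode.equals("vgetfield");
        boolean inConstant = ref.vRef() || ref.owner().qWrapped();
        return (inInstruction || inConstant) ? Mode.Q : Mode.L;
    }
}
```

Whichever placement is chosen, exactly one of those symbolic inputs has to carry the bit; dropping it from all of them leaves a verifier-style reader like modeOf unable to tell Q from L without resolving the reference.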

As noted in the meeting, a possible simplification for Minimal Value Types is 
that we don't need to overload Class, since we could easily have different 
names running around:  Class["Foo"] vs. Class["Foo$DVT"] or the like.  That 
means that we could, for the moment, continue to overload Fieldref and 
Methodref, as long as each is only used in exactly one mode (to be hashed out 
at link-time).  But that's only a short-term help, not a long-term design.

— John

P.S.  We could also duplicate the mode information in both CP and instruction:

  vgetfield[ CONSTANT_VFieldref[Class["Foo"], NameAndType["bar", "I"]] ]
  vinvoke[ CONSTANT_VMethodref[Class["Foo"], NameAndType["baz", "()F"]] ]

…Thus starting down Overkill Road, which leads to Crazytown:

  vgetfield[ CONSTANT_VFieldref[QType[Class[";QFoo;"]], VNameAndType["bar", "I"]] ]

P.P.S. If we go with modey CP constants, I think we need to admit, as a 
concession to the legacies of history, that the CONSTANT_Class guy will forever 
denote an L-mode type (unless we do LType[Class["foo"]]?) and we will need a 
different CP constant (and maybe even condy for ldc) to refer to its Q-type or 
U-type.

P.P.P.S. If we tried to do all of the above CP splitting for real we'd be 
breaking so much glass that we'd feel compelled to address other design points, 
such as heterogeneous CPs (another not-so-good legacy), a limit of two 
components per CP entry, and of course the 16-bit limit.  Dealing with all of 
that at once will be a tarpit, and we're already too busy doing important 
stuff.  So file this note also under "Hard decisions to make when our 
grandchildren revamp the whole class-file format."
