Here’s a summary of the story we came up with for erased generics over values. It builds on the typing story outlined in John’s “Q-Types in L-World” writeup.

   Background

In MVT, there were separate carrier types (Q and L) for values and references. The Q carriers were not nullable, and an explicit conversion was required between L and Q types. This offered perfect nullity information to the JVM, but little ability to abstract over both values and references. This showed up in lots of places:

 * Values themselves could not implement interfaces, only their
   companion box type could.
 * Could not operate on values with |a*| instructions.
 * Could not store values in |Object| variables.

Each of these conflicted with the desire for genericity (whether specialized or erased). In Q-world, we couldn’t have erased generics over values, because erased code could not operate on values for multiple reasons (wrong carriers, wrong bytecodes). And looking ahead to specialized generics, having separate bytecodes for references and values increased the complexity of the specialization transform.

L-World started out as an experiment to validate the following hypothesis:

   We can, without significant performance compromises, reuse the |L|
   carrier and |a*| bytecodes for value types, and allow them to be
   proper subtypes of their interfaces (and |Object|).

In LW1, we took a hybrid approach; nullability and flattenability became properties of the variable (field, array element, stack slot, local variable), rather than properties of the type itself. This meant we could be tolerant of nulls in local variables and stack slots, only rejecting nulls when they hit the heap, and we then relied on the translation strategy to insert null checks to prevent introduction of nulls. This allowed value-oblivious code to act as a “conduit” for values between value-aware code and a value-aware heap layout.

One of the conclusions of the LW1 experiment was that the JIT very much wants better nullity information than this; without enhancing our ability to prove that a given use of an L-type is null-free, we cannot fully optimize calling conventions.


       Erased generics in LW1

One of the motivations for L-World was that using L-carriers for values would likely better interoperate with generics by providing a common carrier and by allowing the |a*| bytecodes to operate uniformly on both values and references. However, the assumption that a type variable |T| is nullable interacts poorly here; given that values are proper L-types in LW1, it seems tempting to allow users to parameterize erased generics with values:

|List<Point> points = new ArrayList<Point>(); |

with the compiler translating |Point| as |LPoint|. Existing erased generics would “just work” … mostly. However, there are several sharp edges:

 * There are some API points that deliberately use nulls as sentinels,
   such as returning |null| from |Map::get| to signal that the key is
   not in the map. This would then NPE in the compiler-inserted null
   check when we tried to assign the result of |get(k)| to a
   value-typed |V|.
 * Some generic classes may accidentally try to convert the default
   (null) value of an uninitialized |Object| field or |Object[]| array
   element to a |T|, which would again NPE when it crossed the boundary
   from erased generic code to value-aware code.
 * If value arrays are subtypes of |Object[]|, and a |V[]| is passed to
   code that expects an |Object[]|, an attempt to store a null in that
   array would NPE.
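
The first and third sharp edges have close analogs in today's Java, which may make them easier to picture. The sketch below is only an analogy, not LW1 code: |Integer| stands in for the erased, nullable |T|, |int| for the value-typed variable, and an |Integer[]| viewed as |Object[]| for a |V[]| viewed as |Object[]|. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class SharpEdgesAnalogy {
    // Analog of the Map::get edge: get() uses null as a "not present" sentinel,
    // and the implicit conversion to a non-nullable type throws NPE.
    public static boolean unboxingMissingKeyThrows() {
        Map<String, Integer> counts = new HashMap<>();
        try {
            int n = counts.get("missing"); // null is auto-unboxed here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Analog of the array edge: code that only sees Object[] performs a store
    // the real array cannot accept. Today this is an ArrayStoreException for a
    // wrong-typed element; in LW1 it would be an NPE for a null stored into a V[].
    public static boolean storeThroughCovariantViewThrows() {
        Object[] objects = new Integer[1];
        try {
            objects[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unboxingMissingKeyThrows());        // prints true
        System.out.println(storeThroughCovariantViewThrows()); // prints true
    }
}
```

In both cases the failure surfaces far from the code that made the (perfectly legal, from its point of view) decision to use null or to store through the wider array type, which is what makes these edges sharp.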


       The return of the Q

MVT had explicit L-types and Q-types; LW1 has only L-types, relying on the |ValueTypes| attribute to determine whether a given L-type describes a value or not.

In LW2, we will back off slightly from this unification, so as to provide the VM with end-to-end information about the flow of values and their potential nullity; for a given value class |V|, one can denote both the non-nullable type |QV;| and the nullable type |LV;|, where |QV <: LV|. The value set of |LV| is that of |QV|, plus |null|; both share the |L| carrier. No conversion is needed from |QV| to |LV|; |checkcast| and |instanceof| perform a null check when converting from |LV| to |QV|. |Q*| fields and array elements will be flattenable; |L*| will not. (As a side benefit, the |ValueTypes| attribute is no longer needed, as descriptors fully capture their nullability constraints.)

This gives language compilers some options; we can translate uses of a value type to either the |L| or |Q| variants. Of course, we don’t want to blindly translate value types as L-types, as this would sacrifice the main goal (flattenability), but we could use them where values meet erased generics.


   Meet the new box (not the same as the old box)

Essentially, the L-value types can be thought of as the “new boxes”, serving the interop role that primitive boxes do. (Fortunately, they are cheaper than primitive boxes; the boxing conversion is a no-op, and the unboxing conversion is a null check.)

Just as the JVM wants to be able to separately denote “non-nullable value” and “nullable value”, so does the language. In general, we want for values to be non-nullable, but there are exceptions:

 * When dealing with erased generic code, since an erased type variable
   |T| is nullable;
 * When dealing with legacy code involving a value-based class that is
   migrated to a value type, since existing code may treat it as nullable.

So, let’s say that a value class |V| gives rise to two types: |V.Val| and |V.Box|. The former translates to |QV|; the latter to |LV|. The former is non-nullable; the latter is nullable. And there exists a boxing/unboxing conversion between them, just like with |int| and |Integer| — but in this case, the cost of the “boxing” conversion is much lower.
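
As a rough model of how cheap these conversions are, the sketch below writes them out by hand in today's Java. It is an assumption-laden illustration, not the proposed translation: |Point| here is an ordinary final class standing in for a value class, and |box| and |unbox| are hypothetical helpers for what the compiler would emit implicitly.

```java
import java.util.Objects;

public class LoxConversionSketch {
    // Hypothetical stand-in for a value class; an ordinary final class today.
    public static final class Point {
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Point.Val -> Point.Box: both types share the L carrier, so this is the identity.
    public static Point box(Point val) {
        return val;
    }

    // Point.Box -> Point.Val: the only work is rejecting null.
    public static Point unbox(Point box) {
        return Objects.requireNonNull(box, "null cannot be converted to Point.Val");
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(box(p) == p);        // prints true: no allocation, same reference
        System.out.println(unbox(box(p)) == p); // prints true
        try {
            unbox(null);
        } catch (NullPointerException e) {
            System.out.println("NPE when unboxing null");
        }
    }
}
```

Contrast this with |Integer.valueOf(int)|, which may allocate, and |Integer::intValue|, which must load from the heap object.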


       Erased generics over boxes

Now, erased generics fall out for free: we just require clients to generify over the box type. This is no different from how we deal with primitives today — generify over the box. “Works like an int.”

|ArrayList<Integer> ints = new ArrayList<>();
ArrayList<Point.Box> points = new ArrayList<>(); |

Since |V.Box| is nullable, we have no problem with returning null from |Map::get|.


       Migration considerations

Nullability also plays into migration concerns. A baseline goal is that migrating a value-based class to a value type, or migrating an erased generic class to specialized, should be source and binary compatible. That means, we don’t want to perturb the meaning of |Foo<V>| in clients or subtypes when either |V| or |Foo| migrates.

For existing value-based classes, such as |LocalDate|, there are plenty of existing locutions such as:

|LocalDate d = null;
if (d == null) { ... }
ArrayList<LocalDate> dates = ...
dates.add(null); |

If we want migration of |LocalDate| to a value type to be source compatible, then this constrains us to translate |LocalDate| to |LLocalDate| forever. This suggests that |LocalDate| should be an alias for |LocalDate.Box|; otherwise the meaning of existing code would change.

On the other hand, for a newly written value type which was never a VBC, we want the opposite. If |Point| is not an alias for |Point.Val|, users will have to say |Point.Val| everywhere they want flattening, which is cumbersome and easy to forget. And since flattening is the whole point of value types to begin with, this seems like it would be letting the migration tail wag the dog.

Taken together, this means we want some sort of contextual decision as to whether to interpret |Foo| as |Foo.Box| or |Foo.Val|. This could be based on the provenance of |Foo| (was it migrated from a VBC or not), or could be some sort of aliased import (|import Foo as Foo.Val|).

The declaration-site approach seems preferable to me; it gives the author of the class the choice of which face of their class to present to the world. Classes for which migration compatibility is of primary concern (e.g., |Optional|) get compatibility at the cost of biasing towards boxing; those for which flattening is of primary concern (e.g., |Complex|) get flattening at the cost of compatibility. In this approach, we put the pain on clients of migrated classes — they have to take an extra step to get flattening. In the long run, there will likely be more born-as-value classes than migrated-to-value classes, so this seems the right place to put the pain.

Note that the |Box| syntactic convention scales nicely to type variables; we can write specialized generic code like:

|<T> T.Box box(T t) { } |

Whatever syntactic convention we use (e.g., |T?|) would want to have similar behavior. (Another consideration in the choice of denotation is the number of potential type operators we may need. Our work in specialized generics suggests there may be at least a few more coming.)


       Primitives as values — a sketch

We would like a path to treating primitives and values uniformly, especially as we get to specialized generics; we don’t want to have to deal with the 1-slot vs 2-slot distinction when we specialize, nor do we want to deal with using |iload| vs |aload|.

We can extend our lightweight boxing approach to allow us to heal the primitive/value divide. For each of our primitive types, we hand-code (or generate) a value class:

|value class IntWrapper { int x; } |

We introduce a bidirectional conversion between |int| and |IntWrapper|; when the user goes to generify over |int|, we instead generify over |IntWrapper|, and add appropriate conversions at the boundary (like the casts we currently insert in erased generics.) We can then translate |int.Box| as |LIntWrapper;|, and we can support erased generics over the lighter value-boxes rather than the heavy legacy boxes.
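
A hand-desugared sketch of what this might look like, written in today's Java so that it runs: |IntWrapper| is an ordinary final class standing in for the value class, and |wrap| and |unwrap| are hypothetical names for the conversions the compiler would insert at the boundary. The user-level source is assumed to be something like a sum over a list of |int|.

```java
import java.util.ArrayList;
import java.util.List;

public class PoxSketch {
    // Hypothetical wrapper for int; under the proposal this would be a value class.
    public static final class IntWrapper {
        public final int x;

        public IntWrapper(int x) {
            this.x = x;
        }
    }

    // int -> IntWrapper, inserted on the way into generic code.
    public static IntWrapper wrap(int i) {
        return new IntWrapper(i);
    }

    // IntWrapper -> int, inserted on the way out of generic code.
    public static int unwrap(IntWrapper w) {
        return w.x;
    }

    // Generic code sees only IntWrapper; the primitive appears at the edges.
    public static int sum(List<IntWrapper> ints) {
        int total = 0;
        for (IntWrapper w : ints) {
            total += unwrap(w);
        }
        return total;
    }

    public static void main(String[] args) {
        List<IntWrapper> ints = new ArrayList<>();
        ints.add(wrap(1));
        ints.add(wrap(2));
        ints.add(wrap(3));
        System.out.println(sum(ints)); // prints 6
    }
}
```

In this stand-in, |wrap| allocates; the point of making the wrapper a value class is that the real conversion would not need to.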

Unfortunately, now we have three types that perform the role of boxes: the new value wrappers like |IntWrapper|, their lightweight box type |IntWrapper.Box|, and the legacy heavy box |java.lang.Integer|. To keep our boxes straight, we could call them:

 * Box — the legacy heavy box (|java.lang.Integer| and friends)
 * Lox — the new lightweight value boxes (L-types of value classes)
 * Pox — the primitive wrapper value classes

So |X.Box| denotes the lox for values (and probably X itself for reference classes), and the lox-of-a-pox for primitives (making it total). When we get to specialized generics and instantiate a generic class with a primitive type, we silently wrap the primitive in its pox on the way in; the pox is a value, so we’ve reduced generics-over-primitives to generics-over-values. This is a huge complexity reducer for the specializer, as it need not deal with the fact that |long| and |double| take two slots, or with changing |a*| bytecodes to the corresponding primitive bytecodes.

It is an open question how aggressively we can deprecate or denigrate the legacy boxes (probably not much, but hope springs eternal.)


   Open issues

There were a few issues we left for further study; more on these to follow.

*Array covariance*. There was some degree of discomfort in pushing array covariance into |aaload| now. When we have specialized generics, we’ll be able to handle this through interfaces; it seems a shame to permanently weigh down intrinsic array access with potential megamorphism. We’re going to try to avoid plunking for array covariance now, and see how painful that is.

*Equality.* There was some discomfort with the user-model consequences of disallowing |==| on values. It is likely that we’d translate |val==| as a substitutability test; if we’re going to do that, it’s not obvious whether we shouldn’t just lump this on |acmp|.
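
To make “substitutability test” concrete, here is a hand-written sketch for one hypothetical class, assuming the usual reading that two values are substitutable when they have the same class and equal state. |Point| is again an ordinary final class standing in for a value class, and |substitutable| is an illustrative name, not a proposed API; the VM would presumably do this generically rather than per class.

```java
public class SubstitutabilitySketch {
    // Hypothetical stand-in for a value class with two int fields.
    public static final class Point {
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // A possible meaning of == on values: compare state, not identity.
    public static boolean substitutable(Point a, Point b) {
        if (a == b) {
            return true;  // same reference, or both null
        }
        if (a == null || b == null) {
            return false; // exactly one side is null
        }
        // Point is final, so both operands have the same class; compare fields.
        return a.x == b.x && a.y == b.y;
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(p == q);              // prints false: distinct objects today
        System.out.println(substitutable(p, q)); // prints true: same state
    }
}
```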

*Locking.* The same generics-reuse arguments we made for nullability support could also be applied to locking on loxes. No one really likes the idea of supporting locking here, but just as surprise NPEs were a sharp edge, surprise IMSEs might be as well.

*Construction.* We have not yet outlined either the language-level construction constraints or the translation of constructors to bytecode.
