Values and erased generics

Brian Goetz Fri, 05 Oct 2018 08:17:17 -0700

Here’s a summary of the story we came up with for erased generics over values. It builds on the typing story outlined in John’s “Q-Types in L-World” writeup.


   Background

In MVT, there were separate carrier types (Q and L) for values and references. The Q carriers were not nullable, and an explicit conversion was required between L and Q types. This offered perfect nullity information to the JVM, but little ability to abstract over both values and references. This showed up in lots of places:


 * Values themselves could not implement interfaces, only their
   companion box type could.
 * Could not operate on values with |a*| instructions.
 * Could not store values in |Object| variables.

Each of these conflicted with the desire for genericity (whether specialized or erased). In Q-world, we couldn’t have erased generics over values, because erased code could not operate on values for multiple reasons (wrong carriers, wrong bytecodes). And looking ahead to specialized generics, having separate bytecodes for references and values increased the complexity of the specialization transform.


L-World started out as an experiment to validate the following hypothesis:

   We can, without significant performance compromises, reuse the |L|
   carrier and |a*| bytecodes for value types, and allow them to be
   proper subtypes of their interfaces (and |Object|).

In LW1, we took a hybrid approach; nullability and flattenability became properties of the variable (field, array element, stack slot, local variable), rather than a property of the type itself. This meant we could be tolerant of nulls in local variables and stack slots, only rejecting nulls when they hit the heap, and we then relied on the translation strategy to insert null checks to prevent introduction of nulls. This allowed value-oblivious code to act as a “conduit” for values between value-aware code and a value-aware heap layout.

One of the conclusions of the LW1 experiment was that the JIT very much wants better nullity information than this; without enhancing our ability to prove that a given use of an L-type is null-free, we cannot fully optimize calling conventions.



       Erased generics in LW1

One of the motivations for L-World was that using L-carriers for values would likely better interoperate with generics by providing a common carrier and by allowing the |a*| bytecodes to operate uniformly on both values and references. However, the assumption that a type variable |T| is nullable interacts poorly here; given that values are proper L-types in LW1, it seems tempting to allow users to parameterize erased generics with values:


|List<Point> points = new ArrayList<Point>(); |

with the compiler translating |Point| as |LPoint|. Existing erased generics would “just work” … mostly. However, there are several sharp edges:


 * There are some API points that deliberately use nulls as sentinels,
   such as returning |null| from |Map::get| to signal that the key is
   not in the map. This would then NPE in the compiler-inserted null
   check when we tried to assign the result of |get(k)| to a
   value-typed |V|.
 * Some generic classes may accidentally try to convert the default
   (null) value of an uninitialized |Object| field or |Object[]| array
   element to a |T|, which would again NPE when it crossed the boundary
   from erased generic code to value-aware code.
 * If value arrays are subtypes of |Object[]|, and a |V[]| is passed to
   code that expects an |Object[]|, an attempt to store a null in that
   array would NPE.
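
The second hazard has a close analogue in today's Java, which can be run as-is: an erased container backed by an |Object[]| hands out the default null of a never-written slot, and the failure only surfaces where the result crosses into a non-nullable type. In the sketch below, |Integer| and |int| stand in for a nullable L-type and a non-nullable value type; the |Slots| class and its methods are hypothetical names made up for illustration.

```java
// Hypothetical erased container. Integer/int stand in for a nullable L-type
// and a non-nullable value type, which today's Java cannot express directly.
final class Slots<T> {
    private final Object[] elems;

    Slots(int size) { elems = new Object[size]; }   // every slot starts out null

    void set(int i, T t) { elems[i] = t; }

    @SuppressWarnings("unchecked")
    T get(int i) { return (T) elems[i]; }            // may hand out the default null

    // Returns true if reading slot i into a non-nullable int fails with an NPE.
    static boolean readFails(Slots<Integer> s, int i) {
        try {
            int v = s.get(i);                        // unboxing plays the role of the null check
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Slots<Integer> s = new Slots<>(2);
        s.set(0, 42);
        System.out.println(readFails(s, 0));         // slot that was written
        System.out.println(readFails(s, 1));         // slot that was never written
    }
}
```

The generic class itself is perfectly well behaved by erased-generics standards; the exception is thrown in the client, far from the code that produced the null.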


       The return of the Q

MVT had explicit L-types and Q-types; LW1 has only L-types, relying on the |ValueTypes| attribute to determine whether a given L-type describes a value or not.

In LW2, we will back off slightly from this unification, so as to provide the VM with end-to-end information about the flow of values and their potential nullity; for a given value class |V|, one can denote both the non-nullable type |QV;| and the nullable type |LV;|, where |QV <: LV|. The value set of |LV| is that of |QV|, plus |null|; both share the |L| carrier. No conversion is needed from |QV| to |LV|; |checkcast| and |instanceof| perform a null check when converting from |LV| to |QV|. |Q*| fields and array elements will be flattenable; |L*| will not. (As a side benefit, the |ValueTypes| attribute is no longer needed, as descriptors fully capture their nullability constraints.)

This gives language compilers some options; we can translate uses of a value type to either the |L| or |Q| variants. Of course, we don’t want to blindly translate value types as L-types, as this would sacrifice the main goal (flattenability), but we could use them where values meet erased generics.



   Meet the new box (not the same as the old box)

Essentially, the L-value types can be thought of as the “new boxes”, serving the interop role that primitive boxes do. (Fortunately, they are cheaper than primitive boxes; the boxing conversion is a no-op, and the unboxing conversion is a null check.)
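
At the source level, the two conversions can be modeled roughly as follows. This is only a sketch in today's Java: an ordinary final class |Point| stands in for a value class, and the helper names |toBox| and |toVal| are hypothetical, not proposed API.

```java
import java.util.Objects;

// Stand-in for a value class; real value types are not expressible in today's Java.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class LoxConversions {
    // "Boxing" (Q to L): both share the same carrier, so there is nothing to do.
    static Point toBox(Point val) { return val; }

    // "Unboxing" (L to Q): the only work is rejecting null.
    static Point toVal(Point box) {
        return Objects.requireNonNull(box, "null is not a value");
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        System.out.println(toVal(toBox(p)) == p);   // the round trip allocates nothing
        try {
            toVal(null);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing null");
        }
    }
}
```

Contrast this with |Integer|, where boxing may allocate and unboxing has to load a field from the box.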

Just as the JVM wants to be able to separately denote “non-nullable value” and “nullable value”, so does the language. In general, we want values to be non-nullable, but there are exceptions:


 * When dealing with erased generic code, since an erased type variable
   |T| is nullable;
 * When dealing with legacy code involving a value-based class that is
   migrated to a value type, since existing code may treat it as nullable.



       Erased generics over boxes

Now, erased generics fall out for free: we just require clients to generify over the box type. This is no different from how we deal with primitives today: generify over the box. “Works like an int.”

|ArrayList<Integer> ints = new ArrayList<>();
ArrayList<Point.Box> points = new ArrayList<>(); |

Since |V.Box| is nullable, we have no problem with returning null from |Map::get|.
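
The existing primitive boxes already behave this way, which is the analogy being drawn. In the sketch below (runnable today, with |Integer| standing in for a |V.Box|; the |lookup| helper is a hypothetical name), the null sentinel is harmless as long as the result stays in the nullable box type, and the client chooses where to cross into the non-nullable type.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxedLookup {
    // Crosses from the nullable box to the non-nullable type explicitly,
    // supplying a fallback instead of letting the unboxing conversion throw.
    static int lookup(Map<String, Integer> m, String key, int fallback) {
        Integer boxed = m.get(key);     // null means "absent", which a box type can hold
        return (boxed != null) ? boxed : fallback;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        System.out.println(lookup(m, "a", -1));   // key present
        System.out.println(lookup(m, "b", -1));   // key absent, fallback used
    }
}
```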



       Migration considerations

Nullability also plays into migration concerns. A baseline goal is that migrating a value-based class to a value type, or migrating an erased generic class to specialized, should be source and binary compatible. That means we don’t want to perturb the meaning of |Foo<V>| in clients or subtypes when either |V| or |Foo| migrates.

For existing value-based classes, such as |LocalDate|, there are plenty of existing locutions such as:

|LocalDate d = null;
if (d == null) { ... }
ArrayList<LocalDate> dates = ...
dates.add(null); |

For such code to keep compiling and running after migration, a migrated class name has to keep denoting the nullable type. On the other hand, for a newly written value type which was never a value-based class (VBC), we want the opposite. If |Point| is not an alias for |Point.Val|, users will have to say |Point.Val| everywhere they want flattening, which is cumbersome and easy to forget. And since flattening is the whole point of value types to begin with, this seems like it would be letting the migration tail wag the dog.

Taken together, this means we want some sort of contextual decision as to whether to interpret |Foo| as |Foo.Box| or |Foo.Val|. This could be based on the provenance of |Foo| (was it migrated from a VBC or not), or could be some sort of aliased import (|import Foo as Foo.Val|).

The declaration-site approach seems preferable to me; it gives the author of the class the choice of which face of their class to present to the world. Classes for which migration compatibility is of primary concern (e.g., |Optional|) get compatibility at the cost of biasing towards boxing; those for which flattening is of primary concern (e.g., |Complex|) get flattening at the cost of compatibility. In this approach, we put the pain on clients of migrated classes: they have to take an extra step to get flattening. In the long run, there will likely be more born-as-value classes than migrated-to-value classes, so this seems the right place to put the pain.

Note that the |Box| syntactic convention scales nicely to type variables; we can write specialized generic code like:


|<T> T.Box box(T t) { } |

Whatever syntactic convention we use (e.g., |T?|) would want to have similar behavior. (Another consideration in the choice of denotation is the number of potential type operators we may need. Our work in specialized generics suggests there may be at least a few more coming.)



       Primitives as values — a sketch

We would like a path to treating primitives and values uniformly, especially as we get to specialized generics; we don’t want to have to deal with the 1-slot vs 2-slot distinction when we specialize, nor do we want to deal with using |iload| vs |aload|.

We can extend our lightweight boxing approach to allow us to heal the primitive/value divide. For each of our primitive types, we hand-code (or generate) a value class:


|value class IntWrapper { int x; } |


 * Box — the legacy heavy box (|java.lang.Integer| and friends)
 * Lox — the new lightweight value boxes (L-types of value classes)
 * Pox — the primitive wrapper value classes

So |X.Box| denotes the lox for values (and probably X itself for reference classes), and the lox-of-a-pox for primitives (making it total). When we get to specialized generics, when we instantiate a generic class with a primitive type, we silently wrap them with their pox on the way in, which is a value, and we’ve reduced generics-over-primitives to generics-over-values. This is a huge complexity reducer for the specializer, as it need not deal with the fact that long and double take two slots, or with changing a* bytecodes to the corresponding primitive bytecodes.
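
As a rough illustration of wrapping with the pox on the way in, here is how the reduction might look if done by hand in today's Java. A final class stands in for the |IntWrapper| value class above; |Cell| and |PoxDemo| are hypothetical names, and the explicit wrapping and unwrapping is an assumption about what the compiler would be doing silently.

```java
// Stand-in for the IntWrapper value class (value classes do not exist in today's Java).
final class IntWrapper {
    final int x;
    IntWrapper(int x) { this.x = x; }
}

// Hypothetical generic class; it only ever sees a reference-shaped T.
final class Cell<T> {
    private T contents;
    Cell(T initial) { contents = initial; }
    T get() { return contents; }
    void set(T t) { contents = t; }
}

final class PoxDemo {
    // What a notional Cell<int> client might desugar to:
    // wrap the primitive going in, unwrap it coming out.
    static int roundTrip(int i) {
        Cell<IntWrapper> cell = new Cell<>(new IntWrapper(i));
        return cell.get().x;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(7));
    }
}
```

The generic class never needs an |int|-specific code path; it only ever handles a single-slot, |a*|-compatible wrapper.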

It is an open question how aggressively we can deprecate or denigrate the legacy boxes (probably not much, but hope springs eternal).



   Open issues

There were a few issues we left for further study; more on these to follow.

*Array covariance*. There was some degree of discomfort in pushing array covariance into |aaload| now. When we have specialized generics, we’ll be able to handle this through interfaces; it seems a shame to permanently weigh down intrinsic array access with potential megamorphism. We’re going to try to avoid plunking for array covariance now, and see how painful that is.

*Equality.* There was some discomfort with the user-model consequences of disallowing |==| on values. It is likely that we’d translate |val==| as a substitutability test; if we’re going to do that, it’s not obvious whether we shouldn’t just lump this on |acmp|.
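
For concreteness, a substitutability test amounts to comparing state rather than identity. The sketch below is an assumption about what such a test would compute, written by hand in today's Java; |Coord| is an ordinary class standing in for a value class, and |substitutable| is a hypothetical helper, not a description of how |acmp| would be implemented.

```java
// Ordinary class standing in for a value class with two int fields.
final class Coord {
    final int x, y;
    Coord(int x, int y) { this.x = x; this.y = y; }

    // Two values are treated as substitutable when their field state is equal.
    // With a single class and only int fields, that reduces to comparing the fields.
    static boolean substitutable(Coord a, Coord b) {
        if (a == null || b == null) return a == b;
        return a.x == b.x && a.y == b.y;
    }

    public static void main(String[] args) {
        Coord p = new Coord(1, 2), q = new Coord(1, 2);
        System.out.println(p == q);                 // identity comparison on today's objects
        System.out.println(substitutable(p, q));    // state comparison
    }
}
```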

*Locking.* The same generics-reuse arguments as we made for nullability support also could be applied to locking on loxes. No one really likes the idea of supporting locking here, but just as surprise NPEs were a sharp edge, surprise IMSEs might be as well.

*Construction.* We have not yet outlined either the language-level construction constraints or the translation of constructors to bytecode.
