Here is a write-up giving an option for nullable value types. This is what we talked about this morning. I had hoped to produce it before the meeting—better late than never.
http://cr.openjdk.java.net/~jrose/values/nullable-values.html

# Nullable Value Types in L-World

#### Or, "Read my lips: No new nulls"

#### Or, "Bottom-lander: There can be only one"

#### John Rose, Brian Goetz, Valhalla Working Group

## Basic Premises

**Null is the default reference:** In Java, all reference types have a common default value, the null reference. This reference appears in uninitialized array elements and field values of such types. The null reference is distinct from any reference that is produced by a `new` expression and/or a constructor call.

**A problem with legacy value classes:** In order for today's value-based classes to migrate from reference types to proper value types, they must retain the property that their default value is the null reference, since the null reference may appear in user code that uses such classes.

**Nullable value classes:** This implies that _some_ value types require the ability to represent a true null reference, as one of the many points in their set of possible values. (This _does not_ imply that any other value of such a type must _also_ be a reference: All other values of the type can and should be proper values.)

**Nullability is rare:** Most value types, such as complex numbers or vectors, must _not_ represent the null reference. In particular, arithmetic value types of size `N` bits may need to assign all `2**N` code points to regular non-null values. For example, a type that emulates `byte` could not give up one of its 256 encodings to encode `null`, and adding an extra hidden bit _to all value types_ would have very large costs. In addition, most value types "work like an int", and require a default value which can accept method calls without throwing `NullPointerException`.

**Not the default:** Nullability must be explicitly selected by the designer of a value type; it is expected to be a rarely used feature, because it is likely to incur extra costs, in space and time, for encoding and decoding the null reference to and from the flattened form of a value class.

**Tweaking the key slogan:** In summary, value types "code like a class and work like an int". But there are a few value types that _also_ want to "work like an `Integer`", in that their default value is the null reference, rather than an appropriate pattern of zero bits.

## User Model

**New keyword:** A value class can be declared with a pseudo-modifier `__Nullable` (TBD; may be just `nullable`), which must be accompanied by the `value` pseudo-modifier. Such a value class is called a _nullable value class_. Other value classes are called _regular value classes_.

**Two variable kinds:** Variables come in two kinds, _heap_ and _stack_. A heap variable is a class field (static or non-static) or an array element. A stack variable is a method parameter or local variable. (Local variables include specially declared names such as a `catch` variable.) Heap variables exhibit type-specific default values. Stack variables do not require a default value convention, because they are subject to definite assignment rules, which require explicitly assigned values.

**Default values:** For a regular value class `RV`, the expression `RV.default` evaluates to a non-null value of `RV` all of whose fields are their respective default values (typically zero, `false`, or `null`). The uninitialized value of a heap variable of type `RV` or `RV.val` is `RV.default`. The uninitialized value of a heap variable of type `RV.box` is `null`, and `RV.box.default` evaluates to `null`.

**Nullable defaults to null:** For a nullable value class `NV`, the expression `NV.default` evaluates to `null` (the null reference). The uninitialized value of a heap variable of type `NV`, `NV.val`, or `NV.box` is `null`.

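
The heap/stack distinction and the two default-value conventions already exist in today's Java, which gives a rough analogy for the rules above. In the sketch below, `int` stands in for a regular value type (`RV.val`) and `Integer` for `RV.box` or a nullable value class `NV`; this mapping is an illustration only, and the class name `HeapDefaultsDemo` is hypothetical.

```java
// Analogy only: int plays the role of a regular value type (RV.val),
// Integer plays the role of RV.box or of a nullable value class NV.
final class HeapDefaultsDemo {
    static int primitiveField;   // heap variable: default is 0
    static Integer boxedField;   // heap variable: default is null

    static int defaultIntElement() {
        int[] elements = new int[1];          // array elements are heap variables
        return elements[0];
    }

    static Integer defaultBoxedElement() {
        Integer[] elements = new Integer[1];
        return elements[0];
    }

    public static void main(String[] args) {
        System.out.println(primitiveField);
        System.out.println(boxedField);
        System.out.println(defaultIntElement());
        System.out.println(defaultBoxedElement());

        int local;   // stack variable: no default value convention
        // System.out.println(local);  // rejected by javac: not definitely assigned
        local = 42;
        System.out.println(local);
    }
}
```
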
**Regular values never null:** Regular value classes can never represent `null` values in their normal unboxed form. (In this, they "work like an int".) Casting a null to a regular value class will throw a `NullPointerException`, just like casting a null `Integer` to `int`. Reflectively storing an untyped null reference to a heap variable of a regular value type will also throw a `NullPointerException`. Loading a heap variable of a regular, non-boxed value class will never produce a null.

**Constructors are null-checked:** In order to keep a clear distinction between the null default value and constructed values, nullable value class constructors have a null check on exit. This means that if constructor code accidentally assigns zero or default values to all the instance fields, the constructor will throw `NullPointerException` rather than return the null instance value. No such check is done for regular value classes, since null values are impossible for them.

**All values are flattenable:** In heap variables, instances of both regular and nullable value classes behave as if they were flattened, and are in fact routinely flattened. This is likely to affect the performance and footprint of programs which use such variables. In stack variables, flattening may or may not happen, depending on how the interpreter or the JIT is directing execution.

**All values are boxable:** All value classes support a "boxed" view which interoperates with `null` and with erased generics. The expression `V.box.default` evaluates to `null` for all value types, both regular and nullable. Heap variables of type `V.box` are _never_ flattened, but as a consolation prize they _can_ receive nulls even for regular `V`. For generics, note that `List<V.box>` is always legal, but `List<V.val>` is currently illegal and reserved for future use, when specialized generics are available.

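
The comparison to casting a null `Integer` to `int` can be run in today's Java. The sketch below is only an analogy, with `int` standing in for a regular value type and `List<Integer>` standing in for `List<V.box>`; the class and method names are hypothetical.

```java
// Analogy in current Java: unboxing a null Integer throws NullPointerException,
// while a list of boxes accepts null elements.
final class NullUnboxDemo {
    static boolean unboxingNullThrows() {
        Integer boxed = null;
        try {
            int unboxed = (int) boxed;   // unboxing conversion null-checks the box
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    static boolean boxedListAcceptsNull() {
        java.util.List<Integer> list = new java.util.ArrayList<>();
        list.add(null);                  // erased generics hold references, so null is fine
        return list.get(0) == null;
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows());
        System.out.println(boxedListAcceptsNull());
    }
}
```
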
Note that the unadorned value type name `V` usually denotes the same type as `V.val`, but we reserve the right to have some occurrences of `V` for certain types to denote `V.box` instead. (This is TBD; perhaps it is part of the migration package for nullable classes.)

### Observations and Fine Print

**Null stores as vull:** When storing a `null` to a heap variable of a nullable value class, the JVM will reset all (non-static) fields of that variable to their default values. On the heap, a logically null flat value is called a _flattened value null_, or "vull" for short.

**Vull loads as null:** When loading a value from a heap variable of a nullable value class, the JVM will detect "vull" and convert it to a proper null reference.

**Vull is a ghost:** Thus, for a nullable value class `NV`, it is impossible to create or observe on stack a non-null instance of `NV` for which all fields of the instance are default. This means that "vulls" are confined to the heap. The JVM enforces this as a low-level invariant, by dynamically transcoding between on-heap "vulls" and on-stack nulls.

**Pivot fields:** As a "pro move", a nullable value class can declare one or more of its non-static fields with the `__NullablePivot` keyword (TBD). When detecting "vulls", the JVM consults only such marked fields for their default value, not all fields of the instance. This may make "vull" detection faster for legacy classes like `LocalDate`. Such a specially marked field (or fields) may be called a _pivot field_, since the task of "vull" detection "pivots" around that field. By default, if no fields are marked as pivot fields, then in effect all of them serve as pivot fields.

**Null stops bad calls:** It is arguable that the most legitimate job of `null` is to avoid executing a method call on a receiver which has not yet been specified.
After all, objects do not always have reasonable default values, and so Java (and the JVM) assigns a "default default" value of `null` to object variables that are not otherwise initialized. The null value ensures that if buggy code tries to call a method on an uninitialized variable, an exception will be thrown immediately, rather than executing a method body on an unexpected input. (In this view, field gets are the same as method calls. Other uses of nulls, such as API sentinel values, were created by creative programmers, who, given a hammer, will always find more nails.) For a value type without a reasonable default value, programmers have a right to a similar sentinel value which prevents method execution on uninitialized variables. But for a value _with_ a reasonable default value, such machinery would be pure annoyance. Only the designer of the value class knows which case is true.

**Inner value classes:** Any non-static nested ("inner") value class `C.IV` must also be declared to be nullable. The reason for this is that every properly constructed instance of `C.IV` must specify a non-null outer instance of type `C`. But if `C.IV` were regular and a method were called on the default value `C.IV.default`, then that method would not be able to observe a definite non-null value `C.this`. Such a method call would be inescapably broken. Thus, such method calls must be prevented. The existing language achieves this result by throwing `NullPointerException` when invoking methods on the default value of `C.IObj`, for an inner object class `C.IObj`. To preserve this behavior, an inner value class `C.IV` must also present a null default value. This restriction does not apply to static nested value classes `C.NV`.

**A slogan:** The slogan for this user model of nullable value types is "no new nulls". It refuses to introduce new "work-alikes" for the null reference. There is no new `NullValueException` which pairs with `NullPointerException`.
There is no `Nullable` interface. There is no `isNull` method, certainly not a user-definable one. (An `isNull` method could never return a `false` result, could it? It would have to throw `NullPointerException` instead!) There are no directly observable "vull" values to compete for the throne of `null`; "vulls" are only indirectly observable in the flattening of certain heap variables. In short, the heavy cost of nulls (arguably a "billion dollar mistake") is not multiplied by a new set of null-like values. And the historic cost of nulls is not pushed forward to new value types that don't request it, such as arithmetic types.

**Another slogan:** Alternatively, the slogan from "Highlander" applies: There is only ever one `null`. Any would-be "vull" value is dissected from the value space and conjoined to `null` just as soon as it tries to enter the stack.

## Implementation

**Affected bytecodes:** At the bytecode level, the instructions `getfield`, `putfield`, `getstatic`, `putstatic`, `withfield`, `aaload`, and `aastore` must transcode between "vull" and proper null. The `defaultvalue` bytecode must not produce "vull".

**Null containers rejected:** If a `getfield` instruction is asked for an instance field `NV.f` of a value class `NV`, and if the on-stack value is `null`, then `NullPointerException` is thrown. There is no conversion of the containing instance to a "vull". This is true regardless of the type of the field `NV.f`. If the on-stack value is non-null, and `NV.f` is a "vull", then transcoding occurs as usual. This means that the sequence `defaultvalue V; getfield V.f:T` is equivalent to `defaultvalue T` only if `V` is a regular value type. (It must also possess a non-static field `f` of type `T`.) If `V` is nullable, then the `getfield` instruction will throw.

**Withfield transcodes twice:** The `withfield` instruction must transcode on both input and output. It must convert a null input value to a temporary "vull", one of whose fields is then updated.
It must then detect whether the resulting value is a "vull" and convert that (and only that) back to a null. Unlike `getfield` and `putfield`, `withfield` does not reject a `null` container value. For example, it will produce a null result value if asked to store a default value to a field in an instance where all of the other fields are already set to default values.

**Unaffected bytecodes:** Instructions which operate only on stacked or local values do not need further modification to detect "vulls", since "vulls" are never on stack. Receiver null checks for `invokevirtual` and its siblings are unchanged; these instructions will never encounter "vull" values. The `acmp`, `checkcast`, and `instanceof` instructions (and the `aastore` store check) already have special semantics for null references which are unchanged.

**Transcoding in field instructions:** For the field instructions, transcoding is a reasonable incremental cost to add, since these instructions resolve their field and therefore know the specific field type; thus the cost of adding transcoding between "vull" and `null` is incremental and added only for fields whose types require this extra step.

**Transcoding in array instructions:** Array element access instructions first check the layout of the target array element and then use the proper sequence of steps to convert from a flattened array element (if present) to a regular on-stack reference. As part of this sequence of steps, if the element type is a nullable, flattened value type, "vull" must be detected on load and produced on store, corresponding to an on-stack null reference.

**Reflection, etc.:** Access to fields and array elements via reflection, method handles, or JNI is defined in terms of the behaviors of bytecodes, as usual. Thus, reflectively loading or storing a value instance must include a transcoding step exactly when transcoding is required by the corresponding bytecode instruction.

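
The load and store transcoding described for array instructions can be modeled in ordinary Java. The sketch below is a simulation under stated assumptions, not JVM behavior: the class `Date` is a hypothetical nullable value class with three `int` fields, `FlatDateArray` stands in for a flattened `Date[]`, and all fields act as pivot fields (the default when none is marked).

```java
// Simulation only: models "vull" <-> null transcoding for a flattened array
// of a hypothetical nullable value class. All names here are illustrative.
final class VullTranscodingDemo {
    /** Hypothetical nullable value class with three int fields. */
    static final class Date {
        final int year, month, day;

        Date(int year, int month, int day) {
            // Models the constructor exit check: the all-default state is reserved for null.
            if (year == 0 && month == 0 && day == 0) {
                throw new NullPointerException("all-default Date would be the null value");
            }
            this.year = year;
            this.month = month;
            this.day = day;
        }
    }

    /** Models a flattened Date[]: three int slots per element, no references stored. */
    static final class FlatDateArray {
        private final int[] slots;

        FlatDateArray(int length) {
            slots = new int[3 * length];   // zero-filled, so every element starts as a "vull"
        }

        /** Models aastore: an on-stack null is written as a "vull" (all fields default). */
        void store(int index, Date value) {
            int base = 3 * index;
            slots[base] = (value == null) ? 0 : value.year;
            slots[base + 1] = (value == null) ? 0 : value.month;
            slots[base + 2] = (value == null) ? 0 : value.day;
        }

        /** Models aaload: an on-heap "vull" is read back as a proper null reference. */
        Date load(int index) {
            int base = 3 * index;
            int year = slots[base];
            int month = slots[base + 1];
            int day = slots[base + 2];
            if (year == 0 && month == 0 && day == 0) {
                return null;               // "vull" detected
            }
            return new Date(year, month, day);
        }
    }

    public static void main(String[] args) {
        FlatDateArray dates = new FlatDateArray(2);
        System.out.println(dates.load(0) == null);
        dates.store(0, new Date(2018, 5, 21));
        System.out.println(dates.load(0).month);
        dates.store(0, null);
        System.out.println(dates.load(0) == null);
    }
}
```

With a pivot field, the check in `load` would consult only the marked field (for example `month`) instead of all three.
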
**Variable declaration:** The bytecode-level descriptor for a flattenable variable of a value type has the form of a _Q-descriptor_, which begins with the letter "Q" instead of the letter "L" normally used with class-based types. Variables which hold a boxed value are introduced with _L-descriptors_ beginning with "L", like any other reference type. In the setting of the JVM type system, L-descriptors and Q-descriptors denote _L-types_ and _Q-types_. Q-types and L-types roughly correspond to user-level `V.val` and `V.box` types, respectively. Again, in the setting of JVM types (only) we say a value of a Q-type or L-type is a _Q-value_ or _L-value_.

**Layout includes nullability:** The nullability of a value class `V` is logically a part of `V`'s overall layout, its size and the format of its fields. This is because layout is the information that dictates the JVM's exact steps when loading or storing a flattened value. Since these steps necessarily include a "vull" check when the value class is nullable, layout includes nullability.

**Q-values in the heap:** Q-types are introduced in the heap as part of a class declaration, or when an array type derived from the Q-type is mentioned. The JVM consults the layout of a Q-type `Q-V` when it lays out an instance field of type `Q-V`, or prepares a static field of type `Q-V`, or computes the layout of an array whose element type is `Q-V`. In all cases, the class declaration of `V` is loaded if necessary, and the layout of `V` is consulted, to determine the steps needed to load or store the Q-value.

**Q-values on the stack:** Q-types are introduced on the stack as part of method, field, or array type descriptors, or as `checkcast` targets. For nullable value types, both Q-types and L-types can carry null values, while for regular value types, Q-types cannot carry null.
Thus, a `checkcast` to a "Q-type" will throw `NullPointerException` if presented at runtime with a null reference on the stack, but only if the referenced type is regular (not nullable).

**Verifier rules:** The verifier tracks the distinction between Q-types and L-types, and specifically ensures that an on-stack L-value is never consumed by an instruction which expects a Q-value, if the L-type accepts null but the Q-type does not. If the Q-type is nullable, implicit conversions are logically permissible, and the verifier should allow them. Given the nullable and regular value types `NV` and `RV`, it follows that `Q-NV` is a proper subtype of `L-NV`, but `Q-RV` and `L-RV` can be treated as the same type, since they have the same set of on-stack values.

**Verifier rules for supers:** If `C` is a super (class or interface) of `NV` or (respectively) `RV`, then `L-C` is a proper supertype of `Q-NV` and `L-NV`, or (respectively) `Q-RV` and `L-RV`. Note that supertypes of value types are always nullable. Thus, there is no need to distinguish between Q-types and L-types when converting to supers; if there is a null it will be welcome in the supertype.

**Optimized calling sequences:** The JIT may elect to use "vull" values for non-receiver parameters or return values, as an alternative to buffering via a nullable indirection. Such calling sequences must be made invisible to the end-user by ensuring that "vull" parameters are transcoded (detected and converted to nulls) as needed, and vice versa on return. Such transcoding operations seem to be reorderable and trackable much like `null` detection is at present. Thus, "vull" transcoding is thought to be optimizable as a straightforward extension to today's JITs.

**Constructor translation:** A value class constructor starts with the default value of its class, and builds up the value by assigning to its fields.
The rules of the Java language (for final fields and value instance fields) ensure that each field is assigned once and only once. The tracking of assignment along all paths uses a pair of conditions called "definite assignment" and "definite unassignment". On every normal exit from a constructor, each field must be definitely assigned and not definitely unassigned. The JVM has no such rules for tracking assignment at bytecode boundaries. Instead, constructors are allowed to write to final fields any number of times. For values, the corresponding rule is that `withfield` is allowed to assign to an uninitialized value any number of times. In order to prevent null values from escaping from a value class constructor, the compiler must precede each return instruction by a null check. (This can be done with a call to `Objects.requireNonNull` or `Object.getClass`.) Optionally (TBD) the JVM could perform this check automatically.

**Non-throwing getter:** Optionally (and TBD), the JVM may choose to define `getfield` on a Q-type container to transcode the container to "vull". Most uses of `getfield` would use the regular L-type container, but this variation would give a "hook" for translation strategies that need to operate on the fields of a possibly uninitialized value. This could happen, for example, inside a constructor. The class component of the `CONSTANT_Fieldref` of such an instruction would be a Q-descriptor, rather than the name of a class. Note that, in the JVM, unadorned class names usually denote L-types, not Q-types, so directing a `getfield` instruction to a Q-type container is an unusual step.
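
The constructor translation and the double transcoding of `withfield` can be sketched as a simulation in current Java. Everything named here is hypothetical: `Point` stands in for a nullable value class with two `int` fields, `withX` and `withY` model `withfield` on each field, and `construct` models a translated constructor whose return is preceded by the `Objects.requireNonNull` check mentioned above.

```java
// Simulation only: models withfield transcoding and the constructor exit check
// for a hypothetical nullable value class. Not actual JVM or javac behavior.
final class WithfieldDemo {
    /** Hypothetical nullable value class; the state x == 0 && y == 0 is reserved for null. */
    static final class Point {
        final int x, y;

        private Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Wraps raw fields as an on-stack value, converting a "vull" result to null. */
    private static Point fromFields(int x, int y) {
        return (x == 0 && y == 0) ? null : new Point(x, y);
    }

    /** Models withfield on x: a null input is treated as a temporary "vull". */
    static Point withX(Point p, int x) {
        int y = (p == null) ? 0 : p.y;
        return fromFields(x, y);
    }

    /** Models withfield on y, with the same transcoding on input and output. */
    static Point withY(Point p, int y) {
        int x = (p == null) ? 0 : p.x;
        return fromFields(x, y);
    }

    /** Models a translated constructor: start from the default, assign fields, null-check on exit. */
    static Point construct(int x, int y) {
        Point p = null;          // the default value of a nullable value class is null
        p = withX(p, x);
        p = withY(p, y);
        return java.util.Objects.requireNonNull(p, "constructor produced the null value");
    }

    public static void main(String[] args) {
        Point p = construct(1, 2);
        System.out.println(p.x + "," + p.y);
        System.out.println(withX(construct(5, 0), 0) == null);
        try {
            construct(0, 0);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that an intermediate `null` between the two `withfield` steps is harmless in this model; only the value at the return point is checked.
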
