For LW10, one of our goals is to support interactions between value types and
erased generics by having some form of a nullable value type.
The needs of the language factor heavily into the JVM design. We're not ready
to commit to language-level details, but it's likely that the language will
support nullable and non-nullable variations of the types declared by value
classes; and these variations will probably be supported in most places that
types can appear.
More generally, the language may support up to three different flavors of
nullability on some or all types:
- null-free: a type that does not include null (could be spelled Foo!)
- null-permitting: a type that allows but ignores nulls (could be spelled Foo~)
- null-checked: a type that allows and checks for nulls (could be spelled Foo?)
(Please note that this is placeholder syntax. There are lots of ways to map
this to real syntax. Unadorned names will map to one of these; it's possible
that migrating a class to be a value class will change the interpretation of
its unadorned name.)
Null-permitting and null-checked types are both "nullable"; the difference is
in how strongly the compiler enforces null checks. ("Null-permitting" is the
existing behavior for types like 'String'; "null-checked" is the style that
requires proof that nulls are absent before dereferencing.)
The other important concept from the language is conversions:
- A widening conversion (or something similar) supports treating a value of a
null-free type as null-permitting or null-checked
- A "null-free conversion" is required to go in the opposite direction, and
includes a runtime null check
- A "nullability conversion", like an unchecked conversion, might allow other
forms of conversions between types involving different nullabilities, including
in their type arguments or array component type.
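As a source-level sketch of the two directions, assuming a hypothetical value class Point (the class and variable names below are illustrative, not part of any proposed API), the widening direction needs no runtime action, while the null-free conversion amounts to a runtime null check:

```java
import java.util.Objects;

// Hypothetical value class standing in for a declared value class.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Conversions {
    public static void main(String[] args) {
        Point p = new Point(1, 2);

        // Widening (null-free -> null-permitting/null-checked):
        // no runtime action needed.
        Point nullable = p;

        // Null-free conversion (the opposite direction): a runtime null
        // check, roughly what a compiler might emit for the narrowing.
        Point nullFree = Objects.requireNonNull(nullable);
        System.out.println(nullFree.x + "," + nullFree.y);  // prints "1,2"
    }
}
```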
Turning to the JVM with those language-level concepts in mind, I've put
together the following summary of four main designs we've considered. The goal
here is not to reach a conclusion about which path is best, but to make sure
we're accurately considering all of the implications in each case.
Nullable value types, null-free storage
---------------------------------------
In this approach, we use regular L types to represent value types, and these
types are nullable. Fields and arrays, via some sort of modifier, may choose to
be nullable or null-free.
JVM implications
- Need a mechanism (new opcode?) to indicate that an array allocation is
null-free
- The default value of a field/array depends on whether the "null-free"
modifier is used
- Fields and arrays that are marked null-free can, of course, be flattened
- Stack variables and method parameters/returns may always be null
- A putfield, putstatic, or aastore may fail with an NPE (or maybe ASE)
- JIT can optimistically assume no nulls and scalarize, but must check and
de-opt when a null is encountered
- The "null-free" modifier is only allowed with value class types, and must be
validated early (e.g., to decide on field layout)
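The proposed null-free array store behavior can be simulated in today's Java; this sketch (the class and method names are mine, not part of the design) rejects nulls at store time with an NPE, the way a null-free aastore would. A real null-free array would also initialize its elements to the value class's default value rather than null, which this sketch does not model.

```java
import java.util.Objects;

// Sketch of a null-free array: stores are checked for null, mirroring
// the NPE that the proposed null-free aastore would throw.
final class NullFreeArray<T> {
    private final Object[] elements;

    NullFreeArray(int length) { elements = new Object[length]; }

    void set(int i, T value) {
        // The null check a null-free aastore would perform.
        elements[i] = Objects.requireNonNull(value, "null-free array");
    }

    @SuppressWarnings("unchecked")
    T get(int i) { return (T) elements[i]; }
}
```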
Compilation strategy
Val? maps to LVal;
Val~ maps to LVal;
Val! maps to LVal;
The nullability of the type in a field declaration or array creation expression
determines whether the "null-free" modifier is used or not.
Nullability conversions are no-ops; null-free conversions are either compiled
to explicit null checks or are implicit in an invoke*/getfield/putfield.
Language implications
- Null-free value types typically get flattened storage and scalarized
invocations
- Array store runtime checks may include a null check
- Methods may not be overloaded on different nullabilities of the same type
- Null-free parameters/returns may be polluted with nulls due to inconsistent
compilation or non-Java interop—detected with an NPE on storage or dereference
- A conversion from Val~[] to Val![] could be supported, but the result would
not perform the expected runtime checks
Migration implications
- Refactoring a class to be a value class is a binary compatible change (except
where this involves incompatible changes like removing a public constructor);
before recompilation (which may reinterpret some unadorned names), treatment of
nulls does not change
- Changing the nullability of a type is a binary compatible change; library
clients who expect nullable storage may see surprising NPEs or ASEs
Always null-free value types
----------------------------
In this approach, we use regular L types to represent value types, and these
types are null-free. Non-value L types continue to be nullable. A use-site
attribute tracks which class names represent value classes; validation lazily
ensures consistency with the declaration.
JVM implications
- Fields, arrays, and method parameters and returns with value class types can
be flattened/scalarized
- The 'null' verification type is not a subtype of any value class type
- Casts to value class types must fail on 'null' (CCE or NPE)
- At method preparation, field/method resolution, and class loading, a check
similar to class loader constraints ensures that classes agree on value classes
in the descriptor
- Various other vectors for getting data into the JVM should prevent nulls, or
have contracts that allow crashing, etc., if data is corrupted
- Classes in the value classes attribute are allowed to be loaded early (e.g.,
to decide on field layout)
- If the value classes attribute does not mention a value class, it's possible
for variables/fields of that type to be null, but an error will occur when an
attempt is made to load the class or resolve against a class that disagrees
Compilation strategy
Val? maps to Ljava/lang/Object;
Val~ maps to Ljava/lang/Object;
Val! maps to LVal;
Every referenced value class is listed in the value classes attribute.
Nullability conversions are no-ops; null-free conversions are compiled to
checkcasts (even for member access). Casts that target Val?/Val~ compile to a
checkcast guarded by a null check, where null always succeeds.
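As a source-level sketch of these two compiled forms (Val and the helper methods below are hypothetical; today's checkcast accepts null, so the null-free case is simulated with an explicit null check):

```java
import java.util.Objects;

// Stand-in for a value class whose L type is null-free under this design.
final class Val {
    final int v;
    Val(int v) { this.v = v; }
}

class Design2Casts {
    // Null-free conversion to Val!: under this design a plain checkcast
    // would reject null; here an explicit null check simulates that.
    static Val toValBang(Object o) {
        return (Val) Objects.requireNonNull(o);
    }

    // Cast targeting Val? / Val~: a checkcast guarded by a null check,
    // where null always succeeds.
    static Object toValNullable(Object o) {
        return (o == null) ? null : (Val) o;
    }
}
```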
Language implications
- Null-free value types typically get flattened storage and scalarized
invocations
- Array store runtime checks may include a null check
- Val~[] and Val?[] do not perform array store checks at all—any Object may end
up polluting these arrays (creating arrays of these types might be treated as
an error, like T[])
- Val~ and Val? are overloading-hostile: their use in signatures conflicts with
Object and all other null-permitting/null-checked value types
- Null-permitting/null-checked value type parameters and returns may be
polluted with other types due to inconsistent compilation or non-Java
interop—detected with a CCE on null-free conversion
- A conversion from Val~[] to Val![] cannot be allowed
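The Val~[] pollution hazard parallels today's erased generics; this analogy (using List rather than arrays, with invented variable names) shows how erasure to Object skips store checks and defers detection to a later use site:

```java
import java.util.ArrayList;
import java.util.List;

class ErasurePollution {
    public static void main(String[] args) {
        List<Object> cells = new ArrayList<>();

        // Erased view, analogous to Val~ erasing to Object in this design.
        @SuppressWarnings("unchecked")
        List<String> polluted = (List<String>) (List<?>) cells;

        cells.add(42);  // no store check; the "String" list is now polluted

        // An Object-typed read succeeds; the failure surfaces only when a
        // use forces a cast (here, a CCE if read through a String variable).
        Object first = polluted.get(0);
        System.out.println(first instanceof String);  // prints "false"
    }
}
```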
Migration implications
- Refactoring a class to be a value class is a binary incompatible change due
to inconsistent value class attributes
- Changing from a null-permitting/null-checked to null-free type (or vice
versa) is a binary incompatible change unless there's some form of support for
type migrations
Null-free types with new descriptors
------------------------------------
In this approach, we use regular L types to represent nullable value types, and
introduce other types (spelled, say, with a "K") to represent null-free value
types. K types are subtypes of L types, and casts can be used to convert from L
to K.
JVM implications
- Descriptor syntax needs to support 'K'
- To support K casts, we need ClassRefs that indicate K-ness, a new opcode, or
some other mechanism
- Fields, arrays, and method parameters and returns with K types can be
flattened/scalarized
- The 'null' verification type is not a subtype of K types
- Casts to K types must fail on 'null'
- Various other vectors for getting data into the JVM should prevent nulls, or
have contracts that allow crashing, etc., if data is corrupted
- Classes named by K types are allowed to be loaded early (e.g., to decide on
field layout)
Compilation strategy
Val? maps to LVal;
Val~ maps to LVal;
Val! maps to KVal;
Nullability conversions are no-ops; null-free conversions are either compiled
to explicit casts or are implicit in an invoke*/getfield/putfield.
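A small sketch of how a tool might read the extended descriptor syntax; the 'K' carrier is the proposal above, while the helper method and category strings are invented for illustration:

```java
// Hypothetical reading of the extended descriptor syntax, where 'K'
// marks a null-free value class type alongside the existing 'L' types.
class DescriptorKinds {
    static String kind(String fieldDescriptor) {
        return switch (fieldDescriptor.charAt(0)) {
            case 'K' -> "null-free value type";      // e.g. "KVal;"  (Val!)
            case 'L' -> "nullable reference type";   // e.g. "LVal;"  (Val? / Val~)
            case '[' -> "array type";
            default  -> "primitive type";            // e.g. "I"
        };
    }
}
```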
Language implications
- Null-free value types typically get flattened storage and scalarized
invocations
- Array store runtime checks may include a null check
- Methods may be overloaded with a null-free type vs. a
null-permitting/null-checked type (but null-permitting vs. null-checked is not
allowed)
- Pollution of null-free variables or arrays is impossible
- A conversion from Val~[] to Val![] cannot be allowed
Migration implications
- Refactoring a class to be a value class is a binary compatible change (except
where this involves incompatible changes like removing a public constructor);
before recompilation (which may reinterpret some unadorned names), treatment of
nulls does not change
- Changing from a null-permitting/null-checked to null-free type (or vice
versa) is a binary incompatible change unless there's some form of support for
type migrations
Nullability notations on types
------------------------------
In this approach, we use regular L types to represent value types, and these
types are nullable by default. To indicate that a particular field, array, or
parameter/return is null-free, some form of side notation is used.
(Deliberately using the word "notation" rather than "annotation" or "modifier"
here to avoid committing to an encoding.)
This is similar to "nullable value types, null-free storage", except that the
null-free notation can be used on method parameters/returns.
This is similar to "always null-free value types", except that instead of
tracking value classes in each class file, we track null-free value types per
use site.
This is similar to "null-free types with new descriptors", except that the
notations are not part of descriptors and don't require any explicit
conversions—they are not part of the verification type system.
JVM implications
- Need a mechanism to encode notations, both for descriptors and for array
creations
- The default value of a field/array depends on whether the "null-free"
notation is used
- Fields, arrays, and method parameters and returns that are marked null-free
can be flattened/scalarized
- Stack variables may generally be null, unless a static analysis proves
otherwise
- A putfield, putstatic, aastore, or method invocation may fail with an NPE (or
maybe ASE)
- Method overriding allows nullability mismatches; calls must be able to
dynamically adapt (e.g., through multiple v-table entries and VM-generated
bridges)
- Types marked null-free are allowed to be loaded early (e.g., to decide on
field layout)
Compilation strategy
Where '*' represents a side notation that a type is null-free:
Val? maps to LVal;
Val~ maps to LVal;
Val! maps to LVal;*
Nullability conversions are no-ops; null-free conversions are either compiled
to explicit null checks or are implicit in an invoke*/getfield/putfield.
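One conceivable encoding of the side notation is a declaration annotation; everything below (the annotation name, the Point class, and the Holder class) is invented for illustration and deliberately does not commit to the design's actual encoding:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Invented encoding: a notation on the use site, not a new descriptor.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER})
@interface NullFree {}

final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Holder {
    @NullFree Point location = new Point(0, 0);  // descriptor stays LPoint;

    // The parameter's descriptor is unchanged, so a nullability mismatch
    // in an override does not change the method's binary signature.
    void move(@NullFree Point p) { location = p; }
}
```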
Language implications
- Null-free value types typically get flattened storage and scalarized
invocations
- Array store runtime checks may include a null check
- Methods may not be overloaded on different nullabilities of the same type
- Pollution of null-free variables, arrays, or parameters/returns is impossible
- A conversion from Val~[] to Val![] could be supported, but the result would
not perform the expected runtime checks
Migration implications
- Refactoring a class to be a value class is a binary compatible change (except
where this involves incompatible changes like removing a public constructor);
before recompilation, treatment of nulls does not change
- Changing the nullability of a type is a binary compatible change; library
clients who expect a nullable API may see surprising NPEs or ASEs