Nullness markers to enable flattening

Dan Smith Mon, 06 Feb 2023 17:26:56 -0800

A quick review:

The Value Objects feature (see https://openjdk.org/jeps/8277163) captures the 
Valhalla project's central idea: that objects don't have to have identity, and 
if programmers opt out of identity, JVMs can provide optimizations comparable 
to primitive performance.


However, one important implementation technique is not supported by that JEP: 
maximally flattened heap storage. ("Maximally flattened" as in "just the bits 
necessary to encode an instance".) This is because flattened fields and arrays 
store an object's field values directly, and so 1) need to be initialized "at 
birth" to a non-null class instance, 2) may not store null, and 3) may be 
updated non-atomically. These are semantics that need to be surfaced in the 
language model.
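
As a rough illustration of those three properties, here is a hand-rolled 
emulation of flattened storage in today's Java. It is only a sketch: 'Point', 
'FlatPointArray', and 'FlatStorageDemo' are hypothetical names, and a real JVM 
would do this layout work itself rather than through a wrapper class.

```java
// Hypothetical example types; this emulates flattening by hand.
record Point(int x, int y) {}

final class FlatPointArray {
    // "Just the bits necessary to encode an instance": two ints per element,
    // with no per-element pointer and no spare bit to mean "null".
    private final int[] bits;

    FlatPointArray(int length) {
        // (1) Initialized "at birth": every element starts as Point(0, 0).
        bits = new int[2 * length];
    }

    Point get(int i) {
        return new Point(bits[2 * i], bits[2 * i + 1]);
    }

    void set(int i, Point p) {
        // (2) Null has no encoding in this layout, so it is rejected.
        if (p == null) {
            throw new NullPointerException("flat storage cannot hold null");
        }
        // (3) Two separate writes: the pair is not updated atomically.
        bits[2 * i] = p.x();
        bits[2 * i + 1] = p.y();
    }
}

final class FlatStorageDemo {
    public static void main(String[] args) {
        Point[] refs = new Point[2];                 // ordinary reference array
        FlatPointArray flat = new FlatPointArray(2); // emulated flat array
        System.out.println(refs[0]);     // prints "null"
        System.out.println(flat.get(0)); // prints "Point[x=0, y=0]"
    }
}
```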

We've tackled (3) by allowing value classes to be declared non-atomic 
(syntax/limitations subject to bikeshedding), and then claiming by fiat that 
fields/arrays of such classes are tearing risks. Races are rare enough that 
this doesn't really call for a use-site opt-in, and we don't necessarily need 
any deeper explanation for how new objects derived from random combinations of 
old objects can be created by a read operation. That's just how it works. 
<shrug>
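
For readers who want a concrete picture of a torn read, the following sketch 
replays one possible interleaving on a single thread, so the outcome is 
deterministic. 'Range' and 'TearingSketch' are hypothetical names, and the 
two-step write is an assumption about how a non-atomic flat field might be 
updated, not a description of any particular JVM.

```java
// Hypothetical value-like type with two fields.
record Range(int lo, int hi) {}

final class TearingSketch {
    // Emulated flat, non-atomic storage for one Range: two independent slots.
    private int lo;
    private int hi;

    // A write to flat storage is modeled as two separate steps.
    void writeFirstHalf(Range r) { lo = r.lo(); }
    void writeSecondHalf(Range r) { hi = r.hi(); }

    // A read reassembles an object from whatever the slots currently hold.
    Range read() { return new Range(lo, hi); }

    // Replays "reader runs between the two halves of a write".
    static Range tornRead() {
        TearingSketch slot = new TearingSketch();
        Range a = new Range(1, 2);
        Range b = new Range(10, 20);

        slot.writeFirstHalf(a);
        slot.writeSecondHalf(a);   // slot now holds a

        slot.writeFirstHalf(b);    // writer of b is "interrupted" here
        Range seen = slot.read();  // racing reader observes lo from b, hi from a
        slot.writeSecondHalf(b);   // writer finishes

        return seen;               // Range(10, 2): neither a nor b
    }

    public static void main(String[] args) {
        System.out.println(tornRead()); // prints "Range[lo=10, hi=2]"
    }
}
```

The value returned by 'tornRead' was never written by anyone, which is the 
"new objects derived from random combinations of old objects" effect.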

We also allow value classes to declare that they support an all-zeros default 
instance (again, subject to bikeshedding). You could imagine similarly claiming 
that fields/arrays of these classes are null-hostile, as a side effect of how 
their storage works. But this is an idiosyncrasy that is going to affect a lot 
more programmers, and "that's just how it works" is pretty unsatisfactory. 
Sometimes programs count on being able to use 'null' in their computation. We 
need something in the language model to let programs opt in/out of nulls at the 
use site, and thus opt out/in of maximally flattenable heap storage.

We've long discussed "reference type" vs. "value type" as the language concept 
that captures this distinction. But where we once had a long list of 
differences between references and values, most of those have gone away. 
Notably, it's *not* useful for performance intuitions to imagine that 
references are pointers and values are inline. Value objects get inlined when 
the JVM wants to do so. Reference-ness is not relevant.

Really, for most programmers, nullness is all that distinguishes a "reference 
type" from a "value type".

Meanwhile, expressing nullness is not a problem unique to Valhalla. Whether a 
variable is meant to store nulls is probably the most important property of 
most programs that isn't expressible in the language. Workarounds include 
informal javadoc specifications, type annotations (as explored by JSpecify), 
lots of 'Objects.requireNonNull' calls, and blanket "if you pass in a null, you 
might get an NPE" policies.
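
For reference, two of those workarounds look roughly like this in today's 
Java. 'Greeter' is a hypothetical class used only for illustration; 
'Objects.requireNonNull' is the standard java.util.Objects method.

```java
import java.util.Objects;

final class Greeter {
    private final String name;

    /**
     * Informal javadoc contract: the language itself does not check this.
     *
     * @param name the name to greet; must not be null
     * @throws NullPointerException if name is null
     */
    Greeter(String name) {
        // Explicit runtime check, repeated by hand wherever null is unwelcome.
        this.name = Objects.requireNonNull(name, "name");
    }

    /** @return the greeting; never null */
    String greet() {
        return "Hello, " + name;
    }
}
```

Nothing in the signature 'Greeter(String name)' tells the compiler or a caller 
that null is unwelcome; the intent lives only in the comment and the check.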

In Amber, pattern matching has its own problems with nullness: there are a lot 
of ad hoc rules to distinguish between "is this a non-null instance of class 
Foo?" vs. "is this null *or* an instance of class Foo?", because there's no 
good way to express those two queries as explicitly different.
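
A small sketch of the two queries as they have to be spelled today, using 
only 'instanceof'. 'Foo' and 'NullQueries' are hypothetical names; the point 
is just that the second query needs an explicit null test alongside the type 
test.

```java
// Hypothetical class to test against.
record Foo(int id) {}

final class NullQueries {
    // "Is this a non-null instance of class Foo?"
    // instanceof never matches null, so no extra check is needed.
    static boolean isNonNullFoo(Object o) {
        return o instanceof Foo;
    }

    // "Is this null *or* an instance of class Foo?"
    // There is no single type test for this; null is checked separately.
    static boolean isNullOrFoo(Object o) {
        return o == null || o instanceof Foo;
    }
}
```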

---

To address these problems, we've been exploring nullness markers as an 
alternative to '.val' and '.ref'. The goal is a general-purpose feature that 
lets programmers express intent about nulls, and that is preserved at runtime 
sufficiently for JVMs to observe that "not null" + "value class" + "non-atomic 
(or compact) class" --> "maximally flattenable storage". There are no "value 
types", and there is no direct control over flattenability.

(A lot of these ideas build on what JSpecify has done, so appreciation to them 
for the good work and useful documentation.)

Some key ideas:

- Nullness is an *optional* property of variables/expressions/etc., distinct 
from types. If the program doesn't say what kind of nullness a variable has, 
and it can't be inferred, the nullness is "unspecified". (Interpreted as "might 
be null, but the programmer hasn't told us if that's their intent".) 
Variables/expressions with unspecified nullness continue to behave the way they 
always have.

- Because nullness is distinct from types, it shouldn't impact type checking 
rules, subtyping, overriding, conversions, etc. Nullness has its own analysis, 
subject to its own errors/warnings. The precise error/warning conditions 
haven't been fleshed out, but our bias is towards minimal intrusion: we don't 
want to make it hard to adopt these features in targeted ways.

- That said, *type expressions* (the syntax in programs that expresses a type) 
are closely intertwined with *nullness markers*. 'Foo!' refers to a non-null 
Foo, and 'Foo?' refers to a Foo or null. And nullness is an optional property 
of type arguments, type variable bounds, and array components. Nullness markers 
are the way programmers express their intent to the compiler's nullness 
analysis.

- Nullness may also be implicit. Catch parameters and pattern variables are 
always non-null. Lots of expressions have '!' nullness, and the null literal 
has '?' nullness. Local variables get their nullness from their initializers. 
Control flow analysis can infer properties of a variable based on its uses.

- There are features that change the default interpretation of the nullness of 
class names. This is still pretty open-ended. Perhaps certain classes can be 
declared (explicitly or implicitly) null-free by default (e.g., 'Point' is 
implicitly 'Point!'). Perhaps a compilation-unit- or module-level directive 
says that all unadorned types should be interpreted as '!'. Programs can be 
usefully written without these convenience features, but for programmers who 
want to widely adopt nullness, it will be important to get away from 
"unspecified" as the default everywhere.

- Nullness is generally enforced at run time, via cooperation between javac and 
JVMs. Methods with null-free parameters can be expected to throw if a null is 
passed in. Null-free storage should reject writes of nulls. (Details to be 
worked out, but as a starting point, imagine 'Q'-typed storage for all types. 
Writes reject nulls. Reads before any writes produce a default value, or if 
none exists, throw.)

- Type variable types have nullness, too. Besides 'T!' and 'T?', there's also a 
"parametric" 'T*' that represents "whatever nullness is provided by the type 
argument". (Again, some room for choosing the default interpretation of bare 
'T'; unspecified nullness is useful for type variables as well.) Nullness of 
type arguments is inferred along with the types; when both a type argument and 
its bound have nullness, bounds checks are based on '!' <: '*' <: '?'. Generics 
are erased for now, but in the future '!' type arguments will be reified, and 
specialized classes will provide the expected runtime behaviors.
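
To make the run-time enforcement bullet above a little more concrete, here is 
a sketch of the described storage behavior ("Writes reject nulls. Reads before 
any writes produce a default value, or if none exists, throw.") emulated in 
today's Java. Everything here is an assumption made for illustration: the 
class name 'NullFreeSlot', passing the default instance to the constructor, 
and the choice of exception types. The real mechanism would live in javac and 
the JVM, not in a wrapper class.

```java
// Hypothetical emulation of a single null-free storage location.
final class NullFreeSlot<T> {
    // Stand-in for the class's default instance; null here means
    // "this class declares no default instance".
    private final T defaultValue;

    // Stays null until the first successful write.
    private T value;

    NullFreeSlot(T defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Writes reject nulls.
    void write(T v) {
        if (v == null) {
            throw new NullPointerException("null written to null-free storage");
        }
        value = v;
    }

    // Reads before any write produce the default value, or throw if none exists.
    T read() {
        if (value != null) {
            return value;
        }
        if (defaultValue != null) {
            return defaultValue;
        }
        // Exception type is an assumption; the text only says "throw".
        throw new IllegalStateException("read before write, and no default instance");
    }
}
```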

There are, of course, a lot of details behind these points. But hopefully this 
provides a good high-level introduction.

A worry in taking on extra features like this is that we'll get distracted from 
our primary goal, which is to support maximally flattened storage of value 
objects. But I think it feels manageable, and it's certainly a lot more useful 
than the sort of targeted usage of '.val' we were thinking about before.

Our main tasks for delivering a feature include:
- Work out the declaration syntax/class file encoding for opting in to 
non-atomic-ness and default instances
- Implement nullness markers and some analysis/diagnostics in javac
- Provide a language spec for the parts of the analysis standardized in the 
language
- Settle on a class file format and division of responsibility for runtime 
behaviors
- Implement some targeted new JVM behaviors; use nullness as a signal for 
flattening
- Design/implement how nullness is exposed by reflection

For the future, we'll want to:
- Anticipate how a "change the defaults" feature will work
- Consider the interaction of nullness with Amber features
- Think about how runtime nullness interacts with specialization and type 
restrictions
