A quick review:
The Value Objects feature (see https://openjdk.org/jeps/8277163) captures the
Valhalla project's central idea: that objects don't have to have identity, and
if programmers opt out of identity, JVMs can provide optimizations comparable
to primitive performance.
However, one important implementation technique is not supported by that JEP:
maximally flattened heap storage. ("Maximally flattened" as in "just the bits
necessary to encode an instance".) This is because flattened fields and arrays
store an object's field values directly, and so 1) need to be initialized "at
birth" to a non-null class instance, 2) may not store null, and 3) may be
updated non-atomically. These are semantics that need to be surfaced in the
language model.
We've tackled (3) by allowing value classes to be declared non-atomic
(syntax/limitations subject to bikeshedding), and then claiming by fiat that
fields/arrays of such classes are tearing risks. Races are rare enough that
this doesn't really call for a use-site opt-in, and we don't necessarily need
any deeper explanation for how new objects derived from random combinations of
old objects can be created by a read operation. That's just how it works.
<shrug>
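
For anyone who wants a concrete picture of (3), here is a rough sketch in today's Java. It is not how any JVM actually lays things out: the class name TearingSketch, the record Range, and the two-step write are all made up for illustration. It models a flattened array as two parallel long[] arrays and splits a write into two stores, so the interleaving can be shown deterministically without real threads.

```java
// Hypothetical model of a flattened array of a two-field, non-atomic value
// class. All names here are illustrative assumptions, not part of any JEP.
final class TearingSketch {
    // Stand-in for a value class with two fields.
    record Range(long lo, long hi) {}

    // "Just the bits": each element is stored as its field values, no pointer.
    private final long[] los;
    private final long[] his;

    TearingSketch(int length) {
        los = new long[length];
        his = new long[length];
    }

    // A non-atomic write is two separate stores. They are exposed as two
    // steps so a reader can be placed between them on purpose.
    void writeFirstHalf(int i, Range r)  { los[i] = r.lo(); }
    void writeSecondHalf(int i, Range r) { his[i] = r.hi(); }

    // A read rebuilds an object from whatever bits are stored right now.
    Range read(int i) { return new Range(los[i], his[i]); }

    public static void main(String[] args) {
        TearingSketch array = new TearingSketch(1);
        Range oldValue = new Range(1, 2);
        Range newValue = new Range(10, 20);

        // Before any write, a read sees the all-zeros bits.
        System.out.println(array.read(0));   // Range[lo=0, hi=0]

        array.writeFirstHalf(0, oldValue);
        array.writeSecondHalf(0, oldValue);

        array.writeFirstHalf(0, newValue);   // writer is halfway through
        Range torn = array.read(0);          // what a racing reader could see
        array.writeSecondHalf(0, newValue);  // writer finishes

        System.out.println(torn);            // Range[lo=10, hi=2]
        System.out.println(array.read(0));   // Range[lo=10, hi=20]
    }
}
```

The torn value was never written by anyone; it is a combination of the old and new objects, which is the behavior a non-atomic class opts in to.
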
We also allow value classes to declare that they support an all-zeros default
instance (again, subject to bikeshedding). You could imagine similarly claiming
that fields/arrays of these classes are null-hostile, as a side effect of how
their storage works. But this is an idiosyncrasy that is going to affect a lot
more programmers, and "that's just how it works" is pretty unsatisfactory.
Sometimes programs count on being able to use 'null' in their computation. We
need something in the language model to let programs opt in/out of nulls at the
use site, and thus opt out/in of maximally flattenable heap storage.
We've long discussed "reference type" vs. "value type" as the language concept
that captures this distinction. But where we once had a long list of
differences between references and values, most of those have gone away.
Notably, it's *not* useful for performance intuitions to imagine that
references are pointers and values are inline. Value objects get inlined when
the JVM wants to do so. Reference-ness is not relevant.
Really, for most programmers, nullness is all that distinguishes a "reference
type" from a "value type".
Meanwhile, expressing nullness is not a problem unique to Valhalla. Whether a
variable is meant to store nulls is probably the most important property of
most programs that isn't expressible in the language. Workarounds include
informal javadoc specifications, type annotations (as explored by JSpecify),
lots of 'Objects.requireNonNull' calls, and blanket "if you pass in a null, you
might get an NPE" policies.
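
As a reminder of what those workarounds look like in practice, here is a small sketch in today's Java. The class Account and its fields are hypothetical. The point is that the intent ("owner is never null, nickname may be") lives only in comments and in a hand-written run-time check, not in anything the language understands.

```java
import java.util.Objects;

// Hypothetical example class; the names are illustrative only.
final class Account {
    private final String owner;  // intended to never be null
    private String nickname;     // intended to allow null ("no nickname")

    /**
     * @param owner must not be null; this is enforced only by the explicit
     *              check below, not by the type of the parameter
     */
    Account(String owner) {
        this.owner = Objects.requireNonNull(owner, "owner");
    }

    String owner() { return owner; }

    /** @return the nickname, or null if none has been set */
    String nickname() { return nickname; }

    void setNickname(String nickname) { this.nickname = nickname; }
}
```

Nothing in the signatures of owner() and nickname() tells a caller (or the compiler) that the two behave differently with respect to null.
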
In Amber, pattern matching has its own problems with nullness: there are a lot
of ad hoc rules to distinguish between "is this a non-null instance of class
Foo?" vs. "is this null *or* an instance of class Foo?", because there's no
good way to express those two queries as explicitly different.
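
To make the two queries concrete, here is a minimal sketch using only the 'instanceof' form of patterns (the switch rules are more involved and are not shown). The class and method names are made up for illustration.

```java
// Hypothetical helper class contrasting the two queries.
final class PatternNullnessSketch {
    // "Is this a non-null instance of String?"
    // A type pattern in instanceof never matches null.
    static boolean isNonNullString(Object o) {
        return o instanceof String s;
    }

    // "Is this null *or* an instance of String?"
    // There is no single pattern for this in instanceof; the null case has
    // to be spelled out by hand.
    static boolean isNullOrString(Object o) {
        return o == null || o instanceof String;
    }
}
```

For a null argument the first method returns false and the second returns true; for any non-null argument they agree.
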
---
To address these problems, we've been exploring nullness markers as an
alternative to '.val' and '.ref'. The goal is a general-purpose feature that
lets programmers express intent about nulls, and that is preserved at runtime
sufficiently for JVMs to observe that "not null" + "value class" + "non-atomic
(or compact) class" --> "maximally flattenable storage". There are no "value
types", and there is no direct control over flattenability.
(A lot of these ideas build on what JSpecify has done, so appreciation to them
for the good work and useful documentation.)
Some key ideas:
- Nullness is an *optional* property of variables/expressions/etc., distinct
from types. If the program doesn't say what kind of nullness a variable has,
and it can't be inferred, the nullness is "unspecified". (Interpreted as "might
be null, but the programmer hasn't told us if that's their intent".)
Variables/expressions with unspecified nullness continue to behave the way they
always have.
- Because nullness is distinct from types, it shouldn't impact type checking
rules, subtyping, overriding, conversions, etc. Nullness has its own analysis,
subject to its own errors/warnings. The precise error/warning conditions
haven't been fleshed out, but our bias is towards minimal intrusion—we don't
want to make it hard to adopt these features in targeted ways.
- That said, *type expressions* (the syntax in programs that expresses a type)
are closely intertwined with *nullness markers*. 'Foo!' refers to a non-null
Foo, and 'Foo?' refers to a Foo or null. And nullness is an optional property
of type arguments, type variable bounds, and array components. Nullness markers
are the way programmers express their intent to the compiler's nullness
analysis.
- Nullness may also be implicit. Catch parameters and pattern variables are
always non-null. Lots of expressions have '!' nullness, and the null literal
has '?' nullness. Local variables get their nullness from their initializers.
Control flow analysis can infer properties of a variable based on its uses.
- There are features that change the default interpretation of the nullness of
class names. This is still pretty open-ended. Perhaps certain classes can be
declared (explicitly or implicitly) null-free by default (e.g., 'Point' is
implicitly 'Point!'). Perhaps a compilation-unit- or module-level directive
says that all unadorned types should be interpreted as '!'. Programs can be
usefully written without these convenience features, but for programmers who
want to widely adopt nullness, it will be important to get away from
"unspecified" as the default everywhere.
- Nullness is generally enforced at run time, via cooperation between javac and
JVMs. Methods with null-free parameters can be expected to throw if a null is
passed in. Null-free storage should reject writes of nulls. (Details to be
worked out, but as a starting point, imagine 'Q'-typed storage for all types.
Writes reject nulls. Reads before any writes produce a default value, or if
none exists, throw.)
- Type variable types have nullness, too. Besides 'T!' and 'T?', there's also a
"parametric" 'T*' that represents "whatever nullness is provided by the type
argument". (Again, some room for choosing the default interpretation of bare
'T'; unspecified nullness is useful for type variables as well.) Nullness of
type arguments is inferred along with the types; when both a type argument and
its bound have nullness, bounds checks are based on '!' <: '*' <: '?'. Generics
are erased for now, but in the future '!' type arguments will be reified, and
specialized classes will provide the expected runtime behaviors.
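
As a tiny illustration of the '!' <: '*' <: '?' ordering from the last bullet, here is a sketch of how a bounds check on nullness alone might be modeled. The enum and method names are assumptions made for this example, the real rules haven't been worked out, and unspecified nullness is deliberately left out.

```java
// Hypothetical model of the three explicit nullness markers and their
// ordering. Declaration order encodes '!' <: '*' <: '?'.
enum Nullness {
    NON_NULL("!"),     // 'T!'
    PARAMETRIC("*"),   // 'T*': whatever nullness the type argument provides
    NULLABLE("?");     // 'T?'

    private final String marker;

    Nullness(String marker) { this.marker = marker; }

    String marker() { return marker; }

    // A type argument's nullness is within a bound's nullness when it sits
    // at or below the bound in the ordering.
    boolean isWithin(Nullness bound) {
        return this.ordinal() <= bound.ordinal();
    }
}
```

Under this model, NON_NULL.isWithin(NULLABLE) is true (a 'Foo!' argument fits a '?' bound) while NULLABLE.isWithin(NON_NULL) is false.
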
There are, of course, a lot of details behind these points. But hopefully this
provides a good high-level introduction.
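
One of those details, the run-time behavior of null-free storage described in the enforcement bullet above, can be roughly emulated in today's Java. This is only a sketch: the class NullFreeCell is hypothetical, using a null constructor argument to mean "no default instance" is my own convention, and the exception type thrown on a premature read is an assumption.

```java
import java.util.Objects;

// Hypothetical emulation of a null-free storage location.
final class NullFreeCell<T> {
    // The default instance for T, or null if T has no default instance.
    private final T defaultValue;
    // The stored value; null only while nothing has been written yet.
    private T value;

    NullFreeCell(T defaultValue) {
        this.defaultValue = defaultValue;
    }

    // Writes reject nulls.
    void write(T newValue) {
        value = Objects.requireNonNull(newValue, "null written to null-free storage");
    }

    // Reads before any write produce the default value or, if none exists,
    // throw. (IllegalStateException is an assumption of this sketch.)
    T read() {
        if (value != null) {
            return value;
        }
        if (defaultValue != null) {
            return defaultValue;
        }
        throw new IllegalStateException("read before write, and no default instance");
    }
}
```

For example, new NullFreeCell<Integer>(0).read() yields 0 before any write, while new NullFreeCell<String>(null).read() throws until something non-null has been written.
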
A worry in taking on extra features like this is that we'll get distracted from
our primary goal, which is to support maximally flattened storage of value
objects. But I think it feels manageable, and it's certainly a lot more useful
than the sort of targeted usage of '.val' we were thinking about before.
Our main tasks for delivering a feature include:
- Work out the declaration syntax/class file encoding for opting in to
non-atomic-ness and default instances
- Implement nullness markers and some analysis/diagnostics in javac
- Provide a language spec for the parts of the analysis standardized in the
language
- Settle on a class file format and division of responsibility for runtime
behaviors
- Implement some targeted new JVM behaviors; use nullness as a signal for
flattening
- Design/implement how nullness is exposed by reflection
For the future, we'll want to:
- Anticipate how a "change the defaults" feature will work
- Consider the interaction of nullness with Amber features
- Think about how runtime nullness interacts with specialization and type
restrictions