As we just discussed in the EG, allowing null to co-exist with flattenable representations is a challenge. It is one we have in the past tried to avoid, but the very legitimate needs for (what we now call) reference semantics for all of Bucket 2 and some of Bucket 3 require us to give null a place at the table, even while continuing to aim at flattening nullable values, when possible.

A good example of this is Optional, migrated from a Bucket 1 *value-based class* to a proper Bucket 2 *reference-based primitive*. (See that tricky change in POV?) Another example to keep in mind is the reference projection of a Bucket 3 type such as Complex.ref or Point.ref.

The simplest way to support null is just to do what we do today, and buffer on the heap, with the option of a null reference instead of a reference to a boxed value. (We call such things “buffers” rather than “boxes” simply because, unlike int/Integer, the type of thing that’s in the box might not be denotably different from the type of the “box” itself.)

The next thing to do is inject a *pivot field* into the flattened layout of the primitive object. When this invisible field contains all zero bits, the flattened object encodes a null. All the other bits are either ignorable or must be zero, depending on what you are trying to do. This idea splits into two directions: How to work with “pivoted” non-null values, and how to represent the pivot efficiently. Both lines of thought are more or less required exercises, once you allow null its place at the table.

We know where null comes from (the null literal and aconst_null). Where do pivoted values come from? You need an original source of them for the initial value of “this” in the primitive constructor (a factory method at the bytecode level). Specifically, you need that bit pattern which is almost but not quite all zero bits; the pivot field is set to the “non-null” state but all other field values are zero. Then the constructor can get to work. This might be the job of an “initialvalue” bytecode, which is a repackaging of the “defaultvalue” bytecode. Given a suitable definition with suitable restrictions for initialvalue, a constructor uses a mix of initialvalue and withfield executions to get to its output state for “this”. None of the intermediate states would be confusable with null.
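
To make the constructor story above concrete, here is a minimal sketch in plain Java that models a flattened layout as an array of slots. Everything in it is hypothetical and for illustration only: the class name, the slot constants, and the methods initialValue and withField merely stand in for the proposed bytecodes, and a real JVM would of course not use a long[] for this.

```java
final class PivotedLayoutSketch {
    // Hypothetical flattened layout for a nullable two-field primitive:
    // slot 0 models the invisible pivot field, slots 1 and 2 are the payload.
    static final int PIVOT = 0, X = 1, Y = 2, SLOTS = 3;

    /** All-zero bits: the flattened encoding of null. */
    static long[] nullBits() {
        return new long[SLOTS];
    }

    /** Models "initialvalue": pivot set to non-null, every payload slot zero. */
    static long[] initialValue() {
        long[] bits = new long[SLOTS];
        bits[PIVOT] = 1;
        return bits;
    }

    /** True when the pivot slot is zero, i.e. the bits encode null. */
    static boolean isNull(long[] bits) {
        return bits[PIVOT] == 0;
    }

    /** Models "withfield": copies the bits and replaces one payload slot. */
    static long[] withField(long[] bits, int slot, long value) {
        if (slot == PIVOT) {
            throw new IllegalArgumentException("the pivot is not a payload field");
        }
        if (isNull(bits)) {
            throw new NullPointerException("withfield applied to null");
        }
        long[] copy = bits.clone();
        copy[slot] = value;
        return copy;
    }

    /** Models a primitive constructor: starts from initialvalue, never from null. */
    static long[] makePoint(long x, long y) {
        long[] self = initialValue();
        self = withField(self, X, x);
        self = withField(self, Y, y);
        return self;
    }

    public static void main(String[] args) {
        System.out.println(isNull(nullBits()));      // true
        System.out.println(isNull(initialValue()));  // false
        System.out.println(isNull(makePoint(0, 0))); // false, despite a zero payload
    }
}
```

The last line of main is the point of the exercise: a value whose payload is entirely zero is still distinguishable from null, because the pivot slot was set before any withfield ran.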

(We sometimes assumed, wrongly in hindsight, that doing this simply requires assigning “this” to null in the constructor and then special-casing withfield and maybe getfield to allow a null input and maybe a null output. But this is a thicket of tangles and irregularities, and it doesn’t quite get rid of the need for a separate operation to actually set the pivot field. Basically, once null gets entrenched, defaultvalue has to turn into initialvalue, or so it appears to me at this moment.)

Once the constructor returns a non-null set of bits, all subsequent assignments continue to separate null from non-null. That’s true even for racy assignments, assuming that pivot field states are individually atomic, even if they race relative to other fields. (Race control might be important for Bucket 3 references like Complex.ref, if we ever try to flatten those. I’m digressing; my focus is to build out Bucket 2, which suppresses such races.)

To allow Bucket 2 constructors control over their outputs, it follows that initialvalue (unlike its earlier version defaultvalue) must be restricted to those same contexts where withfield is allowed: either to constructors only (for the same class) or to the capsule (nest) of that class.

OK, so how is the pivot field physically represented? Again, we have discussed this in years past, but I’ll summarize some of the thinking:

1. It can be just a boolean, a byte or a packed bit that is made free somehow. A 65th bit to a 64-bit payload perhaps. This is sad, but also hard to get around when every single bitwise encoding in the existing layout already has a meaning.

But the payload of the primitive type might use a field with “slack”, aka unused bitwise encodings. We can pounce on this and use bit-twiddling to internally reserve the zero state, and declare that when that field is zero, it is the pivot field denoting null, and when it is non-zero it is doing its normal job.

2. If the language tells us, “yes I promise not to use the default value on this field” then maybe the JVM can do something with that promise. There are issues, but it’s tempting for (say) a Rational type where the denominator is never zero.

3. More reliably, if the JVM knows that a field has unused encodings, it can just swap the all-zero state with some other state. People will immediately think of unused bits which can be flipped to true in the field when it is pivoted to non-null. It’s better, IMO, to start out with the humble increment operator (rather than the bit-set operator) and work from there. As long as the encoding of all-one-bits is not taken, for a given field (true for booleans and managed pointers!) then the JVM can simply perform an unsigned non-overflowing increment when storing payload to the pivot field (preserving the non-zero invariant) and do a non-overflowing unsigned decrement when loading.

I can just hear the GC folks groaning in the distance about such increments, on managed pointers. For them, a slightly less JIT-friendly operation might be preferable, to perform the increment (on store) only when the value is null, and vice versa on load, decrement only when 1. Or use bit twiddling in the low bits of the pointer. Or use all-one-bits as the “payload null” which is distinct from the “pivot is zero” state. I think the JIT and GC folks can come to an agreement, in any given JVM. When the JIT people groan back about weirdo encodings of managed pointers, we can gently tell them, “it’s just another flavor of managed pointer transcoding, a problem we solved when we went to compressed oops.”

(On balance, I think the GC should define a small family of “quasi-null sentinel values” which can be easily stored into any managed pointer for ad hoc purposes like this and others. Others would be at least (a) an Optional::isEmpty state for optionals *which are null-friendly* and (b) a distinction between null and unbound, for lazy variables which are also null-friendly. Neither of these exists today, of course, and none of these hypothetical sentinels would ever be visible to normal Java code.)

My point is that we don’t have to just slap a boolean on everything. In particular, when migrating ju.Optional to Bucket 2, we can preserve its very attractive one-field representation by invisibly assigning a bad managed pointer value to encode Optional::isEmpty. No Java code changes are needed (or desired) to pull this off, just the increment hack sketched above, or one of its variations.

Even Bucket 3 references could be encoded in this way, if and when we desire to. That is, whatever JVM algorithm constructs a pivot field and its logic could be pointed at a Bucket 3 reference projection, if we think this would be desirable. One result would be that Map::get, which returns T.ref, could avoid buffering on the heap. N.B. This assumes stuff we don’t have yet, to specialize Map::get to a particular flattenable type. I hope we will get there.

— John
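
For readers who want to see the arithmetic, here is a minimal sketch of the increment encoding from item 3 and of the GC-friendlier variant that only touches the null payload. It is an illustration under stated assumptions, not how any JVM does it: the class and method names are hypothetical, a 32-bit int stands in for a field whose all-one-bits encoding is unused, and a long stands in for a managed pointer whose payload is assumed never to be the value 1.

```java
final class IncrementPivotSketch {
    // Assumption: the payload never uses the all-one-bits encoding.
    static final int UNUSED_PAYLOAD = 0xFFFFFFFF;

    /** Store side: an unsigned increment, so the stored bits are never zero. */
    static int encode(int payload) {
        if (payload == UNUSED_PAYLOAD) {
            throw new IllegalArgumentException("all-one-bits is reserved");
        }
        return payload + 1; // cannot wrap to 0 because all-one-bits is excluded
    }

    /** Stored zero means the enclosing object is null. */
    static boolean isNull(int stored) {
        return stored == 0;
    }

    /** Load side: the matching decrement recovers the payload. */
    static int decode(int stored) {
        if (isNull(stored)) {
            throw new NullPointerException("pivot field is zero");
        }
        return stored - 1;
    }

    // Variant: change only the null payload, leaving real pointers untouched.
    // Assumption: 1 is never a valid payload (e.g. pointers are aligned).
    static long encodeSparse(long payload) {
        return payload == 0 ? 1 : payload;
    }

    static long decodeSparse(long stored) {
        if (stored == 0) {
            throw new NullPointerException("pivot field is zero");
        }
        return stored == 1 ? 0 : stored;
    }

    public static void main(String[] args) {
        // A zero payload (think: an empty Optional) is stored as 1, not as null.
        System.out.println(encode(0));               // 1
        System.out.println(decode(encode(41)));      // 41
        System.out.println(isNull(encode(0)));       // false
        System.out.println(decodeSparse(encodeSparse(0L))); // 0
    }
}
```

In both variants the stored zero is reserved for "the whole object is null", which is what lets a one-field type keep its one-field layout.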