About six months ago we started working on flattening references in calling conventions in the Valhalla repos.  We use the Preload attribute to force preloading of classes that are known to be (or expected to be) value classes, but which are referenced only via L descriptors, so that at the (early) time the calling convention is chosen, we have the additional information that this is an identity-free class.  In these cases, we scalarize the calling convention as we do with Q types, but we add an extra boolean channel for null; it is as if we add a boolean field to the object layout.  When we adapt between the scalarized and indirected forms (e.g., c2i adapters), we apply the obvious semantics to the null channel.
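To make the adapter semantics concrete, here is a hedged sketch (plain Java, not VM code; the `Range` class and the `-1` sentinel are made up for illustration) of a nullable value being passed scalarized as its fields plus a boolean null channel, and of the adaptation from the indirect (reference) form:

```java
// Hypothetical nullable value type used only for illustration.
final class Range {
    final int lo, hi;
    Range(int lo, int hi) { this.lo = lo; this.hi = hi; }
}

final class ScalarizedCall {
    // Direct (scalarized) form: the fields, plus an explicit null channel.
    static int sumDirect(int lo, int hi, boolean nonNull) {
        return nonNull ? lo + hi : -1;  // -1 stands in for "argument was null"
    }

    // Adapter from the indirect (reference) form, mirroring what a
    // c2i adapter would do: null maps to (don't-care, don't-care, false).
    static int sumIndirect(Range r) {
        return (r == null)
            ? sumDirect(0, 0, false)
            : sumDirect(r.lo, r.hi, true);
    }
}
```

The point is that null needs no separate code path in the scalarized callee; it is just one more state component.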

We have not yet applied the same treatment to field layout, but we can (and it has the same timing constraints, so it also needs Preload), and the VM has additional degrees of implementation freedom in doing so.  The simplest is to let the layout engine choose to flatten a preloaded L value type by injecting a boolean field which represents nullity, and adapting null checks to check this field (which can be hoisted, etc.).
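A rough sketch of what the injected-field layout might look like, written as ordinary Java (the `r$`-prefixed names and the `Range`-shaped fields are invented for illustration; the real layout is the VM's choice):

```java
// Hypothetical picture of heap-side flattening with an injected null field:
// a nullable value-class field (conceptually `Range r;`) laid out inline,
// plus one boolean the layout engine injects to represent "r is null".
final class Holder {
    int r$lo;
    int r$hi;
    boolean r$nonNull;   // injected null channel

    Integer width() {
        // A source-level null check becomes a read of the injected field,
        // which can be hoisted like any other field load.
        if (!r$nonNull) return null;
        return r$hi - r$lo;
    }

    void setRange(int lo, int hi) { r$lo = lo; r$hi = hi; r$nonNull = true; }
    void clearRange() { r$nonNull = false; }
}
```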

The layout engine has other tricks available to it as well, to further reduce the footprint of representing "might be null", if it can find suitable slack space in the representation.  Such tricks could include using slack bits in boolean fields (potentially seven of them), low-order bits of pointers (a la compressed OOPs), unused color bits of 64-bit pointers, etc.  Some of these choices require transforms on load/store (e.g., those that use pointer bits), not unlike what we do with compressed OOPs.  This is entirely "VM's choice" and affects only quality of implementation; there is nothing in the classfile that conditions this, other than the ACC_VALUE indication and L/Q type carriers.  So the VM has a rich set of footprint/computation tradeoffs for encoding the null channel, but logically, it is an "extra boolean field" that all nullable value types have.
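As a toy illustration of the pointer-bit trick (standing in a `long` for an aligned address; the names here are invented, and a real VM would do this in the load/store barriers, not in Java):

```java
// Hypothetical sketch: stealing the low bit of an 8-byte-aligned
// "pointer" to encode the null channel, in the spirit of the
// compressed-OOPs-style load/store transforms mentioned above.
final class NullBit {
    static final long NULL_BIT = 1L;

    // Store-side transform: an aligned address has its low bit free,
    // so we can set it to mean "this value is null".
    static long encode(long alignedAddr, boolean isNull) {
        return isNull ? (alignedAddr | NULL_BIT) : alignedAddr;
    }

    static boolean isNull(long encoded) {
        return (encoded & NULL_BIT) != 0;
    }

    // Load-side transform: mask the stolen bit back off before use.
    static long decode(long encoded) {
        return encoded & ~NULL_BIT;
    }
}
```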

I'd like to reserve judgement on this stacking as I'm uncomfortable
(uncertain, maybe?) about the practicality of the extra null channel.
Without having validated the extra null channel, I'm concerned we're
exposing a broader set of options in the language that will, in
practice, map down to the existing three buckets we've been talking
about.  Maybe this factoring allows a slightly larger number of classes
to be flattened, or leaves the door open for them to get flattening in
the future?

What I'm trying to do here is decomplect flattening from nullity. Right now, we have an unfortunate interaction which both makes certain combinations impossible, and makes the user model harder to reason about.

Identity-freedom unlocks flattening in the stack (calling convention.)  The lesson of that exercise (which was somewhat surprising, but good) is that nullity is mostly a non-issue here -- we can treat the nullity information as just being an extra state component when scalarizing, with some straightforward fixups when we adapt between direct and indirect representations.  This is great, because we're not asking users to choose between nullability and flattening; users pick the combination of { identity, nullability } they want, and they get the best flattening we can give:

    case (identity, _) -> 1; // no flattening
    case (non-identity, non-nullable) -> nFields;  // scalarize fields
    case (non-identity, nullable) -> nFields + 1;  // scalarize fields with extra null channel

Asking for nullability on top of non-identity means only that there is a little more "footprint" in the calling convention, but not a qualitative difference.  That's good.

In the heap, it is a different story.  What unlocks flattening in the heap (in addition to identity-freedom) is some permission for _non-atomicity_ of loads and stores.  For sufficiently simple classes (e.g., one int field) this is a non-issue, but because loads and stores of references must be atomic (at least, according to the current JMM), references to wide values (B2 and B3.ref) cannot be flattened as much as B3.val.  There are various tricks we can do (e.g., stuffing two 32 bit fields into a 64 bit atomic) to increase the number of classes that can get good flattening, but it hits a wall much faster than "primitives".
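The "stuffing two 32-bit fields into a 64-bit atomic" trick can be sketched in plain Java (the class name and packing scheme here are illustrative, not the VM's actual layout):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: both 32-bit fields of a small value live in one
// 64-bit word, so a single atomic load/store covers the whole value and
// no torn (mixed) combination of the two fields can ever be observed.
final class PackedPair {
    private final AtomicLong bits = new AtomicLong();

    void set(int a, int b) {
        // One atomic store writes both fields at once.
        bits.set(((long) a << 32) | (b & 0xFFFF_FFFFL));
    }

    int[] get() {
        long v = bits.get();   // one atomic load reads both fields
        return new int[] { (int) (v >>> 32), (int) v };
    }
}
```

This is exactly the kind of opportunistic optimization that stops working once the value no longer fits in the platform's atomic width, which is why it "hits a wall" faster than scalarization on the stack.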

What I'd like is for the flattening story on the heap and the stack to be as similar as possible.  Imagine, for a moment, that tearing were not an issue.  Then where we would be in the heap is the same story as above: no flattening for identity classes, scalarization in the heap for non-nullable values, and scalarization with an extra boolean field (maybe, same set of potential optimizations as on the stack) for nullable values.  This is very desirable, because it is so much easier to reason about:

 - non-identity unlocks scalarization on the stack
 - non-atomicity unlocks flattening in the heap
 - in both, ref-ness / nullity means maybe an extra byte of footprint compared to the baseline

(with additional opportunistic optimizations that let us get more flattening / better footprint in various special cases, such as very small values.)

In previous discussions around the extra null channel for flattened
values, we were really looking at a narrowly applicable optimization -
basically for nullable values that would fit within 64 bits.  With this
stacking, and the info about Intel allowing atomicity up to 128 bits,
the extra null channel becomes more widely applicable.

Yes.  What I'm trying to do is separate this all from the details of what instructions CPU X has, and instead connect optimizations to semantics: nullity requires extra footprint (unless it can be optimized away by stealing bits somehow), and does so uniformly across the buckets / heap / stack / whatever.  Nullability is a semantic property; providing this property may have a cost, but the more uniform we can make it, the simpler it is to reason about, and the simpler to implement (since we can use the same encoding tricks in both stack and heap.)

Some of my hesitation comes from experiences writing structs or
multi-field invariants in C, where memory barriers and careful
read/write protocols are important to ensure consistent data in the
face of races.  Widening the set of cases that have a multi-field
invariant *created and enforced by the VM* by adding an additional
null channel will make it more likely that the VM (and optimized JIT
code!) can do the wrong thing.

Yes, this is why I want to bring it into the programming model.  I don't want to magically analyze the constructor and say "whoa, that looks like a cross-field invariant"; I want the class author to say "you have permission to shred" or "you do not have permission to shred", and we optimize within the semantic properties declared by the author.

In addition to cross-field invariants being part of the boundary between whether or not we need atomicity, transparency also comes into play.  When we "construct" a long, we have a pretty clear idea how the value maps to all the bits; with encapsulation, we do not (but for records, we do again, because we've constrained away the ability to let representation diverge from interface.)  Again, though, I think we are better off having the author declare the required atomicity properties rather than trying to derive them from other things (e.g., constructor body, record-ness, etc.)

I have always been somewhat uneasy about the injected null-channel
approach, and concerned about how difficult it will be for service
engineers to support when something goes wrong.  If there's experience
that can be shared that shows this works well in an implementation,
then I'll be less concerned.

Perhaps Tobias and Frederic can share more about what we've discovered here?
