About six months ago we started working on flattening references in calling conventions in the Valhalla repos.  We use the Preload attribute to force preloading of classes that are known to be (or expected to be) value classes, but which are referenced only via L descriptors, so that at the (early) time the calling convention is chosen, we have the additional information that this is an identity-free class.  In these cases, we scalarize the calling convention as we do with Q types, but we add an extra boolean channel for null; it is as if we add a boolean field to the object layout.  When we adapt between the scalarized and indirected forms (e.g., c2i adapters), we apply the obvious semantics to the null channel.
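To make the adapter semantics concrete, here is a hedged sketch (plain Java, not VM code; the `Range` class and the `-1` sentinel are made up for illustration) of a nullable value being passed scalarized as its fields plus a boolean null channel, and of the adaptation from the indirect (reference) form:

```java
// Hypothetical nullable value type used only for illustration.
final class Range {
    final int lo, hi;
    Range(int lo, int hi) { this.lo = lo; this.hi = hi; }
}

final class ScalarizedCall {
    // Direct (scalarized) form: the fields, plus an explicit null channel.
    static int sumDirect(int lo, int hi, boolean nonNull) {
        return nonNull ? lo + hi : -1;  // -1 stands in for "argument was null"
    }

    // Adapter from the indirect (reference) form, mirroring what a
    // c2i adapter would do: null maps to (don't-care, don't-care, false).
    static int sumIndirect(Range r) {
        return (r == null)
            ? sumDirect(0, 0, false)
            : sumDirect(r.lo, r.hi, true);
    }
}
```

The point is that null needs no separate code path in the scalarized callee; it is just one more state component.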

We have not yet applied the same treatment to field layout, but we can (and it has the same timing constraints, so it also needs Preload), and the VM has additional degrees of implementation freedom in doing so.  The simplest is to let the layout engine choose to flatten a preloaded L value type by injecting a boolean field which represents nullity, and adapting null checks to check this field (which can be hoisted, etc.).
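A rough sketch of what the injected-field layout might look like, written as ordinary Java (the `r$`-prefixed names and the `Range`-shaped fields are invented for illustration; the real layout is the VM's choice):

```java
// Hypothetical picture of heap-side flattening with an injected null field:
// a nullable value-class field (conceptually `Range r;`) laid out inline,
// plus one boolean the layout engine injects to represent "r is null".
final class Holder {
    int r$lo;
    int r$hi;
    boolean r$nonNull;   // injected null channel

    Integer width() {
        // A source-level null check becomes a read of the injected field,
        // which can be hoisted like any other field load.
        if (!r$nonNull) return null;
        return r$hi - r$lo;
    }

    void setRange(int lo, int hi) { r$lo = lo; r$hi = hi; r$nonNull = true; }
    void clearRange() { r$nonNull = false; }
}
```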

The layout engine has other tricks available to it as well, to further reduce the footprint of representing "might be null", if it can find suitable slack space in the representation.  Such tricks could include using slack bits in boolean fields (potentially seven of them), low-order bits of pointers (a la compressed OOPs), unused color bits of 64-bit pointers, etc.  Some of these choices require transforms on load/store (e.g., those that use pointer bits), not unlike what we do with compressed OOPs.  This is entirely "VM's choice" and affects only quality of implementation; there is nothing in the classfile that conditions this, other than the ACC_VALUE indication and L/Q type carriers.  So the VM has a rich set of footprint/computation tradeoffs for encoding the null channel, but logically, it is an "extra boolean field" that all nullable value types have.
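As a toy illustration of the pointer-bit trick (standing in a `long` for an aligned address; the names here are invented, and a real VM would do this in the load/store barriers, not in Java):

```java
// Hypothetical sketch: stealing the low bit of an 8-byte-aligned
// "pointer" to encode the null channel, in the spirit of the
// compressed-OOPs-style load/store transforms mentioned above.
final class NullBit {
    static final long NULL_BIT = 1L;

    // Store-side transform: an aligned address has its low bit free,
    // so we can set it to mean "this value is null".
    static long encode(long alignedAddr, boolean isNull) {
        return isNull ? (alignedAddr | NULL_BIT) : alignedAddr;
    }

    static boolean isNull(long encoded) {
        return (encoded & NULL_BIT) != 0;
    }

    // Load-side transform: mask the stolen bit back off before use.
    static long decode(long encoded) {
        return encoded & ~NULL_BIT;
    }
}
```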

I'd like to reserve judgement on this stacking as I'm uncomfortable
(uncertain, maybe?) about the practicality of the extra null channel.
Without having validated the extra null channel, I'm concerned we're
exposing a broader set of options in the language that will, in
practice, map down to the existing three buckets we've been talking
about.  Maybe this factoring allows a slightly larger number of classes
to be flattened, or leaves the door open for them to get flattening in
the future?

What I'm trying to do here is decomplect flattening from nullity. Right now, we have an unfortunate interaction which both makes certain combinations impossible, and makes the user model harder to reason about.

Identity-freedom unlocks flattening in the stack (calling convention.)  The lesson of that exercise (which was somewhat surprising, but good) is that nullity is mostly a non-issue here -- we can treat the nullity information as just being an extra state component when scalarizing, with some straightforward fixups when we adapt between direct and indirect representations.  This is great, because we're not asking users to choose between nullability and flattening; users pick the combination of { identity, nullability } they want, and they get the best flattening we can give:

    case (identity, _) -> 1; // no flattening
    case (non-identity, non-nullable) -> nFields;  // scalarize fields
    case (non-identity, nullable) -> nFields + 1;  // scalarize fields with extra null channel

Asking for nullability on top of non-identity means only that there is a little more "footprint" in the calling convention, but not a qualitative difference.  That's good.

In the heap, it is a different story.  What unlocks flattening in the heap (in addition to identity-freedom) is some permission for _non-atomicity_ of loads and stores.  For sufficiently simple classes (e.g., one int field) this is a non-issue, but because loads and stores of references must be atomic (at least, according to the current JMM), references to wide values (B2 and B3.ref) cannot be flattened as much as B3.val.  There are various tricks we can do (e.g., stuffing two 32 bit fields into a 64 bit atomic) to increase the number of classes that can get good flattening, but it hits a wall much faster than "primitives".
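The "stuffing two 32-bit fields into a 64-bit atomic" trick can be sketched in plain Java (the class name and packing scheme here are illustrative, not the VM's actual layout):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: both 32-bit fields of a small value live in one
// 64-bit word, so a single atomic load/store covers the whole value and
// no torn (mixed) combination of the two fields can ever be observed.
final class PackedPair {
    private final AtomicLong bits = new AtomicLong();

    void set(int a, int b) {
        // One atomic store writes both fields at once.
        bits.set(((long) a << 32) | (b & 0xFFFF_FFFFL));
    }

    int[] get() {
        long v = bits.get();   // one atomic load reads both fields
        return new int[] { (int) (v >>> 32), (int) v };
    }
}
```

This is exactly the kind of opportunistic optimization that stops working once the value no longer fits in the platform's atomic width, which is why it "hits a wall" faster than scalarization on the stack.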

What I'd like is for the flattening story on the heap and the stack to be as similar as possible.  Imagine, for a moment, that tearing were not an issue.  Then where we would be in the heap is the same story as above: no flattening for identity classes, scalarization in the heap for non-nullable values, and scalarization with an extra boolean field (maybe, same set of potential optimizations as on the stack) for nullable values.  This is very desirable, because it is so much easier to reason about:

 - non-identity unlocks scalarization on the stack
 - non-atomicity unlocks flattening in the heap
 - in both, ref-ness / nullity means maybe an extra byte of footprint compared to the baseline

(with additional opportunistic optimizations that let us get more flattening / better footprint in various special cases, such as very small values.)

In previous discussions around the extra null channel for flattened
values, we were really looking at a narrowly applicable optimization -
basically for nullable values that would fit within 64 bits.  With this
stacking, and the info about Intel allowing atomicity up to 128 bits,
the extra null channel becomes more widely applicable.

Yes.  What I'm trying to do is separate this all from the details of what instructions CPU X has, and instead connect optimizations to semantics: nullity requires extra footprint (unless it can be optimized away by stealing bits somehow), and does so uniformly across the buckets / heap / stack / whatever.  Nullability is a semantic property; providing this property may have a cost, but the more uniform we can make it, the simpler it is to reason about, and the simpler to implement (since we can use the same encoding tricks in both stack and heap.)

Some of my hesitation comes from experiences writing structs or
multi-field invariants in C, where memory barriers and careful
read/write protocols are important to ensure consistent data in the
face of races.  Widening the set of cases that have a multi-field
invariant *created and enforced by the VM* by adding an additional
null channel will make it more likely that the VM (and optimized JIT
code!) can do the wrong thing.

Yes, this is why I want to bring it into the programming model.  I don't want to magically analyze the constructor and say "whoa, that looks like a cross-field invariant"; I want the class author to say "you have permission to shred" or "you do not have permission to shred", and we optimize within the semantic properties declared by the author.

In addition to cross-field invariants being part of the boundary between whether or not we need atomicity, transparency also comes into play.  When we "construct" a long, we have a pretty clear idea how the value maps to all the bits; with encapsulation, we do not (but for records, we do again, because we've constrained away the ability to let representation diverge from interface.)  Again, though, I think we are better off having the author declare the required atomicity properties rather than trying to derive them from other things (e.g., constructor body, record-ness, etc.)

I have always been somewhat uneasy about the injected null-channel
approach, and concerned about how difficult it will be for service
engineers to support when something goes wrong.  If there's experience
that can be shared that shows this works well in an implementation,
then I'll be less concerned.

Perhaps Tobias and Frederic can share more about what we've discovered here?
