User model stacking: current status

Brian Goetz Thu, 05 May 2022 10:51:49 -0700

The current stacking discussion is motivated by several factors:


 - experiences prototyping both B2 and B3

 - recently discovered hardware improvements in atomic operations (e.g., Intel's recent specification strengthening around 128-bit vector loads and stores)

 - further thought on the consequences of the B2/B3 model, particularly with regard to tearing

The B2/B3 split was a useful proxy during prototyping, with each being built around a known use case: B2 around value-based classes, and B3 around numeric abstractions. My main objection is twofold. First, there are gratuitous-seeming differences in the performance model (B3s currently flatten much better), which force users into bad choices between semantics and performance. Second, tearing is hidden behind some other proxy ("primitive-ness", non-nullity, etc.), which is likely to surprise users when invariants checked in the constructor are not necessarily obeyed at runtime. I want the observed behavioral distinctions between buckets to be clearly related to their semantic differences, and we're not there yet.

The differences in flattening and performance between the current B2 and B3 derive directly from the possibility of tearing. When tearing is unacceptable, we are likely to fall back on using indirections to make loads and stores of references atomic (the "non-flat" option). Even where we are able to gain some flattening through compiler heroics (the "low-flat" option), these hit the ceiling pretty fast (we're unlikely to get above 128 bits any time soon, and may need at least one bit for null), and they also have other costs: wider loads and stores mean more data movement and more register shuffling, in addition to the complexity of the required compiler heroics. Full-flat requires tearing. But I don't see an intrinsic reason (yet) why we can't have full-flat for VBCs (value-based classes) like Optional.
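To make the tearing argument concrete, here is a minimal single-threaded Java simulation, assuming a hypothetical two-field value {lo, hi} and two hand-rolled layouts (all class and method names are illustrative, not taken from any JVM or proposal). The flat layout stores the fields inline and writes them one at a time, so a reader that runs between the two writes sees a mix of old and new fields. The indirect layout stores a reference, and Java guarantees that reference writes are atomic. Real tearing comes from unsynchronized concurrent access, much as the JLS already permits for non-volatile long and double; the callback here only makes the interleaving deterministic.

```java
import java.util.Arrays;

/**
 * Hypothetical simulation of two heap layouts for an array of a two-field
 * value {lo, hi}. It models the idea only; it is not how any JVM lays out data.
 */
public class TearingSketch {

    // "Full-flat": each element's fields are stored inline, in parallel slots.
    static final class FlatPairArray {
        final int[] lo;
        final int[] hi;

        FlatPairArray(int n) {
            lo = new int[n];
            hi = new int[n];
        }

        // Non-atomic store: the fields are written one at a time. The
        // callback runs between the two writes to model a racing reader.
        void storeNonAtomic(int i, int newLo, int newHi, Runnable betweenWrites) {
            lo[i] = newLo;
            betweenWrites.run();
            hi[i] = newHi;
        }

        int[] load(int i) {
            return new int[] { lo[i], hi[i] };
        }
    }

    // "Non-flat": each element is a reference to an immutable object.
    // Reference writes are atomic in Java (JLS 17.7), so a reader sees
    // either the old object or the new one, never a mix of their fields.
    record Pair(int lo, int hi) { }

    static final class IndirectPairArray {
        final Pair[] slots;

        IndirectPairArray(int n) {
            slots = new Pair[n];
            Arrays.fill(slots, new Pair(0, 0));
        }

        void store(int i, Pair p) { slots[i] = p; }

        Pair load(int i) { return slots[i]; }
    }

    // What a reader observes in the middle of a flat, non-atomic update.
    static int[] observeFlatMidUpdate() {
        FlatPairArray arr = new FlatPairArray(1);
        arr.storeNonAtomic(0, 0, 10, () -> { });                    // old value {0, 10}
        int[][] seen = new int[1][];
        arr.storeNonAtomic(0, 20, 30, () -> seen[0] = arr.load(0)); // new value {20, 30}
        return seen[0];
    }

    public static void main(String[] args) {
        int[] torn = observeFlatMidUpdate();
        // Prints lo=20, hi=10: a value that was never stored as a whole.
        System.out.println("flat reader saw lo=" + torn[0] + ", hi=" + torn[1]);

        IndirectPairArray ind = new IndirectPairArray(1);
        ind.store(0, new Pair(20, 30));
        System.out.println("indirect reader saw " + ind.load(0));
    }
}
```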

The most encouraging direction is to factor atomicity out of the bucket model. We can make both buckets (VBC and primitive-like) atomic by default; this still gets us all the calling convention optimizations, and for very small values (such as single-field ones, like Optional), we can probably achieve full flattening in the heap, and more flattening for small-ish values with low-flat heroics. We can allow both buckets to opt into non-atomicity, which unlocks full-flat layout in the heap, with the only difference being whether we have to perturb the representation to make null representable.
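What "perturbing the representation to make null representable" might mean can be sketched with two hypothetical array layouts for a value with a single int field, written in ordinary Java (the names are illustrative only). The primitive-like layout can treat all-zero bits as its default value, while the nullable layout needs an extra presence flag per element; this is just one possible encoding, not a description of what any VM does.

```java
/**
 * Hypothetical flat layouts for an array of a value with a single int field.
 * Both are illustrations, not a description of any JVM's actual layout.
 */
public class NullEncodingSketch {

    // Primitive-like (B3-style): every slot holds a value; all-zero bits
    // are simply the default value, so no extra storage is needed.
    static final class ZeroDefaultArray {
        final int[] field;

        ZeroDefaultArray(int n) { field = new int[n]; }

        int get(int i) { return field[i]; }

        void set(int i, int v) { field[i] = v; }
    }

    // Nullable (B2-style): the representation is perturbed with a presence
    // flag so that null is distinguishable from a value whose field is 0.
    static final class NullableArray {
        final int[] field;
        final boolean[] present;  // false encodes null

        NullableArray(int n) {
            field = new int[n];
            present = new boolean[n];
        }

        Integer get(int i) { return present[i] ? field[i] : null; }

        // Two separate writes: without atomicity, a racing reader could see
        // the flag and the field out of sync.
        void set(int i, Integer v) {
            if (v == null) {
                present[i] = false;
                field[i] = 0;
            } else {
                field[i] = v;
                present[i] = true;
            }
        }
    }

    public static void main(String[] args) {
        ZeroDefaultArray zeros = new ZeroDefaultArray(2);
        NullableArray nullable = new NullableArray(2);
        nullable.set(1, 0);
        System.out.println(zeros.get(0));     // 0: the default value
        System.out.println(nullable.get(0));  // null: never assigned
        System.out.println(nullable.get(1));  // 0: a real value, distinct from null
    }
}
```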


This gets us to something like:

    [ atomic | non-atomic ] __value class B2 { }
    [ atomic | non-atomic ] __primitive class B3 { }

There are many bikesheds here, including the spelling of all these things, and whether or not we say "class" or "struct" or "primitive" or nothing at all, or whether these work with records, but painting can come later. There are also many other decisions to make, but I'll observe several properties we've already gained by this stacking:

 - non-atomicity is explicit, rather than hidden behind "primitive" or "non-nullable" or "zero-happy"
 - non-atomicity is orthogonal, which means that the performance difference between B2 and B3 (or B3.val and B3.ref), for either polarity of atomicity, is exclusively that imposed by the null-encoding requirement
 - safe by default; one can opt into more performance by opting out of some safety
 - non-atomic sounds "just scary enough" to make people think twice, or at least learn what non-atomic means

Atomicity is only needed when a class has cross-field invariants (or when its construction API varies significantly from its representation). Numeric classes like Complex have no invariants, and Rational has only single-field invariants, but classes like IntRange would have cross-field invariants. In cases where the VM can provide atomicity for free (e.g., single-field classes), it wouldn't make a difference.
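One way to see why only cross-field invariants call for atomicity is to model a tear as a value whose fields come from two different valid values. The sketch below assumes ordinary Java records as stand-ins for the classes named above, with invariants chosen to match the text (none for Complex, den != 0 for Rational, lo <= hi for IntRange); the tear helpers are hypothetical. A real tear runs no constructor, so the broken IntRange would simply exist; re-running the constructor here is only a way to check the invariant.

```java
/**
 * Sketch: model a "tear" as combining fields from two valid values.
 * Single-field invariants survive any mix; cross-field invariants may not.
 */
public class InvariantSketch {

    // No invariants: any pair of doubles is a valid Complex.
    record Complex(double re, double im) { }

    // Single-field invariant: the denominator must be non-zero.
    record Rational(int num, int den) {
        Rational {
            if (den == 0) throw new IllegalArgumentException("zero denominator");
        }
    }

    // Cross-field invariant: lo <= hi relates two fields.
    record IntRange(int lo, int hi) {
        IntRange {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
        }
    }

    // A "torn" read: first field from the new value, second from the old.
    static Complex tear(Complex oldV, Complex newV) {
        return new Complex(newV.re(), oldV.im());
    }

    static Rational tear(Rational oldV, Rational newV) {
        return new Rational(newV.num(), oldV.den());
    }

    static IntRange tear(IntRange oldV, IntRange newV) {
        return new IntRange(newV.lo(), oldV.hi());
    }

    public static void main(String[] args) {
        System.out.println(tear(new Complex(1, 2), new Complex(3, 4)));   // still valid
        System.out.println(tear(new Rational(1, 2), new Rational(3, 4))); // still valid
        try {
            tear(new IntRange(0, 10), new IntRange(20, 30));              // {20, 10}
        } catch (IllegalArgumentException e) {
            System.out.println("torn IntRange violates its invariant: " + e.getMessage());
        }
    }
}
```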

If we further opt for Kevin's "ref is default" proposal, then we add another:


 - All unadorned type names are reference types

Separately, I think we can reconsider where we spend the "value" keyword. Previously "value" meant "non-identity", but I think it is better spent meaning "has a value projection", which leads us to the minor reshuffling presented yesterday:


    class B1 { }                 // ref only, == based on identity
    value-based class B2 { }     // ref only, == based on state
    value class B3 { }           // has ref and val projections

This affirms B2 as "value-lite", connects to the term we colonized in Java 8 for "classes that have value-like semantics", and moves away from "primitive".
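For readers less familiar with the distinction between identity-based and state-based ==, here is a small illustration in today's Java, where records are still identity classes. This is not the proposed syntax: equals() stands in for the state-based == that a B2 or B3 would have, and the helper names are hypothetical.

```java
/**
 * Today's Java: a record is still an identity class, so == compares object
 * identity. equals() stands in for the state-based == described for B2.
 */
public class EqualitySketch {

    record Point(int x, int y) { }

    // B1 semantics: == asks "is this the same object?"
    static boolean identityEquals(Object a, Object b) { return a == b; }

    // Stand-in for state-based ==: "do these have the same field values?"
    static boolean stateEquals(Object a, Object b) { return a.equals(b); }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        System.out.println(identityEquals(p, q)); // false: two distinct objects
        System.out.println(stateEquals(p, q));    // true: same field values
    }
}
```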


Let's work through Kevin's examples here:

 - Rational. Here, the default value is particularly bad (the denominator should not be zero). This leads to an uncomfortable choice: choose B2, or choose B3 and deal with the DBZE as "user error" when it happens. Internal methods (e.g., multiplying two rationals) can treat the default value as "0/1" instead and produce a valid rational, but any code that pulls out the denominator and operates on it externally will confront the zero anyway. Whichever way one chooses, people will complain "but that's bad". Rational is interesting because it _has_ a sensible default; it is just not the zero representation.

 - EmployeeId. Similar, but maybe more tolerable to treat as a B2, and doesn't require atomicity.

 - Instant.  Seems this is a (probably non-atomic) B2.
 - Complex.  Solid non-atomic B3.

 - Optional, OptionalInt, etc. In a world where B3 is ref-default, these can be B3; otherwise B2.

 - IntRange: atomic B3 (cross-field invariant.)
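The Rational dilemma can be sketched in today's Java, assuming a hypothetical RationalSketch class whose DEFAULT constant imitates the all-zero value a B3 would receive from the VM (ordinary Java has no such mechanism, so a private, non-validating constructor fakes it). Internal arithmetic quietly reads the default as 0/1, but a caller who extracts the denominator still meets the zero.

```java
/**
 * Hypothetical sketch of the Rational dilemma: a B3-style class gets an
 * all-zero default value (0/0) that its public factory would never produce.
 */
public final class RationalSketch {

    // Stand-in for the VM-supplied all-zero default instance.
    public static final RationalSketch DEFAULT = new RationalSketch(0, 0, false);

    private final int num;
    private final int den;

    private RationalSketch(int num, int den, boolean validate) {
        if (validate && den == 0) throw new ArithmeticException("zero denominator");
        this.num = num;
        this.den = den;
    }

    public static RationalSketch of(int num, int den) {
        return new RationalSketch(num, den, true);
    }

    public int numerator() { return num; }

    public int denominator() { return den; }

    // Internal methods can quietly read the default as 0/1 ...
    private int effectiveDen() { return den == 0 ? 1 : den; }

    public RationalSketch multiply(RationalSketch other) {
        return of(num * other.num, effectiveDen() * other.effectiveDen());
    }

    @Override
    public String toString() { return num + "/" + den; }

    public static void main(String[] args) {
        System.out.println(DEFAULT.multiply(of(3, 4)));  // 0/4: internal code copes
        try {
            // ... but external code that pulls out the denominator meets the zero.
            int scaled = 100 / DEFAULT.denominator();
            System.out.println(scaled);
        } catch (ArithmeticException e) {
            System.out.println("external caller hit: " + e.getMessage());
        }
    }
}
```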

There are lots of other things to discuss here, including what non-atomic B2 really means, and whether there are additional risks that come from tearing _between the null and the fields_. I'll address that in a separate mail, but I think that factoring out atomicity into its own explicit thing is a pure win, and that in turn exposes some sensible terminology shuffling in the other buckets.

Also, bikeshed topics to cover (please, let's not let this drown the discussion):

 - How to spell atomic / non-atomic
 - How to spell B2 and B3
 - How to spell .ref and .val
 - ref-default vs val-default for B3
   - if we go ref-default, reconciling this with universal generics
   - reconciling this with nullable types
