> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "daniel smith" <daniel.sm...@oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts@openjdk.java.net>
> Sent: Tuesday, June 14, 2022 1:04:39 AM
> Subject: Re: User model stacking: current status
> I've done a little more shaking of this tree. It involves keeping the notion that the non-identity buckets differ only in the treatment of their val projection, but makes a further normalization that enables the buckets to mostly collapse away.
>
> "value class X" means:
> - Instances are identity-free
> - There are two types, X.ref (reference, nullable) and X.val (direct, non-nullable)
> - Reference types are atomic, as always
> - X is an alias for X.ref
>
> Now, what is the essence of B2? B2 means not "I hate zeros", but "I don't like that uninitialized variables are initialized to zero." It doesn't mean the .val projection is meaningless; it means that we don't trust arbitrary clients with it. So, we can make a slight adjustment:
> - The .val type is always there, but for "B2" classes, it is *inaccessible outside the nest*, as per ordinary accessibility.
>
> This means that within the nest, code that understands the restrictions can, say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let the zero escape. This gives B2 classes a lot more latitude to use the .val type in safe ways. Basically: if you don't trust people with the .val type, don't let the val type escape.

I don't trust myself with a B2.val. The val type for B2 should not exist at all; otherwise any library using reflection can call getClass() on an X.val[] (even one typed as an X[]).

> There's a bikeshed to paint, but it might look something like:
>
>     value class B2 {
>         private class val { }
>     }
>
> or, flipping the default:
>
>     value class B3a {
>         public class val { }
>     }
>
> So B2 is really a B3a whose value projection is encapsulated.

And here you lost me: .ref and .val are supposed to be projection types, not classes; at runtime there is only one class.
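Rémi's reflection objection earlier in this reply has a close analogue in today's Java, where an array's runtime element type survives being viewed through a supertype array. The sketch below uses only existing reflection APIs; String[] stands in for a hypothetical X.val[] and Object[] for the X[] it is exposed as, and the class and method names are illustrative. Whether reflection on a Valhalla X.val[] would behave exactly this way was the open point in the thread; the sketch only shows the mechanism being appealed to.

```java
import java.lang.reflect.Array;

// Today's-Java analogue of the reflection concern. String[] plays a
// hypothetical X.val[], Object[] plays the X[] it is exposed as.
// No Valhalla syntax is used; all names here are illustrative.
final class ArrayTypeLeak {

    // Plays the role of nest-private code that hands out only a supertype-typed array.
    static Object[] exposeAsSupertype() {
        return new String[7];
    }

    // Any library can recover the array's real element type at runtime.
    static Class<?> runtimeComponentType(Object[] array) {
        return array.getClass().getComponentType();
    }

    public static void main(String[] args) {
        Object[] viewed = exposeAsSupertype();
        System.out.println(runtimeComponentType(viewed)); // class java.lang.String

        // ...and use it to allocate fresh arrays of that exact type,
        // filled with default elements (null here).
        Object[] fresh = (Object[]) Array.newInstance(runtimeComponentType(viewed), 3);
        System.out.println(fresh.getClass().getSimpleName()); // String[]

        // The runtime element type is also enforced on every store.
        try {
            viewed[0] = Integer.valueOf(1);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
    }
}
```

In the X.val case, the freshly allocated array would come back full of default (zero) instances, which is precisely the value a B2 author wants to keep from escaping the nest.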
> The other bucket, B3n, I think can live with a modifier:
>
>     non-atomic value class B3n { }
>
> While these are all the same buckets as before, this feels much more like "one new bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of this as creating a different bucket of fields.)

Yes!

> Summary:
>
>     class B1 { }
>     value class B2 { private class val { } }
>     value class B3a { }
>     non-atomic value class B3n { }
>
> Value class here is clearly the star of the show; all value classes are treated uniformly (ref-default, have a val); some value classes encapsulate the val type; some value classes further relax the integrity requirements of instances on the heap, to get better flattening and performance, when their semantics don't require it.
>
> It's an orthogonal choice whether the default is "val is private" or "val is public".

It makes B2.val a reality, but B2 has no sane default value (otherwise it would be a B3), so B2.val should not exist.

regards,
Rémi

> On 6/3/2022 3:14 PM, Brian Goetz wrote:
>> Continuing to shake this tree.
>>
>> I'm glad we went through the exploration of "flattenable B3.ref"; while I think we probably could address the challenges of tearing across the null channel / data channels boundary, I'm pretty willing to let this one go. Similarly I'm glad we went through the "atomicity orthogonal to buckets" exploration, and am ready to let that one go too.
>>
>> What I'm not willing to let go of is making atomicity explicit in the model. Not only is piggybacking non-atomicity on something like val-ness too subtle and surprising, but non-atomicity seems like it is a property that the class author needs to ask for. Flatness is an important benefit, but only when it doesn't get in the way of safety.
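Both the June 14 summary and the June 3 paragraph on atomicity treat non-atomicity as something the class author explicitly asks for. As a hedged sketch, the June 14 summary can be encoded as plain Java data; the enum ValueBucket and its field names are hypothetical, and the declaration strings are proposal syntax that no released javac accepts.

```java
// Hypothetical encoding of the June 14 summary. ValueBucket and its fields
// are illustrative names, not a proposed API; the declaration strings are
// the proposal's syntax and are only printed, never compiled.
enum ValueBucket {
    B1 ("class B1 { }",                             true,  false, false),
    B2 ("value class B2 { private class val { } }", false, false, false),
    B3A("value class B3a { }",                      false, true,  false),
    B3N("non-atomic value class B3n { }",           false, true,  true);

    final String declaration;
    final boolean hasIdentity;
    final boolean valAccessibleOutsideNest; // B2 keeps its .val nest-private
    final boolean nonAtomicRequested;       // an opt-in, like `volatile` on a field

    ValueBucket(String declaration, boolean hasIdentity,
                boolean valAccessibleOutsideNest, boolean nonAtomicRequested) {
        this.declaration = declaration;
        this.hasIdentity = hasIdentity;
        this.valAccessibleOutsideNest = valAccessibleOutsideNest;
        this.nonAtomicRequested = nonAtomicRequested;
    }

    // In this model every value class has a .val; only its accessibility varies.
    boolean hasVal() {
        return !hasIdentity;
    }

    public static void main(String[] args) {
        for (ValueBucket b : values()) {
            System.out.printf("%-42s val=%s public-val=%s non-atomic=%s%n",
                    b.declaration, b.hasVal(),
                    b.valAccessibleOutsideNest, b.nonAtomicRequested);
        }
    }
}
```

Encoded this way, the three value buckets differ only in two booleans, which is the "buckets mostly collapse away" normalization. Rémi's objection amounts to saying that B2 should have hasVal() == false rather than a private .val.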
>> Recall that we have three different representation techniques:
>> - no-flat -- use a pointer
>> - low-flat -- for sufficiently small (depending on size of atomic instructions provided by the hardware) values, pack multiple fields into a single, atomically accessed unit.
>> - full-flat -- flatten the layout, access individual fields directly, may allow tearing.
>>
>> The "low-flat" bucket got some attention recently when we discovered that there are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but this is not a slam-dunk; it requires some serious compiler heroics to pack multiple values into single accesses. But there may be targets of opportunity here for single-field values (like Optional) or final fields. And we can always fall back to no-flat whenever the VM feels like it.
>>
>> One of the questions that has been raised is how similar B3.ref is to B2, specifically with respect to atomicity. We've gone back and forth on this. Having shaken the tree quite a bit, what feels like the low energy state to me right now is:
>>
>> - The ref types of all non-identity classes are treated uniformly; B3.ref and B2.ref are translated the same, treated the same, have the same atomicity, the same nullity, etc.
>> - The only difference across the spectrum of non-identity classes is the treatment of the val type. For B2, this means the val type is *illegal*; for B3, this means it is atomic; for B3n, it is non-atomic (which in practice will mean more flatness.)
>> - (controversial) For all types, the ref type is the default. This means that some current value-based classes can migrate not only to B2, but to B3 or B3n. (And that we could migrate to B2 today and further to B3 tomorrow.)
>>
>> While this is technically four flavors, I don't think it needs to feel that complex.
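The difference between low-flat and full-flat is easiest to see with a two-field value. The sketch below is hypothetical throughout (the XY record and both cell classes are invented for illustration, and the record needs Java 16+): LowFlatCell packs both ints into one AtomicLong, which is one way to get a "single, atomically accessed unit" in today's Java, not how a VM would lay out a flattened value; FullFlatCell stores the fields separately. Java already has a small precedent for the full-flat case: JLS §17.7 allows a write to a non-volatile long or double to be performed as two separate 32-bit writes. To stay deterministic, the demo replays a racy interleaving on one thread instead of starting real threads.

```java
import java.util.concurrent.atomic.AtomicLong;

// Immutable record standing in for a boxed (reference) copy of the value.
record XY(int x, int y) { }

// Low-flat: both 32-bit fields share one 64-bit word, so a single atomic
// store or load moves the whole value and no reader can see half of it.
final class LowFlatCell {
    private final AtomicLong bits = new AtomicLong();

    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFF_FFFFL); // x in high half, y in low half
    }

    void set(int x, int y) {
        bits.set(pack(x, y));
    }

    XY get() {
        long b = bits.get();                      // one atomic read of both fields
        return new XY((int) (b >>> 32), (int) b);
    }
}

// Full-flat: each field is stored and loaded on its own, so a reader that
// runs between two stores can combine fields from different writes.
final class FullFlatCell {
    private int x, y;

    void setX(int v) { x = v; }
    void setY(int v) { y = v; }
    XY get() { return new XY(x, y); }
}

final class FlatteningDemo {
    // Replays, in a fixed order, an interleaving a data race could produce:
    // the writer has stored x but not yet y when a reader copies the cell.
    static XY readMidUpdate(FullFlatCell cell, int newX, int newY) {
        cell.setX(newX);
        XY seen = cell.get();
        cell.setY(newY);
        return seen;
    }

    public static void main(String[] args) {
        FullFlatCell full = new FullFlatCell();
        full.setX(1);
        full.setY(1);
        System.out.println(readMidUpdate(full, 2, 2)); // XY[x=2, y=1]

        LowFlatCell low = new LowFlatCell();
        low.set(-1, 7);
        System.out.println(low.get()); // XY[x=-1, y=7]
    }
}
```

The record returned by readMidUpdate is immutable and entirely ordinary, yet it holds x=2, y=1, a pair no writer ever stored; that is the "torn value boxed into a ref" situation the June 3 message goes on to warn about.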
>> I'll pick some obviously silly modifiers for exposition:
>>
>> - class B1 { }
>> - zero-hostile value class B2 { }
>> - value class B3 { }
>> - tearing-happy value class B3n { }
>>
>> In other words: one new concept ("value class"), with two sub-modifiers (zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden for B2, loosened for B3n.)
>>
>> For heap flattening, what this gets us is:
>>
>> - B1 -- no-flat
>> - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel)
>> - B3 -- low-flat (atomic, no null channel)
>> - B3n -- full-flat (non-atomic, no null channel)
>>
>> This is a slight departure from earlier tree-shakings with respect to tearing. In particular, refs do not tear at all, so programs that use all refs will never see tearing (but it is still possible to get a torn value using .val and then box that into a ref.)
>>
>> If you turn this around, the declaration-site decision tree becomes:
>>
>> - Do I need identity (mutability, subclassing, aliasing)? Then B1.
>> - Are uninitialized values unacceptable? Then B2.
>> - Am I willing to tolerate tearing to enable more flattening? Then B3n.
>> - Otherwise, B3.
>>
>> And the use-site decision tree becomes:
>>
>> - For B1, B2 -- no choices to make.
>> - Do I need nullity? Then .ref
>> - Do I need atomicity, and the class doesn't already provide it? Then .ref
>> - Otherwise, can use .val
>>
>> The main downside of making ref the default is that people will grumble about having to say .val at the use site all the time. And they will! And it does feel a little odd that you have to opt into val-ness at both the declaration and use sites. But it unlocks a lot of things (see Kevin's list for more):
>>
>> - The default name is the safest version.
>> - Every unadorned name works the same way; it's always a reference type. You don't need to maintain a mental database around "which kind of name is this".
>> - Migration from B1 -> B2 -> B3 is possible. This is huge (and more than we had hoped for when we started this game.)
>>
>> (The one thing to still worry about is that while refs can't tear, you can still observe a torn value through a ref, if someone tore it and then boxed it. I don't see how we defend against this, but the non-atomic label should be enough of a warning.)
>>
>> On 5/6/2022 10:04 AM, Brian Goetz wrote:
>>> In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing. Is that what you're saying?
>>>
>>>     class B1 { }                        // ref, identity, atomic
>>>     value-based class B2 { }            // ref, non-identity, atomic
>>>     [ non-atomic ] value class B3 { }   // ref or val, zero is ok, both projections share atomicity
>>>
>>> If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then:
>>>
>>> - B2 is like B1, minus identity
>>> - B3 means "uninitialized values are OK, you get two types, a zero-default and a non-default"
>>> - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity
>>> - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default)
>>>
>>> I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend."
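The June 3 message gives a declaration-site decision tree, a use-site decision tree, and a heap-flattening list. A minimal sketch encodes them as Java methods; DecisionTrees, its nested enums, and the boolean parameter names are hypothetical, B3N stands for the tearing-happy B3n, and the layout strings restate the message's list rather than anything a JVM reports.

```java
// Hypothetical encoding of the June 3 decision trees and heap-flattening
// list. All names are illustrative; nothing here queries a real JVM.
final class DecisionTrees {
    enum Bucket { B1, B2, B3, B3N }
    enum Projection { REF, VAL }

    // Declaration site: questions asked in order, the first "yes" wins.
    static Bucket declare(boolean needsIdentity,
                          boolean uninitializedUnacceptable,
                          boolean toleratesTearing) {
        if (needsIdentity) return Bucket.B1;
        if (uninitializedUnacceptable) return Bucket.B2;
        if (toleratesTearing) return Bucket.B3N;
        return Bucket.B3;
    }

    // Use site: the unadorned name is .ref, so .val is what must be justified.
    static Projection use(Bucket bucket, boolean needsNull, boolean needsAtomicity) {
        if (bucket == Bucket.B1 || bucket == Bucket.B2) return Projection.REF; // no choice
        if (needsNull) return Projection.REF;
        // B3.val is already atomic; only B3n.val can tear.
        if (needsAtomicity && bucket == Bucket.B3N) return Projection.REF;
        return Projection.VAL;
    }

    // Heap layout the June 3 message expects for each (bucket, projection).
    static String heapLayout(Bucket bucket, Projection projection) {
        if (bucket == Bucket.B1) return "no-flat"; // B1 has only a reference type
        if (projection == Projection.REF) return "low-flat, atomic, with null channel";
        if (bucket == Bucket.B2) throw new IllegalArgumentException("B2 has no val type");
        if (bucket == Bucket.B3) return "low-flat, atomic, no null channel";
        return "full-flat, non-atomic, no null channel";
    }

    public static void main(String[] args) {
        Bucket b = declare(false, false, true); // tolerates tearing: B3N
        Projection p = use(b, false, false);    // no null, no atomicity needed: VAL
        System.out.println(b + "." + p + " -> " + heapLayout(b, p));
        // prints: B3N.VAL -> full-flat, non-atomic, no null channel
    }
}
```

Because use() returns REF unless every condition for .val holds, the sketch mirrors the ref-default choice; the two flags that decide between B2 and the B3 variants also correspond to the has-zero and atomicity axes in the May 6 list.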