Re: User model stacking: current status

Brian Goetz Tue, 14 Jun 2022 07:19:53 -0700

It took me a while to understand your concern, but I think I have it now: it is that we're effectively doing separate access control on LFoo and QFoo. At the language level this is no problem, but the VM needs a story here. Is this the whole of your concern, or is there more?

  - The .val type is always there, but for "B2" classes, it is *inaccessible 
outside the nest*, as per ordinary accessibility.

Is this the first time we'll be checking nest mate accessibility at
class creation?  If so (and I think it is) we'll need to update the
spec to define when the nest mates + nest host can be loaded to
complete this check in the (already complicated) class loading
process.


The case I'm thinking of is needing to do the accessibility check on
the defining class of a static field (and possibly an instance field)
when defining a class like:

class Foo {
   static QRational myRational;
}

To know if Foo can have a field of Rational.val, we need to check that both
Foo and Rational are in the same nest.

First you need to check that Rational is accessible, and *then* you need to check that QRational satisfies the additional accessibility requirements, based on the public/package/private accessibility of the Q type. Right?

This will require additional
class loads, mitigated somewhat by the existing rules for preloading
Qs.  So maybe we can do the nest check there?  We'll probably need to
make it explicit in the spec that these additional classes can be
loaded as part of the accessibility check during class definition.

Another option would be to delay the nest check until either the
<clinit> or instance methods, until the "new" bytecode?  I like that
less but it may be easier to fit into the spec.

--Dan

This means that within the nest, code that understands the restrictions can, 
say, create `new X.val[7]` and expose it as an `X[]`, as long as it doesn't let 
the zero escape.  This gives B2 classes a lot more latitude to use the .val 
type in safe ways.  Basically: if you don't trust people with the .val type, 
don't let the val type escape.

There's a bikeshed to paint, but it might look something like:

     value class B2 {
         private class val { }
     }

or, flipping the default:

     value class B3a {
         public class val { }
     }

So B2 is really a B3a whose value projection is encapsulated.

The other bucket, B3n, I think can live with a modifier:

     non-atomic value class B3n { }

While these are all the same buckets as before, this feels much more like "one new 
bucket" (the `non-atomic` modifier is like `volatile` on a field; we don't think of 
this as creating a different bucket of fields.)

Summary:

     class B1 { }
     value class B2 { private class val { } }
     value class B3a { }
     non-atomic value class B3n { }

Value class here is clearly the star of the show; all value classes are treated 
uniformly (ref-default, have a val); some value classes encapsulate the val 
type; some value classes further relax the integrity requirements of instances 
on the heap, to get better flattening and performance, when their semantics 
don't require that integrity.

It's an orthogonal choice whether the default is "val is private" or "val is 
public".



On 6/3/2022 3:14 PM, Brian Goetz wrote:

Continuing to shake this tree.

I'm glad we went through the exploration of "flattenable B3.ref"; while I think we 
probably could address the challenges of tearing across the null channel / data channels boundary, 
I'm pretty willing to let this one go.  Similarly I'm glad we went through the "atomicity 
orthogonal to buckets" exploration, and am ready to let that one go too.

What I'm not willing to let go of is making atomicity explicit in the model.  
Not only is piggybacking non-atomicity on something like val-ness too subtle 
and surprising, but non-atomicity seems like it is a property that the class 
author needs to ask for.  Flatness is an important benefit, but only when it 
doesn't get in the way of safety.

Recall that we have three different representation techniques:

  - no-flat -- use a pointer
  - low-flat -- for sufficiently small (depending on size of atomic 
instructions provided by the hardware) values, pack multiple fields into a 
single, atomically accessed unit.
  - full-flat -- flatten the layout, access individual fields 
directly, may allow tearing.

The "low-flat" bucket got some attention recently when we discovered that there 
are usable 128-bit atomics on Intel (based on a recent revision of the chip spec), but 
this is not a slam-dunk; it requires some serious compiler heroics to pack multiple 
values into single accesses.  But there may be targets of opportunity here for 
single-field values (like Optional) or final fields.  And we can always fall back to 
no-flat whenever the VM feels like it.
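
As a rough analogy for low-flat in today's Java, here is a minimal sketch that packs a two-int value by hand into one 64-bit word, so both fields are always read and written as a single atomic unit.  The class `PackedPoint` and its methods are hypothetical names made up for this illustration; this is not how the VM would implement flattening, only a picture of "pack multiple fields into a single, atomically accessed unit".

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only: a hand-rolled "low-flat" layout for a
// value with two int fields, stored in one atomically accessed 64-bit word.
final class PackedPoint {
    private final AtomicLong bits = new AtomicLong();

    // x goes in the high 32 bits, y in the low 32 bits.
    // The mask prevents sign extension of y from clobbering x.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    static int unpackX(long packed) { return (int) (packed >> 32); }

    static int unpackY(long packed) { return (int) packed; }

    // A single atomic store publishes both fields together.
    void set(int x, int y) { bits.set(pack(x, y)); }

    // A single atomic load returns both fields, so a reader never
    // observes x from one write combined with y from another.
    long snapshot() { return bits.get(); }
}
```

A reader would call `snapshot()` once and then unpack both fields from the same word.  The full-flat alternative would be two separate int fields written one at a time, which is where tearing can creep in.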

One of the questions that has been raised is how similar B3.ref is to B2, 
specifically with respect to atomicity.  We've gone back and forth on this.

Having shaken the tree quite a bit, what feels like the low energy state to me 
right now is:

  - The ref types of all non-identity classes are treated uniformly; B3.ref and 
B2.ref are translated the same, treated the same, have the same atomicity, the 
same nullity, etc.
  - The only difference across the spectrum of non-identity classes is the 
treatment of the val type.  For B2, this means the val type is *illegal*; for 
B3, this means it is atomic; for B3n, it is non-atomic (which in practice will 
mean more flatness.)
  - (controversial) For all types, the ref type is the default.  This means 
that some current value-based classes can migrate not only to B2, but to B3 or 
B3n.  (And that we could migrate to B2 today and further to B3 tomorrow.)

While this is technically four flavors, I don't think it needs to feel that 
complex.  I'll pick some obviously silly modifiers for exposition:

  - class B1 { }
  - zero-hostile value class B2 { }
  - value class B3 { }
  - tearing-happy value class B3n { }

In other words: one new concept ("value class"), with two sub-modifiers 
(zero-hostile, and tearing-happy) which affect the behavior of the val type (forbidden 
for B2, loosened for B3n.)

For heap flattening, what this gets us is:

  - B1 -- no-flat
  - B2, B3.ref, B3n.ref -- low-flat atomic (with null channel)
  - B3 -- low-flat (atomic, no null channel)
  - B3n -- full-flat (non-atomic, no null channel)

This is a slight departure from earlier tree-shakings with respect to tearing.  
In particular, refs do not tear at all, so programs that use all refs will 
never see tearing (but it is still possible to get a torn value using .val and 
then box that into a ref.)

If you turn this around, the declaration-site decision tree becomes:

  - Do I need identity (mutability, subclassing, aliasing)?  Then B1.
  - Are uninitialized values unacceptable?  Then B2.
  - Am I willing to tolerate tearing to enable more flattening?  Then B3n.
  - Otherwise, B3.
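
The declaration-site tree above can be sketched as an ordinary Java method.  The bucket names come from this thread; the enum `Bucket`, the class `DeclarationSite`, and the parameter names are hypothetical, chosen only to show that the questions are asked in order and the first "yes" wins.

```java
// Bucket names follow the thread; the enum itself is hypothetical.
enum Bucket { B1, B2, B3, B3N }

final class DeclarationSite {
    // Mirrors the decision tree: each question is only reached
    // if every earlier question was answered "no".
    static Bucket choose(boolean needsIdentity,
                         boolean zeroUnacceptable,
                         boolean toleratesTearing) {
        if (needsIdentity) return Bucket.B1;     // mutability, subclassing, aliasing
        if (zeroUnacceptable) return Bucket.B2;  // uninitialized values are not OK
        if (toleratesTearing) return Bucket.B3N; // trade integrity for flattening
        return Bucket.B3;                        // atomic val, zero is OK
    }
}
```

For example, `DeclarationSite.choose(false, true, false)` yields `Bucket.B2`.  Note that the ordering matters: a class that cannot accept its zero value lands in B2 even if its author would tolerate tearing.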

And the use-site decision tree becomes:

  - For B1, B2 -- no choices to make.
  - Do I need nullity?  Then .ref
  - Do I need atomicity, and the class doesn't already provide it?  Then .ref
  - Otherwise, can use .val
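
The use-site tree can be sketched the same way.  Again, the enum `Projection`, the class `UseSite`, and the parameter names are hypothetical; `hasValType` is assumed to be false for B1 and B2, where there is no choice to make, and `classIsAtomic` is assumed to be true for B3 and false for B3n.

```java
// Hypothetical names; REF and VAL stand for the .ref and .val projections.
enum Projection { REF, VAL }

final class UseSite {
    static Projection choose(boolean hasValType,
                             boolean needsNull,
                             boolean needsAtomicity,
                             boolean classIsAtomic) {
        if (!hasValType) return Projection.REF;  // B1, B2: no choices to make
        if (needsNull) return Projection.REF;    // only the ref type is nullable
        if (needsAtomicity && !classIsAtomic) {
            return Projection.REF;               // refs do not tear
        }
        return Projection.VAL;                   // otherwise .val is fine
    }
}
```

So a use of a B3n class that needs atomicity falls back to the ref type, while the same use of a B3 class can stay with .val.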

The main downside of making ref the default is that people will grumble about 
having to say .val at the use site all the time.  And they will!  And it does 
feel a little odd that you have to opt into val-ness at both the declaration 
and use sites.  But it unlocks a lot of things (see Kevin's list for more):

  - The default name is the safest version.
  - Every unadorned name works the same way; it's always a reference type.  You don't 
need to maintain a mental database around "which kind of name is this".
  - Migration from B1 -> B2 -> B3 is possible.  This is huge (and more than we 
had hoped for when we started this game.)

(The one thing to still worry about is that while refs can't tear, you can 
still observe a torn value through a ref, if someone tore it and then boxed it. 
 I don't see how we defend against this, but the non-atomic label should be 
enough of a warning.)



On 5/6/2022 10:04 AM, Brian Goetz wrote:

In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the 
stacking I've been discussing.  Is that what you're saying?

     class B1 { }  // ref, identity, atomic
     value-based class B2 { }  // ref, non-identity, atomic
     [ non-atomic ] value class B3 { }  // ref or val, zero is ok, both 
projections share atomicity

If we go with ref-default, then this is a small leap from yesterday's stacking, because 
"B3" and "B2" are both reference types, so if you want a tearable, non-atomic 
reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then:

  - B2 is like B1, minus identity
  - B3 means "uninitialized values are OK, you get two types, a zero-default and a 
non-default"
  - Non-atomicity is an extra property we can add to B3, to get more flattening 
in exchange for less integrity
  - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is 
the default)

I think this still has the properties I want; I can freely choose the reasonable subsets of { 
identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across 
buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like 
B2" is shown to be the "false friend."
