Re: [External] : Re: User model stacking

Brian Goetz Wed, 27 Apr 2022 16:15:25 -0700

Let me try and put some more color on the bike shed (but, again, let’s focus on 
model, not syntax, for now.)


We have two axes of variation we want to express with non-identity classes: 
atomicity constraints, and whether there is an additional zero-default 
companion type.  These can be mostly orthogonal; you can have either, neither, 
or both.  We've previously been assuming that "primitiveness" lumps this all
together; primitives get more flattening, primitives can be 
non-nullable/zero-default, primitives means the good name goes to the "val" 
type.  Primitive-ness implicitly flips the "safety vs performance" priority, 
which has been bothering us because primitives also code like a class.  So we 
were trying to claw back some atomicity for primitives.

But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity 
than is necessarily needed; a B2 with no invariants still gets less flattening 
than a B3.  That's a little sad.  The difference also seems gratuitous, and it 
makes the user model more complicated.  So we’re suggesting restacking 
towards:

- Value classes are those without identity
- Value classes can be atomic or non-atomic, the default is atomic (safe by 
default)
- Value classes can further opt into having a "val" projection (name TBD, val 
is probably not it)
- Val projections are non-nullable, zero-default — this is the only difference
- Both the ref and val projections inherit the atomicity constraints of the 
class, making atomicity mostly orthogonal to ref/val/zero/null

Example: classic B2

   value class B2a { }

Because the default is atomic, we get the classic B2 semantics -- no identity, 
but full final field safety guarantees.  The VM has several strategies for 
flattening in the heap: single-field classes are always flattened (“full flat”), 
multi-field classes can be flattened with "fat load and store" heroics in the 
future (“low flat”), and otherwise there is indirection (“no flat”).
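To make the "fat load and store" idea concrete, here is a software analogy, assuming a hypothetical two-`int` value packed into one `long` slot of a `java.util.concurrent.atomic.AtomicLongArray`.  Each element is written and read with a single 64-bit operation, so a reader sees either the old pair or the new pair, never one field from each.  This is only a sketch of the principle, not how the VM would actually lay out or access a B2a.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical "low flat" analogy: two int fields share one atomic 64-bit slot.
public class PackedPointArray {
    private final AtomicLongArray slots;

    public PackedPointArray(int length) {
        slots = new AtomicLongArray(length); // zero-filled: every element starts as (0, 0)
    }

    // Pack x into the high 32 bits and y into the low 32 bits.
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFF_FFFFL);
    }

    static int unpackX(long bits) { return (int) (bits >>> 32); }

    static int unpackY(long bits) { return (int) bits; }

    // One atomic 64-bit store: both fields are published together.
    public void set(int i, int x, int y) {
        slots.set(i, pack(x, y));
    }

    // One atomic 64-bit load: both fields come from the same store.
    public int[] get(int i) {
        long bits = slots.get(i);
        return new int[] { unpackX(bits), unpackY(bits) };
    }

    public static void main(String[] args) {
        PackedPointArray points = new PackedPointArray(2);
        points.set(1, -3, 7);
        int[] p = points.get(1);
        System.out.println(p[0] + "," + p[1]); // prints "-3,7"
    }
}
```

The trick only works while the fields fit in the widest atomic access the hardware offers, which is why it is framed above as future heroics rather than a general strategy.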

Example: non-atomic B2

   non-atomic value class B2n { }

Here, the user has said "I have no atomicity requirements."  A B2n is a loose 
aggregation of fields that can be individually written and read (full B3-like 
flattening), with maybe an extra boolean field to encode null (the VM's choice 
of how to encode it; it could use slack pointer bits, etc.)
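One possible null encoding, sketched under assumptions: a hypothetical flattened array of a two-`double` Complex, stored as one primitive array per field plus a `boolean` "present" flag per element.  Because freshly allocated Java arrays are zero-filled, a new array of this shape reads as all nulls without any initialization pass.  The names (`FlatNullableComplexArray`, `present`) are illustrative only; the VM is free to encode null some other way.

```java
// Hypothetical flattened layout for a nullable, non-atomic two-field value.
public class FlatNullableComplexArray {
    public record Complex(double re, double im) { }

    // One array per field, plus a null channel. All start zero-filled,
    // so present[i] == false and every element initially reads as null.
    private final double[] re;
    private final double[] im;
    private final boolean[] present;

    public FlatNullableComplexArray(int length) {
        re = new double[length];
        im = new double[length];
        present = new boolean[length];
    }

    public void set(int i, Complex c) {
        if (c == null) {
            present[i] = false;
            re[i] = 0.0;
            im[i] = 0.0;
        } else {
            // Non-atomic: fields are written one at a time, so an
            // unsynchronized concurrent reader could observe a mix.
            re[i] = c.re();
            im[i] = c.im();
            present[i] = true;
        }
    }

    public Complex get(int i) {
        return present[i] ? new Complex(re[i], im[i]) : null;
    }

    public static void main(String[] args) {
        FlatNullableComplexArray arr = new FlatNullableComplexArray(3);
        arr.set(0, new Complex(1.0, 2.0));
        System.out.println(arr.get(0)); // Complex[re=1.0, im=2.0]
        System.out.println(arr.get(1)); // null
    }
}
```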

Example: atomic B3

   zero-capable value class B3a { }

This says I am declaring two types, B3a and B3a.zero.  (The syntax in this 
quadrant sucks; need to find better.)  B3a is just like B2a above, because we 
haven’t activated the zero capability at the use site.  
B3a.zero/val/flat/whatever is non-nullable, zero-default, *but still has full 
B2-classic atomicity*, with the same set of flattening choices available to 
the VM.

Example: full primitive

   non-atomic zero-capable value class B3n { }

Here, B3n is like B2n, and B3n.zero is a full classic-B3 Q primitive with full 
flattening.
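The closest existing analogue of this pair is `Integer` and `int`: the reference type is nullable and defaults to null, while the primitive is non-nullable and defaults to zero.  (Relatedly, JLS §17.7 already allows a non-volatile `long` or `double` write to be split into two 32-bit writes, so `long` is, in this vocabulary, a non-atomic zero-default type.)  A minimal illustration; the class name is hypothetical:

```java
import java.util.Arrays;

public class RefVsZeroDefaults {
    // Reference-style companion: nullable, elements default to null.
    static Integer[] refArray(int n) {
        return new Integer[n];
    }

    // Value-style companion: non-nullable, elements default to zero.
    static int[] valArray(int n) {
        return new int[n];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(refArray(3))); // [null, null, null]
        System.out.println(Arrays.toString(valArray(3))); // [0, 0, 0]
        try {
            int x = refArray(1)[0]; // unboxing null into the zero-default type
            System.out.println(x);
        } catch (NullPointerException e) {
            System.out.println("null cannot flow into int");
        }
    }
}
```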

So:

- value-ness means "no identity, == means state equality"
- You can add non-atomic to value-ness, meaning you give up state integrity
- You can orthogonally add zero-capable to value-ness, meaning you get a 
non-null, zero-happy companion, which inherits the atomic-ness
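For contrast, this is what `==` means for an identity class today.  A minimal sketch using an ordinary record (the name `Point` is illustrative): two instances with the same state are `equals` but not `==`.  Under the model above, a value class would make `==` compare state instead, so the first comparison would no longer be false.

```java
public class StateEquality {
    record Point(int x, int y) { }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);      // false: two distinct identities today
        System.out.println(a.equals(b)); // true: same state
    }
}
```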

Some of the characteristics of this scheme:

- The default is atomicity / integrity FOR ALL BUCKETS (safe by default)
- The default is nullability FOR ALL BUCKETS
- All unadorned type names are reference types / nullable
- All val-adorned type names (X.val, or .zero, or .whatever) are non-nullable
- Atomicity is determined at the declaration site and can’t be changed at the use site
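The rules above are mechanical enough to write down as a toy model.  This is a sketch under stated assumptions, not a proposed API: `ValueClassDecl` and `UseSiteType` are hypothetical names, `.val` stands in for whatever spelling is chosen, and the model covers only nullability, default value, and atomicity.

```java
public class DeclarationModel {
    // Hypothetical declaration-site choices for a value class (identity already given up).
    record ValueClassDecl(String name, boolean atomic, boolean zeroCapable) { }

    // Hypothetical summary of what a use-site type provides.
    record UseSiteType(String spelling, boolean nullable, String defaultValue, boolean atomic) { }

    // Unadorned names are nullable; only zero-capable classes have a .val form;
    // both forms inherit the declared atomicity (there is no use-site knob for it).
    static UseSiteType use(ValueClassDecl decl, boolean val) {
        if (!val) {
            return new UseSiteType(decl.name(), true, "null", decl.atomic());
        }
        if (!decl.zeroCapable()) {
            throw new IllegalArgumentException(decl.name() + " has no .val projection");
        }
        return new UseSiteType(decl.name() + ".val", false, "zero", decl.atomic());
    }

    public static void main(String[] args) {
        ValueClassDecl b2a = new ValueClassDecl("B2a", true, false);
        ValueClassDecl b3n = new ValueClassDecl("B3n", false, true);
        System.out.println(use(b2a, false));
        System.out.println(use(b3n, true));
        try {
            use(b2a, true);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // B2a has no .val projection
        }
    }
}
```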

The main syntactic hole is finding the right spelling for "zeroable" / .val.  
There is some chance we can get away with spelling it `T!`, though this has 
risks.

Spelling zero-happy as any form of “flat” is probably a bad idea, because B2 
can still be flat.

A possible spelling for “non-atomic” is “relaxed”:

   relaxed value class B3n { }

Boilerplate-measurers would point out that to get full flattening, you have to 
say three things at the declaration site and one extra thing at the use site:

    relaxed zero-happy value class Complex { }
    …
    Complex! c;

If you forget relaxed, you might get atomicity (though it might not cost 
anything if the value is small).  If you forget zero-happy, you can’t say 
`Complex!`, only Complex, and the compiler will remind you.  If you forget the 
!, you may get some extra footprint for the null bit.  None of these are too 
bad, but the verbosity police might want to issue a warning here.

It is possible we might want to flip the declaration of zero-capable, so that 
classes with no good default opt OUT of the zero companion, rather than the 
other way around:

    null-default value class LocalDate { }

which says that LocalDate must use the nullable (LocalDate) form, not the 
non-nullable (LocalDate.val/zero/bang) form.
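`java.time.LocalDate` shows why some classes want null rather than zero as their default: building a date from a zero year, month, and day is rejected outright, and any other fixed choice (such as the epoch day) is an arbitrary real date rather than an "absent" one.  A minimal sketch; the class and method names are hypothetical, while `LocalDate.of` and `LocalDate.ofEpochDay` are the real `java.time` factories:

```java
import java.time.DateTimeException;
import java.time.LocalDate;

public class NoGoodZero {
    // Try to build the date a zero-default companion would imply: all fields zero.
    static String zeroDate() {
        try {
            return LocalDate.of(0, 0, 0).toString();
        } catch (DateTimeException e) {
            return "invalid"; // month 0 and day 0 are out of range
        }
    }

    public static void main(String[] args) {
        System.out.println(zeroDate());              // invalid
        // A "default" like the epoch is a real date, not a marker for "no date".
        System.out.println(LocalDate.ofEpochDay(0)); // 1970-01-01
    }
}
```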


On 4/22/2022 2:24 PM, Brian Goetz wrote:
I think I have a restack of Dan's idea that feels like fewer buckets.

We have two axes of variation we want to express with flattenable types: 
atomicity constraints, and whether there is an additional zero-default 
companion type.

We've been assuming that "primitiveness" lumps this all together; primitives 
get more flattening, primitives can be non-nullable/zero-default, primitives 
means the good name goes to the "val" type.  Primitive-ness implicitly flips 
the "safety vs performance" priority, which is bothering us because primitives 
also code like a class.  So we're trying to claw back some atomicity for 
primitives.

But also, we're a little unhappy with B2 because B2 comes with _more_ atomicity 
than is necessarily needed; a B2 with no invariants still gets less flattening. 
 That's a little sad.  Let's restack the pieces (again).

- Value classes are those without identity
- Value classes can be atomic or non-atomic, the default is atomic (safe)
- Value classes can further opt into having a "val" projection (name TBD, val 
is probably not it)
- Val projections are non-nullable, zero-default
- Both the ref and val projections inherit the atomicity constraints of the 
class, making atomicity mostly orthogonal to ref/val/zero/null

Example: classic B2

   value class B2 { }

Because the default is atomic, we get the classic B2 semantics -- no identity, 
but full final field safety guarantees.  The VM has several strategies for 
flattening in the heap: single-field classes are always flattened, multi-field 
classes can be flattened with "fat load and store" heroics in the future, and 
otherwise there is indirection.

Example: non-atomic B2

   non-atomic value class B2a { }

Here, the user has said "I have no atomicity requirements."  A B2a is a loose 
aggregation of fields that can be individually written and read (full B3-like 
flattening), with maybe an extra boolean field to encode null (VM's choice how 
to encode.)

Example: atomic B3

   zero-capable value class B3a { }

This says I am declaring two types, B3a and B3a.zero.  (These names suck; need 
better ones.)  B3a is just like B2 above.  B3a.zero is non-nullable, 
zero-default, *but still has full B2-classic atomicity*, with the same set of 
flattening choices.

Example: full primitive

   non-atomic zero-capable value class B3b { }

Here, B3b is like B2a, and B3b.zero is a full classic-B3 Q primitive with full 
flattening.


So the stacking is:

- value-ness means "no identity, == means state equality"
- You can add non-atomic to value-ness, meaning you give up state integrity
- You can orthogonally add zero-capable to value-ness, meaning you get a 
non-null, zero-happy companion

This is starting to feel more honest....





On 4/19/2022 6:45 PM, Brian Goetz wrote:
By choosing to modify the class, we are implicitly splitting into Buckets 3a 
and 3n:

- B2 gives up identity
- B3a further gives up nullity
- B3n further gives up atomicity

Which opens us up to a new complaint: people didn't even like the B2/B3 split 
("why do there have to be two?"), and now there are three.

Given that atomic/non-atomic only applies to primitives, maybe there's a way to 
compress this further?

On 4/19/2022 6:25 PM, Dan Smith wrote:
On Apr 19, 2022, at 2:49 PM, Brian Goetz wrote:

So, what shall we do when the user says non-atomic, but the constructor 
expresses a multi-field invariant?

Lint warning, if we can detect it and that warning is turned on.


On Apr 19, 2022, at 3:22 PM, Brian Goetz wrote:

Stepping back, what you're saying is that we manage atomicity among a subset of 
fields by asking the user to arrange the related fields in a separate class, 
and give that class extra atomicity.  If we wanted to express 
ColoredDiagonalPoint, in this model we'd say something like:

   non-atomic primitive ColoredDiagonalPoint {
       private DiagonalPoint p;
       private Color c;

       private atomic primitive DiagonalPoint {
           private int x, y;

           DiagonalPoint(int x, int y) {
           if (x != y) throw new IllegalArgumentException("x != y");
               ...
           }
       }
   }
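A single-threaded simulation of one unlucky interleaving shows why the nesting helps.  Assumptions: Java records stand in for the proposed `atomic primitive` and `non-atomic primitive` declarations, the nested `DiagonalPoint` is held in one reference slot (reference writes are atomic per JLS §17.7), and `FlatSlots` / `FullyFlatSlots` are hypothetical storage layouts.  A reader that runs between the writer's two stores can see a stale color, but never a non-diagonal point, whereas storing x and y as independent slots lets it see (5, 0).

```java
public class NestedAtomicitySketch {
    enum Color { RED, BLUE }

    // Stand-in for the atomic inner class: its invariant spans both fields.
    record DiagonalPoint(int x, int y) {
        DiagonalPoint {
            if (x != y) throw new IllegalArgumentException("x != y");
        }
    }

    // Hypothetical non-atomic outer layout: p and c are separate slots,
    // but p itself is replaced as a single unit.
    static final class FlatSlots {
        DiagonalPoint p = new DiagonalPoint(0, 0);
        Color c = Color.RED;
    }

    // Hypothetical layout with no inner grouping: x and y are independent slots.
    static final class FullyFlatSlots {
        int x;
        int y;
    }

    // Writer stores (p=(5,5), c=BLUE); a reader runs between the two stores.
    static boolean nestedTornReadIsDiagonal() {
        FlatSlots s = new FlatSlots();
        s.p = new DiagonalPoint(5, 5); // writer: first store
        DiagonalPoint seenP = s.p;     // reader: new p ...
        Color seenC = s.c;             // ... but old c (RED), so the outer value is torn
        s.c = Color.BLUE;              // writer: second store
        return seenP.x() == seenP.y();
    }

    // Writer stores (x=5, y=5); a reader runs between the two stores.
    static boolean flatTornReadIsDiagonal() {
        FullyFlatSlots s = new FullyFlatSlots();
        s.x = 5;                      // writer: first store
        int seenX = s.x, seenY = s.y; // reader: sees (5, 0)
        s.y = 5;                      // writer: second store
        return seenX == seenY;
    }

    public static void main(String[] args) {
        System.out.println(nestedTornReadIsDiagonal()); // true
        System.out.println(flatTornReadIsDiagonal());   // false
    }
}
```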

Right?

Yep. Good illustration of how just providing a class modifier gives programmers 
significant fine-grained control.


We exempt the single-field classes from having an opinion.  We could also 
exempt primitive records with no constructor behavior.

Yeah, but (1) it's hard to identify all assumed invariants: some might appear in 
factories, etc., or informally in javadoc; and (2) even in a class with no 
invariants, it's probably useful for the author to explicitly acknowledge that 
they understand tearing risks.


What it gives up (without either a change in programming model, or compiler 
heroics), is the ability to correlate between user-written invariants and the 
corresponding atomicity constraints, which could guide users away from errors.  
Right?

Right. Could still do that if we wanted, but my opinion is that it's too much 
language surface for the scale of the problem. If we did have additional 
construction constraints, I'd prefer that atomic primitives allow full 
imperative construction logic & encapsulation.

This feels analogous to advanced typing analyses that might prove certain casts 
to be safe/unsafe. Sure, the language could try to be helpful by implementing 
that analysis, but it would add lots of complexity, and ultimately it's either 
a best-effort check or annoyingly restrictive.

On Apr 27, 2022, at 2:51 PM, Dan Heidinga wrote:

I'm trying to understand how this refactoring fits the VM physics.

In particular, __non-atomic & __zero-ok fit together at the VM level
because the VM's natural state for non-atomic (flattened) data is zero
filled.  When those two items are decoupled, I'm unclear on what the
VM would offer in that case.  Thoughts?

How does "__non-atomic __non-id class B2a { }" fit with the "no new
nulls" requirements?

--Dan

On Wed, Apr 27, 2022 at 12:45 PM Brian Goetz wrote:

Here are some considerations for stacking the user model.  (Again, please let’s 
resist the temptation to jump to the answer and then defend it.)

We have a stacking today which says:

- B1 is ordinary identity classes, giving rise to a single reference type
- B2 are identity-free classes, giving rise to a single reference type
- B3 are flattenable identity-free classes, giving rise to both a reference 
(L/ref) and primitive (Q/val) type.

This stacking has some pleasant aspects.  B2 differs from B1 by “only one bit”: 
identity.  The constraints on B2 are those that come from the lack of identity 
(mutability, extensibility, locking, etc.)  B2 references behave like the 
object references we are familiar with; nullability, final field guarantees, 
etc.  B3 further makes reference-ness optional; reference-free B3 values give 
up the affordances of references: they are zero-default and tearable.  This 
stacking is nice because it can be framed as a sequence of “give up some X, get 
some Y”.

People keep asking “do we need B2, or could we get away with B1/B3”.  The main 
reason for having this distinction is that some id-free classes have no 
sensible default, and so want to use null as their default.  This is a 
declaration-site property; B3 means that the zero value is reasonable, and use 
sites can opt into / out of zero-default / nullity.  We’d love to compress 
away this bucket but forcing a zero on classes that can’t give it a reasonable 
interpretation is problematic.  But perhaps we can reduce the visibility of 
this in the model.

The degrees of freedom we could conceivably offer are

  { identity or not, zero-capable or not, atomic or not } x { use-site, 
declaration-site }

In actuality, not all of these boxes make sense (e.g., disavowing the identity of 
an ArrayList at the use site), and some have been disallowed by the stacking (some 
characteristics have been lumped together).  Here’s another way to stack the declaration:

- Some classes can disavow identity
- Identity-free classes can further opt into zero-default (currently, B3, 
polarity chosen at use site)
- Identity-free classes can further opt into tearability (currently, B3, 
polarity chosen at use site)

It might seem the sensible move here is to further split B3 into B3a and B3b 
(where all B3 support zero default, and a/b differ with regard to whether 
immediate values are tearable).  But that may not be the ideal stacking, 
because we want good flattening for B2 (and B3.ref) also.  Ideally, the 
difference between B2 and B3.val is nullity only (Kevin’s antennae just went 
up.)

So another possible restacking is to say that atomicity is something that has 
to be *opted out of* at the declaration site (and maybe also at the use site.)  
With deliberately-wrong syntax:

  __non-id class B2 { }

  __non-atomic __non-id class B2a { }

  __zero-ok __non-id class B3 { }

  __non-atomic __zero-ok __non-id class B3a { }

In this model, you can opt out of identity, and then you can further opt out of 
atomicity and/or null-default.  This “pulls up” the atomicity/tearability to a 
property of the class (I’d prefer safe by default, with opt out), and makes 
zero-*capability* an opt-in property of the class.  Then for those that have 
opted into zero-capability, at the use site, you can select .ref (null) / .val 
(zero).  Obviously these all need better spellings.  This model frames specific 
capabilities as modifiers on the main bucket, so it could be considered either 
a two-bucket or a four-bucket model, depending on how you look.

The author is in the best place to make the atomicity decision, since they know 
the integrity constraints.  Single field classes, or classes with only single 
field invariants (denominator != 0), do not need atomicity.  Classes with 
multi-field invariants do.
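A quick way to check which side of that line a class falls on: simulate a torn read by combining fields from two valid instances and re-run the constructor's checks.  In this sketch (the records `Rational` and `Range` and the helper names are hypothetical), a single-field invariant like `den != 0` survives any such mix, because each field still comes from a valid instance, while a multi-field invariant like `lo <= hi` does not.

```java
public class TearingVsInvariants {
    // Single-field invariant: only the denominator is constrained.
    record Rational(int num, int den) {
        Rational {
            if (den == 0) throw new IllegalArgumentException("den == 0");
        }
    }

    // Multi-field invariant: relates lo and hi.
    record Range(int lo, int hi) {
        Range {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
        }
    }

    // A torn read takes the first field from one write and the second from another.
    // Returns true if the mixed state still passes the constructor's checks.
    static boolean tornRationalIsValid(Rational a, Rational b) {
        try {
            new Rational(a.num(), b.den());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    static boolean tornRangeIsValid(Range a, Range b) {
        try {
            new Range(a.lo(), b.hi());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // den always comes from a valid Rational, so it is never zero.
        System.out.println(tornRationalIsValid(new Rational(1, 2), new Rational(7, 3))); // true
        // Mixing (10, 20) with (0, 5) yields (10, 5), which violates lo <= hi.
        System.out.println(tornRangeIsValid(new Range(10, 20), new Range(0, 5)));       // false
    }
}
```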

This differs from the previous stacking in that it moves the spotlight from 
_references_ and their properties, to the properties themselves.  It says to 
class writers: you should declare the ways in which you are willing to trade 
safety for performance; you can opt out of the requirement for references and 
nulls (saving some footprint) and atomicity (faster access).  It says to class 
*users*, you can pick the combination of characteristics, allowed by the 
author, that meet your needs (can always choose null default if you want, just 
use a ref.)

There are many choices here about “what are the defaults”.  More opting in at 
the declaration site might mean less need to opt in at the use site.  Or not.

(We are now in the stage which I call “shake the box”; we’ve named all the 
moving parts, and now we’re looking for the lowest-energy state we can get them 
into.)
