tl;dr: I find pretty much everything about this compelling. And it comes at a
good time, too, because now that we’ve figured out what we can deliver, we can
figure out the sensible stacking of the object model.
As a refresher, recall that we’ve been loosely organizing classes into buckets:
Bucket 1 — good old identity classes.
Bucket 2 — identity classes, minus the identity. This has some restrictions
(no representational polymorphism, no mutability), but a B2 class is still a
reference type. That means it can be null (nullity is a property of
references) and comes with all the existing guarantees of initialization safety
(no tearing). This is the obvious migration target for value-based classes,
and enables us to migrate things like Optional safely because we can preserve
all of the intended semantics, keep the L descriptors, keep the name, handle
nulls, etc. (As it turns out, we can get more flattening than you might think
out of these, even with nullity, but less than we’d ideally like. I’ll write
another mail about performance reality.)
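To make B2 concrete, here is a sketch; the “value class” spelling is
provisional and stands in for whatever we end up calling “identity-free”:

    value class Money {
        private final long amount;        // implicitly final: no mutability
        private final String currency;

        Money(long amount, String currency) {
            this.amount = amount;
            this.currency = currency;
        }
    }

    Money m = null;    // still legal: a B2 class is a reference type

Clients keep compiling against the L descriptor, which is exactly what makes
this a safe migration target.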
Bucket 3 — here’s where it gets a little fuzzier how we stack it. Bucket 3
drops reference-ness, or more precisely, gives you the option to drop
reference-ness. (And it is reference-ness that enables nullability, and
prevents tearing.) A B3 class has two types, a “val” and a “ref” type, which
have a relationship to each other that is not-coincidentally similar to
int/Integer.
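In placeholder syntax (the modifier and the .val/.ref spellings are both still
up for grabs):

    mumble value class Complex {
        private final double re, im;
        // ...
    }

    Complex.ref boxed = …;    // nullable, initialization-safe, like Integer
    Complex.val bare = …;     // flattenable, zero-default, tearable, like int

The ref projection behaves like a B2 class; the val projection is where the
extra flattening, and the extra risk, lives.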
I think we are all happy with Bucket 2; it has a single and understandable
difference from B1, with clear consequences. It supports migration and has
surprisingly good flattening *on the stack*, but doesn’t yet offer all the heap
flattening we might want. I have a hard time imagining this part of the design
isn’t “done”, modulo syntax.
I think we are all still bargaining with Bucket 3, because there is a certain
amount of wanting to have our cake and eat it too inherent in “codes like a
class, works like an int.” Who gets “custody of the good name” is part of it, but for
me, the main question is “how do we let people get more flattening without
fooling themselves into thinking that there aren’t additional concurrency risks
(tearing).”
But, let’s address Kevin’s arguments about who should get custody of the good
name.
That one class gives rise to two types is already weird, and creates
opportunity for people to think that one is the “real” type and one is the
“hanger-on.” Unfortunately, depending on which glasses you are wearing, the
relationship inverts. We see this with int and Integer. From a user
perspective, int is usually the real type, and Integer is this weird
compatibility shim. But when you look at their class literals, for example,
Integer.class is a fully functional class literal, with member lookup and
operational access, but int.class is the weird compatibility shim. The
int.class literal is only useful for reflecting over descriptors with primitive
types, but does none of the other things reflection does. This should be a
hint that there’s a custody battle brewing.
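You can see the asymmetry in current Java, no new features required:

    Integer.class.getMethods().length             // dozens of members; full reflection
    int.class.getMethods().length                 // 0; nothing to look up
    int.class.isPrimitive()                       // true
    String.class.getMethod("indexOf", int.class)  // the one job int.class has:
                                                  // matching primitive descriptors

Integer.class is a real mirror; int.class is mostly a token for descriptor
matching.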
In the future world, which of these declarations do we expect to see?
public final class Integer { … }
or
public mumble value class int { … }
The tension is apparent here too; I think most Java developers would hope that,
were we writing the world from scratch, we’d declare the latter, and then
do something to associate the compatibility shim with the real type. (Whatever
we do, we still need an Integer.class on our class path, because existing code
will want to load it.) This tension carries over into how we declare Complex;
are we declaring the “box”, or are we declaring the primitive?
Let’s state the opposing argument up front, because it was our starting point:
having to say “Complex.val” for 99% of the utterances of Complex would likely
be perceived as “boy those Java guys love their boilerplate” (call this the
“lol java” argument for short). But, since then, our understanding of how this
will all actually work has evolved, so it is appropriate to question whether
this argument still holds the weight we thought it did at the outset.
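For concreteness, here is the utterance tax being weighed, in entirely
hypothetical syntax:

    Complex.val c = a.add(b);
    Complex.val[] samples = new Complex.val[n];

versus, if the val projection gets the good name:

    Complex c = a.add(b);
    Complex[] samples = new Complex[n];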
> 1. The option with fewer hazards should usually be the default. Users won't
> opt themselves into extra safety, but they will sometimes opt out of it.
> Here, the value type is the one that has attendant risks -- risk of a bad
> default value, risk of a bad torn value. We want using `Foo.val` to *feel
> like* cracking open the shell of a `Foo` object and using its innards
> directly. But if it's spelled as plain `Foo` it won't "feel like" anything at
> all.
Let me state it more strongly: unboxed “primitives” are less safe. Despite all
the efforts from the brain trust, the computational physics still points us
towards “the default is zero, even if you don’t like that value” and “these
things can tear under race, even though they resemble immutable objects, which
don’t.” The insidious thing about tearing is that it is only exhibited in
subtly broken programs. The “subtly” part is the really bad part. So we have
four broad options:
- neuter primitives so they are always as safe as we might naively hope, which
will result in either less performance or a worse programming model;
- keep a strong programming model, but allow users to trade away some safety
(which non-broken programs won’t suffer for) via an explicit declaration-site
and/or use-site opt-in (“.val”);
- same, but try to educate users about the risk of tearing under data race
(good luck);
- decide the tradeoff is impossible, and keep the status quo.
The previous stake in the ground was #3; you are arguing towards #2.
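To spell out the hazard that #2 asks users to opt into (hypothetical syntax;
note that the failure mode requires a data race, i.e., an already-broken
program):

    mumble value class Pair {
        final long lo, hi;                // intended invariant: lo == hi
        Pair(long v) { lo = v; hi = v; }
    }

    Pair.val shared;                      // 128 bits; racy access may not be atomic

    // Thread A: shared = new Pair(1);
    // Thread B: shared = new Pair(2);
    // A racing reader may observe lo == 1, hi == 2: a Pair that no thread
    // ever wrote. A B2 reference never tears, because the reference itself
    // is always loaded atomically.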
> 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object.
> But it sure looks like that `.ref` is specifically telling it NOT to be --
> like it's saying "no, VM, *don't* optimize this to be a value even if you
> can!" That's of course not what we mean. With the change I'm proposing,
> `Foo.val` does make sense: it's just saying "hey runtime, while you already
> *might* have represented this as a value, now I'm demanding that you
> *definitely* do". That's a normal kind of a thing to do.
A key aspect of this is the bike shed tint; .val is not really the right
indicator given that the reference type is also a “value class”. I think
we’re comfortable giving the “value” name to the whole family of identity-free
classes, which means that .val needs a new name. Bonus points if the name
connotes “having burst free of the constraints of reference-hood” (unbound,
loose, exploded, compound value, etc.), and doubly so if it is also pretty short.
> 3. This change would permit compatible migration of an id-less to primitive
> class. It's a no-op, and use sites are free to migrate to the value type if
> and when ready. And if they already expose the type in their API, they are
> free to weigh the costs/benefits of foisting an incompatible change onto
> *their* users. They have facilities like method deprecation to do it with. In
> the current plan, this all seems impossible; you would have to fix all your
> problematic call sites *atomically* with migrating the class.
This is one of my favorite aspects of this direction. If you recall, you were
skeptical from the outset about migrating classes in place at all; the previous
stake in the ground said “well, they can migrate to value classes, but will
never be able to shed their null footprint or get ultimate flattening.” With
this, we can migrate easily from VBC to B2 with no change in client code, and
then _further_ have a crack at migrating to full flatness inside the
implementation capsule. That’s sweet.
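Sketching the path (spellings provisional; LocalDate is just a stand-in for
any value-based class):

    // Today: a value-based class, identity present but disavowed
    public final class LocalDate { … }

    // Step 1: migrate in place to B2; clients keep the L descriptors,
    // the nulls, and the name, with no client-code change required
    public value class LocalDate { … }

    // Step 2, if and when ready: use the flattened projection at chosen
    // sites inside the implementation capsule
    LocalDate.val date = …;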
> 4. It's much (much) easier on the mental model because *every (id-less) class
> works in the exact same way*. Some just *also* give you something extra,
> that's all. This pulls no rugs out from under anyone, which is very very good.
>
> 5. The two kinds of types have always been easily distinguishable to date.
> The current plan would change that. But they have important differences
> (nullability vs. the default value chief among them) just as Long and long
> do, and users will need to distinguish them. For example you can spot the
> redundant check easily in `Foo.val foo = ...; requireNonNull(foo);`.
It is really nice that *any* unadorned identifier is immediately recognizable
as being a reference, with all that entails — initialization safety and
nullity. The “mental database” burden is lower, because Foo is always a
reference, and Foo.whatever is always direct/immediate/flat/whatever.
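In other words, the rule is mechanical (hypothetical syntax again):

    Foo f = null;        // fine: a bare name always denotes a reference type
    Foo.val v = null;    // compile-time error: the val projection has no null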
> 6. It's very nice when the *new syntax* corresponds directly to the *new
> thing*. That is, until a casual developer *sees* `.val` for the first time,
> they won't have to worry about it.
>
> 7. John seemed to like my last fruit analogy, so consider these two
> equivalent fruit stand signs:
>
> a) "for $1, get one apple OR one orange . . . with every orange purchased you
> must also take a free apple"
> b) "apples $1 . . . optional free orange with each purchase"
>
> Enough said I think :-)
>
> 8. The predefined primitives would need less magic. `int` simply acts like a
> type alias for `Integer.val`, simple as that. This actually shows that the
> whole feature will be easier to learn because it works very nearly how people
> already know primitives to work. Contrast with: we hack it so that what would
> normally be called `Integer` gets called `int` and what normally gets called
> `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much
> stranger.
One more: the .getClass() anomaly goes away.
If we have
mumble primitive mumble Complex { … }
Complex.val c = …
then what do we get when we ask c for its getClass? The physics again point us
at returning Complex.ref.class, not Complex.val.class, but under the old
scheme, where the val projection gets the good name, it would seem anomalous,
since we ask a val for its class and get the ref mirror. But under the Kevin
interpretation, we can say “well, the CLASS is Complex, so if you ask
getClass(), you get Complex.class.”
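In code form (spellings still hypothetical):

    Complex.val c = …;
    c.getClass() == Complex.class    // true, and unsurprising

Under the old scheme, where Complex denoted the val projection, the same call
would have answered Complex.ref.class, which is exactly the anomaly that goes
away.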