tl;dr: I find pretty much everything about this compelling. And it comes at a
good time, too, because now that we’ve figured out what we can deliver, we can
figure out the sensible stacking of the object model.
As a refresher, recall that we’ve been loosely organizing classes into buckets:
Bucket 1 — good old identity classes.
Bucket 2 — identity classes, minus the identity. This has some restrictions
(no representational polymorphism, no mutability), but a B2 class is still a
reference type. That means it can be null (nullity is a property of
references) and comes with all the existing guarantees of initialization safety
(no tearing). This is the obvious migration target for value-based classes,
and enables us to migrate things like Optional safely because we can preserve
all of the intended semantics, keep the L descriptors, keep the name, handle
nulls, etc. (As it turns out, we can get more flattening than you might think
out of these, even with nullity, but less than we’d ideally like. I’ll write
another mail about performance reality.)
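To make B2 concrete, here is a sketch; the “value class” spelling is
provisional and stands in for whatever we end up calling “identity-free”:

    value class Money {
        private final long amount;        // implicitly final: no mutability
        private final String currency;

        Money(long amount, String currency) {
            this.amount = amount;
            this.currency = currency;
        }
    }

    Money m = null;    // still legal: a B2 class is a reference type

Clients keep compiling against the L descriptor, which is exactly what makes
this a safe migration target.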
Bucket 3 — here’s where it gets a little fuzzier how we stack it. Bucket 3
drops reference-ness, or more precisely, gives you the option to drop
reference-ness. (And it is reference-ness that enables nullability, and
prevents tearing.) A B3 class has two types, a “val” and a “ref” type, which
have a relationship to each other that is not-coincidentally similar to
int/Integer.
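In placeholder syntax (the modifier and the .val/.ref spellings are both still
up for grabs):

    mumble value class Complex {
        private final double re, im;
        // ...
    }

    Complex.ref boxed = …;    // nullable, initialization-safe, like Integer
    Complex.val bare = …;     // flattenable, zero-default, tearable, like int

The ref projection behaves like a B2 class; the val projection is where the
extra flattening, and the extra risk, lives.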
I think we are all happy with Bucket 2; it has a single and understandable
difference from B1, with clear consequences. It supports migration and has
surprisingly good flattening *on the stack*, but doesn’t yet offer all the heap
flattening we might want. I have a hard time imagining this part of the design
isn’t “done”, modulo syntax.
I think we are all still bargaining with Bucket 3, because there is a certain
amount of wanting to have our cake and eat it too inherent in “codes like a
class, works like an int.” Who gets “custody of the good name” is part of it, but for
me, the main question is “how do we let people get more flattening without
fooling themselves into thinking that there aren’t additional concurrency risks
(tearing).”
But, let’s address Kevin’s arguments about who should get custody of the good
name.
That one class gives rise to two types is already weird, and creates
opportunity for people to think that one is the “real” type and one is the
“hanger-on.” Unfortunately, depending on which glasses you are wearing, the
relationship inverts. We see this with int and Integer. From a user
perspective, int is usually the real type, and Integer is this weird
compatibility shim. But when you look at their class literals, for example,
Integer.class is a fully functional class literal, with member lookup and
operational access, but int.class is the weird compatibility shim. The
int.class literal is only useful for reflecting over descriptors with primitive
types, but does none of the other things reflection does. This should be a
hint that there’s a custody battle brewing.
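You can see the asymmetry in current Java, no new features required:

    Integer.class.getMethods().length             // dozens of members; full reflection
    int.class.getMethods().length                 // 0; nothing to look up
    int.class.isPrimitive()                       // true
    String.class.getMethod("indexOf", int.class)  // the one job int.class has:
                                                  // matching primitive descriptors

Integer.class is a real mirror; int.class is mostly a token for descriptor
matching.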
In the future world, which of these declarations do we expect to see?
public final class Integer { … }
or
public mumble value class int { … }
The tension is apparent here too; I think most Java developers would hope that,
were we writing the world from scratch, we’d declare the latter, and then
do something to associate the compatibility shim with the real type. (Whatever
we do, we still need an Integer.class on our class path, because existing code
will want to load it.) This tension carries over into how we declare Complex;
are we declaring the “box”, or are we declaring the primitive?
Let’s state the opposing argument up front, because it was our starting point:
having to say “Complex.val” for 99% of the utterances of Complex would likely
be perceived as “boy those Java guys love their boilerplate” (call this the
“lol java” argument for short). But, since then, our understanding of how this
will all actually work has evolved, so it is appropriate to question whether
this argument still holds the weight we thought it did at the outset.
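For concreteness, here is the utterance tax being weighed, in entirely
hypothetical syntax:

    Complex.val c = a.add(b);
    Complex.val[] samples = new Complex.val[n];

versus, if the val projection gets the good name:

    Complex c = a.add(b);
    Complex[] samples = new Complex[n];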
> 1. The option with fewer hazards should usually be the default. Users won't
> opt themselves into extra safety, but they will sometimes opt out of it.
> Here, the value type is the one that has attendant risks -- risk of a bad
> default value, risk of a bad torn value. We want using `Foo.val` to *feel
> like* cracking open the shell of a `Foo` object and using its innards
> directly. But if it's spelled as plain `Foo` it won't "feel like" anything at
> all.
Let me state it more strongly: unboxed “primitives” are less safe. Despite all
the efforts from the brain trust, the computational physics still points us
towards “the default is zero, even if you don’t like that value” and “these
things can tear under race, even though they resemble immutable objects, which
don’t.” The insidious thing about tearing is that it is only exhibited in
subtly broken programs. The “subtly” part is the really bad part. So we have
four broad options:
- neuter primitives so they are always as safe as we might naively hope, which
will result in either less performance or a worse programming model;
- keep a strong programming model, but allow users to trade away some safety
(which non-broken programs won’t suffer for) via an explicit declaration-site
and/or use-site opt-in (“.val”);
- same, but try to educate users about the risk of tearing under data race
(good luck);
- decide the tradeoff is impossible, and keep the status quo.
The previous stake in the ground was #3; you are arguing towards #2.
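To spell out the hazard that #2 asks users to opt into (hypothetical syntax;
note that the failure mode requires a data race, i.e., an already-broken
program):

    mumble value class Pair {
        final long lo, hi;                // intended invariant: lo == hi
        Pair(long v) { lo = v; hi = v; }
    }

    Pair.val shared;                      // 128 bits; racy access may not be atomic

    // Thread A: shared = new Pair(1);
    // Thread B: shared = new Pair(2);
    // A racing reader may observe lo == 1, hi == 2: a Pair that no thread
    // ever wrote. A B2 reference never tears, because the reference itself
    // is always loaded atomically.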
> 2. In the current plan a `Foo.ref` should be a well-behaved bucket 2 object.
> But it sure looks like that `.ref` is specifically telling it NOT to be --
> like it's saying "no, VM, *don't* optimize this to be a value even if you
> can!" That's of course not what we mean. With the change I'm proposing,
> `Foo.val` does make sense: it's just saying "hey runtime, while you already
> *might* have represented this as a value, now I'm demanding that you
> *definitely* do". That's a normal kind of a thing to do.
A key aspect of this is the bike shed tint; .val is not really the right
indicator given that the reference type is also a “value class”. I think
we’re comfortable giving the “value” name to the whole family of identity-free
classes, which means that .val needs a new name. Bonus points if the name
connotes “having burst free of the constraints of reference-hood” (unbound,
loose, exploded, compound value, etc.), and doubly so if it is also pretty short.
> 3. This change would permit compatible migration of an id-less to primitive
> class. It's a no-op, and use sites are free to migrate to the value type if
> and when ready. And if they already expose the type in their API, they are
> free to weigh the costs/benefits of foisting an incompatible change onto
> *their* users. They have facilities like method deprecation to do it with. In
> the current plan, this all seems impossible; you would have to fix all your
> problematic call sites *atomically* with migrating the class.
This is one of my favorite aspects of this direction. If you recall, you were
skeptical from the outset about migrating classes in place at all; the previous
stake in the ground said “well, they can migrate to value classes, but will
never be able to shed their null footprint or get ultimate flattening.” With
this, we can migrate easily from VBC to B2 with no change in client code, and
then _further_ have a crack at migrating to full flatness inside the
implementation capsule. That’s sweet.
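Sketching the path (spellings provisional; LocalDate is just a stand-in for
any value-based class):

    // Today: a value-based class, identity present but disavowed
    public final class LocalDate { … }

    // Step 1: migrate in place to B2; clients keep the L descriptors,
    // the nulls, and the name, with no client-code change required
    public value class LocalDate { … }

    // Step 2, if and when ready: use the flattened projection at chosen
    // sites inside the implementation capsule
    LocalDate.val date = …;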
> 4. It's much (much) easier on the mental model because *every (id-less) class
> works in the exact same way*. Some just *also* give you something extra,
> that's all. This pulls no rugs out from under anyone, which is very very good.
>
> 5. The two kinds of types have always been easily distinguishable to date.
> The current plan would change that. But they have important differences
> (nullability vs. the default value chief among them) just as Long and long
> do, and users will need to distinguish them. For example you can spot the
> redundant check easily in `Foo.val foo = ...; requireNonNull(foo);`.
It is really nice that *any* unadorned identifier is immediately recognizable
as being a reference, with all that entails — initialization safety and
nullity. The “mental database” burden is lower, because Foo is always a
reference, and Foo.whatever is always direct/immediate/flat/whatever.
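In other words, the rule is mechanical (hypothetical syntax again):

    Foo f = null;        // fine: a bare name always denotes a reference type
    Foo.val v = null;    // compile-time error: the val projection has no null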
> 6. It's very nice when the *new syntax* corresponds directly to the *new
> thing*. That is, until a casual developer *sees* `.val` for the first time,
> they won't have to worry about it.
>
> 7. John seemed to like my last fruit analogy, so consider these two
> equivalent fruit stand signs:
>
> a) "for $1, get one apple OR one orange . . . with every orange purchased you
> must also take a free apple"
> b) "apples $1 . . . optional free orange with each purchase"
>
> Enough said I think :-)
>
> 8. The predefined primitives would need less magic. `int` simply acts like a
> type alias for `Integer.val`, simple as that. This actually shows that the
> whole feature will be easier to learn because it works very nearly how people
> already know primitives to work. Contrast with: we hack it so that what would
> normally be called `Integer` gets called `int` and what normally gets called
> `Integer.ref` or maybe `int.ref` gets called `Integer` ... that is much
> stranger.
One more: the .getClass() anomaly goes away.
If we have
mumble primitive mumble Complex { … }
Complex.val c = …
then what do we get when we ask c for its getClass? The physics again point us
at returning Complex.ref.class, not Complex.val.class, but under the old
scheme, where the val projection gets the good name, it would seem anomalous,
since we ask a val for its class and get the ref mirror. But under the Kevin
interpretation, we can say “well, the CLASS is Complex, so if you ask
getClass(), you get Complex.class.”
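In code form (spellings still hypothetical):

    Complex.val c = …;
    c.getClass() == Complex.class    // true, and unsurprising

Under the old scheme, where Complex denoted the val projection, the same call
would have answered Complex.ref.class, which is exactly the anomaly that goes
away.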