Re: Updated SoV, take 3

Brian Goetz Thu, 28 Jul 2022 11:24:36 -0700


    Java currently has eight built-in primitive types. Primitives
    represent pure
    _values_; any `int` value of "3" is equivalent to, and
    indistinguishable from,
    any other `int` value of "3".  Because primitives are "just their
    bits" with no
    ancillary state such as object identity, they are _freely
    copyable_; whether
    there is one copy of the `int` value "3", or millions, doesn't
    matter to the
    execution of the program.  With the exception of the unusual
    treatment of exotic
    floating point values such as `NaN`, the `==` operator on
    primitives performs a
    _substitutability test_ -- it asks "are these two values the same
    value".

I've said this before, but I think both "substitutability" and "sameness" just lead to more questions, and I'm not sure why we don't appeal to distinguishability instead.

Fair. Substitutability is neither a commonly understood concept, nor is it an official term in the spec, so happy to change this to something more intuitive. That said, I'm not sure why you're down on "sameness"?

    Java also has _objects_, and each object has a unique _object
    identity_.  This
    means that each object must live in exactly one place (at any
    given time), and
    this has consequences for how the JVM lays out objects in memory. 
    Objects in
    Java are not manipulated or accessed directly, but instead through
    _object
    references_.  Object references are also a kind of value -- they
    encode the
    identity of the object to which they refer,
Do we really want to invoke identity here? That surprises me. That suggests that a `ValueClass.ref` instance will have identity too. Isn't it really only about the object being addressable or locatable (some term like that)?


Will adjust; this is more of an implementation detail anyway.


    This says that a `Point` is a class whose instances have no
    identity.  As a
    consequence, it must give up the things that depend on identity;
    the class and
    its fields are implicitly final.  Additionally, operations that
    depended on
    identity must either be adjusted (`==` on value objects compares
    state, not
    identity) or disallowed (it is illegal to lock on a value object.)

Just for broad understandability, you might want to address here "but then how could a reference 'identify' what object it's pointing to?"

Indeed, this is a tricky new concept; a reference to a thing that is not necessarily unique, but for which we can't distinguish between copies.

    Value classes can still have most of the affordances of classes --
    fields,
    methods, constructors, type parameters, superclasses (with some
    restrictions),
    nested classes, class literals, interfaces, etc.  The classes they
    can extend
    are restricted: `Object` or abstract classes with no instance
    fields, empty
    no-arg constructor bodies, no other constructors, no instance
    initializers, no
    synchronized methods, and whose superclasses all meet this same set of
    conditions.  (`Number` is an example of such an abstract class.)

    Because `Point` has value semantics, `==` compares by state rather
    than
    identity.  This means that value objects, like primitives, are _freely
    copyable_; we can explode them into their fields and re-aggregate
    them into
    another value object, and we cannot tell the difference.
It feels like if this wants to rest some stuff on "comparing by state" it ought to explain here what that means? Or, I guess at least a forward reference. It seems pretty important to understand that it means shallow fieldwise delegation back to `==` again, meaning that fields of identity types are still identity-compared. In many contexts "value semantics" and "comparing by state" tend to only make sense if done recursively/deeply.

It's worse than that, because references to value objects get a deeper comparison than refs to identity objects. I'll stay away from shallow/deep, but talk about fieldwise equivalence.
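
To make "fieldwise equivalence" a bit more concrete, here is a rough emulation in today's Java, not Valhalla syntax. The class `Labeled` and the helper `sameState` are hypothetical names invented for this sketch. `sameState` applies `==` to each field in turn, which is the shallow reading described above, so a field of identity type is still identity-compared.

```java
// Emulation only: an ordinary identity class standing in for a value class.
final class Labeled {
    final int x;
    final StringBuilder label; // a field of identity type

    Labeled(int x, StringBuilder label) {
        this.x = x;
        this.label = label;
    }

    // Hypothetical helper: compare state by applying == to each field.
    static boolean sameState(Labeled a, Labeled b) {
        return a.x == b.x && a.label == b.label;
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("p");
        Labeled a = new Labeled(1, shared);
        Labeled b = new Labeled(1, shared);
        Labeled c = new Labeled(1, new StringBuilder("p"));

        System.out.println(sameState(a, b)); // true: same int, same label object
        System.out.println(sameState(a, c)); // false: equal text, different label objects
    }
}
```

The sketch does not model the "deeper comparison" case: if a field were itself a value object, `==` on that field would presumably compare its fields in turn rather than compare identity.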

    ### Migration

    The JDK (as well as other libraries) has many [value-based
    classes][valuebased]
    such as `Optional` and `LocalDateTime`.  Value-based classes
    adhere to the
    semantic restrictions of value classes, but are still identity
    classes -- even
    though they don't want to be.  Value-based classes can be migrated
    to true value
    classes simply by redeclaring them as value classes, which is both
    source- and
    binary-compatible.
This gave me a slight "huh, then what's the catch?" reaction. It might make more sense by adding the fact right away that any errant usages (that don't adhere to the VBC requirements) will start failing at runtime, and might cause compilation warnings?


The catch is twofold:

 - Clients that depend on that accidental identity despite the warning signs are in for a surprise (hello, Integer);

 - The ref companion gets the good name, which will surely annoy people

The former should be viewed as an anti-catch, but not everyone will see it that way. The latter will surely be spun as "why do you guys hate your users." For which we'll tell them it was Kevin's idea.
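
For a sense of what "accidental identity" looks like in today's Java, here is a small sketch using `Integer`. The class name `AccidentalIdentity` and its helpers are made up for illustration. `Integer.valueOf` is documented to cache values from -128 to 127, so `==` on boxes happens to work in that range; outside it, the result depends on the JVM's cache settings, so the sketch prints it without promising an answer.

```java
final class AccidentalIdentity {
    // Identity comparison of two separately obtained boxes for the same int.
    static boolean sameBox(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    // State comparison of the same two boxes.
    static boolean equalBox(int v) {
        return Integer.valueOf(v).equals(Integer.valueOf(v));
    }

    public static void main(String[] args) {
        System.out.println(sameBox(127));   // true: inside the guaranteed cache range
        System.out.println(sameBox(1000));  // unspecified: depends on the box cache
        System.out.println(equalBox(1000)); // true: equals compares the int value
    }
}
```

Code that relies on the second line coming out one way or the other is the kind of client the first bullet has in mind.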

    We plan to migrate many value-based classes in the JDK to value
    classes.
    Additionally, the primitive wrappers can be migrated to value
    classes as well,
    making the conversion between `int` and `Integer` cheaper; see
    "Migrating the
    legacy primitives" below.  (In some cases, this may be _behaviorally_
    incompatible for code that synchronizes on the primitive
    wrappers.  [JEP
    390][jep390] has supported both compile-time and runtime warnings for
    synchronizing on primitive wrappers since Java 16.)
Putting this in parens under the topic of the primitive wrappers feels like "pulling a fast one". Like it's pretending that this incompatibility problem is somehow unique to those 8 classes, hoping people won't notice "wait a minute, *any* class hopeful of future migration would have the same desire to opt into such warnings in advance." (And for more than just synchronization.) I get that there is no current plan to solve that problem, but we could be more up-front about that?

I think it is just these eight classes, since in Java 8, we wrote this into the definition of value-based class (but couldn't back-apply that definition to these eight.) But I can drop the parens if that helps :)

    Value classes are generalizations of primitives.  Since primitives
    have a
    reference companion type, value classes actually give rise to
    _pairs_ of types:
    a value type and a reference type.  We've seen the reference type
    already; for
    the value class `ArrayCursor`, the reference type is called
    `ArrayCursor`, just
    as with identity classes.  The full name for the reference type is
    `ArrayCursor.ref`; `ArrayCursor` is just a convenient alias for
    that.  (This
    aliasing is what allows value-based classes to be compatibly
    migrated to value
    classes.)
It's more than just that: it's what unifies all classes together! They all define a reference type, always with the same name as the class. That's nice, unchanging solid ground under our feet while all the Valhalla shifts are going on.
It would make more sense to me if `ArrayCursor.ref` were the alias to `ArrayCursor`, and it would be appropriate for the reader to wonder "why do we even need that alias?".

Yes, and the answer is "we almost don't", except for type variables (T.ref).

    The value type is called `ArrayCursor.val`, and the two types have the
    same conversions between them as primitives do today with their
    boxes.  The
    default value of the value type is the one for which all fields
    take on their
    default value; the default value of the reference type is, like
    all reference
    types, null.  We will refer to the value type of a value class as
    the _value
    companion type_.
... because it acts as a companion to the reference type you've always known. (At least, *I* still really don't want people to think that both the value type and the reference types are "companions" to the class that defined them.)

I am thinking they are companions to each other; we can be more explicit about this.

    Both the reference and value companion types have the same members.
Maybe worth acknowledging "(even those, like `wait()` inherited from `Object`, that don't make sense and will fail at runtime, for simplicity's sake)".

It is not clear how pedantic to be here. Do they have the same members, or are the members all on the ref type, and we just provide a convenient syntax / fast implementations for vals as receivers? The latter is closer to reality, but does that explanation help?

I think it is worth acknowledging that this does lead to `5.toString()` becoming valid and functioning code, which happens just for consistency and not because it was a goal in itself.

OK. Another good thing that happens here is that we can write equals() methods uniformly:


    return o instanceof Foo f &&
        i.equals(f.i) && name.equals(f.name);

and not have to worry about "is this a ref or a primitive". Just use equals everywhere.
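
For comparison, the same `equals()` body already compiles in today's Java if the numeric field is declared as the box type. The sketch below assumes a hypothetical class `Foo` with fields `i` and `name`, taken from the fragment above; the constructor, the null checks, and `hashCode` are additions for the sake of a complete example.

```java
import java.util.Objects;

final class Foo {
    private final Integer i;   // today this has to be the box for i.equals(...) to compile
    private final String name;

    Foo(Integer i, String name) {
        this.i = Objects.requireNonNull(i);
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        // Same shape as the fragment above: equals on every field.
        return o instanceof Foo f &&
            i.equals(f.i) && name.equals(f.name);
    }

    @Override
    public int hashCode() {
        return 31 * i.hashCode() + name.hashCode();
    }

    public static void main(String[] args) {
        Foo a = new Foo(1, "x");
        Foo b = new Foo(1, "x");
        Foo c = new Foo(2, "x");
        System.out.println(a.equals(b)); // true
        System.out.println(a.equals(c)); // false
    }
}
```

With a field declared as `int`, today's compiler rejects `i.equals(f.i)`; the point above is that this distinction would no longer need to be made.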



    Arrays of reference types are _covariant_; this means that if `A
    <: B`, then
    `A[] <: B[]`.  This allows `Object[]` to be the "top array type"
    -- but only for
    arrays of references.  Arrays of primitives are currently left out
    of this
    story.   We unify the treatment of arrays by defining array
    covariance over the
    new "extends" relationship; if A _extends_ B, then `A[] <: B[]`. 
    This means
    that for a value class P, `P.val[] <: P.ref[] <: Object[]`; when
    we migrate the
    primitive types to be value classes, then `Object[]` is finally
    the top type for
    all arrays.  (When the built-in primitives are migrated to value
    classes, this
    means `int[] <: Integer[] <: Object[]` too.)

I think it's worth addressing that this does mean there will be `Integer[]` and `Object[]` instances that can't store null, failing at runtime, but that this is consistent with the existing quirks of array covariance.


Yep, same ASE
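
As a reminder of the existing quirk being referred to, here is how array covariance already fails at run time in today's Java with `ArrayStoreException`. The class `CovariantStore` and the helper `storeRejected` are hypothetical names for this sketch. The null-rejecting arrays discussed above do not exist yet, so the last line only shows current behavior.

```java
final class CovariantStore {
    // Returns true if storing `value` through the Object[] view is rejected at run time.
    static boolean storeRejected(Object[] view, Object value) {
        try {
            view[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Integer[] ints = new Integer[1];
        Object[] view = ints; // legal because Integer[] <: Object[]

        System.out.println(storeRejected(view, 42));          // false: boxed to Integer
        System.out.println(storeRejected(view, "forty-two")); // true: a String is not an Integer
        System.out.println(storeRejected(view, null));        // false: today any reference array accepts null
    }
}
```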

    The base implementation of `Object::equals` delegates to `==`,
    which is a
    suitable default for both reference and value classes.
This is where you could appeal to the idea that `==` has always meant "strictly indistinguishable by any means" and this preserves that meaning (modulo float/double weirdness).

Yep
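
The "float/double weirdness" can be seen in today's Java. In the sketch below, `FloatQuirks` and its helpers are made-up names. It contrasts `==` on primitive `double` values with `Double.equals`, which compares bit patterns via `doubleToLongBits`.

```java
final class FloatQuirks {
    // == on primitive doubles, following IEEE 754 comparison rules.
    static boolean primitiveSame(double a, double b) {
        return a == b;
    }

    // equals on the boxes, which compares the bit patterns.
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveSame(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));   // true
        System.out.println(primitiveSame(0.0, -0.0));              // true
        System.out.println(boxedEquals(0.0, -0.0));                // false
        // 0.0 and -0.0 compare equal with == yet can be told apart:
        System.out.println(1.0 / 0.0);  // Infinity
        System.out.println(1.0 / -0.0); // -Infinity
    }
}
```

So for `double`, `==` is neither strictly "indistinguishable" (the zeros) nor reflexive (NaN), which is why the statement needs the "modulo" qualifier.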

    ### Serialization

    If a value class implements `Serializable`, this is also really a
    statement
    about the reference type.  Just as with other aspects described here,
    serialization of value companions can be defined by converting to the
    corresponding reference type and serializing that, and reversing
    the process at
    deserialization time.
It's nonobvious to me why the reference type is being elevated as the primary one here, except that of course a method like `writeObject` is only going to be fed the reference type. I would have expected just that serializability applies equally to both types in the same way, much like invoking some method on both types.

It's a lot like members; we can define them to be the same on both, or we can define them to live on the ref. A lot of things are simpler with the latter, but it's not clear readers of this doc need to understand all that.

    The built-in primitives reflect the design assumption that zero is
    a reasonable
    default.  The choice to use a zero default for uninitialized
    variables was one
    of the central tradeoffs in the design of the built-in
    primitives.  It gives us
    a usable initial value (most of the time), and requires less
    storage footprint
    than a representation that supports null (`int` uses all 2^32 of
    its bit
    patterns, so a nullable `int` would have to either make some 32
    bit signed
    integers unrepresentable, or use a 33rd bit).  This was a
    reasonable tradeoff
    for the built-in primitives, and is also a reasonable tradeoff for
    many other
    potential value classes (such as complex numbers, 2D points,
    half-floats, etc).
You might not want to go into the following. But I hope that users will understand that the numeric types really do clear a pretty high bar here. They are fortunate that for the *two* most popular reduction operations over those types, zero happens to be the correct identity for one of them, and absolutely destructive to the other (i.e., making it at least easy to detect the bug). If not for *both* of those facts we would have more and worse bugs in the world.

Yeah, it's not obvious how much algebra is helpful here. I mostly want to make the point that zero wasn't chosen at random; it's the default you actually want, and if you got null, you probably wouldn't like it as much. Agree about the high bar; Jan 1 1970 doesn't clear that bar.
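
A small sketch of both halves of that argument in today's Java. `ZeroDefaults` and its helpers are hypothetical names. An array element that was never assigned stays at zero, which leaves a sum unchanged and wipes out a product; for a date, the zero representation is simply day 0 of the epoch.

```java
import java.time.LocalDate;
import java.util.stream.IntStream;

final class ZeroDefaults {
    static int sum(int[] xs) {
        return IntStream.of(xs).sum();
    }

    static int product(int[] xs) {
        return IntStream.of(xs).reduce(1, (a, b) -> a * b);
    }

    // What a date stored as "days since the epoch" means when left at zero.
    static String zeroDate() {
        return LocalDate.ofEpochDay(0).toString();
    }

    public static void main(String[] args) {
        int[] xs = new int[3]; // every element starts at 0
        xs[0] = 2;
        xs[1] = 3;             // xs[2] is never assigned

        System.out.println(sum(xs));     // 5: the stray zero is harmless
        System.out.println(product(xs)); // 0: the stray zero is hard to miss
        System.out.println(zeroDate());  // 1970-01-01
    }
}
```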


    But for other potential value classes, such as `LocalDate`, there
    simply _is_ no
    reasonable default.  If we choose to represent a date as the
    number of days
    since some epoch, there will invariably be bugs that stem from
    uninitialized dates; we've all been mistakenly told by computers
    that something
    that never happened actually happened on or near 1 January 1970. 
    Even if we
    could choose a default other than the zero representation as a
    default, an
    uninitialized date is still likely to be an error -- there simply
    is no good
    default date value.

    For this reason, value classes have the choice of _encapsulating_
    their value
    companion type.  If the class is willing to tolerate an
    uninitialized (zero)
    value, it can freely share its `.val` companion with the world; if
    uninitialized
    values are dangerous (such as for `LocalDate`), the value
    companion can be
    encapsulated to the class or package, and clients can use the
    reference
    companion.  Encapsulation is accomplished using ordinary access
    control.  By
    default, the value companion is `private` to the value class (it
    need not be
    declared explicitly); a class that wishes to share its value
    companion more
    broadly can do so by declaring it explicitly:

    ```
    public value record Complex(double real, double imag) {
        public value companion Complex.val;
    }
    ```

I think you should add that the name `Complex.val` can't be changed here, much like you can't change the name of a constructor even though it *looks* like you could.

I keep hoping that we'll come up with a brilliant replacement for X.val before that....
