Re: Equality for values -- new analysis, same conclusion

Brian Goetz Tue, 20 Aug 2019 10:14:47 -0700

We also know that in the future...

So, let's pull on this string, because now we're talking about the right thing -- what Java do we want to have in the future, even if we can only take one step now. First, today.

But before that, a digression on terminology. While the terminology is not nailed down (and please, start a separate thread if you want to comment on that), the word “value” is problematic, but it's hard to break the habit. For purposes of this mail, a value is _any_ datum that can be stored in a variable: primitives, object references, and soon, instances of inline classes. Similarly, the term “object reference” is problematic, because it is laden with overtones of identity. So, for purposes of this mail:


 - value: any datum
 - inline class: what we used to call value classes
 - identity class: what we used to call classes
 - class: an identity or inline class
 - object instance: instance of a class, whether identity or inline
 - object reference: a reference to an instance of an identity class

A variable of type Object (or interface) may hold _either_ an object reference, or an instance of an inline class, or null (this is the confusing new thing). Note that all values are still passed by value: primitives, object references, and instances of inline classes. I’ll try to use these consistently, but I’ll likely fail.

Primitives* have a well-defined equivalence relation: do the two operands describe the exact same value (SAME==). And it is super-useful. And, it is really the only useful equivalence on primitives. Conveniently, we have assigned this the operator `==`. No one argues with this move.
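The asterisk on "Primitives" refers to the NaN footnote at the end of this mail. As a small sketch in today's Java (the class and method names here are illustrative, not from the original mail), this shows the cases where `==` on `double` and `equals()` on the current `Double` box disagree:

```java
// Sketch: where == on double primitives and equals() on today's Double box differ.
// NaNDemo, primitiveEq and boxedEquals are illustrative names only.
final class NaNDemo {
    // == on double follows IEEE 754 comparison: NaN is not equal to itself.
    static boolean primitiveEq(double a, double b) {
        return a == b;
    }

    // Double.equals compares the bit patterns given by Double.doubleToLongBits.
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN)); // true
        System.out.println(primitiveEq(0.0, -0.0));              // true
        System.out.println(boxedEquals(0.0, -0.0));              // false
    }
}
```

So for floating point, "the exact same value" and `==` are already not quite the same question.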

Where things get dodgy is that objects (which historically have always been described through object references) have TWO well-defined, and useful, equivalence relations:

 - Do the two operands refer to the same object instance (SAME==), denoted by Object==;
 - Are the two objects “equivalent” in the sense defined by their author, denoted by .equals(). Let’s call this “equivalence”.

Both are useful, so we can’t get rid of either. Identity comparison has semantic uses (e.g., topology-aware code like IdentityHashMap, or comparing with sentinels in data structures). It is also used as an optimization, a faster way to get to equality, and this optimization has unfortunately outlived its usefulness but not outlived its use.

Obviously equivalence is useful, and in most cases, the more generally useful of the two, but for better or worse, identity comparison got custody of the operator `==`. This might have been a questionable move, but it's what we've got, and we're surely not un-assigning this.
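The two relations on objects are easy to see side by side in current Java. A minimal sketch (the class and helper names are made up for illustration): two distinct `String` instances with the same contents are equivalent but not identical, and `HashMap` and `IdentityHashMap` count them accordingly.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: identity (==) versus equivalence (equals) on identity objects.
// IdentityVsEquals and distinctKeys are illustrative names only.
final class IdentityVsEquals {
    // Inserts both keys and reports how many distinct keys the map sees.
    static int distinctKeys(Map<String, Integer> map, String k1, String k2) {
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }

    public static void main(String[] args) {
        String a = new String("key"); // two separate instances,
        String b = new String("key"); // same character contents
        System.out.println(a == b);      // false: different identities
        System.out.println(a.equals(b)); // true: equivalent per String.equals
        System.out.println(distinctKeys(new HashMap<>(), a, b));         // 1
        System.out.println(distinctKeys(new IdentityHashMap<>(), a, b)); // 2
    }
}
```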

Taking primitives and objects together, despite the very visible seam between them, the == operator partially heals the seam by working across all types, and assigning a consistent meaning across all types: SAME== ("are you the exact same thing", where same-ness can incorporate identity.) Some may feel this was a mistake or an accident of history, and it might have been, but the outcome has a sense to it: `==` has a consistent meaning (SAME==) over all data types.

The part that is uncomfortable is that what's been totalized is the less broadly useful equivalence. We can be aware of this, and try to do better, but as I’ve observed before, wanting to fix mistakes of history often leads us into new, worse mistakes, so let’s not fixate on this.

I’ll note at this point (and come back to it later) that just as we have some control over what `==` means for inline instances, we _also_ have some control over what `.equals()` means for primitives.

OK, now we are adding inline classes to the mix. Many of these, like Complex or Point, are like primitives -- they only have one sensible equality semantics -- do they represent the same number. This is suitable for binding to ==, or .equals() -- or better, both.

But there are also other values which are more complicated, because they contain potentially-but-not-necessarily-identityful data, like:


    inline class Holder { Object o; }

This is the conundrum of L-World. (The irritating part is that these are the values we are spending all our time talking about, even though they will not be the most common ones.)

As with classic objects, for such classes, of the two equivalence relations ("exactly the same", or "semantically the same"), the former is generally the less useful. And so, were we rewriting history, we might have bound the "good" syntax to .equals() here too, and relegated the less useful test to some other, uglier API point or operator. But again, let’s not let this distract us.
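To see the two relations on something shaped like Holder, here is a rough emulation in current Java. It is only a sketch under stated assumptions: inline classes are not available in released Java, so an ordinary final class stands in for `inline class Holder`, and `sameValue` is a hypothetical helper that models SAME== by comparing the field with `==`.

```java
import java.util.Objects;

// Sketch only: a final identity class standing in for `inline class Holder { Object o; }`.
// HolderSketch and sameValue are hypothetical names, not part of any proposal.
final class HolderSketch {
    final Object o;

    HolderSketch(Object o) {
        this.o = o;
    }

    // Models "exactly the same": the single field is compared with ==,
    // so a reference held in the field is compared by identity.
    static boolean sameValue(HolderSketch x, HolderSketch y) {
        return x.o == y.o;
    }

    // Models "semantically the same": the field is compared with equals().
    @Override
    public boolean equals(Object other) {
        return other instanceof HolderSketch
                && Objects.equals(o, ((HolderSketch) other).o);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(o);
    }

    public static void main(String[] args) {
        String s1 = new String("x");
        String s2 = new String("x");
        HolderSketch h1 = new HolderSketch(s1);
        HolderSketch h2 = new HolderSketch(s1);
        HolderSketch h3 = new HolderSketch(s2);
        System.out.println(sameValue(h1, h2)); // true: same referent in the field
        System.out.println(sameValue(h1, h3)); // false: equal but distinct referents
        System.out.println(h1.equals(h3));     // true: contents are equivalent
    }
}
```

The two answers differ only when the field holds distinct but equivalent identity objects, which is exactly the Holder case.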

In the future, we’ll have primitives, identity objects, and inline objects, and we’d like not only to not have three things, but we’d like to not have two things. So we’d like to have a total story for comparing them all.

Our story for primitives (but please, let’s not get too distracted on this now), is that primitives can be “boxed” to inline classes, which will be lighter-weight boxes than our current boxes. And we can lift members and interfaces from the box to the primitives, so that (say) int can be seen to implement Comparable and Serializable, and have whatever methods the lightweight box has -- such as equals(). Which means that equivalence interpretation can be totalized via Object::equals -- primitives, identity objects, and inline objects can all have an equals() method. And of course, for primitives, equals() and == will be the same* thing.

So, in the happy future, there will be a total operation that implements the desirable equality comparison. (Which is important for specializable generic code, since this operation on a T must be available on all the types that can instantiate T.)
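Today's erased generics already hint at why generic code wants a total equals(). In the sketch below (names are illustrative, and current boxes stand in for the lighter-weight ones described above), the generic method only relies on an operation that every possible T supports:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Sketch: generic code that depends only on equals(), which every T provides.
// TotalEquals and indexOf are illustrative names only.
final class TotalEquals {
    // Returns the index of the first element equivalent to target, or -1.
    static <T> int indexOf(List<T> items, T target) {
        for (int i = 0; i < items.size(); i++) {
            // Objects.equals is null-safe and delegates to T's own equals().
            if (Objects.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1000, 2000, 3000);
        List<String> words = Arrays.asList("a", "b");
        System.out.println(indexOf(numbers, Integer.valueOf(2000))); // 1
        System.out.println(indexOf(words, new String("b")));         // 1
        System.out.println(indexOf(words, "z"));                     // -1
    }
}
```

The same body written with `==` on T would compare box identities and could miss equivalent elements.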


Or, as you say:

 don't use ==, use equals.

I agree, but here’s the difference in the approaches: we don’t have to punish == to make it less desirable; we can raise equals() up and make it more desirable.

But we’re not done with val==. For the same reason that id== is still useful, if overused, on references, it is useful on values that hold potential references too. Yes, it is unfortunate that the weaker claimant (SAME==) got the good syntax. But we still need a way to denote this operation, and it would be even worse (IMO, far worse) than the status quo to say “well, we write SAME== for identity objects one way, but a different way for inline objects, even though you can put both in an Object." So even given the above, it _still_ seems like a sensible (if not forced) move to extend the current meaning of == -- SAME== -- to the new types. Then everything is total, and everything is consistent:


  - == means “are the two operands the same value" (indistinguishable);
  - equals() means “are the two operands semantically equivalent”

and both are total, working on primitives, references, and inline instances alike. (As mentioned earlier, we can also later -- but absolutely not now -- explore whether equals() merits a better syntax.)

Your agenda here (which I agree with) is to lessen the importance of ==. Where I disagree is that we should do so by making == harder to use. Instead, I think we should do so by making the better alternatives easier to use, and educating people about the changed object model and performance reality.

(I’m still not sure whether exposing V <: Object, rather than V convertible-to Object, sets the right user model here -- but that’s a separate discussion.)




*Curse you, NaN.

So, if you want to make this case, start over, and convince people that Object== is the root problem here.

Object== is not the root of the problem. Object== becomes a problem once we have decided on lworld, where in the end every type is a subtype of Object, because this is what lworld is. == was created with ad hoc polymorphism in mind (overload polymorphism is a better term, BTW). Say you are in Java 1.0 time: you have a strong rift between objects and primitive types, and no supertype in between them, so the way to be able to write polymorphic code is to use overloading, and you have println(Object)/println(int)/println(double) etc. But it's not enough, so in 1.1 you introduce the wrapper types, Integer, Double etc., because you cannot write reflection code without being able to see a primitive value as an Object.

Here, we are doing the opposite: since we have decided to use lworld, Object is the root of everything, indirect types obviously, inline types too. We also know that in the future, we don't want to stay in a 3-kinds-of-types world. So we have to retrofit primitive types to see them as inline types. By doing this, we are also saying that every type now has Object as its root type.

In this brave new world, val== makes little sense, because it's introducing a new overload in a world where you have subtyping polymorphism, so you don't need overload polymorphism anymore. For an indirect type, the way to test structural equality is to use equals(); if every type is a subtype of Object, the logical move for me is to say, use equals() everywhere and stop using ==. So having a useful val== or a useful Object== goes in the wrong direction; we should demote == and look to the future*.

Rémi

* and it's very intellectually satisfying to have a solution which means that our users will have fewer things to learn instead of more. I'm thrilled that there will be a time when my students will be able to use .equals on a primitive type.
