We also know that in the future...

So, let's pull on this string, because now we're talking about the right thing -- what Java do we want to have in the future, even if we can only take one step now.  First, today.

But before that, a digression on terminology.  While the terminology is not nailed down (and please, start a separate thread if you want to comment on that), the word “value” is problematic, but it's hard to break the habit.  For purposes of this mail, a value is _any_ datum that can be stored in a variable: primitives, object references, and soon, instances of inline classes.  Similarly, the term “object reference” is problematic, because it is laden with overtones of identity.  So, for purposes of this mail:

 - value: any datum
 - inline class: what we used to call value classes
 - identity class: what we used to call classes
 - class: an identity or inline class
 - object instance: instance of a class, whether identity or inline
 - object reference: a reference to an instance of an identity class

A variable of type Object (or interface) may hold _either_ an object reference, or an instance of an inline class, or null (this is the confusing new thing). Note that all values are still passed by value: primitives, object references, and instances of inline classes. I’ll try to use these consistently, but I’ll likely fail.

Primitives* have a well-defined equivalence relation: do the two operands describe the exact same value (SAME==).  And it is super-useful.  And, it is really the only useful equivalence on primitives.  Conveniently, we have assigned this the operator `==`.  No one argues with this move.
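To make the NaN footnote concrete, here is a small sketch in today's Java. The class and method names are made up for illustration; the behavior shown is that of the existing `==` operator on `double` and of `Double.equals`, which disagree on exactly two cases: NaN and signed zero.

```java
// Illustration only: where == and equals() disagree for floating-point values today.
final class PrimitiveSameness {
    // For int, == is the only equivalence anyone needs.
    static boolean sameInt(int a, int b) {
        return a == b;
    }

    // == on double follows IEEE 754: NaN is not == to anything, itself included.
    static boolean nanSameByOperator() {
        double nan = Double.NaN;
        return nan == nan;
    }

    // Double.equals compares the doubleToLongBits representation, so NaN equals NaN.
    static boolean nanSameByBox() {
        return Double.valueOf(Double.NaN).equals(Double.valueOf(Double.NaN));
    }

    // IEEE 754 treats positive and negative zero as equal under ==.
    static boolean zerosSameByOperator() {
        return 0.0 == -0.0;
    }

    // Double.equals distinguishes them, because their bit patterns differ.
    static boolean zerosSameByBox() {
        return Double.valueOf(0.0).equals(Double.valueOf(-0.0));
    }

    public static void main(String[] args) {
        System.out.println(sameInt(5, 5));          // true
        System.out.println(nanSameByOperator());    // false
        System.out.println(nanSameByBox());         // true
        System.out.println(zerosSameByOperator());  // true
        System.out.println(zerosSameByBox());       // false
    }
}
```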

Where things get dodgy is that objects (which historically have always been described through object references) have TWO well-defined, and useful, equivalence relations:

 - Do the two operands refer to the same object instance (SAME==), denoted by Object==;
 - Are the two objects “equivalent” in the sense defined by their author, denoted by .equals().  Let’s call this “equivalence”.

Both are useful, so we can’t get rid of either.  Identity comparison has semantic uses (e.g., topology-aware code like IdentityHashMap, or comparing with sentinels in data structures).  It is also used as an optimization, a faster way to get to equality, and this optimization has unfortunately outlived its usefulness but not outlived its use.
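As a rough illustration of the two relations in today's Java, the sketch below puts two distinct but equal strings into a map and counts the resulting keys. `TwoRelations` and `distinctKeys` are hypothetical names; `HashMap` and `IdentityHashMap` are the standard library classes, the first keyed by equals()/hashCode() and the second by reference identity.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Illustration only: the same two keys, seen through equivalence and through identity.
final class TwoRelations {
    // Inserts two distinct String objects with the same contents and reports
    // how many keys the given map considers them to be.
    static int distinctKeys(Map<String, Integer> map) {
        String a = new String("key"); // new String(...) forces two separate objects
        String b = new String("key");
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        String a = new String("key");
        String b = new String("key");
        System.out.println(a == b);        // false: different object instances
        System.out.println(a.equals(b));   // true: equivalent as defined by String

        System.out.println(distinctKeys(new HashMap<>()));          // 1
        System.out.println(distinctKeys(new IdentityHashMap<>()));  // 2
    }
}
```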

Obviously equivalence is useful, and in most cases, the more generally useful of the two, but for better or worse, identity comparison got custody of the operator `==`. This might have been a questionable move, but it's what we've got, and we're surely not un-assigning this.

Taking primitives and objects together, despite the very visible seam between them, the == operator partially heals the seam by working across all types, and assigning a consistent meaning across all types: SAME== ("are you the exact same thing", where same-ness can incorporate identity.)  Some may feel this was a mistake or an accident of history, and it might have been, but the outcome has a sense to it: `==` has a consistent meaning (SAME==) over all data types.

The part that is uncomfortable is that what's been totalized is the less broadly useful equivalence.  We can be aware of this, and try to do better, but as I’ve observed before, wanting to fix mistakes of history often leads us into new, worse mistakes, so let’s not fixate on this.

I’ll note at this point (and come back to it later) that just as we have some control over what `==` means for inline instances, we _also_ have some control over what `.equals()` means for primitives.


OK, now we are adding inline classes to the mix.  Many of these, like Complex or Point, are like primitives -- they only have one sensible equality semantics -- do they represent the same number.  This is suitable for binding to ==, or .equals() — or better, both.

But there are also other values which are more complicated, because they contain potentially-but-not-necessarily-identityful data, like:

    inline class Holder { Object o; }

This is the conundrum of L-World.  (The irritating part is that these are the values we are spending all our time talking about, even though they will not be the most common ones.)

As with classic objects, for such classes, of the two equivalence relations ("exactly the same", or "semantically the same"), the former is generally the less useful.  And so, were we rewriting history, we might have bound the "good" syntax to .equals() here too, and relegated the less useful test to some other uglier API point or operator.  But again, let’s not let this distract us.
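Since inline classes cannot be written in today's Java, the following is only an emulation of Holder with an ordinary final class. It rests on two assumptions that are not settled by this mail: that "exactly the same" for an inline instance means its fields are pairwise `==`, and that the class author chooses to define equals() by delegating to the field's own equals(). `HolderSketch` and `same` are hypothetical names.

```java
import java.util.Objects;

// Hypothetical stand-in for `inline class Holder { Object o; }`.
// A real inline class would have no identity of its own; this ordinary class
// only models the two comparisons on its single field.
final class HolderSketch {
    final Object o;

    HolderSketch(Object o) {
        this.o = o;
    }

    // ASSUMPTION: "exactly the same" compares fields with ==, so the held
    // reference is compared by identity.
    static boolean same(HolderSketch a, HolderSketch b) {
        return a.o == b.o;
    }

    // ASSUMPTION: the author defines "semantically the same" via the field's equals().
    @Override
    public boolean equals(Object other) {
        return other instanceof HolderSketch
                && Objects.equals(o, ((HolderSketch) other).o);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(o);
    }

    public static void main(String[] args) {
        String shared = new String("x");
        HolderSketch h1 = new HolderSketch(shared);
        HolderSketch h2 = new HolderSketch(shared);
        HolderSketch h3 = new HolderSketch(new String("x"));

        System.out.println(same(h1, h2));   // true: both hold the same object
        System.out.println(same(h1, h3));   // false: equal contents, different objects
        System.out.println(h1.equals(h3));  // true: the held strings are equivalent
    }
}
```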

In the future, we’ll have primitives, identity objects, and inline objects, and we’d like not only to not have three things, but we’d like to not have two things.  So we’d like to have a total story for comparing them all.

Our story for primitives (but please, let’s not get too distracted on this now), is that primitives can be “boxed” to inline classes, which will be lighter-weight boxes than our current boxes.  And we can lift members and interfaces from the box to the primitives, so that (say) int can be seen to implement Comparable and Serializable, and have whatever methods the lightweight box has — such as equals().  Which means that equivalence interpretation can be totalized via Object::equals — primitives, identity objects, and inline objects can all have an equals() method.  And of course, for primitives, equals() and == will be the same* thing.

So, in the happy future, there will be a total operation that implements the desirable equality comparison.  (Which is important for specializable generic code, since this operation on a T must be available on all the types that can instantiate T.)
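A sketch of why generic code wants the total operation, written against today's generics (which only range over reference types, so the ints below are boxed). `TotalEquals` and `indexOf` are hypothetical names. The point is that the body relies on equals() alone, which every possible T provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustration only: generic code that depends on equivalence, not identity.
final class TotalEquals {
    // Returns the index of the first element equivalent to target, or -1.
    // Objects.equals handles null and otherwise calls equals() on the element.
    static <T> int indexOf(List<T> items, T target) {
        for (int i = 0; i < items.size(); i++) {
            if (Objects.equals(items.get(i), target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1000, 2000, 3000);
        System.out.println(indexOf(numbers, 2000));  // 1
        System.out.println(indexOf(numbers, 4000));  // -1

        List<String> words = Arrays.asList("a", "b");
        System.out.println(indexOf(words, new String("b")));  // 1
    }
}
```

Had the loop used `==` instead, the boxed 2000 passed as the target would typically be a different object from the one in the list, and the lookup could fail.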

Or, as you say:

 don't use ==, use equals.

I agree, but here’s the difference in the approaches: we don’t have to punish == to make it less desirable; we can raise equals() up and make it more desirable.

But we’re not done with val==.  For the same reason that id== is still useful, if overused, on references, it is useful on values that hold potential references too.  Yes, it is unfortunate that the weaker claimant (SAME==) got the good syntax.  But we still need a way to denote this operation, and it would be even worse (IMO, far worse) than the status quo to say “well, we write SAME== for identity objects one way, but a different way for inline objects, even though you can put both in an Object."  So even given the above, it _still_ seems like a sensible (if not forced) move to extend the current meaning of == — SAME== — to the new types.  Then everything is total, and everything is consistent:

  - == means “are the two operands the same value” (indistinguishable);
  - equals() means “are the two operands semantically equivalent”

and both are total, working on primitives, references, and inline instances alike.  (As mentioned earlier, we can also later — but absolutely not now — explore whether equals() merits a better syntax.)

Your agenda here (which I agree with) is to lessen the importance of ==.  Where I disagree is that we should do so by making == harder to use.  Instead, I think we should do so by making the better alternatives easier to use, and educating people about the changed object model and performance reality.

(I’m still not sure whether exposing V <: Object, rather than V convertible-to Object, sets the right user model here — but that’s a separate discussion.)



*Curse you, NaN.

So, if you want to make this case, start over, and convince people that Object== is the root problem here.
Object== is not the root of the problem. Object== becomes a problem once we have decided on lworld, where in the end every type is a subtype of Object, because this is what lworld is.

== was created with ad hoc polymorphism in mind (overload polymorphism is a better term, BTW). Let's say you are in Java 1.0 time: you have a strong rift between objects and primitive types, and no supertype in between them, so the way to be able to write polymorphic code is to use overloading, and you have println(Object)/println(int)/println(double) etc. But it's not enough, so in 1.1 you introduce the wrapper types, Integer, Double etc., because you cannot write reflection code without being able to see a primitive value as an Object.

Here, we are doing the opposite: since we have decided to use lworld, Object is the root of everything, indirect types obviously, inline types too. We also know that in the future, we don't want to stay in a 3 kinds of types world. So we have to retrofit primitive types to see them as inline types. By doing this, we are also saying that every type now has Object as its root type.

In this brave new world, val== makes little sense, because it's introducing a new overload in a world where you have subtyping polymorphism, so you don't need overload polymorphism anymore. For an indirect type, the way to test structural equality is to use equals(); if every type is a subtype of Object, the logical move for me is to say: use equals() everywhere and stop using ==. So having a useful val== or a useful Object== goes in the wrong direction; we should demote == and look to the future*.

Rémi

* and it's very intellectually satisfying to have a solution which means that our users will have fewer things to learn instead of more. I'm thrilled that there will be a time when my students will be able to use .equals on a primitive type.
