Great summary of the options.

For those who didn't read the whole thing:
 - CE is bitwise equality -- "are these two things identical copies"
 - OE is calling Object.equals()
 - NE (for values) is the synthetic "recurse with == on primitive components, NE on value components, and OE on reference components"
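Today's Java has no value types, so the sketch below approximates the three relations with an immutable final class of double fields. The helpers ce, ne and oe are hypothetical names, and the equals implementation (which mirrors Double.equals) is an assumption. Floating-point components are where the relations visibly come apart: 0.0 == -0.0 is true even though the bit patterns differ, while NaN == NaN is false even though two copies of Double.NaN have identical bits.

```java
import java.util.Objects;

// Sketch only: Java has no value types, so an immutable final class with
// primitive fields stands in for a value. ce/ne/oe are hypothetical helpers.
public class EqualityKinds {
    static final class Point {
        final double x, y;

        Point(double x, double y) { this.x = x; this.y = y; }

        // OE: whatever the class author chose. Here it mirrors Double.equals,
        // which compares via Double.compare (NaN equals NaN, 0.0 != -0.0).
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
        }

        @Override
        public int hashCode() { return Objects.hash(x, y); }
    }

    // CE: bitwise equality of the components ("identical copies").
    static boolean ce(Point a, Point b) {
        return Double.doubleToRawLongBits(a.x) == Double.doubleToRawLongBits(b.x)
            && Double.doubleToRawLongBits(a.y) == Double.doubleToRawLongBits(b.y);
    }

    // NE: == on each primitive component. A value component would recurse
    // with ne and a reference component would use equals; Point has neither.
    static boolean ne(Point a, Point b) {
        return a.x == b.x && a.y == b.y;
    }

    // OE: Object.equals.
    static boolean oe(Point a, Point b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Point zero = new Point(0.0, 0.0);
        Point negZero = new Point(-0.0, 0.0);
        Point nan1 = new Point(Double.NaN, 0.0);
        Point nan2 = new Point(Double.NaN, 0.0);

        // 0.0 == -0.0 is true, but the bit patterns differ.
        System.out.println(ce(zero, negZero) + " " + ne(zero, negZero) + " " + oe(zero, negZero));
        // NaN == NaN is false, but the bit patterns match.
        System.out.println(ce(nan1, nan2) + " " + ne(nan1, nan2) + " " + oe(nan1, nan2));
    }
}
```

Run as written, the first line prints "false true false" and the second "true false true", so no two of the three relations agree on both inputs.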

If it were 1995, and we were inventing Java (and we didn't have our heads addled with an interpreter-based cost model), what would we do? I think we'd bind ==(ref,ref) to OE, with an (uglier-named) API point for CE (e.g., Objects.isSameReference) which would be used (a) for known-interned things, (b) for IdentityHashMap, (c) as a default implementation of Object.equals(), and (d) possibly as a short-circuiting optimization *inside* overrides of equals().

This hypothetical world (call it J') still gives users the choice of CE vs OE whenever they want, while nudging users towards OE (by giving it the prime syntactic real estate) which is probably what they want most of the time.
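To make the J' thought experiment concrete, here is a rough sketch in today's Java. Both static methods are hypothetical stand-ins, not JDK API: isSameReference plays the uglier-named CE API point, and eq plays what ==(ref,ref) would mean if it were bound to OE. The class exercises uses (b) and (d) from the list above. Use (c) needs no code, because java.lang.Object.equals already defaults to reference comparison.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of the hypothetical J' world in today's Java. Neither method below
// exists in the JDK: isSameReference is the CE API point from the thought
// experiment, and eq is what `a == b` on references would mean in J'.
public class JPrimeDemo {
    // CE on references (today's ==). Object.equals already defaults to this.
    static boolean isSameReference(Object a, Object b) {
        return a == b;
    }

    // OE: the J' meaning of ==(ref,ref), with null handled explicitly.
    static boolean eq(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    static final class Name {
        final String value;

        Name(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            if (isSameReference(this, o)) return true; // (d) short-circuit via CE
            if (!(o instanceof Name)) return false;
            return eq(value, ((Name) o).value);        // OE on the component
        }

        @Override
        public int hashCode() { return Objects.hashCode(value); }
    }

    public static void main(String[] args) {
        Name a = new Name("duke");
        Name b = new Name("duke");
        System.out.println(eq(a, b));              // true: OE
        System.out.println(isSameReference(a, b)); // false: two distinct objects

        // (b) IdentityHashMap compares keys by reference (CE), so
        // equal-but-distinct keys remain separate entries.
        Map<Name, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, 1);
        byIdentity.put(b, 2);
        System.out.println(byIdentity.size());     // 2
    }
}
```

The point of the sketch is only the naming: the common case (OE) gets the short spelling, and CE stays available to the code that really needs it.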

Why didn't we do this in 1995? Hard to know (I'll ask James next time I see him), but I'd posit two main forces:

- C bias. Since C has *only* CE (and it was desirable to make Java feel like "a safer C") it probably seemed like a big improvement already to offer programmers both CE and OE on all references, and binding == to OE probably seemed too radical at the time.

- Cost-model bias. In the Java 1.0 days, pointer comparison was probably 100x faster in the interpreter than a virtual call to Object.equals(). If binding == to OE was even considered, it was probably deemed implausible.

Of course, both of these feel a bit silly 20 years later, but here we are. So, in a J' world, what would we do with ==(val,val)? I think it would be a no-brainer -- bind it to NE, since Java developers would already associate == with a deeper comparison. Then we'd just have to adjust whatever the API point for CE is to also accommodate CE on values, and we'd be done.

But we don't live in the J' world, so our choices become:

P1: Bind ==(val,val) to CE, as we do with refs. This creates optimization challenges for the usual (a==b || a.equals(b)) idiom [1], but the rules work the same for values and refs.

P2: Bind ==(val,val) to NE. This is J' world for values and J world for refs. (With even bigger optimization challenges for the (a==b || a.equals(b)) idiom.) The rules differ for values and refs, meaning (a) users will have to keep in mind which world they're in, (b) when migrating a class from ref to value, they'll have to find and update all equality comparisons (!), (c) code that's generic over values and refs will have to use an idiom that works on both, and (d) when migrating code from ref-generic to any-generic, they'll have to inspect every equality comparison to make sure it's still what was intended.

P3: Add a new equality operator. I've already been laughed at enough, thank you.

P4: Ban ==(val,val). This might be fine in value-only code, but it complicates writing generic code, especially migrating generic code.
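For point (c) of P2, this is roughly what equality looks like in generic code today: a CE short-circuit followed by OE, which is also exactly what java.util.Objects.equals does. The sketch is illustrative only; indexOf and indexOfViaObjects are hypothetical helpers, and the comments about value instantiations describe the proposals above rather than anything that compiles today.

```java
import java.util.List;
import java.util.Objects;

// Sketch of equality in generic code today: CE first, then OE.
public class GenericEquality {
    // Hand-written idiom. Under P1, == keeps meaning CE for any T, so this line
    // means the same thing for reference and (future) value instantiations.
    // Under P2, the first test would be NE for values but CE for refs.
    static <T> int indexOf(List<T> list, T target) {
        for (int i = 0; i < list.size(); i++) {
            T e = list.get(i);
            if (e == target || (e != null && e.equals(target))) return i;
        }
        return -1;
    }

    // The same search via the library method, which encapsulates the idiom.
    static <T> int indexOfViaObjects(List<T> list, T target) {
        for (int i = 0; i < list.size(); i++) {
            if (Objects.equals(list.get(i), target)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // new String("c") is a distinct object, so it is found via equals, not ==.
        List<String> names = List.of("a", "b", new String("c"));
        System.out.println(indexOf(names, "c"));
        System.out.println(indexOfViaObjects(names, "c"));
    }
}
```

Both calls print 2. Routing every comparison through one method like Objects.equals is one way generic code could stay correct whichever binding is chosen for ==.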


[1] John points out that if == is CE, then (a==b||a.equals(b)) will redundantly load the fields on a failed ==. But many equals implementations start with "a==b" as a short-circuiting optimization, which means "a==b" will be a common (pure) subexpression in the resulting expansion (and for values, methods are monomorphic and will get inlined more frequently), so the two checks can be collapsed.
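The footnote's argument can be traced by hand. In this sketch a small reference class stands in for a value (for a value under P1, a == b would itself compare fields, which is where the redundant loads come from), and collapsed is a hypothetical hand-inlined version of the idiom, not actual JIT output. It assumes equals is monomorphic, as it would be for a value, so that it can be inlined.

```java
// Hand-traced sketch of footnote [1]; not what any particular JIT emits.
public class IdiomExpansion {
    static final class Point {
        final int x, y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;          // the usual short-circuit
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;         // field loads
        }

        @Override
        public int hashCode() { return 31 * x + y; }
    }

    // The idiom as written in source (a is assumed non-null).
    static boolean idiom(Point a, Point b) {
        return a == b || a.equals(b);
    }

    // Hand-inlined form of idiom(). Inlining equals() repeats the pure test
    // a == b; on this path the outer test already failed, so the repeat is
    // known false and can be dropped, leaving one comparison of the fields.
    static boolean collapsed(Point a, Point b) {
        if (a == b) return true;
        // (inlined "if (this == o) return true;" removed: known false here)
        if (b == null) return false;             // instanceof on a final class
        return a.x == b.x && a.y == b.y;
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point[] others = { p, new Point(1, 2), new Point(3, 4), null };
        for (Point q : others) {
            System.out.println(idiom(p, q) + " " + collapsed(p, q));
        }
    }
}
```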


Going back to op==, there are two plausible options for binding it to
new types:

(P1) Syntax of op==(val,val) and op==(any,any) binds to CE as with
op==(ref,ref).  Therefore, NE is uniformly reached by today's idiom,
which traverses value fields twice.

(P2) Syntax of op==(val,val) and op==(any,any) is direct access to
NE.  CE is reachable by experts at System.isEqualCopy.  The old idiom
for NE also works, but calls equals twice.

(P3) Same as P1, op== is uniform access to CE.  New op (spelled
"===", ".==", "=~", etc.) is uniform, optimizable access to NE,
attracting users away from legacy idiom for NE.
