On May 16, 2016, at 11:20 AM, Brian Goetz <[email protected]> wrote:
> 
> We have two choices in front of us regarding equality on value types:
> 
> 1.  What does `==` on values do?  Choices include a bitwise comparison, a 
> componentwise comparision (identical to bitwise unless there are float 
> members), or invoking the `equals()` method.

I.e., what is the semantic binding of the op== syntax?

> 2.  What is the default for the `equals()` method?  The obvious choice is a 
> componentwise comparision, but Kevin has (strongly) suggested we consider a 
> deep comparison, calling .equals() on any reference members.  (Note that the 
> two choices collapse to the same thing when there are no reference members.)

I.e., what are the global expectations of the .equals method, and how can we 
embody them in a suitable default?

> For values whose members are value-ish objects themselves (e.g., String, 
> Optional, LocalDateTime), the deep-equals is obviously appealing -- a tuple 
> of (int, date) makes total sense to compare using .equals() on the date.  On 
> the other hand, for values whose members are references to mutable things 
> (like lists), it's not as obvious what the right default is.

I can't vote for "deep" vs. "shallow", or any syntax binding, without setting 
up a bunch of semantic background.  So here goes…

Let's start with semantics with no syntax.  I would like to single out two 
especially useful equivalence relations, to be called normal equals ("NE") and 
copy-wise equals ("CE").

("Normal" is a shameless play for pre-eminence of a particular theory.  But we 
have to define a normal, because we have to say what we expect of .equals.  We 
might have to tweak my notion of "normal", but here it is.)

For exposition, we need also names for legacy comparison operations for 
primitives and references, to be called primitive equals ("PE") and object 
equals ("OE").

CE is the most refined possible equality.  If two values (reference or 
primitive) are CE, then you can *never* execute code that detects a difference 
between them.  If two values are CE then each is fully substitutable for the 
other, in all contexts.  This is sometimes a useful property to observe in 
generic code.

  CE(a,b) := "a is indistinguishable from an assignment-copy of b"

Implementation note:  CE is op== today, except for floats.  For floats, CE must 
consult raw bits; cf. Double.doubleToRawLongBits.  CE for values has to be a 
component-wise distribution of CE, or else it wouldn't be CE.  But after this 
point we get less certain.

In most cases, CE is exactly the legacy op==, but not for floats.  So let's 
define a slightly coarser equivalence relation for primitive equality:

  PE(a,b) := legacy(a==b) || (floating(a) && CE(a,b))

Here the op== is the legacy primitive equality appropriate to both a and b.  
(Note: These formulas will assume that a and b are either both refs, both of 
the same primitive type, or (in some cases) both of the same value type.)

Implementation note:  Both PE(+0.0,-0.0) and PE(Float.NaN, Float.NaN) are true, 
while CE(+0.0,-0.0) is true but Float.NaN==Float.NaN is false.  These are the 
only differences, so we can treat PE as a delta from CE:

  PE(a,b) == CE(a,b) || (floating(a) && legacy(a==0 && b==0)).

Also, since the box types Float and Double use CE internally, we can also boxed 
comparison to obtain CE:

  PE(a,b) == legacy(a==b) || ((Float)a).equals((Float)b)

Implementation note:  By appealing to boxes for a semantic notion like PE, we 
*do not* require implementations to perform boxing.  We simply require that any 
computation of the semantics produces the same result *as if* the formula were 
directly evaluated, boxes and all.

So PE is neither CE nor the legacy op==, but is an equivalence relation that 
closely approximates both.  It is presented here to show the interactions of 
general equality with primitive equality, but (spoiler alert) it probably won't 
make the final cut.

OE is simply an appeal to the Object.equals method, guarded first by CE.

  OE(a,b) := CE(a,b) || (reference(a) && legacy(a != null && a.equals(b)))

Implementation note:  The CE guard takes care of null==null.  For references, 
this relation is embodied in Objects.equals(a,b).

Implementation note:  For primitive wrappers, OE is compatible with CE (but not 
PE or op==) on the wrapped primitive value.  This is true even for floats (the 
.equals methods consult the raw bits).  Therefore we can use these relations, 
if we want:

  CE(a,b) == OE((Object)a,(Object)b)  where  primitive(a)
  PE(a,b) == legacy(a==b) || (primitive(a) && OE((Object)a,(Object)b))

For some objects, OE is the same as CE; this includes arrays.  For objects 
which favor structural equality, including most collection classes, OE recurses 
into substructure.

One might say that two objects which are OE but not CE are not fully 
substitutable, but "structurally substitutable".  The Arrays.equals method 
family offers an alternative form of structural substitutability by treating 
arrays as lists.  Structural substitutability is a time-varying property, if an 
object or any subpart is mutable.  Still, many objects are never-mutated, so 
structural substitutability can be viewed as time-invariant in many cases 
(though this is a fruitful source of bugs and TOCTOU exploits).

Now it's time to define "normal", as a concept that works equally well with 
both objects and primitives, and can extend to the new value types.  Here is a 
symmetric possibility:

  NE'(a,b) := PE(a,b) || OE(a,b)

In other words, a "normal equality" test compares primitives with op== (except 
it uses CE for NaNs), and compares objects with .equals.

We can also rewrite this formula to omit the non-legacy notion of PE, and 
appeal to boxes instead:

  NE'(a,b) := legacy(a==b) || OE((Object)a,(Object)b)

For reference types, this is obviously our old friend Objects.equals in another 
guise.  For primitive types it is something new, but perhaps desirable.  (So 
I'm putting it on the table, though I don't prefer it.)  It is a "naively 
generified" version of "a == b || a != null && a.equals(b)".  Since it includes 
PE, it compares +0.0 as equal to -0.0.

But I prefer a simpler version of NE:

  NE(a,b) := CE(a,b) || (reference(a) && OE(a,b))

This version drops PE from the equation.  Float zeroes are not respected; 
primitives comparisons are uniformly bitwise (CE).

Because primitive boxes perform CE for .equals, we can alternatively formulate 
NE in terms of boxes:

  NE(a,b) := OE((Object)a,(Object)b)

This finally takes me to why I want to privilege this comparison with the name 
"normal":  It is in use every nanosecond already, in our collection types.  
When you auto-box a primitive into a collection, it participates in the form of 
NE.

A third candidate for NE would just use legacy comparisons:

  NE''(a,b) := legacy(a == b) || (reference(a) && OE(a,b))

But this is not an equivalence relation, in the case of floating point NaNs.  
Nor is it compatible with current computations involving boxed NaNs (which *do* 
compare as equal if they are CE).  I believe these reasons, combined, take NE'' 
out of the running.

Still staying in semantics, let's consider extending these relations to values. 
 As noted above, CE *must* be "componentwise CE", unless we have a way to 
guarantee that some field will *never* be detectible by user code, in which 
case we should just drop the field.

PE *could* be extended to values componentwise also.  This would give a 
slightly weakened version of CE for values which contain floats.  I don't think 
that move carries its weight.

To refer to the equals method for values, we can make a simple partial 
definition:

  VE(a,b) := a.equals(b)  where value(a) and typeof(a)==typeof(b)

Alternatively, OE is easily extended to values, either by boxing, or directly:

  OE'(a,b) := OE(a,b) || (value(a) && OE((Object)a, (Object)b))
  OE'(a,b) := CE(a,b) || (!primitive(a) && legacy(a != null && a.equals(b)))

(To make notation easier, we can assume a!=null is identically true for any 
value a.)

Those two definitions will remain in alignment as long as .equals behavior 
commutes with boxing a value to Object.

Then we get an compatibly extended version of NE (with no prime this time) as:

  NE(a,b) := CE(a,b) || OE(a,b) || VE(a,b)

We can fold together the equals calls instead:

  NE(a,b) := (!primitive(a) && OE'(a,b))

Or, recognizing that PE is not in play, simply fold everything into boxes:

  NE(a,b) := OE((Object)a,(Object)b)

By dropping CE from the formula, we are relying on the fact that all .equals 
methods are reflexive, even for values.

Any of these formulas are plausible starting points for efficient 
implementations of NE.

An NE with values is a highly plausible extension for places in our system 
which execute the logic of Objects.equals.

For places in our system which execute non-CE primitive equality predicates 
(such as op== or PE), NE is an approximate replacement, and may need special 
handling.  But the differences are small, affecting only to the treatment of 
NaN dis-equality and signed-zero equality.  And today's normative treatment 
(after boxing) is simply CE, which is compatible with NE for boxed primitives.

For the record, we could also define array-traversing predicates which coarsen 
OE, such as:

  AE(a,b) := /*OE(a,b) ||*/ Arrays.deepEquals(new Object[]{a},new Object[]{b})
  AE'(a,b) := OE(a,b) || (array(statictypeof(a)) && array(statictypeof(b)) && 
Arrays.equals(a,b))
  AE''(a,b) := OE(a,b) || (array(statictypeof(a)) && array(statictypeof(b)) && 
Arrays.deepEquals(a,b))

I hope that is helpful to others besides me.

Now, as for syntax and bindings.

CE is obviously useful; when you need it you shouldn't have to approximate it.  
So it needs its own API point, something like System.isEqualCopy.  One reason 
for this, rather than rely on a legacy operator, is to give programmers a 
definitive way to avoid binding to the floating op== which has a surprise 
inside.  Generic code, if it specializes to floating op==, will break 
substitutability in just that one place; surely this will be unwelcome 
sometimes.  Even if generic code uses CE uniformly for op==, System.isEqualCopy 
can have a pedagogical function, as well as being handy in numeric code.

PE is not obviously useful, so unless it is incorporated in NE (as NE') it does 
not need an API point.  It can be coded (in numeric code) as a==b || CE(a,b).

It is plausible to bind op== to CE almost everywhere.  There is one caveat 
(besides floating point):  For values, there will be a performance cost, 
because many value comparisons will *not* welcome a pre-pass with CE as an 
optimization to VE.  In other words, if op== is bound to CE, the NE formula for 
values will be the usual:

  v==w || v.equals(w)

This is nice and symmetric with current practice, *but* it is slow.  The big 
user-written pattern puts a burden on the JIT.  The JIT *should* be able to 
optimize away the various field references in "v==w", by recognizing that the 
same fields are being compared in the .equals method.  *But* if the equals 
method does anything other than CE to those fields (e.g., NE or AE or a 
user-defined check), then the JIT might not be able to infer that the original 
field comparison is subsumed by the later test.  There may be control flow in 
the way which the JIT does not understand.  Thus, "v==w||v.equals(w)" may 
optimize only partially, including machine code artifacts like "v.f==w.f; …; 
v.f.equals(w.f)".

Finally, even if the JIT is really smart, users are even smarter, and they 
endlessly produce new variations on the theme of a==b||a.equals(b).  The two 
comparisons, which we want to merge into each other, can be separated by 
arbitrary user code.  This is a classic problem with verbose languages:  The 
compiler has to recognize any code shape the user throws at it.

For this reason one might prefer that op== for values (and any-types) be bound 
to NE, not CE.  (This is P2 below.)  An underlying assumption here would be 
that CE is much less useful than NE, so NE should have preferential access.

Data is mixed on whether NE is more important than CE.  Object reference 
identity would seem to be the wrong question to answer in many cases, yet some 
codes use CE to detect full substitutability.  Some uses of op== are errors, 
where the user mean to reach for NE but forgot how to spell it (or was too 
lazy?).  Sometimes CE seems like an expert-only tool, a sharp needle to be kept 
on a high shelf.  Deciding the true utility of CE requires corpus studies.

Considered by itself, NE is very useful; we call it Objects.equals(a,b) 
nowadays, so it has an API point waiting for it already.

What do today's classes do when implementing .equals?  This is an empirical 
question, but to me it seems that they typically consult most or all of their 
fields, comparing each field value with the field value in the comparand.  The 
tendency is to use recursive structural comparison on fields.  In short, it is 
rare to find CE applied to fields in an .equals method, and common to find NE 
applied.  (Sometimes an AE or another thing is used.)

(Note:  When we say that code executes NE or CE, we really mean that it 
executes other code whose result is equivalent to NE or CE, as restricted to 
the types and inputs actually present.)

If a class or value models a *cursor* into some mutable state, it may be 
undesirable to traverse the state.  A cursor comparison operation should use CE 
on the state reference, while is can use NE (or op==) on other parts such as 
indexes.  Java does not have many such data types, but the option must exist 
for some occasions, to use CE instead of NE for a field comparison.  (C++ STL 
has many such types.  Project Panama defines proxies for C pointers which 
behave like this.)

It is simple to use NE to specify a useful default .equals method for values:  
Simply distribute NE across the fields, in the same way that CE *must* 
distribute itself across the fields.

If a value type wants to use floating op== on a floating point field, it can 
specify this "manually".  Likewise for AE (arrays treated structurally).  The 
manual specialization can be carried out either by hand-writing the .equals 
method, or else giving the field a type for which NE delegates to the correct 
equality computation.  As Peter Levart points out, this field type can itself 
be a value type which wraps the "real" field value, and such a value type is 
probably of negligible cost.

So it appears that having NE-across-components be the default equals method is 
customary and reasonable.  Given wrapper types for tweaked fields, it is even 
(arguably) universal.

Going back to op==, there are two plausible options for binding it to new types:

(P1) Syntax of op==(val,val) and op==(any,any) binds to CE as with 
op==(ref,ref).  Therefore, NE is uniformly reached by today's idiom, which 
traverses value fields twice.

(P2) Syntax of op==(val,val) and op==(any,any) is direct access to NE.  CE is 
reachable by experts at System.isEqualCopy.  The old idiom for NE works also 
calls equals twice.

(P3) Same as P1, op== is uniform access to CE.  New op (spelled "===", ".==", 
"=~", etc.) is uniform, optimizable access to NE, attracting users away from 
legacy idiom for NE.

In all cases, floats have irregularities on NaNs and zeroes.  Basically, 
generic code needs to access CE, and wants to use op== for it, but 
floating-point code has a slightly different meaning for op==.  You can't have 
two meanings for one operator, so something has to give.  I recommend 
segregating the floating-point meaning of op== to monomorphic code, so 
polymorphic code can be reasoned about in terms of the clean CE.

In P1 and P2, the legacy comparison idiom (a==b || a.equals(b)) reaches NE, but 
also iterates over value fields twice, with unpredictable optimization results. 
 In P2, both passes are done by V.equals, so if there is a guaranteed that 
V.equals is idempotent, it can be directly optimized.  If P1, if there is a 
guarantee that op== is subsumed by .equals, the leading op== can be optimized.  
In any case, P2 allows code to be improved to reach NE by a single expression 
a==b, allowing full optimization in all cases.

With P1, we can use op== for CE substitutability checks (except for 
non-specialized floats), and use Objects.equals for NE.  There is no need to 
directly call .equals, but users will do this because it is easier than calling 
Objects.equals.

With P2, we can use op== as abbreviation for Objects.equals, and use, 
System.isEqualCopy for substitutability checks.  Users will be attracted away 
from direct calls to .equals.

With P3, we can use op== for CE as in P1, and the new op for NE.  Users will be 
attracted away from direct calls to .equals.

Sadly, legacy use of op==(ref,ref) makes P2 difficult to adopt.  We could make 
an agreement, as with float, that op==(ref,ref) means one thing in legacy code, 
while op==(any,any) means something different (and better) in any-generic code. 
 Note that legacy code includes generics, in which op== is CE not NE.

There are good arguments that CE is so useful that we *should* allocate an 
operator to it, anyway.  But if this argument is accepted, it provides even 
more motivation for an operator for NE, since NE is extremely common.  For 
example, many uses of op== (CE) are written simply to support the idiom for NE 
(Objects.equals).  And if we default field comparison to NE not CE, how is it 
that users are given the less-friendly notation for NE and the more friendly 
one for NE?

At the cost of another bikeshed and some sugar, P3 would allow users and 
optimizers smoother access to NE.  Too much sugar is bad for the diet, but this 
sugar (op===) might just be the right way to lead Hansel and Gretel to the 
correct house, instead of stumbling through the woods with the Objects.equals 
incantation.

— John

Reply via email to