Equality for values -- new analysis, same conclusion

Brian Goetz Fri, 09 Aug 2019 09:56:44 -0700

Time to take another look at equality, now that we’ve simplified away the 
LFoo/QFoo distinction. This mail focuses on the language notion of equality 
(==) only.  Let’s start with the simplest case, the ==(V,V) operator.  There is 
a range of possible interpretations:


 - Not allowed; the compiler treats == as not applicable to operands of type V. 
 (Note that since V <: Object, == may still be called upon to have an opinion 
about two Vs whose static types are Object or interface.)
 - Allowed, but always false.  (This appeals to a concept of “aggressive 
reboxing”, where a value is reboxed between every pair of bytecodes.)
 - Weak substitutability.  This is where we make a “good faith” attempt to 
treat equal points as equal, but there are cases (such as those where a value 
hides behind an Object/interface) where two otherwise equal objects might 
compare as not equal.  This would have to appeal to some notion of invisible 
boxing, where sometimes two boxes for the same value are not equal.
 - Substitutability.  This is where we extend == field-wise over the fields of 
the object, potentially recursively.
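To make the last option concrete, here is a minimal sketch of a field-wise substitutability test. It assumes Java records as a stand-in for value classes (today's Java has no value types), and the names `Subst` and `substitutable` are hypothetical. Primitive components are compared by value; reference components are compared by identity unless both are records, in which case the test recurses.

```java
import java.lang.reflect.RecordComponent;

class SubstDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2), q = new Point(1, 2);
        System.out.println(p == q);                                  // false: two distinct record instances today
        System.out.println(Subst.substitutable(p, q));               // true: same class, same components
        System.out.println(Subst.substitutable(new Line(p, q), new Line(q, p))); // true: recursive comparison
        System.out.println(Subst.substitutable(p, new Point(2, 1))); // false: components differ
    }
}

// Assumption: records stand in for value classes in this sketch.
record Point(int x, int y) {}
record Line(Point from, Point to) {}

// Hypothetical helper; not the Valhalla implementation.
final class Subst {
    private Subst() {}

    // Two operands are substitutable if they are the same reference (or both
    // null), or if both are records of the same class whose components are
    // pairwise substitutable.
    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        if (!a.getClass().isRecord()) return false;   // identity objects: reference comparison only
        if (a.getClass() != b.getClass()) return false;
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                if (!componentSame(rc.getType(), x, y)) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static boolean componentSame(Class<?> type, Object x, Object y) {
        if (type.isPrimitive()) {
            // Reflection hands back boxed wrappers; compare them by value.
            // Assumption: wrapper equals() is used, which for float/double
            // differs from == (NaN equals NaN, +0.0 does not equal -0.0).
            return x.equals(y);
        }
        return substitutable(x, y);                   // recurse into nested "values"
    }
}
```

The sketch only shows what a field-wise reading of == would answer; it says nothing about how a VM would make that check cheap.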

As noted above, we have to define not only ==V but also ==Object, since there 
may be a value hiding behind the object.  It might be acceptable, though clearly 
weird, for two values that are ==V not to be ==Object when viewed as Objects. 
However, the only way this might make sense to users is if it appealed to a 
boxing conversion, and it’s hard to say there’s a boxing conversion from V to 
Object when V <: Object.  There is a gravitational force that says that if 
two values are V==, then they should still be == when viewed as Object or 
Comparable.

Let’s take a look at the use cases for Object==.  

 - Direct identity comparison.  This is used for objects that are known to be 
interned (such as interned strings or enum constants), as well as algorithms 
that want to compare objects by identity, such as IdentityHashMap.  (When the 
operands are of generic (T) or dynamic (Object) type, the “interned” case is 
less credible, but the other cases are still credible.)
 - As a fast path for deeper equality comparisons (a == b || a.equals(b)), 
since the contract of equals() requires that == objects are equals().
 - In comparing references against null.
 - In comparing references against a known sentinel value, such as a value 
already observed, or a sentinel value provided by the user.
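The use cases above can be checked against a sketch of Object== under substitutability. Everything here is hypothetical: `ValueLike` is a made-up marker standing in for “is a value class”, and the record's generated equals() stands in for the substitutability test (for records whose components are all int, the two coincide). Null checks and sentinels behave as before, and the fast-path idiom stays consistent with equals().

```java
class ObjectEqDemo {
    // Hypothetical marker standing in for "this class is a value class".
    interface ValueLike {}

    // Stand-in value type; with only int components, the generated equals()
    // compares each component with ==, so it serves as the substitutability test.
    record Point(int x, int y) implements ValueLike {}

    // Sketch of Object== if == means substitutability: identity objects keep
    // reference comparison; two values of the same class compare component-wise.
    static boolean objEq(Object a, Object b) {
        if (a == b) return true;                                // same reference, or both null
        if (a instanceof ValueLike && b instanceof ValueLike) {
            return a.getClass() == b.getClass() && a.equals(b);
        }
        return false;                                           // identity objects, or value vs. identity
    }

    // The fast-path idiom (a == b || a.equals(b)), written against objEq.
    static boolean fastPathEquals(Object a, Object b) {
        return objEq(a, b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        Object p = new Point(1, 2), q = new Point(1, 2);
        System.out.println(objEq(p, q));                    // true: substitutable values
        System.out.println(objEq(p, null));                 // false: null comparison still works
        Object sentinel = new Object();
        System.out.println(objEq(sentinel, new Object()));  // false: identity objects compare by reference
        System.out.println(fastPathEquals(p, q));           // true
    }
}
```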

When generics are specialized, T== will specialize too, so when T specializes 
to a value, we will get V==, and when T is erased, we will get Object==.  

Subtyping is a powerful constraint; it says that a value is-a Object.  While it 
is theoretically possible to say that v1==v2 does not imply ((Object)v1 == 
(Object)v2), I think we’ll have a very hard time suggesting this with a 
straight face.  (If, instead, the conversion from value to Object were a 
straight boxing conversion, this would become credible.)  This says to me that 
if we define == on values at all, then it must be consistent with == on object 
or interface types.
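Today's int/Integer relationship is exactly the “straight boxing conversion” case, and it shows the inconsistency such a world permits: two ints that are == need not be == once each is boxed to Object. The class and method names below are illustrative only. Per JLS 5.1.7, boxes of int values in -128..127 are guaranteed to be shared; outside that range the result is not specified either way.

```java
class BoxingAnomaly {
    // Compares two ints as primitives.
    static boolean primEq(int a, int b) {
        return a == b;
    }

    // Boxes each int separately (autoboxing), then compares the two
    // resulting references rather than the values.
    static boolean boxedRefEq(int a, int b) {
        Object oa = a, ob = b;
        return oa == ob;
    }

    public static void main(String[] args) {
        System.out.println(primEq(100, 100));       // true
        System.out.println(boxedRefEq(100, 100));   // true: JLS requires shared boxes for -128..127
        System.out.println(primEq(1000, 1000));     // true
        System.out.println(boxedRefEq(1000, 1000)); // commonly false, but not guaranteed either way
    }
}
```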

Similarly, the fact that we want to migrate erased generics to specialized, 
where T== will degenerate to V== on specialization, suggests that having 
Object== and V== be consistent is a strong normalizing force.


Having == not be allowed on values at all would surely be strange, since == is 
(mostly) a substitutability test on primitives, and values are supposed to “work 
like an int.”  And even if we disallowed == on values, one could always cast 
the values to Object and compare them.  While this position is not outright 
indefensible, it is going to be an uncomfortable one.

Having V== always be false does not seem like something we can offer with a 
straight face either, again citing “works like an int.”

Having V== be “weak substitutability” is possible, but I don’t think it would 
make the VM people happy anyway.  Most values won’t require recursive 
comparisons (since most fields of value types will be statically typed as 
primitives, refs, or values), but much of the cost is in having the split at 
all.  

Note too that treating == as substitutability means use cases such as 
IdentityHashMap will just work as expected, with no modification for a 
value-full world.

So if V <: Object, it feels like we are still being “boxed” into the corner 
where == is a substitutability test.  But in generic / dynamically typed code, 
we are likely to discourage broad use of Object==, since the most common case 
(fast path comparison) is no longer as fast as it once was.


We have a few other options to mitigate the performance concerns here:

 - Live with legacy ACMP anomalies;
 - Re-explore a boxing relationship between V and Object.

If we say that == is substitutability, we still have the option to translate == 
to something other than ACMP.  This means that existing binaries (and, likely, 
binaries recompiled with --source 8) will still use ACMP.  If we give ACMP the 
“false if value” interpretation, then existing classfiles (which mostly use == 
as a fast-path check) will still work, as those tests should be backed up with 
.equals(), though they may see performance changes on recompilation.  This 
is an uncomfortable compromise, but it is worth considering.  Down this route, 
ACMP has a much narrower portfolio, as we would not use it in translating most 
Object== unless we were sure we were dealing with identityful types.
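A minimal sketch of the “false if value” reading of ACMP, and why the usual fast-path idiom in old binaries stays correct under it. `ValueLike`, `legacyAcmp`, and `legacyEquals` are hypothetical names, and records again stand in for value classes; this models the semantics only, not how a VM would implement the bytecode.

```java
class AcmpSketch {
    // Hypothetical marker standing in for "this class is a value class".
    interface ValueLike {}

    record Point(int x, int y) implements ValueLike {}

    // "False if value": identity objects compare by reference as before,
    // but any value operand makes the comparison false.
    static boolean legacyAcmp(Object a, Object b) {
        if (a instanceof ValueLike || b instanceof ValueLike) return false;
        return a == b;
    }

    // Typical legacy idiom: == as a fast path, backed up by equals().
    static boolean legacyEquals(Object a, Object b) {
        return legacyAcmp(a, b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(legacyAcmp(p, p));                 // false: the fast path always misses for values
        System.out.println(legacyEquals(p, new Point(3, 4))); // true: equals() backs it up
        String s = "x";
        System.out.println(legacyAcmp(s, s));                 // true: identity objects are unaffected
    }
}
```

The cost of this choice shows up as the fast path never firing for values, rather than as a wrong answer.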

The alternate route to preserving a narrower definition of == is to say that 
_at the language level_, values are not subtypes of Object.  Then, we can 
credibly say that the eclair companion type is the box, and there is a boxing 
conversion between V and I (putting the cream in the eclair is like putting it 
in a box.)  This may seem like a huge step backwards, but it actually is a 
consistent world, and in this world, boxing is a super-lightweight operation.  
The main concern here is that when a user assigns a value to an 
object/interface type, and then invokes Object.getClass(), they will see the 
value class — which perhaps we can present as “the runtime box is so light that 
you can’t even see it.”  

Where this world runs into more trouble is with specialized generics; we’d like 
to treat specialized Foo<T> as being generic in “T extends Object”, which 
subsumes values.  This complicates things like bound computation and type 
inference, and also makes invoking Object methods trickier, since we have to do 
some sort of reasoning by parts (which we did in M3, but didn’t like it.)  

tl;dr: if we want a unified type system where values are objects, then I think 
we have to take the obvious semantics for ==, and if we want to reduce the 
runtime impact on _old_ binaries, we should consider giving older binaries the 
older semantics, and accepting the discontinuity as the cost of unification.
