On Apr 12, 2016, at 1:51 PM, Brian Goetz <[email protected]> wrote:
> 
>> 
>> Using a value type for something that isn't a value raises alarm bells for 
>> me. At the minimum I would expect this user to have to implement eq/hc by 
>> hand, because the default behavior users want 99% of the time is (deep) 
>> content-based equality.
> 
> This may be the reality-distortion field speaking, but in my view a reference 
> *is* a kind of value — albeit a very special kind.  They’re immutable, like 
> other values.

To be more precise:  Values have no mutable subfields.  References and 
primitives have no mutable subfields.  (References, when not null, point to 
various objects, some of which themselves have mutable subfields.)  In this 
way, they are all similar.

Also, *some* value or references (but not any primitives or some references 
like String) *may* refer to objects which have mutable subfields or some other 
kind of mutable state.

Mainly because of this property, values, primitives, and references are at 
least partially referentially transparent under copy operations.  When you copy 
a value (of any sort, ref or val), you capture all of its substructure, 
including any possible future value of its substructure.  But the copy is not 
infinitely deep.  (That would be an additional commitment.)  The copy only 
copies the immediate subfields of the value:  The whole ref, both halves of the 
long, all immediate fields of a custom value type.

So far a reference is indistinguishable from a one-field value wrapping a 
reference.  The next question is whether a value-wrapped reference should be 
restricted in its mutability (or in its type, etc.).  I think we have to give a 
firm *no* here; there's too much to lose from making restrictions that are not 
natural to the computational machinery of the JVM.

For example, if (following some well-intentioned commentators) we require that 
the transitive closure of a value be fully immutable ("like an int, right?") we 
give up the ability to use values as cursors and tuples.  Q: "Why can't I 
return two values from my method without boxing to the heap?"  A: "Well, one of 
your values was really a reference to mutable state, don't you see."  That's 
not a conversation I am willing to have.  Some languages (even on the JVM) 
might not allow partial immutability, but Java must, IMO.

>  Almost all their state is encapsulated (they can be compared by identity, 
> that’s it).  They can only be constructed by privileged factories (we call 
> these constructors.)  But, ultimately, they behave like values — they are 
> passed by value, they have no identity of their own.  

…In any case, values with immutable immediate components lead us to questions 
about their equivalence relations (equality).

There is more than one natural-looking equality predicate that can be assigned 
to a tuple (and therefore to any value).  This means any choice of default has 
a ready-made rebuttal.

1. Component-wise boxed normal.  Equivalent to boxing the tuple into a 
List<Object>, and running List.equals.

2. Component-wise op==.  Equivalent to using the Java op== on each component, 
and-ing the bits together, or (if possible) packing into an array and using 
Arrays.equals.

3. Copy-wise equality.  Component-wise op==, except that floats and doubles are 
converted to their corresponding "raw bits".

And, perhaps a surprise:

0. Component-wise normal.  Equivalent to boxing the tuple into a List<Object>, 
and running List.equals, except that floats and doubles are boxed to a 
non-standard container that lifts floating-point equality op==(float,float).

(Non-tuple values can be thought of as those with private state, as 
hashcode-savers, or with value-specific symmetries, as rationals.  These guys 
should define their own equality relations explicitly.)

The basic choice (for tuples) is whether to recurse into "equals" on references 
(choices 0, 1).  A secondary choice is whether to treat floats and doubles 
copy-wise (choices 0, 2).  If not copy-wise, there is a further choice between 
Float.equals and op==(float,float).  (Sadly, they are not exactly the same; see 
the javadoc!)

The numbered cases 0..3 are in order of increasing refinement.  Actually, 2 
does not exactly refine 1, because of the difference between boxed and unboxed 
float equality.  But #3 distinguishes the logical maximum number of values.  
(I.e., there is no stronger equality relation, where the "weakest" relation 
would make everything appear equal.)

#0 captures the common practice of using op==(val,val) for values and 
Object.equals for refs.  I'd like to call this "normal equality", and it is may 
be what we want for any-variables. (<T> T t1, t2; t1==t2 could be normal in 
this way.)  For floats there's a second choice that may be called "boxed normal 
equality", where Object.equal is used uniformly after boxing non-refs.  (You 
can also invent more:  What if the refs are box types?  What about arrays?  
Etc.)

Both kinds of "normal" of equality (actually, all kinds that resort to 
Object.equals) are in tension with what I want to call "copy-wise equality".  
Two values are copy-wise equal if and only if there is no way to prove that one 
is not an copy (by assignment) of the other.  This is basically bitwise 
equality, but don't tell implementors that, because they will be required to 
skip padding bits, and to traverse internal indirections (if the implementation 
uses them), to prevent spurious inequality results.

Copy-wise equality is sometimes desirable, in order to prove that an operand 
has been seen before.  (IdentityHashMap is used today for this purpose.)  I 
think it should get a separate name, "System.isEqualCopy(T,T)" or some such.  
It is possible to imagine value types (such as IHM entries or proxies) which 
would use copy-wise equality on some of their fields.

Here's my evaluation of the four choices:

0. Component-wise normal.  This is equivalent to writing the sort of method 
that people already write, which performs .equals on refs and op== on vals.  
Probably the least surprising.  It is also generally useful, having worked well 
for collections.  Use it if you can afford it.  But beware:  Cursors require 
identity comparison on any backing store component instead of Object.equals.

1. Component-wise boxed normal.  Easy to describe, but has an unfortunate 
dependency on the surprising Float.equals, which makes it unsuitable for 
numerically-oriented values.  Avoid for that reason in favor of #0.

2. Component-wise op==.  A bad candidate for V.equals, but sometimes a useful 
analog to op==(ref,ref). However, because of float oddities, consider favoring 
#3 instead.

3. Copy-wise equality.  Component-wise op==, except that floats and doubles are 
converted to their corresponding "raw bits".  Distinguishes the logical maximum 
number of values (there is no stronger equality relation).  Should not be the 
default, but give it a name and allow users and value class writers to apply it 
when needed.

Based on this analysis, if we were to assign a default equality predicate to 
value types, I think the weakest (#0) would be best.  This is the one which 
works most like collections, but which "respects" primitives more by not boxing 
them.

I think the current translation strategy uses normal equality for op==(T,T) 
where T is an any-type variable.  This is consistent with imagining the default 
equality predicate (#0 as recommended) being expanded from any-generic code 
that gives each field a separate any-type.

(IMO, this is a desirable incompatibility with legacy generics, where op==(T,T) 
means op==(ref,ref), and usually requires an explicit backup call to 
Object.equals.)

Note that copy-wise equality breaks abstraction boundaries in a way analogous 
to its version applied to refs, which is op==(ref,ref).  It can be used to 
prove that two apparently equivalent values (or refs) are in fact different in 
some way (perhaps an invisible way).  Open question:  Are there some values 
that would want to specialize the isEqualCopy method, perhaps by making it 
behave more like the equals method?  Is that worth the cost of virtualizing 
this operation?

The costs (in user model and implementation complexity) of isEqualCopy are (in 
my mind) the exact counterpart of today's costs of reference equality.  (They 
add complexity to the user model which is not always needed, as for Strings, 
and they prevent some important unbox/box optimizations when EA fails.)

— John

P.S. Exercise for the reader:  For a value type V which needs deep equality 
through many layers of indirect structure, it might be desirable to have 
factory methods (V constructors) perform interning (invisibly) on indirect 
portions of the value, using something like an IdentityHashTable (private to 
the implementation).  How could such a value avoid a full recursion on every 
call to V.equals?  What would the implementation look like?  Does this 
technique require a cost increase somewhere other than V.equals?  Under what 
circumstances will the cost savings in V.equals pay for any other costs?  
(Hint:  Consider using System.isEqualCopy.)

Reply via email to