On Apr 12, 2016, at 1:51 PM, Brian Goetz <[email protected]> wrote:
>
>>
>> Using a value type for something that isn't a value raises alarm bells for
>> me. At the minimum I would expect this user to have to implement eq/hc by
>> hand, because the default behavior users want 99% of the time is (deep)
>> content-based equality.
>
> This may be the reality-distortion field speaking, but in my view a reference
> *is* a kind of value — albeit a very special kind. They’re immutable, like
> other values.
To be more precise: Values have no mutable subfields. References and
primitives have no mutable subfields. (References, when not null, point to
various objects, some of which themselves have mutable subfields.) In this
way, they are all similar.
Also, *some* value or references (but not any primitives or some references
like String) *may* refer to objects which have mutable subfields or some other
kind of mutable state.
Mainly because of this property, values, primitives, and references are at
least partially referentially transparent under copy operations. When you copy
a value (of any sort, ref or val), you capture all of its substructure,
including any possible future value of its substructure. But the copy is not
infinitely deep. (That would be an additional commitment.) The copy only
copies the immediate subfields of the value: The whole ref, both halves of the
long, all immediate fields of a custom value type.
So far a reference is indistinguishable from a one-field value wrapping a
reference. The next question is whether a value-wrapped reference should be
restricted in its mutability (or in its type, etc.). I think we have to give a
firm *no* here; there's too much to lose from making restrictions that are not
natural to the computational machinery of the JVM.
For example, if (following some well-intentioned commentators) we require that
the transitive closure of a value be fully immutable ("like an int, right?") we
give up the ability to use values as cursors and tuples. Q: "Why can't I
return two values from my method without boxing to the heap?" A: "Well, one of
your values was really a reference to mutable state, don't you see." That's
not a conversation I am willing to have. Some languages (even on the JVM)
might not allow partial immutability, but Java must, IMO.
> Almost all their state is encapsulated (they can be compared by identity,
> that’s it). They can only be constructed by privileged factories (we call
> these constructors.) But, ultimately, they behave like values — they are
> passed by value, they have no identity of their own.
…In any case, values with immutable immediate components lead us to questions
about their equivalence relations (equality).
There is more than one natural-looking equality predicate that can be assigned
to a tuple (and therefore to any value). This means any choice of default has
a ready-made rebuttal.
1. Component-wise boxed normal. Equivalent to boxing the tuple into a
List<Object>, and running List.equals.
2. Component-wise op==. Equivalent to using the Java op== on each component,
and-ing the bits together, or (if possible) packing into an array and using
Arrays.equals.
3. Copy-wise equality. Component-wise op==, except that floats and doubles are
converted to their corresponding "raw bits".
And, perhaps a surprise:
0. Component-wise normal. Equivalent to boxing the tuple into a List<Object>,
and running List.equals, except that floats and doubles are boxed to a
non-standard container that lifts floating-point equality op==(float,float).
(Non-tuple values can be thought of as those with private state, as
hashcode-savers, or with value-specific symmetries, as rationals. These guys
should define their own equality relations explicitly.)
The basic choice (for tuples) is whether to recurse into "equals" on references
(choices 0, 1). A secondary choice is whether to treat floats and doubles
copy-wise (choices 0, 2). If not copy-wise, there is a further choice between
Float.equals and op==(float,float). (Sadly, they are not exactly the same; see
the javadoc!)
The numbered cases 0..3 are in order of increasing refinement. Actually, 2
does not exactly refine 1, because of the difference between boxed and unboxed
float equality. But #3 distinguishes the logical maximum number of values.
(I.e., there is no stronger equality relation, where the "weakest" relation
would make everything appear equal.)
#0 captures the common practice of using op==(val,val) for values and
Object.equals for refs. I'd like to call this "normal equality", and it is may
be what we want for any-variables. (<T> T t1, t2; t1==t2 could be normal in
this way.) For floats there's a second choice that may be called "boxed normal
equality", where Object.equal is used uniformly after boxing non-refs. (You
can also invent more: What if the refs are box types? What about arrays?
Etc.)
Both kinds of "normal" of equality (actually, all kinds that resort to
Object.equals) are in tension with what I want to call "copy-wise equality".
Two values are copy-wise equal if and only if there is no way to prove that one
is not an copy (by assignment) of the other. This is basically bitwise
equality, but don't tell implementors that, because they will be required to
skip padding bits, and to traverse internal indirections (if the implementation
uses them), to prevent spurious inequality results.
Copy-wise equality is sometimes desirable, in order to prove that an operand
has been seen before. (IdentityHashMap is used today for this purpose.) I
think it should get a separate name, "System.isEqualCopy(T,T)" or some such.
It is possible to imagine value types (such as IHM entries or proxies) which
would use copy-wise equality on some of their fields.
Here's my evaluation of the four choices:
0. Component-wise normal. This is equivalent to writing the sort of method
that people already write, which performs .equals on refs and op== on vals.
Probably the least surprising. It is also generally useful, having worked well
for collections. Use it if you can afford it. But beware: Cursors require
identity comparison on any backing store component instead of Object.equals.
1. Component-wise boxed normal. Easy to describe, but has an unfortunate
dependency on the surprising Float.equals, which makes it unsuitable for
numerically-oriented values. Avoid for that reason in favor of #0.
2. Component-wise op==. A bad candidate for V.equals, but sometimes a useful
analog to op==(ref,ref). However, because of float oddities, consider favoring
#3 instead.
3. Copy-wise equality. Component-wise op==, except that floats and doubles are
converted to their corresponding "raw bits". Distinguishes the logical maximum
number of values (there is no stronger equality relation). Should not be the
default, but give it a name and allow users and value class writers to apply it
when needed.
Based on this analysis, if we were to assign a default equality predicate to
value types, I think the weakest (#0) would be best. This is the one which
works most like collections, but which "respects" primitives more by not boxing
them.
I think the current translation strategy uses normal equality for op==(T,T)
where T is an any-type variable. This is consistent with imagining the default
equality predicate (#0 as recommended) being expanded from any-generic code
that gives each field a separate any-type.
(IMO, this is a desirable incompatibility with legacy generics, where op==(T,T)
means op==(ref,ref), and usually requires an explicit backup call to
Object.equals.)
Note that copy-wise equality breaks abstraction boundaries in a way analogous
to its version applied to refs, which is op==(ref,ref). It can be used to
prove that two apparently equivalent values (or refs) are in fact different in
some way (perhaps an invisible way). Open question: Are there some values
that would want to specialize the isEqualCopy method, perhaps by making it
behave more like the equals method? Is that worth the cost of virtualizing
this operation?
The costs (in user model and implementation complexity) of isEqualCopy are (in
my mind) the exact counterpart of today's costs of reference equality. (They
add complexity to the user model which is not always needed, as for Strings,
and they prevent some important unbox/box optimizations when EA fails.)
— John
P.S. Exercise for the reader: For a value type V which needs deep equality
through many layers of indirect structure, it might be desirable to have
factory methods (V constructors) perform interning (invisibly) on indirect
portions of the value, using something like an IdentityHashTable (private to
the implementation). How could such a value avoid a full recursion on every
call to V.equals? What would the implementation look like? Does this
technique require a cost increase somewhere other than V.equals? Under what
circumstances will the cost savings in V.equals pay for any other costs?
(Hint: Consider using System.isEqualCopy.)