On May 18, 2016, at 7:57 AM, Brian Goetz <[email protected]> wrote:
>
> [1] John points out that if == is CE, then (a==b||a.equals(b)) will
> redundantly load the fields on failed ==. But, many equals implementations
> start with "a==b" as a short-circuiting optimization, which means "a==b" will
> be a common (pure) subexpression in the resulting expansion (and for values,
> methods are monomorphic and will get inlined more frequently), so the two
> checks can be collapsed.
Proposed acronym: Call the explicit legacy idiom for equality "LIFE". This is
"a==b||a.equals(b)", maybe with a null (or NaN?) check. Semantically it is
CE||OE.
The question of optimization is worth expanding on. In a nutshell, pure
control flow test like a.f=ACMP=b.f are often collapsible. (The dominating
test makes the later test go away, or in other words, the dominated test can
rise upward to merge into the dominating test.) But, even pure tests cannot be
sunk past exception-producing code. If a sub-operation of OE can produce an
exception, it cannot be reordered with a previous CE sub-operation.
Unfortunately, exceptions provide a semantic "window" into order of operations;
this is one of the major difficulties with Java optimization.
For example:
if (a.someTag != b.someTag) return NOT_EQUAL; //CE, first pass
if (!a.equals(b)) return NOT_EQUAL; //OE, second pass
return IS_EQUAL;
If the heavy comparison somehow includes a repetition of the "someTag"
comparison, that latter comparison can be removed as a dominating test.
But, if the heavy comparison can throw an exception (or make any side effect),
then the "someTag" comparison cannot be moved *down* into or past the equals
method.
The net of this is that the writer of the equals method has, in general, no
control over the first pass of the equals algorithm, even after optimization.
The code might get some spot benefits from removal of explicit tests in the OE
pass if they are already covered (dominated) by the CE pass. If the code
*omits* the equivalent of a CE test (e.g., an ACMP comparison) for some reason,
the CE test is extra work. Not much extra work, but it could cause extra fills
or spills as the JIT works to provide a first-pass access to a component that
will be needed, again, in a second pass after arbitrary intermediate labor.
So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's P1),
means some loss of performance control around equals methods.
By comparison, if op== is OE (that's P2), there is no need to explicitly call
CE; the .equals method does the exact job, and the coder has control.
In legacy code where op== is OE, the LIFE expression produces back-to-back
calls to .equals, semantically OE||OE. That would be even worse than CE||OE,
unless we have a way to optimize OE||OE to OE, or until the users pick the LIFE
(or the Legacy Idiom Connoting Equality) out of their code.
But, if it is allowed, collapsing two adjacent calls to .equals is a much
simpler JIT transform than merging an adjacent CE and NE. It's a macro-level
CSE on partially lowered IR.
As noted above, merging a CE into an NE requires that all the logic of the CE
be performed before the first possible exception produced by NE.
Let's try a more complete example:
class MyVal {
TypeDesc type;
long payload1;
long[] payload2;
boolean equals(MyVal that) {
return this.type.equals(that.type) && this.payload1 == that.payload1 &&
Arrays.equals(this.payload1, that.payload2);
}
}
class User {
boolean foo(MyVal a, MyVal b) {
return a==b || a.equals(b); // CE || OE == LIFE
}
}
In P1, the intermediate representation for the legacy idiom will contain these
operations:
PASS1: // start CE
if (a.type !ACMP= b.type) goto PASS2;
if (a.payload1 !LCMP= b.payload1) goto PASS2;
if (a.payload2 !ACMP= b.payload2) goto PASS2;
return IS_EQUAL;
PASS2: // start OE
z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT a.type?]
… // random logic here; can throw exceptions in general
if (!z) return NOT_EQUAL;
if (a.payload1 !LCMP= b.payload1) return NOT_EQUAL; //[DT a.payload1]
z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT
a.payload2]
… // random logic here; can throw exceptions in general
if (!z) return NOT_EQUAL;
return IS_EQUAL;
What's wrong with this? Well, from the JIT POV it is micromanagement, by the
CE code, of the order of operations. The JIT has to reverse any unfortunate
decisions in the CE code if it wants to respect the order of operations in the
OE code.
From the POV of the author of MyVal.equals. the CE enforces a policy that it is
more profitable to compare the bits of payload1 and the object identity of
payload2 before the structure of TypeDesc is consulted. Sometimes this is
right, sometimes it is wrong, but the CE+OE idiom fits everybody into a
one-size-fits all policy, and takes away ordering control from the .equals
method.
Could the JIT reorder stuff any way it wants, and make the code more like what
the author intended? After all, CE is provably a subset of OE. There are two
problems with that. First, the subset relation is subtle and not visible (to
the JIT) at the bytecode level. (After all, CE and OE are two completely
different bytecode shapes.)
Second, as noted above, the possibility of exceptions places barriers to
reordering. You cannot reorder exceptions with normal exits. In the above IR,
if the payload1 fields differ, TypeDesc.equals is never called, so even if it
were going to throw an exception (due to an assert or a bug or a malformed
MyVal or TypeDesc, say) the IR says it can't, and the JIT can't reorder the CE
code to merge it into the OE code.
In this particular example, the possible optimizations are removal of the
dominated tests marked "[DT …]" above. The DT on payload2 is inside the
implementation of Arrays.equals. The DT on a.type may or may not be present in
the TypeDesc.equals method; it's a programmer choice.
More perturbations are possible with larger value objects: If you add more
64-bit payload fields, the bitwise comparison clearly needs to load the entire
value object (from whatever buffer or box it is in) before the first call to
TypeDesc.equals. If the programmer knows that TypeDesc.equals method is likely
to prove inequality, without examination of the payload, then the first pass
can be wasted bandwidth, in cases of degenerate values with (say) all zeroes
payloads. Is this a problem? Depends on the data structure. In this case,
the dominated tests (e.g., DT of payload1) can usually be elided, so at worst
it is a reordering of programmer intentions.
On the other hand, if you add more reference fields (like type or payload2),
the first pass amounts to a reordering of (likely) ACMP operations inside the
.equals method of each successive component. Is that bad? Well, as noted
above, it forces the JIT to load each reference field twice, once to do ACMP,
and then (much later) to call .equals. Are there enough registers to avoid
double loads? Often, yes, until the computation gets register bound, and then
you simply have multiple loads, just to ensure the semantics of exception
ordering.
In summary, optimizing CE || OE (the legacy semantics of LIFE) is not horrible
but will remove some programmer control from .equals ordering, and will cause
more data motion in the presence of multiple reference components.
LIFE with P1 will also make the JIT slightly slower and less robust, as it
pattern matches on more ad hoc and verbose bytecode shapes. (Complex pattern
matching is difficult; that's why we moved the string concatenation idioms into
invokedynamic in JDK 9.)
Let's go to P2 for a moment. In the presence of LIFE, the IR will first
contain unexpanded intrinsics for a repeated OE || OE:
// a==b
z = INTRINSIC[a.equals(b)];
if (!z) goto NOT_EQUAL;
// a.equals(b)
z = INTRINSIC[a.equals(b)];
if (!z) goto NOT_EQUAL;
goto IS_EQUAL;
We need a special permission, an above-the-bytecode guarantee that any normal
or abnormal result produced by the second call will be produced by the first
call. In that case, we can elide the IR for the second call (as provably
redundant):
z = INTRINSIC[a.equals(b)];
if (!z) goto NOT_EQUAL;
goto IS_EQUAL;
Expansion then continues as above, but starting with PASS2. Code needs less
optimization and is more under the control of the author of .equals.
HTH
— John