On May 18, 2016, at 7:57 AM, Brian Goetz <[email protected]> wrote:
> 
> [1] John points out that if == is CE, then (a==b||a.equals(b)) will 
> redundantly load the fields on failed ==.  But, many equals implementations 
> start with "a==b" as a short-circuiting optimization, which means "a==b" will 
> be a common (pure) subexpression in the resulting expansion (and for values, 
> methods are monomorphic and will get inlined more frequently), so the two 
> checks can be collapsed.

Proposed acronym:  Call the explicit legacy idiom for equality "LIFE".  This is 
"a==b||a.equals(b)", maybe with a null (or NaN?) check.  Semantically it is 
CE||OE.

The question of optimization is worth expanding on.  In a nutshell, pure 
control flow test like a.f=ACMP=b.f are often collapsible.  (The dominating 
test makes the later test go away, or in other words, the dominated test can 
rise upward to merge into the dominating test.)  But, even pure tests cannot be 
sunk past exception-producing code.  If a sub-operation of OE can produce an 
exception, it cannot be reordered with a previous CE sub-operation.  
Unfortunately, exceptions provide a semantic "window" into order of operations; 
this is one of the major difficulties with Java optimization.

For example:

if (a.someTag != b.someTag)  return NOT_EQUAL;  //CE, first pass
if (!a.equals(b))  return NOT_EQUAL; //OE, second pass
return IS_EQUAL;

If the heavy comparison somehow includes a repetition of the "someTag" 
comparison, that latter comparison can be removed as a dominating test.

But, if the heavy comparison can throw an exception (or make any side effect), 
then the "someTag" comparison cannot be moved *down* into or past the equals 
method.

The net of this is that the writer of the equals method has, in general, no 
control over the first pass of the equals algorithm, even after optimization.  
The code might get some spot benefits from removal of explicit tests in the OE 
pass if they are already covered (dominated) by the CE pass.  If the code 
*omits* the equivalent of a CE test (e.g., an ACMP comparison) for some reason, 
the CE test is extra work.  Not much extra work, but it could cause extra fills 
or spills as the JIT works to provide a first-pass access to a component that 
will be needed, again, in a second pass after arbitrary intermediate labor.

So a mandatory LIFE sentence (couldn't resist that), if op== is CE (that's P1), 
means some loss of performance control around equals methods.

By comparison, if op== is OE (that's P2), there is no need to explicitly call 
CE; the .equals method does the exact job, and the coder has control.

In legacy code where op== is OE, the LIFE expression produces back-to-back 
calls to .equals, semantically OE||OE.  That would be even worse than CE||OE, 
unless we have a way to optimize OE||OE to OE, or until the users pick the LIFE 
(or the Legacy Idiom Connoting Equality) out of their code.

But, if it is allowed, collapsing two adjacent calls to .equals is a much 
simpler JIT transform than merging an adjacent CE and NE.  It's a macro-level 
CSE on partially lowered IR.

As noted above, merging a CE into an NE requires that all the logic of the CE 
be performed before the first possible exception produced by NE.

Let's try a more complete example:

class MyVal {
  TypeDesc type;
  long payload1;
  long[] payload2;
  boolean equals(MyVal that) {
    return this.type.equals(that.type) && this.payload1 == that.payload1 && 
Arrays.equals(this.payload1, that.payload2);
  }
}
class User {
   boolean foo(MyVal a, MyVal b) {
      return a==b || a.equals(b);  // CE || OE == LIFE
   }
}

In P1, the intermediate representation for the legacy idiom will contain these 
operations:

PASS1: // start CE
if (a.type !ACMP= b.type)  goto PASS2;
if (a.payload1 !LCMP= b.payload1)  goto PASS2;
if (a.payload2 !ACMP= b.payload2)  goto PASS2;
return IS_EQUAL;
PASS2: // start OE
z = a.type.INVOKEVIRTUAL[TypeDesc, equals, (TypeDesc)Z](b.type); //[DT a.type?]
… // random logic here; can throw exceptions in general
if (!z)  return NOT_EQUAL;
if (a.payload1 !LCMP= b.payload1)  return NOT_EQUAL; //[DT a.payload1]
z = INVOKESTATIC[Arrays, equals, ([J [J)Z](a.payload2, b.payload2); //[DT 
a.payload2]
… // random logic here; can throw exceptions in general
if (!z)  return NOT_EQUAL;
return IS_EQUAL;

What's wrong with this?  Well, from the JIT POV it is micromanagement, by the 
CE code, of the order of operations.  The JIT has to reverse any unfortunate 
decisions in the CE code if it wants to respect the order of operations in the 
OE code.

From the POV of the author of MyVal.equals. the CE enforces a policy that it is 
more profitable to compare the bits of payload1 and the object identity of 
payload2 before the structure of TypeDesc is consulted.  Sometimes this is 
right, sometimes it is wrong, but the CE+OE idiom fits everybody into a 
one-size-fits all policy, and takes away ordering control from the .equals 
method.

Could the JIT reorder stuff any way it wants, and make the code more like what 
the author intended?  After all, CE is provably a subset of OE.  There are two 
problems with that.  First, the subset relation is subtle and not visible (to 
the JIT) at the bytecode level.  (After all, CE and OE are two completely 
different bytecode shapes.)

Second, as noted above, the possibility of exceptions places barriers to 
reordering.  You cannot reorder exceptions with normal exits.  In the above IR, 
if the payload1 fields differ, TypeDesc.equals is never called, so even if it 
were going to throw an exception (due to an assert or a bug or a malformed 
MyVal or TypeDesc, say) the IR says it can't, and the JIT can't reorder the CE 
code to merge it into the OE code.

In this particular example, the possible optimizations are removal of the 
dominated tests marked "[DT …]" above.  The DT on payload2 is inside the 
implementation of Arrays.equals.  The DT on a.type may or may not be present in 
the TypeDesc.equals method; it's a programmer choice.

More perturbations are possible with larger value objects:  If you add more 
64-bit payload fields, the bitwise comparison clearly needs to load the entire 
value object (from whatever buffer or box it is in) before the first call to 
TypeDesc.equals.  If the programmer knows that TypeDesc.equals method is likely 
to prove inequality, without examination of the payload, then the first pass 
can be wasted bandwidth, in cases of degenerate values with (say) all zeroes 
payloads.  Is this a problem?  Depends on the data structure.  In this case, 
the dominated tests (e.g., DT of payload1) can usually be elided, so at worst 
it is a reordering of programmer intentions.

On the other hand, if you add more reference fields (like type or payload2), 
the first pass amounts to a reordering of (likely) ACMP operations inside the 
.equals method of each successive component.  Is that bad?  Well, as noted 
above, it forces the JIT to load each reference field twice, once to do ACMP, 
and then (much later) to call .equals.  Are there enough registers to avoid 
double loads?  Often, yes, until the computation gets register bound, and then 
you simply have multiple loads, just to ensure the semantics of exception 
ordering.

In summary, optimizing CE || OE (the legacy semantics of LIFE) is not horrible 
but will remove some programmer control from .equals ordering, and will cause 
more data motion in the presence of multiple reference components.

LIFE with P1 will also make the JIT slightly slower and less robust, as it 
pattern matches on more ad hoc and verbose bytecode shapes.  (Complex pattern 
matching is difficult; that's why we moved the string concatenation idioms into 
invokedynamic in JDK 9.)

Let's go to P2 for a moment.  In the presence of LIFE, the IR will first 
contain unexpanded intrinsics for a repeated OE || OE:

// a==b
z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
// a.equals(b)
z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
goto IS_EQUAL;

We need a special permission, an above-the-bytecode guarantee that any normal 
or abnormal result produced by the second call will be produced by the first 
call.  In that case, we can elide the IR for the second call (as provably 
redundant):

z = INTRINSIC[a.equals(b)];
if (!z)  goto NOT_EQUAL;
goto IS_EQUAL;

Expansion then continues as above, but starting with PASS2.  Code needs less 
optimization and is more under the control of the author of .equals.

HTH
— John

Reply via email to