Re: Why is LambdaMetafactory 10% slower than a static MethodHandle but 80% faster than a non-static MethodHandle?

2018-02-20 Thread Remi Forax


- Original Message -
> From: "Vladimir Ivanov" 
> To: "Wenlei Xie" , "Da Vinci Machine Project" 
> Sent: Tuesday, February 20, 2018 00:14:42
> Subject: Re: Why is LambdaMetafactory 10% slower than a static MethodHandle but 
> 80% faster than a non-static MethodHandle?

>> Sorry if it's a dumb question, but why can't nonStaticMethodHandle get
>> inlined here? In the benchmark it's always the same line with the
>> same final MethodHandle variable, so can't the JIT use some profiling
>> info to inline it (similar to the function object generated by
>> LambdaMetafactory)? Or is that impossible since invokeExact's
>> PolymorphicSignature makes it quite special?
> 
> Yes, method handle invokers are special and ordinary type profiling
> (class-based) doesn't work for them.
> 
> There was an idea to implement value profiling for MH invokers: record
> individual MethodHandle instances observed at invoker call sites and use
> that to guide devirtualization & inlining decisions. But it looked way
> too specialized to be beneficial in practice.

Here is some code that does exactly that:
https://gist.github.com/forax/7bf08669f58804991fd45656a671c381
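
To make the idea concrete without following the link, here is a minimal sketch of value profiling for method handles done in user code. It is not the content of the gist above: the class name MhProfilingCallSite, its method names and the MAX_DEPTH limit are all hypothetical. It assumes every handle passed to the call site has the same exact type. The call site remembers the first few MethodHandle instances it sees and installs an identity guard in front of each one, so the guarded target is a constant handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

/** Hypothetical sketch: an inlining cache keyed on MethodHandle identity. */
final class MhProfilingCallSite extends MutableCallSite {
    private static final int MAX_DEPTH = 2;  // assumed limit, not from the thread
    private static final MethodHandle RECORD, SAME;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            RECORD = lookup.findVirtual(MhProfilingCallSite.class, "record",
                    MethodType.methodType(MethodHandle.class, MethodHandle.class));
            SAME = lookup.findStatic(MhProfilingCallSite.class, "same",
                    MethodType.methodType(boolean.class, MethodHandle.class, MethodHandle.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final MethodHandle generic;  // plain exact invoker, no specialization
    private int depth;                   // number of guards installed so far

    /** mhType is the exact type of the handles that will be passed to this site. */
    MhProfilingCallSite(MethodType mhType) {
        // The site takes the handle as an extra leading argument: (MethodHandle, A...)R
        super(mhType.insertParameterTypes(0, MethodHandle.class));
        generic = MethodHandles.exactInvoker(mhType);
        // Slow path: record(mh) returns the handle to call; the original leading
        // argument is dropped and the returned handle is invoked with the rest.
        MethodHandle invokeResult = MethodHandles.dropArguments(generic, 1, MethodHandle.class);
        setTarget(MethodHandles.foldArguments(invokeResult, RECORD.bindTo(this)));
    }

    static boolean same(MethodHandle expected, MethodHandle actual) {
        return expected == actual;
    }

    synchronized MethodHandle record(MethodHandle mh) {
        MethodType expected = type().dropParameterTypes(0, 1);
        if (mh == null || !mh.type().equals(expected)) {
            return mh;  // let the exact invoker report the problem
        }
        if (depth++ < MAX_DEPTH) {
            // Fast path: ignore the handle argument and call the observed handle,
            // which is a constant inside this guard.
            MethodHandle fast = MethodHandles.dropArguments(mh, 0, MethodHandle.class);
            setTarget(MethodHandles.guardWithTest(SAME.bindTo(mh), fast, getTarget()));
        } else {
            setTarget(generic);  // too many distinct handles: stop specializing
        }
        return mh;
    }
}
```

A caller would go through site.dynamicInvoker().invokeExact(mh, args...). Whether the JIT actually inlines through the guards depends on the call site itself being reachable as a constant (for example via invokedynamic or a static final field), which this sketch does not show.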

[...]

> Best regards,
> Vladimir Ivanov

Rémi

>> On Mon, Feb 19, 2018 at 4:00 AM, Vladimir Ivanov
>> <vladimir.x.iva...@oracle.com> wrote:
>> 
>> Geoffrey,
>> 
>> In both staticMethodHandle & lambdaMetafactory Dog::getName is
>> inlined, but using different mechanisms.
>> 
>> In staticMethodHandle target method is statically known [1], but in
>> case of lambdaMetafactory [2] compiler has to rely on profiling info
>> to devirtualize Function::apply(). The latter requires exact type
>> check on the receiver at runtime and that explains the difference
>> you are seeing.
>> 
>> But comparing that with nonStaticMethodHandle is not fair: there's
>> no inlining happening there.
>> 
>> If you want a fair comparison, then you have to measure with
>> polluted profile so no inlining happens. In that case [3] non-static
>> MethodHandles are on par (or even slightly faster):
>> 
>> LMF._4_lmf_fs  avgt   10  20.020 ± 0.635  ns/op
>> LMF._4_lmf_mhs avgt   10  18.360 ± 0.181  ns/op
>> 
>> (scores for 3 invocations in a row.)
>> 
>> Best regards,
>> Vladimir Ivanov
>> 
>> [1] 715  126    b        org.lmf.LMF::_1_staticMethodHandle (11 bytes)
>> ...
>>      @ 37   java.lang.invoke.DirectMethodHandle$Holder::invokeVirtual (14 bytes)   force inline by annotation
>>        @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   force inline by annotation
>>        @ 10   org.lmf.LMF$Dog::getName (5 bytes)   accessor
>> 
>> 
>> 
>> 
>> [2] 678  117    b        org.lmf.LMF::_2_lambdaMetafactory (14 bytes)
>> @ 8   org.lmf.LMF$$Lambda$37/552160541::apply (8 bytes)   inline (hot)
>>   \-> TypeProfile (6700/6700 counts) = org/lmf/LMF$$Lambda$37
>>    @ 4   org.lmf.LMF$Dog::getName (5 bytes)   accessor
>> 
>> 
>> [3] http://cr.openjdk.java.net/~vlivanov/misc/LMF.java
>> 
>> 
>>      static Function make() throws Throwable {
>>          CallSite site = LambdaMetafactory.metafactory(LOOKUP,
>>                  "apply",
>>                  MethodType.methodType(Function.class),
>>                  MethodType.methodType(Object.class, Object.class),
>>                  LOOKUP.findVirtual(Dog.class, "getName",
>> MethodType.methodType(String.class)),
>>                  MethodType.methodType(String.class, Dog.class));
>>          return (Function) site.getTarget().invokeExact();
>>      }
>> 
>>      private Function[] fs = new Function[] {
>>          make(), make(), make()
>>      };
>> 
>>      private MethodHandle[] mhs = new MethodHandle[] {
>>          nonStaticMethodHandle,
>>          nonStaticMethodHandle,
>>          nonStaticMethodHandle
>>      };
>> 
>>      @Benchmark
>>      public Object _4_lmf_fs() throws Throwable {
>>          Object r = null;
>>          for (Function f : fs) {
>>              r = f.apply(dogObject);
>>          }
>>          return r;
>>      }
>> 
>>      @Benchmark
>>      public Object _4_lmf_mhs() throws Throwable {
>>          Object r = null;
>>          for (MethodHandle mh : mhs) {
>>              r = mh.invokeExact(dogObject);
>>          }
>>          return r;
>> 
>>      }
>> 
>> On 2/19/18 1:42 PM, Geoffrey De Smet wrote:
>> 
>> Hi guys,
>> 
>> I ran the following JMH benchmark on JDK 9 and JDK 8.
>> Source code and detailed results below.
>> 
>> Benchmark on JDK 9      Score
>> staticMethodHandle      2.770
>> lambdaMetafactory       3.052    // 10% slower
>> nonStaticMethodHandle   5.250    // 90% slower
>> 
>> Why is LambdaMetafactory 10% sl

Re: Why is LambdaMetafactory 10% slower than a static MethodHandle but 80% faster than a non-static MethodHandle?

2018-02-20 Thread Geoffrey De Smet

  
  

  
> Also, does that mean if we try to pollute the LambdaMetafactory
> (e.g. by 3 different function objects) to prevent
> inlining, we are likely to see similar performance :)

As far as I can tell, I see similar performance for this benchmark,
which uses a megamorphic approach:
   
https://github.com/ge0ffrey/ge0ffrey-presentations/blob/master/code/fasterreflection/fasterreflection-client/src/main/java/be/ge0ffrey/presentations/fasterreflection/client/MegamorphicFasterReflectionClientBenchmark.java#L40
Result:

Benchmark                                                          Mode  Cnt   Score   Error  Units
MegamorphicFasterReflectionClientBenchmark._200_MethodHandle       avgt   60  17.507 ± 0.281  ns/op  // Non-static MethodHandle, still seriously slower
MegamorphicFasterReflectionClientBenchmark._400_LambdaMetafactory  avgt   60  14.393 ± 0.275  ns/op
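
For readers without the benchmark source at hand: the "static" and "non-static" variants discussed in this thread differ only in where the MethodHandle is stored. The following is a minimal sketch with hypothetical names (HandleKinds, STATIC_MH, nonStaticMh), not the benchmark's actual code. As explained earlier in the thread, the target of the static final handle is statically known to the JIT, while the handle read from an instance field is not, so no inlining happens there.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

/** Hypothetical sketch of the two MethodHandle storage styles compared in the thread. */
final class HandleKinds {
    static final class Dog {
        private final String name;
        Dog(String name) { this.name = name; }
        String getName() { return name; }
    }

    // "static" variant: a static final field, treated as a constant by the JIT.
    static final MethodHandle STATIC_MH = lookupGetName();

    // "non-static" variant: an instance field, even though it is final.
    private final MethodHandle nonStaticMh = lookupGetName();

    static MethodHandle lookupGetName() {
        try {
            // Adapt (Dog)String to (Object)Object so invokeExact can be used
            // with Object-typed arguments and results.
            return MethodHandles.lookup()
                    .findVirtual(Dog.class, "getName", MethodType.methodType(String.class))
                    .asType(MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object viaStatic(Object dog) throws Throwable {
        return STATIC_MH.invokeExact(dog);
    }

    Object viaField(Object dog) throws Throwable {
        return nonStaticMh.invokeExact(dog);
    }
}
```

Both methods return the same value; only the JIT's ability to see through the call differs.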


  With kind regards,
Geoffrey De Smet

On 19/02/18 23:54, Wenlei Xie wrote:


  Thank you Vladimir for the explanation!


> In both staticMethodHandle & lambdaMetafactory
> Dog::getName is inlined, but using different mechanisms.
>
> In staticMethodHandle target method is statically known [1],
> but in case of lambdaMetafactory [2] compiler has to rely on
> profiling info to devirtualize Function::apply(). The latter
> requires exact type check on the receiver at runtime and
> that explains the difference you are seeing.
>
> But comparing that with nonStaticMethodHandle is not fair:
> there's no inlining happening there.



Sorry if it's a dumb question, but why can't nonStaticMethodHandle
get inlined here? In the benchmark it's always the same line with
the same final MethodHandle variable, so can't the JIT use some
profiling info to inline it (similar to the function object
generated by LambdaMetafactory)? Or is that impossible since
invokeExact's PolymorphicSignature makes it quite special?


Also, does that mean if we try to pollute the LambdaMetafactory
(e.g. by 3 different function objects) to prevent
inlining, we are likely to see similar performance :)

  
Best,
Wenlei



  On Mon, Feb 19, 2018 at 4:00 AM, Vladimir Ivanov wrote:
Geoffrey,
  
  In both staticMethodHandle & lambdaMetafactory
  Dog::getName is inlined, but using different mechanisms.
  
  In staticMethodHandle target method is statically known
  [1], but in case of lambdaMetafactory [2] compiler has to
  rely on profiling info to devirtualize Function::apply().
  The latter requires exact type check on the receiver at
  runtime and that explains the difference you are seeing.
  
  But comparing that with nonStaticMethodHandle is not fair:
  there's no inlining happening there.
  
  If you want a fair comparison, then you have to measure
  with polluted profile so no inlining happens. In that case
  [3] non-static MethodHandles are on par (or even slightly
  faster):
  
  LMF._4_lmf_fs  avgt   10  20.020 ± 0.635  ns/op
  LMF._4_lmf_mhs avgt   10  18.360 ± 0.181  ns/op
  
  (scores for 3 invocations in a row.)
  
  Best regards,
  Vladimir Ivanov
  
  [1] 715  126    b        org.lmf.LMF::_1_staticMethodHandle (11 bytes)
  ...
      @ 37   java.lang.invoke.DirectMethodHandle$Holder::invokeVirtual (14 bytes)   force inline by annotation
        @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   force inline by annotation
        @ 10   org.lmf.LMF$Dog::getName (5 bytes)   accessor
  
  
  
  
  [2] 678  117    b        org.lmf.LMF::_2_lambdaMetafactory (14 bytes)
  @ 8   org.lmf.LMF$$Lambda$37/552160541::apply (8 bytes)   inline (hot)
   \-> TypeProfile (6700/6700 counts) = org/lmf/LMF$$Lambda$37
    @ 4   org.lmf.LMF$Dog::getName (5 bytes)   accessor
  
  
  [3] http://cr.openjdk.java.net/~vlivanov/misc/LMF.java