Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

Saam Barati Wed, 19 Sep 2018 23:10:19 -0700

Interesting! I must not have run this experiment correctly when I did it.

- Saam


> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <[email protected]> wrote:
> 
>> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <[email protected]> wrote:
>> To elaborate: I ran this same experiment before. And I forgot to turn off 
>> the RegExp JIT and got results similar to what you got. Once I turned off 
>> the RegExp JIT, I saw no perf difference.
> 
> Yeah, I disabled JIT and RegExpJIT explicitly by using
> 
> export JSC_useJIT=false
> export JSC_useRegExpJIT=false
> 
> and I verified with dumpDisassembly that no JIT code is generated. I also 
> put `CRASH()` in ExecutableAllocator::singleton() to ensure that no 
> executable memory is allocated.
> The result is the same. I think `useJIT=false` disables the RegExp JIT too.
> 
>                                             baseline                  patched
>
> ai-astar                              3499.046+-14.772     ^    1897.624+-234.517       ^ definitely 1.8439x faster
> audio-beat-detection                  1803.466+-491.965          970.636+-428.051         might be 1.8580x faster
> audio-dft                             1756.985+-68.710     ^     954.312+-528.406       ^ definitely 1.8411x faster
> audio-fft                             1637.969+-458.129          850.083+-449.228         might be 1.9268x faster
> audio-oscillator                      1866.006+-569.581    ^     967.194+-82.521        ^ definitely 1.9293x faster
> imaging-darkroom                      2156.526+-591.042    ^    1231.318+-187.297       ^ definitely 1.7514x faster
> imaging-desaturate                    3059.335+-284.740    ^    1754.128+-339.941       ^ definitely 1.7441x faster
> imaging-gaussian-blur                16034.828+-1930.938   ^    7389.919+-2228.020      ^ definitely 2.1698x faster
> json-parse-financial                    60.273+-4.143             53.935+-28.957          might be 1.1175x faster
> json-stringify-tinderbox                39.497+-3.915             38.146+-9.652           might be 1.0354x faster
> stanford-crypto-aes                    873.623+-208.225    ^     486.350+-132.379       ^ definitely 1.7963x faster
> stanford-crypto-ccm                    538.707+-33.979     ^     285.944+-41.570        ^ definitely 1.8840x faster
> stanford-crypto-pbkdf2                1929.960+-649.861    ^    1044.320+-1.182         ^ definitely 1.8481x faster
> stanford-crypto-sha256-iterative       614.344+-200.228          342.574+-123.524         might be 1.7933x faster
>
> <arithmetic>                          2562.183+-207.456    ^    1304.749+-312.963       ^ definitely 1.9637x faster
> 
> I think this result is unrelated to the RegExp JIT, since ai-astar does not 
> use RegExp.
> 
> Best regards,
> Yusuke Suzuki
>  
>> 
>> - Saam
>> 
>>> On Sep 19, 2018, at 8:53 AM, Saam Barati <[email protected]> wrote:
>>> 
>>> Did you turn off the RegExp JIT?
>>> 
>>> - Saam
>>> 
>>>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <[email protected]> wrote:
>>>> 
>>>> Hi WebKittens!
>>>> 
>>>> node-jsc was recently announced [1]. When I read that project's documents,
>>>> I found that it uses the LLInt ASM interpreter instead of CLoop in non-JIT 
>>>> environments.
>>>> So one question came to mind: how fast is the LLInt ASM interpreter 
>>>> compared to CLoop?
>>>> 
>>>> I set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other 
>>>> is a JIT-enabled JSC build run with `JSC_useJIT=false`.
>>>> I ran the Kraken benchmarks with these two builds on an x64 Linux machine. 
>>>> The results are as follows.
>>>> 
>>>> Benchmark report for Kraken on sakura-trick.
>>>> 
>>>> VMs tested:
>>>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>>>> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>>>> 
>>>> Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
>>>> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
>>>> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
>>>> milliseconds.
>>>> 
>>>>                                             baseline                  patched
>>>>
>>>> ai-astar                              3619.974+-57.095     ^    2014.835+-59.016        ^ definitely 1.7967x faster
>>>> audio-beat-detection                  1762.085+-24.853     ^    1030.902+-19.743        ^ definitely 1.7093x faster
>>>> audio-dft                             1822.426+-28.704     ^     909.262+-16.640        ^ definitely 2.0043x faster
>>>> audio-fft                             1651.070+-9.994      ^     865.203+-7.912         ^ definitely 1.9083x faster
>>>> audio-oscillator                      1853.697+-26.539     ^     992.406+-12.811        ^ definitely 1.8679x faster
>>>> imaging-darkroom                      2118.737+-23.219     ^    1303.729+-8.071         ^ definitely 1.6251x faster
>>>> imaging-desaturate                    3133.654+-28.545     ^    1759.738+-18.182        ^ definitely 1.7808x faster
>>>> imaging-gaussian-blur                16321.090+-154.893    ^    7228.017+-58.508        ^ definitely 2.2580x faster
>>>> json-parse-financial                    57.256+-2.876             56.101+-4.265           might be 1.0206x faster
>>>> json-stringify-tinderbox                38.470+-2.788      ?      38.771+-0.935         ?
>>>> stanford-crypto-aes                    851.341+-7.738      ^     485.438+-13.904        ^ definitely 1.7538x faster
>>>> stanford-crypto-ccm                    556.133+-6.606      ^     264.161+-3.970         ^ definitely 2.1053x faster
>>>> stanford-crypto-pbkdf2                1945.718+-15.968     ^    1075.013+-13.337        ^ definitely 1.8099x faster
>>>> stanford-crypto-sha256-iterative       623.203+-7.604      ^     349.782+-12.810        ^ definitely 1.7817x faster
>>>>
>>>> <arithmetic>                          2596.775+-14.857     ^    1312.383+-8.840         ^ definitely 1.9787x faster
>>>> 
>>>> Surprisingly, the LLInt ASM interpreter is significantly faster than CLoop. 
>>>> I expected it to be faster, but only by around 10%.
>>>> In reality it is about 2x faster. That is a large enough difference that I 
>>>> think we should consider enabling the LLInt ASM interpreter for non-JIT 
>>>> build configurations.
>>>> As a bonus, the LLInt ASM interpreter offers sampling profiler support even 
>>>> in non-JIT environments.
>>>> 
>>>> So my proposal is: how about enabling the LLInt ASM interpreter in non-JIT 
>>>> configurations on major architectures (x64 and ARM64)?
>>>> 
>>>> Best regards,
>>>> Yusuke Suzuki
>>>> 
>>>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html
>>>> _______________________________________________
>>>> webkit-dev mailing list
>>>> [email protected]
>>>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>> _______________________________________________
>>> jsc-dev mailing list
>>> [email protected]
>>> https://lists.webkit.org/mailman/listinfo/jsc-dev
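
For anyone re-checking the tables quoted above, here is a minimal Python sketch of how the summary figures appear to be derived. The function names are made up for illustration. It rests on two assumptions that match the rows in this thread but are not confirmed by it: the "Nx faster" figure is the baseline mean divided by the patched mean, and a row is marked "definitely" when the two 95% confidence intervals do not overlap. The `<arithmetic>` row looks like the plain arithmetic mean of the per-benchmark means.

```python
from statistics import mean


def speedup(baseline_ms: float, patched_ms: float) -> float:
    """Ratio of mean run times; values above 1 mean the patched VM is faster."""
    return baseline_ms / patched_ms


def intervals_disjoint(a_mean: float, a_ci: float, b_mean: float, b_ci: float) -> bool:
    """True when [mean - ci, mean + ci] of the two VMs do not overlap.

    Assumption: this is the criterion behind "definitely" vs "might be".
    """
    return a_mean + a_ci < b_mean - b_ci or b_mean + b_ci < a_mean - a_ci


# ((baseline mean, CI), (patched mean, CI)) copied from the JIT-disabled rerun.
ai_astar = ((3499.046, 14.772), (1897.624, 234.517))
beat_detection = ((1803.466, 491.965), (970.636, 428.051))

# Baseline (CLoop) means of the 14 Kraken tests from the original run.
cloop_means = [
    3619.974, 1762.085, 1822.426, 1651.070, 1853.697, 2118.737, 3133.654,
    16321.090, 57.256, 38.470, 851.341, 556.133, 1945.718, 623.203,
]

if __name__ == "__main__":
    for name, ((b, b_ci), (p, p_ci)) in [("ai-astar", ai_astar),
                                         ("audio-beat-detection", beat_detection)]:
        print(f"{name}: {speedup(b, p):.4f}x, "
              f"intervals disjoint: {intervals_disjoint(b, b_ci, p, p_ci)}")
    print(f"<arithmetic> baseline: {mean(cloop_means):.3f}")
```

With the numbers above, ai-astar comes out at 1.8439x with disjoint intervals, audio-beat-detection at 1.8580x with overlapping intervals, and the baseline mean at 2596.775, all matching the quoted report.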

