Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

Filip Pizlo Thu, 20 Sep 2018 08:58:42 -0700

I think that we should move to removing JSVALUE32_64, since it doesn’t get 
significant testing or maintenance anymore. I’d love it if 32-bit targets used 
the cloop with JSVALUE64, so that we can rip out the 32-bit jit and offlineasm 
backends, and remove the 32-bit representation code from the runtime.


I’m fine with using asm llint on 64-bit platforms, but using it on 32-bit 
platforms seems like it’ll be short-lived. 

-Filip

> On Sep 20, 2018, at 12:00 AM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
> 
> I've just set up a MacBook Pro to measure the effect on macOS.
> 
> The results are as follows.
> 
> VMs tested:
> "baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
> "patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc
> 
> Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark. Emitted a call to gc() between sample
> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
> milliseconds.
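The report states only that it uses 95% confidence intervals; it does not say how they are computed. As a hedged illustration, assuming a textbook Student-t interval over the per-sample times (the benchmark runner's actual computation may differ), the sketch below shows why 2 samples per VM, as in this macOS run, produce much wider intervals than the 10 samples used on Linux: the two-sided 95% critical value is 12.706 for one degree of freedom but 2.262 for nine. The timings fed to it are made-up numbers, not data from this thread.

```python
import math
import statistics

# Two-sided 95% Student-t critical values keyed by degrees of freedom (n - 1).
# Only the two sample counts used in this thread (2 and 10) are included.
T_CRIT_95 = {1: 12.706, 9: 2.262}


def confidence_interval_95(samples):
    """Return (mean, half_width) of a 95% Student-t interval for the samples.

    Assumption: a textbook interval, not necessarily what the benchmark
    runner computes internally.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * std_err


# Hypothetical timings in ms: the same spread yields a far wider interval
# with 2 samples than with 10.
two = [1000.0, 1010.0]
ten = [1000.0, 1010.0] * 5
print(confidence_interval_95(two))
print(confidence_interval_95(ten))
```

With identical spread, the 2-sample half-width comes out more than ten times larger, which is consistent with the very wide `+-` figures in some macOS rows.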
> 
>                                            baseline                  patched
> 
> ai-astar                              1738.056+-49.666     ^    1568.904+-44.535        ^ definitely 1.1078x faster
> audio-beat-detection                  1127.677+-15.749     ^     972.323+-23.908        ^ definitely 1.1598x faster
> audio-dft                              942.952+-107.209           919.933+-310.247         might be 1.0250x faster
> audio-fft                              985.489+-47.414     ^     796.955+-25.476        ^ definitely 1.2366x faster
> audio-oscillator                       967.891+-34.854     ^     801.778+-18.226        ^ definitely 1.2072x faster
> imaging-darkroom                      1265.340+-114.464    ^    1099.233+-2.372         ^ definitely 1.1511x faster
> imaging-desaturate                    1737.826+-40.791     ?    1749.010+-167.969       ?
> imaging-gaussian-blur                 7846.369+-52.165     ^    6392.379+-1025.168      ^ definitely 1.2275x faster
> json-parse-financial                    33.141+-0.473              33.054+-1.058
> json-stringify-tinderbox                20.803+-0.901              20.664+-0.717
> stanford-crypto-aes                    401.589+-39.750            376.622+-12.111          might be 1.0663x faster
> stanford-crypto-ccm                    245.629+-45.322            228.013+-8.976           might be 1.0773x faster
> stanford-crypto-pbkdf2                 941.178+-28.744            864.462+-60.083          might be 1.0887x faster
> stanford-crypto-sha256-iterative       299.988+-47.729            270.849+-32.356          might be 1.1076x faster
> 
> <arithmetic>                          1325.281+-2.613      ^    1149.584+-75.875        ^ definitely 1.1528x faster
> 
> Interestingly, the improvement is much smaller here: on the Linux box it was about 2x, but on macOS
> it is about 15% (1.1528x on the arithmetic mean).
> Still, I think a 15% boost without any drawbacks would be very nice.
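The "Nx faster" figures in these reports match the ratio of the mean baseline time to the mean patched time; for example, the macOS `<arithmetic>` row gives 1325.281 / 1149.584 ≈ 1.1528. The minimal sketch below is a reading of the report format, not code from the benchmark script. It recomputes both summary speedups and also expresses them as the fraction of run time saved, which is a second way to read "15%" versus "2x".

```python
def speedup(baseline_ms: float, patched_ms: float) -> float:
    """How many times faster the patched VM is (baseline time / patched time)."""
    return baseline_ms / patched_ms


def time_saved(baseline_ms: float, patched_ms: float) -> float:
    """Fraction of the baseline run time eliminated by the patched VM."""
    return 1.0 - patched_ms / baseline_ms


# <arithmetic> rows from the reports in this thread (mean times in ms).
summaries = {
    "macOS": (1325.281, 1149.584),
    "Linux x64": (2596.775, 1312.383),
}

for name, (base, patched) in summaries.items():
    print(f"{name}: {speedup(base, patched):.4f}x faster, "
          f"{time_saved(base, patched):.1%} less time")
```

A speedup of about 1.15x corresponds to roughly 13% less run time, while about 1.98x cuts the run time roughly in half.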
> 
>> On Thu, Sep 20, 2018 at 3:08 PM Saam Barati <sbar...@apple.com> wrote:
>> Interesting! I must not have run this experiment correctly when I did it.
>> 
>> - Saam
>> 
>>> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>>> 
>>>> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbar...@apple.com> wrote:
>>>> To elaborate: I ran this same experiment before. And I forgot to turn off 
>>>> the RegExp JIT and got results similar to what you got. Once I turned off 
>>>> the RegExp JIT, I saw no perf difference.
>>> 
>>> Yeah, I disabled JIT and RegExpJIT explicitly by using
>>> 
>>> export JSC_useJIT=false
>>> export JSC_useRegExpJIT=false
>>> 
>>> and I checked that no JIT code is generated by running with dumpDisassembly. I
>>> also put `CRASH()` in ExecutableAllocator::singleton() to ensure that no
>>> executable memory is allocated.
>>> The result is the same. I think `useJIT=false` disables the RegExp JIT too.
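For anyone repeating this measurement, the two variables above can be set per run instead of being exported in the shell. The sketch below assumes only what the thread shows, namely that JSC reads options from `JSC_`-prefixed environment variables; the `jsc` path and script name passed to `run_without_jit` are placeholders, not paths from this thread. Both variables are set even though `useJIT=false` may already disable the RegExp JIT, so the result does not depend on that.

```python
import os
import subprocess


def no_jit_env(base=None):
    """Return a copy of an environment with the JIT and RegExp JIT disabled."""
    env = dict(os.environ if base is None else base)  # never mutate the input
    env["JSC_useJIT"] = "false"
    env["JSC_useRegExpJIT"] = "false"
    return env


def run_without_jit(jsc_path, script_path):
    """Run one script under jsc with both JITs disabled and return its stdout.

    jsc_path and script_path are placeholders supplied by the caller.
    """
    result = subprocess.run(
        [jsc_path, script_path],
        env=no_jit_env(),
        capture_output=True,
        text=True,
        check=True,  # raise if jsc exits with a non-zero status
    )
    return result.stdout
```

A call would look like `run_without_jit("/path/to/jsc", "bench.js")`, with both arguments being hypothetical paths.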
>>> 
>>>                                            baseline                  patched
>>> 
>>> ai-astar                              3499.046+-14.772     ^    1897.624+-234.517       ^ definitely 1.8439x faster
>>> audio-beat-detection                  1803.466+-491.965          970.636+-428.051         might be 1.8580x faster
>>> audio-dft                             1756.985+-68.710     ^     954.312+-528.406       ^ definitely 1.8411x faster
>>> audio-fft                             1637.969+-458.129          850.083+-449.228         might be 1.9268x faster
>>> audio-oscillator                      1866.006+-569.581    ^     967.194+-82.521        ^ definitely 1.9293x faster
>>> imaging-darkroom                      2156.526+-591.042    ^    1231.318+-187.297       ^ definitely 1.7514x faster
>>> imaging-desaturate                    3059.335+-284.740    ^    1754.128+-339.941       ^ definitely 1.7441x faster
>>> imaging-gaussian-blur                16034.828+-1930.938   ^    7389.919+-2228.020      ^ definitely 2.1698x faster
>>> json-parse-financial                    60.273+-4.143              53.935+-28.957          might be 1.1175x faster
>>> json-stringify-tinderbox                39.497+-3.915              38.146+-9.652           might be 1.0354x faster
>>> stanford-crypto-aes                    873.623+-208.225    ^     486.350+-132.379       ^ definitely 1.7963x faster
>>> stanford-crypto-ccm                    538.707+-33.979     ^     285.944+-41.570        ^ definitely 1.8840x faster
>>> stanford-crypto-pbkdf2                1929.960+-649.861    ^    1044.320+-1.182         ^ definitely 1.8481x faster
>>> stanford-crypto-sha256-iterative       614.344+-200.228           342.574+-123.524         might be 1.7933x faster
>>> 
>>> <arithmetic>                          2562.183+-207.456    ^    1304.749+-312.963       ^ definitely 1.9637x faster
>>> 
>>> I think this result is not related to the RegExp JIT, since ai-astar does not
>>> use RegExp.
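Several rows in the run above show large ratios but are labeled only "might be" faster, for example audio-beat-detection at 1.8580x. One plausible reading, offered as a hypothesis that happens to agree with every labeled row in this thread rather than as the benchmark script's documented rule, is that "definitely" means the two 95% intervals do not overlap. The sketch below checks that condition for report cells written as `mean+-half_width`; the helper names are illustrative.

```python
def parse(cell):
    """Split a report cell such as '1803.466+-491.965' into (mean, half_width)."""
    mean, half = cell.split("+-")
    return float(mean), float(half)


def intervals_overlap(a, b):
    """True if the two [mean - half, mean + half] intervals share any point."""
    ma, ha = parse(a)
    mb, hb = parse(b)
    return ma - ha <= mb + hb and mb - hb <= ma + ha


# audio-beat-detection: wide intervals overlap despite a ~1.86x ratio.
print(intervals_overlap("1803.466+-491.965", "970.636+-428.051"))  # True
# ai-astar: tight intervals do not overlap.
print(intervals_overlap("3499.046+-14.772", "1897.624+-234.517"))  # False
```

Under this reading, the noisy 2-invocation runs are why a near-2x gap can still be reported as only "might be" faster.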
>>> 
>>> Best regards,
>>> Yusuke Suzuki
>>>  
>>>> 
>>>> - Saam
>>>> 
>>>>> On Sep 19, 2018, at 8:53 AM, Saam Barati <sbar...@apple.com> wrote:
>>>>> 
>>>>> Did you turn off the RegExp JIT?
>>>>> 
>>>>> - Saam
>>>>> 
>>>>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>>>>>> 
>>>>>> Hi WebKittens!
>>>>>> 
>>>>>> Recently, node-jsc was announced[1]. When I read that project's documentation,
>>>>>> I found that it uses the LLInt ASM interpreter instead of CLoop in non-JIT
>>>>>> environments.
>>>>>> So one question came to mind: how fast is the LLInt ASM interpreter
>>>>>> compared to CLoop?
>>>>>> 
>>>>>> I've set up two builds: a CLoop build (-DENABLE_JIT=OFF) and a JIT-enabled
>>>>>> JSC build run with `JSC_useJIT=false`.
>>>>>> I ran the Kraken benchmarks with both builds on an x64 Linux
>>>>>> machine. The results are as follows.
>>>>>> 
>>>>>> Benchmark report for Kraken on sakura-trick.
>>>>>> 
>>>>>> VMs tested:
>>>>>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>>>>>> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>>>>>> 
>>>>>> Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
>>>>>> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
>>>>>> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
>>>>>> milliseconds.
>>>>>> 
>>>>>>                                            baseline                  patched
>>>>>> 
>>>>>> ai-astar                              3619.974+-57.095     ^    2014.835+-59.016        ^ definitely 1.7967x faster
>>>>>> audio-beat-detection                  1762.085+-24.853     ^    1030.902+-19.743        ^ definitely 1.7093x faster
>>>>>> audio-dft                             1822.426+-28.704     ^     909.262+-16.640        ^ definitely 2.0043x faster
>>>>>> audio-fft                             1651.070+-9.994      ^     865.203+-7.912         ^ definitely 1.9083x faster
>>>>>> audio-oscillator                      1853.697+-26.539     ^     992.406+-12.811        ^ definitely 1.8679x faster
>>>>>> imaging-darkroom                      2118.737+-23.219     ^    1303.729+-8.071         ^ definitely 1.6251x faster
>>>>>> imaging-desaturate                    3133.654+-28.545     ^    1759.738+-18.182        ^ definitely 1.7808x faster
>>>>>> imaging-gaussian-blur                16321.090+-154.893    ^    7228.017+-58.508        ^ definitely 2.2580x faster
>>>>>> json-parse-financial                    57.256+-2.876              56.101+-4.265           might be 1.0206x faster
>>>>>> json-stringify-tinderbox                38.470+-2.788      ?       38.771+-0.935         ?
>>>>>> stanford-crypto-aes                    851.341+-7.738      ^     485.438+-13.904        ^ definitely 1.7538x faster
>>>>>> stanford-crypto-ccm                    556.133+-6.606      ^     264.161+-3.970         ^ definitely 2.1053x faster
>>>>>> stanford-crypto-pbkdf2                1945.718+-15.968     ^    1075.013+-13.337        ^ definitely 1.8099x faster
>>>>>> stanford-crypto-sha256-iterative       623.203+-7.604      ^     349.782+-12.810        ^ definitely 1.7817x faster
>>>>>> 
>>>>>> <arithmetic>                          2596.775+-14.857     ^    1312.383+-8.840         ^ definitely 1.9787x faster
>>>>>> 
>>>>>> Surprisingly, the LLInt ASM interpreter is significantly faster than CLoop.
>>>>>> I expected it to be faster, but only by around 10%.
>>>>>> In reality it is about 2x faster. That is a large enough gain that I think we
>>>>>> should consider enabling the LLInt ASM interpreter for the non-JIT build configuration.
>>>>>> As a bonus, the LLInt ASM interpreter offers sampling profiler support even
>>>>>> in a non-JIT environment.
>>>>>> 
>>>>>> So my proposal is: how about enabling the LLInt ASM interpreter in the non-JIT
>>>>>> configuration on major architectures (x64 and ARM64)?
>>>>>> 
>>>>>> Best regards,
>>>>>> Yusuke Suzuki
>>>>>> 
>>>>>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html
>>>>>> _______________________________________________
>>>>>> webkit-dev mailing list
>>>>>> webkit-dev@lists.webkit.org
>>>>>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>>>> _______________________________________________
>>>>> jsc-dev mailing list
>>>>> jsc-...@lists.webkit.org
>>>>> https://lists.webkit.org/mailman/listinfo/jsc-dev