I think that we should move to removing JSVALUE32_64, since it doesn’t get significant testing or maintenance anymore. I’d love it if 32-bit targets used the cloop with JSVALUE64, so that we can rip out the 32-bit jit and offlineasm backends, and remove the 32-bit representation code from the runtime.
I’m fine with using asm llint on 64-bit platforms, but using it on 32-bit platforms seems like it’ll be short lived.

-Filip

> On Sep 20, 2018, at 12:00 AM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>
> I've just set up MacBook Pro to measure the effect on macOS.
>
> The results are the followings.
>
> VMs tested:
> "baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
> "patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc
>
> Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark. Emitted a call to gc() between sample
> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
> milliseconds.
>
>                                   baseline                patched
>
> ai-astar                          1738.056+-49.666    ^   1568.904+-44.535    ^ definitely 1.1078x faster
> audio-beat-detection              1127.677+-15.749    ^   972.323+-23.908     ^ definitely 1.1598x faster
> audio-dft                         942.952+-107.209        919.933+-310.247      might be 1.0250x faster
> audio-fft                         985.489+-47.414     ^   796.955+-25.476     ^ definitely 1.2366x faster
> audio-oscillator                  967.891+-34.854     ^   801.778+-18.226     ^ definitely 1.2072x faster
> imaging-darkroom                  1265.340+-114.464   ^   1099.233+-2.372     ^ definitely 1.1511x faster
> imaging-desaturate                1737.826+-40.791    ?   1749.010+-167.969   ?
> imaging-gaussian-blur             7846.369+-52.165    ^   6392.379+-1025.168  ^ definitely 1.2275x faster
> json-parse-financial              33.141+-0.473           33.054+-1.058
> json-stringify-tinderbox          20.803+-0.901           20.664+-0.717
> stanford-crypto-aes               401.589+-39.750         376.622+-12.111       might be 1.0663x faster
> stanford-crypto-ccm               245.629+-45.322         228.013+-8.976        might be 1.0773x faster
> stanford-crypto-pbkdf2            941.178+-28.744         864.462+-60.083       might be 1.0887x faster
> stanford-crypto-sha256-iterative  299.988+-47.729         270.849+-32.356       might be 1.1076x faster
>
> <arithmetic>                      1325.281+-2.613     ^   1149.584+-75.875    ^ definitely 1.1528x faster
>
> Interestingly, the improvement is not so large. In Linux box, it was 2x. But in macOS, it is 15%.
> But I think it is very nice if we can get 15% boost without any drawbacks.
>
>> On Thu, Sep 20, 2018 at 3:08 PM Saam Barati <sbar...@apple.com> wrote:
>> Interesting! I must have not run this experiment correctly when I did it.
>>
>> - Saam
>>
>>> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>>>
>>>> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbar...@apple.com> wrote:
>>>> To elaborate: I ran this same experiment before. And I forgot to turn off
>>>> the RegExp JIT and got results similar to what you got. Once I turned off
>>>> the RegExp JIT, I saw no perf difference.
>>>
>>> Yeah, I disabled JIT and RegExpJIT explicitly by using
>>>
>>> export JSC_useJIT=false
>>> export JSC_useRegExpJIT=false
>>>
>>> and I checked no JIT code is generated by running dumpDisassembly. And I
>>> also put `CRASH()` in ExecutableAllocator::singleton() to ensure no
>>> executable memory is allocated.
>>> The result is the same. I think `useJIT=false` disables RegExp JIT too.
>>>
>>>                                   baseline                patched
>>>
>>> ai-astar                          3499.046+-14.772    ^   1897.624+-234.517   ^ definitely 1.8439x faster
>>> audio-beat-detection              1803.466+-491.965       970.636+-428.051      might be 1.8580x faster
>>> audio-dft                         1756.985+-68.710    ^   954.312+-528.406    ^ definitely 1.8411x faster
>>> audio-fft                         1637.969+-458.129       850.083+-449.228      might be 1.9268x faster
>>> audio-oscillator                  1866.006+-569.581   ^   967.194+-82.521     ^ definitely 1.9293x faster
>>> imaging-darkroom                  2156.526+-591.042   ^   1231.318+-187.297   ^ definitely 1.7514x faster
>>> imaging-desaturate                3059.335+-284.740   ^   1754.128+-339.941   ^ definitely 1.7441x faster
>>> imaging-gaussian-blur             16034.828+-1930.938 ^   7389.919+-2228.020  ^ definitely 2.1698x faster
>>> json-parse-financial              60.273+-4.143           53.935+-28.957        might be 1.1175x faster
>>> json-stringify-tinderbox          39.497+-3.915           38.146+-9.652         might be 1.0354x faster
>>> stanford-crypto-aes               873.623+-208.225    ^   486.350+-132.379    ^ definitely 1.7963x faster
>>> stanford-crypto-ccm               538.707+-33.979     ^   285.944+-41.570     ^ definitely 1.8840x faster
>>> stanford-crypto-pbkdf2            1929.960+-649.861   ^   1044.320+-1.182     ^ definitely 1.8481x faster
>>> stanford-crypto-sha256-iterative  614.344+-200.228        342.574+-123.524      might be 1.7933x faster
>>>
>>> <arithmetic>                      2562.183+-207.456   ^   1304.749+-312.963   ^ definitely 1.9637x faster
>>>
>>> I think this result is not related to RegExp JIT since ai-astar is not
>>> using RegExp.
>>>
>>> Best regards,
>>> Yusuke Suzuki
>>>
>>>>
>>>> - Saam
>>>>
>>>>> On Sep 19, 2018, at 8:53 AM, Saam Barati <sbar...@apple.com> wrote:
>>>>>
>>>>> Did you turn off the RegExp JIT?
>>>>>
>>>>> - Saam
>>>>>
>>>>>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>>>>>>
>>>>>> Hi WebKittens!
>>>>>>
>>>>>> Recently, node-jsc is announced[1]. When I read the documents of that project,
>>>>>> I found that they use LLInt ASM interpreter instead of CLoop in non-JIT environment.
>>>>>> So I had one question in my mind: How fast the LLInt ASM interpreter when comparing to CLoop?
>>>>>>
>>>>>> I've set up two builds. One is CLoop build (-DENABLE_JIT=OFF) and another is JIT build JSC with `JSC_useJIT=false`.
>>>>>> And I've ran kraken benchmarks with these two builds in x64 Linux machine. The results are the followings.
>>>>>>
>>>>>> Benchmark report for Kraken on sakura-trick.
>>>>>>
>>>>>> VMs tested:
>>>>>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>>>>>> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>>>>>>
>>>>>> Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
>>>>>> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
>>>>>> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
>>>>>> milliseconds.
>>>>>>
>>>>>>                                   baseline                patched
>>>>>>
>>>>>> ai-astar                          3619.974+-57.095    ^   2014.835+-59.016    ^ definitely 1.7967x faster
>>>>>> audio-beat-detection              1762.085+-24.853    ^   1030.902+-19.743    ^ definitely 1.7093x faster
>>>>>> audio-dft                         1822.426+-28.704    ^   909.262+-16.640     ^ definitely 2.0043x faster
>>>>>> audio-fft                         1651.070+-9.994     ^   865.203+-7.912      ^ definitely 1.9083x faster
>>>>>> audio-oscillator                  1853.697+-26.539    ^   992.406+-12.811     ^ definitely 1.8679x faster
>>>>>> imaging-darkroom                  2118.737+-23.219    ^   1303.729+-8.071     ^ definitely 1.6251x faster
>>>>>> imaging-desaturate                3133.654+-28.545    ^   1759.738+-18.182    ^ definitely 1.7808x faster
>>>>>> imaging-gaussian-blur             16321.090+-154.893  ^   7228.017+-58.508    ^ definitely 2.2580x faster
>>>>>> json-parse-financial              57.256+-2.876           56.101+-4.265         might be 1.0206x faster
>>>>>> json-stringify-tinderbox          38.470+-2.788       ?   38.771+-0.935       ?
>>>>>> stanford-crypto-aes               851.341+-7.738      ^   485.438+-13.904     ^ definitely 1.7538x faster
>>>>>> stanford-crypto-ccm               556.133+-6.606      ^   264.161+-3.970      ^ definitely 2.1053x faster
>>>>>> stanford-crypto-pbkdf2            1945.718+-15.968    ^   1075.013+-13.337    ^ definitely 1.8099x faster
>>>>>> stanford-crypto-sha256-iterative  623.203+-7.604      ^   349.782+-12.810     ^ definitely 1.7817x faster
>>>>>>
>>>>>> <arithmetic>                      2596.775+-14.857    ^   1312.383+-8.840     ^ definitely 1.9787x faster
>>>>>>
>>>>>> Surprisingly, LLInt ASM interpreter is significantly faster than CLoop.
>>>>>> I expected it would be fast, but it would show around 10% performance win.
>>>>>> But the reality is that it is 2x faster. It is too much number to me to
>>>>>> consider enabling LLInt ASM interpreter for non-JIT build configuration.
>>>>>> As a bonus, LLInt ASM interpreter offers sampling profiler support even
>>>>>> in non-JIT environment.
>>>>>>
>>>>>> So my proposal is, how about enabling LLInt ASM interpreter in non-JIT
>>>>>> configuration environment in major architectures (x64 and ARM64)?
>>>>>>
>>>>>> Best regards,
>>>>>> Yusuke Suzuki
>>>>>>
>>>>>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html
>>>>>> _______________________________________________
>>>>>> webkit-dev mailing list
>>>>>> webkit-dev@lists.webkit.org
>>>>>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>>>> _______________________________________________
>>>>> jsc-dev mailing list
>>>>> jsc-...@lists.webkit.org
>>>>> https://lists.webkit.org/mailman/listinfo/jsc-dev
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev
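
For readers checking the "2x on Linux, 15% on macOS" comparison quoted above, here is a minimal sketch of the arithmetic. It assumes the "N.NNNNx faster" figure is the baseline mean divided by the patched mean, and that "definitely" versus "might be" reflects whether the two 95% confidence intervals overlap. Both assumptions are consistent with the rows in the quoted reports but are not taken from the benchmark tool's source, and the function names are hypothetical.

```python
def speedup(baseline_ms, patched_ms):
    """Ratio of mean execution times: baseline mean over patched mean."""
    return baseline_ms / patched_ms


def intervals_overlap(mean_a, half_a, mean_b, half_b):
    """True if [mean_a - half_a, mean_a + half_a] intersects [mean_b - half_b, mean_b + half_b]."""
    return (mean_a - half_a) <= (mean_b + half_b) and (mean_b - half_b) <= (mean_a + half_a)


# <arithmetic> rows from the quoted reports (means in milliseconds).
linux = speedup(2596.775, 1312.383)   # x64 Linux, 10 samples
macos = speedup(1325.281, 1149.584)   # macOS, 2 samples
print(f"Linux: {linux:.4f}x  macOS: {macos:.4f}x")

# macOS audio-dft (reported as "might be") and imaging-gaussian-blur (reported as "definitely").
print(intervals_overlap(942.952, 107.209, 919.933, 310.247))
print(intervals_overlap(7846.369, 52.165, 6392.379, 1025.168))
```

Under these assumptions the two ratios come out at the reported 1.9787x and 1.1528x, and the interval check agrees with the labels on the two sample rows.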