I've just set up a MacBook Pro to measure the effect on macOS. The results are as follows.
VMs tested: "baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc "patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark. Emitted a call to gc() between sample measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime() function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in milliseconds. baseline patched ai-astar 1738.056+-49.666 ^ 1568.904+-44.535 ^ definitely 1.1078x faster audio-beat-detection 1127.677+-15.749 ^ 972.323+-23.908 ^ definitely 1.1598x faster audio-dft 942.952+-107.209 919.933+-310.247 might be 1.0250x faster audio-fft 985.489+-47.414 ^ 796.955+-25.476 ^ definitely 1.2366x faster audio-oscillator 967.891+-34.854 ^ 801.778+-18.226 ^ definitely 1.2072x faster imaging-darkroom 1265.340+-114.464 ^ 1099.233+-2.372 ^ definitely 1.1511x faster imaging-desaturate 1737.826+-40.791 ? 1749.010+-167.969 ? imaging-gaussian-blur 7846.369+-52.165 ^ 6392.379+-1025.168 ^ definitely 1.2275x faster json-parse-financial 33.141+-0.473 33.054+-1.058 json-stringify-tinderbox 20.803+-0.901 20.664+-0.717 stanford-crypto-aes 401.589+-39.750 376.622+-12.111 might be 1.0663x faster stanford-crypto-ccm 245.629+-45.322 228.013+-8.976 might be 1.0773x faster stanford-crypto-pbkdf2 941.178+-28.744 864.462+-60.083 might be 1.0887x faster stanford-crypto-sha256-iterative 299.988+-47.729 270.849+-32.356 might be 1.1076x faster <arithmetic> 1325.281+-2.613 ^ 1149.584+-75.875 ^ definitely 1.1528x faster Interestingly, the improvement is not so large. In Linux box, it was 2x. But in macOS, it is 15%. But I think it is very nice if we can get 15% boost without any drawbacks. On Thu, Sep 20, 2018 at 3:08 PM Saam Barati <sbar...@apple.com> wrote: > Interesting! I must have not run this experiment correctly when I did it. > > - Saam > > On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> > wrote: > > On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbar...@apple.com> wrote: > >> To elaborate: I ran this same experiment before. And I forgot to turn off >> the RegExp JIT and got results similar to what you got. Once I turned off >> the RegExp JIT, I saw no perf difference. >> > > Yeah, I disabled JIT and RegExpJIT explicitly by using > > export JSC_useJIT=false > export JSC_useRegExpJIT=false > > and I checked no JIT code is generated by running dumpDisassembly. And I > also put `CRASH()` in ExecutableAllocator::singleton() to ensure no > executable memory is allocated. > The result is the same. I think `useJIT=false` disables RegExp JIT too. 
>
>                                      baseline                  patched
>
> ai-astar                             3499.046+-14.772    ^     1897.624+-234.517   ^   definitely 1.8439x faster
> audio-beat-detection                 1803.466+-491.965          970.636+-428.051       might be 1.8580x faster
> audio-dft                            1756.985+-68.710    ^      954.312+-528.406   ^   definitely 1.8411x faster
> audio-fft                            1637.969+-458.129          850.083+-449.228       might be 1.9268x faster
> audio-oscillator                     1866.006+-569.581   ^      967.194+-82.521    ^   definitely 1.9293x faster
> imaging-darkroom                     2156.526+-591.042   ^     1231.318+-187.297   ^   definitely 1.7514x faster
> imaging-desaturate                   3059.335+-284.740   ^     1754.128+-339.941   ^   definitely 1.7441x faster
> imaging-gaussian-blur               16034.828+-1930.938  ^     7389.919+-2228.020  ^   definitely 2.1698x faster
> json-parse-financial                   60.273+-4.143             53.935+-28.957         might be 1.1175x faster
> json-stringify-tinderbox               39.497+-3.915             38.146+-9.652          might be 1.0354x faster
> stanford-crypto-aes                   873.623+-208.225   ^      486.350+-132.379   ^   definitely 1.7963x faster
> stanford-crypto-ccm                   538.707+-33.979    ^      285.944+-41.570    ^   definitely 1.8840x faster
> stanford-crypto-pbkdf2               1929.960+-649.861   ^     1044.320+-1.182     ^   definitely 1.8481x faster
> stanford-crypto-sha256-iterative      614.344+-200.228          342.574+-123.524        might be 1.7933x faster
>
> <arithmetic>                         2562.183+-207.456   ^     1304.749+-312.963   ^   definitely 1.9637x faster
>
> I think this result is not related to the RegExp JIT, since ai-astar does not use RegExp.
>
> Best regards,
> Yusuke Suzuki
>
>> - Saam
>>
>> On Sep 19, 2018, at 8:53 AM, Saam Barati <sbar...@apple.com> wrote:
>>
>> Did you turn off the RegExp JIT?
>>
>> - Saam
>>
>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <yusukesuz...@slowstart.org> wrote:
>>
>> Hi WebKittens!
>>
>> Recently, node-jsc was announced [1]. When I read the documents of that project,
>> I found that they use the LLInt ASM interpreter instead of CLoop in a non-JIT
>> environment. So I had one question in my mind: how fast is the LLInt ASM
>> interpreter compared to CLoop?
>>
>> I've set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other is
>> a JIT build of JSC run with `JSC_useJIT=false`.
>> I ran the Kraken benchmarks with these two builds on an x64 Linux machine.
>> The results are as follows.
>>
>> Benchmark report for Kraken on sakura-trick.
>>
>> VMs tested:
>>     "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>>     "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>>
>> Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. Emitted a call to gc() between sample
>> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
>> function to get microsecond-level timing. Reporting benchmark execution times with 95% confidence intervals in
>> milliseconds.
>>
>>                                      baseline                  patched
>>
>> ai-astar                             3619.974+-57.095    ^     2014.835+-59.016    ^   definitely 1.7967x faster
>> audio-beat-detection                 1762.085+-24.853    ^     1030.902+-19.743    ^   definitely 1.7093x faster
>> audio-dft                            1822.426+-28.704    ^      909.262+-16.640    ^   definitely 2.0043x faster
>> audio-fft                            1651.070+-9.994     ^      865.203+-7.912     ^   definitely 1.9083x faster
>> audio-oscillator                     1853.697+-26.539    ^      992.406+-12.811    ^   definitely 1.8679x faster
>> imaging-darkroom                     2118.737+-23.219    ^     1303.729+-8.071     ^   definitely 1.6251x faster
>> imaging-desaturate                   3133.654+-28.545    ^     1759.738+-18.182    ^   definitely 1.7808x faster
>> imaging-gaussian-blur               16321.090+-154.893   ^     7228.017+-58.508    ^   definitely 2.2580x faster
>> json-parse-financial                   57.256+-2.876             56.101+-4.265          might be 1.0206x faster
>> json-stringify-tinderbox               38.470+-2.788     ?       38.771+-0.935     ?
>> stanford-crypto-aes                   851.341+-7.738     ^      485.438+-13.904    ^   definitely 1.7538x faster
>> stanford-crypto-ccm                   556.133+-6.606     ^      264.161+-3.970     ^   definitely 2.1053x faster
>> stanford-crypto-pbkdf2               1945.718+-15.968    ^     1075.013+-13.337    ^   definitely 1.8099x faster
>> stanford-crypto-sha256-iterative      623.203+-7.604     ^      349.782+-12.810    ^   definitely 1.7817x faster
>>
>> <arithmetic>                         2596.775+-14.857    ^     1312.383+-8.840     ^   definitely 1.9787x faster
>>
>> Surprisingly, the LLInt ASM interpreter is significantly faster than CLoop. I expected
>> it to be faster, but only by around 10%; in reality it is 2x faster. That is a big enough
>> number for me to consider enabling the LLInt ASM interpreter for the non-JIT build
>> configuration.
>> As a bonus, the LLInt ASM interpreter offers sampling profiler support even in a
>> non-JIT environment.
>>
>> So my proposal is: how about enabling the LLInt ASM interpreter for the non-JIT
>> configuration on the major architectures (x64 and ARM64)?
>>
>> Best regards,
>> Yusuke Suzuki
>>
>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html
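For completeness, the two configurations compared in this thread can be reproduced roughly as follows. This is only a sketch: it assumes a CMake-based JSCOnly build, and the build directories and test file name are illustrative rather than the exact ones behind the numbers above.

# Sketch only: assumes a CMake-based JSCOnly build; directory names are illustrative.
# "baseline" (CLoop): ENABLE_JIT=OFF builds the portable CLoop interpreter.
cmake -S . -B WebKitBuild/nojit -DPORT=JSCOnly -DCMAKE_BUILD_TYPE=Release -DENABLE_JIT=OFF
cmake --build WebKitBuild/nojit

# "patched" (LLInt ASM interpreter): build with the JIT compiled in, then disable it at run time.
cmake -S . -B WebKitBuild/nojit-llint -DPORT=JSCOnly -DCMAKE_BUILD_TYPE=Release
cmake --build WebKitBuild/nojit-llint

JSC_useJIT=false WebKitBuild/nojit-llint/bin/jsc ai-astar.js   # test file name is illustrative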
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev