Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-21 Thread Filip Pizlo


> On Sep 21, 2018, at 9:44 AM, Guillaume Emont  wrote:
> 
> Quoting Yusuke Suzuki (2018-09-21 10:10:59)
>> Yeah, I'm not planning to enable the LLInt ASM interpreter on 32-bit architectures
>> since no buildbot exists for this configuration.
> 
> I'm confused. Do you mean you don't want to enable LLInt instead of
> CLoop, for the case when JIT is disabled on 32-bit architectures?

If you guys want to take responsibility for 32-bit then you can enable whatever 
LLInt config you want on 32-bit.

> FTR, the configuration LLInt (with offlineasm) + JIT + DFG is tested on
> 32-bit testbots for at least MIPS, ARMv7 and x86.
> 
> 
>> And we should make 32-bit architectures use JSVALUE64, so LLInt JSVALUE32_64
>> should be removed in the future.
> 
> See what Filip and Michael were saying. We believe that we need
> JSVALUE32_64, and we are willing to maintain it, as the performance gap
> between LLInt or CLoop and JIT+DFG on 32-bit architectures is
> significant.

I’m saying we should remove JSVALUE32_64. That is my preference. I’m letting it 
stay in tree so long as someone maintains it, but honestly I’d prefer it if it 
wasn’t maintained and if we could let it die. 

I’d like to see the majority of JSC development move to 64-bit. I’d prefer if 
new features or enhancements were 64-bit only, since that means that it will 
take less time to develop and test them. I think that folks doing JSC 
development should be encouraged to land changes only for 64-bit since that’s 
our focus as a project.

-Filip

> 
> Guillaume
> 
>> 
>> On Fri, Sep 21, 2018 at 2:33 AM Michael Catanzaro 
>> wrote:
>> 
>>>On Thu, Sep 20, 2018 at 12:02 PM, Filip Pizlo  wrote:
>>> - Enable cloop/JSVALUE64 to work on 32-bit.  I don’t think it does
>>> right now, but that’s probably trivial to fix.
>>> - Switch Darwin ports to that configuration for 32-bit.
>>> - When changes land to support new features, make it mandatory to
>>> support JSVALUE64 and optional to support JSVALUE32_64.  Such changes
>>> should include whoever volunteers to maintain JSVALUE32_64 in CC.
>>> 
>>> If you guys consider JSVALUE32_64 to be critical, then you can go
>>> ahead and maintain it.  We’ll let JSVALUE32_64 stay in the tree so
>>> long as someone is maintaining it.
>> 
>>Yes that's fine with us. I think that's the previous agreement, anyway.
>>:)
>> 
>>Michael
>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> Yusuke Suzuki


Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-21 Thread Guillaume Emont
Quoting Yusuke Suzuki (2018-09-21 10:10:59)
> Yeah, I'm not planning to enable the LLInt ASM interpreter on 32-bit architectures
> since no buildbot exists for this configuration.

I'm confused. Do you mean you don't want to enable LLInt instead of
CLoop, for the case when JIT is disabled on 32-bit architectures?
FTR, the configuration LLInt (with offlineasm) + JIT + DFG is tested on
32-bit testbots for at least MIPS, ARMv7 and x86.


> And we should make 32-bit architectures use JSVALUE64, so LLInt JSVALUE32_64 should
> be removed in the future.

See what Filip and Michael were saying. We believe that we need
JSVALUE32_64, and we are willing to maintain it, as the performance gap
between LLInt or CLoop and JIT+DFG on 32-bit architectures is
significant.

Guillaume

> 
> On Fri, Sep 21, 2018 at 2:33 AM Michael Catanzaro 
> wrote:
> 
> On Thu, Sep 20, 2018 at 12:02 PM, Filip Pizlo  wrote:
> > - Enable cloop/JSVALUE64 to work on 32-bit.  I don’t think it does
> > right now, but that’s probably trivial to fix.
> > - Switch Darwin ports to that configuration for 32-bit.
> > - When changes land to support new features, make it mandatory to
> > support JSVALUE64 and optional to support JSVALUE32_64.  Such changes
> > should include whoever volunteers to maintain JSVALUE32_64 in CC.
> >
> > If you guys consider JSVALUE32_64 to be critical, then you can go
> > ahead and maintain it.  We’ll let JSVALUE32_64 stay in the tree so
> > long as someone is maintaining it.
> 
> Yes that's fine with us. I think that's the previous agreement, anyway.
> :)
> 
> Michael
> 
> 
> 
> 
> --
> Best regards,
> Yusuke Suzuki


Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-21 Thread Yusuke Suzuki
Yeah, I'm not planning to enable the LLInt ASM interpreter on 32-bit
architectures since no buildbot exists for this configuration.
And we should make 32-bit architectures use JSVALUE64, so LLInt JSVALUE32_64
should be removed in the future.

On Fri, Sep 21, 2018 at 2:33 AM Michael Catanzaro 
wrote:

> On Thu, Sep 20, 2018 at 12:02 PM, Filip Pizlo  wrote:
> > - Enable cloop/JSVALUE64 to work on 32-bit.  I don’t think it does
> > right now, but that’s probably trivial to fix.
> > - Switch Darwin ports to that configuration for 32-bit.
> > - When changes land to support new features, make it mandatory to
> > support JSVALUE64 and optional to support JSVALUE32_64.  Such changes
> > should include whoever volunteers to maintain JSVALUE32_64 in CC.
> >
> > If you guys consider JSVALUE32_64 to be critical, then you can go
> > ahead and maintain it.  We’ll let JSVALUE32_64 stay in the tree so
> > long as someone is maintaining it.
>
> Yes that's fine with us. I think that's the previous agreement, anyway.
> :)
>
> Michael
>
>

-- 
Best regards,
Yusuke Suzuki


Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Michael Catanzaro

On Thu, Sep 20, 2018 at 12:02 PM, Filip Pizlo wrote:
> - Enable cloop/JSVALUE64 to work on 32-bit.  I don’t think it does
> right now, but that’s probably trivial to fix.
> - Switch Darwin ports to that configuration for 32-bit.
> - When changes land to support new features, make it mandatory to
> support JSVALUE64 and optional to support JSVALUE32_64.  Such changes
> should include whoever volunteers to maintain JSVALUE32_64 in CC.
>
> If you guys consider JSVALUE32_64 to be critical, then you can go
> ahead and maintain it.  We’ll let JSVALUE32_64 stay in the tree so
> long as someone is maintaining it.

Yes that's fine with us. I think that's the previous agreement, anyway.
:)


Michael



Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Filip Pizlo
Most JSC development focuses on JSVALUE64.  JSVALUE32_64 is currently years 
behind JSVALUE64 - it has no concurrent JIT, no concurrent GC, no FTL.  We 
regularly do tuning that ends up affecting both JSVALUE32_64 and JSVALUE64 
without even testing its impact on JSVALUE32_64.  JSVALUE32_64 is a 
second-class citizen in JSC.
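
For readers who don't live in this code, a minimal standalone sketch of the shape of the two value representations being discussed may help. It is illustrative only, with made-up tag constants, and is not JSC's actual JSValue layout:

    // Illustrative only: made-up tags, not JSC's actual JSValue bit layout.
    #include <cassert>
    #include <cstdint>

    // "JSVALUE64"-style: every JS value fits in a single 64-bit word
    // (pointers, tagged int32s, and NaN-boxed doubles in the real thing).
    struct Value64 {
        static constexpr uint64_t int32Tag = 0xffff000000000000ull; // made-up tag
        uint64_t bits;
        static Value64 fromInt32(int32_t i) { return { int32Tag | static_cast<uint32_t>(i) }; }
        bool isInt32() const { return (bits & int32Tag) == int32Tag; }
        int32_t asInt32() const { assert(isInt32()); return static_cast<int32_t>(bits); }
    };

    // "JSVALUE32_64"-style: a 32-bit tag word plus a 32-bit payload word, so a
    // payload fits in a 32-bit register, but every fast path has to carry two
    // words around and check tags separately.
    struct Value32_64 {
        static constexpr uint32_t int32Tag = 0xffffffffu; // made-up tag
        uint32_t tag;
        uint32_t payload;
        static Value32_64 fromInt32(int32_t i) { return { int32Tag, static_cast<uint32_t>(i) }; }
        bool isInt32() const { return tag == int32Tag; }
        int32_t asInt32() const { assert(isInt32()); return static_cast<int32_t>(payload); }
    };

    int main()
    {
        Value64 a = Value64::fromInt32(-7);
        Value32_64 b = Value32_64::fromInt32(-7);
        assert(a.asInt32() == -7 && b.asInt32() == -7);
        return 0;
    }

The cost being weighed in this thread is that the two-word form needs its own code in the runtime, the LLInt offlineasm backends, and the 32-bit JITs.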

I propose this:

- Enable cloop/JSVALUE64 to work on 32-bit.  I don’t think it does right now, 
but that’s probably trivial to fix.
- Switch Darwin ports to that configuration for 32-bit.
- When changes land to support new features, make it mandatory to support 
JSVALUE64 and optional to support JSVALUE32_64.  Such changes should include 
whoever volunteers to maintain JSVALUE32_64 in CC.

If you guys consider JSVALUE32_64 to be critical, then you can go ahead and 
maintain it.  We’ll let JSVALUE32_64 stay in the tree so long as someone is 
maintaining it.

-Filip


> On Sep 20, 2018, at 9:07 AM, Michael Catanzaro  wrote:
> 
> 
> I believe Guillaume has previously established that that results in a substantial
> performance regression for WPE. It is currently running in production on tens
> of millions of consumer set-top boxes. I think that's substantial testing. :)
> 
> Michael
> 



Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Michael Catanzaro
On Thu, Sep 20, 2018 at 11:49 AM, Geoffrey Garen wrote:
> Is there something Linux-specific about CLoop or LLInt, or is this a
> compiler difference?


No clue. I'll refer you to the results of Guillaume's investigation:

https://lists.webkit.org/pipermail/webkit-dev/2018-February/029877.html

Michael



Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Geoffrey Garen
> Interestingly, the improvement is not as large: on the Linux box it was 2x, but
> on macOS it is 15%.

Is there something Linux-specific about CLoop or LLInt, or is this a compiler 
difference?

Thanks,
Geoff


Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Michael Catanzaro



I believe Guillaume has previously established that that results in a
substantial performance regression for WPE. It is currently running in
production on tens of millions of consumer set-top boxes. I think
that's substantial testing. :)


Michael



Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Filip Pizlo
I think that we should move to removing JSVALUE32_64, since it doesn’t get
significant testing or maintenance anymore. I’d love it if 32-bit targets used
the CLoop with JSVALUE64, so that we can rip out the 32-bit JIT and offlineasm
backends, and remove the 32-bit representation code from the runtime.

I’m fine with using the ASM LLInt on 64-bit platforms, but using it on 32-bit
platforms seems like it’ll be short-lived.

-Filip

> On Sep 20, 2018, at 12:00 AM, Yusuke Suzuki  
> wrote:
> 
> I've just set up a MacBook Pro to measure the effect on macOS.
> 
> The results are as follows.
> 
> VMs tested:
> "baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
> "patched" at 
> /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc
> 
> Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark. 
> Emitted a call to gc() between sample
> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used 
> the jsc-specific preciseTime()
> function to get microsecond-level timing. Reporting benchmark execution times 
> with 95% confidence intervals in
> milliseconds.
> 
>                                     baseline                   patched
>
> ai-astar                            1738.056+-49.666  ^        1568.904+-44.535    ^ definitely 1.1078x faster
> audio-beat-detection                1127.677+-15.749  ^         972.323+-23.908    ^ definitely 1.1598x faster
> audio-dft                            942.952+-107.209           919.933+-310.247     might be 1.0250x faster
> audio-fft                            985.489+-47.414  ^         796.955+-25.476    ^ definitely 1.2366x faster
> audio-oscillator                     967.891+-34.854  ^         801.778+-18.226    ^ definitely 1.2072x faster
> imaging-darkroom                    1265.340+-114.464 ^        1099.233+-2.372     ^ definitely 1.1511x faster
> imaging-desaturate                  1737.826+-40.791  ?        1749.010+-167.969   ?
> imaging-gaussian-blur               7846.369+-52.165  ^        6392.379+-1025.168  ^ definitely 1.2275x faster
> json-parse-financial                  33.141+-0.473              33.054+-1.058
> json-stringify-tinderbox              20.803+-0.901              20.664+-0.717
> stanford-crypto-aes                  401.589+-39.750            376.622+-12.111      might be 1.0663x faster
> stanford-crypto-ccm                  245.629+-45.322            228.013+-8.976       might be 1.0773x faster
> stanford-crypto-pbkdf2               941.178+-28.744            864.462+-60.083      might be 1.0887x faster
> stanford-crypto-sha256-iterative     299.988+-47.729            270.849+-32.356      might be 1.1076x faster
>
>                                     1325.281+-2.613   ^        1149.584+-75.875    ^ definitely 1.1528x faster
> 
> Interestingly, the improvement is not as large: on the Linux box it was 2x, but
> on macOS it is 15%.
> But I think it is very nice if we can get a 15% boost without any drawbacks.
> 
>> On Thu, Sep 20, 2018 at 3:08 PM Saam Barati  wrote:
>> Interesting! I must have not run this experiment correctly when I did it.
>> 
>> - Saam
>> 
>>> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki  
>>> wrote:
>>> 
 On Thu, Sep 20, 2018 at 12:54 AM Saam Barati  wrote:
 To elaborate: I ran this same experiment before. And I forgot to turn off 
 the RegExp JIT and got results similar to what you got. Once I turned off 
 the RegExp JIT, I saw no perf difference.
>>> 
>>> Yeah, I disabled JIT and RegExpJIT explicitly by using
>>> 
>>> export JSC_useJIT=false
>>> export JSC_useRegExpJIT=false
>>> 
>>> and I checked no JIT code is generated by running dumpDisassembly. And I 
>>> also put `CRASH()` in ExecutableAllocator::singleton() to ensure no 
>>> executable memory is allocated.
>>> The result is the same. I think `useJIT=false` disables RegExp JIT too.
>>> 
>>>                                     baseline                   patched
>>>
>>> ai-astar                            3499.046+-14.772   ^       1897.624+-234.517   ^ definitely 1.8439x faster
>>> audio-beat-detection                1803.466+-491.965           970.636+-428.051     might be 1.8580x faster
>>> audio-dft                           1756.985+-68.710   ^        954.312+-528.406   ^ definitely 1.8411x faster
>>> audio-fft                           1637.969+-458.129           850.083+-449.228     might be 1.9268x faster
>>> audio-oscillator                    1866.006+-569.581  ^        967.194+-82.521    ^ definitely 1.9293x faster
>>> imaging-darkroom                    2156.526+-591.042  ^       1231.318+-187.297   ^ definitely 1.7514x faster
>>> imaging-desaturate                  3059.335+-284.740  ^
>>>

Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-20 Thread Yusuke Suzuki
I've just set up a MacBook Pro to measure the effect on macOS.

The results are as follows.

VMs tested:
"baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
"patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc

Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark.
Emitted a call to gc() between sample measurements. Used 1 benchmark iteration
per VM invocation for warm-up. Used the jsc-specific preciseTime() function to
get microsecond-level timing. Reporting benchmark execution times with 95%
confidence intervals in milliseconds.
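
For anyone wondering how the "+-" margins and the "definitely/might be ... faster" verdicts in a report like this can be produced, here is a rough standalone sketch. The real harness has its own statistics, so treat the exact rules, the t value, and the made-up sample data below as illustrative assumptions, not a description of the actual script:

    #include <cmath>
    #include <cstdio>
    #include <vector>

    struct Stats {
        double mean;
        double ciHalfWidth; // the "+-" part
    };

    // Mean and 95% confidence-interval half-width for a set of samples.
    static Stats summarize(const std::vector<double>& samples)
    {
        double sum = 0;
        for (double s : samples)
            sum += s;
        double mean = sum / samples.size();
        double varSum = 0;
        for (double s : samples)
            varSum += (s - mean) * (s - mean);
        double stddev = std::sqrt(varSum / (samples.size() - 1)); // sample standard deviation
        // 2.262 is the two-sided 95% Student-t value for 9 degrees of freedom
        // (10 samples); other sample counts need a different t value.
        const double t = 2.262;
        return { mean, t * stddev / std::sqrt(double(samples.size())) };
    }

    int main()
    {
        // Made-up numbers, purely for illustration (milliseconds).
        std::vector<double> baseline = { 3620, 3580, 3655, 3601, 3644, 3590, 3612, 3637, 3598, 3625 };
        std::vector<double> patched  = { 2010, 1995, 2030, 2002, 2021, 1988, 2015, 2040, 1999, 2008 };
        Stats b = summarize(baseline);
        Stats p = summarize(patched);
        std::printf("baseline %.3f+-%.3f    patched %.3f+-%.3f\n",
                    b.mean, b.ciHalfWidth, p.mean, p.ciHalfWidth);
        // One plausible reading of the wording: "definitely faster" when the
        // confidence intervals do not overlap, "might be faster" when the
        // means differ but the intervals overlap.
        bool overlap = (b.mean - b.ciHalfWidth) <= (p.mean + p.ciHalfWidth)
            && (p.mean - p.ciHalfWidth) <= (b.mean + b.ciHalfWidth);
        if (p.mean < b.mean)
            std::printf("%s %.4fx faster\n", overlap ? "might be" : "definitely", b.mean / p.mean);
        return 0;
    }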


                                    baseline                   patched

ai-astar                            1738.056+-49.666  ^        1568.904+-44.535    ^ definitely 1.1078x faster
audio-beat-detection                1127.677+-15.749  ^         972.323+-23.908    ^ definitely 1.1598x faster
audio-dft                            942.952+-107.209           919.933+-310.247     might be 1.0250x faster
audio-fft                            985.489+-47.414  ^         796.955+-25.476    ^ definitely 1.2366x faster
audio-oscillator                     967.891+-34.854  ^         801.778+-18.226    ^ definitely 1.2072x faster
imaging-darkroom                    1265.340+-114.464 ^        1099.233+-2.372     ^ definitely 1.1511x faster
imaging-desaturate                  1737.826+-40.791  ?        1749.010+-167.969   ?
imaging-gaussian-blur               7846.369+-52.165  ^        6392.379+-1025.168  ^ definitely 1.2275x faster
json-parse-financial                  33.141+-0.473              33.054+-1.058
json-stringify-tinderbox              20.803+-0.901              20.664+-0.717
stanford-crypto-aes                  401.589+-39.750            376.622+-12.111      might be 1.0663x faster
stanford-crypto-ccm                  245.629+-45.322            228.013+-8.976       might be 1.0773x faster
stanford-crypto-pbkdf2               941.178+-28.744            864.462+-60.083      might be 1.0887x faster
stanford-crypto-sha256-iterative     299.988+-47.729            270.849+-32.356      might be 1.1076x faster

                                    1325.281+-2.613   ^        1149.584+-75.875    ^ definitely 1.1528x faster

Interestingly, the improvement is not as large: on the Linux box it was 2x, but
on macOS it is 15%.
But I think it is very nice if we can get a 15% boost without any drawbacks.

On Thu, Sep 20, 2018 at 3:08 PM Saam Barati  wrote:

> Interesting! I must have not run this experiment correctly when I did it.
>
> - Saam
>
> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki 
> wrote:
>
> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati  wrote:
>
>> To elaborate: I ran this same experiment before. And I forgot to turn off
>> the RegExp JIT and got results similar to what you got. Once I turned off
>> the RegExp JIT, I saw no perf difference.
>>
>
> Yeah, I disabled JIT and RegExpJIT explicitly by using
>
> export JSC_useJIT=false
> export JSC_useRegExpJIT=false
>
> and I checked no JIT code is generated by running dumpDisassembly. And I
> also put `CRASH()` in ExecutableAllocator::singleton() to ensure no
> executable memory is allocated.
> The result is the same. I think `useJIT=false` disables RegExp JIT too.
>
>                                     baseline                   patched
>
> ai-astar                            3499.046+-14.772   ^       1897.624+-234.517   ^ definitely 1.8439x faster
> audio-beat-detection                1803.466+-491.965           970.636+-428.051     might be 1.8580x faster
> audio-dft                           1756.985+-68.710   ^        954.312+-528.406   ^ definitely 1.8411x faster
> audio-fft                           1637.969+-458.129           850.083+-449.228     might be 1.9268x faster
> audio-oscillator                    1866.006+-569.581  ^        967.194+-82.521    ^ definitely 1.9293x faster
> imaging-darkroom                    2156.526+-591.042  ^       1231.318+-187.297   ^ definitely 1.7514x faster
> imaging-desaturate                  3059.335+-284.740  ^       1754.128+-339.941   ^ definitely 1.7441x faster
> imaging-gaussian-blur              16034.828+-1930.938 ^       7389.919+-2228.020  ^ definitely 2.1698x faster
> json-parse-financial                  60.273+-4.143              53.935+-28.957       might be 1.1175x faster
> json-stringify-tinderbox              39.497+-3.915              38.146+-9.652        might be 1.0354x faster
> stanford-crypto-aes                  873.623+-208.225  ^        486.350+-132.379   ^ definitely 1.7963x faster
> stanford-crypto-ccm                  538.707+-33.979   ^        285.944+-41.570    ^ definitely 1.8840x faster
> stanford-crypto-pbkdf2              1929.960+-649.861  ^       1044.320+-1.182     ^ definitely 1.8481x faster
> stanford-crypto-sha256-iterative     614.344+-200.228           342.574+-123.524     might be 1.7933x faster
>
>                                     2562.183+-207.456  ^       1304.749+-312.963   ^ definitely 1.9637x faster
>
> I think this result is not related to RegExp JIT since ai-astar is not using
> RegExp.

Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-19 Thread Saam Barati
Interesting! I must have not run this experiment correctly when I did it.

- Saam

> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki  wrote:
> 
>> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati  wrote:
>> To elaborate: I ran this same experiment before. And I forgot to turn off 
>> the RegExp JIT and got results similar to what you got. Once I turned off 
>> the RegExp JIT, I saw no perf difference.
> 
> Yeah, I disabled JIT and RegExpJIT explicitly by using
> 
> export JSC_useJIT=false
> export JSC_useRegExpJIT=false
> 
> and I checked no JIT code is generated by running dumpDisassembly. And I also 
> put `CRASH()` in ExecutableAllocator::singleton() to ensure no executable 
> memory is allocated.
> The result is the same. I think `useJIT=false` disables RegExp JIT too.
> 
>                                     baseline                   patched
>
> ai-astar                            3499.046+-14.772   ^       1897.624+-234.517   ^ definitely 1.8439x faster
> audio-beat-detection                1803.466+-491.965           970.636+-428.051     might be 1.8580x faster
> audio-dft                           1756.985+-68.710   ^        954.312+-528.406   ^ definitely 1.8411x faster
> audio-fft                           1637.969+-458.129           850.083+-449.228     might be 1.9268x faster
> audio-oscillator                    1866.006+-569.581  ^        967.194+-82.521    ^ definitely 1.9293x faster
> imaging-darkroom                    2156.526+-591.042  ^       1231.318+-187.297   ^ definitely 1.7514x faster
> imaging-desaturate                  3059.335+-284.740  ^       1754.128+-339.941   ^ definitely 1.7441x faster
> imaging-gaussian-blur              16034.828+-1930.938 ^       7389.919+-2228.020  ^ definitely 2.1698x faster
> json-parse-financial                  60.273+-4.143              53.935+-28.957       might be 1.1175x faster
> json-stringify-tinderbox              39.497+-3.915              38.146+-9.652        might be 1.0354x faster
> stanford-crypto-aes                  873.623+-208.225  ^        486.350+-132.379   ^ definitely 1.7963x faster
> stanford-crypto-ccm                  538.707+-33.979   ^        285.944+-41.570    ^ definitely 1.8840x faster
> stanford-crypto-pbkdf2              1929.960+-649.861  ^       1044.320+-1.182     ^ definitely 1.8481x faster
> stanford-crypto-sha256-iterative     614.344+-200.228           342.574+-123.524     might be 1.7933x faster
>
>                                     2562.183+-207.456  ^       1304.749+-312.963   ^ definitely 1.9637x faster
> 
> I think this result is not related to RegExp JIT since ai-astar is not using 
> RegExp.
> 
> Best regards,
> Yusuke Suzuki
>  
>> 
>> - Saam
>> 
>>> On Sep 19, 2018, at 8:53 AM, Saam Barati  wrote:
>>> 
>>> Did you turn off the RegExp JIT?
>>> 
>>> - Saam
>>> 
 On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki  
 wrote:
 
 Hi WebKittens!
 
 Recently, node-jsc was announced [1]. When I read the documents of that
 project, I found that they use the LLInt ASM interpreter instead of CLoop in a
 non-JIT environment. So I had one question in my mind: how fast is the LLInt
 ASM interpreter compared to CLoop?
 
 I've set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other
 is a JIT build of JSC run with `JSC_useJIT=false`.
 And I ran Kraken benchmarks with these two builds on an x64 Linux machine.
 The results are as follows.
 
 Benchmark report for Kraken on sakura-trick.
 
 VMs tested:
 "baseline" at 
 /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
 "patched" at 
 /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
 
 Collected 10 samples per benchmark/VM, with 10 VM invocations per 
 benchmark. Emitted a call to gc() between sample
 measurements. Used 1 benchmark iteration per VM invocation for warm-up. 
 Used the jsc-specific preciseTime()
 function to get microsecond-level timing. Reporting benchmark execution 
 times with 95% confidence intervals in
 milliseconds.
 
                                     baseline                   patched
 
 ai-astar                            3619.974+-57.095   ^       2014.835+-59.016    ^ definitely 1.7967x faster
 audio-beat-detection                1762.085+-24.853   ^       1030.902+-19.743    ^ definitely 1.7093x faster
 audio-dft                           1822.426+-28.704   ^        909.262+-16.640    ^ definitely 2.0043x faster
 audio-fft                           1651.070+-9.994    ^        865.203+-7.912     ^ definitely 1.9083x faster
 audio-oscillator                    1853.697+-26.539   ^        992.406+-12.811

Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-19 Thread Yusuke Suzuki
On Thu, Sep 20, 2018 at 12:54 AM Saam Barati  wrote:

> To elaborate: I ran this same experiment before. And I forgot to turn off
> the RegExp JIT and got results similar to what you got. Once I turned off
> the RegExp JIT, I saw no perf difference.
>

Yeah, I disabled JIT and RegExpJIT explicitly by using

export JSC_useJIT=false
export JSC_useRegExpJIT=false

and I checked no JIT code is generated by running dumpDisassembly. And I
also put `CRASH()` in ExecutableAllocator::singleton() to ensure no
executable memory is allocated.
The result is the same. I think `useJIT=false` disables RegExp JIT too.
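
The CRASH()-in-the-singleton check described above is essentially a crash guard on a chokepoint: make the one accessor that hands out executable memory abort, and a clean benchmark run is evidence that nothing ever tried to JIT. A minimal standalone analogue of that pattern (not JSC's actual ExecutableAllocator code):

    #include <cstdio>
    #include <cstdlib>

    struct ExecutableMemoryStub {
        // Stand-in for an accessor like ExecutableAllocator::singleton(): the
        // one place every code-generation path would have to go through.
        static ExecutableMemoryStub& singleton()
        {
            // Analogue of the temporary CRASH(): if this ever runs while the
            // JIT is supposed to be off, fail loudly instead of measuring a
            // configuration we didn't intend.
            std::fprintf(stderr, "executable memory requested with JIT disabled\n");
            std::abort();
        }
    };

    int main()
    {
        // A correctly configured no-JIT run never reaches singleton();
        // uncomment the next line to watch the guard fire.
        // ExecutableMemoryStub::singleton();
        std::puts("finished without requesting executable memory");
        return 0;
    }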

                                    baseline                   patched

ai-astar                            3499.046+-14.772   ^       1897.624+-234.517   ^ definitely 1.8439x faster
audio-beat-detection                1803.466+-491.965           970.636+-428.051     might be 1.8580x faster
audio-dft                           1756.985+-68.710   ^        954.312+-528.406   ^ definitely 1.8411x faster
audio-fft                           1637.969+-458.129           850.083+-449.228     might be 1.9268x faster
audio-oscillator                    1866.006+-569.581  ^        967.194+-82.521    ^ definitely 1.9293x faster
imaging-darkroom                    2156.526+-591.042  ^       1231.318+-187.297   ^ definitely 1.7514x faster
imaging-desaturate                  3059.335+-284.740  ^       1754.128+-339.941   ^ definitely 1.7441x faster
imaging-gaussian-blur              16034.828+-1930.938 ^       7389.919+-2228.020  ^ definitely 2.1698x faster
json-parse-financial                  60.273+-4.143              53.935+-28.957       might be 1.1175x faster
json-stringify-tinderbox              39.497+-3.915              38.146+-9.652        might be 1.0354x faster
stanford-crypto-aes                  873.623+-208.225  ^        486.350+-132.379   ^ definitely 1.7963x faster
stanford-crypto-ccm                  538.707+-33.979   ^        285.944+-41.570    ^ definitely 1.8840x faster
stanford-crypto-pbkdf2              1929.960+-649.861  ^       1044.320+-1.182     ^ definitely 1.8481x faster
stanford-crypto-sha256-iterative     614.344+-200.228           342.574+-123.524     might be 1.7933x faster

                                    2562.183+-207.456  ^       1304.749+-312.963   ^ definitely 1.9637x faster

I think this result is not related to RegExp JIT since ai-astar is not
using RegExp.

Best regards,
Yusuke Suzuki


>
> - Saam
>
> On Sep 19, 2018, at 8:53 AM, Saam Barati  wrote:
>
> Did you turn off the RegExp JIT?
>
> - Saam
>
> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki 
> wrote:
>
> Hi WebKittens!
>
> Recently, node-jsc was announced [1]. When I read the documents of that
> project, I found that they use the LLInt ASM interpreter instead of CLoop in a
> non-JIT environment. So I had one question in my mind: how fast is the LLInt
> ASM interpreter compared to CLoop?
>
> I've set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other
> is a JIT build of JSC run with `JSC_useJIT=false`.
> And I ran Kraken benchmarks with these two builds on an x64 Linux machine.
> The results are as follows.
>
> Benchmark report for Kraken on sakura-trick.
>
> VMs tested:
> "baseline" at
> /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
> "patched" at
> /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>
> Collected 10 samples per benchmark/VM, with 10 VM invocations per
> benchmark. Emitted a call to gc() between sample
> measurements. Used 1 benchmark iteration per VM invocation for warm-up.
> Used the jsc-specific preciseTime()
> function to get microsecond-level timing. Reporting benchmark execution
> times with 95% confidence intervals in
> milliseconds.
>
>                                     baseline                   patched
>
> ai-astar                            3619.974+-57.095   ^       2014.835+-59.016    ^ definitely 1.7967x faster
> audio-beat-detection                1762.085+-24.853   ^       1030.902+-19.743    ^ definitely 1.7093x faster
> audio-dft                           1822.426+-28.704   ^        909.262+-16.640    ^ definitely 2.0043x faster
> audio-fft                           1651.070+-9.994    ^        865.203+-7.912     ^ definitely 1.9083x faster
> audio-oscillator                    1853.697+-26.539   ^        992.406+-12.811    ^ definitely 1.8679x faster
> imaging-darkroom                    2118.737+-23.219   ^       1303.729+-8.071     ^ definitely 1.6251x faster
> imaging-desaturate                  3133.654+-28.545   ^       1759.738+-18.182    ^ definitely 1.7808x faster
> imaging-gaussian-blur              16321.090+-154.893  ^       7228.017+-58.508    ^ definitely 2.2580x faster
> json-parse-financial                  57.256+-2.876              56.101+-4.265        might be 1.0206x faster
> json-stringify-tinderbox              38.470+-2.788    ?         38.771+-0.935      ?
> stanford-crypto-aes                  851.341+-7.738    ^        485.438+-13.904    ^

Re: [webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

2018-09-19 Thread Saam Barati
To elaborate: I ran this same experiment before. And I forgot to turn off the 
RegExp JIT and got results similar to what you got. Once I turned off the 
RegExp JIT, I saw no perf difference.

- Saam

> On Sep 19, 2018, at 8:53 AM, Saam Barati  wrote:
> 
> Did you turn off the RegExp JIT?
> 
> - Saam
> 
>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki  
>> wrote:
>> 
>> Hi WebKittens!
>> 
>> Recently, node-jsc was announced [1]. When I read the documents of that
>> project, I found that they use the LLInt ASM interpreter instead of CLoop in a
>> non-JIT environment. So I had one question in my mind: how fast is the LLInt
>> ASM interpreter compared to CLoop?
>> 
>> I've set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other
>> is a JIT build of JSC run with `JSC_useJIT=false`.
>> And I ran Kraken benchmarks with these two builds on an x64 Linux machine.
>> The results are as follows.
>> 
>> Benchmark report for Kraken on sakura-trick.
>> 
>> VMs tested:
>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>> "patched" at 
>> /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>> 
>> Collected 10 samples per benchmark/VM, with 10 VM invocations per benchmark. 
>> Emitted a call to gc() between sample
>> measurements. Used 1 benchmark iteration per VM invocation for warm-up. Used 
>> the jsc-specific preciseTime()
>> function to get microsecond-level timing. Reporting benchmark execution 
>> times with 95% confidence intervals in
>> milliseconds.
>> 
>>                                     baseline                   patched
>>
>> ai-astar                            3619.974+-57.095   ^       2014.835+-59.016    ^ definitely 1.7967x faster
>> audio-beat-detection                1762.085+-24.853   ^       1030.902+-19.743    ^ definitely 1.7093x faster
>> audio-dft                           1822.426+-28.704   ^        909.262+-16.640    ^ definitely 2.0043x faster
>> audio-fft                           1651.070+-9.994    ^        865.203+-7.912     ^ definitely 1.9083x faster
>> audio-oscillator                    1853.697+-26.539   ^        992.406+-12.811    ^ definitely 1.8679x faster
>> imaging-darkroom                    2118.737+-23.219   ^       1303.729+-8.071     ^ definitely 1.6251x faster
>> imaging-desaturate                  3133.654+-28.545   ^       1759.738+-18.182    ^ definitely 1.7808x faster
>> imaging-gaussian-blur              16321.090+-154.893  ^       7228.017+-58.508    ^ definitely 2.2580x faster
>> json-parse-financial                  57.256+-2.876              56.101+-4.265        might be 1.0206x faster
>> json-stringify-tinderbox              38.470+-2.788    ?         38.771+-0.935      ?
>> stanford-crypto-aes                  851.341+-7.738    ^        485.438+-13.904    ^ definitely 1.7538x faster
>> stanford-crypto-ccm                  556.133+-6.606    ^        264.161+-3.970     ^ definitely 2.1053x faster
>> stanford-crypto-pbkdf2              1945.718+-15.968   ^       1075.013+-13.337    ^ definitely 1.8099x faster
>> stanford-crypto-sha256-iterative     623.203+-7.604    ^        349.782+-12.810    ^ definitely 1.7817x faster
>>
>>                                     2596.775+-14.857   ^       1312.383+-8.840     ^ definitely 1.9787x faster
>> 
>> Surprisingly, the LLInt ASM interpreter is significantly faster than CLoop. I
>> expected it would be fast, but only by around a 10% performance win.
>> In reality it is 2x faster, which is a large enough number to make me consider
>> enabling the LLInt ASM interpreter for the non-JIT build configuration.
>> As a bonus, the LLInt ASM interpreter offers sampling-profiler support even in a
>> non-JIT environment.
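
A rough intuition for where a gap like that can come from: CLoop is the interpreter compiled as ordinary C++, so every opcode pays for loop dispatch and compiler-chosen register traffic, while the ASM LLInt keeps core interpreter state in fixed registers and jumps straight from one opcode handler to the next. The toy loop below only illustrates that kind of per-opcode dispatch overhead; it is not JSC's CLoop or offlineasm:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Toy bytecode: push an 8-bit constant, add the top two stack values,
    // print, halt. Real LLInt opcodes are far richer; only the dispatch loop
    // matters here.
    enum Op : uint8_t { PushConst, Add, Print, Halt };

    static void run(const std::vector<uint8_t>& code)
    {
        std::vector<int64_t> stack;
        size_t pc = 0;
        for (;;) {
            switch (code[pc++]) { // per-opcode dispatch branch on every instruction
            case PushConst:
                stack.push_back(code[pc++]);
                break;
            case Add: {
                int64_t top = stack.back();
                stack.pop_back();
                stack.back() += top;
                break;
            }
            case Print:
                std::printf("%lld\n", static_cast<long long>(stack.back()));
                break;
            case Halt:
                return;
            }
        }
    }

    int main()
    {
        run({ PushConst, 20, PushConst, 22, Add, Print, Halt }); // prints 42
        return 0;
    }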
>> 
>> So my proposal is: how about enabling the LLInt ASM interpreter in the non-JIT
>> configuration on major architectures (x64 and ARM64)?
>> 
>> Best regards,
>> Yusuke Suzuki
>> 
>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html