Re: [v8-dev] performance considerations for well-typed Wasm binaries

Ben Noordhuis Sun, 07 Apr 2024 23:40:17 -0700

Does --no-wasm-bounds-checks imply no guard pages? That could explain
the difference when the wasm module is small and executes quickly; the
overhead of mmap/mprotect/munmap probably starts to dominate.
Profiling with perf(1) should be able to confirm that.


On Mon, Apr 8, 2024 at 7:07 AM 'Andreas Haas' via v8-dev
<[email protected]> wrote:
>
> Hi Wolfgang,
>
> If you use an x64 machine, then I'm really wondering why you see performance 
> improvements. Maybe you are even just measuring a performance artifact. E.g. 
> disabling bounds checks makes the code objects slightly smaller, so maybe 
> what you are measuring is just a code alignment issue?
>
> You can measure your benchmark with "--wasm-enforce-bounds-checks", which 
> adds explicit bounds checks to the code. The bounds checks that you see in 
> that configuration are the bounds checks that can either be eliminated due to 
> your analysis, or due to the use of the signal handling I described before.
>
> On 32-bit builds of V8, signal handlers cannot be used for bounds checks, so 
> your analysis would be the only way to eliminate bounds checks, and you would 
> see a much bigger impact.
>
> Cheers, Andreas
>
> On Fri, Apr 5, 2024 at 10:27 PM Wolfgang <[email protected]> wrote:
>>
>> Hi Andreas,
>>
>> yes, the binaries are small. (Independent note: they are strange in some way 
>> it seems: wasmtime takes 2 seconds to compile/instantiate color, and is in 
>> general much slower for all our other benchmarks as well. Investigating this 
>> as well currently.)
>>
>> Thanks for the explanation, interesting!
>> No, we don't use ARM, we measured the improvement on two standard x64 linux 
>> laptops.
>>
>> Some more background, not sure if relevant:
>> Our memory model is quite simplistic: memory is never freed, we have a 
>> global that points to the next free segment in the linear mem, the pointer 
>> is increased after every allocation and never decreased. We grow the linear 
>> memory regularly if needed.
>> In this model, color uses just 16MB of linear mem, sha_fast 25MB.
>>
>> In the end I'm happy it's a bit faster :) Sure, it would be cool to know 
>> why...
>> Best,
>> Wolfgang
>>
>> The setup of the following two is exactly the same, except in the second one 
>> we call node with "--no-wasm-bounds-checks".
>>
>> Without --no-wasm-bounds-checks:
>> Running avg. of 20 runs in node (v20.10.0).
>>        demo1-opt_coalesce-locals : startup:    3, main:   2, pp: 20, sum:   
>> 25
>>        demo2-opt_coalesce-locals : startup:    3, main:   0, pp:  9, sum:   
>> 12
>>     list_sum-opt_coalesce-locals : startup:    3, main:   0, pp:  2, sum:    
>> 5
>>      vs_easy-opt_coalesce-locals : startup:    4, main:  22, pp:  3, sum:   
>> 29
>>      vs_hard-opt_coalesce-locals : startup:    4, main:  91, pp:  3, sum:   
>> 98
>>        binom-opt_coalesce-locals : startup:    3, main:  15, pp: 18, sum:   
>> 36
>>        color-opt_coalesce-locals : startup:    7, main: 164, pp:  2, sum:  
>> 173
>>     sha_fast-opt_coalesce-locals : startup:    4, main: 112, pp:  7, sum:  
>> 123
>>   even_10000-opt_coalesce-locals : startup:    3, main:   1, pp:  1, sum:    
>> 5
>>      ack_3_9-opt_coalesce-locals : startup:    3, main: 112, pp: 28, sum:  
>> 143
>> sm_gauss_nat-opt_coalesce-locals : startup:    3, main:  58, pp:  1, sum:   
>> 62
>>   sm_gauss_N-opt_coalesce-locals : startup:    3, main:  22, pp:  1, sum:   
>> 26
>>
>> With --no-wasm-bounds-checks:
>> Running avg. of 20 runs in node (v20.10.0).
>>        demo1-opt_coalesce-locals : startup:    3, main:   2, pp: 19, sum:   
>> 24
>>        demo2-opt_coalesce-locals : startup:    3, main:   0, pp:  9, sum:   
>> 12
>>     list_sum-opt_coalesce-locals : startup:    3, main:   0, pp:  3, sum:    
>> 6
>>      vs_easy-opt_coalesce-locals : startup:    4, main:  21, pp:  3, sum:   
>> 28
>>      vs_hard-opt_coalesce-locals : startup:    4, main:  91, pp:  2, sum:   
>> 97
>>        binom-opt_coalesce-locals : startup:    3, main:  14, pp: 18, sum:   
>> 35
>>        color-opt_coalesce-locals : startup:    7, main: 157, pp:  2, sum:  
>> 166
>>     sha_fast-opt_coalesce-locals : startup:    4, main: 109, pp:  7, sum:  
>> 120
>>   even_10000-opt_coalesce-locals : startup:    3, main:   1, pp:  1, sum:    
>> 5
>>      ack_3_9-opt_coalesce-locals : startup:    3, main: 112, pp: 28, sum:  
>> 143
>> sm_gauss_nat-opt_coalesce-locals : startup:    3, main:  58, pp:  1, sum:   
>> 62
>>   sm_gauss_N-opt_coalesce-locals : startup:    3, main:  22, pp:  1, sum:   
>> 26
>>
>> On Tue, Apr 2, 2024 at 10:26 AM 'Andreas Haas' via v8-dev 
>> <[email protected]> wrote:
>>>
>>> Hi Wolfgang,
>>>
>>> Your module seems to be quite small, so I guess disabling validation saves 
>>> you less than 1ms. Do you run your experiments on a MacBook?
>>>
>>> It is a bit surprising that you can save even 5% by disabling bounds 
>>> checks. The reason is that we don't have explicit bounds checks, but 
>>> instead use a signal handler. This works as follows: The WebAssembly memory 
>>> is surrounded by guard pages which are marked as not readable and not 
>>> writable. The WebAssembly memory and the guard pages together cover 8GB. 
>>> which means that any 32 bit memory address + 32 bit offset will always 
>>> either hit the WebAssembly memory or a guard page. WebAssembly code then 
>>> executes memory accesses unconditionally. If the memory access is inside 
>>> the WebAssembly memory, then everything works just fine. If it hits a guard 
>>> page, then a segfault is triggered. The signal handler catches the 
>>> segfault, creates a JavaScript exception, and then continues execution at a 
>>> JavaScript exception handler. Naturally this only works on 64-bit 
>>> architecture, because otherwise you cannot reserve 8GB of address space. 
>>> Also, on arm64 we cannot do this optimization for writes. Assume a write 
>>> that is partially out of memory, e.g. the first two bytes are in-bounds, 
>>> but the second two bytes are out of bounds. On x64, a segfault is triggered 
>>> before any of the four bytes are written, but on some arm64 CPUs, the two 
>>> in-bounds bytes get written before the segfault gets triggered. Therefore, 
>>> I assume you did your experiment on a MacBook or some other arm64 device, 
>>> because on x64 I would assume that you cannot measure any performance 
>>> improvements by skipping bounds checks.
>>>
>>> Cheers, Andreas
>>>
>>> On Sat, Mar 30, 2024 at 11:15 PM Wolfgang <[email protected]> wrote:
>>>>
>>>> Hi Andreas,
>>>> thanks a lot for the quick response!
>>>> This is precisely what I was looking for.
>>>>
>>>> We indeed know, that memory accesses are always in bounds, so we can 
>>>> safely use the flag `--no-wasm-bounds-checks`. This speeds up one of our 
>>>> benchmarks by ~5% thanks! (for the others, the difference is not really 
>>>> measurable)
>>>>
>>>> Here is the Coq definition for instantiation [0] which we proved, this 
>>>> includes that the module is well-typed [1], so omitting the function 
>>>> validation check you suggested (ValidateFunctionBody) should be safe.
>>>> However, doing that didn't affect the performance in a measurable way: It 
>>>> seems, function validation is quite cheap.
>>>> (I compared the current master of Node.js (v8 version: 11.9.169.7) with 
>>>> and without the function validation check, by commenting out lines 87-101, 
>>>> 103 [2].
>>>>  Our modules have <150 functions with around 10-1000 instructions and a 
>>>> main function with typically >50k instructions.)
>>>>
>>>> We also know that all programs terminate, but I guess we can still 
>>>> overflow the call stack, e.g. when the source program is big enough (and 
>>>> non-tail-recursive).
>>>>
>>>> Best,
>>>> Wolfgang
>>>>
>>>> [0]: 
>>>> https://github.com/WasmCert/WasmCert-Coq/blob/1fda06ea0d52caae1ece83af109b48a9838b3869/theories/instantiation_spec.v#L530
>>>> [1]: 
>>>> https://github.com/WasmCert/WasmCert-Coq/blob/1fda06ea0d52caae1ece83af109b48a9838b3869/theories/instantiation_spec.v#L389
>>>> [2]: 
>>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/wasm/function-compiler.cc;l=87;drc=d176526630b2049486a8be93fd89482d0aba6cfc;bpv=1;bpt=1
>>>>
>>>> On Monday, March 25, 2024 at 11:16:00 PM UTC+1 [email protected] wrote:
>>>>>
>>>>> Hi Wolfgang,
>>>>>
>>>>> The command "nodejs --help --v8-options" seems to print all V8 command 
>>>>> line options. It's not clear what runtime checks you would like to 
>>>>> adjust. Here are some flags I can think of:
>>>>>
>>>>> You could use --wasm-lazy-validation to avoid function validation during 
>>>>> module compilation and module instantiation, but function validation 
>>>>> would still happen lazily when a function gets executed for the first 
>>>>> time. If you want to prevent even the lazy function validation, then you 
>>>>> have to adjust the code at [1].
>>>>>
>>>>> You can use `--no-wasm-stack-checks` if you can guarantee that there will 
>>>>> not be a stack overflow, e.g. because of an unbounded recursion.
>>>>>
>>>>> You can use `--no-wasm-bounds-checks` if you can guarantee that memory 
>>>>> accesses are always in-bounds.
>>>>>
>>>>> Cheers, Andreas
>>>>>
>>>>> [1] 
>>>>> https://source.chromium.org/chromium/chromium/src/+/main:v8/src/wasm/function-compiler.cc;l=97;drc=d176526630b2049486a8be93fd89482d0aba6cfc
>>>>>
>>>>> On Sun, Mar 24, 2024 at 7:41 PM Wolfgang <[email protected]> wrote:
>>>>>>
>>>>>> Hello,
>>>>>> I'm a masters student at Aarhus university.
>>>>>>
>>>>>> We built a provably correct compiler (using Coq) targeting Wasm: Link
>>>>>> This includes a proof that the modules our compiler generates instantiate
>>>>>> according to the spec, i.e. they are well-typed.
>>>>>>
>>>>>> We are evaluating the performance with Node.js, is there a way to 
>>>>>> improve our performance
>>>>>> by e.g. disabling some runtime-checks, given that we have the stronger 
>>>>>> correctness guarantees?
>>>>>>
>>>>>> Node.js doesn't seem to have (publicly documented) flags to allow that.
>>>>>>
>>>>>> I included some of our numbers below.
>>>>>>
>>>>>> Best,
>>>>>> Wolfgang
>>>>>>
>>>>>> All times in ms, avarage of 10 runs:
>>>>>> startup=load binary+instantiate
>>>>>> main=main function
>>>>>> pp=pretty printing the result as an S-expression, we call an imported 
>>>>>> function
>>>>>>       for every character, thus somewhat slow, but not that relevant for 
>>>>>> my question
>>>>>>
>>>>>> For a description of the benchmarks, see: Chapter 8.2, 
>>>>>> https://zoep.github.io/thesis_final.pdf
>>>>>>
>>>>>> Node.js: v18.19.0:
>>>>>>    demo1-opt_coalesce-locals : startup:     6,  main:   0,   pp: 28,  
>>>>>> sum:   34
>>>>>>    demo2-opt_coalesce-locals : startup:     3,  main:   0,   pp:  8,    
>>>>>> sum:   11
>>>>>> list_sum-opt_coalesce-locals : startup:     3,   main:   0,   pp:  2,   
>>>>>> sum:    5
>>>>>>  vs_easy-opt_coalesce-locals : startup:    10,  main:  33,  pp:  1,  
>>>>>> sum:   44
>>>>>>  vs_hard-opt_coalesce-locals : startup:    10,  main: 101, pp:  1,   
>>>>>> sum:  112
>>>>>>     binom-opt_coalesce-locals : startup:    18,  main:  10,  pp: 24,  
>>>>>> sum:   52
>>>>>> sha_fast-opt_coalesce-locals : startup:    65,  main:  70,  pp:  7,   
>>>>>> sum:  142
>>>>>>        color-opt_coalesce-locals : startup:  132, main:  44,  pp:  2,   
>>>>>> sum:  178
>>>>>>
>>>>>> Node.js: v20.11.1
>>>>>>    demo1-opt_coalesce-locals : startup:    1, main:     3, pp:  24,  
>>>>>> sum:   28
>>>>>>    demo2-opt_coalesce-locals : startup:    2, main:     0, pp:  12,  
>>>>>> sum:   14
>>>>>> list_sum-opt_coalesce-locals : startup:    2,  main:     0, pp:   2,   
>>>>>> sum:    4
>>>>>>  vs_easy-opt_coalesce-locals : startup:    4, main:    38, pp:   4,   
>>>>>> sum:   46
>>>>>>  vs_hard-opt_coalesce-locals : startup:    3,  main:  110, pp:   3,  
>>>>>> sum:  116
>>>>>>     binom-opt_coalesce-locals : startup:     3, main:    26, pp: 23,  
>>>>>> sum:   52
>>>>>> sha_fast-opt_coalesce-locals : startup:    4,  main:  228, pp: 10,  sum: 
>>>>>>  242
>>>>>>        color-opt_coalesce-locals : startup:   12, main: 332, pp:  2,   
>>>>>> sum:  346
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> v8-dev mailing list
>>>>>> [email protected]
>>>>>> http://groups.google.com/group/v8-dev
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "v8-dev" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>>> an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/v8-dev/e7e5b6c8-0f7b-4787-b66f-592f02873950n%40googlegroups.com.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Andreas Haas
>>>>>
>>>>> Software Engineer
>>>>>
>>>>> [email protected]
>>>>>
>>>>>
>>>>> Google Germany GmbH
>>>>>
>>>>> Erika-Mann-Straße 33
>>>>>
>>>>> 80636 München
>>>>>
>>>>>
>>>>> Geschäftsführer: Paul Manicle, Liana Sebastian
>>>>>
>>>>> Registergericht und -nummer: Hamburg, HRB 86891
>>>>>
>>>>> Sitz der Gesellschaft: Hamburg
>>>>>
>>>>>
>>>>> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten 
>>>>> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, 
>>>>> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte 
>>>>> wissen, dass die E-Mail an die falsche Person gesendet wurde.
>>>>>
>>>>>
>>>>>
>>>>> This e-mail is confidential. If you received this communication by 
>>>>> mistake, please don't forward it to anyone else, please erase all copies 
>>>>> and attachments, and please let me know that it has gone to the wrong 
>>>>> person.
>>>>>
>>>>>
>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "v8-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/v8-dev/5aa215cc-ed00-4577-bc54-d322161d44a3n%40googlegroups.com.
>>>
>>>
>>>
>>> --
>>>
>>> Andreas Haas
>>>
>>> Software Engineer
>>>
>>> [email protected]
>>>
>>>
>>> Google Germany GmbH
>>>
>>> Erika-Mann-Straße 33
>>>
>>> 80636 München
>>>
>>>
>>> Geschäftsführer: Paul Manicle, Liana Sebastian
>>>
>>> Registergericht und -nummer: Hamburg, HRB 86891
>>>
>>> Sitz der Gesellschaft: Hamburg
>>>
>>>
>>> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten 
>>> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, 
>>> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, 
>>> dass die E-Mail an die falsche Person gesendet wurde.
>>>
>>>
>>>
>>> This e-mail is confidential. If you received this communication by mistake, 
>>> please don't forward it to anyone else, please erase all copies and 
>>> attachments, and please let me know that it has gone to the wrong person.
>>>
>>>
>>> --
>>> --
>>> v8-dev mailing list
>>> [email protected]
>>> http://groups.google.com/group/v8-dev
>>> ---
>>> You received this message because you are subscribed to the Google Groups 
>>> "v8-dev" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/v8-dev/CAELSTvfyxsUiF7RZfw3tAY%2B3f7veg3PVqWUUEjJd8KxxOBM5vA%40mail.gmail.com.
>>
>> --
>> --
>> v8-dev mailing list
>> [email protected]
>> http://groups.google.com/group/v8-dev
>> ---
>> You received this message because you are subscribed to the Google Groups 
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/v8-dev/CAGU0z--x6pVB02j2hQV5ADq070tN3XaOKwkEpB5tSsSZhnQ9LA%40mail.gmail.com.
>
>
>
> --
>
> Andreas Haas
>
> Software Engineer
>
> [email protected]
>
>
> Google Germany GmbH
>
> Erika-Mann-Straße 33
>
> 80636 München
>
>
> Geschäftsführer: Paul Manicle, Liana Sebastian
>
> Registergericht und -nummer: Hamburg, HRB 86891
>
> Sitz der Gesellschaft: Hamburg
>
>
> Diese E-Mail ist vertraulich. Falls sie diese fälschlicherweise erhalten 
> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen 
> Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die 
> E-Mail an die falsche Person gesendet wurde.
>
>
>
> This e-mail is confidential. If you received this communication by mistake, 
> please don't forward it to anyone else, please erase all copies and 
> attachments, and please let me know that it has gone to the wrong person.
>
>
> --
> --
> v8-dev mailing list
> [email protected]
> http://groups.google.com/group/v8-dev
> ---
> You received this message because you are subscribed to the Google Groups 
> "v8-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/v8-dev/CAELSTvdT7FXktfkMsX2VBYOAOf6onh%2BBgXCTVg3vRQ2AXu6B4w%40mail.gmail.com.

-- 
-- 
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
--- 
You received this message because you are subscribed to the Google Groups 
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/v8-dev/CAHQurc8Gjf%2Bce9S%3Dypsw4NMCt3Ea1%2B3W2HRTapnMEmoUM22SBg%40mail.gmail.com.

Re: [v8-dev] performance considerations for well-typed Wasm binaries

Reply via email to