Hi Clemens,

Thanks for those really helpful pointers.

I have continued digging into this for the last week or so, and have put my current code up on GitHub for anyone interested:

https://github.com/tangobravo/webassembly-halfsample-benchmark

The results are that the bit-twiddling approaches do offer a pretty decent speed-up on mobile platforms (and 64-bit desktop), so this is a promising route :)

On Android it seems the version of Chrome distributed through Google Play is still the 32-bit one, even on 64-bit devices (tested on a Google Pixel 2, where chrome://version states 32-bit). Despite that, there is still a speedup using the packed implementations. The fastest on the Pixel 2 is the half_sample_uint32x2_blocks implementation, which gives something like a 2.4x speedup (2.04ms to 0.84ms for half-sampling a 720p image).

Safari in iOS 12.4 is 64-bit, and there the half_sample_uint64_blocks implementation is fastest and gives more than a 3x speedup on an iPod Touch 7 (1.0ms to 0.3ms for 720p input).

Each benchmark run does 10 iterations with different but overlapping input data (so both input and output are likely to be at least in L2 cache, which is the case I expect in practice). The timing numbers printed by the test code show the total time for all 10 iterations, so the numbers above are typical outputs over multiple runs divided by 10. Safari only offers 1ms resolution on performance.now(), so it is harder to get accurate measurements there, but the numbers above look pretty consistent over multiple runs.

Out of interest I've also tried to write an implementation targeting WebAssembly SIMD. I was able to get it to compile with emcc from latest-upstream, but it doesn't run in my self-built d8 7.7 with the --experimental-wasm-simd flag. More details are in the README of the repo linked above. I'd appreciate any help to get that one running for a further comparison datapoint.

So the questions that arise:

1) Is there any way for a page to detect if the browser is 32-bit or 64-bit?
navigator.platform reports Linux armv8l on the Pixel 2, so that doesn't help unfortunately. I have 32 and 64 bit "busy loops" that report approximately equal counts on 64-bit platforms (and not on 32-bit), but it would be nice if there was a more direct way to determine this!

2) Are there plans to transition Play Store Chrome releases to 64-bit for 64-bit Android? I did some searching but couldn't find any official information about the reasons for sticking with 32-bit there (though I assume memory usage, apk size, or both).

3) Any hints on how I can get the SIMD version to run or is this likely just a bug / spec instability between emcc and d8?

Cheers,
Simon

On Tuesday, 3 September 2019 10:09:28 UTC+1, Clemens Hammacher wrote:
>
> Hi Simon,
>
> that's an interesting project, please keep us updated :)
>
>> 1) Does v8 codegen emit 64-bit machine instructions for 64-bit wasm
>> instructions on 64-bit architectures (specifically Android)? I imagine the
>> speedup from SWAR techniques will be significantly reduced using just
>> 32-bit registers, perhaps to the level that doesn't make this worthwhile.
>
> Yes, of course we use 64-bit instructions for 64-bit operations (if the
> binary was compiled for a 64-bit platform).
>
>> 2) Does v8's codegen currently do any vectorization? If not, are there
>> plans to add it? In this case the plain C version might be best to stick
>> with as it would be easier for auto-vectorization to detect and optimize.
>
> No, we do not do any auto-vectorization, and I am not aware of any plan to
> add this.
>
>> 3) Can anyone provide tips / links to help with investigating and
>> optimizing this kind of thing? Any way of flagging wasm functions for
>> maximum optimization for benchmarking purposes?
>
> For benchmarking, you should disable the baseline compiler (Liftoff).
> Chrome has a feature flag for that
> (chrome://flags/#enable-webassembly-baseline), if you experiment with d8
> directly, then add "--no-liftoff --no-wasm-tier-up".
> If you want to inspect the generated machine code, you can add the
> "--print-wasm-code" command line flag. This requires a build with the
> "v8_enable_disassembler" gn flag enabled, which is the default for debug
> builds. Since compilation happens concurrently, you need to add
> "--predictable" (or "--single-threaded" or
> "--wasm-num-compilation-tasks=0") to get readable output.
>
> Hope that helps,
> Clemens

-- 
v8-dev mailing list
http://groups.google.com/group/v8-dev
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/09853b85-6d5d-4726-ad90-c11d566396d1%40googlegroups.com.
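
For readers unfamiliar with the packed ("SWAR") approach discussed in this thread, here is a minimal sketch of the general idea in C. It is not the code from the linked repository: the function names (avg_bytes_u32, load4, half_sample_swar_sketch) are made up for illustration, and it assumes an 8-bit greyscale image whose width is a multiple of 4 and whose height is even. It averages four packed pixels at a time with 32-bit operations; a 64-bit variant would follow the same pattern with wider masks. Note that it rounds down twice (vertically, then horizontally), so a result can be up to 1 lower than an exact rounded 2x2 mean.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-byte floor average of four packed 8-bit lanes.
 * Uses a + b = 2*(a & b) + (a ^ b); masking the low bit of every byte
 * before the shift stops bits leaking from one lane into the next. */
static uint32_t avg_bytes_u32(uint32_t a, uint32_t b) {
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Pack four consecutive pixels into one word, lowest address in the low byte.
 * Written with shifts so the sketch does not depend on host endianness. */
static uint32_t load4(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Half-sample a greyscale image: each output pixel is the (floored) mean of
 * a 2x2 input block. Assumes width % 4 == 0 and height % 2 == 0.
 * dst must hold (width / 2) * (height / 2) bytes. */
static void half_sample_swar_sketch(const uint8_t *src, int width, int height,
                                    uint8_t *dst) {
    int out_w = width / 2;
    for (int y = 0; y + 1 < height; y += 2) {
        const uint8_t *top = src + (size_t)y * (size_t)width;
        const uint8_t *bot = top + width;
        uint8_t *out = dst + (size_t)(y / 2) * (size_t)out_w;
        for (int x = 0; x + 3 < width; x += 4) {
            /* Vertical average of four columns at once. */
            uint32_t v = avg_bytes_u32(load4(top + x), load4(bot + x));
            /* Horizontal average: lane 0 gets avg(v0, v1), lane 2 gets avg(v2, v3). */
            uint32_t h = avg_bytes_u32(v, v >> 8);
            out[x / 2] = (uint8_t)(h & 0xFFu);
            out[x / 2 + 1] = (uint8_t)((h >> 16) & 0xFFu);
        }
    }
}
```

With a 4x2 input whose rows are {10, 20, 30, 40} and {30, 40, 50, 60}, the sketch writes the two output pixels 25 and 45. On a 32-bit build the same source still works, which is one reason a 32-bit packed variant can remain useful where 64-bit registers are not available.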
