Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: e2cf0b9d1b39b4d2adc69adcafc8346f385bba25
      
https://github.com/WebKit/WebKit/commit/e2cf0b9d1b39b4d2adc69adcafc8346f385bba25
  Author: Chris Dumez <[email protected]>
  Date:   2026-01-04 (Sun, 04 Jan 2026)

  Changed paths:
    M Source/WTF/wtf/text/StringCommon.h

  Log Message:
  -----------
  Use faster algorithm in WTF::copyElements() for better performance
https://bugs.webkit.org/show_bug.cgi?id=304835

Reviewed by Yusuke Suzuki.

The old code used interleaved stores (vst2q_u8) to achieve the
upconversion by interleaving data with zeros. The new code uses
vmovl_u8 (widening move), which is designed specifically for
zero-extending 8-bit values to 16-bit values, so it expresses the
operation more directly.

The old code used vst2q_u8 which writes in an interleaved pattern
(complex addressing). The new code uses straightforward sequential
vst1q_u16 stores. Sequential stores are generally more cache-friendly
and easier for the CPU's store buffer to handle.

Micro-benchmark results:
====================================

         Size | Time before |  Time after | Speedup | Throughput before | Throughput after
------------------------------------------------------------------------------------------
     16 bytes |     2.27 ns |     1.87 ns |   1.21x |      7041.40 MB/s |     8540.33 MB/s
     32 bytes |     2.37 ns |     2.33 ns |   1.02x |     13492.26 MB/s |    13751.07 MB/s
     48 bytes |     2.83 ns |     2.85 ns |   0.99x |     16987.61 MB/s |    16851.08 MB/s
     63 bytes |     5.21 ns |     5.15 ns |   1.01x |     12086.38 MB/s |    12229.38 MB/s
     64 bytes |     1.64 ns |     1.25 ns |   1.32x |     39018.24 MB/s |    51383.06 MB/s
     65 bytes |     1.94 ns |     1.48 ns |   1.31x |     33554.37 MB/s |    43843.30 MB/s
    128 bytes |     2.89 ns |     2.04 ns |   1.42x |     44259.87 MB/s |    62891.60 MB/s
    256 bytes |     5.31 ns |     4.19 ns |   1.27x |     48216.11 MB/s |    61029.10 MB/s
    512 bytes |    10.18 ns |     8.30 ns |   1.23x |     50310.98 MB/s |    61708.24 MB/s
   1024 bytes |    19.43 ns |    16.52 ns |   1.18x |     52707.15 MB/s |    61987.44 MB/s
   4096 bytes |   125.92 ns |    69.52 ns |   1.81x |     32528.44 MB/s |    58914.64 MB/s
   8192 bytes |   159.90 ns |   149.73 ns |   1.07x |     51233.31 MB/s |    54712.31 MB/s
  16384 bytes |   416.98 ns |   528.18 ns |   0.79x |     39291.73 MB/s |    31019.66 MB/s
  32768 bytes |   963.62 ns |   808.52 ns |   1.19x |     34005.25 MB/s |    40528.20 MB/s
  65536 bytes |  1529.74 ns |  1484.54 ns |   1.03x |     42841.39 MB/s |    44145.55 MB/s
 131072 bytes |  2452.53 ns |  1982.41 ns |   1.24x |     53443.64 MB/s |    66117.38 MB/s
 262144 bytes |  6291.20 ns |  6126.99 ns |   1.03x |     41668.38 MB/s |    42785.13 MB/s
 524288 bytes | 12640.27 ns | 12525.43 ns |   1.01x |     41477.59 MB/s |    41857.88 MB/s
1048576 bytes | 24742.13 ns | 23726.78 ns |   1.04x |     42380.18 MB/s |    44193.77 MB/s
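
The derived columns can be checked against the timings. The sketch below
assumes the speedup is the ratio of the two times and the throughput is the
size divided by the time; the helper names are hypothetical and are not part
of the benchmark harness. The magnitudes match megabytes per second: 64 bytes
in 1.64 ns is roughly 39,000 MB/s. Small differences from the printed values
are expected because the timings shown are rounded.

```cpp
#include <cstddef>

// Hypothetical helpers for reading the table above.

// Bytes per nanosecond equals 10^9 bytes per second; multiplying by 1000
// expresses that in units of 10^6 bytes per second (MB/s).
constexpr double throughputMBPerSec(size_t bytes, double nanoseconds)
{
    return static_cast<double>(bytes) / nanoseconds * 1000.0;
}

// A value above 1 means the new code is faster; below 1 means slower.
constexpr double speedup(double beforeNs, double afterNs)
{
    return beforeNs / afterNs;
}
```

For example, speedup(416.98, 528.18) is below 1, which corresponds to the
regression reported for the 16384-byte row.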

This seems to result in a 0.56%-0.74% progression on Speedometer 3 on macOS,
depending on the model. It is performance neutral on iOS and on
JetStream.

I used Claude AI to assist with this optimization.

* Source/WTF/wtf/text/StringCommon.h:
(WTF::copyElements):

Canonical link: https://commits.webkit.org/305095@main


