Re: [v8-dev] Easy 10x performance boost for TypedArray.prototype.fill()? (in special cases)

Claudia Tue, 07 Jun 2022 10:30:18 -0700

Have you all also considered using `rep stos{b,w,d,q}` on x86 for this?
It's highly architecture-specific (ARM also has a variant on the way, but
it's not here yet:
<https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2021>),
but it might achieve a similar effect as a universal optimization.


On Friday, June 3, 2022 at 5:27:30 AM UTC-7 [email protected] wrote:

> Cool, glad this has worked out.
>
> I decided to play around a bit more and figured out a faster 
> implementation of TypedArray.prototype.fill() for arbitrary values BTW (3x 
> improvement for me):
>
> ```
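> function fill(arr, value) {
>   // Seed the first element, then repeatedly double the filled prefix.
>   arr[0] = value
>   for (let i = 1; i < arr.length; i *= 2) {
>     // Copy the filled prefix [0, i) to index i. copyWithin clamps the
>     // copy at the end of the array, so the last pass cannot overrun.
>     arr.copyWithin(i, 0, i)
>   }
>   return arr
> }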
> ```
>
> This works out because `TypedArray.prototype.copyWithin()` uses memmove, 
> so the array is filled in roughly log2(n) bulk copies instead of n 
> element stores. Maybe worth doing for 16/32/64-bit typed arrays, perhaps 
> even for plain old JavaScript arrays?
>
> Updated benchmark: https://jsbench.me/1zl3y8q6q7/2
>
> On Friday, June 3, 2022 at 12:33:14 PM UTC+1 Jakob Kummerow wrote:
>
>> I guess 0 is by far the most common case.
>>
>> A patch is easy enough: 
>> https://chromium-review.googlesource.com/c/v8/v8/+/3687700
>>
>> Locally, I'm seeing 1.5x - 2x improvement, far from 10x but clearly 
>> measurable for large arrays.
>>
>>
>> On Fri, Jun 3, 2022 at 1:10 PM Leszek Swirski <[email protected]> 
>> wrote:
>>
>>> I guess we could do it for anything where all two/four bytes of the fill 
>>> value are identical? Probably (aside from 0) an unusual case like you say, 
>>> but the complexity of checking it should be low enough.
>>>
>>> On Fri, Jun 3, 2022 at 11:56 AM Marja Hölttä <[email protected]> wrote:
>>>
>>>> Interesting, thanks for making us / me aware of this.
>>>>
>>>> It sounds like a generally reasonable optimization; at least in the 
>>>> fill(0) case. Not sure how common fill(-1) is but if it doesn't increase 
>>>> the code complexity too much, why not.
>>>>
>>>>
>>>> On Fri, Jun 3, 2022 at 11:34 AM Ashton Six <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed `new Int8Array(int32.buffer).fill(-1)` is equivalent to 
>>>>> `int32.fill(-1)` but 10x faster, since [Uint, Int]8.fill() uses memset 
>>>>> whereas every other implementation of TypedArray.fill() uses a for loop
>>>>> (https://github.com/llvm/llvm-project/blob/8bb1dbbf7544eaac3afab8d1f91b71f383dab903/libcxx/include/algorithm#L1987-L2005).
>>>>>
>>>>> Benchmark: https://jsbench.me/1zl3y8q6q7/1.
>>>>>
>>>>> Is it worth making this optimisation part of v8 for 
>>>>> TypedArray.fill(0), Int16Array.fill(-1) and Int32Array.fill(-1)?
>>>>>
>>>>> Regards,
>>>>> Ashton 
>>>>>
>>>>
>>>>
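
For reference, the idea discussed in this thread (take the byte-wise path whenever all bytes of the stored fill value are identical) can be sketched in userland JavaScript. This is only an illustration under stated assumptions, not the V8 patch linked above: `uniformByte` and `fillViaBytes` are hypothetical names, and the per-call probe allocation is there to keep the sketch short rather than fast.

```javascript
// Hypothetical helper: returns the repeated byte if `value`, once stored as
// an element of `arr`, consists of identical bytes; otherwise returns -1.
function uniformByte(arr, value) {
  // Store through the array's own constructor so the usual element
  // conversion (wrapping, rounding, clamping) is applied to the value.
  const probe = new arr.constructor(1);
  probe[0] = value;
  const bytes = new Uint8Array(probe.buffer);
  for (let i = 1; i < bytes.length; i++) {
    if (bytes[i] !== bytes[0]) return -1;
  }
  return bytes[0];
}

// Hypothetical wrapper: fills through a Uint8Array view over the same memory
// when the byte pattern is uniform, and falls back to fill() otherwise.
function fillViaBytes(arr, value) {
  const byte = uniformByte(arr, value);
  if (byte < 0) return arr.fill(value); // general case: regular fill
  // byteOffset/byteLength keep the write inside a subarray's own range.
  new Uint8Array(arr.buffer, arr.byteOffset, arr.byteLength).fill(byte);
  return arr;
}

// Usage: -1 is stored as 0xFFFFFFFF, so the byte-view path is taken;
// 1 is stored with differing bytes, so the regular fill() path is taken.
const int32 = new Int32Array(8);
fillViaBytes(int32, -1);
fillViaBytes(int32, 1);
```

Whether this is actually faster depends on the engine version, since it relies on the 8-bit fill being the memset-backed path as described at the start of the thread.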
