Have you also considered using `rep stos{b,w,d,q}` on x86 for this?
It's highly architecture-specific (ARM has a variant on the way
<https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2021>,
but it isn't available yet), but it might achieve a similar effect as a
universal optimization, i.e. for any fill value rather than only special cases.
On Friday, June 3, 2022 at 5:27:30 AM UTC-7 [email protected] wrote:
> Cool, glad this has worked out.
>
> I decided to play around a bit more and figured out a faster
> implementation of TypedArray.prototype.fill() for arbitrary values BTW (3x
> improvement for me):
>
> ```
> function fill(arr, value) {
>   arr[0] = value
>   for (let i = 1; i < arr.length; i *= 2) {
>     arr.copyWithin(i, 0, i)
>   }
>   return arr
> }
> ```
>
> This works because `TypedArray.prototype.copyWithin()` uses memmove.
> Maybe worth doing for 16/32/64-bit typed arrays, perhaps even for plain old
> JavaScript arrays?
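
For anyone who wants to check the doubling trick quoted above, here is a minimal sketch that compares it against the built-in `fill()`. The names `doublingFill` and `matchesNativeFill` are made up for illustration, and the empty-array guard is my addition so that the same function is also safe on plain arrays (which have `copyWithin()` too).

```javascript
// Doubling fill: seed element 0, then repeatedly copy the filled prefix
// [0, i) to position i. copyWithin() clamps the copy at the end of the array.
function doublingFill(arr, value) {
  if (arr.length === 0) return arr // guard added for this sketch
  arr[0] = value
  for (let i = 1; i < arr.length; i *= 2) {
    arr.copyWithin(i, 0, i)
  }
  return arr
}

// Hypothetical helper: true if doublingFill() and the built-in fill()
// produce the same contents for the given typed array constructor.
function matchesNativeFill(Ctor, length, value) {
  const a = doublingFill(new Ctor(length), value)
  const b = new Ctor(length).fill(value)
  return a.every((x, i) => Object.is(x, b[i]))
}
```

The loop runs about log2(length) times, so nearly all of the work happens inside `copyWithin()`.
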
>
> Updated benchmark: https://jsbench.me/1zl3y8q6q7/2
>
> On Friday, June 3, 2022 at 12:33:14 PM UTC+1 Jakob Kummerow wrote:
>
>> I guess 0 is by far the most common case.
>>
>> A patch is easy enough:
>> https://chromium-review.googlesource.com/c/v8/v8/+/3687700
>>
>> Locally, I'm seeing 1.5x - 2x improvement, far from 10x but clearly
>> measurable for large arrays.
>>
>>
>> On Fri, Jun 3, 2022 at 1:10 PM Leszek Swirski <[email protected]>
>> wrote:
>>
>>> I guess we could do it for anything where all two/four bytes of the fill
>>> value are identical? Probably (aside from 0) an unusual case like you say,
>>> but the complexity of checking it should be low enough.
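
A rough user-land sketch of the "all bytes identical" check described above. This is not the V8 patch, which would live in C++; `repeatedByte` and `fillViaBytes` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: store the value in a one-element array of the given
// type and inspect its bytes. Returns the repeated byte (0..255) if all
// bytes are identical, otherwise -1.
function repeatedByte(Ctor, value) {
  const bytes = new Uint8Array(new Ctor([value]).buffer)
  return bytes.every((b) => b === bytes[0]) ? bytes[0] : -1
}

// Use a byte-wise fill when the check passes, otherwise fall back to the
// ordinary element-wise fill().
function fillViaBytes(arr, value) {
  const b = repeatedByte(arr.constructor, value)
  if (b === -1) return arr.fill(value)
  new Uint8Array(arr.buffer, arr.byteOffset, arr.byteLength).fill(b)
  return arr
}
```

With this check, 0 and -1 qualify for every integer element type, as does a value like 0x0101 in an `Int16Array`; a value like 1 in an `Int32Array` takes the fallback path.
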
>>>
>>> On Fri, Jun 3, 2022 at 11:56 AM Marja Hölttä <[email protected]> wrote:
>>>
>>>> Interesting, thanks for making us / me aware of this.
>>>>
>>>> It sounds like a generally reasonable optimization; at least in the
>>>> fill(0) case. Not sure how common fill(-1) is but if it doesn't increase
>>>> the code complexity too much, why not.
>>>>
>>>>
>>>> On Fri, Jun 3, 2022 at 11:34 AM Ashton Six <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed `new Int8Array(int32.buffer).fill(-1)` is equivalent to
>>>>> `int32.fill(-1)` but 10x faster, since `Int8Array`/`Uint8Array` `fill()`
>>>>> uses memset whereas every other implementation of TypedArray.fill() uses
>>>>> a for loop (
>>>>> https://github.com/llvm/llvm-project/blob/8bb1dbbf7544eaac3afab8d1f91b71f383dab903/libcxx/include/algorithm#L1987-L2005).
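
A small sketch of the equivalence described above, plus two caveats that are easy to miss. The variable names are illustrative only. The equivalence depends on every byte of the value being the same, and a byte view built from `.buffer` alone covers the whole buffer rather than just the typed array's own range.

```javascript
// Fill an Int32Array with -1 element by element, and byte by byte.
const viaElements = new Int32Array(8).fill(-1)
const viaBytes = new Int32Array(8)
new Int8Array(viaBytes.buffer).fill(-1) // every byte becomes 0xFF

// For a value whose bytes differ, the byte-wise fill gives another result:
// each byte is 0x01, so the element reads back as 0x01010101.
const ones = new Int32Array(1)
new Int8Array(ones.buffer).fill(1)

// A view may cover only part of its buffer, so pass offset and length
// to avoid overwriting neighbouring elements.
const backing = new Int32Array(4)
const view = backing.subarray(1, 3)
new Int8Array(view.buffer, view.byteOffset, view.byteLength).fill(-1)
```
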
>>>>>
>>>>> Benchmark: https://jsbench.me/1zl3y8q6q7/1.
>>>>>
>>>>> Is it worth making this optimisation part of v8 for
>>>>> TypedArray.fill(0), Int16Array.fill(-1) and Int32Array.fill(-1)?
>>>>>
>>>>> Regards,
>>>>> Ashton
>>>>>
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev