Re: [v8-users] Streams and InternalPackedArray queue scalability

Jakob Kummerow Wed, 18 Jan 2017 02:27:08 -0800

Interesting case :-)

The implementation of InternalPackedArray is essentially the same as plain
JavaScript "Array", the biggest difference is that InternalPackedArray is
not exposed to websites and therefore not monkey-patchable.
Array.shift() is inherently slow, because it requires all remaining
elements to be moved. (Array.push()/Array.pop() are much faster.)
V8 uses a nasty trick to make Array.shift() fast: object left-trimming,
where the object header is moved, so that the elements themselves can stay
in place. This is an ugly hack and we would like to get rid of it.
Left-trimming is not possible in "large object space", a special area of
V8's heap for large objects. The limit for what constitutes a "large
object" is the regular heap's page size, which is 512KB.
Arrays over-allocate their backing store in steps; when you start with "new
Array()" and keep push()ing onto it, the sequence of backing store
capacities is 4, 23, 52, 95, 160, 257, 403, 622, 950, 1442, 2180, 3287,
4948, 7439, 11176, 16781, 25189, 37801, *56719*, 85096, 127661, ...
So when the 56720th element is pushed, the backing store is grown to 85096
* 8 bytes = 680768 bytes > 512KB, so the new backing store must be
allocated in large-object space. Every subsequent shift() on that array is
an O(n) operation.
The reason the threshold is four times as high on your Android tablet is
because we ship 32-bit Chrome builds to Android (so you can fit 2x as many
pointers into the same space), and older V8 versions use 1MB pages (so you
can fit 2x as large objects into regular pages).


Long story short:
• your patch looks fine
• worrying about deopts is probably unnecessary, but only measuring will
tell for sure
• then again switching between two implementations is probably also
unnecessary (encapsulating the behavior you want into one class is probably
cleaner)
• you can split at 32768, sure, or at any random integer between 1 and
50000 (I wouldn't rely on the specific numbers I quoted above, because
that's an implementation detail and could change at any time)
• ideally you don't use Array.shift() at all, and instead have a "cursor"
index for the next element to be consumed; drop the Queue's "front" when
cursor === front.length (this would probably work well with a chunk size in
the [100..1000] range or so)
• if the chunks you're queuing are indeed typically 1K or bigger, then you
probably don't need to bother with arrays at all, and can just use a plain
simple linked list (because a few pointers of overhead per element are
negligible if they give you guaranteed O(1) push/shift performance
regardless of queue length), like so:

class Queue {
  constructor() {
    this.size = 0;
    this.front = null;
    this.back = null;
  }

  get length() { return this.size; }

  push(element) {
    ++this.size;
    if (this.size === 1) {
      this.front = this.back = { value: element, next: null };
    } else {
      var node = { value: element, next: null };
      this.back.next = node;
      this.back = node;
    }
  }

  shift() {
    --this.size;
    var result = this.front.value;
    this.front = this.front.next;
    return result;
  }
}

On Wed, Jan 18, 2017 at 8:24 AM, Jochen Eisinger <[email protected]>
wrote:

> +Domenic Denicola <[email protected]>
>
> On Wed, Jan 18, 2017 at 4:25 AM Adam Rice <[email protected]> wrote:
>
>> I work on the Chrome implementation of ReadableStream and WritableStream
>> <https://streams.spec.whatwg.org/>. They are implemented in Javascript
>> using v8 extras.
>>
>> They currently use an InternalPackedArray to implement a queue structure,
>> however I have found a scalability issue. The benchmarks and repro can be
>> found at http://crbug.com/681493 ("ReadableStream, WritableStream get
>> dramatically slower when queue grows to 56720 chunks").
>>
>> I have a proposed fix at http://crrev.com/2637863002. If possible, I
>> would like someone who is familiar with the implementation of
>> InternalPackedArray to review it.
>>
>> Particular things I'd like to know:
>>
>>    - Is it worth worrying about deopt or would it be better to use
>>    InternalPackedArray for small queues and only switch to Queue for larger
>>    ones?
>>    - Is 32768 a good size to split the arrays at? Can we be reasonably
>>    sure that it is small enough to get good behaviour on all platforms and
>>    architectures?
>>
>> and if possible
>>
>>    - why does performance change so dramatically at a threshold?
>>
>> Thanks,
>> Adam Rice
>>
>> --
>> --
>> v8-users mailing list
>> [email protected]
>> http://groups.google.com/group/v8-users
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "v8-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
> --
> --
> v8-users mailing list
> [email protected]
> http://groups.google.com/group/v8-users
> ---
> You received this message because you are subscribed to the Google Groups
> "v8-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
-- 
v8-users mailing list
[email protected]
http://groups.google.com/group/v8-users
--- 
You received this message because you are subscribed to the Google Groups 
"v8-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [v8-users] Streams and InternalPackedArray queue scalability

Reply via email to