Re: Why is querySelector much slower?

2015-04-29 Thread Boris Zbarsky

On 4/29/15 12:23 AM, Glen Huang wrote:

because of the proxy machinery involved on the JS engine side


Do you mean the cost introduced by passing a C++ object into the ECMAScript world?


No, it's the cost introduced by needing custom property behavior for 
integer-named properties on lists (and in particular the need to have 
them appear and disappear, not be redefinable, and so forth).  The 
upshot in the SpiderMonkey implementation is that the list object is a 
proxy in the ES6 sense, which makes the [0] access take a slower path 
through the JS engine than I'd like.
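
The custom behavior Boris describes can be illustrated with a plain ES6 Proxy. This is only a sketch of the required semantics, not Gecko's actual machinery (which lives in SpiderMonkey's C++ internals); all names here are invented:

```javascript
// Sketch of a live-list-like object whose integer-named properties
// "appear and disappear" with the backing data and are not redefinable.
// This is why such an object must be an exotic object (a Proxy in ES6
// terms) rather than a plain JS object, and why [0] access is slower.
function makeLiveList(backing) {
  return new Proxy({}, {
    get(target, prop) {
      if (typeof prop === "symbol") return undefined;
      if (prop === "length") return backing.length;
      const i = Number(prop);
      if (Number.isInteger(i) && i >= 0 && i < backing.length) {
        return backing[i]; // recomputed on every access
      }
      return undefined;
    },
    has(target, prop) {
      if (typeof prop === "symbol") return false;
      const i = Number(prop);
      return prop === "length" ||
        (Number.isInteger(i) && i >= 0 && i < backing.length);
    },
    defineProperty() {
      return false; // indexed properties can't be redefined
    },
  });
}

const items = ["a", "b"];
const list = makeLiveList(items);
console.log(list[0]);   // "a"
items.pop();            // mutation is visible through the list...
console.log(1 in list); // false: the property has "disappeared"
```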


-Boris



Re: Why is querySelector much slower?

2015-04-28 Thread Boris Zbarsky

On 4/28/15 2:44 AM, Glen Huang wrote:

But if I do getElementsByClass()[0], and the LiveNodeList is immediately
garbage collectable


Then it will only be collected the next time GC happens.  Which might 
not be for a while.


-Boris



Re: Why is querySelector much slower?

2015-04-28 Thread Boris Zbarsky

On 4/28/15 1:58 AM, Glen Huang wrote:

Just a quick follow-up question to quench my curiosity: if I do list[1] and no one has 
ever asked the list for any element, Gecko will find the first two matching elements and store 
them in the list; if I then immediately do list[0], the first element is returned 
without walking the DOM (assuming there are at least two matching elements)?


Yes, exactly.


querySelector("foo") and getElementsByTagName("foo")[0] can return different 
nodes


Still a bit confused regarding this. If the premise is the selector only 
contains characters allowed in a tag name


Then in that case I _think_ those are now equivalent, though I'd have to 
check it carefully.  They didn't use to be back when 
getElementsByTagName was defined as matching on qualified name, not 
localname...


-Boris



Re: Why is querySelector much slower?

2015-04-28 Thread Boris Zbarsky

On 4/28/15 2:13 AM, Glen Huang wrote:

On second thought, if the list returned by getElementsByClass() is lazy
populated as Boris says, it shouldn't be a problem. The list is only
updated when you access that list again.


Mutations have to check whether the list is marked dirty already or not.

This is not too bad if you only have a few lists around, but if you have 
several thousand it can start to hurt.


-Boris



Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
Wow, this is pure gold. Thank you so much for such a thorough explanation; you 
even took the trouble to actually implement optimizations to make sure the 
numbers are right. I'm so grateful for the work you put into this just to 
answer my question. How do I accept your answer here? ;)

 So what you're seeing is that the benchmark claims the operation is performed 
 in 1-2 clock cycles

I never thought about relating ops/sec numbers to clock cycles. Thanks for the 
tip.

 So what this getElementById benchmark measures is how fast a loop counter can 
 be decremented from some starting value to 0.

This makes so much sense now.

 because of the proxy machinery involved on the JS engine side

Do you mean the cost introduced by passing a C++ object into the ECMAScript world?

 In this case, those all seem to have about the same cost;

I now see why querySelector has some extra work to do.

 But for real-life testcases algorithmic complexity can often be much more 
 important.

Yes. But I suddenly find microbenchmarks to be a wonderful conversation 
starter. ;)

Thanks again for all the explanations, I'm motivated by them to actually dig 
into the engine source code to discover things myself next time (probably not 
easy, but should be rewarding). :)


Re: Why is querySelector much slower?

2015-04-28 Thread Boris Zbarsky

On 4/28/15 2:59 AM, Glen Huang wrote:

Looking at the microbenchmark again, for Gecko, getElementById is around 300x 
faster than querySelector('#id'), and even getElementsByClassName is faster 
than it.


This is why one should not write microbenchmarks.  ;)  Or at least if 
one does, examine the results very carefully.


The numbers you see for the getElementById benchmark there are on the 
order of 2e9 operations per second, yes?  And modern desktop/laptop CPUs 
are clocked somewhere in the 2-4 GHz range.  So what you're seeing is 
that the benchmark claims the operation is performed in 1-2 clock 
cycles.  This should seem unlikely if you think the operation involves a 
hashtable lookup!


What's happening there is that Gecko happens to know at JIT compile time 
in this microbenchmark:


1)  The bareword lookup is going to end up at the global, because there 
is nothing on the scope chain that would shadow the "document" name.
2)  The global has an own property named "document" whose getter is 
side-effect-free.
3)  The return value of the "document" property has only been observed 
to be a Document.
4)  Looking up getElementById on the return value of the "document" 
property has consistently found it on Document.prototype.
5)  Document.prototype.getElementById is known to be side-effect-free.
6)  The return value of getElementById is not used (assigned to a 
function-local variable that is then not used).


The upshot of all that is that with a few guards both the 
"getElementById" call and the "document" get can be dead-code 
eliminated here.  And even if you stored the value somewhere persistent 
they could both still be loop-hoisted in this case.  So what this 
getElementById benchmark measures is how fast a loop counter can be 
decremented from some starting value to 0.  It happens that this can be 
done in about 1-2 clock cycles per loop iteration.
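
As a rough illustration of the shape being optimized (not the actual jsperf harness; "document" is stubbed here with an invented side-effect-free getter so the sketch runs anywhere):

```javascript
// Stand-in for the DOM in this sketch (no real `document` outside a browser).
const document = {
  getElementById: (id) => ({ id }), // side-effect-free, like the guarded real path
};

function benchmarkBody(n) {
  for (let i = 0; i < n; i++) {
    // `el` is a dead local: with the six guards listed above in place,
    // a JIT can eliminate this whole call, leaving only the counter.
    const el = document.getElementById("list");
  }
  return n; // all that's left to measure: counting the loop down
}

console.log(benchmarkBody(1000)); // 1000
```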


OK, so what about querySelector("#id") vs getElementsByClassName?

In the former case, loop-hoisting and dead code elimination are 
disallowed because querySelector can throw.  That means that you can't 
eliminate it, and you can't move it past other things that might have 
observable side effects (like the counter increment).  Arguably this is 
a misfeature in the design of querySelector.


In the latter case, loop-hoisting or dead code elimination can't happen 
because Gecko doesn't know enough about what the [0] will do, so it 
assumes the worst: that it can have side effects that affect what the 
document getter returns as well as what the getElementsByClassName() 
call returns.


So there are no shortcuts here; you have to actually do the calls. What 
do those calls do?


querySelector does a hashtable lookup for the selector to find a parsed 
selector.  Then it sets up some state that's needed for selector 
matching.  Then it detects that the selector's right-hand-most bit has a 
simple ID selector and does a fast path that involves looking up that id 
in the hashtable and then comparing the selector to the elements that 
are returned until one of them matches.
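
A toy model of that fast path might look like the following. Every name here is invented and the real logic is C++ inside Gecko; this sketch only understands a bare "#id" selector:

```javascript
// Toy model of the querySelector ID fast path described above.
function parseSelector(text) {
  const m = /^#([\w-]+)$/.exec(text); // only handles "#id" in this sketch
  return { rightmostId: m ? m[1] : null };
}

function matchesSelector(el, sel) {
  return el.id === sel.rightmostId; // real matching is far more involved
}

function querySelectorSketch(doc, selectorText) {
  // 1) Hashtable lookup for a cached parsed selector.
  let sel = doc.parsedSelectorCache.get(selectorText);
  if (!sel) {
    sel = parseSelector(selectorText);
    doc.parsedSelectorCache.set(selectorText, sel);
  }
  // 2) Fast path: the rightmost bit is a simple ID selector, so look
  //    that id up in the id hashtable...
  if (sel.rightmostId !== null) {
    const candidates = doc.idTable.get(sel.rightmostId) || [];
    // 3) ...and compare the selector against the returned elements.
    for (const el of candidates) {
      if (matchesSelector(el, sel)) return el;
    }
    return null;
  }
  return null; // slow path (full tree walk) omitted from this sketch
}

// Toy document with one element whose id is "list":
const doc = {
  parsedSelectorCache: new Map(),
  idTable: new Map([["list", [{ id: "list" }]]]),
};
console.log(querySelectorSketch(doc, "#list").id); // "list"
```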


getElementsByClassName has to do a hashtable lookup on the class name, 
then return the result.  Then it has to do the [0] (which is actually 
surprisingly expensive, by the way, because of the proxy machinery 
involved on the JS engine side).


So we _could_ make querySelector faster here by adding another special 
case for selectors that are _just_ an id as opposed to the existing 
optimization (which works for "#foo #bar" and similar as well).  And 
of course the new special case would only work the way you want for 
document.querySelector, not element.querySelector; the latter needs to 
check for your result being a descendant of the element anyway.  It's a 
tradeoff between complexity of implementation (which has its own 
maintenance _and_ performance costs) and real-life use cases.


Lastly, I'd like to put numbers to this.  On this particular testcase, 
the querySelector("#list") call takes about 100ns on my hardware: about 
300 CPU cycles.  We could add that other set of special-casing and get 
it down to 70ns (I just checked by implementing it, so this is not a 
random guess).  At that point you've got two hashtable lookups (which we 
could try to make faster, perhaps), the logic to detect that the 
optimization can be done at all (which is not that trivial; our selector 
representation requires a bunch of checks to ensure that it's just an id 
selector), and whatever work is involved in the binding layer.  In this 
case, those all seem to have about the same cost; about 17-18ns (50 CPU 
cycles) each.


So is your use case one where the difference between querySelector 
costing 100ns and it costing 70ns actually makes a difference?



It doesn't look like it benefits much from an eagerly populated hash table?


It benefits a good bit for non-toy documents where avoiding walking the 
entire DOM is the important part of the optimization.  Again, 
microbenchmarks mostly serve to highlight the 

Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
Wow, it's now super clear. Thanks for the detailed explanation.

Just a quick follow-up question to quench my curiosity: if I do list[1] and 
no one has ever asked the list for any element, Gecko will find the first two 
matching elements and store them in the list; if I then immediately do 
list[0], the first element is returned without walking the DOM (assuming 
there are at least two matching elements)? 

 querySelector("foo") and getElementsByTagName("foo")[0] can return different 
 nodes

Still a bit confused regarding this. If the premise is the selector only 
contains characters allowed in a tag name, how can they return different nodes? 
Maybe I missed something? Unless you mean querySelector(":foo") and 
getElementsByTagName(":foo")[0] can return different results, which is obvious.

If by parsing the passed selector (or looking up the cached parsed selectors) you 
know it only contains a tag name, why is it a bit harder to optimize? You just 
look up the (tagname, root) hash table, no?

 In practice this hasn't come up as a bottleneck in anything we've profiled so 
 far

I'm probably prematurely optimizing my code. But nevertheless learned something 
quite valuable by asking. Thanks for answering it. :)


Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
 querySelector with an id selector does in fact benefit from the id hashtable

Looking at the microbenchmark again, for Gecko, getElementById is around 300x 
faster than querySelector('#id'), and even getElementsByClassName is faster 
than it. It doesn't look like it benefits much from an eagerly populated hash 
table?

P.S. It's very interesting to see that Gecko is around 100x faster than the 
others when it comes to getElementById performance. It probably does something 
unusual?


Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
 Live node lists make all dom mutation slower
 
Haven't thought about this before. Thank you for pointing it out. So if I use, 
for example, lots of getElementsByClass() in the code, I'm actually slowing 
down all DOM mutating APIs?

Re: Why is querySelector much slower?

2015-04-28 Thread Elliott Sprehn
On Mon, Apr 27, 2015 at 11:13 PM, Glen Huang curvedm...@gmail.com wrote:

 On second thought, if the list returned by getElementsByClass() is lazy
 populated as Boris says, it shouldn't be a problem. The list is only
 updated when you access that list again.


The invalidation is what makes your code slower. Specifically any time you
mutate the tree, and you have live node lists, we traverse ancestors to
mark them as needing to be updated.

Blink (and likely other browsers) will eventually garbage collect the
LiveNodeList and then your DOM mutations will get faster again.
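
A deliberately simplified model of that invalidation cost (names invented; real engines track this with node flags rather than walking list registries directly):

```javascript
// Every mutation walks the ancestor chain and dirties any live lists
// rooted along the way -- the tax Elliott describes.
class SketchNode {
  constructor() {
    this.parent = null;
    this.liveLists = []; // live lists rooted at this node
  }
  appendChild(child) {
    child.parent = this;
    // The cost live lists add to mutation: walk up, mark lists dirty.
    for (let n = this; n !== null; n = n.parent) {
      for (const list of n.liveLists) list.dirty = true;
    }
    return child;
  }
}

const root = new SketchNode();
const list = { dirty: false }; // stands in for a LiveNodeList
root.liveLists.push(list);

const child = root.appendChild(new SketchNode());
child.appendChild(new SketchNode()); // mutation deep in the tree...
console.log(list.dirty); // true: even the deep mutation dirtied it
```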



 On Apr 28, 2015, at 2:08 PM, Glen Huang curvedm...@gmail.com wrote:

 Live node lists make all dom mutation slower

 Haven't thought about this before. Thank you for pointing it out. So if I
 use, for example, lots of getElementsByClass() in the code, I'm actually
 slowing down all DOM mutating APIs?





Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
But if I do getElementsByClass()[0], and the LiveNodeList is immediately garbage 
collectable, then if I change the DOM, Blink won't traverse ancestors, thus no 
penalty for DOM mutation?

 On Apr 28, 2015, at 2:28 PM, Elliott Sprehn espr...@chromium.org wrote:
 
 
 
 On Mon, Apr 27, 2015 at 11:13 PM, Glen Huang curvedm...@gmail.com wrote:
 On second thought, if the list returned by getElementsByClass() is lazy 
 populated as Boris says, it shouldn't be a problem. The list is only updated 
 when you access that list again.
 
 The invalidation is what makes your code slower. Specifically any time you 
 mutate the tree, and you have live node lists, we traverse ancestors to mark 
 them as needing to be updated.
 
 Blink (and likely other browsers) will eventually garbage collect the 
 LiveNodeList and then your DOM mutations will get faster again.
  
 
 On Apr 28, 2015, at 2:08 PM, Glen Huang curvedm...@gmail.com wrote:
 
 Live node lists make all dom mutation slower
 
 Haven't thought about this before. Thank you for pointing it out. So if I 
 use, for example, lots of getElementsByClass() in the code, I'm actually 
 slowing down all DOM mutating APIs?
 
 



Re: Why is querySelector much slower?

2015-04-28 Thread Glen Huang
On second thought, if the list returned by getElementsByClass() is lazily 
populated as Boris says, it shouldn't be a problem. The list is only updated 
when you access that list again.

 On Apr 28, 2015, at 2:08 PM, Glen Huang curvedm...@gmail.com wrote:
 
 Live node lists make all dom mutation slower
 
 Haven't thought about this before. Thank you for pointing it out. So if I 
 use, for example, lots of getElementsByClass() in the code, I'm actually 
 slowing down all DOM mutating APIs?



RE: Why is querySelector much slower?

2015-04-27 Thread François REMY
Not sure this is a public-webapps issue, but I'm pretty sure the reason is 
that the return values of "getElement*By*(...)" are cached by the browser, 
which means that at some point in the loop you end up not doing the work at 
all, while you probably do it every single time for "querySelector", which 
cannot return a cached result.



From: curvedm...@gmail.com
Date: Mon, 27 Apr 2015 16:57:23 +0800
To: public-webapps@w3.org
Subject: Why is querySelector much slower?

Intuitively, querySelector('.class') only needs to find the first matching 
node, whereas getElementsByClassName('.class')[0] needs to find all matching 
nodes and then return the first. The former should be a lot quicker than the 
latter. Why is that not the case?
See http://jsperf.com/queryselectorall-vs-getelementsbytagname/119 for the test.
I know querySelectorAll is slow because of the static nature of the returned 
NodeList, but this shouldn't be an issue for querySelector. 
 

Re: Why is querySelector much slower?

2015-04-27 Thread Jonas Sicking
On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang curvedm...@gmail.com wrote:
 Intuitively, querySelector('.class') only needs to find the first matching
 node, whereas getElementsByClassName('.class')[0] needs to find all matching
 nodes and then return the first. The former should be a lot quicker than the
 latter. Why that's not the case?

I can't speak for other browsers, but Gecko-based browsers only search
the DOM until the first hit for getElementsByClassName('class')[0].
I'm not sure why you say that it must scan for all hits.

/ Jonas



Re: Why is querySelector much slower?

2015-04-27 Thread Glen Huang
Thank you for the sample code. It's very helpful.

When you say "var node = list[0];" walks the DOM until the first item is found, 
do you mean it only happens when some previous code has changed the DOM 
structure? If not, the returned list object will be marked as up-to-date, and 
accessing the first element is very cheap? I ask because in the first paragraph 
you said the returned list and the returned first element are probably 
precomputed.

Also, this is my mental model after reading your explanation, I wonder if 
that's correct:

After the UA has parsed the HTML, it caches a hash table of elements with class 
names (also all elements with ids, all elements with tag names, etc. in 
different hash tables), keyed under the class names. When 
getElementsByClassName() is called, and the DOM hasn't been modified, it simply 
creates a list of elements with that class name from the hash table. When the 
first element is accessed from that list, and the DOM still isn't modified, the 
element is returned directly.

The hash table is kept in sync with the DOM when it's modified. And if the DOM 
is changed after the list is returned but before it's accessed, the list will 
be marked as dirty, and accessing its elements will walk the DOM (and mark the 
list as partially updated after that).

Is this description correct?

And the final question:

Why can't querySelector benefit from these hash tables? I currently feel the 
urge to optimize it myself by overriding it with a custom function which will 
parse the passed selector, and if it's a simple selector like "div", ".class", 
or "#id", call the corresponding getElement*() function instead. Why can't UAs 
perform this for us?
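
Such an override might look roughly like this (illustration only; as Boris warns in his reply, a shim along these lines is subtly incorrect in some cases, e.g. it ignores element.querySelector's requirement that the result be a descendant, and tag-name corner cases):

```javascript
// Userland sketch of the shim described above. Written as a plain
// function over a document-like object rather than a prototype patch.
function fastQuerySelector(doc, sel) {
  if (/^#[\w-]+$/.test(sel)) return doc.getElementById(sel.slice(1));
  if (/^\.[\w-]+$/.test(sel)) {
    return doc.getElementsByClassName(sel.slice(1))[0] || null;
  }
  if (/^[\w-]+$/.test(sel)) return doc.getElementsByTagName(sel)[0] || null;
  return doc.querySelector(sel); // anything else: the native slow path
}
```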

If my mental model is correct, it's simpler than getElement*() from a UA's 
point of view. It simply needs to look up the first matching element from the 
hash table and return it; no need to return a list and mark it as clean or 
dirty any more. The only price it pays is parsing the selector.

Is it because authors don't use querySelector often enough that UAs aren't 
interested in optimizing it?

 On Apr 27, 2015, at 9:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 
 On 4/27/15 4:57 AM, Glen Huang wrote:
 Intuitively, querySelector('.class') only needs to find the first
 matching node, whereas getElementsByClassName('.class')[0] needs to find
 all matching nodes
 
 Not true; see below.
 
 and then return the first. The former should be a lot
 quicker than the latter. Why that's not the case?
 
 See http://jsperf.com/queryselectorall-vs-getelementsbytagname/119 for
 the test
 
 All getElementsByClassName(".foo") has to do in a microbenchmark like this is 
 look up a cached list (probably a single hashtable lookup) and return its 
 first element (likewise precomputed, unless you're modifying the DOM in ways 
 that would affect the list).  It doesn't have to walk the tree at all.
 
 querySelector(".foo"), on the other hand, probably walks the tree at the 
 moment in implementations.
 
 Also, back to the not true above: since the list returned by getElementsBy* 
 is live and periodically needs to be recomputed anyway, and since grabbing 
 just its first element is a common usage pattern, Gecko's implementation is 
 actually lazy (see https://bugzilla.mozilla.org/show_bug.cgi?id=104603#c0 for 
 the motivation): it will only walk as much of the DOM as needed to reply to 
 the query being made.  So for example:
 
  // Creates a list object, doesn't do any walking of the DOM, marks
  // object as dirty and returns it.
  var list = document.getElementsByClassName(".foo");
 
  // Walks the DOM until it finds the first element of the list, marks
  // the list as partially updated, and returns that first element.
  var node = list[0];
 
  // Marks the list as dirty again, since the set of nodes it matches
  // has changed
  document.documentElement.className = "foo";
 
 I can't speak for other UAs here, but the assumption that 
 getElementsByClassName('.class')[0] needs to find all matching nodes is just 
 not true in Gecko.
 
 -Boris




Re: Why is querySelector much slower?

2015-04-27 Thread Ryosuke Niwa

 On Apr 27, 2015, at 7:04 PM, Jonas Sicking jo...@sicking.cc wrote:
 
 On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang curvedm...@gmail.com wrote:
 Intuitively, querySelector('.class') only needs to find the first matching
 node, whereas getElementsByClassName('.class')[0] needs to find all matching
 nodes and then return the first. The former should be a lot quicker than the
 latter. Why that's not the case?
 
 I can't speak for other browsers, but Gecko-based browsers only search
 the DOM until the first hit for getElementsByClassName('class')[0].
 I'm not sure why you say that it must scan for all hits.

WebKit (and, AFAIK, Blink) has the same optimization. It's a very important 
optimization.

- R. Niwa




Re: Why is querySelector much slower?

2015-04-27 Thread Glen Huang
I wonder why querySelector can't get the same optimization: if the passed 
selector is a simple selector like ".class", do exactly as 
getElementsByClassName('class')[0] does?

 On Apr 28, 2015, at 10:51 AM, Ryosuke Niwa rn...@apple.com wrote:
 
 
 On Apr 27, 2015, at 7:04 PM, Jonas Sicking jo...@sicking.cc wrote:
 
 On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang curvedm...@gmail.com wrote:
 Intuitively, querySelector('.class') only needs to find the first matching
 node, whereas getElementsByClassName('.class')[0] needs to find all matching
 nodes and then return the first. The former should be a lot quicker than the
 latter. Why that's not the case?
 
 I can't speak for other browsers, but Gecko-based browsers only search
 the DOM until the first hit for getElementsByClassName('class')[0].
 I'm not sure why you say that it must scan for all hits.
 
 WebKit (and, AFAIK, Blink) has the same optimization. It's a very important 
 optimization.
 
 - R. Niwa
 




Re: Why is querySelector much slower?

2015-04-27 Thread Boris Zbarsky

On 4/27/15 11:27 PM, Glen Huang wrote:

When you say "var node = list[0];" walks the DOM until the first item is found, 
do you mean it only happens when some previous code has changed the DOM 
structure?


Or that no one has ever asked the list for its [0] element before, yes.


If not, the returned list object will be marked as up-to-date, and accessing the 
first element is very cheap?


In Gecko, yes.


I ask because in the first paragraph you said the returned list and the returned 
first element are probably precomputed.


In the case of the microbenchmark where you just ask for it repeatedly 
without changing the DOM, yes.



After the UA has parsed the HTML, it caches a hash table of elements with class 
names (also all elements with ids, all elements with tag names, etc. in 
different hash tables), keyed under the class names.


At least in Gecko, that's not how it works.

There _is_ a hashtable mapping ids to element lists used for 
getElementById that is populated eagerly.


There is also a hashtable mapping the pair (class string, root) to an 
element list that's used by getElementsByClassName and is populated 
lazily.  Likewise, a hashtable mapping the pair (tagname, root) to an 
element list that's used by getElementsByTagName; this one is also 
populated lazily.


 When getElementsByClassName() is called, and the DOM hasn't been 
modified, it simply creates a list of elements with that class name from 
the hash table.


No, it just gets the list pointer from the hashtable (if any) and 
returns it.  That is, getElementsByClassName("foo") === 
getElementsByClassName("foo") tests true.  If there is no list pointer 
in the hashtable, an empty list is created, stored in the hashtable, and 
returned.
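
In other words, something like this toy cache (names invented), where repeated calls hand back the identical list object:

```javascript
// Toy version of the (class name -> list) cache kept per root.
function getListFor(root, className) {
  if (!root.classLists) root.classLists = new Map();
  let list = root.classLists.get(className);
  if (!list) {
    list = []; // empty list object, to be populated lazily on access
    root.classLists.set(className, list);
  }
  return list; // the same object on every call
}

const doc = {};
console.log(getListFor(doc, "foo") === getListFor(doc, "foo")); // true
```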



When the first element is accessed from that list, and the DOM still isn't 
modified, the element is returned directly.


If the list has computed it before.  If not, it walks the DOM until it 
finds its first element, then adds that one element to the list and 
returns it.
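
The lazy population described in this thread can be modeled with a small sketch (names invented; the real list also has the dirty-marking and partial-update states discussed elsewhere in the thread):

```javascript
// The list materializes matches only as far as an access requires,
// caching what it has found so far.
class LazyListSketch {
  constructor(makeIterator) {
    this.makeIterator = makeIterator; // walks the "DOM" yielding matches
    this.cached = [];
    this.iter = null;
    this.done = false;
  }
  item(i) {
    if (this.iter === null) this.iter = this.makeIterator();
    // Walk only until we have enough elements to answer this query.
    while (!this.done && this.cached.length <= i) {
      const { value, done } = this.iter.next();
      if (done) this.done = true;
      else this.cached.push(value);
    }
    return this.cached[i]; // earlier indices come straight from cache
  }
}

// Toy "DOM walk" that counts how many nodes it visits:
let visited = 0;
function* walkMatches() {
  for (const el of ["a", "b", "c", "d"]) { visited++; yield el; }
}

const lazy = new LazyListSketch(walkMatches);
console.log(lazy.item(1), visited); // b 2  (walked just two nodes)
console.log(lazy.item(0), visited); // a 2  (no further walking needed)
```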



The hash table is kept in sync with the DOM when it's modified.


The id hashtable is.  The class/tagname hashtables aren't kept in sync 
per se, since they're lazily populated anyway, but the lists stored in 
them may need to be marked dirty.



And if the DOM is changed after the list is returned but before it's accessed


Or before the list is returned.  The order really doesn't matter here; 
what matters is whether the DOM is changed after the previous access, if 
any.



Why can't querySelector benefit from these hash tables?


It could, somewhat.  querySelector with an id selector does in fact 
benefit from the id hashtable.  For the specific case of querySelector 
with a class selector, we _could_ internally try to optimize a bit, 
especially for the class case (the tag name case is a bit harder 
because, for example, querySelector("foo") and 
getElementsByTagName("foo")[0] can return different nodes depending on 
the value of the string "foo" and whether it contains any ':' characters 
and whatnot).



I currently feel the urge to optimize it myself by overriding it with a custom function which will parse the 
passed selector, and if it's a simple selector like "div", ".class", or "#id", 
call the corresponding getElement*() function instead.


Then you'll end up with incorrect behavior in some cases, which you may 
of course not care about.



Why can't UAs perform this for us?


To some extent they could.  In practice this hasn't come up as a 
bottleneck in anything we've profiled so far, so people have avoided 
adding what seems like unnecessary complexity, but if there's a 
real-life example (not a microbenchmark) where this is being a problem 
that would certainly help a lot with getting this sort of thing on the 
radar.



If my mental model is correct


It's not quite.


The only price it pays is parsing the selector.


Not even that, possibly; at least Gecko has a cache of parsed selectors.


Is it because authors don't use querySelector often enough that UAs aren't 
interested in optimizing it?


Or more precisely don't use it in ways that make it the performance 
bottleneck.


-Boris



Re: Why is querySelector much slower?

2015-04-27 Thread Elliott Sprehn
Live node lists make all DOM mutation slower, so while it might look faster
in your benchmark it's actually slower elsewhere (e.g. appendChild).

Do you have a real application where you see querySelector as the
bottleneck?
On Apr 27, 2015 5:32 PM, Glen Huang curvedm...@gmail.com wrote:

 I wonder why querySelector can't get the same optimization: If the passed
 selector is a simple selector like .class, do exactly as
 getElementsByClassName('class')[0] does?

  On Apr 28, 2015, at 10:51 AM, Ryosuke Niwa rn...@apple.com wrote:
 
 
  On Apr 27, 2015, at 7:04 PM, Jonas Sicking jo...@sicking.cc wrote:
 
  On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang curvedm...@gmail.com
 wrote:
  Intuitively, querySelector('.class') only needs to find the first
 matching
  node, whereas getElementsByClassName('.class')[0] needs to find all
 matching
  nodes and then return the first. The former should be a lot quicker
 than the
  latter. Why that's not the case?
 
  I can't speak for other browsers, but Gecko-based browsers only search
  the DOM until the first hit for getElementsByClassName('class')[0].
  I'm not sure why you say that it must scan for all hits.
 
  WebKit (and, AFAIK, Blink) has the same optimization. It's a very
 important optimization.
 
  - R. Niwa