Re: Why is querySelector much slower?
On 4/29/15 12:23 AM, Glen Huang wrote:
>> because of the proxy machinery involved on the JS engine side
> Do you mean the cost introduced by passing a C++ object into ecmascript world?

No, it's the cost introduced by needing custom property behavior for integer-named properties on lists (and in particular the need to have them appear and disappear, not be redefinable, and so forth). The upshot in the SpiderMonkey implementation is that the list object is a proxy in the ES6 sense, which makes the [0] access take a slower path through the JS engine than I'd like.

-Boris
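The proxy behavior Boris describes can be illustrated in script with an ES6 `Proxy`. This is only a rough model of the semantics, not SpiderMonkey's implementation: integer-named properties on a live list must appear and disappear as the underlying collection changes and must not be redefinable, so every `[i]` access goes through a trap rather than a plain property load. The `makeLiveList` helper is invented for this sketch.

```javascript
// Illustrative sketch only: a stand-in "live list" over a plain array,
// modeling why indexed access on a live collection takes a trap.
function makeLiveList(backing) {
  return new Proxy({}, {
    get(target, key) {
      if (typeof key !== "string") return undefined;
      if (key === "length") return backing.length;
      const i = Number(key);
      // Every indexed access takes this trap — a slower path than a
      // plain data property would be.
      if (Number.isInteger(i) && i >= 0 && i < backing.length) return backing[i];
      return undefined;
    },
    has(target, key) {
      if (typeof key !== "string") return false;
      if (key === "length") return true;
      const i = Number(key);
      return Number.isInteger(i) && i >= 0 && i < backing.length;
    },
    defineProperty() {
      // Indexed properties are not redefinable from script.
      return false;
    },
  });
}

const items = ["a", "b"];
const list = makeLiveList(items);
console.log(list[0], 1 in list);     // "a" true
items.pop();                         // mutate the underlying "DOM"
console.log(list.length, 1 in list); // 1 false — the property disappeared
```

The point of the sketch is only that indexed properties here are computed on every access, which is the kind of machinery that makes `list[0]` more expensive than it looks.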
Re: Why is querySelector much slower?
On 4/28/15 2:44 AM, Glen Huang wrote:
> But If I do getElementsByClass()[0], and LiveNodeList is immediately garbage collectable

Then it will only be collected the next time GC happens. Which might not be for a while.

-Boris
Re: Why is querySelector much slower?
On 4/28/15 1:58 AM, Glen Huang wrote:
> Just a quick follow up question to quench my curiosity: if I do list[1] and no one has ever asked the list for any element, Gecko will find the first two matching elements, and store them in the list, if I then immediately do list[0], the first element is returned without walking the DOM (assuming there are at least two matching elements)?

Yes, exactly.

>> querySelector(foo) and getElementsByTagName(foo)[0] can return different nodes
> Still a bit confused regarding this. If the premise is the selector only contains characters allowed in a tag name

Then in that case I _think_ those are now equivalent, though I'd have to check it carefully. They didn't use to be back when getElementsByTagName was defined as matching on qualified name, not localname...

-Boris
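The lazy population Boris confirms here can be sketched in plain JavaScript. This is an illustrative model, not Gecko's code: asking for `list[1]` walks the source only far enough to find the first two matches and caches them, so a later `list[0]` is answered from the cache without walking further. The `LazyList` class and its `walks` counter are invented for this sketch.

```javascript
// Rough model of a lazily populated live list: walk the underlying
// collection only as far as the requested index, remembering progress.
class LazyList {
  constructor(walkSource) {
    this.source = walkSource; // stand-in for a DOM walk over matches
    this.cache = [];
    this.cursor = 0;          // how far into the source we've walked
    this.walks = 0;           // count walk steps, to observe laziness
  }
  item(i) {
    while (this.cache.length <= i && this.cursor < this.source.length) {
      this.walks++;
      this.cache.push(this.source[this.cursor++]);
    }
    return this.cache[i];
  }
}

const list = new LazyList(["e1", "e2", "e3", "e4"]);
list.item(1);            // walks just far enough to find the first two
console.log(list.walks); // 2 — only two elements were visited
list.item(0);            // answered from the cache, no further walking
console.log(list.walks); // still 2
```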
Re: Why is querySelector much slower?
On 4/28/15 2:13 AM, Glen Huang wrote:
> On second thought, if the list returned by getElementsByClass() is lazy populated as Boris says, it shouldn't be a problem. The list is only updated when you access that list again.

Mutations have to check whether the list is marked dirty already or not. This is not too bad if you only have a few lists around, but if you have several thousand it can start to hurt.

-Boris
Re: Why is querySelector much slower?
Wow, this is pure gold. Thank you so much for such a thorough explanation, and you even took the trouble to actually implement optimizations to make sure the numbers are right. I'm so grateful for the work you put into this just to answer my question. How do I accept your answer here? ;)

> So what you're seeing is that the benchmark claims the operation is performed in 1-2 clock cycles

I never thought about relating ops/sec numbers to clock cycles. Thanks for the tip.

> So what this getElementById benchmark measures is how fast a loop counter can be decremented from some starting value to 0.

This makes so much sense now.

> because of the proxy machinery involved on the JS engine side

Do you mean the cost introduced by passing a C++ object into the ECMAScript world?

> In this case, those all seem to have about the same cost

I now see why querySelector has some extra work to do.

> But for real-life testcases algorithmic complexity can often be much more important.

Yes. But I suddenly find microbenchmarks to be a wonderful conversation starter. ;)

Thanks again for all the explanations. I'm motivated by them to actually dig into the engine source code to discover things myself next time (probably not easy, but should be rewarding). :)
Re: Why is querySelector much slower?
On 4/28/15 2:59 AM, Glen Huang wrote:
> Looking at the microbenchmark again, for Gecko, getElementById is around 300x faster than querySelector('#id'), and even getElementsByClassName is faster than it.

This is why one should not write microbenchmarks. ;) Or at least, if one does, examine the results very carefully.

The numbers you see for the getElementById benchmark there are on the order of 2e9 operations per second, yes? And modern desktop/laptop CPUs are clocked somewhere in the 2-4 GHz range. So what you're seeing is that the benchmark claims the operation is performed in 1-2 clock cycles. This should seem unlikely if you think the operation involves a hashtable lookup!

What's happening there is that Gecko happens to know at JIT compile time in this microbenchmark:

1) The bareword lookup is going to end up at the global, because there is nothing on the scope chain that would shadow the "document" name.
2) The global has an own property named "document" whose getter is side-effect-free.
3) The return value of the "document" property has only been observed to be a Document.
4) Looking up getElementById on the return value of the "document" property has consistently found it on Document.prototype.
5) Document.prototype.getElementById is known to be side-effect-free.
6) The return value of getElementById is not used (assigned to a function-local variable that is then not used).

The upshot of all that is that with a few guards both the getElementById call and the "document" get can be dead-code eliminated here. And even if you stored the value somewhere persistent, they could both still be loop-hoisted in this case. So what this getElementById benchmark measures is how fast a loop counter can be decremented from some starting value to 0. It happens that this can be done in about 1-2 clock cycles per loop iteration.

OK, so what about querySelector("#id") vs getElementsByClassName?

In the former case, loop-hoisting and dead code elimination are disallowed because querySelector can throw. That means that you can't eliminate it, and you can't move it past other things that might have observable side effects (like the counter increment). Arguably this is a misfeature in the design of querySelector.

In the latter case, loop-hoisting or dead code elimination can't happen because Gecko doesn't know enough about what the [0] will do, so it assumes the worst: that it can have side effects that can affect what the "document" getter returns as well as what the getElementsByClassName() call returns. So there are no shortcuts here; you have to actually do the calls. What do those calls do?

querySelector does a hashtable lookup for the selector to find a parsed selector. Then it sets up some state that's needed for selector matching. Then it detects that the selector's right-hand-most bit is a simple ID selector and takes a fast path that involves looking up that id in the hashtable and then comparing the selector to the elements that are returned until one of them matches.

getElementsByClassName has to do a hashtable lookup on the class name, then return the result. Then it has to do the [0] (which is actually surprisingly expensive, by the way, because of the proxy machinery involved on the JS engine side).

So we _could_ make querySelector faster here by adding another special case for selectors that are _just_ an id, as opposed to the existing optimization (which works for "#foo #bar" and similar as well). And of course the new special case would only work the way you want for document.querySelector, not element.querySelector; the latter needs to check that your result is a descendant of the element anyway. It's a tradeoff between complexity of implementation (which has its own maintenance _and_ performance costs) and real-life use cases.

Lastly, I'd like to put numbers to this. On this particular testcase, the querySelector("#list") call takes about 100ns on my hardware: about 300 CPU cycles. We could add that other set of special-casing and get it down to 70ns (I just checked by implementing it, so this is not a random guess). At that point you've got two hashtable lookups (which we could try to make faster, perhaps), the logic to detect that the optimization can be done at all (which is not that trivial; our selector representation requires a bunch of checks to ensure that it's just an id selector), and whatever work is involved in the binding layer. In this case, those all seem to have about the same cost: about 17-18ns (50 CPU cycles) each.

So is your use case one where the difference between querySelector costing 100ns and it costing 70ns actually makes a difference?

> It doesn't look like it benefits much from an eagerly populated hash table?

It benefits a good bit for non-toy documents where avoiding walking the entire DOM is the important part of the optimization. Again, microbenchmarks mostly serve to highlight the
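The sanity check Boris applies, relating ops/sec figures to clock cycles, is simple arithmetic. The 3 GHz clock used below is an assumed round number for illustration, not a measured value.

```javascript
// Convert a reported benchmark throughput into time and CPU cycles per
// operation, the way Boris reads the jsperf numbers.
function nsPerOp(opsPerSec) {
  return 1e9 / opsPerSec;
}
function cyclesPerOp(opsPerSec, clockHz) {
  return clockHz / opsPerSec;
}

console.log(cyclesPerOp(2e9, 3e9)); // 1.5 — "1-2 clock cycles": implausible for a hashtable lookup
console.log(nsPerOp(1e7));          // 100 — a 1e7 ops/sec result is ~100ns per call
console.log(cyclesPerOp(1e7, 3e9)); // 300 — which is ~300 cycles at an assumed 3 GHz
```

Whenever a microbenchmark reports an operation in a cycle or two, the loop body has almost certainly been dead-code eliminated, exactly as described above.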
Re: Why is querySelector much slower?
Wow, it's now super clear. Thanks for the detailed explanation.

Just a quick follow-up question to quench my curiosity: if I do list[1] and no one has ever asked the list for any element, Gecko will find the first two matching elements and store them in the list; if I then immediately do list[0], the first element is returned without walking the DOM (assuming there are at least two matching elements)?

> querySelector(foo) and getElementsByTagName(foo)[0] can return different nodes

Still a bit confused regarding this. If the premise is that the selector only contains characters allowed in a tag name, how can they return different nodes? Maybe I missed something? Unless you mean querySelector(":foo") and getElementsByTagName(":foo")[0] can return different results, which is obvious. If by parsing the passed selector (or looking up the cached parsed selectors) you know it only contains a tag name, why is it a bit harder to optimize? You just look up the (tagname, root) hash table, no?

> In practice this hasn't come up as a bottleneck in anything we've profiled so far

I'm probably prematurely optimizing my code. But I nevertheless learned something quite valuable by asking. Thanks for answering it. :)
Re: Why is querySelector much slower?
> querySelector with an id selector does in fact benefit from the id hashtable

Looking at the microbenchmark again, for Gecko, getElementById is around 300x faster than querySelector('#id'), and even getElementsByClassName is faster than it. It doesn't look like it benefits much from an eagerly populated hash table?

P.S. It's very interesting to see that Gecko is around 100x faster than the others when it comes to the performance of getElementById. It probably does something unusual?
Re: Why is querySelector much slower?
> Live node lists make all dom mutation slower

Haven't thought about this before. Thank you for pointing it out. So if I use, for example, lots of getElementsByClass() in the code, I'm actually slowing down all DOM-mutating APIs?
Re: Why is querySelector much slower?
On Mon, Apr 27, 2015 at 11:13 PM, Glen Huang <curvedm...@gmail.com> wrote:
> On second thought, if the list returned by getElementsByClass() is lazy populated as Boris says, it shouldn't be a problem. The list is only updated when you access that list again.

The invalidation is what makes your code slower. Specifically, any time you mutate the tree and you have live node lists, we traverse ancestors to mark them as needing to be updated. Blink (and likely other browsers) will eventually garbage collect the LiveNodeList, and then your DOM mutations will get faster again.
Re: Why is querySelector much slower?
But if I do getElementsByClass()[0], and the LiveNodeList is immediately garbage collectable, then if I change the DOM, Blink won't traverse ancestors, and thus there's no penalty for DOM mutation?

On Apr 28, 2015, at 2:28 PM, Elliott Sprehn <espr...@chromium.org> wrote:
> The invalidation is what makes your code slower. Specifically, any time you mutate the tree and you have live node lists, we traverse ancestors to mark them as needing to be updated. Blink (and likely other browsers) will eventually garbage collect the LiveNodeList, and then your DOM mutations will get faster again.
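Elliott's invalidation cost can be modeled with a toy sketch (not Blink's actual code): every mutation walks up through the ancestors and dirties any live lists registered along the way, so the cost of mutation scales with how many live lists are still alive. The `Node` class and `liveLists` set here are invented for this illustration.

```javascript
// Toy model of why live node lists tax DOM mutation: each node tracks
// live lists rooted at it, and a mutation walks ancestors to dirty them.
class Node {
  constructor(parent) {
    this.parent = parent || null;
    this.liveLists = new Set(); // live lists registered at this node
  }
  mutate() {
    // Walk up to the root (including this node), dirtying any live
    // lists found on the way — this is the per-mutation overhead.
    for (let n = this; n !== null; n = n.parent) {
      for (const list of n.liveLists) list.dirty = true;
    }
  }
}

const root = new Node();
const child = new Node(root);
const liveList = { dirty: false };
root.liveLists.add(liveList);

child.mutate();                // the mutation pays for the ancestor walk
console.log(liveList.dirty);   // true

// Once the list is garbage collected (modeled here by deregistering it),
// mutations have nothing left to mark and get cheaper again.
root.liveLists.delete(liveList);
```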
Re: Why is querySelector much slower?
On second thought, if the list returned by getElementsByClass() is lazily populated as Boris says, it shouldn't be a problem. The list is only updated when you access that list again.
RE: Why is querySelector much slower?
Not sure this is a public-webapps issue, but I'm pretty sure the reason is that the return values of getElement*By*(...) are cached by the browser, which means that at some point in the loop you end up not doing the work at all, while you probably do it every single time for querySelector, which cannot return a cached result.

From: curvedm...@gmail.com
Date: Mon, 27 Apr 2015 16:57:23 +0800
To: public-webapps@w3.org
Subject: Why is querySelector much slower?

> Intuitively, querySelector('.class') only needs to find the first matching node, whereas getElementsByClassName('.class')[0] needs to find all matching nodes and then return the first. The former should be a lot quicker than the latter. Why is that not the case? See http://jsperf.com/queryselectorall-vs-getelementsbytagname/119 for the test.
>
> I know querySelectorAll is slow because of the static nature of the returned NodeList, but this shouldn't be an issue for querySelector.
Re: Why is querySelector much slower?
On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang <curvedm...@gmail.com> wrote:
> Intuitively, querySelector('.class') only needs to find the first matching node, whereas getElementsByClassName('.class')[0] needs to find all matching nodes and then return the first. The former should be a lot quicker than the latter. Why is that not the case?

I can't speak for other browsers, but Gecko-based browsers only search the DOM until the first hit for getElementsByClassName('class')[0]. I'm not sure why you say that it must scan for all hits.

/ Jonas
Re: Why is querySelector much slower?
Thank you for the sample code. It's very helpful.

When you say "var node = list[0]; walks the DOM until the first item is found", do you mean it only happens under the condition that some previous code has changed the DOM structure? If not, the returned list object will be marked as up-to-date, and accessing the first element is very cheap? I ask because in the first paragraph you said the returned list and the returned first element are probably precomputed.

Also, this is my mental model after reading your explanation; I wonder if it's correct:

After the UA has parsed the html, it caches a hash table of elements with class names (also all elements with ids, all elements with tag names, etc., in different hash tables), keyed under the class names. When getElementsByClassName() is called, and the DOM hasn't been modified, it simply creates a list of elements with that class name from the hash table. When the first element is accessed from that list, and the DOM still isn't modified, the element is returned directly. The hash table is kept in sync with the DOM when it's modified. And if the DOM is changed after the list is returned but before it's accessed, the list will be marked as dirty, and accessing its element will walk the DOM (and mark the list as partially updated after that). Is this description correct?

And the final question: why can't querySelector benefit from these hash tables? I currently feel the urge to optimize it myself by overriding it with a custom function which will parse the passed selector, and if it's a simple selector like "div", ".class", or "#id", call the corresponding getElement*() function instead. Why can't UAs perform this for us? If my mental model is correct, it's simpler than getElement*() from a UA's point of view. It simply needs to look up the first matching element from the hash table and return it; no need to return a list and mark it as clean or dirty any more. The only price it pays is parsing the selector.

Is it because authors don't use querySelector often enough that UAs aren't interested in optimizing it?

On Apr 27, 2015, at 9:51 PM, Boris Zbarsky <bzbar...@mit.edu> wrote:
> On 4/27/15 4:57 AM, Glen Huang wrote:
>> Intuitively, querySelector('.class') only needs to find the first matching node, whereas getElementsByClassName('.class')[0] needs to find all matching nodes
> Not true; see below.
>> and then return the first. The former should be a lot quicker than the latter. Why is that not the case? See http://jsperf.com/queryselectorall-vs-getelementsbytagname/119 for the test
> All getElementsByClassName(".foo") has to do in a microbenchmark like this is look up a cached list (probably a single hashtable lookup) and return its first element (likewise precomputed, unless you're modifying the DOM in ways that would affect the list). It doesn't have to walk the tree at all. querySelector(".foo"), on the other hand, probably walks the tree at the moment in implementations.
>
> Also, back to the "not true" above: since the list returned by getElementsBy* is live and periodically needs to be recomputed anyway, and since grabbing just its first element is a common usage pattern, Gecko's implementation is actually lazy (see https://bugzilla.mozilla.org/show_bug.cgi?id=104603#c0 for the motivation): it will only walk as much of the DOM as needed to reply to the query being made. So for example:
>
>   // Creates a list object, doesn't do any walking of the DOM, marks
>   // the object as dirty, and returns it.
>   var list = document.getElementsByClassName(".foo");
>
>   // Walks the DOM until it finds the first element of the list, marks
>   // the list as partially updated, and returns that first element.
>   var node = list[0];
>
>   // Marks the list as dirty again, since the set of nodes it matches
>   // has changed.
>   document.documentElement.className = "foo";
>
> I can't speak for other UAs here, but the assumption that getElementsByClassName('.class')[0] needs to find all matching nodes is just not true in Gecko.
>
> -Boris
Re: Why is querySelector much slower?
On Apr 27, 2015, at 7:04 PM, Jonas Sicking <jo...@sicking.cc> wrote:
> On Mon, Apr 27, 2015 at 1:57 AM, Glen Huang <curvedm...@gmail.com> wrote:
>> Intuitively, querySelector('.class') only needs to find the first matching node, whereas getElementsByClassName('.class')[0] needs to find all matching nodes and then return the first. The former should be a lot quicker than the latter. Why is that not the case?
> I can't speak for other browsers, but Gecko-based browsers only search the DOM until the first hit for getElementsByClassName('class')[0]. I'm not sure why you say that it must scan for all hits.

WebKit (and, AFAIK, Blink) has the same optimization. It's a very important optimization.

- R. Niwa
Re: Why is querySelector much slower?
I wonder why querySelector can't get the same optimization: if the passed selector is a simple selector like ".class", do exactly as getElementsByClassName('class')[0] does?

On Apr 28, 2015, at 10:51 AM, Ryosuke Niwa <rn...@apple.com> wrote:
> WebKit (and, AFAIK, Blink) has the same optimization. It's a very important optimization.
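The override Glen is imagining can be sketched as a pure dispatcher. This is a naive illustration only, not production code: as Boris points out later in the thread, such a shim misbehaves in some cases (escaped selectors, and element.querySelector's requirement that the result be a descendant). The names `classifySimpleSelector` and `fastQuerySelector` are invented for this sketch, and `doc` is any object exposing the getElement* methods, so the dispatch logic can be exercised without a browser.

```javascript
// Classify a selector string as a trivially simple id/class/tag selector,
// or "complex" for anything else (combinators, escapes, pseudo-classes…).
function classifySimpleSelector(selector) {
  if (/^#[A-Za-z][\w-]*$/.test(selector)) return { kind: "id", arg: selector.slice(1) };
  if (/^\.[A-Za-z][\w-]*$/.test(selector)) return { kind: "class", arg: selector.slice(1) };
  if (/^[A-Za-z][\w-]*$/.test(selector)) return { kind: "tag", arg: selector };
  return { kind: "complex", arg: selector };
}

// Dispatch simple selectors to the cheaper getElement* lookups, falling
// back to the engine's querySelector for anything else.
function fastQuerySelector(doc, selector) {
  const { kind, arg } = classifySimpleSelector(selector);
  switch (kind) {
    case "id":    return doc.getElementById(arg);
    case "class": return doc.getElementsByClassName(arg)[0] || null;
    case "tag":   return doc.getElementsByTagName(arg)[0] || null;
    default:      return doc.querySelector(selector);
  }
}

// Demo against a stub "document" that just reports which path was taken.
const fakeDoc = {
  getElementById: (id) => ({ matchedBy: "getElementById", id }),
  getElementsByClassName: (c) => [{ matchedBy: "getElementsByClassName", c }],
  getElementsByTagName: (t) => [{ matchedBy: "getElementsByTagName", t }],
  querySelector: (s) => ({ matchedBy: "querySelector", s }),
};
console.log(fastQuerySelector(fakeDoc, "#nav").matchedBy);    // "getElementById"
console.log(fastQuerySelector(fakeDoc, "div > p").matchedBy); // "querySelector"
```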
Re: Why is querySelector much slower?
On 4/27/15 11:27 PM, Glen Huang wrote:
> When you say var node = list[0]; walks the DOM until the first item is found, do you mean it only happens under the condition that some previous code has changed the DOM structure?

Or that no one has ever asked the list for its [0] element before, yes.

> If not, the returned list object will be marked as up-to-date, and accessing the first element is very cheap?

In Gecko, yes.

> I ask because in the first paragraph you said the returned list and returned first element is probably precomputed.

In the case of the microbenchmark where you just ask for it repeatedly without changing the DOM, yes.

> After UA has parsed html, it caches a hash table of elements with class names (also all elements with ids, all elements with tag names, etc in different hash tables), keyed under the class names.

At least in Gecko, that's not how it works. There _is_ a hashtable mapping ids to element lists used for getElementById that is populated eagerly. There is also a hashtable mapping the pair (class string, root) to an element list that's used by getElementsByClassName and is populated lazily. Likewise, a hashtable mapping the pair (tagname, root) to an element list that's used by getElementsByTagName; this one is also populated lazily.

> When getElementsByClassName() is called, and the DOM hasn't been modified, it simply creates a list of elements with that class name from the hash table.

No, it just gets the list pointer from the hashtable (if any) and returns it. That is, getElementsByClassName("foo") === getElementsByClassName("foo") tests true. If there is no list pointer in the hashtable, an empty list is created, stored in the hashtable, and returned.

> When the first element is accessed from that list, and the DOM still isn't modified, the element is returned directly.

If the list has computed it before. If not, it walks the DOM until it finds its first element, then adds that one element to the list and returns it.

> The hash table is kept in sync with the DOM when it's modified.

The id hashtable is. The class/tagname hashtables aren't kept in sync per se, since they're lazily populated anyway, but the lists stored in them may need to be marked dirty.

> And if the DOM is changed after the list is returned but before it's accessed

Or before the list is returned. The order really doesn't matter here; what matters is whether the DOM is changed after the previous access, if any.

> Why can't querySelector benefit from these hash tables?

It could, somewhat. querySelector with an id selector does in fact benefit from the id hashtable. For the specific case of querySelector with a class selector, we _could_ internally try to optimize a bit, especially for the class case (the tag name case is a bit harder because, for example, querySelector("foo") and getElementsByTagName("foo")[0] can return different nodes depending on the value of the string "foo" and whether it contains any ':' characters and whatnot).

> I currently feel the urge to optimize it myself by overriding it with a custom function which will parse the passed selector, and if it's a simple selector like div, .class, #id, call the corresponding getElement*() function instead.

Then you'll end up with incorrect behavior in some cases, which you may of course not care about.

> Why can't UAs perform this for us?

To some extent they could. In practice this hasn't come up as a bottleneck in anything we've profiled so far, so people have avoided adding what seems like unnecessary complexity, but if there's a real-life example (not a microbenchmark) where this is a problem, that would certainly help a lot with getting this sort of thing on the radar.

> If my mental model is correct

It's not quite.

> The only price it pays is parsing the selector.

Not even that, possibly; at least Gecko has a cache of parsed selectors.

> Is it because authors don't use querySelector often enough that UAs aren't interested in optimizing it?

Or more precisely, don't use it in ways that make it the performance bottleneck.

-Boris
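The lazily populated (class string, root) cache Boris describes can be sketched as a map from a key to a single shared list object. This is a simplification for illustration, not Gecko's actual data structures; the `getElementsByClassNameCached` helper and string key are invented here. It shows the identity property Boris mentions: repeated calls return the very same list object, created empty and populated later.

```javascript
// Sketch of the (class string, root) → list cache: the list object is
// created empty (dirty) on first request and reused on later requests.
const classListCache = new Map(); // key: className + "|" + rootId

function getElementsByClassNameCached(rootId, className) {
  const key = className + "|" + rootId;
  let list = classListCache.get(key);
  if (!list) {
    list = { dirty: true, elements: [] }; // lazily populated on access
    classListCache.set(key, list);
  }
  return list;
}

const a = getElementsByClassNameCached("doc", "foo");
const b = getElementsByClassNameCached("doc", "foo");
console.log(a === b); // true — same list object, like getElementsByClassName("foo") === getElementsByClassName("foo")
console.log(a.dirty); // true — nothing has walked the DOM yet
```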
Re: Why is querySelector much slower?
Live node lists make all dom mutation slower, so while it might look faster in your benchmark, it's actually slower elsewhere (e.g. appendChild). Do you have a real application where you see querySelector as the bottleneck?

On Apr 27, 2015 5:32 PM, Glen Huang <curvedm...@gmail.com> wrote:
> I wonder why querySelector can't get the same optimization: if the passed selector is a simple selector like ".class", do exactly as getElementsByClassName('class')[0] does?