Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Parsing, especially JS parsing, still takes a large amount of time during page loading. We tried to improve the preload scanner by moving it into another thread, but there was no gain (except in some special cases). Synchronization between threads is surprisingly (ridiculously) costly; it is usually only worthwhile for tasks that need a few million instructions to execute (and tokenization takes far less in most cases). For smaller tasks, SIMD instruction sets can help, which is basically parallel execution on a single thread. Anyway, it is worth trying, but it is really challenging to make it work in practice. Good luck!

Regards,
Zoltan

On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote:

On Wed, 9 Jan 2013, Eric Seidel wrote:

The core goal is to reduce latency -- to free up the main thread for JavaScript and UI interaction -- which, as you correctly note, cannot be moved off of the main thread due to the single-thread-of-execution model of the web.

Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done.

100% agree. However, the same problem I brought up about tokenization applies here: a lot of JS functions are super cheap to parse and compile already, and the latency of doing so on the main thread is likely to be lower than the latency of chatting with another core.

I suspect this could be alleviated by (1) aggressively pipelining the work, so that during page load or during heavy JS use the compilation thread always has a non-empty queue of work to do; this would mean that the latency of communication is paid only when the first compilation occurs; and (2) allowing the main thread to steal work from the compilation queue. I'm not sure how to make (2) work well.

For parsing it's actually harder, since we rely heavily on the lazy parsing optimization: code is only parsed once we need it *right now* to run a function.
For compilation, it's somewhat easier: the most expensive compilation step is the third-tier optimizing JIT; we can delay this as long as we want, though the longer we delay it, the longer we spend running slower code.

Hence, to make parsing concurrent, the main problem is figuring out how to do predictive parsing: have a concurrent thread start parsing something just before we need it. Without predictive parsing, making it concurrent would be a guaranteed loss, since the main thread would just be stuck waiting for the other thread to finish.

To make optimized compiles concurrent without a regression, the main problem is ensuring that in those cases where we believe the time taken to compile the function will be smaller than the time taken to wake the concurrent thread, we instead just compile it on the main thread right away. Though, if we could predict that a function was going to get hot in the future, we could speculatively tell a concurrent thread to compile it, knowing that it won't wake up and do so until exactly when we would have otherwise invoked the compiler on the main thread (that is, it'll wake up and start compiling once the main thread has executed the function enough times to get good profiling data).

Anyway, you're absolutely right that this is an area that should be explored.

-F

--
Ian Hickson
http://ln.hixie.ch/
"Things that are impossible just take longer."

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
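Filip's suggestions (1) and (2) --- keep the compilation thread's queue non-empty, and let the main thread steal any work it needs immediately --- can be sketched roughly as follows. This is a toy Python model with invented names; it bears no relation to JSC's actual machinery:

```python
import queue
import threading

class CompileQueue:
    """Toy model: a background thread drains a compilation queue, while
    the main thread "steals" (compiles locally) anything it needs now."""

    def __init__(self):
        self._pending = queue.Queue()
        self._results = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._drain, daemon=True).start()

    def _compile(self, name, source):
        # Stand-in for real parsing + JIT compilation.
        return compile(source, name, "exec")

    def _drain(self):
        while True:
            name, source = self._pending.get()
            code = self._compile(name, source)
            with self._lock:
                # setdefault: the main thread may have stolen this item
                # already; the duplicated work is wasted but harmless.
                self._results.setdefault(name, code)

    def submit(self, name, source):
        """Pipeline work eagerly so the worker's queue stays non-empty."""
        self._pending.put((name, source))

    def require(self, name, source):
        """Main thread needs `name` *right now*: use the worker's result
        if it is ready, otherwise compile inline rather than block."""
        with self._lock:
            if name in self._results:
                return self._results[name]
        code = self._compile(name, source)  # steal: compile on main thread
        with self._lock:
            return self._results.setdefault(name, code)
```

A real implementation would also remove stolen items from the worker's queue; here the worker may redo a stolen compile, which models (rather than avoids) the wasted work Filip alludes to.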
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
https://bugs.webkit.org/show_bug.cgi?id=63531

The work was done by Zoltan Horvath and Balazs Kelemen.

Regards,
Zoltan

Hi Zoltan,

I would be curious how you did the synchronization. I've had some luck reducing synchronization costs before. Was the patch ever uploaded anywhere?

-F

On Jan 10, 2013, at 12:11 AM, Zoltan Herczeg zherc...@webkit.org wrote:

Parsing, especially JS parsing, still takes a large amount of time during page loading. We tried to improve the preload scanner by moving it into another thread, but there was no gain (except in some special cases). [...]

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading.

One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness? The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit.

One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost. Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions but I didn't sufficiently understand it; if so, further explanation would be appreciated.)

Regards,
Maciej

On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127

This is driven by our testing showing that HTML parsing on mobile is slow, and long (causing user-visible delays averaging 10 frames / 150ms): https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002 Complete data can be found at [1].
Mozilla moved their parser onto a separate thread during their HTML5 parser re-write: https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading

We plan to take a slightly simpler approach, moving only Tokenizing off of the main thread: https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit The left is our current design, the middle is a tokenizer-only design, and the right is more like Mozilla's threaded-parser design.

Profiling shows Tokenizing accounts for about 10x the number of samples as TreeBuilding. This includes Antti's recent testing (0.5% vs. 3%): https://bugs.webkit.org/show_bug.cgi?id=106127#c10

If, after we do this, we measure and find ourselves still spending a lot of main-thread time parsing, we'll move the TreeBuilder too. :) (This work is a nicely separable subset of the larger work needed to move the TreeBuilder.)

We welcome your thoughts and comments.

1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0 (Epic thanks to Nat Duca for helping us collect that data.)

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
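For readers trying to picture the "tokenizer-only" (middle) design: one thread tokenizes the source and feeds a queue, while the main thread consumes tokens and builds the tree. The following Python sketch uses a fake regex tokenizer and a tuple "DOM" purely for illustration; WebCore's HTMLTokenizer and tree builder are of course far more involved:

```python
import queue
import re
import threading

# Crude stand-in tokenizer: start tags, end tags, and text runs.
TOKEN_RE = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")
END_OF_INPUT = None

def tokenize(html, out_queue):
    """Runs on the background thread: produce tokens into the queue."""
    for m in TOKEN_RE.finditer(html):
        close, tag, text = m.groups()
        if text is not None:
            out_queue.put(("text", text))
        elif close:
            out_queue.put(("end", tag))
        else:
            out_queue.put(("start", tag))
    out_queue.put(END_OF_INPUT)

def build_tree(in_queue):
    """Runs on the main thread: consume tokens as they arrive."""
    stack = [("#document", [])]
    while (token := in_queue.get()) is not END_OF_INPUT:
        kind, value = token
        if kind == "start":
            node = (value, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "text":
            stack[-1][1].append(value)
        elif len(stack) > 1:  # "end": pop back to the parent
            stack.pop()
    return stack[0]

tokens = queue.Queue()
producer = threading.Thread(target=tokenize, args=("<p>Hi<b>!</b></p>", tokens))
producer.start()
doc = build_tree(tokens)
producer.join()
```

The point of the split is that the queue decouples the two halves: tokenization can run ahead while the main thread is busy with script or layout.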
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
When loading web pages we are very frequently in a situation where we already have the source data (HTML text here, but the same applies to preloaded JavaScript, CSS, images, ...) and know we are likely to need it soon, but can't actually utilize it for an indeterminate time. This happens because pending external JS resources block the main parser (and pending CSS resources block JS execution) for web compatibility reasons. In this situation it makes sense to start processing the resources we have into forms that are faster to use when they are eventually needed (like the token stream here).

One thing we already do when the main parser gets blocked is preload scanning. We look through the unparsed HTML source we have and trigger loads for any resources found. It would be beneficial if this happened off the main thread. We could do it when new data arrives, in parallel with JS execution and other time-consuming engine work, potentially triggering resource loads earlier.

I think a good first step here would be to share the tokens between the preload scanner and the main parser and worry about the threading part afterwards. We often parse the HTML source more or less twice, so this is an unquestionable win.

antti

On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo fpi...@apple.com wrote:

I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. For small documents, I expect concurrent tokenization to be a pure regression, because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally.
We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so?

-Filip

On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

We're planning to move parts of the HTML Parser off of the main thread: https://bugs.webkit.org/show_bug.cgi?id=106127 [...]
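One possible shape for the self-throttling Filip asks about --- purely a sketch, with a made-up size threshold and invented names --- is to tokenize inline whenever the chunk is small enough that waking a helper thread would cost more than it saves:

```python
import queue
import threading

# Hypothetical threshold: below this many bytes, waking another thread
# is assumed to cost more than tokenizing locally. The number is made up.
SMALL_CHUNK = 1024

class ThrottledTokenizer:
    """Tokenize small chunks inline on the calling ("main") thread and
    hand only larger chunks to a background worker."""

    def __init__(self):
        self.inline = 0     # chunks handled on the main thread
        self.offloaded = 0  # chunks handed to the worker
        self._jobs = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _tokenize(self, chunk):
        return chunk.split()  # stand-in for real tokenization

    def _worker(self):
        while True:
            chunk, on_tokens = self._jobs.get()
            on_tokens(self._tokenize(chunk))

    def feed(self, chunk, on_tokens):
        if len(chunk) < SMALL_CHUNK:
            self.inline += 1
            on_tokens(self._tokenize(chunk))  # cheap: stay on this thread
        else:
            self.offloaded += 1
            self._jobs.put((chunk, on_tokens))
```

A production heuristic would presumably look at backlog and measured cost rather than raw chunk size, in the spirit of the parallel-GC policy Filip describes.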
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
The data Eric and Adam were using comes from a Python library a few of us have been developing called telemetry. It's basically a bunch of Python that lets us write performance tests against any browser that speaks the inspector websocket protocol. We're using it a lot for "should we parallelize X?" questions, as well as regression-style "have our changes to X stayed a win over time?" questions. They might have other ways in mind to obtain this data that are more webkit-y, but I figure a bit on how we got this far might be useful for this mailing list.

Roughly, telemetry scripts connect up to a host and port where you've arranged to have an inspector websocket listening, e.g. $MY_PHONE_IP:9222, or:

google-chrome --remote-debugging-port=9222
telemetry --browser=$LOCALHOST:9222

Once that's established, we have communication with WebCore's InspectorAgent, and assuming we trust the agent, can do some pretty powerful stuff from there. The benchmark being discussed here [webkit_benchmark] navigates the browser from page to page, enabling the inspector's TimelineAgent as it does so, in order to get performance data about the page load. We then postprocess that data stream into a human-consumable CSV, and there is [some amount] of rejoicing. Assuming we trust the inspector timeline [Pavel's done a number of fixes to help us trust it more!], this gets pretty clean results, pretty easily.

A key challenge with telemetry has been getting stable runs on real-world sites. The archive.org techniques are cool, but they don't capture some of the big ones, like a logged-in gmail account. We've addressed this using tonyg and simonjam's http://code.google.com/p/web-page-replay/. If the browser under test supports web page replay [~= redirecting DNS requests to the replay server instead of the real site], then you can get stable, repeatable runs against super complex real-world sites --- it's worked on every site we've tried so far.
The core telemetry framework is here: http://src.chromium.org/chrome/trunk/src/tools/telemetry/ It's in the Chromium repo, but please don't hold that against it --- it's movable, given interest. The actual webkit benchmark is pretty simple, because most of the functionality comes from telemetry: https://codereview.chromium.org/11791043/

With the patch above landed, obtaining the benchmarking results that Eric got against chrome should be ~= getting a telemetry checkout and doing:

./run_multipage_benchmarks --browser=canary webkit_benchmark page_sets/top_25.json

Or, if you had an android with chrome on it:

./run_multipage_benchmarks --browser=android-chrome webkit_benchmark page_sets/top_25.json

Anyway, I'll leave it to Eric/Adam to speak to how this maps back into the WebKit ecosystem. The use of the inspector protocol makes it a theoretical possibility on other ports, but I know some people get nervous (or run away angrily!) when they hear that we're using the Inspector as a perf data source. :)

- Nat

On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto koivi...@iki.fi wrote:

When loading web pages we are very frequently in a situation where we already have the source data (HTML text here, but the same applies to preloaded JavaScript, CSS, images, ...) and know we are likely to need it soon, but can't actually utilize it for an indeterminate time. [...]
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Thu, Jan 10, 2013 at 8:37 AM, Maciej Stachowiak m...@apple.com wrote:

The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost.

I don't want to let this point of Maciej's slip away: on mobile we may have fewer cores than desktop, and we're paying a pretty high complexity burden for multiple threads already; some of Nat's awesome recent work in Chromium is too multithreaded for my comfort.

I'd back-of-enveloped yielding during page layout and guessed it wasn't worthwhile, but do we know that yielding during parsing isn't?

Tom

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
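For what it's worth, Maciej's "yield more aggressively" alternative amounts to time-slicing the parser on the main thread. A minimal Python sketch (the 2 ms budget is an arbitrary guess, and the event loop is a stand-in for WebCore's real scheduler):

```python
import time
from collections import deque

BUDGET_SECONDS = 0.002  # arbitrary per-slice parsing budget

def make_chunked_parser(tokens, sink):
    """Return a resumable task that parses until the budget expires."""
    it = iter(tokens)

    def task():
        deadline = time.monotonic() + BUDGET_SECONDS
        for token in it:
            sink.append(token)  # stand-in for real parse work
            if time.monotonic() >= deadline:
                return task  # budget spent: reschedule ourselves
        return None  # all input consumed

    return task

def run_event_loop(initial_tasks):
    """Toy single-threaded event loop: other events would interleave
    between the parser's slices."""
    tasks = deque(initial_tasks)
    while tasks:
        follow_up = tasks.popleft()()
        if follow_up is not None:
            tasks.append(follow_up)

sink = []
run_event_loop([make_chunked_parser(range(10), sink)])
```

The appeal is exactly what Tom notes: no second thread, no synchronization, just shorter uninterruptible runs on the main thread; the open question is whether the slices can be made short enough to matter.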
Re: [webkit-dev] PSA: Migration plan to GStreamer 1.x
Hi,

FYI, WebKit-EFL is now building with gstreamer 1.0 by default [1].

[1] https://bugs.webkit.org/show_bug.cgi?id=106178

On Tue, Jan 8, 2013 at 10:09 PM, Ryan Ware w...@linux.intel.com wrote:

-----Original Message-----
From: webkit-dev-boun...@lists.webkit.org [mailto:webkit-dev-boun...@lists.webkit.org] On Behalf Of Simon Hausmann
Sent: Tuesday, January 08, 2013 2:10 AM
To: webkit-dev@lists.webkit.org
Subject: Re: [webkit-dev] PSA: Migration plan to GStreamer 1.x

On Tuesday, January 08, 2013 10:21:00 AM Philippe Normand wrote:

Hi,

This mail is mainly for the GTK, Qt and EFL port maintainers; I decided to post here instead of cross-posting to three mailing lists :)

So there's been work to port the MediaPlayer and WebAudio GStreamer backends to the new GStreamer 1.x APIs. At the moment you can choose (well, for the GTK port at least) at build time whether you want to use the 0.10 or 1.x APIs. The issue is that GStreamer 0.10 is no longer actively maintained and the GStreamer developers/maintainers entirely focus on GStreamer 1.x. Moreover, we currently don't have the manpower to maintain the two code paths in the WebKit/GStreamer platform layer.

The GTK port buildbots already switched to 1.0 last month, and I encourage Qt and EFL to do the same ASAP, at least for their buildbots. I'd like to propose we drop the GStreamer 0.10 support from WebKit once the next stable branch of GStreamer is released; it will be 1.2, scheduled somewhere around February.

Sounds good to me. This will also be less problematic from a security perspective in the future, since it will be harder to get security updates for 0.10.

Ryan

--
Christophe Dumez, PhD
Linux Software Engineer
Intel Finland Oy - Open Source Technology Center

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Constructors for DOM4 Events
At TPAC there was no objection to DOM4 Event constructors (e.g. new MouseEvent()). DOM4 Event constructors are now in the editor's draft:

http://html5labs.interoperabilitybridges.com/dom4events/
https://dvcs.w3.org/hg/d4e/raw-file/tip/source_respec.htm

Given the above, I am planning to implement them in WebKit (without any flag). If you have any concern, please let me know.

Best Regards

On Mon, Oct 1, 2012 at 7:44 AM, Kentaro Hara hara...@chromium.org wrote:

"Since TPAC is less than a month away, I don't understand why we can't wait for that discussion." Sounds reasonable. I'll wait for TPAC. "I do support the idea in general, and I plan to be at TPAC and will advocate for it." I'll also be going to TPAC; I would appreciate your support.

On Mon, Oct 1, 2012 at 2:11 PM, Maciej Stachowiak m...@apple.com wrote:

Since TPAC is less than a month away, I don't understand why we can't wait for that discussion. I do support the idea in general, and I plan to be at TPAC and will advocate for it. I understand that sometimes we need to move ahead of the spec. If there's a reason not to wait a few extra weeks in this case, then please at least use a prefix.

Cheers,
Maciej

On Sep 30, 2012, at 6:32 PM, Kentaro Hara hara...@chromium.org wrote:

TL;DR: Would it be OK to implement constructors for DOM4 Events in WebKit without waiting for the spec?

== Background ==

Events should have constructors. 'new XXXEvent()' is much easier than 'e = document.createEvent(...); e.initXXXEvent(_a_lot_of_arguments_)'. We have already implemented constructors for a bunch of Events such as Event, CustomEvent, ProgressEvent, etc. [5]. However, we have not yet implemented constructors for DOM4 Events (i.e. UIEvent, MouseEvent, KeyboardEvent, WheelEvent, TextEvent, CompositionEvent) because they are not yet specced. Recently PointerEvent was specced with [Constructor] [2]. Considering that PointerEvent inherits from MouseEvent, we now want to support [Constructor] on MouseEvent too.
In terms of implementation, it is possible to implement [Constructor] on PointerEvent without implementing [Constructor] on MouseEvent. However, implementing [Constructor] on both PointerEvent and MouseEvent would be best.

== Rationale for implementing constructors for DOM4 Events ==

I have been discussing this topic for a year, in www-dom@ [4] and a www.w3.org bug [3]. It looks like there is a consensus on introducing constructors for DOM4 Events. However, the spec is still a draft [1] and the www.w3.org bug [3] is marked as LATER.

Last week I discussed the timeline of the spec with Jacob Rossi (a spec author of Pointer Events and DOM4 Events). According to him:

- Their primary focus is on finishing DOM3 Events first.
- With DOM3 Events in Candidate Recommendation, they are going to start working on DOM4 Events. They will discuss it at TPAC.
- They will introduce constructors to DOM4 Events.

In summary, constructors for DOM4 Events are going to be specced, but it will take time. So I would like to implement them in WebKit a bit ahead of the spec (and thus implement PointerEvent constructors too). If you have any concern, please let me know.
== References ==
[1] The spec draft by Jacob Rossi: http://html5labs.interoperabilitybridges.com/dom4events/
[2] The spec of Pointer Events: http://www.w3.org/Submission/pointer-events/
[3] www.w3.org bug: https://www.w3.org/Bugs/Public/show_bug.cgi?id=14051
[4] Discussion on www-dom@: http://lists.w3.org/Archives/Public/www-dom/2011OctDec/0081.html http://lists.w3.org/Archives/Public/www-dom/2012JanMar/0025.html
[5] WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=67824

--
Kentaro Hara, Tokyo, Japan (http://haraken.info)

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Thanks everyone for your feedback. Detailed responses inline.

On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo fpi...@apple.com wrote:

I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread.

Yes. That's something we know we have to worry about. Given that we need to retain the ability to parse HTML on the main thread to handle document.write and innerHTML, we should be able to easily do A/B comparisons to make sure we understand any performance trade-offs that might arise.

For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally.

Once we have the ability to tokenize on a background thread, we can examine cases like these and heuristically decide whether to use the background thread or not at runtime. As I wrote above, we'll need this ability anyway, so keeping the ability to optimize these cases shouldn't add any new constraints to the design.

We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so?

It's certainly something we can tune in the optimization phase. I don't think we need a particular vision to be able to do it. Given that we want to implement speculative parsing (to replace preload scanning---more on this below), we'll already have the ability to checkpoint and restore the tokenizer state across threads.
Once you have that primitive, it's easy to decide whether to continue tokenization on the main thread or on a background thread.

On Wed, Jan 9, 2013 at 10:04 PM, Ian Hickson i...@hixie.ch wrote:

Parsing and (maybe to a lesser extent) compiling JS can be moved off the main thread, though, right? That's probably worth examining too, if it hasn't already been done.

Yes, once we have the tokenizer running on a background thread, that opens up the possibility of parsing other sorts of data on the background thread as well. For example, when the tokenizer encounters an inline script block, you could imagine parsing the script on the background thread too, so that the main thread has less work to do. (You could also imagine making these optimizations without a background tokenizer, but the design constraints would be a bit different.)

On Thu, Jan 10, 2013 at 12:11 AM, Zoltan Herczeg zherc...@webkit.org wrote:

Parsing, especially JS parsing, still takes a large amount of time during page loading. We tried to improve the preload scanner by moving it into another thread, but there was no gain (except in some special cases). [...]

This is something we're worried about and will need to be careful about. In the design we're proposing, preload scanning is replaced by speculative parsing, so the overhead of the preload scanner is removed entirely. The way this works is as follows: When running on the background thread, the tokenizer produces a queue of PickledTokens. As these tokens are queued, we can scan them to kick off any preloads that we find.
Whenever the tokenizer queues a token that creates a new insertion point (in the terminology of the HTML specification), the tokenizer checkpoints itself but continues tokenizing speculatively. (Notice that tokens produced in this situation are still scanned for preloads but might not ever actually result in DOM being constructed.) After the main thread has processed the token that created the insertion point, if no characters were inserted, the main thread continues processing the PickledTokens that were created speculatively. If some characters were inserted, the main thread instead instructs the tokenizer to roll back to that checkpoint and continue tokenizing in a new state. In this case, the queue of speculative tokens is discarded.

Notice that in the common case we execute JavaScript and tokenize in parallel, something that's not possible with a main-thread tokenizer. Once the script is done executing, we expect it to be common to be able to resume tree building immediately as the
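The checkpoint/rollback protocol described above can be modeled in a few lines. This is a deliberately toy version --- one list element per "token", a single insertion point, and invented names (SpeculativeTokenizer, committed, speculative) --- meant only to show the commit/discard decision, not WebCore's design:

```python
class SpeculativeTokenizer:
    """Toy model: tokenize past an insertion point speculatively, then
    either commit the speculation or roll back if document.write ran."""

    def __init__(self, source):
        self.source = source       # list of pre-split "tokens" for simplicity
        self.pos = 0
        self.committed = []        # tokens the main thread has accepted
        self.speculative = []      # tokens produced past the last checkpoint
        self.checkpoint_pos = None

    def pump(self):
        """Tokenize the rest of the input (normally on a background thread)."""
        while self.pos < len(self.source):
            token = self.source[self.pos]
            self.pos += 1
            if self.checkpoint_pos is None:
                self.committed.append(token)
            else:
                self.speculative.append(token)
            if token == "<script>" and self.checkpoint_pos is None:
                # Insertion point: checkpoint, then keep going speculatively.
                self.checkpoint_pos = self.pos

    def script_finished(self, inserted_text=None):
        """Main thread reports the script's effect on the input stream."""
        if inserted_text is None:
            # No document.write: the speculation was correct; commit it.
            self.committed.extend(self.speculative)
        else:
            # document.write happened: discard speculation and retokenize
            # from the checkpoint with the inserted characters first.
            self.source = [inserted_text] + self.source[self.checkpoint_pos:]
            self.pos = 0
        self.speculative = []
        self.checkpoint_pos = None
```

In the no-document.write case the speculative tokens are simply promoted, so the background thread's work is never wasted; only a write forces the rollback path.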
Re: [webkit-dev] Constructors for DOM4 Events
+1 on the feature addition. Please use a feature define so vendors can decide to ship the new functionality at a time of their choosing. Cheers, Maciej On Jan 10, 2013, at 6:36 AM, Kentaro Hara hara...@chromium.org wrote: At TPAC there was no objection to DOM4 Event constructors (e.g. new MouseEvent()). Now DOM4 Event constructors are in the editor's draft: http://html5labs.interoperabilitybridges.com/dom4events/ https://dvcs.w3.org/hg/d4e/raw-file/tip/source_respec.htm Given the above, I am planning to implement them in WebKit (without any flag). If you have any concerns, please let me know. Best Regards On Mon, Oct 1, 2012 at 7:44 AM, Kentaro Hara hara...@chromium.org wrote: Since TPAC is less than a month away, I don't understand why we can't wait for that discussion. Sounds reasonable. I'll wait for TPAC. I do support the idea in general, and I plan to be at TPAC and will advocate for it. I'll also be going to TPAC. I would appreciate your support. On Mon, Oct 1, 2012 at 2:11 PM, Maciej Stachowiak m...@apple.com wrote: Since TPAC is less than a month away, I don't understand why we can't wait for that discussion. I do support the idea in general, and I plan to be at TPAC and will advocate for it. I understand that sometimes we need to move ahead of the spec. If there's a reason not to wait a few extra weeks in this case, then please at least use a prefix. Cheers, Maciej On Sep 30, 2012, at 6:32 PM, Kentaro Hara hara...@chromium.org wrote: TL;DR: Would it be OK to implement constructors for DOM4 Events in WebKit without waiting for the spec? == Background == Events should have constructors. 'new XXXEvent()' is much easier than 'e = document.createEvent(...); e.initXXXEvent(_a_lot_of_arguments_)'. We have already implemented constructors for a bunch of Events such as Event, CustomEvent, ProgressEvent, etc. [5]. However, we have not yet implemented constructors for DOM4 Events (i.e. 
UIEvent, MouseEvent, KeyboardEvent, WheelEvent, TextEvent, CompositionEvent) because they are not yet specced. Recently PointerEvent was specced with [Constructor] [2]. Considering that PointerEvent inherits from MouseEvent, we now want to support [Constructor] on MouseEvent too. In terms of implementation, it is possible to implement [Constructor] on PointerEvent without implementing [Constructor] on MouseEvent. However, implementing [Constructor] on both PointerEvent and MouseEvent would be best. == Rationale for implementing constructors for DOM4 Events == I have been discussing this topic for one year, on www-dom@ [4] and in a www.w3.org bug [3]. It looks like there is a consensus on introducing constructors for DOM4 Events. However, the spec is still a draft [1] and the www.w3.org bug [3] is marked as LATER. Last week I discussed the timeline of the spec with Jacob Rossi (the spec author of PointerEvent and DOM4 Events). According to him: - Their primary focus is on finishing DOM3 Events first. - With DOM3 Events in Candidate Recommendation, they are going to start working on DOM4 Events. They will discuss it at TPAC. - They will introduce constructors to DOM4 Events. In summary, constructors for DOM4 Events are going to be specced, but it will take time. So I would like to implement them in WebKit a bit ahead of the spec (and thus implement PointerEvent constructors too). If you have any concerns, please let me know. 
== References == [1] The spec draft by Jacob Rossi: http://html5labs.interoperabilitybridges.com/dom4events/ [2] The spec of Pointer Events: http://www.w3.org/Submission/pointer-events/ [3] www.w3.org bug: https://www.w3.org/Bugs/Public/show_bug.cgi?id=14051 [4] Discussion on www-dom@: http://lists.w3.org/Archives/Public/www-dom/2011OctDec/0081.html http://lists.w3.org/Archives/Public/www-dom/2012JanMar/0025.html [5] WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=67824 -- Kentaro Hara, Tokyo, Japan (http://haraken.info) ___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote: On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote: I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading. The goals are described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127#c0. Specifically: ---8<--- 1) Moving parsing off the main thread could make web pages more responsive because the main thread is available for handling input events and executing JavaScript. 2) Moving parsing off the main thread could make web pages load more quickly because WebCore can do other work in parallel with parsing HTML (such as parsing CSS or attaching elements to the render tree). OK - what test (if any) will be used to test whether the page load speed goal is achieved? ---8<--- One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness? The tests we've run so far are also described in the first link Eric gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127. They suggest that there's a good deal of room for improvement in this area. After we have a working implementation, we'll likely re-run those experiments and run other experiments to do an A/B comparison of the two approaches. As Filip points out, we'll likely end up with a hybrid of the two designs that's optimized for handling various workloads. I agree the test suggests there is room for improvement. From the description of how the test is run, I can think of two potential ways to improve how well it correlates with actual user-perceived responsiveness: (1) It seems to look at the max parsing pause time without considering whether there's any content being shown that it's possible to interact with. 
If the longest pauses happen before meaningful content is visible, then reducing those pauses is unlikely to actually materially improve responsiveness, at least in models where web content processing happens in a separate process or thread from the UI. One possibility is to track the max parsing pause time starting from the first visually non-empty layout. That would better approximate how much actual user interaction is blocked. (2) It might be helpful to track max and average pause time from non-parsing sources, for the sake of comparison. These might result in a more accurate assessment of the benefits. The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth. If you're interested in reducing the complexity of the parser, I'd recommend removing the NEW_XML code. As previously discussed, that code creates significant complexity for zero benefit. Tu quoque fallacy. From your glib reply, I get the impression that you are not giving the complexity cost of multithreading due consideration. I hope that is not actually the case and I merely caught you at a bad moment or something. (And also we agreed to a drop-dead date to remove the code, which has either passed or is very close.) Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost. Yielding to the event loop more could reduce the ParseHTML_max time, but it cannot reduce the ParseHTML time. Generally speaking, yielding to the event loop is a trade-off between throughput (i.e., page load time) and responsiveness. 
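As a toy illustration of that trade-off: yielding more often caps the longest pause (the ParseHTML_max metric) without reducing the total parse time (ParseHTML), and each yield adds event-loop round-trip overhead on top. The simulation below is purely illustrative; the names and numbers are invented:

```cpp
#include <cassert>
#include <vector>

struct ParseStats {
    double totalMs = 0;    // "ParseHTML": total time spent parsing
    double maxPauseMs = 0; // "ParseHTML_max": longest uninterrupted pause
    int yields = 0;
};

// chunkCosts: cost in ms of each indivisible parse chunk.
// budgetMs: how long we parse before yielding to the event loop.
ParseStats simulateParsing(const std::vector<double>& chunkCosts, double budgetMs) {
    ParseStats stats;
    double currentPause = 0;
    for (double cost : chunkCosts) {
        stats.totalMs += cost;
        currentPause += cost;
        if (currentPause >= budgetMs) {
            // Yield to the event loop: the pause ends, input can be handled.
            if (currentPause > stats.maxPauseMs)
                stats.maxPauseMs = currentPause;
            currentPause = 0;
            ++stats.yields;
        }
    }
    if (currentPause > stats.maxPauseMs)
        stats.maxPauseMs = currentPause;
    return stats;
}
```

Shrinking the budget shrinks maxPauseMs but never totalMs, which is exactly Adam's point that retuning the yield parameter alone cannot recover the throughput side of the trade-off.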
Moving work to a background thread should let us achieve a better trade-off between these quantities than we're likely to be able to achieve by tuning the yield parameter alone. I agree that is possible. But it also seems like the improvements that don't impose the complexity and hazards of multithreading in this area are worth trying first. Things such as retuning yielding and replacing the preload scanner with (non-threaded) speculative pre-tokenizing as suggested by Antti. That would let us better assess the benefits of the threading itself. Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions, but I didn't sufficiently understand it; if so, further explanation would be appreciated.) If you're interested in building such a test, I would be interested in hearing the results. We don't plan to build such a test at this time. If you're actually planning to make a significant complexity-imposing architectural change
[webkit-dev] commit-queue and JSC/WK2 specific changes
Hi all, As you might all know, the commit-queue uses the Chromium Linux port. Consequently, JavaScriptCore- and WebKit2-specific changes (and any non-Chromium port-specific changes) are never tested. The commit-queue doesn't even detect whether they build or not. This is a source of confusion because many (new) contributors appear to mistakenly think that the commit-queue ensures that a patch builds and passes tests on all platforms, yet the commit-queue doesn't wait until the EWS bots process a patch before landing it. As a result, I've seen quite a few people land patches that break JSC/WK2 via the commit-queue. My initial proposal was to make the commit-queue wait for the EWS bots to catch up when landing a port-specific or JSC/WK2-specific change. However, Adam thinks that's a bad idea (webkit.org/b/74776) because EWS bots are only advisory and waiting for EWS bots slows things down. Is this a problem worth finding a solution for? If so, do you have any suggestions? - R. Niwa
Re: [webkit-dev] commit-queue and JSC/WK2 specific changes
The solution I'd recommend is to make the JavaScriptCore and/or WebKit2 bots faster. If those bots are able to complete their processing before the commit-queue, then they'll stop the patch from being committed by marking it commit-queue-. Adam
Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread
Adam, Thanks for your detailed reply. Seems like you guys have a pretty good plan in place. I hope this works and produces a performance improvement. That being said, this does look like a sufficiently complex work item that success is far from guaranteed. So to play devil's advocate, what is your plan for if this doesn't work out? I.e., are we talking about adding a bunch of threading support code in the optimistic hope that it makes things run fast, and then forgetting about it if it doesn't? Or are you prepared to roll back any complexity that got landed if this does not ultimately live up to its promise? Or is this going to be one giant patch that only lands if it works? I'm also trying to understand what would happen during the interim when this work is incomplete, we have thread-related goop in some critical paths, and we don't yet know if the WIP code is ever going to result in a speedup. And also, what will happen some time from now if that code is never successfully optimized to the point where it is worth enabling. I appreciate that this sort of question can be asked of any performance work, but in this particular case my gut tells me that this is going to result in significantly more complexity than the usual incremental performance work. So it's good to understand what plan B is. Probably a good answer to this sort of question would address some fears that people may have. If this work does lead to a performance win then probably everyone will be happy. But if it doesn't, then it would be great to have a plan of retreat. -Filip On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote: Thanks everyone for your feedback. Detailed responses inline. On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo fpi...@apple.com wrote: I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread. Yes. 
That's something we know we have to worry about. Given that we need to retain the ability to parse HTML on the main thread to handle document.write and innerHTML, we should be able to easily do A/B comparisons to make sure we understand any performance trade-offs that might arise. For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally. Once we have the ability to tokenize on a background thread, we can examine cases like these and heuristically decide whether to use the background thread or not at runtime. As I wrote above, we'll need this ability anyway, so keeping the ability to optimize these cases shouldn't add any new constraints to the design. We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself. Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so? It's certainly something we can tune in the optimization phase. I don't think we need a particular vision to be able to do it. Given that we want to implement speculative parsing (to replace preload scanning---more on this below), we'll already have the ability to checkpoint and restore the tokenizer state across threads. Once you have that primitive, it's easy to decide whether to continue tokenization on the main thread or on a background thread.
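One way such a self-throttling heuristic might look, loosely following the parallel-GC pattern Filip describes: do the work locally while it is cheap, and only pay the cross-thread latency once the pending work grows. The function name and the threshold are invented for illustration; real tuning would be driven by measurement.

```cpp
#include <cassert>
#include <cstddef>

enum class TokenizeOn { MainThread, BackgroundThread };

// pendingBytes: how much unparsed input is currently queued.
// mainThreadBusy: whether the main thread already has other work
// (script execution, layout) competing for time.
TokenizeOn chooseTokenizerThread(size_t pendingBytes, bool mainThreadBusy) {
    // Below this threshold, waking a background thread and synchronizing
    // with it likely costs more than just tokenizing locally. The value
    // is an assumed placeholder, not a measured constant.
    constexpr size_t kSmallDocumentBytes = 8 * 1024;
    if (pendingBytes < kSmallDocumentBytes && !mainThreadBusy)
        return TokenizeOn::MainThread;
    return TokenizeOn::BackgroundThread;
}
```

The checkpoint/restore primitive discussed above is what makes this choice safe to revisit at runtime: either thread can pick up tokenization from the last checkpoint.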