Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-11 Thread Adam Barth
On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak m...@apple.com wrote:
 On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote:
 On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote:
 I presume from your other comments that the goal of this work is 
 responsiveness, rather than page load speed as such. I'm excited about the 
 potential to improve responsiveness during page loading.

 The goals are described in the first link Eric gave in his email:
 https://bugs.webkit.org/show_bug.cgi?id=106127#c0.  Specifically:

 ---8---
 1) Moving parsing off the main thread could make web pages more
 responsive because the main thread is available for handling input
 events and executing JavaScript.
 2) Moving parsing off the main thread could make web pages load more
 quickly because WebCore can do other work in parallel with parsing
 HTML (such as parsing CSS or attaching elements to the render tree).
 ---8---

 OK - what test (if any) will be used to test whether the page load speed goal 
 is achieved?

All of them.  :)

More seriously, Chromium runs a very large battery of performance
tests continuously on a matrix of different platforms, including
desktop and mobile.  You can see one of the overview dashboards here:

http://build.chromium.org/f/chromium/perf/dashboard/overview.html

The ones that are particularly relevant to this work are the various
page load tests, both with simulated network delays and without
network delays.  For iterative benchmarking, we plan to use Chromium's
Telemetry framework http://www.chromium.org/developers/telemetry.
Specifically, I expect we plan to work with the top_25 dataset
http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup,
but we might use some other data sets if there are particular areas we
want to measure more carefully.

 One question: what tests are you planning to use to validate whether this 
 approach achieves its goals of better responsiveness?

 The tests we've run so far are also described in the first link Eric
 gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127.
 They suggest that there's a good deal of room for improvement in this
 area.  After we have a working implementation, we'll likely re-run
 those experiments and run other experiments to do an A/B comparison of
 the two approaches.  As Filip points out, we'll likely end up with a
 hybrid of the two designs that's optimized for handling various work
 loads.

 I agree the test suggests there is room for improvement. From the description 
 of how the test is run, I can think of two potential ways to improve how well 
 it correlates with actual user-perceived responsiveness:

 (1) It seems to look at the max parsing pause time without considering 
 whether there's any content being shown that it's possible to interact with. 
 If the longest pauses happen before meaningful content is visible, then 
 reducing those pauses is unlikely to actually materially improve 
 responsiveness, at least in models where web content processing happens in a 
 separate process or thread from the UI. One possibility is to track the max 
 parsing pause time starting from the first visually non-empty layout. That 
 would better approximate how much actual user interaction is blocked.
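
 As a rough illustration of suggestion (1), the metric could be computed as
 below. This is only a sketch: the record shape (dicts with "type",
 "start_ms", "end_ms") is an assumption, and the first "Layout" record is
 used as a stand-in for the first visually non-empty layout.

```python
def max_parse_pause_after_first_layout(records):
    """Longest ParseHTML duration (ms) that starts after the first Layout ends.

    `records` is a hypothetical list of dicts with "type", "start_ms" and
    "end_ms" keys. Returns 0.0 when there is no layout or no later parse.
    """
    layout_ends = [r["end_ms"] for r in records if r["type"] == "Layout"]
    if not layout_ends:
        return 0.0
    first_layout_end = min(layout_ends)
    pauses = [
        r["end_ms"] - r["start_ms"]
        for r in records
        if r["type"] == "ParseHTML" and r["start_ms"] >= first_layout_end
    ]
    return max(pauses, default=0.0)
```

 With this filter, a long parse that finishes before anything is laid out
 does not count against responsiveness; only later pauses do.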

Consider, also, that pages might be parsing in the same process in
another tab, or in a frame in the current tab.

 (2) It might be helpful to track max and average pause time from non-parsing 
 sources, for the sake of comparison.

If you looked at the information Eric provided in his initial email,
you might have noticed
https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0,
which is precisely that.

 These might result in a more accurate assessment of the benefits.

 The reason I ask is that this sounds like a significant increase in 
 complexity, so we should be very confident that there is a real and major 
 benefit. One thing I wonder about is how common it is to have enough of the 
 page processed that the user could interact with it in principle, yet still 
 have large parsing chunks remaining which would prevent that interaction 
 from being smooth.

 If you're interested in reducing the complexity of the parser, I'd
 recommend removing the NEW_XML code.  As previously discussed, that
 code creates significant complexity for zero benefit.

 Tu quoque fallacy. From your glib reply, I get the impression that you are 
 not giving the complexity cost of multithreading due consideration. I hope 
 that is not actually the case and I merely caught you at a bad moment or 
 something.

I'm quite aware of the complexity of multithreaded code having written
a great deal of it for Chromium.

One of the things I hope comes out of this project is a good example
of how to do multithreaded processing in WebCore.  Currently, every
subsystem seems to roll its own threading abstractions, I think
largely because there hasn't been a 

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-11 Thread Filip Pizlo

On Jan 11, 2013, at 12:21 AM, Adam Barth aba...@webkit.org wrote:

 On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak m...@apple.com wrote:
 On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote:
 On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote:
 I presume from your other comments that the goal of this work is 
 responsiveness, rather than page load speed as such. I'm excited about the 
 potential to improve responsiveness during page loading.
 
 The goals are described in the first link Eric gave in his email:
 https://bugs.webkit.org/show_bug.cgi?id=106127#c0.  Specifically:
 
 ---8---
 1) Moving parsing off the main thread could make web pages more
 responsive because the main thread is available for handling input
 events and executing JavaScript.
 2) Moving parsing off the main thread could make web pages load more
 quickly because WebCore can do other work in parallel with parsing
 HTML (such as parsing CSS or attaching elements to the render tree).
 ---8---
 
 OK - what test (if any) will be used to test whether the page load speed 
 goal is achieved?
 
 All of them.  :)
 
 More seriously, Chromium runs a very large battery of performance
 tests continuously on a matrix of different platforms, including
 desktop and mobile.  You can see one of the overview dashboards here:
 
 http://build.chromium.org/f/chromium/perf/dashboard/overview.html
 
 The ones that are particularly relevant to this work are the various
 page load tests, both with simulated network delays and without
 network delays.  For iterative benchmarking, we plan to use Chromium's
 Telemetry framework http://www.chromium.org/developers/telemetry.
 Specifically, I expect we plan to work with the top_25 dataset
 http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup,
 but we might use some other data sets if there are particular areas we
 want to measure more carefully.
 
 One question: what tests are you planning to use to validate whether this 
 approach achieves its goals of better responsiveness?
 
 The tests we've run so far are also described in the first link Eric
 gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127.
 They suggest that there's a good deal of room for improvement in this
 area.  After we have a working implementation, we'll likely re-run
 those experiments and run other experiments to do an A/B comparison of
 the two approaches.  As Filip points out, we'll likely end up with a
 hybrid of the two designs that's optimized for handling various work
 loads.
 
 I agree the test suggests there is room for improvement. From the 
 description of how the test is run, I can think of two potential ways to 
 improve how well it correlates with actual user-perceived responsiveness:
 
 (1) It seems to look at the max parsing pause time without considering 
 whether there's any content being shown that it's possible to interact with. 
 If the longest pauses happen before meaningful content is visible, then 
 reducing those pauses is unlikely to actually materially improve 
 responsiveness, at least in models where web content processing happens in a 
 separate process or thread from the UI. One possibility is to track the max 
 parsing pause time starting from the first visually non-empty layout. That 
 would better approximate how much actual user interaction is blocked.
 
 Consider, also, that pages might be parsing in the same process in
 another tab, or in a frame in the current tab.
 
 (2) It might be helpful to track max and average pause time from non-parsing 
 sources, for the sake of comparison.
 
 If you looked at the information Eric provided in his initial email,
 you might have noticed
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0,
 which is precisely that.
 
 These might result in a more accurate assessment of the benefits.
 
 The reason I ask is that this sounds like a significant increase in 
 complexity, so we should be very confident that there is a real and major 
 benefit. One thing I wonder about is how common it is to have enough of 
 the page processed that the user could interact with it in principle, yet 
 still have large parsing chunks remaining which would prevent that 
 interaction from being smooth.
 
 If you're interested in reducing the complexity of the parser, I'd
 recommend removing the NEW_XML code.  As previously discussed, that
 code creates significant complexity for zero benefit.
 
 Tu quoque fallacy. From your glib reply, I get the impression that you are 
 not giving the complexity cost of multithreading due consideration. I hope 
 that is not actually the case and I merely caught you at a bad moment or 
 something.
 
 I'm quite aware of the complexity of multithreaded code having written
 a great deal of it for Chromium.
 
 One of the things I hope comes out of this project is a good example
 of how to do multithreaded processing in WebCore.  

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-11 Thread Maciej Stachowiak

Your comments here make me feel more positively towards this project. In 
particular, I'm happy that:
- There actually will be meaningful testing.
- You're prepared to abandon this approach if it doesn't meet its perf goals 
(presumably at minimum no regression to page load time or responsiveness while 
loading, and meaningful improvement to at least one of these).
- The shared-nothing message-passing approach to threading sounds likely to be 
a relatively less complex/fragile approach to threading than most others.

Thanks for following up. 

I have a comment on a tangential point that I'll split into another thread.

Cheers,
Maciej


On Jan 11, 2013, at 12:21 AM, Adam Barth aba...@webkit.org wrote:

 On Thu, Jan 10, 2013 at 9:19 PM, Maciej Stachowiak m...@apple.com wrote:
 On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote:
 On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote:
 I presume from your other comments that the goal of this work is 
 responsiveness, rather than page load speed as such. I'm excited about the 
 potential to improve responsiveness during page loading.
 
 The goals are described in the first link Eric gave in his email:
 https://bugs.webkit.org/show_bug.cgi?id=106127#c0.  Specifically:
 
 ---8---
 1) Moving parsing off the main thread could make web pages more
 responsive because the main thread is available for handling input
 events and executing JavaScript.
 2) Moving parsing off the main thread could make web pages load more
 quickly because WebCore can do other work in parallel with parsing
 HTML (such as parsing CSS or attaching elements to the render tree).
 ---8---
 
 OK - what test (if any) will be used to test whether the page load speed 
 goal is achieved?
 
 All of them.  :)
 
 More seriously, Chromium runs a very large battery of performance
 tests continuously on a matrix of different platforms, including
 desktop and mobile.  You can see one of the overview dashboards here:
 
 http://build.chromium.org/f/chromium/perf/dashboard/overview.html
 
 The ones that are particularly relevant to this work are the various
 page load tests, both with simulated network delays and without
 network delays.  For iterative benchmarking, we plan to use Chromium's
 Telemetry framework http://www.chromium.org/developers/telemetry.
 Specifically, I expect we plan to work with the top_25 dataset
 http://src.chromium.org/viewvc/chrome/trunk/src/tools/perf/page_sets/top_25.json?view=markup,
 but we might use some other data sets if there are particular areas we
 want to measure more carefully.
 
 One question: what tests are you planning to use to validate whether this 
 approach achieves its goals of better responsiveness?
 
 The tests we've run so far are also described in the first link Eric
 gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127.
 They suggest that there's a good deal of room for improvement in this
 area.  After we have a working implementation, we'll likely re-run
 those experiments and run other experiments to do an A/B comparison of
 the two approaches.  As Filip points out, we'll likely end up with a
 hybrid of the two designs that's optimized for handling various work
 loads.
 
 I agree the test suggests there is room for improvement. From the 
 description of how the test is run, I can think of two potential ways to 
 improve how well it correlates with actual user-perceived responsiveness:
 
 (1) It seems to look at the max parsing pause time without considering 
 whether there's any content being shown that it's possible to interact with. 
 If the longest pauses happen before meaningful content is visible, then 
 reducing those pauses is unlikely to actually materially improve 
 responsiveness, at least in models where web content processing happens in a 
 separate process or thread from the UI. One possibility is to track the max 
 parsing pause time starting from the first visually non-empty layout. That 
 would better approximate how much actual user interaction is blocked.
 
 Consider, also, that pages might be parsing in the same process in
 another tab, or in a frame in the current tab.
 
 (2) It might be helpful to track max and average pause time from non-parsing 
 sources, for the sake of comparison.
 
 If you looked at the information Eric provided in his initial email,
 you might have noticed
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0,
 which is precisely that.
 
 These might result in a more accurate assessment of the benefits.
 
 The reason I ask is that this sounds like a significant increase in 
 complexity, so we should be very confident that there is a real and major 
 benefit. One thing I wonder about is how common it is to have enough of 
 the page processed that the user could interact with it in principle, yet 
 still have large parsing chunks remaining which would prevent that 
 interaction from being smooth.
 
 If you're interested in 

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Zoltan Herczeg
Parsing, especially JS parsing, still takes a large amount of time during page
loading. We tried to improve the preload scanner by moving it into
another thread, but there was no gain (except in some special cases).
Synchronization between threads is surprisingly (ridiculously) costly, and is
usually worthwhile only for tasks that need quite a few million
instructions to be executed (and tokenization takes far fewer in most
cases). For smaller tasks, SIMD instruction sets can help, which is
basically parallel execution on a single thread. Anyway, it is worth
trying, but it is really challenging to make it work in practice. Good
luck!

Regards,
Zoltan

 On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote:

 On Wed, 9 Jan 2013, Eric Seidel wrote:

 The core goal is to reduce latency -- to free up the main thread for
 JavaScript and UI interaction -- which as you correctly note, cannot be
 moved off of the main thread due to the single thread of execution
 model of the web.

 Parsing and (maybe to a lesser extent) compiling JS can be moved off the
 main thread, though, right? That's probably worth examining too, if it
 hasn't already been done.

 100% agree.

 However, the same problem I brought up about tokenization applies here: a
 lot of JS functions are super cheap to parse and compile already, and the
 latency of doing so on the main thread is likely to be lower than the
 latency of chatting with another core.  I suspect this could be alleviated
 by (1) aggressively pipelining the work, where during page load or during
 heavy JS use the compilation thread always has a non-empty queue of work
 to do; this will mean that the latency of communication is paid only when
 the first compilation occurs, and (2) allowing the main thread to steal
 work from the compilation queue.  I'm not sure how to make (2) work well.
 For parsing it's actually harder since we rely heavily on the lazy parsing
 optimization: code is only parsed once we need it *right now* to run a
 function.  For compilation, it's somewhat easier: the most expensive
 compilation step is the third-tier optimizing JIT; we can delay this as
 long as we want, though the longer we delay it, the longer we spend
 running slower code.
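
 One possible shape for idea (2), letting the main thread steal work from
 the compilation queue, is sketched below. Everything here is hypothetical
 (the CompileQueue class, its method names, and jobs keyed by function
 name); it is not how JSC does it, and it sidesteps the hard part noted
 above: a job the background thread has already taken can no longer be
 stolen.

```python
import threading
from collections import OrderedDict


class CompileQueue:
    """Hypothetical FIFO of pending compile jobs shared by two threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = OrderedDict()  # function name -> source, oldest first

    def enqueue(self, name, source):
        """Main thread: queue a function for background compilation."""
        with self._lock:
            self._pending[name] = source

    def take_next(self):
        """Background thread: remove and return the oldest job, or None."""
        with self._lock:
            if not self._pending:
                return None
            return self._pending.popitem(last=False)

    def steal(self, name):
        """Main thread: remove and return the job for `name` if still queued."""
        with self._lock:
            return self._pending.pop(name, None)
```

 If steal() returns None, the job is either unknown or already in flight,
 and the main thread would have to wait for it or compile it redundantly.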

 Hence, to make parsing concurrent, the main problem is figuring out how to
 do predictive parsing: have a concurrent thread start parsing something
 just before we need it.  Without predictive parsing, making it concurrent
 would be a guaranteed loss since the main thread would just be stuck
 waiting for the thread to finish.

 To make optimized compiles concurrent without a regression, the main
 problem is ensuring that in those cases where we believe that the time
 taken to compile the function will be smaller than the time taken to awake
 the concurrent thread, we will instead just compile it on the main thread
 right away.  Though, if we could predict that a function was going to get
 hot in the future, we could speculatively tell a concurrent thread to
 compile it fully knowing that it won't wake up and do so until exactly
 when we would have otherwise invoked the compiler on the main thread (that
 is, it'll wake up and start compiling it once the main thread has executed
 the function enough times to get good profiling data).
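
 The decision described above can be sketched as a simple threshold test.
 The linear cost model and its 0.5 microseconds-per-instruction default are
 made-up placeholders, and both function names are hypothetical; a real
 engine would presumably calibrate this from measurements.

```python
def estimate_compile_us(bytecode_size, us_per_instruction=0.5):
    """Hypothetical cost model: compile time grows linearly with bytecode size."""
    return bytecode_size * us_per_instruction


def choose_compile_site(bytecode_size, thread_wake_us, us_per_instruction=0.5):
    """Return "main" when compiling locally should beat waking the helper thread."""
    if estimate_compile_us(bytecode_size, us_per_instruction) < thread_wake_us:
        return "main"
    return "concurrent"
```

 For example, with a 50 microsecond wake-up cost, a 40-instruction function
 stays on the main thread and a 1000-instruction function is handed off.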

 Anyway, you're absolutely right that this is an area that should be
 explored.

 -F



 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 http://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Zoltan Herczeg
https://bugs.webkit.org/show_bug.cgi?id=63531

The work was done by Zoltan Horvath and Balazs Kelemen.

Regards,
Zoltan

 Hi Zoltan,

 I would be curious how you did the synchronization.  I've had some luck
 reducing synchronization costs before.

 Was the patch ever uploaded anywhere?

 -F


 On Jan 10, 2013, at 12:11 AM, Zoltan Herczeg zherc...@webkit.org wrote:

 Parsing, especially JS parsing, still takes a large amount of time during page
 loading. We tried to improve the preload scanner by moving it into
 another thread, but there was no gain (except in some special cases).
 Synchronization between threads is surprisingly (ridiculously) costly, and is
 usually worthwhile only for tasks that need quite a few million
 instructions to be executed (and tokenization takes far fewer in most
 cases). For smaller tasks, SIMD instruction sets can help, which is
 basically parallel execution on a single thread. Anyway, it is worth
 trying, but it is really challenging to make it work in practice. Good
 luck!

 Regards,
 Zoltan

 On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote:

 On Wed, 9 Jan 2013, Eric Seidel wrote:

 The core goal is to reduce latency -- to free up the main thread for
 JavaScript and UI interaction -- which as you correctly note, cannot be
 moved off of the main thread due to the single thread of execution
 model of the web.

 Parsing and (maybe to a lesser extent) compiling JS can be moved off the
 main thread, though, right? That's probably worth examining too, if it
 hasn't already been done.

 100% agree.

 However, the same problem I brought up about tokenization applies here: a
 lot of JS functions are super cheap to parse and compile already, and the
 latency of doing so on the main thread is likely to be lower than the
 latency of chatting with another core.  I suspect this could be alleviated
 by (1) aggressively pipelining the work, where during page load or during
 heavy JS use the compilation thread always has a non-empty queue of work
 to do; this will mean that the latency of communication is paid only when
 the first compilation occurs, and (2) allowing the main thread to steal
 work from the compilation queue.  I'm not sure how to make (2) work well.
 For parsing it's actually harder since we rely heavily on the lazy parsing
 optimization: code is only parsed once we need it *right now* to run a
 function.  For compilation, it's somewhat easier: the most expensive
 compilation step is the third-tier optimizing JIT; we can delay this as
 long as we want, though the longer we delay it, the longer we spend
 running slower code.

 Hence, to make parsing concurrent, the main problem is figuring out how to
 do predictive parsing: have a concurrent thread start parsing something
 just before we need it.  Without predictive parsing, making it concurrent
 would be a guaranteed loss since the main thread would just be stuck
 waiting for the thread to finish.

 To make optimized compiles concurrent without a regression, the main
 problem is ensuring that in those cases where we believe that the time
 taken to compile the function will be smaller than the time taken to awake
 the concurrent thread, we will instead just compile it on the main thread
 right away.  Though, if we could predict that a function was going to get
 hot in the future, we could speculatively tell a concurrent thread to
 compile it fully knowing that it won't wake up and do so until exactly
 when we would have otherwise invoked the compiler on the main thread (that
 is, it'll wake up and start compiling it once the main thread has executed
 the function enough times to get good profiling data).

 Anyway, you're absolutely right that this is an area that should be
 explored.

 -F



 --
 Ian Hickson   U+1047E)\._.,--,'``.fL
 http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
 Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Maciej Stachowiak

I presume from your other comments that the goal of this work is 
responsiveness, rather than page load speed as such. I'm excited about the 
potential to improve responsiveness during page loading.

One question: what tests are you planning to use to validate whether this 
approach achieves its goals of better responsiveness?

The reason I ask is that this sounds like a significant increase in complexity, 
so we should be very confident that there is a real and major benefit. One 
thing I wonder about is how common it is to have enough of the page processed 
that the user could interact with it in principle, yet still have large parsing 
chunks remaining which would prevent that interaction from being smooth. 
Another thing I wonder about is whether yielding to the event loop more 
aggressively could achieve a similar benefit at a much lower complexity cost. 

Having a test to drive the work would allow us to answer these types of 
questions. (It may also be that the test data you cited would already answer 
these questions but I didn't sufficiently understand it; if so, further 
explanation would be appreciated.)

Regards,
Maciej

On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

 We're planning to move parts of the HTML Parser off of the main thread:
 https://bugs.webkit.org/show_bug.cgi?id=106127
 
 This is driven by our testing showing that HTML parsing on mobile is
 slow, and long (causing user-visible delays averaging 10 frames /
 150ms).
 https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
 Complete data can be found at [1].
 
 Mozilla moved their parser onto a separate thread during their HTML5
 parser re-write:
 https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
 
 We plan to take a slightly simpler approach, moving only Tokenizing
 off of the main thread:
 https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
 The left is our current design, the middle is a tokenizer-only design,
 and the right is more like mozilla's threaded-parser design.
 
 Profiling shows Tokenizing accounts for about 10x the number of
 samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
 3%):
 https://bugs.webkit.org/show_bug.cgi?id=106127#c10
 If after we do this we measure and find ourselves still spending a lot
 of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
 work is a nicely separable sub-set of larger work needed to move the
 TreeBuilder.)
 
 We welcome your thoughts and comments.
 
 
 1. 
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
 (Epic thanks to Nat Duca for helping us collect that data.)

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Antti Koivisto
When loading web pages we are very frequently in a situation where we
already have the source data (HTML text here, but the same applies to
preloaded JavaScript, CSS, images, ...) and know we are likely to need it
soon, but can't actually utilize it for an indeterminate time. This happens
because pending external JS resources block the main parser (and pending
CSS resources block JS execution) for web compatibility reasons. In this
situation it makes sense to start processing the resources we have into forms
that are faster to use when they are eventually needed (like the token
stream here).

One thing we already do when the main parser gets blocked is preload
scanning. We look through the unparsed HTML source we have and trigger
loads for any resources found. It would be beneficial if this happened off
the main thread. We could do it when new data arrives in parallel with JS
execution and other time consuming engine work, potentially triggering
resource loads earlier.

I think a good first step here would be to share the tokens between the
preload scanner and the main parser and worry about the threading part
afterwards. We often parse the HTML source more or less twice so this is an
unquestionable win.
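
The token-sharing idea can be illustrated with a small sketch: tokenize the
source once and let both the preload scanner and the parser read the same
token list. This uses Python's standard html.parser as a stand-in tokenizer,
not WebKit's HTMLTokenizer; the helper names and the start_tags() "parser"
are placeholders chosen for illustration.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Records a flat token list instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        self.tokens.append(("text", data, {}))


def tokenize_once(html):
    """Run the tokenizer a single time and return the shared token list."""
    collector = TokenCollector()
    collector.feed(html)
    collector.close()
    return collector.tokens


def preload_urls(tokens):
    """Preload-scanner stand-in: collect resource URLs from start tags."""
    url_attr = {"script": "src", "img": "src", "link": "href"}
    urls = []
    for kind, name, attrs in tokens:
        if kind == "start" and name in url_attr and attrs.get(url_attr[name]):
            urls.append(attrs[url_attr[name]])
    return urls


def start_tags(tokens):
    """Parser stand-in: consume the same tokens without re-tokenizing."""
    return [name for kind, name, _ in tokens if kind == "start"]
```

Both consumers walk the same list, so the source text is scanned only once,
whether or not the tokenizing itself later moves to another thread.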


  antti


On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo fpi...@apple.com wrote:

 I think your biggest challenge will be ensuring that the latency of
 shoving things to another core and then shoving them back will be smaller
 than the latency of processing those same things on the main thread.

 For small documents, I expect concurrent tokenization to be a pure
 regression because the latency of waking up another thread to do just a
 small bit of work, plus the added cost of whatever synchronization
 operations will be needed to ensure safety, will involve more total work
 than just tokenizing locally.

 We certainly see this in the JSC parallel GC, and in line with traditional
 parallel GC design, we ensure that parallel threads only kick in when the
 main thread is unable to keep up with the work that it has created for
 itself.

 Do you have a vision for how to implement a similar self-throttling, where
 tokenizing continues on the main thread so long as it is cheap to do so?
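
 One way to picture such self-throttling is a time budget: keep tokenizing
 on the main thread until the budget is spent, then return whatever is left
 so the caller can hand it to a background thread. This is a sketch under
 assumptions, not a proposal from the thread; the function name, the list
 of chunks, and the injected clock are all hypothetical.

```python
import time


def tokenize_until_budget(chunks, tokenize, budget_s, clock=time.monotonic):
    """Tokenize `chunks` (a list) locally until `budget_s` seconds have elapsed.

    Returns (tokens, remaining_chunks). An empty `remaining_chunks` means the
    main thread kept up and no hand-off is needed.
    """
    start = clock()
    tokens = []
    for index, chunk in enumerate(chunks):
        if clock() - start >= budget_s:
            # Over budget: stop here and let the caller offload the rest.
            return tokens, chunks[index:]
        tokens.extend(tokenize(chunk))
    return tokens, []
```

 Small documents finish inside the budget and never pay the cost of waking
 another thread; only long inputs are handed off.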

 -Filip


 On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

  We're planning to move parts of the HTML Parser off of the main thread:
  https://bugs.webkit.org/show_bug.cgi?id=106127
 
  This is driven by our testing showing that HTML parsing on mobile is
  slow, and long (causing user-visible delays averaging 10 frames /
  150ms).
  https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
  Complete data can be found at [1].
 
  Mozilla moved their parser onto a separate thread during their HTML5
  parser re-write:
 
 https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
 
  We plan to take a slightly simpler approach, moving only Tokenizing
  off of the main thread:
 
 https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
  The left is our current design, the middle is a tokenizer-only design,
  and the right is more like mozilla's threaded-parser design.
 
  Profiling shows Tokenizing accounts for about 10x the number of
  samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
  3%):
  https://bugs.webkit.org/show_bug.cgi?id=106127#c10
  If after we do this we measure and find ourselves still spending a lot
  of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
  work is a nicely separable sub-set of larger work needed to move the
  TreeBuilder.)
 
  We welcome your thoughts and comments.
 
 
  1.
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
  (Epic thanks to Nat Duca for helping us collect that data.)

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Nat Duca
The data Eric and Adam were using comes from a Python library a few of us
have been developing called telemetry. It's basically a bunch of Python
that lets us write performance tests against any browser that speaks the
inspector websocket protocol. We're using it for a lot of "should we
parallelize X" questions, as well as regression-style "have our changes to
X stayed a win over time?" questions.

They might have other ways in mind to obtain this data that is more
webkit-y, but I figure a bit on how we got this far might be useful for
this mailing list.

Roughly, telemetry scripts connect up to a host and port where you've
arranged to have an inspector websocket listening, e.g. $MY_PHONE_IP:9222,
or google-chrome --remote-debugging-port=9222  telemetry
--browser=$LOCALHOST:9222. Once that's established, we have communication
with WebCore's InspectorAgent, and assuming we trust the agent, can do some
pretty powerful stuff from there.
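
As a rough illustration of the connection step (not telemetry's actual code): a Chromium-based browser started with --remote-debugging-port serves a JSON list of debuggable targets over HTTP at /json, and each page entry carries a webSocketDebuggerUrl to connect to. The helper names and the sample payload below are hypothetical, and fetch_targets assumes a browser is already listening on the given port.

```python
import json
from urllib.request import urlopen


def fetch_targets(host="localhost", port=9222):
    """Fetch the list of debuggable targets from a running browser.

    Assumes the browser was started with --remote-debugging-port=<port>.
    """
    with urlopen(f"http://{host}:{port}/json") as response:
        return json.loads(response.read().decode("utf-8"))


def pick_page_target(targets):
    """Return the websocket URL of the first target of type 'page', or None."""
    for target in targets:
        if target.get("type") == "page" and "webSocketDebuggerUrl" in target:
            return target["webSocketDebuggerUrl"]
    return None


# Hypothetical payload, shaped like what the /json endpoint returns.
sample = [
    {"type": "background_page", "url": "chrome-extension://example/"},
    {"type": "page", "url": "http://example.com/",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/1"},
]
print(pick_page_target(sample))
```

A real client would then open a websocket to that URL and exchange inspector protocol messages; that part is omitted here.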

The benchmark being discussed here [webkit_benchmark] navigates the browser
from page to page, enabling inspector's TimelineAgent as it does in order
to get performance data about the page load. We then postprocess that data
stream into a human consumable csv and there is [some amount] of rejoicing.
Assuming we trust inspector timeline [Pavel's done a number of fixes to
help us trust it more!] this gets pretty clean results, pretty easily.


A key challenge with telemetry has been getting stable runs on real world
sites. The archive.org techniques are cool, but they don't capture some of
the big ones, like a logged-in gmail account. We've addressed this using
tonyg and simonjam's http://code.google.com/p/web-page-replay/. If the
browser under test supports web page replay [~= redirecting dns requests to
the replay server instead of the real site], then you can get stable,
repeatable runs against super complex real world sites --- it's worked on
every site we've tried so far.


The core telemetry framework is here:
http://src.chromium.org/chrome/trunk/src/tools/telemetry/

It's in the chromium repo, but please don't hold that against it --- it's
movable, given interest.

The actual webkit benchmark is pretty simple, because most of the
functionality comes from telemetry:
https://codereview.chromium.org/11791043/


With the patch above landed, obtaining the benchmarking results that Eric
got against chrome should be ~= getting a telemetry checkout and doing:
./run_multipage_benchmarks --browser=canary webkit_benchmark page_sets/top_25.json

Or if you had an android with chrome on it:
./run_multipage_benchmarks --browser=android-chrome webkit_benchmark page_sets/top_25.json


Anyway, I'll leave it to Eric/Adam to speak to how this maps back into the
WebKit ecosystem. The use of inspector protocol makes it a theoretical
possibility on other ports, but I know some people get nervous (or run away
angrily!) when they hear that we're using Inspector as a perf data source.
 :)


- Nat


On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto koivi...@iki.fi wrote:

 When loading web pages we are very frequently in a situation where we
 already have the source data (HTML text here but the same applies to
 preloaded Javascript, CSS, images, ...) and know we are likely to need it
 soon, but can't actually utilize it for an indeterminate time. This happens
 because pending external JS resources block the main parser (and pending
 CSS resources block JS execution) for web compatibility reasons. In this
 situation it makes sense to start processing the resources we have into
 forms that are faster to use when they are eventually actually needed (like
 the token stream here).

 One thing we already do when the main parser gets blocked is preload
 scanning. We look through the unparsed HTML source we have and trigger
 loads for any resources found. It would be beneficial if this happened off
 the main thread. We could do it when new data arrives in parallel with JS
 execution and other time consuming engine work, potentially triggering
 resource loads earlier.

 I think a good first step here would be to share the tokens between the
 preload scanner and the main parser and worry about the threading part
 afterwards. We often parse the HTML source more or less twice so this is an
 unquestionable win.


   antti


 On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo fpi...@apple.com wrote:

 I think your biggest challenge will be ensuring that the latency of
 shoving things to another core and then shoving them back will be smaller
 than the latency of processing those same things on the main thread.

 For small documents, I expect concurrent tokenization to be a pure
 regression because the latency of waking up another thread to do just a
 small bit of work, plus the added cost of whatever synchronization
 operations will be needed to ensure safety, will involve more total work
 than just tokenizing locally.

 We certainly see this in the JSC parallel GC, and in line with
 traditional parallel GC design, we ensure that parallel threads only 

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Tom Hudson
On Thu, Jan 10, 2013 at 8:37 AM, Maciej Stachowiak m...@apple.com wrote:

 The reason I ask is that this sounds like a significant increase in
 complexity, so we should be very confident that there is a real and major
 benefit. One thing I wonder about is how common it is to have enough of the
 page processed that the user could interact with it in principle, yet still
 have large parsing chunks remaining which would prevent that interaction
 from being smooth. Another thing I wonder about is whether yielding to the
 event loop more aggressively could achieve a similar benefit at a much
 lower complexity cost.


I don't want to let this point of Maciej's slip away: on mobile we may have
fewer cores than desktop, and we're paying a pretty high complexity burden
for multiple threads already; some of Nat's awesome recent work in Chromium
is too multithreaded for my comfort. I'd back-of-enveloped yielding during
page layout and guessed it wasn't worthwhile, but do we know that yielding
during parsing isn't?

Tom


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Adam Barth
Thanks everyone for your feedback.  Detailed responses inline.

On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo fpi...@apple.com wrote:
 I think your biggest challenge will be ensuring that the latency of shoving 
 things to another core and then shoving them back will be smaller than the 
 latency of processing those same things on the main thread.

Yes.  That's something we know we have to worry about.  Given that we
need to retain the ability to parse HTML on the main thread to handle
document.write and innerHTML, we should be able to easily do A/B
comparisons to make sure we understand any performance trade-offs that
might arise.

 For small documents, I expect concurrent tokenization to be a pure regression 
 because the latency of waking up another thread to do just a small bit of 
 work, plus the added cost of whatever synchronization operations will be 
 needed to ensure safety, will involve more total work than just tokenizing 
 locally.

Once we have the ability to tokenize on a background thread, we can
examine cases like these and heuristically decide whether to use the
background thread or not at runtime.  As I wrote above, we'll need
this ability anyway, so keeping the ability to optimize these cases
shouldn't add any new constraints to the design.

 We certainly see this in the JSC parallel GC, and in line with traditional 
 parallel GC design, we ensure that parallel threads only kick in when the 
 main thread is unable to keep up with the work that it has created for itself.

 Do you have a vision for how to implement a similar self-throttling, where 
 tokenizing continues on the main thread so long as it is cheap to do so?

It's certainly something we can tune in the optimization phase.  I
don't think we need a particular vision to be able to do it.  Given
that we want to implement speculative parsing (to replace preload
scanning---more on this below), we'll already have the ability to
checkpoint and restore the tokenizer state across threads.  Once you
have that primitive, it's easy to decide whether to continue
tokenization on the main thread or on a background thread.

On Wed, Jan 9, 2013 at 10:04 PM, Ian Hickson i...@hixie.ch wrote:
 Parsing and (maybe to a lesser extent) compiling JS can be moved off the
 main thread, though, right? That's probably worth examining too, if it
 hasn't already been done.

Yes, once we have the tokenizer running on a background thread, that
opens up the possibility of parsing other sorts of data on the
background thread as well.  For example, when the tokenizer encounters
an inline script block, you could imagine parsing the script on the
background thread as well so that the main thread has less work to do.
 (You could also imagine making the optimizations without a background
tokenizer, but the design constraints would be a bit different.)

On Thu, Jan 10, 2013 at 12:11 AM, Zoltan Herczeg zherc...@webkit.org wrote:
 Parsing, especially JS parsing still takes a large amount of time on page
 loading. We tried to improve the preload scanner by moving it into
 another thread, but there was no gain (except some special cases).
 Synchronization between threads is surprisingly (ridiculously) costly,
 and is usually worth it only for tasks that need quite a few million
 instructions to be executed (and tokenization takes far less in most
 cases). For smaller tasks, SIMD instruction sets can help, which is
 basically a parallel execution on a single thread. Anyway it is worth
 trying, but it is really challenging to make it work in practice. Good
 luck!

This is something we're worried about and will need to be careful
about.  In the design we're proposing, preload scanning is replaced by
speculative parsing, so the overhead of the preload scanner is removed
entirely.  The way this works is as follows:

When running on the background thread, the tokenizer produces a queue
of PickledTokens.  As these tokens are queued, we can scan them to
kick off any preloads that we find.  Whenever the tokenizer queues a
token that creates a new insertion point (in the terminology of the
HTML specification), the tokenizer checkpoints itself but continues
tokenizing speculatively.  (Notice that tokens produced in this
situation are still scanned for preloads but might not ever actually
result in DOM being constructed.)

After the main thread has processed the token that created the
insertion point, if no characters were inserted, the main thread
continues processing PickledTokens that were created speculatively.  If
some characters were inserted, the main thread instead instructs the
tokenizer to roll back to that checkpoint and continue tokenizing in a
new state.  In this case, the queue of speculative tokens is
discarded.
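
The checkpoint-and-rollback flow described above can be sketched as a single-threaded simulation. This is only a toy under stated assumptions, not the proposed WebKit design: tokens are whitespace-separated words, the literal word SCRIPT stands in for a whole script element that creates an insertion point, and tokenize, build_tree and written_by_script are hypothetical names.

```python
def tokenize(words, start):
    """Toy tokenizer: return (token, checkpoint) pairs for words[start:].

    checkpoint is the index of the next unread word when the token creates
    an insertion point (here: the literal word "SCRIPT"), otherwise None.
    """
    out = []
    for i in range(start, len(words)):
        checkpoint = i + 1 if words[i] == "SCRIPT" else None
        out.append((words[i], checkpoint))
    return out


def build_tree(source, written_by_script):
    """Consume tokens, rolling back speculation when a script writes text.

    written_by_script maps the index of a script (0, 1, ...) to the text it
    inserts via document.write; scripts missing from the dict write nothing.
    Returns (tokens processed in order, number of rollbacks).
    """
    words = source.split()
    queue = tokenize(words, 0)  # everything past a SCRIPT is speculative
    processed = []
    rollbacks = 0
    scripts_seen = 0
    position = 0
    while position < len(queue):
        token, checkpoint = queue[position]
        processed.append(token)
        position += 1
        if checkpoint is None:
            continue
        inserted = written_by_script.get(scripts_seen, "")
        scripts_seen += 1
        if inserted:
            # Characters were inserted: discard the speculative tokens and
            # re-tokenize from the checkpoint with the new input spliced in.
            words = words[:checkpoint] + inserted.split() + words[checkpoint:]
            queue = queue[:position] + tokenize(words, checkpoint)
            rollbacks += 1
        # Otherwise the speculative tokens already in the queue are reused.
    return processed, rollbacks


print(build_tree("A SCRIPT B C", {}))
print(build_tree("A SCRIPT B C", {0: "X Y"}))
```

In the first call the speculative tokens B and C are used as-is; in the second, the script's output X Y is tokenized ahead of them after one rollback.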

Notice that in the common case, we execute JavaScript and tokenize
in parallel, something that's not possible with a main-thread
tokenizer.  Once the script is done executing, we expect it to be
common to be able to resume tree building immediately as the 

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Maciej Stachowiak

On Jan 10, 2013, at 12:07 PM, Adam Barth aba...@webkit.org wrote:

 
 On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak m...@apple.com wrote:
 I presume from your other comments that the goal of this work is 
 responsiveness, rather than page load speed as such. I'm excited about the 
 potential to improve responsiveness during page loading.
 
 The goals are described in the first link Eric gave in his email:
 https://bugs.webkit.org/show_bug.cgi?id=106127#c0.  Specifically:
 
 ---8---
 1) Moving parsing off the main thread could make web pages more
 responsive because the main thread is available for handling input
 events and executing JavaScript.
 2) Moving parsing off the main thread could make web pages load more
 quickly because WebCore can do other work in parallel with parsing
 HTML (such as parsing CSS or attaching elements to the render tree).

OK - what test (if any) will be used to test whether the page load speed goal 
is achieved?

 ---8---
 
 One question: what tests are you planning to use to validate whether this 
 approach achieves its goals of better responsiveness?
 
 The tests we've run so far are also described in the first link Eric
 gave in his email: https://bugs.webkit.org/show_bug.cgi?id=106127.
 They suggest that there's a good deal of room for improvement in this
 area.  After we have a working implementation, we'll likely re-run
 those experiments and run other experiments to do an A/B comparison of
 the two approaches.  As Filip points out, we'll likely end up with a
 hybrid of the two designs that's optimized for handling various work
 loads.

I agree the test suggests there is room for improvement. From the description 
of how the test is run, I can think of two potential ways to improve how well 
it correlates with actual user-perceived responsiveness:

(1) It seems to look at the max parsing pause time without considering whether 
there's any content being shown that it's possible to interact with. If the 
longest pauses happen before meaningful content is visible, then reducing those 
pauses is unlikely to actually materially improve responsiveness, at least in 
models where web content processing happens in a separate process or thread 
from the UI. One possibility is to track the max parsing pause time starting 
from the first visually non-empty layout. That would better approximate how 
much actual user interaction is blocked.

(2) It might be helpful to track max and average pause time from non-parsing 
sources, for the sake of comparison.

These might result in a more accurate assessment of the benefits.
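
The two measurements suggested in (1) and (2) could be computed from a timeline roughly as follows. This is a sketch under assumptions: the event dictionaries, their field names (type, start_ms, duration_ms) and the sample numbers are made up for illustration and are not the inspector's actual record format.

```python
def max_parse_pause_after(events, first_layout_ms):
    """Largest ParseHTML duration among events starting at or after the
    first visually non-empty layout."""
    pauses = [e["duration_ms"] for e in events
              if e["type"] == "ParseHTML" and e["start_ms"] >= first_layout_ms]
    return max(pauses, default=0.0)


def non_parsing_pause_stats(events):
    """Return (max, average) duration over all events that are not ParseHTML."""
    others = [e["duration_ms"] for e in events if e["type"] != "ParseHTML"]
    if not others:
        return 0.0, 0.0
    return max(others), sum(others) / len(others)


# Made-up timeline: the longest parse happens before anything is visible.
events = [
    {"type": "ParseHTML", "start_ms": 0.0, "duration_ms": 120.0},
    {"type": "Layout", "start_ms": 130.0, "duration_ms": 20.0},
    {"type": "ParseHTML", "start_ms": 160.0, "duration_ms": 40.0},
    {"type": "EvaluateScript", "start_ms": 210.0, "duration_ms": 60.0},
]
print(max_parse_pause_after(events, first_layout_ms=130.0))
print(non_parsing_pause_stats(events))
```

With this made-up data the overall longest parse is 120 ms, but only 40 ms of parsing pause falls after the first layout, which is less than the 60 ms script pause.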

 
 The reason I ask is that this sounds like a significant increase in 
 complexity, so we should be very confident that there is a real and major 
 benefit. One thing I wonder about is how common it is to have enough of the 
 page processed that the user could interact with it in principle, yet still 
 have large parsing chunks remaining which would prevent that interaction 
 from being smooth.
 
 If you're interested in reducing the complexity of the parser, I'd
 recommend removing the NEW_XML code.  As previously discussed, that
 code creates significant complexity for zero benefit.

Tu quoque fallacy. From your glib reply, I get the impression that you are not 
giving the complexity cost of multithreading due consideration. I hope that is 
not actually the case and I merely caught you at a bad moment or something.

(And also we agreed to a drop dead date to remove the code which has either 
passed or is very close.)


 
 Another thing I wonder about is whether yielding to the event loop more 
 aggressively could achieve a similar benefit at a much lower complexity cost.
 
 Yielding to the event loop more could reduce the ParseHTML_max time,
 but it cannot reduce the ParseHTML time.  Generally speaking,
 yielding to the event loop is a trade-off between throughput (i.e.,
 page load time) and responsiveness.  Moving work to a background
 thread should let us achieve a better trade-off between these
 quantities than we're likely to be able to achieve by tuning the yield
 parameter alone.

I agree that is possible. But it also seems like the improvements that 
don't impose the complexity and hazards of multithreading in this area are 
worth trying first: things such as retuning yielding and replacing the preload 
scanner with (non-threaded) speculative pre-tokenizing as suggested by Antti. 
That would let us better assess the benefits of the threading itself.
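
The yield trade-off discussed above can be illustrated with a small sketch. It is a toy model, not WebKit's scheduler: slice_durations and the cost numbers are hypothetical, and it assumes the parser can only yield between chunks once a time budget has been used up.

```python
def slice_durations(chunk_costs_ms, budget_ms):
    """Group consecutive chunk costs into uninterrupted slices.

    The parser yields to the event loop as soon as the current slice has
    used at least budget_ms; a smaller budget means more, shorter slices.
    """
    result = []
    current = 0.0
    for cost in chunk_costs_ms:
        current += cost
        if current >= budget_ms:
            result.append(current)
            current = 0.0
    if current:
        result.append(current)
    return result


costs = [10.0] * 12  # twelve parser chunks of 10 ms each (made-up numbers)
for budget in (1000.0, 50.0, 20.0):
    slices = slice_durations(costs, budget)
    # total parse time, longest pause, number of yields
    print(budget, sum(slices), max(slices), len(slices) - 1)
```

In this model the total parse time stays at 120 ms whatever the budget; only the longest pause shrinks, at the price of more yields, each of which has some real-world cost that the model leaves out.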

 
 Having a test to drive the work would allow us to answer these types of 
 questions. (It may also be that the test data you cited would already answer 
 these questions but I didn't sufficiently understand it; if so, further 
 explanation would be appreciated.)
 
 If you're interested in building such a test, I would be interested in
 hearing the results.  We don't plan to build such a test at this time.

If you're actually planning to make a significant complexity-imposing 
architectural change 

Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-10 Thread Filip Pizlo
Adam,

Thanks for your detailed reply. Seems like you guys have a pretty good plan in 
place. 

I hope this works and produces a performance improvement. That being said this 
does look like a sufficiently complex work item that success is far from 
guaranteed. So to play devil's advocate, what is your plan for if this doesn't 
work out?

I.e. are we talking about adding a bunch of threading support code in the 
optimistic hope that it makes things run fast, and then forgetting about it if 
it doesn't?  Or are you prepared to roll out any complexity that got landed if 
this does not ultimately live up to promise?  Or is this going to be one giant 
patch that only lands if it works?

I'm also trying to understand what would happen during the interim when this 
work is incomplete, we have thread-related goop in some critical paths, and we 
don't yet know if the WIP code is ever going to result in a speedup. And also, 
what will happen sometime from now if that code is never successfully optimized 
to the point where it is worth enabling. 

I appreciate that this sort of question can be asked of any performance work 
but in this particular case my gut tells me that this is going to result in 
significantly more complexity than the usual incremental performance work. So 
it's good to understand what plan B is. 

Probably a good answer to this sort of question would address some fears that 
people may have. If this work does lead to a performance win then probably 
everyone will be happy. But if it doesn't then it would be great to have a 
plan of retreat. 

-Filip


[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Eric Seidel
We're planning to move parts of the HTML Parser off of the main thread:
https://bugs.webkit.org/show_bug.cgi?id=106127

This is driven by our testing showing that HTML parsing on mobile is
slow, and long (causing user-visible delays averaging 10 frames /
150ms).
https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
Complete data can be found at [1].

Mozilla moved their parser onto a separate thread during their HTML5
parser re-write:
https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading

We plan to take a slightly simpler approach, moving only Tokenizing
off of the main thread:
https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
The left is our current design, the middle is a tokenizer-only design,
and the right is more like mozilla's threaded-parser design.

Profiling shows Tokenizing accounts for about 10x the number of
samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
3%):
https://bugs.webkit.org/show_bug.cgi?id=106127#c10
If after we do this we measure and find ourselves still spending a lot
of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
work is a nicely separable sub-set of larger work needed to move the
TreeBuilder.)

We welcome your thoughts and comments.


1. 
https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
(Epic thanks to Nat Duca for helping us collect that data.)


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Adam Barth
On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel e...@webkit.org wrote:
 We're planning to move parts of the HTML Parser off of the main thread:
 https://bugs.webkit.org/show_bug.cgi?id=106127

 This is driven by our testing showing that HTML parsing on mobile is
 slow, and long (causing user-visible delays averaging 10 frames /
 150ms).
 https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
 Complete data can be found at [1].

In case it's not clear from that link, the ParseHTML column is the
total amount of time the web inspector attributes to HTML parsing when
loading those URLs on a Nexus 7 using a top-of-tree build of
Chromium's content_shell (similar to WebKitTestRunner).

The HTML parser parses data a chunk at a time, which means the total
time doesn't tell the whole story.  The ParseHTML_max column shows
the largest single block of time spent in the HTML parser, which is
more of a measure of the main thread jank caused by the parser.
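
For illustration, the two columns could be derived from a list of timeline records along these lines. The record shape (type, duration_ms), the helper name and the sample values are assumptions made for this sketch, not the inspector's real format or the benchmark's code.

```python
def parse_html_columns(events):
    """Return (ParseHTML, ParseHTML_max) in milliseconds.

    ParseHTML is the summed duration of all ParseHTML records;
    ParseHTML_max is the longest single one.
    """
    durations = [e["duration_ms"] for e in events if e["type"] == "ParseHTML"]
    if not durations:
        return 0.0, 0.0
    return sum(durations), max(durations)


# Made-up records for one page load.
events = [
    {"type": "ParseHTML", "duration_ms": 12.0},
    {"type": "Layout", "duration_ms": 30.0},
    {"type": "ParseHTML", "duration_ms": 85.0},
    {"type": "ParseHTML", "duration_ms": 3.0},
]
total, longest = parse_html_columns(events)
print(total, longest)  # 100.0 85.0
```

Here most of the 100 ms total comes from a single 85 ms block, which is the kind of case where the max column says more about jank than the total does.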

Antti has pointed out that the inspector isn't the best source of
data.  He measured total time using Instruments, and got numbers that
are consistent with (within a factor of 2 of) the inspector measurements.
(We were using different data sets, so we wouldn't expect perfect
agreement even if we were measuring precisely the same thing.)

Adam




Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Oliver Hunt
How will we ensure thread safety?  Even at just the tokenizing level don't we 
use AtomicString?  AtomicString isn't threadsafe wrt StringImpl IIRC so this 
seems like it should add a world of hurt.

I realise it's been a long time since I've worked on this so it's completely 
possible that I'm not aware of the current behaviour.

That aside I question what the benefit of this will be.  All those cases where 
we've started parsing html are intrinsically tied to the web's general single 
thread of execution model, which implies that even if we do push parsing into 
a separate thread we'll just end up with the ui thread blocked on the parsing 
thread which doesn't seem hugely superior.

What is the objective here? To improve performance, add parallelism, or reduce 
latency?

--Oliver



Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Eric Seidel
On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote:
 How will we ensure thread safety?  Even at just the tokenizing level don't we 
 use AtomicString?  AtomicString isn't threadsafe wrt StringImpl IIRC so this 
 seems like it should add a world of hurt.

AtomicString is already usable from other threads
(http://trac.webkit.org/changeset/38094), but you are correct that this is
the core concern!  PickledToken (or whatever it's called) will have to be
written very carefully in order to minimize/eliminate copies, while
still guaranteeing thread safety.  The correct design and handling of
PickledToken is the entire question of this whole endeavor.
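
As a loose analogy only (the actual PickledToken design is not specified in this thread): one common way to get thread safety is to hand tokens across threads as immutable values through a thread-safe queue, so that nothing mutable is shared. The Python sketch below uses a frozen dataclass and queue.Queue; the field names, the toy tokenizer and the sentinel are hypothetical, and Python's memory model is far more forgiving than C++'s.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PickledToken:
    """Hypothetical immutable token: plain values only, no shared mutable state."""
    kind: str
    name: str
    attributes: tuple = ()


_DONE = object()  # sentinel marking the end of the token stream


def background_tokenizer(words, out):
    """Toy tokenizer run on a worker thread: '/x' is an end tag, 'x' a start tag."""
    for word in words:
        kind = "end" if word.startswith("/") else "start"
        out.put(PickledToken(kind, word.lstrip("/")))
    out.put(_DONE)


def consume(words):
    """Main-thread side: drain tokens produced by the background tokenizer."""
    tokens = queue.Queue()
    worker = threading.Thread(target=background_tokenizer, args=(words, tokens))
    worker.start()
    seen = []
    while True:
        item = tokens.get()
        if item is _DONE:
            break
        seen.append(item)
    worker.join()
    return seen


result = consume(["p", "b", "/b", "/p"])
print([(t.kind, t.name) for t in result])
```

Because the tokens are frozen, the consumer cannot modify what the producer created; the copy-minimizing part of the problem, which is the hard part in C++, is not addressed by this sketch.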

 I realise it's been a long time since I've worked on this so it's completely 
 possible that I'm not aware of the current behavior.

 That aside I question what the benefit of this will be.  All those cases 
 where we've started parsing html are intrinsically tied to the web's general 
 single thread of execution model, which implies that even if we do push 
 parsing into a separate thread we'll just end up with the ui thread blocked 
 on the parsing thread which doesn't seem hugely superior.

 What is the objective here? To improve performance, add parallelism, or 
 reduce latency?

The core goal is to reduce latency -- to free up the main thread for
JavaScript and UI interaction -- which as you correctly note, cannot
be moved off of the main thread due to the single thread of
execution model of the web.

One could view the pre-load scanner as a layman's attempt at this
type of "tokenize asynchronously" approach.  This model gets preload
scanning for free, and can also easily answer the wkb.ug/90751 request
for speculative tokenizing of the entire document.  (We just have to
save markers before every script token, because if the script uses
document.write, any tokens after </script> become invalid.)

I should also note that not all HTML parsing can be moved off of the
main thread.  innerHTML for example, would still be done entirely on
the main thread.  I would imagine that when we were to land this on
trunk it would be behind a feature flag and ports could opt-in to the
threaded-parsing path, as we must maintain the main-thread parsing
ability for innerHTML anyway.

 --Oliver

 On Jan 9, 2013, at 6:10 PM, Adam Barth aba...@webkit.org wrote:

 On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel e...@webkit.org wrote:
 We're planning to move parts of the HTML Parser off of the main thread:
 https://bugs.webkit.org/show_bug.cgi?id=106127

 This is driven by our testing showing that HTML parsing on mobile is
 be slow, and long (causing user-visible delays averaging 10 frames /
 150ms).
 https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
 Complete data can be found at [1].

 In case it's not clear from that link, the ParseHTML column is the
 total amount of time the web inspector attributes to HTML parsing when
 loading those URLs on a Nexus 7 using a top-of-tree build of
 Chromium's content_shell (similar to WebKitTestRunner).

 The HTML parser parses data a chunk at a time, which means the total
 time doesn't tell the whole story.  The ParseHTML_max column shows
 the largest single block of time spent in the HTML parser, which is
 more of a measure of the main thread jank caused by the parser.

 Antti has pointed out that the inspector isn't the best source of
 data.  He measured total time using Instruments, and got numbers that
 are consistent (within a factor of 2) with the inspector measurements.
 (We were using different data sets, so we wouldn't expect perfect
 agreement even if we were measuring precisely the same thing.)

 Adam


 Mozilla moved their parser onto a separate thread during their HTML5
 parser re-write:
 https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading

 We plan to take a slightly simpler approach, moving only Tokenizing
 off of the main thread:
 https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
 The left is our current design, the middle is a tokenizer-only design,
 and the right is more like mozilla's threaded-parser design.

 Profiling shows Tokenizing accounts for about 10x the number of
 samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
 3%):
 https://bugs.webkit.org/show_bug.cgi?id=106127#c10
 If after we do this we measure and find ourselves still spending a lot
 of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
 work is a nicely separable sub-set of larger work needed to move the
 TreeBuilder.)

 We welcome your thoughts and comments.


 1. 
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
 (Epic thanks to Nat Duca for helping us collect that data.)
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 http://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Benjamin Poulain
On Wed, Jan 9, 2013 at 7:07 PM, Eric Seidel e...@webkit.org wrote:

 On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote:
  How will we ensure thread safety?  Even at just the tokenizing level
 don't we use AtomicString?  AtomicString isn't threadsafe wrt StringImpl
 IIRC so this seems like it would add a world of hurt.

 AtomicString is already usable from other threads
 (http://trac.webkit.org/changeset/38094), but you are correct that this is the
 core concern!  PickledToken (or whatever it's called) will have to be
 written very carefully in order to minimize/eliminate copies, while
 still guaranteeing thread safety.  The correct design and handling of
 PickledToken is the entire question of this whole endeavor.


That is probably what you meant, but just in case...

AtomicString can be used from different threads, but is not thread safe.
You must make an isolatedCopy() for message passing if you keep a reference
to the String in your thread.
Not the end of the world, but something to be aware of :)
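
A small sketch of the rule above, using standard C++ stand-ins
instead of WTF types.  SharedText, TokenMessage, and makeMessage are
hypothetical names; only isolatedCopy() comes from this thread.  The
assumption is that an ordinary copy shares one buffer between owners,
while an isolated copy gets a buffer of its own that is safe to hand
to another thread.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a ref-counted string; not WTF::String.
// Note: std::shared_ptr counts references atomically, so this models
// only the buffer sharing, not the thread-safety hazard itself.
class SharedText {
public:
    explicit SharedText(std::string chars)
        : m_buffer(std::make_shared<const std::string>(std::move(chars)))
    {
    }

    // Deep copy with its own buffer, for passing to another thread.
    SharedText isolatedCopy() const { return SharedText(*m_buffer); }

    const std::string& chars() const { return *m_buffer; }
    bool sharesBufferWith(const SharedText& other) const
    {
        return m_buffer == other.m_buffer;
    }

private:
    std::shared_ptr<const std::string> m_buffer;
};

// Message posted from one thread to the other.
struct TokenMessage {
    SharedText tagName;
};

// The sender keeps `tagName`, so the message must not share its buffer.
inline TokenMessage makeMessage(const SharedText& tagName)
{
    return TokenMessage{tagName.isolatedCopy()};
}
```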

Cheers,
Benjamin


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Adam Barth
On Wed, Jan 9, 2013 at 7:35 PM, Benjamin Poulain benja...@webkit.org wrote:
 On Wed, Jan 9, 2013 at 7:07 PM, Eric Seidel e...@webkit.org wrote:
 On Wed, Jan 9, 2013 at 6:38 PM, Oliver Hunt oli...@apple.com wrote:
  How will we ensure thread safety?  Even at just the tokenizing level
  don't we use AtomicString?  AtomicString isn't threadsafe wrt StringImpl
  IIRC so this seems like it would add a world of hurt.

 AtomicString is already usable from other threads
 (http://trac.webkit.org/changeset/38094), but you are correct that this is the
 core concern!  PickledToken (or whatever it's called) will have to be
 written very carefully in order to minimize/eliminate copies, while
 still guaranteeing thread safety.  The correct design and handling of
 PickledToken is the entire question of this whole endeavor.

 That is probably what you meant, but just in case...

 AtomicString can be used from different threads, but is not thread safe. You
 must make an isolatedCopy() for message passing if you keep a reference to
 the String in your thread.
 Not the end of the world, but something to be aware of :)

Yeah, we're aware of this issue.  We'll probably end up doing
something slightly customized for this use case.  For example, many of
the AtomicStrings used in parsing are tag and attribute names that are
known at compile time (e.g., div, href).  When moving these
strings back to the main thread, we need only the hash of the string
and not the underlying characters in the string (because we know
statically that the hash will exist in the main thread's atomic string
table).
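
To make the hash-only idea concrete, here is a rough sketch in plain
C++.  Everything in it is an assumption made for illustration:
hashName uses FNV-1a rather than whatever hash WebKit uses,
StaticNameTable stands in for the main thread's atomic string table,
and PackedName, packName, and unpackName are invented names.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// FNV-1a, chosen only for illustration.
inline std::uint32_t hashName(const std::string& name)
{
    std::uint32_t hash = 2166136261u;
    for (unsigned char c : name) {
        hash ^= c;
        hash *= 16777619u;
    }
    return hash;
}

// Names known at compile time. Immutable after construction, so both
// threads may read it.
class StaticNameTable {
public:
    StaticNameTable()
    {
        static const char* const names[] = {"div", "href", "span", "a"};
        for (const char* name : names)
            m_byHash.emplace(hashName(name), name);
    }

    // Returns nullptr when the hash does not belong to a static name.
    const std::string* lookup(std::uint32_t hash) const
    {
        auto it = m_byHash.find(hash);
        return it == m_byHash.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint32_t, std::string> m_byHash;
};

// What crosses the thread boundary for one tag or attribute name.
struct PackedName {
    bool isStatic;
    std::uint32_t hash;
    std::string chars;  // empty when isStatic is true
};

// Tokenizer side: send only the hash for static names.
inline PackedName packName(const StaticNameTable& table, const std::string& name)
{
    std::uint32_t hash = hashName(name);
    const std::string* known = table.lookup(hash);
    if (known && *known == name)
        return PackedName{true, hash, std::string()};
    return PackedName{false, hash, name};  // unknown name: copy the characters
}

// Main-thread side: recover the name from the table or from the copy.
inline std::string unpackName(const StaticNameTable& table, const PackedName& packed)
{
    if (packed.isStatic)
        return *table.lookup(packed.hash);
    return packed.chars;
}
```

Names that are not in the static table (custom attributes, for
example) still pay for a copy in this sketch.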

It's tempting to optimize these things prematurely.  We'll likely
start with a simple approach that makes copies and then optimize away
the copies over the development of the feature as indicated by
profiles.

Adam


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Filip Pizlo
I think your biggest challenge will be ensuring that the latency of shoving 
things to another core and then shoving them back will be smaller than the 
latency of processing those same things on the main thread.

For small documents, I expect concurrent tokenization to be a pure regression 
because the latency of waking up another thread to do just a small bit of work, 
plus the added cost of whatever synchronization operations will be needed to 
ensure safety, will involve more total work than just tokenizing locally.

We certainly see this in the JSC parallel GC, and in line with traditional 
parallel GC design, we ensure that parallel threads only kick in when the main 
thread is unable to keep up with the work that it has created for itself.

Do you have a vision for how to implement a similar self-throttling, where 
tokenizing continues on the main thread so long as it is cheap to do so?
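
One possible shape for such a throttle, as a sketch only.  The cost
model (a per-byte tokenizing cost compared against a fixed handoff
cost) and the names ThrottlePolicy and chooseTokenizer are assumptions
made for illustration; nothing here is measured or taken from WebKit
or JSC.

```cpp
#include <cstddef>

// Hypothetical tuning knobs; real values would have to come from profiling.
struct ThrottlePolicy {
    double localNanosPerByte;  // estimated main-thread tokenizing cost
    double handoffNanos;       // estimated thread wake-up plus synchronization cost
};

enum class TokenizeWhere { MainThread, BackgroundThread };

// Stay on the main thread while doing the work locally is estimated to
// be no more expensive than handing it to another thread.
inline TokenizeWhere chooseTokenizer(const ThrottlePolicy& policy, std::size_t pendingBytes)
{
    double localCost = policy.localNanosPerByte * static_cast<double>(pendingBytes);
    return localCost <= policy.handoffNanos ? TokenizeWhere::MainThread
                                            : TokenizeWhere::BackgroundThread;
}
```

With a policy of 2 ns per byte and a 100000 ns handoff cost, chunks of
up to 50000 bytes would stay on the main thread.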

-Filip


On Jan 9, 2013, at 6:00 PM, Eric Seidel e...@webkit.org wrote:

 We're planning to move parts of the HTML Parser off of the main thread:
 https://bugs.webkit.org/show_bug.cgi?id=106127
 
 This is driven by our testing showing that HTML parsing on mobile is
 slow and long (causing user-visible delays averaging 10 frames /
 150ms).
 https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
 Complete data can be found at [1].
 
 Mozilla moved their parser onto a separate thread during their HTML5
 parser re-write:
 https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
 
 We plan to take a slightly simpler approach, moving only Tokenizing
 off of the main thread:
 https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
 The left is our current design, the middle is a tokenizer-only design,
 and the right is more like mozilla's threaded-parser design.
 
 Profiling shows Tokenizing accounts for about 10x the number of
 samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
 3%):
 https://bugs.webkit.org/show_bug.cgi?id=106127#c10
 If after we do this we measure and find ourselves still spending a lot
 of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
 work is a nicely separable sub-set of larger work needed to move the
 TreeBuilder.)
 
 We welcome your thoughts and comments.
 
 
 1. 
 https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
 (Epic thanks to Nat Duca for helping us collect that data.)


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Ian Hickson
On Wed, 9 Jan 2013, Eric Seidel wrote:
 
 The core goal is to reduce latency -- to free up the main thread for 
 JavaScript and UI interaction -- which as you correctly note, cannot be 
 moved off of the main thread due to the single thread of execution 
 model of the web.

Parsing and (maybe to a lesser extent) compiling JS can be moved off the 
main thread, though, right? That's probably worth examining too, if it 
hasn't already been done.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

2013-01-09 Thread Filip Pizlo

On Jan 9, 2013, at 10:04 PM, Ian Hickson i...@hixie.ch wrote:

 On Wed, 9 Jan 2013, Eric Seidel wrote:
 
 The core goal is to reduce latency -- to free up the main thread for 
 JavaScript and UI interaction -- which as you correctly note, cannot be 
 moved off of the main thread due to the single thread of execution 
 model of the web.
 
 Parsing and (maybe to a lesser extent) compiling JS can be moved off the 
 main thread, though, right? That's probably worth examining too, if it 
 hasn't already been done.

100% agree.

However, the same problem I brought up about tokenization applies here: a lot 
of JS functions are super cheap to parse and compile already, and the latency 
of doing so on the main thread is likely to be lower than the latency of 
chatting with another core.  I suspect this could be alleviated by (1) 
aggressively pipelining the work, where during page load or during heavy JS use 
the compilation thread always has a non-empty queue of work to do; this will 
mean that the latency of communication is paid only when the first compilation 
occurs, and (2) allowing the main thread to steal work from the compilation 
queue.  I'm not sure how to make (2) work well.  For parsing it's actually 
harder since we rely heavily on the lazy parsing optimization: code is only 
parsed once we need it *right now* to run a function.  For compilation, it's 
somewhat easier: the most expensive compilation step is the third-tier 
optimizing JIT; we can delay this as long as we want, though the longer we 
delay it, the longer we spend running slower code.
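
For what (2) could look like at its simplest, here is a sketch of a
queue that a compilation thread drains in order while the main thread
can pull out one specific function it needs right away.  CompileQueue
and its methods are hypothetical, functions are reduced to integer
ids, and the sketch does not address the hard part, which is deciding
when stealing is worth it.

```cpp
#include <algorithm>
#include <deque>
#include <mutex>

// Hypothetical queue of pending compilations, identified by function id.
class CompileQueue {
public:
    void enqueue(int functionId)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back(functionId);
    }

    // Compilation thread: take the oldest pending function, if any.
    bool takeNext(int& functionId)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending.empty())
            return false;
        functionId = m_pending.front();
        m_pending.pop_front();
        return true;
    }

    // Main thread: claim a specific function it needs right now.
    // Returns true if the function was still pending, in which case the
    // caller compiles it itself; false means the other thread already
    // took it (or it was never queued).
    bool steal(int functionId)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = std::find(m_pending.begin(), m_pending.end(), functionId);
        if (it == m_pending.end())
            return false;
        m_pending.erase(it);
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<int> m_pending;
};
```

A single mutex keeps the sketch short; it also adds exactly the kind
of synchronization cost discussed above, so it is a starting point
rather than a recommendation.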

Hence, to make parsing concurrent, the main problem is figuring out how to do 
predictive parsing: have a concurrent thread start parsing something just 
before we need it.  Without predictive parsing, making it concurrent would be a 
guaranteed loss since the main thread would just be stuck waiting for the 
thread to finish.

To make optimized compiles concurrent without a regression, the main problem is 
ensuring that in those cases where we believe that the time taken to compile 
the function will be smaller than the time taken to awake the concurrent 
thread, we will instead just compile it on the main thread right away.  Though, 
if we could predict that a function was going to get hot in the future, we 
could speculatively tell a concurrent thread to compile it fully knowing that 
it won't wake up and do so until exactly when we would have otherwise invoked 
the compiler on the main thread (that is, it'll wake up and start compiling it 
once the main thread has executed the function enough times to get good 
profiling data).

Anyway, you're absolutely right that this is an area that should be explored.

-F


 