Re: How to efficiently walk the DOM tree and its strings

2014-03-07 Thread Henri Sivonen
On Tue, Mar 4, 2014 at 11:26 AM, Andrew Sutherland wrote: > On 03/04/2014 03:13 AM, Henri Sivonen wrote: >> >> It saddens me that we are using non-compliant ad hoc parsers when we >> already have two spec-compliant (at least at some point in time) ones. > > Interesting! I assume you are referring

HTML sanitization, CSP, nsIContentPolicy, ServiceWorkers (was: Re: How to efficiently walk the DOM tree and its strings)

2014-03-05 Thread Andrew Sutherland
On 03/05/2014 01:52 AM, nsm.nik...@gmail.com wrote: On Tuesday, March 4, 2014 1:26:15 AM UTC-8, somb...@gmail.com wrote: While we have a defense-in-depth strategy (CSP and iframe sandbox should be protecting us from the worst possible scenarios) and we're hopeful that Service Workers will eventu

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread nsm . nikhil
On Tuesday, March 4, 2014 1:26:15 AM UTC-8, somb...@gmail.com wrote: > While we have a defense-in-depth strategy (CSP and iframe sandbox should > > be protecting us from the worst possible scenarios) and we're hopeful > > that Service Workers will eventually let us provide > > nsIContentPolic

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
The actual translation needs to happen at once, but that's ok if I can work in the chunks incrementally, and only when everything is ready I send it off to the translation service. What I need to find then is a good (and fast) partitioning algorithm that will give me a list of several blocks to tr
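Working through the tree in chunks, as described above, could look roughly like the sketch below. It is only an illustration under stated assumptions: plain objects with `text` and `children` fields stand in for DOM nodes, and `createChunkedWalker` and its `budget` parameter are hypothetical names, not Firefox APIs. It does not attempt the block partitioning itself, only the resumable walk.

```javascript
// Hypothetical sketch: a resumable depth-first walk that visits at most
// `budget` nodes per step, so each step could run in its own event-loop turn.
// Plain objects ({ text, children }) stand in for DOM nodes here.
function createChunkedWalker(root, budget) {
  const stack = [root]; // nodes still to visit; the top of the stack is next
  const texts = [];     // text collected so far, in document order

  return {
    // Visit up to `budget` nodes; returns true once the walk is finished.
    step() {
      let visited = 0;
      while (stack.length > 0 && visited < budget) {
        const node = stack.pop();
        visited++;
        if (typeof node.text === "string") {
          texts.push(node.text);
        }
        const children = node.children || [];
        // Push in reverse so the first child is visited first.
        for (let i = children.length - 1; i >= 0; i--) {
          stack.push(children[i]);
        }
      }
      return stack.length === 0;
    },
    result() {
      return texts;
    },
  };
}

// Example tree, made up for illustration.
const page = {
  children: [
    { text: "Hello", children: [{ text: "world" }] },
    { text: "Second block" },
  ],
};

const walker = createChunkedWalker(page, 2);
let unfinishedSteps = 0;
while (!walker.step()) {
  unfinishedSteps++; // in a page, the next step would be scheduled later
}
```

Only once `step()` reports completion would the collected text be handed to the translation service, which matches the "all at once, but prepared incrementally" constraint in the message.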

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Robert O'Callahan
On Wed, Mar 5, 2014 at 8:47 AM, Felipe G wrote: > If I go with the clone route (to work on the snapshot'ed version of the > data), how can I later associate the cloned nodes to the original nodes > from the document? One way that I thought is to set a userdata on the > DOM nodes and then use t

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
Thanks for the feedback so far! If I go with the clone route (to work on the snapshot'ed version of the data), how can I later associate the cloned nodes to the original nodes from the document? One way that I thought is to set a userdata on the DOM nodes and then use the clone handler callback
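One possible way to associate clone nodes with their originals, not necessarily the one the thread settled on, relies on a deep copy having the same shape as its source: walking both trees in lockstep pairs the nodes by position. The sketch below assumes plain objects in place of DOM nodes; `deepCopy` and `pairCloneWithOriginal` are hypothetical helper names.

```javascript
// Hypothetical sketch: pair each node of a deep copy with its original by
// walking both trees in lockstep. Plain objects stand in for DOM nodes; in a
// real document the `children` arrays would correspond to childNodes.
function deepCopy(node) {
  return {
    name: node.name,
    children: (node.children || []).map(deepCopy),
  };
}

function pairCloneWithOriginal(original, clone) {
  const cloneToOriginal = new Map();
  const pending = [[original, clone]];
  while (pending.length > 0) {
    const [orig, copy] = pending.pop();
    cloneToOriginal.set(copy, orig);
    const origKids = orig.children || [];
    const copyKids = copy.children || [];
    if (origKids.length !== copyKids.length) {
      // The original changed after the snapshot; positions no longer match.
      throw new Error("tree shapes differ");
    }
    for (let i = 0; i < origKids.length; i++) {
      pending.push([origKids[i], copyKids[i]]);
    }
  }
  return cloneToOriginal;
}

// Example tree, made up for illustration.
const originalTree = {
  name: "body",
  children: [{ name: "p", children: [{ name: "b" }] }, { name: "div" }],
};
const snapshot = deepCopy(originalTree);
const mapping = pairCloneWithOriginal(originalTree, snapshot);
```

The pairing has to happen before the original is mutated again, since it depends purely on position. A `WeakMap` could replace the `Map` if the mapping should not keep the snapshot alive.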

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Felipe G
Chrome imports a JS

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Andrew Sutherland
On 03/04/2014 03:13 AM, Henri Sivonen wrote: It saddens me that we are using non-compliant ad hoc parsers when we already have two spec-compliant (at least at some point in time) ones. Interesting! I assume you are referring to: https://github.com/davidflanagan/html5/blob/master/html5parser.j

Re: How to efficiently walk the DOM tree and its strings

2014-03-04 Thread Henri Sivonen
On Mon, Mar 3, 2014 at 10:19 PM, Boris Zbarsky wrote: > How feasible is just doing .innerHTML to do that, then doing some sort of > async parse (e.g. XHR or DOMParser) to get a DOM snapshot? Seems more efficient to write the walk in C++, since the innerHTML getter already includes the walk in C++

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Wesley Johnston
3, 2014 3:57:04 PM Subject: Re: How to efficiently walk the DOM tree and its strings On 03/03/2014 03:19 PM, Boris Zbarsky wrote: > That said, there might be JS implementations of an HTML5 parser out there. The Gaia e-mail app has a streaming HTML parser in its worker-friendly sanitizer a

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Boris Zbarsky
On 3/3/14 9:07 PM, Boris Zbarsky wrote: document.documentElement.cloneNode(true): ~18ms; document.cloneNode(true): ~8ms. Oh, and the difference between these two is that in the former clones of elements try to do image loads, which takes about 70% of the cloning time, but in the latter w

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Boris Zbarsky
On 3/3/14 5:38 PM, Robert O'Callahan wrote: Wouldn't a deep clone of the root element be more efficient? Possibly, yes. Of course if you plan to do processing on a worker you then have to serialize parts or all of the clone. Anyway, I just did some measurements. On a year+ old laptop, o

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Andrew Sutherland
On 03/03/2014 03:19 PM, Boris Zbarsky wrote: That said, there might be JS implementations of an HTML5 parser out there. The Gaia e-mail app has a streaming HTML parser in its worker-friendly sanitizer at https://github.com/mozilla-b2g/bleach.js/blob/worker-thread-friendly/lib/bleach.js. It's

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Robert O'Callahan
On Tue, Mar 4, 2014 at 9:19 AM, Boris Zbarsky wrote: > How feasible is just doing .innerHTML to do that, then doing some sort of > async parse (e.g. XHR or DOMParser) to get a DOM snapshot? That said, this > would mean that you end up with a snapshot that's actual DOM stuff, on the > main thread

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Felipe G
Hi Axel, yeah, we are planning to cover that case about the changing order of inline elements. See the bottom half of this comment for details: https://bugzilla.mozilla.org/show_bug.cgi?id=971043#c3 For alt text, input label placeholders, and other text content from attributes, I'm planning on bu

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Axel Hecht
Hi, translating DOM is a bit funky. Generally, you can probably translate block elements one by one, but you need to persist inline elements. You should mark up the inline elements in the string that you send to the translation engine, such that you can support inline markup changing the ord
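The idea of marking up inline elements in the string sent to the translation engine can be sketched as below. This is a minimal illustration under assumptions: the numbered `<N>...</N>` tag format, the `parts` representation of a block, and both function names are hypothetical; nested inline elements and literal `<` characters in the text are not handled.

```javascript
// Hypothetical sketch: flatten one block's content into a string where each
// inline element is wrapped in a numbered tag, then rebuild the content from
// a (possibly reordered) translated string.
function toMarkedString(parts) {
  const inlines = [];
  const text = parts
    .map((part) => {
      if (typeof part === "string") return part;
      inlines.push(part.element);
      const id = inlines.length - 1;
      return `<${id}>${part.text}</${id}>`;
    })
    .join("");
  return { text, inlines };
}

function fromMarkedString(translated, inlines) {
  const parts = [];
  const tag = /<(\d+)>(.*?)<\/\1>/g;
  let last = 0;
  let match;
  while ((match = tag.exec(translated)) !== null) {
    // Keep any plain text that sits between two tagged runs.
    if (match.index > last) parts.push(translated.slice(last, match.index));
    parts.push({ element: inlines[Number(match[1])], text: match[2] });
    last = tag.lastIndex;
  }
  if (last < translated.length) parts.push(translated.slice(last));
  return parts;
}

// Example block: "Click <a>here</a> for <b>more</b>."
const marked = toMarkedString([
  "Click ",
  { element: "a", text: "here" },
  " for ",
  { element: "b", text: "more" },
  ".",
]);

// Made-up engine output in which the two inline runs swap places.
const rebuilt = fromMarkedString("<1>More</1>: click <0>here</0>.", marked.inlines);
```

Because each tag carries the index of its element, the rebuilt parts still point at the right inline element even when the translation reorders them.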

Fwd: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Felipe G
On Mon, Mar 3, 2014 at 5:19 PM, Boris Zbarsky wrote: > On 3/3/14 2:28 PM, Felipe G wrote: > >> A possible solution to that is to only pause the page that is being >> translated (with, >> say, EnterModalState) until we can finish working on it, while letting >> other pages and the UI work normally

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Ehsan Akhgari
On 2014-03-03, 3:19 PM, Boris Zbarsky wrote: Better yet if we can send this copy with a non-copy move to a Worker thread. You could send the string to a worker (if you do this in C++ you won't even need to copy it, afaict), but then on the worker you have to parse the HTML... That said, there

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Ehsan Akhgari
On 2014-03-03, 3:30 PM, Felipe G wrote: During the translation phase, Chrome imports a JS

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Felipe G
During the translation phase, Chrome imports a JS

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Boris Zbarsky
On 3/3/14 2:28 PM, Felipe G wrote: A possible solution to that is to only pause the page that is being translated (with, say, EnterModalState) until we can finish working on it, while letting other pages and the UI work normally. The other pages can still modify the DOM of the page in question

Re: How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Jeff Muizelaar
On Mar 3, 2014, at 2:28 PM, Felipe G wrote: > Hi everyone, I'm working on a feature to offer webpage translation in > Firefox. Translation involves, quite unsurprisingly, a lot of DOM and > strings manipulation. Since DOM access only happens in the main thread, it > brings the question of how to

How to efficiently walk the DOM tree and its strings

2014-03-03 Thread Felipe G
Hi everyone, I'm working on a feature to offer webpage translation in Firefox. Translation involves, quite unsurprisingly, a lot of DOM and strings manipulation. Since DOM access only happens in the main thread, it brings the question of how to do it properly without causing jank. This is the use