On Mon, Jun 14, 2010 at 12:44 PM, Geoffrey Garen <[email protected]> wrote:
> Measurements like this are more valuable.
>
> Not all HTML on the web is like the HTML in the HTML5 spec, though. Am I
> right that the parser test you're using doesn't test invalid HTML at all?

If you have another corpus you'd like to include in the benchmark, feel free
to either add it yourself or send me a link. We need a document large enough
that the code spends a measurable amount of time in the parser, and it must
also be under a WebKit-compatible license.

On Mon, Jun 14, 2010 at 12:58 PM, Mike Marchywka <[email protected]> wrote:
> I'm starting to fear that the next blink of my disk light was cause me
> to go into a fit. One thing you can consider right away is,
> "plays nice with the other kids on a variety of playground equipment.."
> That is, it may be great when it has unlimited memory but does
> it start thrashing as soon as part of it is in VM. Not
> sure how to test this entirely but this is such a huge problem I
> just thought I would mention it again. Essentially it
> comes down to memory coherence.

I believe the new tokenizer has a similar memory footprint to the old code,
but I don't have a good way to measure that. The bulk of the memory is used
by the "data buffer," which is about 2k bytes in both.

On Mon, Jun 14, 2010 at 1:06 PM, David Hyatt <[email protected]> wrote:
> I really do consider the current code to be "barely hackable," so any new
> code that follows the HTML5 spec (especially one that has a document.write /
> pending script model that is easier to understand) is a huge win in my book.

We ended up using the same algorithm as the old tokenizer to manage
insertion points. However, we moved all the work into a separate InputStream
data structure:

http://trac.webkit.org/browser/trunk/WebCore/html/HTML5DocumentParser.h#L75

The old code was actually pretty clever once I figured out what it was
doing. We're considering moving InputStream into its own file instead of
keeping it as an inner class of the document parser.

Adam
_______________________________________________
webkit-dev mailing list
[email protected]
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev

