For what it's worth, we've got an extremely primitive XML parser PerformanceTest already:
http://trac.webkit.org/browser/trunk/PerformanceTests/Parser/xml-parser.html

Adam


On Wed, Jun 29, 2011 at 9:22 AM, Oliver Hunt <oli...@apple.com> wrote:
> I think considerable effort should be put into building up a suite of
> performance tests in advance of the new parser (probably culled from xml
> encountered in the wild, but also a number of extreme edge cases wouldn't go
> amiss either).
>
> We should also put effort into reducing any/all recursion in the parser as
> stack depth guards are not the most efficient mechanism to prevent stack
> overflows.
>
> --Oliver
>
>
> On Jun 28, 2011, at 6:12 PM, Jeffrey Pfau wrote:
>
>> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming
>> XML. However, QtXml isn't always available, and using libxml2 exposes its
>> own share of problems. As such, I'm undertaking writing an XML parser that
>> uses no external libraries.
>>
>> The first step to doing this is to add a new flag that switches off the
>> other two parsers. As the parsers are already independent and can be
>> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
>> checks, replacing the #else conditionals, and also a new ENABLE check,
>> tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE,
>> etc., may be more appropriate).
>>
>> As there will probably be a new slew of files pertaining to XML parsing, I
>> will put these files in WebCore/xml/parser, and move the existing
>> XMLDocumentParser* files into this new directory. As far as I know, the
>> placement of these files in WebCore/dom/ is legacy, and, assuming the build
>> on each platform is changed, it makes sense to move them.
>>
>> Once all the files are in a logical place, I plan to make a new file for a
>> skeleton of the new XMLDocumentParser, at least to get it to link until a
>> real one is in place, even if the XML parser at that point is just a data
>> sink.
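The flag arrangement described above, with explicit USE(LIBXML2) checks where bare #else branches used to be and a new ENABLE flag guarding a data-sink skeleton, might look roughly like the following. This is only a sketch under stated assumptions: `SKETCH_USE`, `SKETCH_ENABLE`, `XMLDocumentParserSketch` and `xmlBackendName` are hypothetical stand-ins, not WebKit's real macros or classes. NEW_XML is the tentative name from the message, and falling back to it when nothing is selected is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for WebKit-style USE()/ENABLE() feature macros.
// An undefined SKETCH_USE_x / SKETCH_ENABLE_x evaluates to 0 inside #if.
#define SKETCH_USE(FEATURE) SKETCH_USE_##FEATURE
#define SKETCH_ENABLE(FEATURE) SKETCH_ENABLE_##FEATURE

// Assumed default for this sketch: if the build selects nothing, use NEW_XML.
#if !SKETCH_USE(QXMLSTREAM) && !SKETCH_USE(LIBXML2) && !SKETCH_ENABLE(NEW_XML)
#define SKETCH_ENABLE_NEW_XML 1
#endif

// Exactly one backend may be active; no implicit #else can silently pick one.
#if SKETCH_USE(QXMLSTREAM) + SKETCH_USE(LIBXML2) + SKETCH_ENABLE(NEW_XML) != 1
#error "Select exactly one XML parser backend"
#endif

#if SKETCH_ENABLE(NEW_XML)
// Hypothetical skeleton parser: it links and accepts input but builds
// nothing yet, i.e. it is only a data sink.
class XMLDocumentParserSketch {
public:
    void append(const std::string& chunk)
    {
        assert(!m_finished && "append() called after finish()");
        m_bytesReceived += chunk.size(); // count the data, then discard it
    }
    void finish() { m_finished = true; }
    std::size_t bytesReceived() const { return m_bytesReceived; }
    bool isFinished() const { return m_finished; }

private:
    std::size_t m_bytesReceived = 0;
    bool m_finished = false;
};

inline const char* xmlBackendName() { return "NEW_XML"; }
#elif SKETCH_USE(QXMLSTREAM)
inline const char* xmlBackendName() { return "QXmlStream"; }
#elif SKETCH_USE(LIBXML2) // explicit check where a bare #else used to be
inline const char* xmlBackendName() { return "libxml2"; }
#endif
```

Building with `-DSKETCH_USE_LIBXML2=1` would select the libxml2 branch instead. Defining two backends at once stops the build at the `#error`, which a plain `#else` fallback would not catch.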
>>
>> From there, I plan to copy and modify a good chunk of the lower-level HTML
>> tokenization and parsing code, and make changes as necessary to make it work
>> on generalized XML, at least until I can generalize the common code in such
>> a way that the HTML and XML tokenizers can be subclasses and use common
>> code. I'd probably do the refactoring at the end.
>>
>> I'm still exploring the existing parsing code, but I'd probably work my way
>> up from there. I've read a lot of the XML 1.0 spec in preparation, as well,
>> but it doesn't have much on implementation itself. If QtWebKit or parsing
>> people have any comments, concerns, or help, I'd be more than willing to
>> listen--I'm just starting here, and I'm not completely familiar with the
>> codebase.
>>
>> Although no code is checked in so far, I've started on this list already and
>> have gotten as far as the new flags, a skeleton XMLDocumentParserNew.cpp,
>> and making a tokenizer that compiles and links, but is completely untested.
>>
>> Jeffrey Pfau
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
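Oliver's point about removing recursion, together with his call for extreme edge cases in the performance suite, can be illustrated with a deliberately tiny sketch. Instead of recursing once per nested element, which ties maximum depth to the call stack, it keeps open elements in a heap-allocated `std::vector`. This is not WebKit's parser and not how any of the parsers in the thread work. `tagsAreBalanced` and `makeDeeplyNested` are hypothetical names, and the input is assumed to contain only bare `<name>` and `</name>` tags, with no attributes, comments, CDATA, processing instructions or empty-element tags.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified checker: returns true if every end tag
// matches the most recently opened start tag. Handles only <name> and </name>.
bool tagsAreBalanced(const std::string& input)
{
    std::vector<std::string> openElements; // explicit stack replaces recursion
    std::size_t pos = 0;
    while ((pos = input.find('<', pos)) != std::string::npos) {
        std::size_t end = input.find('>', pos);
        if (end == std::string::npos)
            return false; // unterminated tag
        bool isEndTag = pos + 1 < end && input[pos + 1] == '/';
        std::size_t nameStart = pos + (isEndTag ? 2 : 1);
        std::string name = input.substr(nameStart, end - nameStart);
        if (name.empty())
            return false; // "<>" or "</>"
        if (isEndTag) {
            if (openElements.empty() || openElements.back() != name)
                return false; // mismatched or stray end tag
            openElements.pop_back();
        } else {
            openElements.push_back(name);
        }
        pos = end + 1;
    }
    return openElements.empty(); // every start tag must have been closed
}

// Edge-case input for a performance test: `depth` nested <a> elements.
std::string makeDeeplyNested(std::size_t depth)
{
    std::string doc;
    doc.reserve(depth * 7); // "<a>" plus "</a>" is 7 characters per level
    for (std::size_t i = 0; i < depth; ++i)
        doc += "<a>";
    for (std::size_t i = 0; i < depth; ++i)
        doc += "</a>";
    return doc;
}
```

Because nesting depth is limited only by heap memory, a generated document with hundreds of thousands of levels is just a larger test input rather than a stack-overflow risk. A recursive-descent version would need an explicit depth guard to survive the same input.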