I think considerable effort should be put into building up a suite of performance tests in advance of the new parser (probably culled from XML encountered in the wild, though a number of extreme edge cases wouldn't go amiss either).
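As a rough illustration of the kind of edge-case inputs and timing loop such a suite might contain, here is a minimal sketch. Everything in it is an assumption made for illustration: the helper names (makeDeeplyNested, makeLongAttribute, meanParseMicroseconds) are hypothetical, and the parser under test is passed in as an opaque callback rather than being any real WebCore entry point.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: builds `depth` nested <a> elements, e.g. <a><a></a></a>.
std::string makeDeeplyNested(std::size_t depth)
{
    std::string xml;
    xml.reserve(depth * 7); // "<a>" plus "</a>" is 7 characters per level
    for (std::size_t i = 0; i < depth; ++i)
        xml += "<a>";
    for (std::size_t i = 0; i < depth; ++i)
        xml += "</a>";
    return xml;
}

// Hypothetical helper: one element carrying a single very long attribute value.
std::string makeLongAttribute(std::size_t length)
{
    return "<a b=\"" + std::string(length, 'x') + "\"/>";
}

// Calls `parse` on `input` `runs` times and returns the mean wall-clock
// time per call in microseconds. `parse` stands in for the parser under test.
double meanParseMicroseconds(const std::function<void(const std::string&)>& parse,
                             const std::string& input, int runs)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < runs; ++i)
        parse(input);
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / runs;
}
```

Real-world documents would still make up the bulk of the corpus; generated inputs like these only cover the extremes that rarely show up in the wild.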
We should also put effort into reducing any/all recursion in the parser as stack depth guards are not the most efficient mechanism to prevent stack overflows.

--Oliver

On Jun 28, 2011, at 6:12 PM, Jeffrey Pfau wrote:

> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming
> XML. However, QtXml isn't always available, and using libxml2 exposes its own
> share of problems. As such, I'm undertaking writing an XML parser that uses
> no external libraries.
>
> The first step to doing this is to add a new flag that switches off the other
> two parsers. As the parsers are already independent and can be switched
> between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2) checks,
> replacing the #else conditionals, and also a new ENABLE check, tentatively
> called NEW_XML (although names such as NATIVE_XML or XML_NATIVE, etc, may be
> more appropriate).
>
> As there will probably be a new slew of files pertaining to XML parsing, I
> will put these files in WebCore/xml/parser, and move the existing
> XMLDocumentParser* file into this new directory. As far as I know, the
> placement of these files in WebCore/dom/ is legacy, and, assuming the build
> on each platform is changed, it makes sense to move them.
>
> Once all the files are in a logical place, I plan to make a new file for a
> skeleton of the new XMLDocumentParser, at least to get it to link until a
> real one is in place, even if the XML parser at that point is just a data
> sink.
>
> From there, I plan to copy and modify a good chunk of the lower level HTML
> tokenization and parsing code, and make changes as necessary to make it work
> on generalized XML, at least until I can generalize the common code in such a
> way that the HTML and XML tokenizers can be subclasses and use common code.
> I'd probably do the refactoring at the end.
>
> I'm still exploring the existing parsing code, but I'd probably work my way
> up from there. I've read a lot of the XML 1.0 spec in preparation, as well,
> but it doesn't have much on implementation itself. If QtWebKit or parsing
> people have any comments, concerns, or help, I'd be more than willing to
> listen--I'm just starting here, and I'm not completely familiar with the
> codebase.
>
> Although no code is checked in so far, I've started on this list already and
> have gotten as far as the new flags, a skeleton XMLDocumentParserNew.cpp, and
> making a tokenizer that compiles and links, but is completely untested.
>
> Jeffrey Pfau
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
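
On the recursion point raised in the reply, one common way to avoid relying on stack depth guards is to track open elements in a heap-allocated container instead of on the call stack. The sketch below is only a toy under stated assumptions: isWellNested is a hypothetical name, it handles only bare start tags, end tags and self-closing tags (no attributes, comments, CDATA or processing instructions), and it is not taken from WebCore or any existing parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy check that every <name> is closed by a matching </name> in order.
// Assumes tags carry no attributes; nesting depth is bounded by heap
// memory rather than by the native call stack.
bool isWellNested(const std::string& xml)
{
    std::vector<std::string> openElements; // explicit stack of open element names
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos)
            return false; // unterminated tag
        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty())
            return false; // "<>" is not a tag
        if (tag.front() == '/') {
            // End tag: must match the most recently opened element.
            if (openElements.empty() || openElements.back() != tag.substr(1))
                return false;
            openElements.pop_back();
        } else if (tag.back() != '/') {
            // Start tag: remember it. Self-closing tags need no bookkeeping.
            openElements.push_back(tag);
        }
    }
    return openElements.empty();
}
```

With this shape, a pathologically deep document costs memory proportional to its depth but cannot overflow the native stack, so no per-call depth check is needed.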