See responses inline:

On Jun 28, 2011, at 6:26 PM, Adam Barth wrote:

> A question and a comment:
>
> 1) Will this let us remove the code for both the libxml2 and the
> QtXml parsers? I'd certainly much rather have one XML parser than
> three.

This won't replace libxslt or QtXmlPatterns for XSL-T, as they depend on the respective XML libraries. The goal for this XML parser is to be able to replace the core XML parser itself. XSL-T support would have to come later.

> 2) One thing we found very helpful in working on the HTML parser was a
> good test suite. Presumably there are existing XML parsing test
> suites. You might consider landing one (or more) of these test suites
> as a first step.
>
> Adam

I know that W3C provides a test suite, but it's probably not that comprehensive. I can try to find more online; I'm sure that some of the open source projects like libxml2 provide some.

Jeffrey Pfau

>
> On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jp...@apple.com> wrote:
>> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming
>> XML. However, QtXml isn't always available, and using libxml2 exposes its
>> own share of problems. As such, I'm undertaking writing an XML parser that
>> uses no external libraries.
>>
>> The first step to doing this is to add a new flag that switches off the
>> other two parsers. As the parsers are already independent and can be
>> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
>> checks, replacing the #else conditionals, and also a new ENABLE check,
>> tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE,
>> etc., may be more appropriate).
>>
>> As there will probably be a new slew of files pertaining to XML parsing, I
>> will put these files in WebCore/xml/parser, and move the existing
>> XMLDocumentParser* files into this new directory. As far as I know, the
>> placement of these files in WebCore/dom/ is legacy, and, assuming the build
>> on each platform is changed, it makes sense to move them.
>>
>> Once all the files are in a logical place, I plan to make a new file for a
>> skeleton of the new XMLDocumentParser, at least to get it to link until a
>> real one is in place, even if the XML parser at that point is just a data
>> sink.
>>
>> From there, I plan to copy and modify a good chunk of the lower level HTML
>> tokenization and parsing code, and make changes as necessary to make it work
>> on generalized XML, at least until I can generalize the common code in such
>> a way that the HTML and XML tokenizers can be subclasses and use common
>> code. I'd probably do the refactoring at the end.
>>
>> I'm still exploring the existing parsing code, but I'd probably work my way
>> up from there. I've read a lot of the XML 1.0 spec in preparation, as well,
>> but it doesn't have much on implementation itself. If QtWebKit or parsing
>> people have any comments, concerns, or help, I'd be more than willing to
>> listen--I'm just starting here, and I'm not completely familiar with the
>> codebase.
>>
>> Although no code is checked in so far, I've started on this list already and
>> have gotten as far as the new flags, a skeleton XMLDocumentParserNew.cpp,
>> and making a tokenizer that compiles and links, but is completely untested.
>>
>> Jeffrey Pfau
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
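
For readers following the flag discussion in the original message, here is a minimal, unofficial sketch of what replacing a trailing #else with explicit checks could look like. USE(QXMLSTREAM), USE(LIBXML2), and the tentative NEW_XML name come from the message. The macro definitions, the WTF_USE_* and ENABLE_* configuration names, the XMLParserBackend enum, and both functions are assumptions made so the example compiles on its own; none of it is taken from the WebKit tree.

```cpp
#include <cassert>

// Stand-ins modeled on the USE()/ENABLE() checks named in the message. In this
// sketch they simply expand to a 0/1 configuration macro (an assumption).
#define USE(FEATURE) (WTF_USE_##FEATURE)
#define ENABLE(FEATURE) (ENABLE_##FEATURE)

// Hypothetical build configuration: the new parser is on, the other two are off.
#define ENABLE_NEW_XML 1
#define WTF_USE_QXMLSTREAM 0
#define WTF_USE_LIBXML2 0

// Hypothetical enum naming the three backends discussed in the thread.
enum class XMLParserBackend { Native, QtXml, LibXML2 };

// Exactly one branch survives preprocessing. Because every backend has its own
// explicit check, a configuration that enables none of them fails at compile
// time instead of silently falling through an #else into libxml2.
constexpr XMLParserBackend selectedXMLParser()
{
#if ENABLE(NEW_XML)
    return XMLParserBackend::Native;
#elif USE(QXMLSTREAM)
    return XMLParserBackend::QtXml;
#elif USE(LIBXML2)
    return XMLParserBackend::LibXML2;
#else
#error "No XML parser backend is enabled"
#endif
}

// Human-readable name, e.g. for logging which backend was compiled in.
inline const char* backendName(XMLParserBackend backend)
{
    switch (backend) {
    case XMLParserBackend::Native:
        return "native";
    case XMLParserBackend::QtXml:
        return "QtXml";
    case XMLParserBackend::LibXML2:
        return "libxml2";
    }
    assert(false && "unknown backend");
    return "";
}
```

Checking ENABLE(NEW_XML) first mirrors the stated goal of a flag that switches off the other two parsers. Flipping the three configuration values changes which branch is compiled, and setting all of them to 0 triggers the #error.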