I don't know all of the problems with libxml2, but one I've heard about is that WebCore uses UTF-16 internally while libxml2 uses UTF-8, so data is constantly converted between the two encodings, and that conversion is slow. If there are other major problems, I haven't been told about them; I've only been told that it would be good to have a replacement.
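To make the conversion cost concrete, here is a minimal C++ sketch of re-encoding a UTF-16 buffer as UTF-8. It is a hypothetical helper, not WebCore or libxml2 code: `std::u16string` stands in for WebCore's internal string storage, and the name `utf16ToUtf8` is made up for illustration. Every code unit is visited and a new buffer is allocated for each chunk handed to a UTF-8 parser, and anything coming back (names, text content) needs the reverse trip.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not WebCore's actual code): re-encodes UTF-16 text,
// standing in for WebCore's internal strings, as UTF-8, the encoding libxml2
// works in. Each code unit is visited once, so the cost grows linearly with
// the size of every chunk that crosses the boundary.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3); // upper bound: at most 3 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate: substitute U+FFFD REPLACEMENT CHARACTER
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    assert(out.size() <= in.size() * 3); // the reserve() bound above holds
    return out;
}
```

For example, `utf16ToUtf8(u"\u20AC")` (the euro sign, one UTF-16 code unit) produces the three bytes `E2 82 AC`, and a surrogate pair such as U+1F600 becomes four bytes.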
Jeffrey Pfau

On Jun 28, 2011, at 6:30 PM, Dirk Pranke wrote:

> Can you expand a bit more on "using libxml2 exposes its own share of
> problems"?
>
> -- Dirk
>
> On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jp...@apple.com> wrote:
>> Currently, WebCore uses libxml2, or, if available, QtXml to parse incoming
>> XML. However, QtXml isn't always available, and using libxml2 exposes its
>> own share of problems. As such, I'm undertaking writing an XML parser that
>> uses no external libraries.
>>
>> The first step to doing this is to add a new flag that switches off the
>> other two parsers. As the parsers are already independent and can be
>> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
>> checks, replacing the #else conditionals, and also a new ENABLE check,
>> tentatively called NEW_XML (although names such as NATIVE_XML or XML_NATIVE,
>> etc., may be more appropriate).
>>
>> As there will probably be a new slew of files pertaining to XML parsing, I
>> will put these files in WebCore/xml/parser, and move the existing
>> XMLDocumentParser* files into this new directory. As far as I know, the
>> placement of these files in WebCore/dom/ is legacy, and, assuming the build
>> on each platform is changed, it makes sense to move them.
>>
>> Once all the files are in a logical place, I plan to make a new file for a
>> skeleton of the new XMLDocumentParser, at least to get it to link until a
>> real one is in place, even if the XML parser at that point is just a data
>> sink.
>>
>> From there, I plan to copy and modify a good chunk of the lower-level HTML
>> tokenization and parsing code, and make changes as necessary to make it work
>> on generalized XML, at least until I can generalize the common code in such
>> a way that the HTML and XML tokenizers can be subclasses and use common
>> code. I'd probably do the refactoring at the end.
>>
>> I'm still exploring the existing parsing code, but I'd probably work my way
>> up from there. I've read a lot of the XML 1.0 spec in preparation, as well,
>> but it doesn't have much on implementation itself. If QtWebKit or parsing
>> people have any comments, concerns, or help, I'd be more than willing to
>> listen--I'm just starting here, and I'm not completely familiar with the
>> codebase.
>>
>> Although no code has been checked in so far, I've started on this list
>> already and have gotten as far as the new flags, a skeleton
>> XMLDocumentParserNew.cpp, and making a tokenizer that compiles and links,
>> but is completely untested.
>>
>> Jeffrey Pfau
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
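The flag arrangement and the "data sink" skeleton described in the quoted plan could look roughly like the minimal C++ sketch below. It assumes simplified stand-ins for WebKit's USE() and ENABLE() macros (the real definitions are not reproduced), and the class and method names (`XMLDocumentParserNew`, `append`, `finish`, `bufferedLength`) are hypothetical, not WebCore's actual DocumentParser interface. Checking NEW_XML first, so that it switches off the other two parsers, is likewise an assumption about ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-ins for WebKit's USE() and ENABLE() feature macros
// (assumption: not the real Platform.h definitions). An identifier that is
// not #defined evaluates to 0 inside #if, so an unset flag counts as "off".
#define USE(FEATURE) (WTF_USE_##FEATURE)
#define ENABLE(FEATURE) (ENABLE_##FEATURE)

// Build configuration for this sketch: turn on the proposed NEW_XML flag.
#define ENABLE_NEW_XML 1

#if ENABLE(NEW_XML)
// Hypothetical skeleton of the new parser. It is only a data sink: it stores
// its input so the build links, and does no tokenizing or tree building yet.
class XMLDocumentParserNew {
public:
    // Accept another chunk of UTF-16 source text.
    void append(const std::u16string& chunk)
    {
        assert(!m_finished && "append() called after finish()");
        m_buffer += chunk;
    }

    // End of input. A real parser would flush its tokenizer here.
    void finish() { m_finished = true; }

    std::size_t bufferedLength() const { return m_buffer.size(); }
    bool isFinished() const { return m_finished; }

private:
    std::u16string m_buffer;
    bool m_finished = false;
};
using XMLDocumentParser = XMLDocumentParserNew;
#elif USE(QXMLSTREAM)
// The existing QtXml-based parser would be selected here.
#elif USE(LIBXML2)
// The existing libxml2-based parser would be selected here, through an
// explicit check rather than a bare #else.
#else
#error "No XML parser backend enabled"
#endif
```

Because every backend is guarded by its own explicit check, a configuration that enables none of them fails at compile time instead of silently falling through to libxml2.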