If that were all, would it be possible to patch libxml2 to use UTF-16? That might be less of an undertaking than writing a new XML library, but that could just be my youthful naivety.
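To make the conversion cost a bit more concrete, here is a rough sketch of the work that has to happen each time a UTF-16 string is handed to a UTF-8 API. The function name `utf16ToUtf8` is hypothetical; it is not WebCore or libxml2 code, only an illustration of the per-call transcoding being discussed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not WebCore or libxml2 API): transcodes UTF-16 code
// units into UTF-8 bytes. Every code unit is inspected and a new buffer is
// filled, which is the overhead paid on each crossing of the boundary.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3); // worst case: 3 UTF-8 bytes per UTF-16 code unit

    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD (an assumption of this sketch).
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Even for pure ASCII markup the loop still visits every character and allocates a fresh buffer, and the reverse conversion is needed on the way back, so the cost scales with the amount of text crossing between the two libraries.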
On Tue, Jun 28, 2011 at 6:36 PM, Jeffrey Pfau <jp...@apple.com> wrote:

> I don't know all of the problems libxml2 has, but one of the ones I've
> heard is that WebCore uses UTF-16 internally, and libxml2 uses UTF-8, so the
> data is perpetually converted between the two formats--and this is slow. If
> there are any other big ones, I haven't been told them, only that it would
> be good to have a replacement.
>
> Jeffrey Pfau
>
> On Jun 28, 2011, at 6:30 PM, Dirk Pranke wrote:
>
> > Can you expand a bit more on "using libxml2 exposes its own share of
> > problems"?
> >
> > -- Dirk
> >
> > On Tue, Jun 28, 2011 at 6:12 PM, Jeffrey Pfau <jp...@apple.com> wrote:
> >> Currently, WebCore uses libxml2, or, if available, QtXml to parse
> >> incoming XML. However, QtXml isn't always available, and using libxml2
> >> exposes its own share of problems. As such, I'm undertaking writing an
> >> XML parser that uses no external libraries.
> >>
> >> The first step to doing this is to add a new flag that switches off the
> >> other two parsers. As the parsers are already independent and can be
> >> switched between by checking USE(QXMLSTREAM), I am adding USE(LIBXML2)
> >> checks, replacing the #else conditionals, and also a new ENABLE check,
> >> tentatively called NEW_XML (although names such as NATIVE_XML or
> >> XML_NATIVE, etc, may be more appropriate).
> >>
> >> As there will probably be a new slew of files pertaining to XML parsing,
> >> I will put these files in WebCore/xml/parser, and move the existing
> >> XMLDocumentParser* file into this new directory. As far as I know, the
> >> placement of these files in WebCore/dom/ is legacy, and, assuming the
> >> build on each platform is changed, it makes sense to move them.
> >>
> >> Once all the files are in a logical place, I plan to make a new file for
> >> a skeleton of the new XMLDocumentParser, at least to get it to link until
> >> a real one is in place, even if the XML parser at that point is just a
> >> data sink.
> >>
> >> From there, I plan to copy and modify a good chunk of the lower level
> >> HTML tokenization and parsing code, and make changes as necessary to make
> >> it work on generalized XML, at least until I can generalize the common
> >> code in such a way that the HTML and XML tokenizers can be subclasses and
> >> use common code. I'd probably do the refactoring at the end.
> >>
> >> I'm still exploring the existing parsing code, but I'd probably work my
> >> way up from there. I've read a lot of the XML 1.0 spec in preparation, as
> >> well, but it doesn't have much on implementation itself. If QtWebKit or
> >> parsing people have any comments, concerns, or help, I'd be more than
> >> willing to listen--I'm just starting here, and I'm not completely
> >> familiar with the codebase.
> >>
> >> Although no code is checked in so far, I've started on this list already
> >> and have gotten as far as the new flags, a skeleton
> >> XMLDocumentParserNew.cpp, and making a tokenizer that compiles and links,
> >> but is completely untested.
> >>
> >> Jeffrey Pfau
> >> _______________________________________________
> >> webkit-dev mailing list
> >> webkit-dev@lists.webkit.org
> >> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev