On Wed, Jan 5, 2011 at 5:07 AM, Patrick Gansterer <par...@paroga.com> wrote:
> Is there a reason why we can't pass the "raw" data to libxml2?
> E.g. when the input file is UTF-8 we convert it into UTF-16 and then libxml2
> converts it back into UTF-8 (its internal format). This is a real performance
> problem when parsing XML [1].
> Is there some (required) magic involved when detecting the encoding in
> WebKit? AFAIK XML always defaults to UTF-8 if there's no encoding declared.
> Can we make libxml2 do the encoding detection and provide all of our decoders
> so it can use it?
>
> [1] https://bugs.webkit.org/show_bug.cgi?id=43085
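(The round trip Patrick describes can be sketched outside of WebKit. This is an illustrative stand-in only, not WebKit or libxml2 code: Python's codecs play the role of WebKit's decoder and libxml2's internal converter, and ElementTree stands in for a parser that does its own encoding detection on raw bytes, defaulting to UTF-8 as the XML spec requires.)

```python
import xml.etree.ElementTree as ET

# Raw UTF-8 bytes as they would arrive off the network. No encoding
# attribute in the declaration, so XML's UTF-8 default applies.
raw = '<?xml version="1.0"?><root>héllo</root>'.encode("utf-8")

# Step 1: a WebKit-style decode of the input to UTF-16 (what happens today).
utf16 = raw.decode("utf-8").encode("utf-16-le")

# Step 2: a libxml2-style conversion back to UTF-8, its internal format.
back_to_utf8 = utf16.decode("utf-16-le").encode("utf-8")

# The bytes are identical to what we started with: two conversions
# that pure pass-through of the raw data would avoid entirely.
assert back_to_utf8 == raw

# A parser fed the raw bytes can do the detection itself.
root = ET.fromstring(raw)
assert root.text == "héllo"
```

The sketch is the whole argument in miniature: when the input already matches the parser's internal encoding, the decode/re-encode pair is pure overhead.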
Looking at that bug, the "XSLT argument" is a red herring. We don't use libxml's data structures, so when we use libxslt we either turn the XML parser completely over to libxslt or we serialize and re-parse (that's how JavaScript-invoked XSLT works). In both cases, we're probably incurring a penalty for this double decoding between Unicode encodings.

A native XML parser for WebKit would help in the situation where you aren't using XSLT. Only a native or different XSLT processor, in conjunction with a native XML parser, would help in all cases.

The XSLT processor question is a thorny one that I brought up a while ago. I personally would love to see us use a processor that has better integration with WebKit's API. There are a handful of choices, but many of them are XSLT 2.0.

--
--Alex Milowski

"The excellence of grammar as a guide is proportional to the paucity of
the inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev