On Mon, Feb 13, 2006 at 03:40:48PM -0800, Eric Seidel wrote:
> We convert everything to UTF-16 and pass around only UTF-16 strings
> internally in WebKit (http://www.webkit.org). If that means we have
> to also remove the encoding information from the string before
> passing it into libxml (or better yet, tell libxml to ignore it), we
> can do that.
>
> In our case, we don't want the parser to autodetect. We do all that
> already in WebKit; we'd just like to pass an already properly decoded
> UTF-16 string off to libxml and let it do its magic.
>
> In my example it still seems that libxml falls over well before
> actually reaching any XML encoding declaration. The first byte
> passed seems to put the parser context into an error state. Any
> thoughts on what might be causing this? Again, removing my bogus
> xmlSwitchEncoding call does not change the behavior.
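For reference, the call sequence being described would be roughly the
following. This is only a sketch: the helper name and buffer variables
are made up, UTF-16LE is assumed, and whether forcing the encoding this
early behaves as intended is exactly the question at hand.

#include <libxml/parser.h>
#include <libxml/parserInternals.h>   /* xmlSwitchEncoding() */
#include <libxml/encoding.h>

/* Sketch: hand an already-decoded UTF-16 buffer to the push parser.
 * utf16_data/utf16_len are hypothetical (the buffer WebKit decoded);
 * little-endian is assumed purely for the example. */
static xmlDocPtr
parse_utf16_buffer(const char *utf16_data, int utf16_len)
{
    xmlParserCtxtPtr ctxt;
    xmlDocPtr doc = NULL;

    /* No initial chunk, so nothing is available for autodetection. */
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, NULL, 0, "noname.xml");
    if (ctxt == NULL)
        return NULL;

    /* Declare the data as UTF-16LE up front instead of letting libxml
     * sniff the bytes or act on the encoding= declaration itself. */
    xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);

    /* Feed the whole buffer at once, then terminate the parse. */
    xmlParseChunk(ctxt, utf16_data, utf16_len, 1);

    if (ctxt->wellFormed)
        doc = ctxt->myDoc;
    else if (ctxt->myDoc != NULL)
        xmlFreeDoc(ctxt->myDoc);
    xmlFreeParserCtxt(ctxt);
    return doc;
}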
First thing I notice is that you pass one byte at a time. At best this
is just massively inefficient; at worst you're hitting a bug. The
source from parse4.c does not do this.

Also, if you have converted to a memory string, why do you need to use
progressive parsing? If the conversion is progressive, I still doubt it
delivers data byte by byte; just pass the blocks as they are converted.

Daniel

--
Daniel Veillard      | Red Hat  http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
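For comparison, the block-at-a-time pattern that parse4.c demonstrates
looks roughly like this. The file reads below stand in for whatever
actually produces the data (a charset converter in Eric's case); it is
a sketch, not the example's exact code.

#include <stdio.h>
#include <libxml/parser.h>

/* Push the document in whole blocks rather than one byte at a time. */
static xmlDocPtr
parse_in_blocks(const char *filename)
{
    char buf[4096];
    FILE *f = fopen(filename, "rb");
    size_t n;
    xmlParserCtxtPtr ctxt;
    xmlDocPtr doc = NULL;

    if (f == NULL)
        return NULL;

    /* The first block (4 bytes or more) lets libxml detect the encoding. */
    n = fread(buf, 1, sizeof(buf), f);
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, buf, (int) n, filename);
    if (ctxt == NULL) {
        fclose(f);
        return NULL;
    }

    /* Feed each subsequent block whole; terminate = 0 until the end. */
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        xmlParseChunk(ctxt, buf, (int) n, 0);
    xmlParseChunk(ctxt, NULL, 0, 1);   /* end of input */
    fclose(f);

    if (ctxt->wellFormed)
        doc = ctxt->myDoc;
    else if (ctxt->myDoc != NULL)
        xmlFreeDoc(ctxt->myDoc);
    xmlFreeParserCtxt(ctxt);
    return doc;
}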