On Mon, Feb 13, 2006 at 03:40:48PM -0800, Eric Seidel wrote:
> We convert everything to UTF-16 and pass around only UTF-16 strings
> internally in WebKit (http://www.webkit.org). If that means we have
> to also remove the encoding information from the string before
> passing it into libxml (or better yet, tell libxml to ignore it), we
> can do that.
>
> In our case, we don't want the parser to autodetect. We do all that
> already in WebKit; we'd just like to pass an already properly decoded
> UTF-16 string off to libxml and let it do its magic.
>
> In my example it still seems that libxml falls over well before
> actually reaching any XML encoding declaration. The first byte
> passed seems to put the parser context into an error state. Any
> thoughts on what might be causing this? Again, removing my bogus
> xmlSwitchEncoding call does not change the behavior.
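For reference, the call sequence being described would be roughly the
following. This is only a sketch: the helper name and buffer variables
are made up, UTF-16LE is assumed, and whether forcing the encoding this
early behaves as intended is exactly the question at hand.

#include <libxml/parser.h>
#include <libxml/parserInternals.h>   /* xmlSwitchEncoding() */
#include <libxml/encoding.h>

/* Sketch: hand an already-decoded UTF-16 buffer to the push parser.
 * utf16_data/utf16_len are hypothetical (the buffer WebKit decoded);
 * little-endian is assumed purely for the example. */
static xmlDocPtr
parse_utf16_buffer(const char *utf16_data, int utf16_len)
{
    xmlParserCtxtPtr ctxt;
    xmlDocPtr doc = NULL;

    /* No initial chunk, so nothing is available for autodetection. */
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, NULL, 0, "noname.xml");
    if (ctxt == NULL)
        return NULL;

    /* Declare the data as UTF-16LE up front instead of letting libxml
     * sniff the bytes or act on the encoding= declaration itself. */
    xmlSwitchEncoding(ctxt, XML_CHAR_ENCODING_UTF16LE);

    /* Feed the whole buffer at once, then terminate the parse. */
    xmlParseChunk(ctxt, utf16_data, utf16_len, 1);

    if (ctxt->wellFormed)
        doc = ctxt->myDoc;
    else if (ctxt->myDoc != NULL)
        xmlFreeDoc(ctxt->myDoc);
    xmlFreeParserCtxt(ctxt);
    return doc;
}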
First thing I notice is that you pass one byte at a time. At best this
is just massively inefficient; at worst you're hitting a bug. The
source from parse4.c does not do this.

Also, if you have converted to a memory string, why do you need to use
progressive parsing? If the conversion is progressive, I still doubt it
delivers data byte by byte; just pass the blocks as they are converted.

Daniel

--
Daniel Veillard      | Red Hat  http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
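For comparison, the block-at-a-time pattern that parse4.c demonstrates
looks roughly like this. The file reads below stand in for whatever
actually produces the data (a charset converter in Eric's case); it is
a sketch, not the example's exact code.

#include <stdio.h>
#include <libxml/parser.h>

/* Push the document in whole blocks rather than one byte at a time. */
static xmlDocPtr
parse_in_blocks(const char *filename)
{
    char buf[4096];
    FILE *f = fopen(filename, "rb");
    size_t n;
    xmlParserCtxtPtr ctxt;
    xmlDocPtr doc = NULL;

    if (f == NULL)
        return NULL;

    /* The first block (4 bytes or more) lets libxml detect the encoding. */
    n = fread(buf, 1, sizeof(buf), f);
    ctxt = xmlCreatePushParserCtxt(NULL, NULL, buf, (int) n, filename);
    if (ctxt == NULL) {
        fclose(f);
        return NULL;
    }

    /* Feed each subsequent block whole; terminate = 0 until the end. */
    while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
        xmlParseChunk(ctxt, buf, (int) n, 0);
    xmlParseChunk(ctxt, NULL, 0, 1);   /* end of input */
    fclose(f);

    if (ctxt->wellFormed)
        doc = ctxt->myDoc;
    else if (ctxt->myDoc != NULL)
        xmlFreeDoc(ctxt->myDoc);
    xmlFreeParserCtxt(ctxt);
    return doc;
}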