On Fri, May 16, 2008 at 12:37:12PM +0200, bagnacauda wrote:
> I need your help to understand what follows.
>
> I have this xml file (you can find it attached) whose tag may contain
> western European, Russian or Greek characters, even mixed among them.
> I have run xmllint --debug --sax on the file to see if everything is OK when
> I get a mixed character string and I was surprised to see that the
> characters callback is invoked twice: once for the first four characters
> (which are western european) and once for the remaining part of the string

Wrong expectation; libxml2's behaviour is normal. SAX is a streaming
interface, and a text node in XML has no size boundary. This implies that
the content of a text node may be delivered as multiple characters
callbacks. If you don't accept this on the receiving side, you will lose
data.

Once it's clear that you must support multiple consecutive characters
callbacks, well, libxml2 uses this to speed up parsing when possible.
That simple.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
                     | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
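
For readers who want to see what "supporting multiple consecutive characters callbacks" can look like on the receiving side, here is a minimal sketch. It is an illustration only: it uses Python's standard xml.sax module rather than libxml2's C SAX interface, on the assumption that the same SAX contract applies (a text node may arrive in several pieces). The names TextCollector, collect_text and chunk_count are hypothetical, and the sample document is made up, not the poster's attached file.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the full text of each element, however the parser chunks it."""

    def __init__(self):
        super().__init__()
        self._chunks = []      # pieces of the current text node seen so far
        self.chunk_count = 0   # number of characters() calls, for illustration
        self.texts = []        # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer; this simple sketch assumes no mixed content.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one text node: append, never overwrite.
        self._chunks.append(content)
        self.chunk_count += 1

    def endElement(self, name):
        # Only at the end of the element is the text known to be complete.
        self.texts.append((name, "".join(self._chunks)))
        self._chunks = []


def collect_text(xml_bytes):
    """Parse xml_bytes and return the handler holding the collected text."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler


# Hypothetical sample mixing western European, Russian and Greek characters.
sample = (
    "<tag>Caf\u00e9 &amp; \u041f\u0440\u0438\u0432\u0435\u0442 "
    "\u03b1\u03b2\u03b3</tag>"
).encode("utf-8")

handler = collect_text(sample)
print(handler.chunk_count, handler.texts)
```

How many times characters() fires depends on the parser and the input, so the handler never relies on that count; it only treats the text as complete once the end-of-element callback arrives.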
