On Mon, Sep 28, 2009 at 08:43:08PM +0200, Stefan Behnel wrote:
> Hi,
>
> there seems to be a change in libxml2 2.7.4 that prevents it from parsing a
> Python unicode string buffer, which is UCS4-LE encoded on my system. The
> first call to xmlCtxtResetPush() works and parses the first chunk as
> expected, but subsequent calls to xmlParseChunk() then fail with an error:
> "input conversion failed due to input error, bytes 0x22 0x00 0x00 0x00"
> (the latter being '"', which was the first character in the second chunk).
>
> So, when passing '<?xml version=' to xmlCtxtResetPush() and '"1.0"?><ro' to
> xmlParseChunk(), I get the error above. I only noticed this by accident, as
> a few badly written test cases in lxml happened to parse from Unicode
> strings when run under Python 3.
>
> Any ideas where this might originate from?
  See https://bugzilla.gnome.org/show_bug.cgi?id=566012 and the recent git
commit "Fix a parsing problem with little data at startup". If you can give
me a reproducer, preferably in C (or using the default Python bindings), I
can check. That fix is about guessing the encoding at the beginning of the
document, before the encoding has been specified in the XMLDecl.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
                     | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
