[xml] magic characters make the HTML parser lose data

Aaron Patterson Tue, 21 Apr 2009 14:43:02 -0700

Hi,

One of my users has run into a problem where the HTML parser will
lose all data after a particular sequence of characters in the HTML
body.  It seems that if there are two characters, 0x01 followed by
0x00, the HTML parser will lose all data after those two characters
even if the parser is put in recovery mode.


Here is a program and test file that reproduce the problem:

  http://gist.github.com/99401
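
For anyone who cannot reach the gist, here is a rough Python sketch of
how a comparable test input could be built.  It is not the program from
the gist and it does not call libxml2; the helper names and the
surrounding markup are made up for illustration.  It only constructs a
document with 0x01 0x00 in the body and shows which bytes follow the
sequence, i.e. the data one would hope a recovering parser still sees.

```python
# Hypothetical helper names; not taken from the gist or from libxml2.
MAGIC = b"\x01\x00"  # the two bytes described in the report


def build_test_html(before: bytes = b"<p>before</p>",
                    after: bytes = b"<p>after</p>") -> bytes:
    """Return an HTML document with the 0x01 0x00 sequence inside the body."""
    return b"<html><body>" + before + MAGIC + after + b"</body></html>"


def data_after_magic(document: bytes) -> bytes:
    """Return the bytes following the first 0x01 0x00 sequence, or b'' if absent."""
    index = document.find(MAGIC)
    if index == -1:
        return b""
    return document[index + len(MAGIC):]


if __name__ == "__main__":
    doc = build_test_html()
    # These are the bytes that sit after the problematic sequence.
    print(data_after_magic(doc))
```

Writing the result of build_test_html() to a file and feeding that file
to the parser in recovery mode should be enough to check whether the
content after the sequence survives.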

I realize those characters are not valid UTF-8 characters, but it
seems that if the parser is in recovery mode it shouldn't lose all
data after them.  Shall I file a ticket in Bugzilla?

-- 
Aaron Patterson
http://tenderlovemaking.com/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml