Hi,

One of my users has run into a problem where the HTML parser loses all data after a particular sequence of characters in the HTML body. If the body contains the two bytes 0x01 followed by 0x00, the HTML parser loses all data after those two bytes, even when the parser is put in recovery mode.
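
To make the input concrete, here is a minimal sketch of how a document containing that byte pair could be built and inspected. It is not the reproduction program linked in this message; the function names and the sample markup are hypothetical, and it only shows which part of the input sits after the 0x01 0x00 pair (the part that goes missing), without calling the parser itself.

```python
# Hypothetical sketch: the names and sample markup are illustrative only.
# It builds an HTML body containing 0x01 0x00 and shows what follows the pair.

BAD_SEQUENCE = b"\x01\x00"


def build_test_document() -> bytes:
    """Return an HTML body with 0x01 0x00 embedded between two paragraphs."""
    before = b"<html><body><p>before</p>"
    after = b"<p>after</p></body></html>"
    return before + BAD_SEQUENCE + after


def bytes_after_bad_sequence(data: bytes) -> bytes:
    """Return everything following the first 0x01 0x00 pair, or b'' if absent."""
    index = data.find(BAD_SEQUENCE)
    if index == -1:
        return b""
    return data[index + len(BAD_SEQUENCE):]


if __name__ == "__main__":
    doc = build_test_document()
    # Offset of the byte pair, then the data that comes after it.
    print(doc.find(BAD_SEQUENCE))
    print(bytes_after_bad_sequence(doc))
```

Feeding the bytes from `build_test_document()` to the parser in recovery mode and comparing the result against `bytes_after_bad_sequence()` would show whether the trailing paragraph survives.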

Here is a program and test file that reproduce the problem:

http://gist.github.com/99401

I realize those characters are not valid UTF-8 characters, but if the parser is in recovery mode, it seems it shouldn't lose all data after them. Shall I file a ticket in Bugzilla?

-- 
Aaron Patterson
http://tenderlovemaking.com/
