On Tue, May 02, 2006 at 07:15:07PM +0200, A. Pagaltzis wrote:
> * Olivier Sirven <[EMAIL PROTECTED]> [2006-05-02 18:35]:
> > If you have a solution for correcting every invalid character
> > into a valid one without loosing information I would be really
> > happy to read it :)
>
> Well, not in the general case; the computer is not a mind reader.
> But depending on the assumptions you can make, you can do
> something like what I wrote about here:
>
> Repairing broken documents that mix UTF-8 and ISO-8859-1
> http://plasmasturm.org/log/416/

  The problem is: how do you know it's ISO-8859-1 and not another
variant? You can't guarantee that you won't generate false positives
(i.e. corrupt data), which is why the XML Working Group declared that
this had to be a fatal error. The only sane approach (in these days of
liability for software this is especially true) is to force the error
so that the input gets fixed, unless you have some information which
tells you what the encoding really is, in which case you can still
preprocess.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
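
To make the point in the reply above concrete, here is a minimal Python
sketch. It is not libxml2 code, and the helper name `decode_strict` and
its `known_encoding` parameter are hypothetical, chosen only for this
illustration. It shows that the same byte is valid in two ISO-8859
variants but means different things in each, and that strict decoding
fails unless the caller actually knows the encoding.

```python
from typing import Optional


def decode_strict(raw: bytes, known_encoding: Optional[str] = None) -> str:
    """Decode bytes, refusing to guess (hypothetical helper).

    Without out-of-band information the input must be valid UTF-8;
    otherwise UnicodeDecodeError is raised so the input gets fixed.
    If the caller knows the real encoding, it is used to preprocess.
    """
    if known_encoding is not None:
        return raw.decode(known_encoding, errors="strict")
    return raw.decode("utf-8", errors="strict")


# One byte, two plausible readings: guessing wrong silently corrupts data.
ambiguous = b"\xa4"
as_latin1 = ambiguous.decode("iso-8859-1")   # U+00A4 CURRENCY SIGN
as_latin9 = ambiguous.decode("iso-8859-15")  # U+20AC EURO SIGN

data = b"price: \xa4 5"
try:
    decode_strict(data)          # not valid UTF-8, so this raises
    failed = False
except UnicodeDecodeError:
    failed = True

# With real knowledge of the encoding, preprocessing is safe.
fixed = decode_strict(data, known_encoding="iso-8859-15")
```

Both legacy decodes succeed without complaint, so nothing in the bytes
themselves tells a repair tool which reading was intended.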
