* Daniel Veillard wrote:
> Hum, I don't know how it should be processed in theory! In XML
> the BOM is fine at the beginning of a document entity in UTF-8 or UTF-16
> but will usually mess things up in different encodings. For HTML I don't
> know what the theory suggests. For compatibility I guess the character
> should be dropped if detected.

HTML character encoding detection is a terrible mess. Last time I checked, libxml2 was not a compliant implementation: it treated <meta> elements as encoding switches and did not re-parse the content preceding the <meta> element (quite unlike browsers). Browsers typically treat the BOM here as they would for XML documents.

-- 
Björn Höhrmann · http://bjoern.hoehrmann.de
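
To make the "drop the BOM if detected" idea concrete, here is a minimal sketch in Python. It is not libxml2 code and not what any browser actually runs. The helper name `sniff_bom` is hypothetical, and only the UTF-8 and UTF-16 signatures mentioned in this thread are checked. A parser could then fall back to its usual detection, <meta> included, when no BOM is found.

```python
import codecs

# BOM signatures for the encodings discussed in this thread (UTF-8, UTF-16).
# Other signatures are deliberately left out of this sketch.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data: bytes):
    """Hypothetical helper: return (encoding, payload).

    If the input starts with a known BOM, the BOM is dropped from the
    payload and the matching encoding name is returned. Otherwise the
    encoding is None and the payload is the input unchanged, so the
    caller can fall back to other detection steps.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


if __name__ == "__main__":
    encoding, payload = sniff_bom(codecs.BOM_UTF8 + b"<html></html>")
    print(encoding, payload)
```

Returning the payload without the BOM means later stages never see the character, which matches the compatibility behaviour suggested in the quoted message.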
