Hi Daniel, all,

I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt() to force the parser to expect a particular input encoding. This works fine, but only as long as the HTML document contains no header like

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

where the charset differs from the encoding I'm trying to enforce. Imagine, if you like, that I receive the data from a web server which already sends everything as UTF-8 and correctly declares the Content-Type charset as UTF-8 in the HTTP header, but the document still contains a (forgotten) <meta ... charset=iso-8859-2>.

I should mention that in the same scenario both htmlParseDoc() and htmlParseFile() do obey the encoding I specify. Since those functions are more high-level, I'm not sure whether it's a bug or a feature that htmlCreatePushParserCtxt() favors the <meta> over the encoding passed to the constructor, while htmlParse* do the opposite.

In either case, is there currently some way to force the encoding with the push parser, so that the <meta> won't override it?

Thanks,
Petr

P.S. This was also tested with CVS libxml2.

--
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
