On Fri, Nov 10, 2006 at 03:15:40PM +0100, Petr Pajas wrote:
> Hi Daniel, All,
>
> I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in order
> to force the parser to expect a certain input encoding. It works fine but only
> as long as the HTML document contains no header like
>
>   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
>
> where charset differs from the encoding which I'm trying to enforce. Imagine,
> if you like, that I receive the data from a web server, which already sends
> all as UTF-8, declares correctly Content-Type charset as UTF-8 in the HTTP
> header, but somehow the document still contains a (forgotten)
> <meta ...charset=iso-8859-2>.
>
> I should mention that both htmlParseDoc() and htmlParseFile(), under the same
> scenario, do obey the encoding I specify.

  htmlRead... should be preferred nowadays.

> Since these methods are more high-level I'm not sure whether it's a bug or
> feature that htmlCreatePushParserCtxt() favors <meta> over the encoding
> specified in the constructor while htmlParse* do otherwise. In either case,
> I'd be interested if there is currently some way to force encoding with the
> push parser (so that the <meta> won't override it)?

  I don't have time right now to go check and potentially make a patch.
The best option is to file it in Bugzilla, but don't hold your breath!

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
