On Friday 10 November 2006 15:37, Daniel Veillard wrote:
> On Fri, Nov 10, 2006 at 03:15:40PM +0100, Petr Pajas wrote:
> > Hi Daniel, All,
> >
> > I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in
> > order to force the parser to expect a certain input encoding. It works
> > fine but only as long as the HTML document contains no header like
> >
> >   <meta http-equiv="Content-Type" content="text/html;
> >   charset=iso-8859-2">
> >
> > where charset differs from the encoding which I'm trying to enforce.
> > Imagine, if you like, that I receive the data from a web server, which
> > already sends all as UTF-8, declares correctly Content-Type charset as
> > UTF-8 in the HTTP header, but somehow the document still contains a
> > (forgotten) <meta ...charset=iso-8859-2>.
> >
> > I should mention that both htmlParseDoc() and htmlParseFile(), under the
> > same scenario, do obey the encoding I specify.
>
> htmlRead... should be preferred nowadays.

This actually concerned the old, dusty XML::LibXML bindings, so my first
intention was to do only minimal surgery, leaving the dust safely where it
lay. But switching to htmlReadIO proved to be a far better choice. It does
exactly what I needed with just a few lines of code.

Cheers,

-- Petr
