On Wed, Feb 08, 2006 at 11:46:01AM +0100, Cesar Ortiz wrote:
> Hi,
>
> I am parsing html documents using the html parser from libxml2, and if
> the encoding is included in the document it works perfectly but if it
> is not, I think it does not work well (probably because I am doing
> something wrong).

Well, the first thing wrong is that this is not the libxml2 help mailing list,
see http://xmlsoft.org/bugs.html

> As it is said in http://xmlsoft.org/encoding.html the parser should
> detect the encoding.

Autodetection is done on XML based on the XMLDecl and the default values
as specified by the XML specification. On HTML all bets are off if you don't
have a meta tag or if you didn't indicate the encoding to the parser.

> So I tested it putting an utf-8 word in a file and
> it does not detect it (it generates a wrong string). Example:
> reducción --> reducción.

Encoding is an entity property (i.e. per file), not per word. So either
I don't understand your test or this just can't work.

  http://xmlsoft.org/html/libxml-HTMLparser.html#htmlCreatePushParserCtxt

Use the encoding field when creating your parser.

For further information/help, subscribe and use the libxml2 mailing-list,

  thanks,

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
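
To illustrate the point that encoding is a property of the whole document, here is a minimal sketch in plain Python. It does not call libxml2. The sample document, the `decode_document` helper, and the ISO-8859-1 fallback are all assumptions made for illustration, not a description of what the libxml2 HTML parser does internally.

```python
# Hypothetical sample: an HTML document saved as UTF-8 with no <meta>
# tag declaring its charset.
html_bytes = "<html><body><p>reducción</p></body></html>".encode("utf-8")


def decode_document(data: bytes, encoding: str = "iso-8859-1") -> str:
    """Decode a whole document with a single encoding.

    The ISO-8859-1 default is an assumed fallback for this sketch; it
    stands in for whatever a parser uses when nothing tells it the
    real encoding.
    """
    return data.decode(encoding)


# Nothing indicates the encoding, so the assumed fallback is used and
# each two-byte UTF-8 sequence turns into two wrong characters.
guessed = decode_document(html_bytes)

# The caller states the encoding for the whole document, so the text
# comes back intact.
explicit = decode_document(html_bytes, "utf-8")

print(guessed)
print(explicit)
```

The same bytes give a damaged string in the first call and the intended one in the second. The only difference is that the caller supplied the encoding, which is the role of the encoding argument when creating the parser, or of a meta tag in the document.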