You must declare the correct encoding in the XML source file. For example, using amara:
chinese.xml:

<?xml version="1.0" encoding="utf-8"?>
<test>�u�u啖啖才是�w.���扉L锍才是��</test>

>>> import amara
>>> doc = amara.parse('chinese.xml')
>>> print unicode(doc.test)
�u�u啖啖才是�w.���扉L锍才是��

No problem with big5 either:

>>> doc = amara.parse('http://xml.ascc.net/test/wfall/big5/test13.xml')
>>>

2007/10/22, Fabian López <[EMAIL PROTECTED]>:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^�u�u啖啖才是�w.���扉L锍才是�� or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> The thing is that I would like to ignore it and parse all the characters
> except these ones. So, could anyone help me? I suppose that I can catch an
> exception that ignores it or maybe use any function that detects these
> Chinese characters and after that ignore them.
>
> Thanks!!
> Fabian
>
> _______________________________________________
> XML-SIG maillist - XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>
>

--
Regards,

--
Luis Miguel
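P.S. If the parse itself succeeds and the UnicodeEncodeError only shows up when printing to a console with a limited codec, one option is to encode the text explicitly and tell the codec what to do with characters it cannot represent. The following is only a rough sketch under several assumptions: it uses the standard library's xml.etree.ElementTree in Python 3 rather than amara, the sample document is made up, the helper name encode_for_console is hypothetical, and cp1252 is just a guess at the 'charmap' codec in use.

```python
import xml.etree.ElementTree as ET

# Made-up sample document (an assumption, not the poster's file).
# It is built as bytes so the parser relies on the encoding declaration.
xml_bytes = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<test>abc ヘアアイロン</test>'
).encode("utf-8")

root = ET.fromstring(xml_bytes)  # decoded according to the declared encoding
text = root.text                 # a Unicode string with all characters intact


def encode_for_console(s, encoding="cp1252", errors="ignore"):
    """Hypothetical helper: encode s for a limited codec.

    errors="ignore" drops characters the codec cannot represent;
    errors="replace" substitutes "?" for each of them.
    """
    return s.encode(encoding, errors)


print(encode_for_console(text))                    # unencodable characters dropped
print(encode_for_console(text, errors="replace"))  # unencodable characters become "?"
```

The parsed string keeps every character, so nothing is lost until the moment of output; only the encoded copy is lossy.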