Re: [XML-SIG] parsing chinese characters

Luis Miguel Morillas Mon, 22 Oct 2007 13:57:16 -0700

You need to declare the correct encoding in the XML declaration of the source file.

For example, using Amara:


chinese.xml
<?xml version="1.0" encoding="utf-8"?>
<test>�u�u啖啖才是�w.���扉L锍才是��</test>

>>> import amara
>>> doc = amara.parse('chinese.xml')
>>> print unicode(doc.test)
�u�u啖啖才是�w.���扉L锍才是��
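
For readers without Amara, the same point can be sketched with the
standard library's xml.etree.ElementTree. This is only an illustration,
not part of the session above: the helper build_document and the sample
character U+4E2D are made up for the example. It shows that the parser
trusts the encoding named in the XML declaration, so a wrong declaration
yields wrong text even though parsing succeeds.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

# Sample text: the single character U+4E2D, chosen only for illustration.
text = u"\u4e2d"


def build_document(declared_encoding, actual_encoding):
    """Return XML bytes stored in actual_encoding but labelled declared_encoding.

    Hypothetical helper, used only to build the two test documents below.
    """
    xml_text = u'<?xml version="1.0" encoding="%s"?><test>%s</test>' % (
        declared_encoding, text)
    return xml_text.encode(actual_encoding)


# Declaration matches the bytes: the parser returns the original character.
matching = ET.fromstring(build_document("utf-8", "utf-8")).text

# Declaration does not match the bytes: parsing succeeds, but each UTF-8
# byte is read as a separate Latin-1 character, so the text comes out wrong.
mismatched = ET.fromstring(build_document("iso-8859-1", "utf-8")).text
```

With the matching declaration the element text equals the original
character; with the mismatched one it comes back as three unrelated
characters.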

Big5-encoded documents parse without problems as well:

>>> doc = amara.parse('http://xml.ascc.net/test/wfall/big5/test13.xml')
>>>
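
A UnicodeEncodeError that mentions the 'charmap' codec is raised when
text is encoded, typically while printing to a console or writing to a
file whose encoding cannot represent the characters, rather than by the
XML parser itself. A minimal sketch of the usual workarounds follows.
The sample string is made up, and cp1252 is only an assumed stand-in for
whatever charmap-based codec the console actually uses.

```python
# -*- coding: utf-8 -*-
# Made-up sample: ASCII text around two Chinese characters.
text = u"abc \u624d\u662f def"

# cp1252 is an assumption standing in for the console's charmap codec.
try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError:
    # The default error handler ("strict") refuses unencodable characters.
    failed = True

# "replace" substitutes a question mark for each unencodable character.
replaced = text.encode("cp1252", "replace")

# "ignore" silently drops each unencodable character.
ignored = text.encode("cp1252", "ignore")
```

Both handlers lose information, so they are best applied only at the
point of output, while the parsed document keeps the full Unicode text.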



2007/10/22, Fabian López <[EMAIL PROTECTED]>:
> Hi,
> I am parsing an XML file that includes Chinese characters, like
> ^�u�u啖啖才是�w.���扉L锍才是�� or ヘアアイロン... The problem is that I get an error like:
> UnicodeEncodeError: 'charmap' codec can't encode characters in position....
> I would like to ignore these characters and parse everything
> else. Could anyone help me? I suppose that I could catch the
> exception and ignore it, or maybe use a function that detects these
> Chinese characters and then skips them.
>
> Thanks!!
> Fabian
>
> _______________________________________________
> XML-SIG maillist  -  XML-SIG@python.org
> http://mail.python.org/mailman/listinfo/xml-sig
>
>


-- 
Regards,

--

Luis Miguel

