Vinayakc wrote:
> Hi all,
>
> I am new to python.
>
> I have written one small application which reads data from xml file and
> tries to encode data using appropriate charset.
> I am facing problem while encoding one chinese paragraph with charset
> "gb2312".
>
> code is:
>
> encoded_str = str_data.encode("gb2312")
>
> The type of str_data is <type 'unicode'>
>
> The exception is:
>
> "UnicodeEncodeError: 'gb2312' codec can't encode character u'\xa0' in
> position 0: illegal multibyte sequence"

Hmm, u'\xa0' is 'no-break space' (U+00A0), sitting at the very beginning of the text. It looks suspiciously like a plain-text UTF-8 signature, which is 'zero width no-break space' (U+FEFF). If you strip the first character, do you still get encoding errors?
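
Here is a minimal sketch of that check, in case it helps. The helper names (strip_leading_signature, encode_gb2312) and the sample string are made up for illustration; only the encode("gb2312") call comes from your post. It assumes the stray character is either U+00A0 or U+FEFF and that it only appears in position 0.

```python
# Characters that sometimes survive at the start of decoded text.
BOM = u"\ufeff"   # 'zero width no-break space', used as the UTF-8 signature
NBSP = u"\xa0"    # 'no-break space', the character named in the traceback


def strip_leading_signature(text):
    """Drop a single leading U+FEFF or U+00A0, if present (hypothetical helper)."""
    if text[:1] in (BOM, NBSP):
        return text[1:]
    return text


def encode_gb2312(text):
    """Encode to gb2312 after removing the suspicious first character."""
    return strip_leading_signature(text).encode("gb2312")


# Hypothetical sample: a no-break space followed by two Chinese characters.
sample = u"\xa0\u4e2d\u6587"

try:
    sample.encode("gb2312")
except UnicodeEncodeError as exc:
    # Reproduces the error from the original post.
    print("plain encode failed: %s" % exc)

encoded = encode_gb2312(sample)
print(repr(encoded))
```

If errors still show up after stripping, the text probably contains other characters outside gb2312. In that case encode("gb2312", "replace") will substitute '?' for them, or you could try the "gb18030" codec, which covers all of Unicode.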