On 1/10/2012 8:43 AM, jmfauth wrote:
D:\>c:\python32\python.exe
Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '\u5de5'.encode('utf-8')
b'\xe5\xb7\xa5'
>>> '\u5de5'.encode('mbcs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

D:\>c:\python27\python.exe
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\u5de5'.encode('utf-8')
'\xe5\xb7\xa5'
>>> u'\u5de5'.encode('mbcs')
'?'

The mbcs codec encodes according to the current Windows code page, and only the Chinese code page(s) can encode this Chinese character. So the UnicodeEncodeError raised by 3.2 is correct, and 2.7 has a bug: it behaves as if errors='replace' were in effect when it is supposedly using errors='strict'. The Py3 fix was done in
http://bugs.python.org/issue850997
2.7 was intentionally left alone for backward-compatibility reasons. (None of this addresses the OP's question.)
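
For anyone who wants to see the strict/replace difference without depending on the machine's code page, here is a rough sketch. It is not from the original session: 'cp1252' is only my stand-in for a non-Chinese Windows code page and 'gbk' for a Chinese one, since 'mbcs' exists only on Windows and follows whatever code page is active. The helper try_encode is made up for the illustration.

```python
# Portable sketch (assumption: 'cp1252' stands in for a non-Chinese code
# page, 'gbk' for a Chinese one; 'mbcs' itself is Windows-only).
text = '\u5de5'

def try_encode(s, codec, errors='strict'):
    """Return the encoded bytes, or the exception class name on failure."""
    try:
        return s.encode(codec, errors)
    except UnicodeEncodeError as exc:
        return type(exc).__name__

# errors='strict' (the default): an unencodable character raises.
strict_result = try_encode(text, 'cp1252')
# errors='replace': the character is silently turned into '?'.
replace_result = try_encode(text, 'cp1252', 'replace')
# A Chinese code page can encode the character without any error handler.
gbk_result = try_encode(text, 'gbk')

print(strict_result, replace_result, gbk_result)
```

The first result corresponds to what 3.2 does with mbcs on a non-Chinese code page, and the second to what 2.7 does despite nominally being in strict mode.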

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
