Hi,

Recently I ran into a problem with str.encode().  I called it like this:
s.encode("mbcs", "replace"), expecting the "replace" handler to substitute
every character that cannot be encoded.  However, the program failed with the
following message:
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence

Am I using it incorrectly, or is this a bug?

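Platform: Windows Vista SP1, system default code page: 936 (zh-cn).  The test
program (test.py.txt) is attached.  It generates random characters, so the
output differs between runs; three runs are shown below.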

>python3 test.py
A
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
  File "C:\Python30\lib\io.py", line 1485, in write
    b = encoder.encode(s)
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence
>python3 test.py
A
??黹���{??�惆��z
B
>python3 test.py
A
��??????�q勒��
B
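
One observation on the traceback above: the exception is raised from write in
io.py, which suggests that the failing step may be print() encoding the text
for the console rather than str.encode() itself.  Below is a minimal sketch of
a possible workaround, under these assumptions: the helper name printable is
hypothetical, "gbk" stands in for the console encoding on code page 936 (it is
the codec named in the error message), and the sample string is made up for
illustration.  It replaces whatever the target codec cannot represent before
the text reaches print().

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target codec cannot encode replaced by '?'.

    printable is a hypothetical helper, not part of the standard library.
    """
    # Fall back to the console encoding, then to ASCII if none is reported.
    target = encoding or sys.stdout.encoding or "ascii"
    # Round-trip through the target codec; "replace" substitutes '?' on encode.
    return text.encode(target, "replace").decode(target)


# U+E104 is the character from the error message above.
sample = "A\ue104B"
print(printable(sample, "gbk"))
```

Calling printable(text) without an encoding uses sys.stdout.encoding, so the
result should be safe to pass to print() on the current console.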

Thanks,

Decheng (AKA Robbie Mosaic) Fan

Attachment (test.py.txt):
import sys
import random

str = "".join([chr(random.randint(0, 65535)) for i in range(10)])
str.encode("mbcs", "replace").decode("mbcs")
print("A")
print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
print("B")
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000