Hi,
Recently I encountered a problem with the str.encode() method. I called it
like this: s.encode("mbcs", "replace"), expecting it to replace every
character that cannot be encoded. However, it failed with the following message:
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position
4: illegal multibyte sequence
Am I using it the wrong way, or is this a bug?
Platform: Windows Vista SP1, system default code page: 936 (zh-cn). The
program (test.py.txt) is attached. Output from three runs:
>python3 test.py
A
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
  File "C:\Python30\lib\io.py", line 1485, in write
    b = encoder.encode(s)
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence
>python3 test.py
A
??黹���{??�惆��z
B
>python3 test.py
A
��??????�q勒��
B
Thanks,
Decheng (AKA Robbie Mosaic) Fan
import sys
import random
str = "".join([chr(random.randint(0, 65535)) for i in range(10)])
str.encode("mbcs", "replace").decode("mbcs")
print("A")
print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
print("B")
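
For reference, here is a minimal sketch of what the traceback seems to point to. The last frame is in io.py's write(), so the sketch assumes the error comes from print() writing to a GBK console, not from encode() itself. It uses the portable "gbk" codec as a stand-in for "mbcs", which exists only on Windows, and the helper names to_printable and strict_write_fails are hypothetical, not part of any library.

```python
import io


def to_printable(text, encoding):
    """Round-trip text through encoding; unencodable characters become '?'."""
    return text.encode(encoding, "replace").decode(encoding)


def strict_write_fails(text, encoding):
    """Return True if a text stream with strict error handling rejects text.

    The in-memory stream stands in for a console whose encoding cannot
    represent every character (an assumption about how print() failed above).
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors="strict")
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        return True
    return False


# U+E104 is the private-use character named in the traceback.
sample = "A\ue104B"

# encode() with "replace" does not raise; the bad character is substituted.
print(to_printable(sample, "gbk"))

# Writing the untouched string to a strict GBK stream does raise.
print(strict_write_fails(sample, "gbk"))
```

If that assumption holds, the "replace" handler is doing its job, and the string returned by decode("mbcs", "replace") can still contain characters the console codec cannot encode.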
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000