Hi,

Recently I encountered a problem with the str.encode() method. I called it like this: s.encode("mbcs", "replace"), expecting it to eliminate all invalid characters. However, it failed with the following message:

UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence
Am I using it the wrong way, or is this a bug?

Platform: Windows Vista SP1, system default code page: 936 (zh-cn).
The program (test.py.txt) is attached. The output differs from run to run because the test string is random:

>python3 test.py
A
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
  File "C:\Python30\lib\io.py", line 1485, in write
    b = encoder.encode(s)
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence

>python3 test.py
A
??黹���{??�惆��z
B

>python3 test.py
A
��??????�q勒��
B

Thanks,
Decheng (AKA Robbie Mosaic) Fan
import sys
import random

str = "".join([chr(random.randint(0, 65535)) for i in range(10)])
str.encode("mbcs", "replace").decode("mbcs")
print("A")
print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
print("B")
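
One observation on the traceback: the exception is raised inside write() in io.py, that is, while print() encodes the text for the console with the 'gbk' codec, and not inside the explicit encode("mbcs", "replace") call. The following is only a minimal sketch of a possible workaround under that reading. It assumes the console codec is 'gbk', as the error message reports, and uses 'gbk' in place of "mbcs" because "mbcs" is available only on Windows. The helper name to_console_safe is hypothetical and not part of any library.

```python
def to_console_safe(text, console_encoding="gbk"):
    """Replace characters that console_encoding cannot encode with '?'."""
    # Assumption: console_encoding names the codec print() would use.
    # 'gbk' is taken from the error message, not detected at runtime.
    data = text.encode(console_encoding, "replace")
    return data.decode(console_encoding)


# U+E104 is the character named in the error message.
sample = "ab\ue104cd"

# Strict encoding (the default error mode) fails for this character.
try:
    sample.encode("gbk")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# After the round trip, every remaining character is encodable in 'gbk'.
safe = to_console_safe(sample)
print(strict_ok, safe)
```

With this approach the "replace" handler is applied against the codec that is actually used for output, so the string passed to print() no longer contains characters that codec rejects.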
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com