2009/5/6 Antoine Pitrou <solip...@pitrou.net>:
> By the way, what are the ASCII characters that are not supported by
> Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).

The biggest problem with Shift-JIS is that a perfectly valid Unicode character above 127 can be encoded to a byte sequence that includes bytes in range(128). E.g. the character 掛 (a.k.a. '\u639b'), when encoded with Shift-JIS, becomes the two-byte sequence b'\x8a|'. Notice that the second byte is 124 (0x7C, ASCII '|'), which POSIX shells usually interpret as the pipe character, and this can have security implications.

It's a known problem with Shift-JIS, and UTF-8 avoids it by design: every byte of a multi-byte UTF-8 sequence is 128 or above.

-- 
Lino Mastrodomenico
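
P.S. A quick sketch of the point above, assuming Python 3 and its built-in shift_jis codec. The helper name ascii_range_bytes is made up for this illustration; it is not part of any library.

```python
def ascii_range_bytes(text, encoding):
    """Return the encoded bytes that fall in range(128).

    Hypothetical helper, for illustration only: for a string made up
    solely of non-ASCII characters, any byte returned here is one that
    byte-oriented tools could mistake for an ASCII character.
    """
    return [b for b in text.encode(encoding) if b < 128]


ch = "\u639b"  # the character from the example above

sjis = ch.encode("shift_jis")
print(sjis)                                       # b'\x8a|'
print(ascii_range_bytes(ch, "shift_jis"))         # [124], i.e. ord('|')

utf8 = ch.encode("utf-8")
print(utf8)                                       # b'\xe6\x8e\x9b'
print(ascii_range_bytes(ch, "utf-8"))             # []
```

The empty list for UTF-8 is not specific to this character: lead and continuation bytes of multi-byte UTF-8 sequences always have the high bit set.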