> By the way, what are the ASCII characters that are not supported by
> Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).

The problem with this encoding is that bytes below 128 appear as second
bytes of a two-byte encoding:

py> "\x81@".decode("shift-jis")
u'\u3000'
py> "\x81A".decode("shift-jis")
u'\u3001'

So on decoding, it may be the second byte (i.e. the ASCII byte) that
causes a problem:

py> "\x81/".decode("shift-jis")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position 0-1:
illegal multibyte sequence

For the shift-jis codec, that's actually not a problem, though:

py> b"\x81/".decode("shift-jis","utf8b")
'\udc81/'

so the utf8b error handler will escape only the first of the two bytes, and
then pass the second byte to the codec again, which then decodes it as
ASCII.

Regards,
Martin
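
For anyone who wants to try this on a released interpreter, here is a
minimal sketch. It assumes Python 3.1 or later, where the error handler
discussed here as "utf8b" is registered under the name "surrogateescape"
(PEP 383). The helper name decode_sjis is made up for illustration and is
not part of any library.

```python
# Sketch only: assumes Python 3.1+, where the handler called "utf8b" in
# this thread is available as "surrogateescape".

def decode_sjis(raw: bytes) -> str:
    """Hypothetical helper: decode Shift-JIS, turning each undecodable
    non-ASCII byte into a lone surrogate instead of raising."""
    return raw.decode("shift_jis", errors="surrogateescape")


# Valid two-byte sequences whose second byte lies in the ASCII range.
print(ascii(decode_sjis(b"\x81@")))   # '\u3000'
print(ascii(decode_sjis(b"\x81A")))   # '\u3001'

# 0x81 followed by "/" is not a valid sequence, so strict decoding fails.
try:
    b"\x81/".decode("shift_jis")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# With the error handler, only the lead byte 0x81 is escaped; decoding
# resumes at "/", which is then handled as plain ASCII.
print(ascii(decode_sjis(b"\x81/")))   # '\udc81/', as in the session above
```

The point of the last call is that the handler refuses to escape bytes
below 128, so the ASCII byte is never swallowed along with the bad lead
byte.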