> By the way, what are the ASCII characters that are not supported by
> Shift-JIS?
> Not many I suppose? (if I read the Wikipedia entry correctly, it's only the
> backslash and the tilde).

The problem with this encoding is that bytes below 128 appear as second
bytes of a two-byte encoding:

py> "\x81@".decode("shift-jis")
u'\u3000'
py> "\x81A".decode("shift-jis")
u'\u3001'
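
To see how far this goes, one could enumerate the ASCII bytes that the
codec accepts after a given lead byte. This is only a rough Python 3
sketch: the helper name ascii_trail_bytes is made up for illustration,
and it probes a single lead byte (0x81, as in the session above) against
CPython's shift_jis codec, so it says nothing about other lead bytes or
other implementations.

```python
# Python 3 sketch: find the ASCII bytes that CPython's shift_jis codec
# accepts as the second (trail) byte after the lead byte 0x81.
LEAD = 0x81  # the lead byte used in the session above


def ascii_trail_bytes(lead=LEAD, codec="shift_jis"):
    """Return the ASCII byte values that form a decodable pair with `lead`.

    Hypothetical helper, not part of any library.
    """
    accepted = []
    for trail in range(0x80):
        try:
            text = bytes([lead, trail]).decode(codec)
        except UnicodeDecodeError:
            # Not a valid two-byte sequence for this lead byte.
            continue
        # A valid pair decodes to exactly one character.
        if len(text) == 1:
            accepted.append(trail)
    return accepted


found = ascii_trail_bytes()
print(f"{len(found)} ASCII trail bytes, from {found[0]:#04x} to {found[-1]:#04x}")
```

Every byte it reports is an ordinary ASCII character when it stands
alone, which is why a byte-level search for, say, "@" can match in the
middle of a two-byte character.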

So on decoding, it may be the second byte (i.e. the ASCII byte) that
causes a problem:

py> "\x81/".decode("shift-jis")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'shift_jis' codec can't decode bytes in position
0-1: illegal multibyte sequence

For the shift-jis codec, that's actually not a problem, though:

py> b"\x81/".decode("shift-jis","utf8b")
'\udc81/'

so the utf8b error handler escapes the first of the two bytes and then
passes the second byte to the codec again, which decodes it as ASCII.
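
The same check can be written as a small script. This is a sketch under
one assumption: it uses the error handler name "surrogateescape", which
is what released Python 3 versions call the PEP 383 handler, in place of
the "utf8b" name used in the session above.

```python
# Python 3 sketch; "surrogateescape" stands in for the "utf8b" handler.
raw = b"\x81/"  # 0x81 is a lead byte, but "/" (0x2F) is not a valid trail byte

# Strict decoding rejects the sequence.
try:
    raw.decode("shift_jis")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the escaping handler, the undecodable lead byte becomes a lone
# surrogate (0xDC00 + 0x81) and the following byte is decoded on its own.
decoded = raw.decode("shift_jis", "surrogateescape")

print(strict_failed, ascii(decoded))
```

The ASCII byte survives as itself, and only the stray lead byte is
escaped.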

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev