On 16/09/2010, Guido van Rossum <gu...@python.org> wrote:
> On Thu, Sep 16, 2010 at 11:16 AM, Toshio Kuratomi <a.bad...@gmail.com> wrote:
>> You were talking about encodings that were supersets of 7-bit ASCII.
>> I think Martin was demonstrating a byte string that was a superset of
>> 7-bit ASCII being fed to a stdlib function which went wrong.
>
> Whoops, sorry. I don't have access to Windows so I can't reproduce
> this though. I also don't understand it. What is the Unicode codepoint
> for that 十 character? What is sys.getfilesystemencoding()? What is the
> value of "C:\\十".encode(sys.getfilesystemencoding())?

My fault, I should have been clearer. I was trying to demonstrate that there's a difference between the Unix-friendly encodings like UTF-8 and the EUC codecs, which only use high-bit bytes for non-ASCII text, and the ISO-2022 codecs and Shift JIS, which do not. In the example I gave, 十 encodes in CP932 as '\x8f\\', and the function gets confused by the second byte, which is an ASCII backslash.

Obviously the right answer there is just to use unicode, rather than write a function that works with weird multibyte codecs.

Martin
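
For anyone without a Windows box, here is a rough Python 3 sketch of the problem. It assumes `ntpath.basename` as a stand-in for the stdlib function in question (the thread does not say which one it was), and `high_bit_only` is a helper made up for this illustration.

```python
import ntpath

ch = "\u5341"  # the character 十
path = "C:\\" + ch

cp932_bytes = ch.encode("cp932")   # b'\x8f\\' : the trail byte is 0x5C, a backslash
utf8_bytes = ch.encode("utf-8")
euc_bytes = ch.encode("euc_jp")


def high_bit_only(data):
    """Hypothetical helper: True if every byte is outside the 7-bit ASCII range."""
    return all(b >= 0x80 for b in data)


for name, data in [("cp932", cp932_bytes), ("utf-8", utf8_bytes), ("euc_jp", euc_bytes)]:
    print(name, data, "high-bit only:", high_bit_only(data))

# Working on text, the path splits where you would expect.
text_name = ntpath.basename(path)

# Working on CP932 bytes, the trail byte of 十 is mistaken for a separator.
bytes_name = ntpath.basename(path.encode("cp932"))

print(repr(text_name), repr(bytes_name))
```

The text version keeps the character whole, while the bytes version sees a path ending in a separator and returns an empty name.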