On 16/09/2010, Guido van Rossum <gu...@python.org> wrote:
>
> In all cases I can imagine where such polymorphic functions make
> sense, the necessary and sufficient assumption should be that the
> encoding is a superset of 7-bit(*) ASCII. This includes UTF-8, all
> Latin-N variant, and AFAIK also the popular CJK encodings other than
> UTF-16. This is the same assumption made by Python's byte type when
> you use "character-based" methods like lower().

Well, depends on what exactly you're doing, it's pretty easy to go wrong:

Python 3.2a2+ (py3k, Sep 16 2010, 18:43:45) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> os.path.split("C:\\十")
('C:\\', '十')
>>> os.path.split("C:\\十".encode(sys.getfilesystemencoding()))
(b'C:\\\x8f', b'')

Similar things can catch out web developers once they step outside the
percent encoding.

Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to