On Tue, Mar 29, 2011 at 10:55:47PM +0200, Victor Stinner wrote:
> Le mardi 29 mars 2011 à 22:40 +0200, Lennart Regebro a écrit :
> > The lesson here seems to be "if you have to use blacklists, and you
> > use unicode strings for those blacklists, also make sure the string
> > you compare with doesn't have surrogates".
> 
> No. '\u4f60\u597d'.encode('big5').decode('latin1') gives '§A¦n' which
> doesn't contain any surrogate character.
> 
> The lesson is: if you compare Unicode filenames on UNIX, make sure that
> your system is correctly configured (the locale encoding must be the
> filesystem encoding).
>
You're both wrong :-)

Lennart is missing that you just need to use the same encoding
+ surrogateescape (or stick with bytes) for decoding the byte strings that
you are comparing.
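
A minimal sketch of that point, assuming Python 3.1+ (where the surrogateescape error handler from PEP 383 is available). The byte strings and the to_text() helper are made up for illustration:

```python
# Hypothetical byte strings: a blacklist entry and a filename as the OS
# might return it. The program does not need to know their real encoding.
blacklisted = b'\xa7A\xa6n'
filename = b'\xa7A\xa6n'

def to_text(raw, encoding='utf-8'):
    """Decode bytes, mapping undecodable bytes to lone surrogates (PEP 383)."""
    return raw.decode(encoding, errors='surrogateescape')

# Same encoding + surrogateescape on both sides: the comparison is reliable.
same_codec_match = to_text(blacklisted) == to_text(filename)

# Sticking with bytes works as well.
bytes_match = blacklisted == filename

# Different codecs on each side: identical bytes no longer compare equal.
mixed_codec_match = to_text(blacklisted, 'utf-8') == to_text(filename, 'latin1')

print(same_codec_match, bytes_match, mixed_codec_match)
```

Only the last comparison fails, and it fails because the two sides were decoded differently, not because one of them contains surrogates.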

You're missing that on UNIX there is no filesystem encoding: filenames are
just bytes.  So the idea that the locale and filesystem encodings must match
is false (and unnecessary -- the encodings that you use within Python just
need to be the same as each other.  They don't even need to match what is
actually used on the filesystem or in the user's locale.)
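
As a rough illustration, assuming Python 3.1+ and reusing the Big5 bytes from Victor's example: the survives_round_trip() helper below is hypothetical, and the four codecs are just ones I picked that are ASCII-compatible. Whichever codec is used, the decoded name encodes back to the original bytes, so the choice does not have to reflect what is really on disk:

```python
# Bytes "on disk": Victor's example string encoded as Big5.
raw = '\u4f60\u597d'.encode('big5')

def survives_round_trip(raw, encoding):
    """Decode with surrogateescape, re-encode the same way, compare to the input."""
    text = raw.decode(encoding, errors='surrogateescape')
    return text.encode(encoding, errors='surrogateescape') == raw

# Try codecs that do and do not match the real encoding of the bytes.
results = {enc: survives_round_trip(raw, enc)
           for enc in ('ascii', 'utf-8', 'latin1', 'big5')}

for enc, ok in results.items():
    print(enc, ok)
```

Each codec produces a different str for the same bytes, which is fine as long as every name being compared goes through the same one.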

-Toshio

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev