Martin answered a similar question from Jack Jansen in another thread. OSX doesn't normalize either. It's unlikely to confuse users in practice.
On Tue, Sep 30, 2008 at 4:11 PM, Victor Stinner <[EMAIL PROTECTED]> wrote:
> Since it's hard to follow the filename thread on two mailing lists, I'm
> starting a new thread only on python-3000 about unicode normalization of the
> filenames.
>
> Bad news: it looks like Linux doesn't normalize filenames. So if you used NFC
> to create a file, you have to reuse NFC to open your file (and the same for
> NFD).
>
> Python2 example to create files in the different forms:
> >>> name=u'xäx'
> >>> from unicodedata import normalize
> >>> open(u'NFD-' + normalize('NFD', name), 'w').close()
> >>> open(u'NFC-' + normalize('NFC', name), 'w').close()
> >>> open(u'NFKC-' + normalize('NFKC', name), 'w').close()
> >>> open(u'NFKD-' + normalize('NFKD', name), 'w').close()
> >>> import os
> >>> os.listdir('.')
> ['NFD-xa\xcc\x88x', 'NFC-x\xc3\xa4x', 'NFKC-x\xc3\xa4x', 'NFKD-xa\xcc\x88x']
> >>> os.listdir(u'.')
> [u'NFD-xa\u0308x', u'NFC-x\xe4x', u'NFKC-x\xe4x', u'NFKD-xa\u0308x']
>
> Directory listing using Python3:
> >>> import os
> >>> [ name.encode('utf-8') for name in os.listdir('.') ]
> [b'NFD-xa\xcc\x88x', b'NFC-x\xc3\xa4x', b'NFKC-x\xc3\xa4x',
> b'NFKD-xa\xcc\x88x']
> >>> os.listdir('.')
> ['NFD-xäx', 'NFC-xäx', 'NFKC-xäx', 'NFKD-xäx']
>
> Same results, correct. Then try to open files:
> >>> open(normalize('NFC', 'NFC-xäx')).close()
> >>> open(normalize('NFD', 'NFC-xäx')).close()
> IOError: [Errno 2] No such file or directory: 'NFC-xäx'
> >>> open(normalize('NFD', 'NFD-xäx')).close()
> >>> open(normalize('NFC', 'NFD-xäx')).close()
> IOError: [Errno 2] No such file or directory: 'NFD-xäx'
>
> If the user chooses a result from os.listdir(): no problem (if he has good
> eyes and he's able to find the difference between 'xäx' (NFD) and 'xäx'
> (NFC) :-D).
>
> If the user enters the filename using the keyboard (on the command line or a
> GUI dialog), you have to hope that the keyboard input uses the same
> normalization form as the one the filename was created with...
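
For the keyboard-input case described above, one possible workaround is to
compare names after normalizing both sides to the same form, then open
whatever spelling is actually on disk. This is only a rough sketch, not
something proposed in this thread: the helper names match_normalized and
open_any_form are made up for illustration, and picking NFC as the
comparison form is an assumption (any single form would do as long as both
sides use it).

```python
import os
from unicodedata import normalize


def match_normalized(candidates, wanted, form='NFC'):
    """Return the candidates that equal `wanted` once both are normalized.

    Hypothetical helper: `form` is the normalization form used only for the
    comparison; the returned names keep their original spelling.
    """
    target = normalize(form, wanted)
    return [name for name in candidates if normalize(form, name) == target]


def open_any_form(directory, wanted, mode='r'):
    """Open `wanted` in `directory` regardless of its normalization form.

    Hypothetical helper: lists the directory, picks the first entry whose
    normalized name matches, and opens that entry under its on-disk name.
    """
    matches = match_normalized(os.listdir(directory), wanted)
    if not matches:
        raise FileNotFoundError(wanted)
    # Several entries can match (e.g. both an NFC and an NFD file exist);
    # this sketch simply takes the first one.
    return open(os.path.join(directory, matches[0]), mode)
```

With the files from the example above, open_any_form('.', 'NFD-x\xe4x')
should find 'NFD-xa\u0308x' even though the typed name is in NFC. The cost
is a directory listing per open, and the ambiguity when two entries differ
only by normalization form is left unresolved here.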
>
> --
> Victor Stinner aka haypo
> http://www.haypocalc.com/blog/

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000