On 09/05/2012 03:33 AM, Peter Otten wrote:
> Ray Jones wrote:
>
>> I have directory names that contain Russian characters, Romanian
>> characters, French characters, et al. When I search for a file using
>> glob.glob(), I end up with stuff like \x93\x8c\xd1 in place of the
>> directory names. I thought simply identifying them as Unicode would
>> clear that up. Nope. Now I have stuff like \u0456\u0439\u043e.
>
> >>> files = [u"\u0456\u0439\u043e"] # files = glob.glob(unicode_pattern)
> >>> print files
> [u'\u0456\u0439\u043e']
>
> To see the actual characters print the unicode strings individually:
>
> >>> for file in files:
> ...     print file
> ...
> ійо

Aha! That works.

>> These representations of directory names are eventually going to be
>> passed to Dolphin (my file manager). Will they pass to Dolphin properly?
> How exactly do you "pass" these names?

I will be calling Dolphin with subprocess.call() and passing the
directories as command line arguments.
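
For what it's worth, here is a rough sketch of how the names might be
handed to subprocess.call(). It is only an illustration: encode_args()
and open_in_dolphin() are made-up helper names, "dolphin" is assumed to
be the executable name on PATH, and the file system encoding is assumed
to be the one the program expects for its arguments.

```python
# -*- coding: utf-8 -*-
import subprocess
import sys


def encode_args(args, encoding=None):
    """Return args as byte strings, encoding any unicode entries."""
    if encoding is None:
        # Assumption: the file system encoding is what the OS expects in argv.
        encoding = sys.getfilesystemencoding() or "utf-8"
    encoded = []
    for arg in args:
        if isinstance(arg, bytes):
            encoded.append(arg)  # already a byte string, pass it through
        else:
            encoded.append(arg.encode(encoding))
    return encoded


def open_in_dolphin(directories):
    """Hypothetical wrapper: start Dolphin with the given directories."""
    # Assumption: an executable named "dolphin" can be found on PATH.
    return subprocess.call(encode_args([u"dolphin"] + list(directories)))
```

With that, open_in_dolphin(files) would pass the unicode names from
glob.glob() along as encoded command line arguments. Passing a list
rather than one command string means no shell quoting is involved.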
> $ cat tmp.py
> # -*- coding: utf-8 -*-
> print u"Я"
> $ python tmp.py
> Я
> $ python tmp.py | cat
> Traceback (most recent call last):
>   File "tmp.py", line 2, in <module>
>     print u"Я"
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u042f' in
> position 0: ordinal not in range(128)
>
> You can work around that by specifying the appropriate encoding explicitly:
>
> $ python tmp2.py iso-8859-5 | cat
> �
> $ python tmp2.py latin1 | cat
> Traceback (most recent call last):
>   File "tmp2.py", line 4, in <module>
>     print u"Я".encode(encoding)
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u042f' in
> position 0: ordinal not in range(256)
>

But doesn't that entail knowing in advance which encoding you will be
working with? How would you automate the process while reading existing
files?


Ray
_______________________________________________
Tutor maillist - [email protected]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
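
One possible way to avoid hard-coding the encoding, sketched under some
assumptions: ask the output stream for its encoding and fall back to the
locale's preferred encoding when the stream reports none (as happens in
Python 2 when stdout is piped). pick_encoding() and encode_for_output()
are hypothetical helper names, and using errors="replace" is a choice
made here so that unencodable characters become "?" instead of raising
UnicodeEncodeError.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def pick_encoding(stream=None):
    """Guess a usable output encoding for stream (default: sys.stdout)."""
    if stream is None:
        stream = sys.stdout
    # stream.encoding can be None, e.g. in Python 2 when output is piped.
    return (getattr(stream, "encoding", None)
            or locale.getpreferredencoding()
            or "utf-8")


def encode_for_output(text, encoding=None, errors="replace"):
    """Encode unicode text without knowing the encoding in advance."""
    return text.encode(encoding or pick_encoding(), errors)
```

For example, encode_for_output(u"Я", "latin1") gives "?" rather than a
traceback. On the reading side, the names returned by glob.glob() for a
unicode pattern should already be unicode, so the encoding only has to
be chosen at the point where the text leaves the program.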
