Managing non-ASCII filenames in Python
I created the following filename in Windows just as a test:

    “Dönåld’s™ Néphêws” deg°.txt

The quotes are non-ASCII, and there are many non-English characters, a long hyphen, etc. In DOS I can do a directory listing and it translates them all to something close:

    Dönåld'sT Néphêws deg°.txt

I thought the correct way to do this in Python would be to scan the directory:

    files = os.listdir(os.path.dirname(os.path.realpath(__file__)))

then print the filenames:

    for filename in files:
        print filename

but, as expected, the filename is not correct. So I tried to correct it using the file system's encoding:

    print filename.decode(sys.getfilesystemencoding())

But I get:

    UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 6: character maps to <undefined>

All was working well until these characters came along.

I need to be able to write (a representation of) the filename to the screen, and I don't see why I should not get something as good as DOS shows. I also need to write it to an XML file in UTF-8, and to write it to a text file and be able to read it back in. Again, I was surprised that this was also difficult: it appears that the file also wanted ASCII. Do I have to open the file in binary mode for writing (I expect so)? If so, what encoding should I write in?

I have been beating myself up with this for weeks: I get it working, then come across some other character that causes it all to stop again.

Please help.

-- http://mail.python.org/mailman/listinfo/python-list
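One possible way to approach the screen and text-file parts of this is sketched below; it is a rough sketch, not a definitive answer. The traceback suggests that the decode step itself succeeds and that it is `print` which then fails, most likely because the console code page has no em dash (U+2014). The sketch therefore encodes for the console with `errors="replace"`, and uses `io.open` (assumed available, i.e. Python 2.6 or later) to write and read UTF-8 in text mode. The helper names `console_safe`, `write_names` and `read_names` are hypothetical, and the fallback to ASCII when no console encoding is reported is an assumption.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import sys


def console_safe(text, encoding=None):
    """Return text with anything the console cannot show replaced by '?'."""
    # sys.stdout.encoding can be None when output is piped; fall back to
    # ASCII in that case (an assumption, pick whatever suits your setup).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the console encoding so print cannot fail later.
    return text.encode(encoding, "replace").decode(encoding)


def write_names(path, names):
    """Write one unicode name per line as UTF-8 (text mode, not binary)."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")


def read_names(path):
    """Read the names back as unicode strings."""
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]
```

In Python 2, passing a unicode path to `os.listdir` (for example `os.listdir(u".")`) returns unicode names, so no separate decode step is needed before calling `print(console_safe(name))`. The result is cruder than what DOS shows, since unmappable characters become `?` rather than a close approximation.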
Help needed with filenames
I have a program that reads files using glob and puts them into an XML file in UTF-8 using:

    unicode(file, sys.getfilesystemencoding()).encode('UTF-8')

This all works fine, including all the odd characters like accents.

However, I also print what it is doing, and someone pointed out that many characters are not printing correctly in the Windows command window. I have tried to figure this out but simply get lost in the translation stuff. If I just use:

    print filename

it has characters that don't match the ones in the filename (I sort of expected that). So I tried:

    print unicode(file, sys.getfilesystemencoding())

expecting the correct result, but no:

    UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013'

I did notice that when a Windows command window does a directory listing of these files, the characters seem to be translated into close approximations (long dash to minus, special double quotes to simple double quotes), while many of the accented characters are retained. I looked at translate to do this, but did not know how to determine which characters to map.

Can anyone tell me what I should be doing here?
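On the question of which characters to map: one option, sketched here under stated assumptions, is to hand-pick a small table of typographic characters and let the console encoding deal with everything else. The table `APPROXIMATIONS`, the function `approximate` and the default code page `cp437` are all hypothetical choices for illustration; this is not a reproduction of whatever mapping the command window itself applies.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Hypothetical hand-picked table (code point -> replacement text).
# Extend it as new characters turn up.
APPROXIMATIONS = {
    0x2013: "-",    # en dash
    0x2014: "-",    # em dash
    0x2018: "'",    # left single quotation mark
    0x2019: "'",    # right single quotation mark
    0x201C: '"',    # left double quotation mark
    0x201D: '"',    # right double quotation mark
    0x2122: "TM",   # trade mark sign
}


def approximate(text, encoding="cp437"):
    """Map known typographic characters, then replace the rest with '?'."""
    # unicode.translate takes a dict keyed by code point.
    text = text.translate(APPROXIMATIONS)
    # Accented letters that exist in the code page survive this round trip;
    # anything else that cannot be encoded becomes '?'.
    return text.encode(encoding, "replace").decode(encoding)
```

With this table, an en dash (the `u'\u2013'` from the traceback) comes out as a plain minus sign, while letters such as `ö` and `é` are kept because cp437 can represent them. The code page actually in use can be checked with `sys.stdout.encoding` and passed in instead of the default.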