On Tue, Jul 03, 2007 at 06:04:16PM -0700, Terry Carroll wrote:
>
>> Has anyone found a silver bullet for ensuring that all the filenames
>> encountered by os.walk are treated as UTF-8?  Thanks.
>
>What happens if you specify the starting directory as a Unicode string,
>rather than an ascii string, e.g., if you're walking the current
>directory:
>
> for thing in os.walk(u'.'):
>
>instead of:
>
> for thing in os.walk('.'):

This is a good thought, and the crux of the problem.  I pull the starting
directories from an XML file which is UTF-8, but by the time it hits my
program, because there are no extended characters in the starting path,
os.walk assumes ASCII.  So, I recast the string as UTF-8, and I get UTF-8
output.

The problem happens further down the line.  I get a list of paths from
the results of os.walk, all in UTF-8, but not identified as such.  If I
just pass my list to other parts of the program, it seems to assume either
ASCII or UTF-8, based on the individual list elements.  If I try to cast
the whole list as UTF-8, I get an exception, because it is assuming ASCII
and receiving UTF-8 for some list elements.

I suspect that my program will have to make sure to recast all
equivalent-to-ASCII strings as UTF-8 while leaving the ones that are
already extended alone.

-- 
yours,

William
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
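
For illustration only, the "recast the plain ones, leave the extended ones
alone" idea might be sketched roughly like this.  The names to_text and
walk_text are hypothetical helpers, not part of os or any library, and the
sketch assumes that the file system really does store names as UTF-8 and
that the interpreter has the bytes type (Python 2.6 or later, or Python 3).

```python
import os

def to_text(name, encoding='utf-8'):
    """Return name as a Unicode string (hypothetical helper).

    Byte strings are decoded with the given encoding (assumed UTF-8);
    strings that are already Unicode pass through untouched, so nothing
    gets decoded twice.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name

def walk_text(top, encoding='utf-8'):
    """Wrap os.walk so every path and name comes back as Unicode."""
    # Starting from a Unicode path asks os.walk for Unicode names;
    # to_text then catches any byte strings that still slip through.
    for dirpath, dirnames, filenames in os.walk(to_text(top, encoding)):
        yield (to_text(dirpath, encoding),
               [to_text(d, encoding) for d in dirnames],
               [to_text(f, encoding) for f in filenames])
```

A list that already mixes both kinds of string can be normalized with
[to_text(p) for p in paths].  Note that walk_text yields fresh lists, so
pruning dirnames in place, as plain os.walk allows, has no effect here.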