William O'Higgins Witteman wrote:
>> for thing in os.walk(u'.'):
>>
>> instead of:
>>
>> for thing in os.walk('.'):
>
> This is a good thought, and the crux of the problem. I pull the
> starting directories from an XML file which is UTF-8, but by the time it
> hits my program, because there are no extended characters in the
> starting path, os.walk assumes ascii. So, I recast the string as UTF-8,
> and I get UTF-8 output. The problem happens further down the line.
>
> I get a list of paths from the results of os.walk, all in UTF-8, but not
> identified as such. If I just pass my list to other parts of the
> program it seems to assume either ascii or UTF-8, based on the
> individual list elements. If I try to cast the whole list as UTF-8, I
> get an exception because it is assuming ascii and receiving UTF-8 for
> some list elements.
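The quoted suggestion, passing u'.' rather than '.', works because os.walk() returns names of the same type as the argument it was given. The thread is about Python 2, where `str` is a byte string and `unicode` is text; the sketch below is a minimal illustration assuming Python 3, where `bytes` and `str` play those two roles. The temporary directory, the file names and the `walk_file_names` helper are hypothetical, made up only for this example.

```python
import os
import tempfile


def walk_file_names(top):
    """Hypothetical helper: collect every file name os.walk() reports under top."""
    return [name for _, _, files in os.walk(top) for name in files]


with tempfile.TemporaryDirectory() as root:
    # Create one ASCII-named file so the walk has something to report.
    open(os.path.join(root, "notes.txt"), "w").close()

    text_names = walk_file_names(root)               # text in, text out
    byte_names = walk_file_names(os.fsencode(root))  # bytes in, bytes out

print(text_names)  # ['notes.txt']
print(byte_names)  # [b'notes.txt']

# Hypothetical list of raw UTF-8 byte paths: one pure ASCII, one not.
raw_paths = [b"docs/readme.txt", b"docs/caf\xc3\xa9.txt"]

# Decode every element with the one known encoding; ASCII-only entries
# decode to the same characters, so the result is uniformly text.
paths = [p.decode("utf-8") for p in raw_paths]
# paths is now ['docs/readme.txt', 'docs/café.txt']
```

The last lines show the usual way to handle a list like the one described above: decode each element with the single encoding you know the data is in, rather than trying to convert the list as a whole.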
FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8 strings; they are not the same thing. A Unicode string is a sequence of characters, stored internally with 16 or 32 bits per character depending on how Python was built. It is a distinct data type from a 'regular' string. Regular Python strings are byte strings with an implicit encoding. One possible encoding is UTF-8, which uses one or more bytes to represent each character.

Some good reading on Unicode and UTF-8:
http://www.joelonsoftware.com/articles/Unicode.html
http://effbot.org/zone/unicode-objects.htm

If you pass a unicode string (not UTF-8) to os.walk(), the resulting lists will also be unicode. Again, it would be helpful to see the code that is getting the error.

> I suspect that my program will have to make sure to recast all
> equivalent-to-ascii strings as UTF-8 while leaving the ones that are
> already extended alone.

It is nonsense to talk about 'recasting' an ASCII string as UTF-8; an ASCII string is *already* UTF-8, because every ASCII character is encoded as the same single byte in both. OTOH, it does make sense to talk about converting an ASCII string to a unicode string, which you do by decoding it.

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
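To make the point about ASCII and UTF-8 concrete, here is a small sketch, again assuming Python 3 (where `bytes` corresponds to a Python 2 byte string and `str` to `unicode`). The sample strings are hypothetical.

```python
# Plain ASCII text stored as bytes (the Python 3 analogue of a
# Python 2 byte string).
ascii_bytes = b"starting/dir"

# "Recasting" ASCII bytes as UTF-8 changes nothing: bytes 0x00-0x7F
# mean the same characters in both encodings.
print(ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8"))  # True
print(ascii_bytes.decode("ascii").encode("utf-8") == ascii_bytes)  # True

# What does make sense is converting bytes to text by decoding them.
text = ascii_bytes.decode("utf-8")
print(type(text).__name__, text)  # str starting/dir

# Non-ASCII characters are where the encodings differ: UTF-8 needs more
# than one byte for them, and ASCII cannot encode them at all.
word = "caf\u00e9"
print(word.encode("utf-8"))  # b'caf\xc3\xa9'
try:
    word.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode it:", exc.reason)  # ordinal not in range(128)
```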