On Tue, Sep 15, 2009 at 9:48 PM, jeffunit <j...@jeffunit.com> wrote:
> At 09:25 PM 9/15/2009, Mark Tolonen wrote:
>>
>> "jeffunit" <j...@jeffunit.com> wrote in message
>> news:20090915144123964.ljka6...@cdptpa-omta01.mail.rr.com...
>>>
>>> I wrote a program that diffs files and prints out matching file names.
>>> I will be executing the output with sh, to delete select files.
>>>
>>> Most of the file names are plain ascii, but about 10% of them have
>>> unicode characters in them. When I try to print the string containing
>>> the name, I get an exception:
>>>
>>> 'ascii' codec can't encode character '\udce9'
>>> in position 37: ordinal not in range(128)
>>>
>>> The string is:
>>>
>>> './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
>>>
>>> This is on a windows xp system, using python 3.1 which I compiled
>>> with the cygwin linux compatibility layer tool.
>>>
>>> Can you tell me what encoding I need to print \udce9 and how to set
>>> python to that encoding mode?
>>
>> That looks like a "surrogate escape" (See PEP 383)
>> http://www.python.org/dev/peps/pep-0383/. It indicates the wrong encoding
>> was used to decode the filename.
>
> That seems likely. How do I set the encoding to something correct to decode
> the filename?
>
> Clearly windows knows how to display it.
> I suspect since I compiled python with cygwin, that it is using a POSIX
> standard, rather than a windows specific standard. Of course ideally, I
> would like my code to work on linux as well as windows, as I back up all
> of my data to a linux machine with samba.
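
For reference, here is a minimal sketch of how a PEP 383 surrogate escape like the one quoted above can be undone by hand. It assumes the undecodable byte 0xE9 was meant as Latin-1 (where it is "é"); that is a guess from the song title, not something the thread confirms, so substitute whatever encoding the file system really uses. The variable names are only illustrative.

```python
# The name as reported in the traceback: the byte 0xE9 could not be decoded,
# so the "surrogateescape" error handler mapped it to the lone surrogate U+DCE9.
name = './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'

# Step 1: recover the original bytes. Encoding with the same error handler
# turns U+DCE9 back into the single byte 0xE9.
raw = name.encode('ascii', 'surrogateescape')

# Step 2: decode the bytes with the encoding the name was really written in.
# ASSUMPTION: Latin-1. This is a guess, not something stated in the thread.
fixed = raw.decode('latin-1')

# ascii() escapes non-ASCII characters, so this print cannot raise
# UnicodeEncodeError even when stdout is limited to ASCII.
print(ascii(fixed))
```

Whether the properly decoded name can then be printed as-is still depends on the encoding of stdout, which is a separate question from how the filename was decoded.
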

Have you perhaps tried using the native Windows version of Python?

Cheers,
Chris
--
http://blog.rebertia.com