Richard Lewis wrote:
> Hi there,
>
> I'm having a problem with unicode files and ftplib (using Python 2.3.5).
>
> I've got this code:
>
> xml_source = codecs.open("foo.xml", 'w+b', "utf8")
> #xml_source = file("foo.xml", 'w+b')
>
> ftp.retrbinary("RETR foo.xml", xml_source.write)
> #ftp.retrlines("RETR foo.xml", xml_source.write)
>
> It opens a new local file using utf8 encoding and then reads from a file
> on an FTP server (also utf8 encoded) into that local file. It comes up
> with an error, however, on calling the xml_source.write callback (I
> think) saying that:
>
> "File "myscript.py", line 75, in get_content
>     ftp.retrbinary("RETR foo.xml", xml_source.write)
>   File "/usr/lib/python2.3/ftplib.py", line 384, in retrbinary
>     callback(data)
>   File "/usr/lib/python2.3/codecs.py", line 400, in write
>     return self.writer.write(data)
>   File "/usr/lib/python2.3/codecs.py", line 178, in write
>     data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 76:
> ordinal not in range(128)"
>
> I've tried using both the commented lines of code in the above example
> (i.e. using file() instead of codecs.open() and retlines() instead of
> retbinary()). retlines() makes no difference, but if I use file()
> instead of codecs.open() I can open the file, but the extended
> characters from the source file (e.g. foreign characters, copyright
> symbol, etc.) all appear with an extra character in front of them
> (because of the two char width in utf8?).

Saying "appear with an extra character in front of them" is close to
useless for diagnostic purposes -- print repr(sample_string) would be more
informative. In any case, the file with the "foreign" [attitude?]
characters may well be what you want.

>
> Is the xml_source.write callback causing the problem here? Or is it
> something else? Is there any way that I can correctly retrieve a utf8
> encoded file from an FTP server?

To get an exact copy of a file via FTP -- doesn't matter whether it's
encoded in utf8 or ESCII or whatever -- use the following combination:

xml_source = file("foo.xml", 'w+b')
ftp.retrbinary("RETR foo.xml", xml_source.write)

If you were using a command-line FTP client, you would use the "binary"
command before doing a "get" or "mget".

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
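
For anyone reading this on a current Python, here is a rough sketch of the same advice. It assumes Python 3, where file() no longer exists and open() in 'wb' mode takes its place. The helper names fetch_exact_copy and describe are made up for illustration and are not part of ftplib. The last two lines show where the 0xc2 byte in the traceback probably comes from: in Python 2 the codecs writer expects unicode, so a byte string handed to it is first decoded as ASCII, and that fails on the first non-ASCII byte. The same pair of bytes, read by a viewer that assumes Latin-1, would explain the "extra character".

```python
def fetch_exact_copy(ftp, remote_name, local_path):
    """Copy remote_name to local_path byte for byte (hypothetical helper).

    ftp is expected to be a connected, logged-in ftplib.FTP object.
    """
    # Plain binary mode: no codec sits between the FTP data and the disk.
    with open(local_path, "wb") as local_file:
        ftp.retrbinary("RETR " + remote_name, local_file.write)


def describe(path, limit=80):
    """Return repr() of the first bytes of a file, for diagnostics."""
    with open(path, "rb") as f:
        return repr(f.read(limit))


# The copyright sign is two bytes in UTF-8; the first is 0xc2.
copyright_bytes = "\u00a9".encode("utf-8")

# A viewer that assumes Latin-1 turns those two bytes into two characters.
misread = copyright_bytes.decode("latin-1")
```

With a real connection (the host name is up to you), fetch_exact_copy(ftp, "foo.xml", "foo.xml") followed by print(describe("foo.xml")) shows the raw bytes that arrived, which is far more useful for diagnosis than how some editor happens to render them.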