> sys.setdefaultencoding() exists for a reason, wouldn't it be better
> if stdlib could cope with that at least with zipfile?

sys.setdefaultencoding just does not work. Many more things break when
you call it. It only exists because people like you insisted that it
exists.

> Also note that I'm trying to ask if zipfile should be improved, how it
> should be improved, and this possible improvement is not even for me
> (because now I know how zipfile behaves and I will work correctly with
> it, but someone else might stumble upon this very unexpectedly).

If you want to come up with a patch: sure. The zipfile module should
handle Unicode strings, encoding them in the encoding that the ZIP
specification defines (both the formal one, and the informal one
defined by PKWARE's implementation).

The tricky question is what to do when reading in zipfiles with
non-ASCII characters (and yes, I understand that in your case there
were only ASCII characters in the file names).

> The problem was that sourcedir was unicode, and on my machine
> everything went ok multiple times. zipfile.ZipInfo.FileHeader would
> return unicode, but then when it writes it to a file it gets back to
> str (because mappings back and forth were identical). The problem
> happened when on a different machine header suddenly got byte 0x98 in
> position 10 (seems to be compress_size), which cp1251 codec couldn't
> decode. You see, arcname didn't even have unicode characters, but the
> mere fact that it was unicode made header upgrade to unicode in
> "return header + self.filename + self.extra".

Ok, now I understand. If filename is a Unicode string, header is
converted to Unicode using the system encoding; depending on the exact
value of header and depending on the system encoding, this may cause a
decoding error.

This bug has been reported as http://bugs.python.org/1170311

> Because that's not supposed to work sanely when self.filename is
> unicode I'm asking if the right behavior would be to a) disallow
> unicode filenames in zipfile.ZipInfo, b) automatically convert
> filename to str in zipfile.ZipInfo, c) leave everything as it is.

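For illustration, the coercion failure described above can be sketched
roughly as follows. This is a simulation written for Python 3, which
has no implicit str/unicode coercion, so the helper py2_concat (a
hypothetical name) decodes the header explicitly where Python 2 would
have done so implicitly. build_header is likewise a simplified
stand-in, not the real zipfile.ZipInfo.FileHeader layout; only the
presence of byte 0x98 in the packed data matters here.

```python
import struct


def build_header(compress_size):
    # Simplified, hypothetical header: signature, version, compress_size.
    # The real ZIP local file header has more fields.
    return struct.pack("<4sHL", b"PK\x03\x04", 20, compress_size)


def py2_concat(header, filename, default_encoding):
    # Mimic Python 2's "str + unicode": if the right operand is text,
    # the byte string is first decoded with the default encoding.
    if isinstance(filename, str):
        return header.decode(default_encoding) + filename
    # Both operands are byte strings: plain concatenation, no decoding.
    return header + filename


if __name__ == "__main__":
    header = build_header(0x98)  # compress_size happens to contain 0x98

    # latin-1 maps every byte, so the round trip silently "works".
    py2_concat(header, "a.txt", "latin-1")

    # cp1251 leaves 0x98 undefined, so the same call fails.
    try:
        py2_concat(header, "a.txt", "cp1251")
    except UnicodeDecodeError as exc:
        print("decoding failed:", exc)

    # Encoding the filename to bytes first avoids the decode entirely.
    print(py2_concat(header, "a.txt".encode("ascii"), "cp1251"))
```

Whether the failure shows up depends entirely on the header bytes and
the encoding in effect, which is why it can pass many times on one
machine and then break on another.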
The correct behavior would be b); the difficult details are what
encoding to use.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev