[issue4621] zipfile returns string but expects binary
Tor Arvid Lund torar...@gmail.com added the comment:

I was wondering what has prevented Eddie's patch from being included in Python. Has nobody volunteered to verify that it works? I would be willing to do that, though I have never compiled Python on any platform before. It just seems a bit silly to me that Python cannot work with zip files that have Unicode file names. I just now had to do 'os.system(unzip.exe ...)' because zipfile did not work for me.

--
nosy: +talund

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4621
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
STINNER Victor victor.stin...@haypocalc.com added the comment:

This issue looks to be a duplicate of #10801, which was only fixed (33543b4e0e5d) in Python 3.2. See also #12048: a similar issue in Python 3.1.
STINNER Victor victor.stin...@haypocalc.com added the comment:

The initial problem is clearly a duplicate of issue #10801, which is now fixed in Python 3.1+ (I just backported the fix to Python 3.1).

> I just discovered that attempting to open zip member test\file
> fails where attempting to open test/file works. (...)
> It seems pretty clear that zipfile should do that for me, though.

@v+python: I don't think so, but others may agree with you. Please open a new issue, because it is unrelated to the initial bug report.

I'm closing this issue because the initial problem is now fixed. For the x.zip problem (UTF-8 encoded filenames without the Unicode flag set), there is already issue #10614, which handles this case.

--
resolution: -> fixed
status: open -> closed
Glenn Linderman v+pyt...@g.nevcal.com added the comment:

I just discovered that attempting to open zip member test\file fails where attempting to open test/file works.

Granted, the zip contains / and not \ characters, but using the os.path functions (on Windows) to manipulate the names before attempting to open the zip member produces \ characters. Clearly, I could switch them back. It seems pretty clear that zipfile should do that for me, though.

A small, self-contained test case is attached: a zip file that is named with a .py extension. I tested with Python 3.1.1.

--
nosy: +v+python
Added file: http://bugs.python.org/file16674/testzip.py
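The workaround described above ("I could switch them back") can be sketched as follows. This is only an illustration, not zipfile behavior: to_member_name is a hypothetical helper, and it assumes every backslash in the path is a separator, which holds for paths built with os.path on Windows but not necessarily elsewhere.

```python
import io
import zipfile


def to_member_name(path: str) -> str:
    """Hypothetical helper: turn a Windows-style path into a ZIP member name.

    Assumes every backslash is a path separator; member names stored in
    the archive use forward slashes.
    """
    return path.replace("\\", "/")


# Build a small archive in memory with one nested member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test/file", b"payload")

# Look the member up using a name that was built with backslashes.
with zipfile.ZipFile(buffer) as archive:
    windows_style = "test\\file"
    data = archive.read(to_member_name(windows_style))
```

Converting in the caller keeps the lookup key identical to the name stored in the archive, whatever the platform's own separator is.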
STINNER Victor victor.stin...@haypocalc.com added the comment:

In the ZIP file format, a filename is a byte string because we don't know the encoding. You cannot guess the encoding, because it is not stored in the ZIP file and it depends on the OS and the OS configuration. So t1.filename has to be a byte string, and testzip.read() has to use bytes and not str.

--
nosy: +haypo
STINNER Victor victor.stin...@haypocalc.com added the comment:

Oh, I see that zipfile.py uses the following code to choose the filename encoding:

    if flags & 0x800:
        # UTF-8 file names extension
        filename = filename.decode('utf-8')
    else:
        # Historical ZIP filename encoding
        filename = filename.decode('cp437')

So maybe I'm wrong: is the encoding known from a flag?
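The flag check quoted above can be tried in isolation. The sketch below is a standalone illustration, not the zipfile source: decode_member_name and UTF8_FLAG are names made up here, and the sample bytes are the two spellings of 'tést.xml' that appear elsewhere in this thread.

```python
UTF8_FLAG = 0x800  # bit 11 of the general purpose flags: name is UTF-8


def decode_member_name(raw: bytes, flags: int) -> str:
    """Hypothetical helper mirroring the quoted flag check."""
    if flags & UTF8_FLAG:
        # UTF-8 file names extension
        return raw.decode("utf-8")
    # Historical ZIP filename encoding
    return raw.decode("cp437")


# The same name stored two ways decodes to the same text,
# provided the flag matches the bytes.
utf8_name = decode_member_name(b"t\xc3\xa9st.xml", UTF8_FLAG)
cp437_name = decode_member_name(b"t\x82st.xml", 0)
```

The flag only says what the archiver claims. An archive written with UTF-8 names but without the flag is still decoded as cp437.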
STINNER Victor victor.stin...@haypocalc.com added the comment:

Test on Ubuntu Gutsy (UTF-8 file system) with zip 2.32:

    $ mkdir x
    $ touch x/hé
    $ zip -r x.zip x
      adding: x/ (stored 0%)
      adding: x/hé (stored 0%)
    $ python # 3.0 trunk
    >>> import zipfile
    >>> testzip = zipfile.ZipFile('x.zip')
    >>> testzip.infolist()[1].filename
    'x/h├⌐'
    >>> print(ascii(testzip.infolist()[1].filename))
    'x/h\u251c\u2310'

Using my own file parser (hachoir-wx), I can see that flags=0 and filename=bytes {78 2f 68 c3 a9} (x/hé in UTF-8).

You can try x.zip: I attached the file.

Added file: http://bugs.python.org/file12406/x.zip
Eddie skr...@gmail.com added the comment:

The problem is not about reading the filenames, but about reading the contents of a file whose filename has non-ASCII characters.
Eddie skr...@gmail.com added the comment:

I read again what STINNER Victor wrote, and I think that he found another bug: when listing the filenames of that zip file, the names are not displayed correctly. In fact

    'x/h├⌐' == 'x/hé'.encode('utf-8').decode('cp437')

So there is again a problem with encodings when reading the contents. The problem here is that, when reading, one cannot give the real filename, because it is not a key in the NameToInfo dictionary.
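The identity above also runs in reverse, which gives a caller a way to recover the intended name from a listing. The sketch below is an assumption-laden illustration: repair_cp437_mojibake is a hypothetical helper, and it only makes sense when the caller already knows the archive stored UTF-8 names without setting the UTF-8 flag.

```python
def repair_cp437_mojibake(name: str) -> str:
    """Hypothetical helper: undo a cp437 decode of UTF-8 bytes.

    Returns the name unchanged if the round trip is not possible.
    """
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


# Reproduce the garbled listing from the x.zip test, then repair it.
garbled = "x/hé".encode("utf-8").decode("cp437")
repaired = repair_cp437_mojibake(garbled)
```

The repaired string is useful for display only. The archive still has to be read with the garbled name, because that is the key under which the member was registered.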
Eddie skr...@gmail.com added the comment:

Attached is a patch that solves (I hope) the initial problem, the one from Francesco Ricciardi.

--
keywords: +patch
Added file: http://bugs.python.org/file12409/patch.diff
Eddie skr...@gmail.com added the comment:

Sorry, my bad. I did try it, but with the wrong version (2.5), and it worked perfectly. So sorry again for my mistake.

Anyway, I've found the error. The problem is caused by the different encodings used when zipping. In open, the method compares b't\x82st.xml' against b't\xc3\xa9st.xml', and of course they are different. But they are not so different, because b't\x82st.xml' is 'tést.xml'.encode('cp437') and b't\xc3\xa9st.xml' is 'tést.xml'.encode('utf-8').

The problem arises because the open method supposes the filename is in UTF-8 encoding, but in __init__ it realizes that the encoding depends on the flags:

    if flags & 0x800:
        filename = filename.decode('utf-8')
    else:
        filename = filename.decode('cp437')
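The mismatch can be checked directly in an interpreter. The snippet below only illustrates the comparison described above; it is not the attached patch, and the variable names are made up for the example.

```python
name = "tést.xml"

# The local file header in test.zip holds the cp437 spelling ...
header_bytes = name.encode("cp437")
# ... while the directory name was re-encoded as UTF-8 before comparing.
directory_bytes = name.encode("utf-8")

# Comparing raw bytes fails even though both spell the same name.
bytes_equal = header_bytes == directory_bytes
# Comparing after decoding each side with its own encoding succeeds.
text_equal = header_bytes.decode("cp437") == directory_bytes.decode("utf-8")
```

Any comparison between the directory entry and the local header therefore has to happen on decoded text, or on bytes produced with the same encoding on both sides.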
Francesco Ricciardi francesco.riccia...@hp.com added the comment:

If that is what is requested, then the manual entry for ZipFile.read must be corrected, because it states:

    ZipFile.read(name[, pwd])
    name is the name of the file in the archive, or a ZipInfo object.

However, Eddie, you haven't tried what you suggested, because this is what you would get:

    >>> import zipfile
    >>> testzip = zipfile.ZipFile('test.zip')
    >>> t1 = testzip.infolist()[0]
    >>> t1.filename
    'tést.xml'
    >>> data = testzip.read(t1.filename)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python30\lib\zipfile.py", line 843, in read
        return self.open(name, "r", pwd).read()
      File "C:\Python30\lib\zipfile.py", line 883, in open
        % (zinfo.orig_filename, fname))
    zipfile.BadZipfile: File name in directory 'tést.xml' and header b't\x82st.xml' differ.
New submission from Francesco Ricciardi [EMAIL PROTECTED]:

Each entry of a zip file, as read by the zipfile module, can be accessed via a ZipInfo object. The filename attribute of ZipInfo is a string. However, the read method of a ZipFile object expects a binary argument, or at least this is what I can deduce from the following behavior:

    >>> import zipfile
    >>> testzip = zipfile.ZipFile('test.zip')
    >>> t1 = testzip.infolist()[0]
    >>> t1.filename
    'tést.xml'
    >>> data = testzip.read(testzip.infolist()[0])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python30\lib\zipfile.py", line 843, in read
        return self.open(name, "r", pwd).read()
      File "C:\Python30\lib\zipfile.py", line 883, in open
        % (zinfo.orig_filename, fname))
    zipfile.BadZipfile: File name in directory 'tést.xml' and header b't\x82st.xml' differ.

The test.zip file is attached to help reproduce this error.

--
components: Library (Lib)
files: test.zip
messages: 77555
nosy: francescor
severity: normal
status: open
title: zipfile returns string but expects binary
type: behavior
versions: Python 3.0
Added file: http://bugs.python.org/file12319/test.zip
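For comparison, the same sequence of calls can be run against an archive built in memory. This is a minimal sketch, assuming a current Python 3 where the fixes mentioned in this thread are in place. It does not reproduce the original failure, which needed an archive whose local header stores the name in cp437, such as the attached test.zip.

```python
import io
import zipfile

# Write one member with a non-ASCII name into an in-memory archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("tést.xml", "<root/>")

with zipfile.ZipFile(buffer) as archive:
    info = archive.infolist()[0]
    # Both forms accepted by the documented signature: a name or a ZipInfo.
    by_name = archive.read(info.filename)
    by_info = archive.read(info)
    # zipfile marks names it had to store as UTF-8 with flag bit 0x800.
    utf8_flag_set = bool(info.flag_bits & 0x800)
```

Because zipfile wrote this archive itself, the directory entry and the local header agree on both the bytes and the flag, so the header comparison that failed in the report has nothing to disagree about.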