On 6/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > So the general idea is that at least directory filename has some sort
> > of convention of using oem (dos, console) encoding on Windows, cp866
> > in my case. Header filenames have different encodings, and seem to be
> > ignored.
>
> Ok, then this is what the zipfile module should implement.
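Here is a minimal Python 3 sketch of that convention. It assumes that any entry without the UTF-8 flag (general purpose bit 11, 0x800) stores its name in the OEM code page. cp866 is hard-coded as an assumption to match the setup described above, and `oem_name` is a hypothetical helper, not a zipfile API. Python 3's zipfile decodes unflagged names as cp437 by default. Because cp437 maps all 256 byte values, encoding the name back to cp437 recovers the stored bytes, which can then be re-decoded.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is stored as UTF-8


def oem_name(info: zipfile.ZipInfo, oem_codepage: str = "cp866") -> str:
    """Hypothetical helper: reinterpret an unflagged entry name in an OEM code page.

    Assumes the archive was opened with zipfile's default metadata encoding,
    which decodes unflagged names as cp437; cp437 maps every byte value, so
    encoding back to cp437 recovers the bytes stored in the archive.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded this name as UTF-8
    raw = info.filename.encode("cp437")  # undo zipfile's default cp437 decoding
    return raw.decode(oem_codepage)


if __name__ == "__main__":
    # Simulate an unflagged legacy entry whose name bytes are cp866.
    legacy = zipfile.ZipInfo("привет.txt".encode("cp866").decode("cp437"))
    print(oem_name(legacy))

    # A non-ASCII name written by Python 3's zipfile carries the UTF-8 flag instead.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("привет.txt", b"data")
    with zipfile.ZipFile(buf) as zf:
        print(oem_name(zf.infolist()[0]))
```

On Python 3.11 and later, `zipfile.ZipFile(path, metadata_encoding="cp866")` applies the same fallback while reading, which avoids the cp437 round trip.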
But this is only on Windows! I have no idea what the common situation is on other OSes, and I don't even know a sane way to get the OEM code page on Windows (the obvious way, ctypes.windll.kernel32.GetOEMCP(), doesn't seem good to me). So I guess that's a bad idea anyway; maybe conforming to the language encoding bit (general purpose bit 11, which marks a name as UTF-8) is better, since ASCII names stay ASCII in UTF-8 anyway. What about this?

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py	(revision 55850)
+++ Lib/zipfile.py	(working copy)
@@ -252,6 +252,7 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)
 
+        self._encodeFilename()
         header = struct.pack(structFileHeader, stringFileHeader,
                  self.extract_version, self.reserved, self.flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
@@ -259,6 +260,16 @@
                  len(self.filename), len(extra))
         return header + self.filename + extra
 
+    def _encodeFilename(self):
+        if isinstance(self.filename, unicode):
+            self.filename = self.filename.encode('utf-8')
+            self.flag_bits = self.flag_bits | 0x800
+
+    def _decodeFilename(self):
+        if self.flag_bits & 0x800:
+            self.filename = self.filename.decode('utf-8')
+            self.flag_bits = self.flag_bits & ~0x800
+
     def _decodeExtra(self):
         # Try to decode the extra field.
         extra = self.extra
@@ -683,6 +694,7 @@
                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
 
             x._decodeExtra()
+            x._decodeFilename()
             x.header_offset = x.header_offset + concat
             self.filelist.append(x)
             self.NameToInfo[x.filename] = x
@@ -967,6 +979,7 @@
                     extract_version = zinfo.extract_version
                     create_version = zinfo.create_version
 
+                zinfo._encodeFilename()
                 centdir = struct.pack(structCentralDir,
                   stringCentralDir, create_version,
                   zinfo.create_system, extract_version, zinfo.reserved,
Index: Lib/test/test_zipfile.py
===================================================================
--- Lib/test/test_zipfile.py	(revision 55850)
+++ Lib/test/test_zipfile.py	(working copy)
@@ -515,6 +515,11 @@
         # and report that the first file in the archive was corrupt.
         self.assertRaises(RuntimeError, zipf.testzip)
 
+    def testUnicodeFilenames(self):
+        zf = zipfile.ZipFile(TESTFN, "w")
+        zf.writestr(u"foo.txt", "Test for unicode filename")
+        zf.close()
+
     def tearDown(self):
         support.unlink(TESTFN)
         support.unlink(TESTFN2)

The problem is that I don't know whether anything actually supports bit 11 at this point, so I can't even tell whether I did this correctly. :(

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
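The bit-11 approach in the patch above can be checked against Python 3's zipfile, which sets the same flag. This is a minimal sketch and not part of the original patch. `encode_filename`, `decode_filename` and `local_header_flags` are hypothetical names that mirror `_encodeFilename`/`_decodeFilename` using Python 3 str/bytes. The header parsing assumes the first local file header sits at offset 0, with the layout from PKWARE's APPNOTE: a 4-byte signature, the 16-bit "version needed" field, then the 16-bit general purpose flags.

```python
import io
import struct
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11 ("language encoding flag")


def encode_filename(name: str, flag_bits: int) -> tuple[bytes, int]:
    """Hypothetical helper mirroring the patch's _encodeFilename with str/bytes."""
    return name.encode("utf-8"), flag_bits | UTF8_FLAG


def decode_filename(raw: bytes, flag_bits: int, fallback: str = "cp437") -> tuple[str, int]:
    """Hypothetical helper mirroring _decodeFilename; the fallback codec is an assumption."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8"), flag_bits & ~UTF8_FLAG
    return raw.decode(fallback), flag_bits


def local_header_flags(archive: bytes) -> int:
    """Return the general purpose flags of the local file header at offset 0."""
    signature, _version_needed, flags = struct.unpack_from("<4sHH", archive, 0)
    if signature != b"PK\x03\x04":
        raise ValueError("archive does not start with a local file header")
    return flags


if __name__ == "__main__":
    # Write one entry per archive and inspect bit 11 in the raw local header.
    for name in ("foo.txt", "привет.txt"):
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr(name, b"Test for unicode filename")
        flagged = bool(local_header_flags(buf.getvalue()) & UTF8_FLAG)
        print(name, "bit 11 set" if flagged else "bit 11 clear")
```

The patch flags every unicode name. Python 3's zipfile, by contrast, sets bit 11 only for names that are not pure ASCII, so the `foo.txt` entry in the demo carries no flag.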