On 6/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > So the general idea is that at least directory filename has some sort
> > of convention of using oem (dos, console) encoding on Windows, cp866
> > in my case. Header filenames have different encodings, and seem to be
> > ignored.
> Ok, then this is what the zipfile module should implement.

But this is only on Windows! I have no clue what the common
situation is on other OSes, and I don't even know how to sanely get
the OEM codepage on Windows (the obvious way, calling
ctypes.windll.kernel32.GetOEMCP(), doesn't seem good to me).
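
For reference, the "obvious way" mentioned above could be sketched roughly
like this. The helper name get_oem_codepage is made up for illustration, and
returning None on non-Windows platforms is an assumption of this sketch, not
something zipfile does:

```python
import sys
import ctypes


def get_oem_codepage():
    """Return the OEM codepage as a codec name such as 'cp866'.

    Hypothetical helper: returns None on non-Windows platforms,
    where ctypes.windll does not exist.
    """
    if sys.platform != "win32":
        return None
    # GetOEMCP() takes no arguments and returns the codepage number.
    return "cp%d" % ctypes.windll.kernel32.GetOEMCP()


if __name__ == "__main__":
    print(get_oem_codepage())
```

Note that the result is only a guess at what the archive's creator used: it
describes the machine reading the archive, not the one that wrote it.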

So I guess that's a bad idea anyway; maybe conforming to the language
encoding bit (general purpose bit 11) is better (ASCII will stay ASCII
anyway).

What about this?

Index: Lib/zipfile.py
===================================================================
--- Lib/zipfile.py      (revision 55850)
+++ Lib/zipfile.py      (working copy)
@@ -252,6 +252,7 @@
             self.extract_version = max(45, self.extract_version)
             self.create_version = max(45, self.extract_version)

+        self._encodeFilename()
         header = struct.pack(structFileHeader, stringFileHeader,
                  self.extract_version, self.reserved, self.flag_bits,
                  self.compress_type, dostime, dosdate, CRC,
@@ -259,6 +260,16 @@
                  len(self.filename), len(extra))
         return header + self.filename + extra

+    def _encodeFilename(self):
+        if isinstance(self.filename, unicode):
+            self.filename = self.filename.encode('utf-8')
+            self.flag_bits = self.flag_bits | 0x800
+
+    def _decodeFilename(self):
+        if self.flag_bits & 0x800:
+            self.filename = self.filename.decode('utf-8')
+            self.flag_bits = self.flag_bits & ~0x800
+
     def _decodeExtra(self):
         # Try to decode the extra field.
         extra = self.extra
@@ -683,6 +694,7 @@
                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

             x._decodeExtra()
+            x._decodeFilename()
             x.header_offset = x.header_offset + concat
             self.filelist.append(x)
             self.NameToInfo[x.filename] = x
@@ -967,6 +979,7 @@
                     extract_version = zinfo.extract_version
                     create_version = zinfo.create_version

+                zinfo._encodeFilename()
                 centdir = struct.pack(structCentralDir,
                   stringCentralDir, create_version,
                   zinfo.create_system, extract_version, zinfo.reserved,
Index: Lib/test/test_zipfile.py
===================================================================
--- Lib/test/test_zipfile.py    (revision 55850)
+++ Lib/test/test_zipfile.py    (working copy)
@@ -515,6 +515,11 @@
         # and report that the first file in the archive was corrupt.
         self.assertRaises(RuntimeError, zipf.testzip)

+    def testUnicodeFilenames(self):
+        zf = zipfile.ZipFile(TESTFN, "w")
+        zf.writestr(u"foo.txt", "Test for unicode filename")
+        zf.close()
+
     def tearDown(self):
         support.unlink(TESTFN)
         support.unlink(TESTFN2)

The problem is that I don't know whether anything actually supports
bit 11 at this time, so I can't even tell whether I did this correctly. :(
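
One way to sanity-check the bit 11 behaviour is a round trip through an
in-memory archive. The sketch below assumes a current Python 3 zipfile
module (not the patched revision above), which sets flag 0x800 on its own
when a name is not pure ASCII; the helper name roundtrip_flags is made up
for illustration:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: filename is UTF-8


def roundtrip_flags(name):
    """Write one member called `name`, read it back, and report
    (filename, whether bit 11 was set in the central directory)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, b"Test for unicode filename")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        info = zf.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)


if __name__ == "__main__":
    print(roundtrip_flags("foo.txt"))    # ASCII name: flag stays clear
    print(roundtrip_flags("тест.txt"))   # non-ASCII name: flag is set
```

This only shows that the module agrees with itself; whether other zip tools
honour the flag still has to be checked against those tools.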