Patches item #1446489, was opened at 2006-03-09 15:58
Message generated for change (Comment added) made by ronaldoussoren
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1446489&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Library (Lib)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Ronald Oussoren (ronaldoussoren)
Assigned to: Ronald Oussoren (ronaldoussoren)
Summary: zipfile: support for ZIP64

Initial Comment:
The attached patch implements support for ZIP64, that is zipfiles containing very large (>4GByte) files and zipfiles that are larger than 4GByte themselves.

The output of this patch can be read by pkzip (see below for the actual version I used for testing).

----------------------------------------------------------------------

>Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-23 15:10

Message:
Logged In: YES
user_id=580910

I've found some time to work on this. I've added zipfile-zip64-version2.patch, this version:

* Makes zip64 behaviour optional (defaults to off because zip(1) doesn't support zip64)

* Is significantly faster for large zipfiles because it doesn't scan the entire zipfile just to check that the file headers are consistent with the central directory w.r.t. filename (this check is now done when trying to read a file)

* Updates the reference documentation.

* Adds unittests. There are two sets of tests: one set tests the behaviour of zip64 extensions using small files by lowering the zip64 cutoff point and is run every time, the other set does tests with huge zipfiles and is run when the largefile feature is enabled when running the tests.

There is one backward incompatible change: ZipInfo objects no longer have a file_offset attribute. That was the other reason for scanning the entire zipfile when opening it. IMNSHO this should have been a private attribute and the cost of this feature is not worth its *very* limited usefulness.
As an indication of its cost: I got a 6x speedup when I removed the calculation of the file_offset attribute, something that adds up when you are dealing with huge zipfiles (I wrote this patch because I'm dealing with 10+GByte zipfiles with tens of thousands of files at work).

I noticed that zipfile raises RuntimeError in some places. I've changed one of those to zipfile.BadZipfile, but others remain. I don't like this; most of them should be replaced by TypeError or ValueError exceptions.

BTW, this patch also supports storing files >4GByte in the zipfile, but that feature isn't very useful because zipfile doesn't have an API for reading file data incrementally.

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-16 09:55

Message:
Logged In: YES
user_id=580910

I haven't had time to work on this; all the time I had to work on Python-related stuff has been eaten by finishing PyObjC's port to Intel Macs and the universal binary patches. The former is now done and the latter almost, so I'll have some time to work on this again, especially because I'm using this patch at work and might be able to claim some time to work on this during work hours.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-05-16 09:41

Message:
Logged In: YES
user_id=849994

Since 2.5 beta is coming close, have you made progress on the tests/docs?

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-04-02 21:13

Message:
Logged In: YES
user_id=580910

The "don't use the ZIP64 extension" flag is a good idea; zipfiles that use this extension aren't readable by the infozip tools (zip and unzip on most unix systems).

I'll add tests and documentation in the near future.
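
For readers of this thread, here is a minimal sketch of how such an opt-in flag behaves in the zipfile module as it later shipped. It is not taken from the patch itself. It assumes the `allowZip64` constructor argument and the `zipfile.LargeZipFile` exception, and it temporarily lowers the module constant `zipfile.ZIP64_LIMIT` so that a small payload crosses the cutoff, which mirrors the "lower the cutoff point" testing approach described in the thread. The helper names `try_write` and `demo` are hypothetical. Note that recent Python 3 releases enable ZIP64 by default, so the flag is passed explicitly here.

```python
import io
import zipfile


def try_write(payload, allow_zip64):
    """Write one member to an in-memory archive.

    Returns the archive bytes, or None if zipfile refused because the
    member would need ZIP64 extensions.
    """
    buffer = io.BytesIO()
    try:
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED,
                             allowZip64=allow_zip64) as archive:
            archive.writestr("member.bin", payload)
    except zipfile.LargeZipFile:
        return None
    return buffer.getvalue()


def demo():
    """Show both outcomes of the flag using a deliberately tiny cutoff."""
    original_limit = zipfile.ZIP64_LIMIT
    # Assumption: patching this module constant is for demonstration only;
    # real code should never change it.
    zipfile.ZIP64_LIMIT = 1000
    try:
        payload = b"x" * 2000  # larger than the lowered cutoff
        refused = try_write(payload, allow_zip64=False)
        accepted = try_write(payload, allow_zip64=True)
        with zipfile.ZipFile(io.BytesIO(accepted)) as archive:
            roundtrip = archive.read("member.bin")
    finally:
        zipfile.ZIP64_LIMIT = original_limit
    return refused, roundtrip == payload
```

With the flag off, the write is rejected with `LargeZipFile` rather than silently producing an archive that older tools cannot read; with the flag on, the same payload is written using ZIP64 records and reads back unchanged.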

The version of zipfile that I'm currently using also contains a patch for speeding up the opening of zipfiles. For the type of files I'm dealing with (about 11GByte large with tens of thousands of files) the speedup is very significant. I suppose it's better to file that as a separate patch after this has been approved.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2006-04-02 07:02

Message:
Logged In: YES
user_id=29957

I'd like to see a testcase and possibly a note for the documentation about the new semantics. Also, should it be possible to say "don't use the ZIP64 extension, instead raise an Error" for people who don't want to generate these?

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-03-09 16:28

Message:
Logged In: YES
user_id=580910

Oops, I've uploaded the wrong file. zipfile-zip64.patch is the correct one.

I've tested the correctness of created archives using this version of pkzip:

pkzipc -version
PKZIP(R) Server Version 8 ZIP Compression Utility for Linux X86
Copyright (C) 1989-2005 PKWARE, Inc. All Rights Reserved. Evaluation Version
PKZIP Reg. U.S. Pat. and Tm. Off. Patent No. 5,051,745 Patent Pending

Version 8.40.66

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
http://mail.python.org/mailman/listinfo/patches
