Patches item #1446489, was opened at 2006-03-09 06:58
Message generated for change (Comment added) made by greg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1446489&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Ronald Oussoren (ronaldoussoren)
Assigned to: Ronald Oussoren (ronaldoussoren)
Summary: zipfile: support for ZIP64

Initial Comment:
The attached patch implements support for ZIP64, that is, zipfiles
containing very large (>4GByte) files and zipfiles that are larger
than 4GByte themselves.

The output of this patch can be read by pkzip (see below for the
actual version I used for testing).

----------------------------------------------------------------------

>Comment By: Gregory P. Smith (greg)
Date: 2006-06-11 13:33

Message:
Logged In: YES
user_id=413

reading zipfile64-version64.patch:

* Why does the zipfile module import itself?

* Why is the default ZIP64 limit 1 << 30? Shouldn't that be
  (1 << 31) - 1 (or slightly less) for maximum compatibility with
  existing <2GiB zip files or zips with data just under 2GiB? Don't
  force zip64's use unless the size actually exceeds a 32bit signed
  integer.

* assert diskno == 0 and assert nodisks == 1 should be turned into
  BadZipFile exceptions with an explanation that multi-disk zip files
  aren't supported.

* In main(), document the -t option in the usage string.

* TestZip64InSmallFiles changes zipfile.ZIP64_LIMIT but will not
  restore the value if a test fails (that could lead to other,
  unrelated test failures). Not a problem in the hopefully normal case
  of all tests passing. Use a try: finally: to make sure that gets
  reset.

* Documentation: "Is does optionally handle" is awkward. How about
  "It can handle"?

The removal of the file_offset attribute makes sense, but it does make
me wonder how much existing code that could break.
I suggest leaving file_offset out and, if any Python 2.5 beta tester
complains, either restoring it or making the scan that looks up file
offsets a ZipFile option (defaulting to True).

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-30 06:28

Message:
Logged In: YES
user_id=580910

I've added some more tests for pre-existing functionality. The
unittests are still far from comprehensive, but at least touch upon
most functionality of zipfile.

Does anyone feel like reviewing this? I'd like to get this into
Python 2.5.

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-26 01:26

Message:
Logged In: YES
user_id=580910

I've attached yet another version. This version reintroduces some
functionality that was unintentionally removed and fixes a lame bug
that caused test_zipimport to fail.

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-23 06:10

Message:
Logged In: YES
user_id=580910

I've found some time to work on this. I've added
zipfile-zip64-version2.patch. This version:

* Makes zip64 behaviour optional (defaults to off because zip(1)
  doesn't support zip64)

* Is significantly faster for large zipfiles because it doesn't scan
  the entire zipfile just to check that the file headers are
  consistent with the central directory w.r.t. filename (this check is
  now done when trying to read a file)

* Updates the reference documentation.

* Adds unittests. There are two sets of tests: one set tests the
  behaviour of the zip64 extensions using small files by lowering the
  zip64 cutoff point and is run every time; the other set tests huge
  zipfiles and is run only when the largefile feature is enabled when
  running the tests.

There is one backward incompatible change: ZipInfo objects no longer
have a file_offset attribute.
That was the other reason for scanning the entire zipfile when opening
it. IMNSHO this should have been a private attribute, and the cost of
this feature is not worth its *very* limited usefulness. As an
indication of its cost: I got a 6x speedup when I removed the
calculation of the file_offset attribute, something that adds up when
you are dealing with huge zipfiles (I wrote this patch because I'm
dealing with 10+GByte zipfiles with tens of thousands of files at
work).

I noticed that zipfile raises RuntimeError in some places. I've
changed one of those to zipfile.BadZipfile, but others remain. I don't
like this; most of them should be replaced by TypeError or ValueError
exceptions.

BTW. This patch also supports storing files >4GByte in the zipfile,
but that feature isn't very useful because zipfile doesn't have an API
for reading file data incrementally.

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-05-16 00:55

Message:
Logged In: YES
user_id=580910

I haven't had time to work on this; all the time I had to work on
Python-related stuff has been eaten by finishing PyObjC's port to
Intel Macs and the universal binary patches. The former is now done
and the latter almost, so I'll have some time to work on this again,
especially because I'm using this patch at work and might be able to
claim some time to work on this during work hours.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-05-16 00:41

Message:
Logged In: YES
user_id=849994

Since 2.5 beta is coming close, have you made progress on
the tests/docs?
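
For readers following the thread, two points discussed above (zip64 being optional, and tests that lower the zip64 cutoff needing to restore it with try/finally) can be sketched against the zipfile module as it exists in current Python 3. This is only an illustration under stated assumptions, not the code from the patch: the helper names write_entry and demo are hypothetical, and the sketch relies on the module-level zipfile.ZIP64_LIMIT value and the allowZip64 argument behaving as they do in current releases.

```python
import io
import zipfile


def write_entry(payload, allow_zip64):
    """Write one entry to an in-memory archive and return the archive bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", allowZip64=allow_zip64) as zf:
        zf.writestr("data.bin", payload)
    return buf.getvalue()


def demo(limit=1000):
    """Lower the ZIP64 cutoff, exercise both modes, then restore the cutoff.

    Hypothetical helper for illustration; returns (refused, stored_size).
    """
    saved_limit = zipfile.ZIP64_LIMIT
    zipfile.ZIP64_LIMIT = limit  # lowered only for the duration of the demo
    try:
        payload = b"x" * (2 * limit)  # "too large" relative to the lowered cutoff

        # With zip64 disabled, the oversized entry should be refused.
        try:
            write_entry(payload, allow_zip64=False)
            refused = False
        except zipfile.LargeZipFile:
            refused = True

        # With zip64 enabled, the same entry is written and can be read back.
        archive = write_entry(payload, allow_zip64=True)
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            stored_size = zf.getinfo("data.bin").file_size
        return refused, stored_size
    finally:
        # Restore even if something above raised, so later tests are unaffected.
        zipfile.ZIP64_LIMIT = saved_limit
```

Putting the restore in the finally clause means a failure while the cutoff is lowered cannot leak the small limit into unrelated tests, which is the concern raised in the review.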
----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-04-02 12:13

Message:
Logged In: YES
user_id=580910

The "don't use the ZIP64 extension" flag is a good idea;
zipfiles that use this extension aren't readable by the
infozip tools (zip and unzip on most unix systems).

I'll add tests and documentation in the near future.

The version of zipfile that I'm currently using also
contains a patch for speeding up the opening of zipfiles.
For the type of files I'm dealing with (about 11GByte large
with tens of thousands of files) the speedup is very
significant. I suppose it's better to file that as a
separate patch after this has been approved.

----------------------------------------------------------------------

Comment By: Anthony Baxter (anthonybaxter)
Date: 2006-04-01 21:02

Message:
Logged In: YES
user_id=29957

I'd like to see a testcase and possibly a note for the
documentation about the new semantics. Also, should it be
possible to say "don't use the ZIP64 extension, instead
raise an Error" for people who don't want to generate these?

----------------------------------------------------------------------

Comment By: Ronald Oussoren (ronaldoussoren)
Date: 2006-03-09 07:28

Message:
Logged In: YES
user_id=580910

Oops, I've uploaded the wrong file. zipfile-zip64.patch is
the correct one.

I've tested the correctness of created archives using this
version of pkzip:

pkzipc -version
PKZIP(R)  Server  Version 8  ZIP Compression Utility for Linux X86
Copyright (C) 1989-2005 PKWARE, Inc.  All Rights Reserved.
Evaluation Version
PKZIP Reg. U.S. Pat. and Tm. Off.  Patent No. 5,051,745
Patent Pending

Version 8.40.66

----------------------------------------------------------------------

_______________________________________________
Patches mailing list
http://mail.python.org/mailman/listinfo/patches
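
Two of the points in Gregory P. Smith's review (the value of the ZIP64 limit, and replacing the disk-number asserts with exceptions) lend themselves to a short sketch. The names SIGNED_32BIT_MAX, needs_zip64 and check_single_disk are hypothetical and are not taken from the patch; the exception uses the current spelling zipfile.BadZipFile, whereas the thread also mentions the older BadZipfile spelling.

```python
import zipfile

# Largest value of a signed 32-bit integer. The parentheses matter in Python:
# "-" binds tighter than "<<", so 1 << 31 - 1 is evaluated as 1 << 30.
SIGNED_32BIT_MAX = (1 << 31) - 1


def needs_zip64(size, limit=SIGNED_32BIT_MAX):
    """Return True only when size no longer fits under the given limit."""
    return size > limit


def check_single_disk(diskno, nodisks):
    """Reject multi-disk archives with an explanatory exception, not an assert."""
    if diskno != 0 or nodisks != 1:
        raise zipfile.BadZipFile(
            "multi-disk zip files are not supported "
            f"(disk number {diskno}, {nodisks} disks)"
        )
```

An exception carries an explanation to the caller and still fires when Python runs with -O, which strips assert statements.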
