Re: memory error with zipfile module
On 20/05/06, Bruno Desthuilliers [EMAIL PROTECTED] wrote:

> Roger Miller wrote:
> > The basic problem is that the zipfile interface only reads and
> > writes whole files, so it may perform poorly or fail on huge files.
> > At one time I implemented a patch to allow reading files in chunks.
> > However I believe that the current interface has too many problems
> > to solve by incremental patching,
>
> Yeps, that was the general tone of some thread on python-dev. And from
> what I saw of the source code, it may indeed not be the cleanest part
> of the stdlib. But well, it does what it was written for at first:
> provide working support for zipped packages.
>
> > and that a zipfile2 module is probably warranted. (Is anyone
> > working on this?)
>
> Seems like Bob Ippolito was in the running, but I guess you'll get
> better answers on python-dev.
>
> > In the meantime I think the best solution is often to just run an
> > external zip/unzip utility to do the heavy lifting.
>
> Indeed !-)
>
> But while having zip/unzip installed OOTB on a unix-like system is
> close to guaranteed, it may not be the case on Windows.

Shame, I would like to try to improve this, but seeing as Roger Miller
has already submitted a patch I don't know how much I can do for this.
In the end I resorted to using an external zip utility via os.system().

I'll be interested to know if there is any work done on improving this,
as I'm in favour of native Python usage rather than using os.system()
and relying on the operating system having a zip command. I'm not
convinced that is the case on all Windows machines, and I'm sure Gentoo
installs don't have zip by default, since I had to emerge it on a server
for this script to work.

Is it me or is having to use os.system() all the time symptomatic of a
deficiency/things which are missing from Python as a language?

Not that I'm complaining, I'm just curious... I'm a fledgling programmer
so I don't mind being gently corrected by any veterans around.
Hari -- http://mail.python.org/mailman/listinfo/python-list
Re: memory error with zipfile module
Hari Sekhon wrote:

> Is it me or is having to use os.system() all the time symptomatic of a
> deficiency/things which are missing from python as a language?

it's you.

/F
Re: memory error with zipfile module
Fredrik Lundh wrote:

> Hari Sekhon wrote:
> > Is it me or is having to use os.system() all the time symptomatic
> > of a deficiency/things which are missing from python as a language?
>
> it's you.
>
> /F

I take it that it's still a work in progress to be able to pythonify
everything, and until then we're just gonna have to rely on shell and
those great C coded coreutils and stuff like that.

Ok, I'm rather fond of Bash+coreutils: the highest ratio of work to
lines of code I've ever seen. It's the real strength of Linux. Shame
about Windows...
Re: memory error with zipfile module
Hari Sekhon wrote:

> I take it that it's still a work in progress to be able to pythonify
> everything, and until then we're just gonna have to rely on shell and
> those great C coded coreutils and stuff like that.
>
> Ok, I'm rather fond of Bash+coreutils: the highest ratio of work to
> lines of code I've ever seen. It's the real strength of Linux. Shame
> about Windows...

you make very little sense.

/F
Re: memory error with zipfile module
Fredrik Lundh wrote:

> Hari Sekhon wrote:
> > I take it that it's still a work in progress to be able to
> > pythonify everything, and until then we're just gonna have to rely
> > on shell and those great C coded coreutils and stuff like that.
> >
> > Ok, I'm rather fond of Bash+coreutils: the highest ratio of work to
> > lines of code I've ever seen. It's the real strength of Linux.
> > Shame about Windows...
>
> you make very little sense.
>
> /F

How so? This is effectively what we do when we run os.system(). Usually
people are running system commands on unix-like machines, using
coreutils written in C to do things that are either difficult or near
impossible to do in python...

I've seen people using everything from zip to touch, either out of
laziness or out of the fact it wouldn't work very well in python; this
zip case is a good example.

Sometimes system scripts are effectively Bash scripting, but take longer
to write and need many more lines of code because they're in python.
That makes very little sense.
Re: memory error with zipfile module
Hari Sekhon wrote:

> I've seen people using everything from zip to touch, either out of
> laziness or out of the fact it wouldn't work very well in python; this
> zip case is a good example.

so based on a limitation in one library, and some random code you've
seen on the internet, you're making generalizations about the language ?

the zip case is a pretty lousy example, btw; after all, using the
existing API, it's not that hard to implement an *incremental* read
function if the provided read-into-string version isn't sufficient:

    import zipfile, zlib

    ##
    # Given a 'zip' instance, copy data from the 'name' to the
    # 'out' stream.

    def explode(out, zip, name):

        zinfo = zip.getinfo(name)

        if zinfo.compress_type == zipfile.ZIP_STORED:
            decoder = None
        elif zinfo.compress_type == zipfile.ZIP_DEFLATED:
            # negative wbits: raw deflate stream, no zlib header
            decoder = zlib.decompressobj(-zlib.MAX_WBITS)
        else:
            raise zipfile.BadZipfile("unsupported compression method")

        # file_offset: start of the member's data (ZipInfo, Python 2.4)
        zip.fp.seek(zinfo.file_offset)

        size = zinfo.compress_size

        while 1:
            data = zip.fp.read(min(size, 8192))
            if not data:
                break
            size -= len(data)
            if decoder:
                data = decoder.decompress(data)
            out.write(data)

        if decoder:
            # extra pad byte so zlib does not choke at the end of the
            # raw stream, then flush whatever is still buffered
            out.write(decoder.decompress('Z'))
            out.write(decoder.flush())

/F
memory error with zipfile module
I do

    >>> import zipfile
    >>> zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
    >>> zip.namelist()
    ['someimage.iso']

then either of the two:

A)  file('someimage.iso','w').write(zip.read('someimage.iso'))

or

B)  content=zip.read('someimage.iso')

but both result in the same error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "D:\u\Python24\lib\zipfile.py", line 357, in read
        bytes = dc.decompress(bytes)
    MemoryError

I thought python was supposed to handle memory for you?

The python zipfile module is obviously broken...

Any advice?
Re: memory error with zipfile module
Hari Sekhon [EMAIL PROTECTED] writes:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "D:\u\Python24\lib\zipfile.py", line 357, in read
>     bytes = dc.decompress(bytes)
> MemoryError

Looks like the .iso file is huge. Even if it's only a CD image (approx
650MB), reading it all into memory in a single string is not a good
idea.

> The python zipfile module is obviously broken...

Indeed. I am surprised that there is no API that returns a file object.

Ganesan

-- 
Ganesan Rajagopal
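
For readers on later Python versions: `ZipFile.open()` (added to the
standard library in Python 2.6, after this thread) returns the kind of
file object Ganesan is asking for. The sketch below assumes Python 3;
the helper name `member_sha256` and the in-memory test archive are made
up for illustration and are not part of the thread. It reads a member
in fixed-size chunks, so the whole file never has to sit in one string.

```python
import hashlib
import io
import zipfile


def member_sha256(archive, name, chunk_size=8192):
    """Hash one archive member without holding it all in memory."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(archive) as zf:
        # ZipFile.open() returns a readable file-like object for the member.
        with zf.open(name) as member:
            for chunk in iter(lambda: member.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


# Small in-memory archive standing in for cdimage.zip (illustrative data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("someimage.iso", b"x" * 100000)

print(member_sha256(buf, "someimage.iso"))
```

Only `chunk_size` bytes of decompressed data are requested per read, so
peak memory use no longer grows with the size of the member.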
Re: memory error with zipfile module
Hari Sekhon wrote:

> I do
>
>     >>> import zipfile
>     >>> zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
>     >>> zip.namelist()
>     ['someimage.iso']
>
> then either of the two:
>
> A)  file('someimage.iso','w').write(zip.read('someimage.iso'))
>
> or
>
> B)  content=zip.read('someimage.iso')
>
> but both result in the same error:
>
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "D:\u\Python24\lib\zipfile.py", line 357, in read
>         bytes = dc.decompress(bytes)
>     MemoryError

<ot>Is that the *full* traceback ?</ot>

> I thought python was supposed to handle memory for you?

Err... This doesn't mean that it bypasses the system's memory
management.

http://pyref.infogami.com/MemoryError
http://mail.zope.org/pipermail/zope/2004-October/153882.html

"""
MemoryError is raised by Python when an underlying (OS-level) allocation
fails. (...) Normally this would mean that you were out of even virtual
memory (swap), but it could also be a symptom of a libc bug, a bad RAM
chip, etc.
"""

What do you think will happen if you try to allocate a huge block when
you've already eaten all available memory ? Do you really hope that
Python will give you extra RAM for free ?-)

Please try this code:

    import zipfile
    zip = zipfile.ZipFile('d:\somepath\cdimage.zip')
    info = zip.getinfo('someimage.iso')
    csize = info.compress_size
    fsize = info.file_size
    print "someimage compressed size is : %s" % csize
    print "someimage real file size is : %s" % fsize
    print ("So, knowing how zipfile.read() is actually implemented, "
           "total needed ram is : %s" % (csize + fsize))
    print "Well... Do I have that much memory available ???"

> The python zipfile module is obviously broken...

s/is obviously broken/could be improved to handle huge files/

Making such statements may not be the best way to make friends...

> Any advice?

Yes : Python is free software ('free' as in 'free speech' *and* as in
'free beer'), mostly written by benevolent contributors. So try and
improve the zipfile module by yourself, and submit your enhancements.
Then we all will be very grateful, and your name will be forever in the
Python Hall of Fame.

Or choose to behave as a whiny-whiny clueless luser making dumb
statements, and your name will be buried for a long long time in a lot
of killfiles.

It's up to you !-)

NB : If you go the first route, this may help:
http://www.python.org/doc/2.4.2/lib/module-zlib.html
with particular attention to the decompressobj.

HTH

-- 
bruno desthuilliers
python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
p in '[EMAIL PROTECTED]'.split('@')])"
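
As a rough illustration of the decompressobj hint, here is a minimal
sketch, assuming Python 3 and made-up test data; the generator name
`inflate_chunks` is hypothetical. It feeds a raw deflate stream to
`zlib.decompressobj` piece by piece instead of handing the whole
compressed string to a single `decompress()` call.

```python
import zlib


def inflate_chunks(chunks):
    """Yield decompressed pieces from an iterable of raw-deflate chunks."""
    # Negative wbits selects a raw deflate stream (no zlib header), which
    # is how ZIP_DEFLATED members are stored inside an archive.
    decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    for chunk in chunks:
        piece = decoder.decompress(chunk)
        if piece:
            yield piece
    tail = decoder.flush()
    if tail:
        yield tail


# Illustrative data: build a raw deflate stream, then feed it back 1 KiB
# at a time.
payload = b"0123456789" * 50000
encoder = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = encoder.compress(payload) + encoder.flush()
pieces = [raw[i:i + 1024] for i in range(0, len(raw), 1024)]

# Joined here only to check the round trip; real code would write each
# piece to a file as it arrives.
restored = b"".join(inflate_chunks(pieces))
print(len(raw), len(restored))
```

One caveat: a small compressed chunk can still expand to a much larger
piece, so this bounds the input held in memory, not the size of every
output piece. `decompress()` also accepts a `max_length` argument for
callers who need to cap the output as well.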
Re: memory error with zipfile module
Take a look at the pywin32 extension, which I believe has some
lower-level memory allocation and file capabilities that might help you
in this situation. If I'm completely wrong, someone please tell me XD.

Of course, you could just make the read() a stepwise process: read,
oh, let's say 8192 bytes at a time (could be bigger if you want), write
them to the new file, and then read the next portion. This will be
slower (not sure how much) than if you had some AMD X2 64 with 3 gigs
of RAM and could just read the file all at once, but it should work.
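
The stepwise copy described above can be sketched as follows. This is
only an illustration under stated assumptions: it uses Python 3 and
`ZipFile.open()`, which did not exist in the Python 2.4 zipfile module
discussed in this thread, and the function name `extract_in_steps` and
the test archive are hypothetical.

```python
import io
import os
import tempfile
import zipfile


def extract_in_steps(archive, name, dest_path, chunk_size=8192):
    """Copy one member to dest_path, chunk_size bytes at a time."""
    written = 0
    with zipfile.ZipFile(archive) as zf:
        # "wb": the member is binary data, which matters on Windows.
        with zf.open(name) as src, open(dest_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
                written += len(chunk)
    return written


# Illustrative stand-in for cdimage.zip, built in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("someimage.iso", b"\0" * 1000000)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "someimage.iso")
    copied = extract_in_steps(archive, "someimage.iso", target)
    size_on_disk = os.path.getsize(target)

print(copied, size_on_disk)
```

A larger `chunk_size` trades a little memory for fewer read and write
calls; 8192 is simply the figure suggested in the message.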
Re: memory error with zipfile module
[EMAIL PROTECTED] wrote:

> Take a look at the pywin32 extension, which I believe has some
> lower-level memory allocation and file capabilities that might help
> you in this situation.

But then the solution would not be portable, which would be a shame
since the zlib module (on which ZipFile relies for compression /
decompression) already has everything needed to handle streams.

-- 
bruno desthuilliers
Re: memory error with zipfile module
Hari Sekhon [EMAIL PROTECTED] wrote:

> import zipfile
> zip=zipfile.ZipFile('d:\somepath\cdimage.zip')
> zip.namelist()
> ['someimage.iso']
> [ ... ]
> B) content=zip.read('someimage.iso')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "D:\u\Python24\lib\zipfile.py", line 357, in read
>     bytes = dc.decompress(bytes)
> MemoryError
>
> I thought python was supposed to handle memory for you?

Yes, but it can't handle more memory than the operating system is
prepared to give it. How big is cdimage.zip? How big is the
uncompressed someimage.iso? How much memory do you have?

> The python zipfile module is obviously broken...

This isn't at all obvious to me.

-- 
\S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/
  ___  |  Frankly I have no feelings towards penguins one way or the other
  \X/  |    -- Arthur C. Clarke
   her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump
Re: memory error with zipfile module
bruno at modulix [EMAIL PROTECTED] wrote:

> http://mail.zope.org/pipermail/zope/2004-October/153882.html
>
> MemoryError is raised by Python when an underlying (OS-level)
> allocation fails. (...) Normally this would mean that you were out of
> even virtual memory (swap), but it could also be a symptom of a libc
> bug, a bad RAM chip, etc.

There's another possibility, which I ran into recently: physical +
virtual memory exceeding the space addressable by a process. So I've
got 2G physical + 4G swap and I'm getting just such a memory error --
I'm sure the compressed + uncompressed data isn't going to eat all of
that. But on a 32-bit OS, it doesn't need to, of course. 2G is quite
enough to cause problems.

-- 
\S -- [EMAIL PROTECTED] -- http://www.chaos.org.uk/~sion/
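
Whether the address-space limit applies depends on whether the
interpreter itself is a 32-bit or 64-bit build, not only on how much
RAM and swap the machine has. A quick way to check, assuming Python 3
(the variable name `pointer_bits` is just for illustration):

```python
import struct
import sys

# Size of a C pointer in this interpreter build, in bits.
pointer_bits = struct.calcsize("P") * 8
print("interpreter pointer width: %d bits" % pointer_bits)

# sys.maxsize exceeds 2**32 only on 64-bit builds.
print("64-bit build:", sys.maxsize > 2**32)
```

On a 32-bit build, a single process cannot address more than 4 GB no
matter how much memory is installed, and the usable part is usually
smaller.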
Re: memory error with zipfile module
Sion Arrowsmith wrote:

> Hari Sekhon [EMAIL PROTECTED] wrote:
(snip)
> > The python zipfile module is obviously broken...
>
> This isn't at all obvious to me.

zipfile.read() does not seem to take full advantage of the features of
zlib's decompressobj. This could perhaps be improved (left as an
exercise to the OP, who is obviously very good at detecting broken
memory management <g>).

Also, there's a known bug with file headers beginning past 2+GB - which
is not a very common case...
http://sourceforge.net/tracker/index.php?func=detail&aid=1189216&group_id=5470&atid=105470

So yes, there is actually something broken - but this has nothing to do
with the OP's problem - *and* there are actually some limitations (FWIW,
the main goal of zipfile was mostly to implement support for zipped
python packages, not to replace Winzip).

But well, the OP is going to fix this, isn't he ?-)

-- 
bruno desthuilliers
Re: memory error with zipfile module
The basic problem is that the zipfile interface only reads and writes
whole files, so it may perform poorly or fail on huge files. At one
time I implemented a patch to allow reading files in chunks. However I
believe that the current interface has too many problems to solve by
incremental patching, and that a zipfile2 module is probably warranted.
(Is anyone working on this?)

In the meantime I think the best solution is often to just run an
external zip/unzip utility to do the heavy lifting.
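
A minimal sketch of the external-utility route, assuming Python 3 and
an Info-ZIP style `unzip` on the PATH; the function name
`unzip_with_external_tool` is hypothetical. It checks that the tool
exists before calling it, since it may be missing on Windows or on a
minimal Linux install, and it passes arguments as a list rather than
building a shell string for os.system().

```python
import shutil
import subprocess


def unzip_with_external_tool(archive_path, dest_dir):
    """Extract archive_path into dest_dir using the system's unzip."""
    tool = shutil.which("unzip")
    if tool is None:
        raise RuntimeError("no 'unzip' executable found on PATH")
    # Info-ZIP unzip flags: -o overwrites without prompting,
    # -d names the destination directory.
    subprocess.run([tool, "-o", archive_path, "-d", dest_dir], check=True)
```

With `check=True` a non-zero exit status raises
`subprocess.CalledProcessError`, so a failed extraction does not pass
silently the way an ignored os.system() return value would.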
Re: memory error with zipfile module
Roger Miller wrote:

> The basic problem is that the zipfile interface only reads and writes
> whole files, so it may perform poorly or fail on huge files. At one
> time I implemented a patch to allow reading files in chunks. However I
> believe that the current interface has too many problems to solve by
> incremental patching,

Yeps, that was the general tone of some thread on python-dev. And from
what I saw of the source code, it may indeed not be the cleanest part
of the stdlib. But well, it does what it was written for at first:
provide working support for zipped packages.

> and that a zipfile2 module is probably warranted. (Is anyone working
> on this?)

Seems like Bob Ippolito was in the running, but I guess you'll get
better answers on python-dev.

> In the meantime I think the best solution is often to just run an
> external zip/unzip utility to do the heavy lifting.

Indeed !-)

But while having zip/unzip installed OOTB on a unix-like system is close
to guaranteed, it may not be the case on Windows.