Sorry, my initial post was muddled. Let me try again. I've got a zipped archive that I can extract files from with my standard archive unzipping program, 7-zip. I'd like to extract the files in Python via the zipfile module. However, when I extract a file from the archive with ZipFile.read(), it isn't the same as the 7-zip-extracted file. For text files, the zipfile-extracted version has '\r\n' everywhere the 7-zip-extracted file only has '\n'. I haven't tried comparing binary files via the two extraction methods yet.
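A minimal sketch of one way to narrow this down, assuming Python 3 (where ZipFile.read() returns bytes) rather than the Python 2.5 used in this thread. It reads a member, counts the '\r\n' pairs in the raw data, and writes the bytes out in binary mode so that no newline translation can happen on the way to disk. The helper names extract_member and count_crlf are hypothetical, not part of the zipfile module.

```python
import zipfile


def extract_member(zip_path, member, out_path):
    """Copy one archive member to out_path without newline translation."""
    # Passing the path lets zipfile open the archive in binary mode itself.
    with zipfile.ZipFile(zip_path) as archive:
        data = archive.read(member)  # raw bytes as stored in the archive
    # 'wb' writes the bytes unchanged. Text mode ('w') on Windows would
    # translate each '\n' to '\r\n' while writing.
    with open(out_path, "wb") as out:
        out.write(data)
    return data


def count_crlf(data):
    """Return the number of CR LF pairs in the raw bytes."""
    return data.count(b"\r\n")
```

If count_crlf() reports 0 for the data returned by ZipFile.read() but a file written elsewhere still contains '\r\n', the carriage returns were added at write time rather than by zipfile. That would be consistent with writing in text mode on Windows, though this thread does not confirm it as the cause.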
Regarding the code I posted: I was writing it from memory, and made a mistake. I didn't use:

    z = zipfile.ZipFile(open('foo.zip', 'r'))

I used this:

    z = zipfile.ZipFile('foo.zip')

But Duncan's comment was useful, as I generally only ever work with text files, and I didn't realise you have to use the 'rb' or 'wb' options when reading and writing binary files.

To answer John's questions: I was calling '\r' a newline. I should have said carriage return. I'm not sure what operating system the original zip file was created on. I didn't fiddle with the extracted file contents, other than replacing '\r' with ''. I wrote out all the files with open('outputfile','w'). It seems that I should have been using 'wb' when writing out the binary files.

Thanks for the quick responses. Any ideas why the zipfile-extracted files and 7-zip-extracted files are different?

On Mar 10, 9:37 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Mar 10, 11:14 pm, Duncan Booth <[EMAIL PROTECTED]>
> wrote:
>
> > "Neil Crighton" <[EMAIL PROTECTED]> wrote:
> > > I'm using the zipfile library to read a zip file in Windows, and it
> > > seems to be adding too many newlines to extracted files. I've found
> > > that for extracted text-encoded files, removing all instances of '\r'
> > > in the extracted file seems to fix the problem, but I can't find an
> > > easy solution for binary files.
> >
> > > The code I'm using is something like:
> >
> > > from zipfile import Zipfile
> > > z = Zipfile(open('zippedfile.zip'))
> > > extractedfile = z.read('filename_in_zippedfile')
> >
> > > I'm using Python version 2.5. Has anyone else had this problem
> > > before, or know how to fix it?
> >
> > > Thanks,
> >
> > Zip files aren't text. Try opening the zipfile file in binary mode:
> >
> > open('zippedfile.zip', 'rb')
>
> Good pickup, but that indicates that the OP may have *TWO* problems,
> the first of which is not posting the code that was actually executed.
>
> If the OP actually executed the code that he posted, it is highly
> likely to have died in a hole long before it got to the z.read()
> stage, e.g.
>
> >>> import zipfile
> >>> z = zipfile.ZipFile(open('foo.zip'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\python25\lib\zipfile.py", line 346, in __init__
>     self._GetContents()
>   File "C:\python25\lib\zipfile.py", line 366, in _GetContents
>     self._RealGetContents()
>   File "C:\python25\lib\zipfile.py", line 404, in _RealGetContents
>     centdir = struct.unpack(structCentralDir, centdir)
>   File "C:\python25\lib\struct.py", line 87, in unpack
>     return o.unpack(s)
> struct.error: unpack requires a string argument of length 46
>
> >>> z = zipfile.ZipFile(open('foo.zip', 'rb')) # OK
> >>> z = zipfile.ZipFile('foo.zip', 'r') # OK
>
> If it somehow made it through the open stage, it surely would have
> blown up at the read stage, when trying to decompress a contained
> file.
>
> Cheers,
> John

--
http://mail.python.org/mailman/listinfo/python-list