New submission from Kevin Hendricks <kevin.hendri...@sympatico.ca>:

The current version of zipfile.py is not robust to slight errors at the end of 
zip archives.  Many file servers **improperly** append a newline to the end of 
files that do not already end with one when they are uploaded from a browser.  
This bug ends up adding 0x0d 0x0a (CR LF) to the end of the zip archive, which 
in turn makes zipfile.py eventually throw a "File is not a zip file" exception, 
even though other zip tools seem to have no trouble with these archives.  Even 
unzip -t passes these "problem" zip archives with flying colours.
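A minimal sketch (Python 3, standard library only) of how such a "problem" archive arises: build a small one-file archive in memory, then append CR LF the way a misbehaving server would.  The helper name make_problem_archive and the member name hello.txt are hypothetical, chosen only for illustration.  Because the 22-byte "end of central directory" record is no longer at the very end of the file, a check that looks only at the last 22 bytes misses it, and the search falls through to the comment-length validation.

```python
import io
import zipfile

END_SIG = b"PK\x05\x06"  # "end of central directory" record signature
EOCD_SIZE = 22           # size of that record when the archive has no comment


def make_problem_archive() -> bytes:
    """Hypothetical helper: a valid one-file zip with CR LF appended."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello world\n")
    # Simulate a server appending a newline to the uploaded file.
    return buf.getvalue() + b"\r\n"


data = make_problem_archive()
# The record is no longer in the last 22 bytes...
print(data[-EOCD_SIZE:].startswith(END_SIG))        # False
# ...it now sits 2 bytes earlier, followed by the stray CR LF.
print(data[-EOCD_SIZE - 2:-2].startswith(END_SIG))  # True
```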

I hate to have to extract and maintain my own copy of zipfile.py just to cope 
with zip archives that are commonly found on the net and that other software 
handles without complaint.

So please consider changing the following code from _EndRecData to simply 
ignore any trailing data once a valid stringEndArchive signature and 
structEndArchive record have been found, instead of treating everything after 
the record as the comment, verifying that its length matches, and rejecting the 
archive if it does not.  Ignoring the extra "comment" bytes seems more robust in 
this case, as everything needed to unpack the zip archive has already been 
found.


    # Either this is not a ZIP file, or it is a ZIP file with an archive
    # comment.  Search the end of the file for the "end of central directory"
    # record signature. The comment is the last item in the ZIP file and may be
    # up to 64K long.  It is assumed that the "end of central directory" magic
    # number does not appear in the comment.
    maxCommentStart = max(filesize - (1 << 16) - sizeEndCentDir, 0)
    fpin.seek(maxCommentStart, 0)
    data = fpin.read()
    start = data.rfind(stringEndArchive)
    if start >= 0:
        # found the magic number; attempt to unpack and interpret
        recData = data[start:start+sizeEndCentDir]
        endrec = list(struct.unpack(structEndArchive, recData))
        comment = data[start+sizeEndCentDir:]
        # check that comment length is correct
        if endrec[_ECD_COMMENT_SIZE] == len(comment):
            # Append the archive comment and start offset
            endrec.append(comment)
            endrec.append(maxCommentStart + start)
            if endrec[_ECD_OFFSET] == 0xffffffff:
                # There is apparently a "Zip64 end of central directory"
                # structure present, so go look for it
                return _EndRecData64(fpin, start - filesize, endrec)
            return endrec
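For illustration, here is a minimal sketch of the more tolerant behaviour being requested, written as a standalone Python 3 function rather than as a patch to zipfile.py.  The function name find_eocd and its max_trailing limit are hypothetical; the record layout mirrors structEndArchive ("<4s4H2LH", 22 bytes).  It reads the declared comment length from the record, slices exactly that many bytes as the comment, and reports anything after it as trailing data instead of rejecting the archive.  Zip64 archives (offset 0xffffffff) are not handled here.

```python
import io
import struct
import zipfile

# Fixed-size "end of central directory" record, same layout as zipfile.py's
# structEndArchive (little-endian): signature, 4 shorts, 2 longs, comment length.
EOCD_STRUCT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_STRUCT)  # 22 bytes
EOCD_SIG = b"PK\x05\x06"
MAX_COMMENT = (1 << 16) - 1


def find_eocd(data: bytes, max_trailing: int = 1024):
    """Hypothetical helper: locate the EOCD record, tolerating trailing junk.

    Returns (fields, comment, trailing) or None if no usable record is found.
    max_trailing is an assumed cap on how much junk may follow the comment.
    """
    base = max(len(data) - (EOCD_SIZE + MAX_COMMENT + max_trailing), 0)
    pos = data.rfind(EOCD_SIG, base)
    while pos >= 0:
        record = data[pos:pos + EOCD_SIZE]
        if len(record) == EOCD_SIZE:
            fields = struct.unpack(EOCD_STRUCT, record)
            comment_end = pos + EOCD_SIZE + fields[-1]  # fields[-1]: comment length
            if comment_end <= len(data):
                # Take exactly the declared comment; keep the rest as trailing data.
                comment = data[pos + EOCD_SIZE:comment_end]
                trailing = data[comment_end:]
                return fields, comment, trailing
        # Truncated or implausible match; try an earlier occurrence.
        pos = data.rfind(EOCD_SIG, base, pos)
    return None


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello world\n")
data = buf.getvalue() + b"\r\n"  # simulate the appended CR LF

fields, comment, trailing = find_eocd(data)
print(fields[4], comment, trailing)  # 1 b'' b'\r\n'
```

Like the quoted code, this assumes the signature does not appear inside the comment; a match whose declared comment would run past the end of the data is skipped in favour of an earlier one rather than failing outright.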


This would in turn make the Python implementation of zipfile.py more robust to 
data improperly appended when some zip archives are uploaded or downloaded, 
consistent with how other zip tools handle this issue.

Thank you for your time and consideration.

----------
messages: 123891
nosy: KevinH
priority: normal
severity: normal
status: open
title: zipfile.py end of central directory detection not robust
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10694>
_______________________________________