Gregory P. Smith <g...@krypto.org> added the comment:

For what it's worth: false positives are always going to be possible in any 
"magic" check of the kind that is_zipfile performs.

We don't check the start of the file because zip files are defined by their 
end-of-file central directory, which contains the length information needed to 
determine where within the file the zip archive actually starts.

The issue28494 tests demonstrate this: it is fairly common practice to append 
a zip file to an executable of some form for use as application-specific data.
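
As a rough illustration of the appended-archive case (a sketch only; the stub 
bytes and the helper name make_zip_bytes are made up for this example):

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a tiny zip archive in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


# Stand-in for an executable: arbitrary bytes that do not look like a zip.
stub = b"#!/bin/sh\necho 'this is not a zip header'\n"
prefixed = stub + make_zip_bytes()

# The data does not start with the usual "PK" local file header signature...
starts_with_pk = prefixed.startswith(b"PK")

# ...yet the archive is still found via the end-of-file central directory.
detected = zipfile.is_zipfile(io.BytesIO(prefixed))
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    content = zf.read("hello.txt")

print(starts_with_pk, detected, content)
```

A check that only looked at the first bytes of the file would reject this 
data even though the zipfile module can read it.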

If you need a more reliable determination of file type that is not tied to a 
specific Python release, you might look at what the various file type sniffing 
"magic" libraries do for you. Some examples:
 https://pypi.org/project/filetype/
 https://pypi.org/project/puremagic/
 https://pypi.org/project/python-magic/

I _can_ reproduce this issue with the testdata @bckohan provided.

But I can't promise there is anything to fix here.  Even if we make the test 
slightly more robust by looking at another byte or two, it is always possible 
for files to appear to be a bunch of things at once based on small data 
signatures.
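
To give a sense of how small the signature is, here is a sketch that assumes 
the standard zip layout, where an empty archive is nothing more than a 22-byte 
end-of-central-directory record. The leading bytes in other_data are made up 
for the example.

```python
import io
import struct
import zipfile

# End-of-central-directory record for an empty archive: the 4-byte
# signature followed by all-zero counts, sizes, offset and comment length.
eocd = struct.pack("<4s4H2LH", b"PK\x05\x06", 0, 0, 0, 0, 0, 0, 0)

# Arbitrary non-zip data (an assumption for illustration) with that
# record tacked onto the end.
other_data = b"GIF89a" + b"\x00" * 64
ambiguous = other_data + eocd

print(len(eocd))
print(zipfile.is_zipfile(io.BytesIO(ambiguous)))
```

The second value printed is True because an empty zip archive is still a 
valid zip archive, so data that merely ends with those 22 bytes passes the 
check.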

If nothing else, we should reinforce in the documentation that is_zipfile is at 
best a guess. False means the file is not a zip file as far as the zipfile 
module is concerned. True cannot guarantee that it is.
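
One way to act on that, sketched under the assumption that the caller can 
afford to read the whole archive: treat is_zipfile as a cheap filter and then 
actually open the file. The name looks_like_usable_zip is hypothetical, and 
even this is not a guarantee about what the file "really" is.

```python
import io
import zipfile


def looks_like_usable_zip(data: bytes) -> bool:
    """Hypothetical helper: cheap is_zipfile filter, then a full read."""
    # False here is definitive as far as the zipfile module is concerned.
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    # True is only a guess, so try to parse the archive and read it.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # testzip() reads every member and checks its CRC; it returns
            # None when no corrupt member was found.
            return zf.testzip() is None
    except (zipfile.BadZipFile, NotImplementedError, RuntimeError,
            OSError, EOFError):
        return False
```

The extra pass costs a read of every member, so it is only worth doing when 
a wrong guess would matter.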

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42096>
_______________________________________