[issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive

2020-10-27 Thread Brian Kohan


Brian Kohan  added the comment:

I concur with Gregory. It seems the right action here is simply to make the 
very real possibility of false positives apparent in the docs.

In my experience processing data from the wild, I see a pretty high 
false-positive rate of about 1 in 1,000. I'm sure the probability depends on 
the types of files I'm working with. In any case, is_zipfile can't be made 
sufficient on its own for reliably identifying zip files. It is still useful 
for weeding out true negatives, though. In my case I never expect to see a 
self-extracting file or an archive embedded in an executable, so I use the 
result of is_zipfile together with a manual check of the magic bytes at the 
start. So far so good.
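
A minimal sketch of the combined check described above, assuming that every 
valid input is a plain archive with nothing prepended to it. The helper name 
is_plain_zipfile and the ZIP_START_SIGNATURES tuple are hypothetical and not 
part of the zipfile module.

```python
import zipfile

# Signatures a plain ZIP file is assumed to start with in this sketch.
ZIP_START_SIGNATURES = (
    b"PK\x03\x04",  # local file header: archive with at least one member
    b"PK\x05\x06",  # end of central directory: empty archive
)


def is_plain_zipfile(path):
    """Hypothetical helper: is_zipfile() plus a leading magic-byte check."""
    # is_zipfile() is still useful for rejecting files that are not ZIPs.
    if not zipfile.is_zipfile(path):
        return False
    # This also rejects self-extracting archives and any other ZIP with
    # prepended data, which is acceptable only if such files never occur.
    with open(path, "rb") as fh:
        return fh.read(4) in ZIP_START_SIGNATURES
```

The trade-off is deliberate: the leading-bytes test is stricter than the ZIP 
format requires, so it only fits workflows where prepended data is not 
expected.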

--

___
Python tracker 
<https://bugs.python.org/issue42096>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive

2020-10-22 Thread Brian Kohan


Brian Kohan  added the comment:

Hi all,

I'm experiencing the same issue. I took a look at the is_zipfile code. It 
seems it's not checking the start of the file for the magic numbers, but 
looking deeper in, presumably because the magic numbers at the start are 
considered unreliable for some reason. This seems to open the check up to 
unlucky random false positives, though.

Offending file:

https://www.dropbox.com/s/t2kafn6ek1m2huy/CHPI_Rinex.crx?dl=1
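
To illustrate the behaviour described above, here is a small sketch that 
builds a synthetic buffer rather than using the offending file. It relies on 
the fact that CPython's is_zipfile looks for an end-of-central-directory 
record near the end of the file instead of inspecting the first bytes. The 
payload bytes are made up for illustration. The buffer should be accepted by 
is_zipfile even though it does not start with a ZIP signature, but treat that 
as an expectation to verify on your own interpreter.

```python
import io
import struct
import zipfile

# Arbitrary non-ZIP payload; the gzip magic bytes are only for illustration.
payload = b"\x1f\x8b\x08" + b"\x01" * 64

# A 22-byte end-of-central-directory record describing an empty archive:
# signature, four 16-bit fields, two 32-bit fields, 16-bit comment length.
eocd = struct.pack("<4s4H2LH", b"PK\x05\x06", 0, 0, 0, 0, 0, 0, 0)

data = payload + eocd

# Check the leading bytes by hand, then ask is_zipfile about the same data.
starts_with_zip_magic = data[:4] == b"PK\x03\x04"
accepted = zipfile.is_zipfile(io.BytesIO(data))

print("starts with ZIP magic:", starts_with_zip_magic)
print("is_zipfile:", accepted)
```

In real data the same effect can arise by chance when compressed bytes near 
the end of a file happen to resemble such a record.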

--
nosy: +bckohan
versions: +Python 3.7, Python 3.8
