[issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive
Brian Kohan added the comment: I concur with Gregory. It seems the action here is simply to make the very real possibility of false positives apparent in the docs. In my experience processing data from the wild, I see a fairly high false-positive rate of roughly 1 in 1,000. I'm sure that probability depends on the types of files I'm working with. In any case, is_zipfile can't be made sufficient in and of itself for reliably identifying zip files. It is still useful for weeding out true negatives, though. In my case I never expect to see a self-extracting file or a file compounded into an executable, so I use the result of is_zipfile together with a manual check of the magic bytes at the start of the file. So far so good. -- ___ Python tracker <https://bugs.python.org/issue42096> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
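The combined check described in this comment could look roughly like the sketch below. It assumes plain archives whose first bytes are a zip signature, so self-extracting archives and zips appended to executables would be rejected on purpose. The names `looks_like_plain_zip` and `ZIP_START_SIGNATURES` are illustrative and are not part of the `zipfile` module.

```python
import io
import zipfile

# Signatures that can open a plain (non-prefixed) zip archive.
ZIP_START_SIGNATURES = (
    b"PK\x03\x04",  # local file header: archive with at least one member
    b"PK\x05\x06",  # end-of-central-directory record: empty archive
)


def looks_like_plain_zip(fileobj) -> bool:
    """Hypothetical helper: require leading magic bytes AND is_zipfile().

    `fileobj` is assumed to be a seekable binary file object.
    """
    start = fileobj.tell()
    head = fileobj.read(4)
    fileobj.seek(start)  # restore the position for later readers
    if head not in ZIP_START_SIGNATURES:
        return False
    return zipfile.is_zipfile(fileobj)


if __name__ == "__main__":
    # Build a small real archive in memory and run the check on it.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
    buf.seek(0)
    print(looks_like_plain_zip(buf))
```

For files on disk, the same helper can be called as `looks_like_plain_zip(open(path, "rb"))` inside a `with` block. The trade-off is deliberate: the extra check gives up prefixed archives in exchange for fewer false positives.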
[issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive
Brian Kohan added the comment: Hi all, I'm experiencing the same issue. I took a look at the is_zipfile code. It seems it's not checking the start of the file for the magic numbers, but is looking deeper into the file instead. I presume that is because the magic numbers at the start are considered unreliable for some reason? It seems this opens the check up to unlucky random false positives, though. Offending file: https://www.dropbox.com/s/t2kafn6ek1m2huy/CHPI_Rinex.crx?dl=1 -- nosy: +bckohan versions: +Python 3.7, Python 3.8
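To illustrate how a non-zip file can pass the check, here is a minimal sketch with synthetic data. It assumes that is_zipfile accepts any input in which it can locate an end-of-central-directory record near the end of the file, which matches the behaviour described in this comment. The bytes below are not a real gzip stream and are not the offending file: they only start with the gzip magic number and happen to end with an empty-archive record. The name `make_false_positive` is illustrative.

```python
import io
import struct
import zipfile

# End-of-central-directory record of an empty archive:
# 4-byte signature followed by all-zero counts, size, offset and comment length.
EOCD_SIGNATURE = b"PK\x05\x06"
EMPTY_EOCD = struct.pack("<4s4H2LH", EOCD_SIGNATURE, 0, 0, 0, 0, 0, 0, 0)

GZIP_MAGIC = b"\x1f\x8b\x08"


def make_false_positive() -> bytes:
    """Return bytes that begin like gzip data but end with a zip EOCD record."""
    filler = b"\x00" * 64  # stand-in for arbitrary payload bytes
    return GZIP_MAGIC + filler + EMPTY_EOCD


if __name__ == "__main__":
    data = make_false_positive()
    print("starts with gzip magic:", data.startswith(GZIP_MAGIC))
    print("is_zipfile says:", zipfile.is_zipfile(io.BytesIO(data)))
```

In real compressed data the same byte pattern would have to appear by chance near the end of the file, which is why the rate depends on the kind of files being processed.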