Peter Santoro added the comment:

It seems clear to me that the logic in shutil._unpack_zipfile that silently 
skips paths that start with '/' (indicates absolute path) or that contain 
references to the parent directory ('..') was added to prevent malicious zip 
files from making potential malicious/unwanted modifications to the filesystem 
(perhaps at a time when zipfile did not itself contain such logic).  This 
conservative approach works, but it can have unexpected results.  For example, 
if all entries in a zip file contain these invalid characters, then 
shutil._unpack_zipfile appears to do nothing (i.e. the zip file is not 
unpacked).  This is good (except for the silent part), if the zip file is truly 
malicious.  However, I recently had to deal with thousands of zip files created 
by well known software vendors where hundreds of the zip files were created 
incorrectly and contained these invalid characters.  These files were not 
malicious, but they were created improperly. Note that shutil._unpack_zipfile 
silently fai
 led to unzip these files, but by using ZipFile.extractall I could unzip them.

It appears that most unzipping software today either either ignores (sometimes 
silently) potentially malicious zip entries (e.g. Windows 7 Explorer displays 
an invalid zip file error) or it attempts to filter out/replace known bad 
characters so that the zip entries can be extracted (e.g. WinZip, gnu unzip).  
I created this issue because the Python library uses both approaches, which may 
need rethinking.

The newer logic in ZipFile._extract_member, which is used by 
ZipFile.extractall, takes a different approach.  Instead of silently ignoring 
potentially malicious zip entries, it attempts to filter out or replace known 
invalid characters before extracting the zip entries.

>From the Python zipfile docs:
---
If a member filename is an absolute path, a drive/UNC sharepoint and leading 
(back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and 
C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member 
filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On 
Windows illegal characters (:, <, >, |, ", ?, and *) replaced by underscore (_).
---

As ZipFile._extract_member filters out/replaces more invalid characters than 
shutil._unpack_zipfile handles, one could argue that the (apparent older) 
approach used by shutil._unpack_zipfile is less safe.

The approaches used by shutil._unpack_zipfile and ZipFile.extractall to deal 
with potentially malicious zip file entries are different.  This issue could be 
closed if not deemed important by the Python core developers or it could be 
handled by documentation and/or coding changes.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20907>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to