New submission from Peter Landry: `cgi.FieldStorage` can't parse a multipart with a `Content-Length` header set on a part:
```
Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>>
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
...     'REQUEST_METHOD': 'POST',
...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
...     'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
    self.read_single()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
    self.read_binary()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
    self.file.write(data)
TypeError: must be str, not bytes
>>>
```

This happens because of a mismatch between the code that creates the temp file to write to and the code that chooses whether to read in binary mode:

* the presence of `filename` in the `Content-Disposition` header triggers creation of a binary-mode file
* the presence of a `Content-Length` header for the part triggers a binary read

When `Content-Length` is present but `filename` is absent, `bytes` are written to the non-binary temp file, causing the
error above.

I've reviewed the relevant RFCs, and I'm not sure what the correct way to handle this is. I don't believe `Content-Length` is addressed for part bodies in the MIME spec[0], and HTTP has its own semantics[1]. At the very least, I think this behavior is confusing and unexpected. Some libraries, like Retrofit[2], include `Content-Length` by default, and break when submitting POST data to a Python server.

I've attempted to make this work the way I'd expect and attached a patch, but I'm not sure it's the proper decision. My patch somewhat naively accepts the existing semantics of `Content-Length`, which presume bytes, and treats the creation of a non-binary file as the "bug".

[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/

----------
components: Library (Lib)
files: cgi_multipart.patch
keywords: patch
messages: 247751
nosy: Peter Landry, haypo
priority: normal
severity: normal
status: open
title: cgi.FieldStorage can't parse multipart part headers with Content-Length and no filename in Content-Disposition
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file40084/cgi_multipart.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24764>
_______________________________________
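
For readers who want to see the mismatch from the report in isolation, here is a minimal sketch that does not use `cgi` itself. It is a simplified model, not the code in `cgi.py`: `open_part_buffer`, `store_part`, and `outcome` are hypothetical names, `io.BytesIO`/`io.StringIO` stand in for the binary and text temp files, and the assumption that the line-based read adapts to the buffer mode is a simplification.

```python
import io


def open_part_buffer(has_filename):
    # Hypothetical stand-in for the temp-file choice described in the report:
    # a binary buffer only when the part carries a filename.
    return io.BytesIO() if has_filename else io.StringIO()


def store_part(body, has_filename, has_content_length):
    buf = open_part_buffer(has_filename)
    if has_content_length:
        # Binary read path: the payload is written as bytes,
        # regardless of how the buffer was opened.
        buf.write(body)
    else:
        # Line-based path (simplified assumption): decode only when
        # the buffer is in text mode.
        buf.write(body if has_filename else body.decode("latin-1"))
    return buf.getvalue()


def outcome(has_filename, has_content_length):
    # Report whether storing the part succeeds or fails with TypeError.
    try:
        store_part(b"Larry", has_filename, has_content_length)
        return "ok"
    except TypeError:
        return "TypeError"


for has_filename in (True, False):
    for has_content_length in (True, False):
        print(
            "filename={!s:5} content_length={!s:5} -> {}".format(
                has_filename, has_content_length,
                outcome(has_filename, has_content_length),
            )
        )
```

In this model only the combination with `Content-Length` present and `filename` absent fails, which matches the case described in the report.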