New submission from Peter Landry:

`cgi.FieldStorage` can't parse a multipart with a `Content-Length` header set 
on a part:

```
Python 3.4.3 (default, May 22 2015, 15:35:46)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> from io import BytesIO
>>>
>>> BOUNDARY = "JfISa01"
>>> POSTDATA = """--JfISa01
... Content-Disposition: form-data; name="submit-name"
... Content-Length: 5
...
... Larry
... --JfISa01"""
>>> env = {
...     'REQUEST_METHOD': 'POST',
...     'CONTENT_TYPE': 'multipart/form-data; boundary={}'.format(BOUNDARY),
...     'CONTENT_LENGTH': str(len(POSTDATA))}
>>> fp = BytesIO(POSTDATA.encode('latin-1'))
>>> fs = cgi.FieldStorage(fp, environ=env, encoding="latin-1")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 571, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 726, in read_multi
    self.encoding, self.errors)
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 573, in __init__
    self.read_single()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 736, in read_single
    self.read_binary()
  File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/cgi.py", line 758, in read_binary
    self.file.write(data)
TypeError: must be str, not bytes
>>>
```

This happens because of a mismatch between the code that creates a temp file to 
write to and the code that chooses to read in binary mode or not:

* the presence of `filename` in the `Content-Disposition` header triggers 
creation of a binary mode file
* the presence of a `Content-Length` header for the part triggers a binary read

When `Content-Length` is present but `filename` is absent, `bytes` are written 
to the non-binary temp file, causing the error above.
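
The mismatch can be modelled without `cgi` itself. The following is a minimal sketch, not the code in `cgi.py`: `store_part` and its parameters are hypothetical names, and the line-based read path is reduced to a single `decode` call. It only illustrates that the file mode and the read mode are chosen from two different headers.

```python
import tempfile


def store_part(body, has_filename, has_content_length, encoding="latin-1"):
    """Simplified, hypothetical model of the two independent decisions."""
    # Decision 1: the temp file mode depends only on `filename`.
    if has_filename:
        part_file = tempfile.TemporaryFile("wb+")
    else:
        part_file = tempfile.TemporaryFile("w+", encoding=encoding, newline="\n")

    # Decision 2: a Content-Length header selects the binary read, which
    # always yields bytes. The line-based read is modelled here as adapting
    # to the file mode.
    if has_content_length or has_filename:
        data = body
    else:
        data = body.decode(encoding)

    with part_file:
        try:
            part_file.write(data)
        except TypeError:
            return "TypeError"
    return "ok"


# Content-Length present, filename absent: bytes hit a text-mode file.
print(store_part(b"Larry", has_filename=False, has_content_length=True))
```

Of the four combinations, only "`Content-Length` present, `filename` absent" fails in this model, which matches the traceback above.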

I've reviewed the relevant RFCs, and I'm not really sure what the correct way 
to handle this is. I don't believe `Content-Length` is addressed for part 
bodies in the MIME spec[0], and HTTP has its own semantics[1].

At the very least, I think this behavior is confusing and unexpected. Some 
libraries, like Retrofit[2], include `Content-Length` by default, and break 
when submitting POST data to a Python server.

I've made an attempt to work in the way I'd expect, and attached a patch, but 
I'm really not sure if it's the proper decision. My patch kind of naively 
accepts the existing semantics of `Content-Length` that presume bytes, and 
treats the creation of a non-binary file as the "bug".
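
As a rough illustration of that idea (this is not the attached patch, and `open_part_file` is a hypothetical helper), the file mode could be chosen from both headers, so that any part read as bytes gets a binary file:

```python
import tempfile


def open_part_file(has_filename, has_content_length, encoding="latin-1"):
    """Hypothetical helper: use a binary file whenever bytes will be written."""
    if has_filename or has_content_length:
        return tempfile.TemporaryFile("wb+")
    return tempfile.TemporaryFile("w+", encoding=encoding, newline="\n")


# A part with Content-Length but no filename now accepts bytes.
with open_part_file(has_filename=False, has_content_length=True) as part_file:
    part_file.write(b"Larry")
    part_file.seek(0)
    stored = part_file.read()
```

The trade-off in this sketch is that such a field's value would come back as `bytes` rather than `str`, so callers would need to decode it themselves.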

[0]: http://www.ietf.org/rfc/rfc2045.txt
[1]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4
[2]: http://square.github.io/retrofit/

----------
components: Library (Lib)
files: cgi_multipart.patch
keywords: patch
messages: 247751
nosy: Peter Landry, haypo
priority: normal
severity: normal
status: open
title: cgi.FieldStorage can't parse multipart part headers with Content-Length 
and no filename in Content-Disposition
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file40084/cgi_multipart.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24764>
_______________________________________