On Wed, 15 Feb 2017 at 08:14 Ben Hoyt <benh...@gmail.com> wrote: > I posted this on StackOverflow [1], but I'm posting it here as well, as I > believe this is a bug (or at least quirk) in cgi.FieldStorage where you > can't access a file upload properly if "filename=" is not present in the > MIME part's Content-Disposition header. There are a couple of related bugs > open (and closed) on bugs.python.ord, but not quite this issue. > > Is it legitimate for cgi.FieldStorage to use the presence of "filename=" > to determine "this is a binary file" (in which case this is not a bug and > my client is just buggy), or is this a bug? I lean towards the latter as > the spec indicates that the filename is optional [2]. >
Assuming this isn't a recent change in semantics I would say this is now a quick considering how old the module is and people probably rely on its current semantics. -Brett > > Copying from my StackOverflow question, including a test/repro case: > > When I use `cgi.FieldStorage` to parse a `multipart/form-data` request (or > any web framework like Pyramid which uses `cgi.FieldStorage`) I have > trouble processing file uploads from certain clients which don't provide a > `filename=file.ext` in the part's `Content-Disposition` header. > > If the `filename=` option is missing, `FieldStorage()` tries to decode the > contents of the file as UTF-8 and return a string. And obviously many files > are binary and not UTF-8 and as such give bogus results. > > For example: > > >>> import cgi > >>> import io > >>> body = (b'--KQNTvuH-itP09uVKjjZiegh7\r\n' + > ... b'Content-Disposition: form-data; name=payload\r\n\r\n' + > ... b'\xff\xd8\xff\xe0\x00\x10JFIF') > >>> env = { > ... 'REQUEST_METHOD': 'POST', > ... 'CONTENT_TYPE': 'multipart/form-data; > boundary=KQNTvuH-itP09uVKjjZiegh7', > ... 'CONTENT_LENGTH': len(body), > ... } > >>> fs = cgi.FieldStorage(fp=io.BytesIO(body), environ=env) > >>> (fs['payload'].filename, fs['payload'].file.read()) > (None, '����\x00\x10JFIF') > > Browsers, and *most* HTTP libraries do include the `filename=` option for > file uploads, but I'm currently dealing with a client that doesn't (and > omitting the `filename` does seem to be valid according to the spec). > > Currently I'm using a pretty hacky workaround by subclassing > `FieldStorage` and replacing the relevant `Content-Disposition` header with > one that does have the filename: > > import cgi > import os > > class FileFieldStorage(cgi.FieldStorage): > """To use, subclass FileFieldStorage and override _file_fields > with a tuple > of the names of the file field(s). You can also override > _file_name with > the filename to add. > """ > > _file_fields = () > _file_name = 'file_name' > > def __init__(self, fp=None, headers=None, outerboundary=b'', > environ=os.environ, keep_blank_values=0, > strict_parsing=0, > limit=None, encoding='utf-8', errors='replace'): > > if self._file_fields and headers and > headers.get('content-disposition'): > content_disposition = headers['content-disposition'] > key, pdict = cgi.parse_header(content_disposition) > if (key == 'form-data' and pdict.get('name') in > self._file_fields and > 'filename' not in pdict): > del headers['content-disposition'] > quoted_file_name = self._file_name.replace('"', '\\"') > headers['content-disposition'] = '{}; > filename="{}"'.format( > content_disposition, quoted_file_name) > > super().__init__(fp=fp, headers=headers, > outerboundary=outerboundary, > environ=environ, > keep_blank_values=keep_blank_values, > strict_parsing=strict_parsing, limit=limit, > encoding=encoding, errors=errors) > > Using the `body` and `env` in my first test, this works now: > > >>> class TestFieldStorage(FileFieldStorage): > ... _file_fields = ('payload',) > >>> fs = TestFieldStorage(fp=io.BytesIO(body), environ=env) > >>> (fs['payload'].filename, fs['payload'].file.read()) > ('file_name', b'\xff\xd8\xff\xe0\x00\x10JFIF') > > Is there some way to avoid this hack and tell `FieldStorage` not to decode > as UTF-8? It would be nice if you could provide `encoding=None` or > something, but it doesn't look like it supports that. > > Thanks, > Ben. > > > [1] > https://stackoverflow.com/questions/42213318/cgi-fieldstorage-with-multipart-form-data-tries-to-decode-binary-file-as-utf-8-e > [2] https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1 > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/brett%40python.org >
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com