Re: [Python-Dev] py3k, cgi, email, and form-data

MRAB Mon, 11 May 2009 13:19:10 -0700

Robert Brewer wrote:

There's a major change in functionality in the cgi module between Python
2 and Python 3 which I've just run across: the behavior of
FieldStorage.read_multi, specifically when an HTTP app accepts a file
upload within a multipart/form-data payload.


In Python 2, each part would be read in sequence within its own
FieldStorage instance. This allowed file uploads to be shunted to a
TemporaryFile (via make_file) as needed:

    klass = self.FieldStorageClass or self.__class__
    part = klass(self.fp, {}, ib,
                 environ, keep_blank_values, strict_parsing)
    # Throw first part away
    while not part.done:
        headers = rfc822.Message(self.fp)
        part = klass(self.fp, headers, ib,
                     environ, keep_blank_values, strict_parsing)
        self.list.append(part)
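
The point of the Python 2 loop is that an upload's bytes can be copied to disk in bounded pieces instead of being held in memory. Here is a minimal sketch of that spooling idea, assuming the part's data is already available as a binary stream. It uses tempfile.TemporaryFile and shutil.copyfileobj rather than the cgi module's own make_file(). It also skips the boundary detection that the real read_multi code has to do. spool_to_tempfile and CHUNK_SIZE are hypothetical names.

```python
import io
import shutil
import tempfile

CHUNK_SIZE = 8192  # hypothetical chunk size; any modest value works


def spool_to_tempfile(src, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream into an anonymous temporary file.

    Only one chunk is held in memory at a time, so the size of the
    upload is bounded by disk space rather than RAM. (Boundary
    detection, which cgi performs line by line, is not modelled here.)
    """
    tmp = tempfile.TemporaryFile()  # opened in 'w+b' mode by default
    shutil.copyfileobj(src, tmp, chunk_size)
    tmp.seek(0)  # rewind so the caller can read the data back
    return tmp


# Usage: an in-memory stream stands in for one uploaded file part.
upload = io.BytesIO(b"x" * 100_000)
with spool_to_tempfile(upload) as f:
    data = f.read()
```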

In Python 3 (svn revision 72466), the whole request body is read into
memory first via fp.read(), and then broken into separate parts in a
second step:

    klass = self.FieldStorageClass or self.__class__
    parser = email.parser.FeedParser()
    # Create bogus content-type header for proper multipart parsing
    parser.feed('Content-Type: %s; boundary=%s\r\n\r\n' % (self.type, ib))
    parser.feed(self.fp.read())
    full_msg = parser.close()
    # Get subparts
    msgs = full_msg.get_payload()
    for msg in msgs:
        fp = StringIO(msg.get_payload())
        part = klass(fp, msg, ib, environ, keep_blank_values,
                     strict_parsing)
        self.list.append(part)
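
The two-step behavior can be reproduced with the public email API. This is a minimal sketch, assuming a str body and a known boundary. It feeds a synthetic Content-Type header and then the complete body to FeedParser, which is the same trick the quoted code uses. split_form_data and the sample body are hypothetical. The key limitation is that the whole request body must already be in memory before parsing starts.

```python
from email.parser import FeedParser


def split_form_data(body, boundary):
    """Parse a complete multipart/form-data body into its parts.

    Like the Python 3 cgi code, this needs the entire body in memory
    (the `body` string) before any part can be examined.
    """
    parser = FeedParser()
    # Synthetic header so the parser knows the body is multipart
    # and which boundary separates the parts.
    parser.feed('Content-Type: multipart/form-data; boundary=%s\r\n\r\n'
                % boundary)
    parser.feed(body)  # the entire request body in one call
    return parser.close().get_payload()  # list of Message objects


body = (
    '--XyZ\r\n'
    'Content-Disposition: form-data; name="comment"\r\n'
    '\r\n'
    'hello\r\n'
    '--XyZ\r\n'
    'Content-Disposition: form-data; name="upload"; filename="a.txt"\r\n'
    'Content-Type: text/plain\r\n'
    '\r\n'
    'file contents\r\n'
    '--XyZ--\r\n'
)
parts = split_form_data(body, 'XyZ')
for part in parts:
    print(part.get_param('name', header='content-disposition'),
          repr(part.get_payload()))
```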

This makes the cgi module in Python 3 somewhat crippled for handling
multipart/form-data file uploads of any significant size. Since the
client determines that size, it also opens the server up to an
unexpected denial-of-service vector.

I *think* the FeedParser is designed to accept incremental writes,
but I haven't yet found a way to do any kind of incremental reads
from it in order to shunt the fp.read out to a tempfile again.
I'm secretly hoping Barry has a one-liner fix for this. ;)

I think what it needs is for the email.parser.FeedParser class to have
a feed_from_file() method, supported by the class BufferedSubFile.
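
A function-level approximation of the proposed method can be written against the public feed() API alone. This is a minimal sketch, assuming a text-mode file object. feed_from_file as a free function and MAX_DATA_SIZE are hypothetical names borrowed from this proposal and are not part of the email package. It bounds only the size of each read() call. The resulting Message objects still hold every payload as an in-memory string, so on its own it does not restore the Python 2 tempfile behavior.

```python
import io
from email.parser import FeedParser

MAX_DATA_SIZE = 8192  # hypothetical chunk size for each read()


def feed_from_file(parser, data_file, chunk_size=MAX_DATA_SIZE):
    """Feed `parser` from a text file object in bounded chunks.

    A stand-in for the proposed FeedParser.feed_from_file() method,
    using only the public feed() API. FeedParser copes with lines
    (including '\r\n' pairs) that are split across chunk boundaries.
    """
    while True:
        data = data_file.read(chunk_size)
        if not data:  # empty string means end of file
            break
        parser.feed(data)


body = ('--XyZ\r\n'
        'Content-Disposition: form-data; name="upload"; filename="a.txt"\r\n'
        '\r\n'
        + 'x' * 50_000
        + '\r\n--XyZ--\r\n')

parser = FeedParser()
parser.feed('Content-Type: multipart/form-data; boundary=XyZ\r\n\r\n')
feed_from_file(parser, io.StringIO(body), chunk_size=1024)
upload = parser.close().get_payload()[0]
```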

The BufferedSubFile class keeps an internal list of lines. Perhaps it
could also keep a list of files, so that when the list of lines becomes
empty it continues by reading lines from those files instead, dropping
each file from the list once it reaches end-of-file. Something like this:

[Module feedparser.py]
...
class BufferedSubFile(object):
...
    def __init__(self):
        # The last partial line pushed into this object.
        self._partial = ''
        # The list of full, pushed lines, in reverse order
        self._lines = []
        # The list of files.
        self._files = []
        ...

    ...
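    def readline(self):
        # Refill the line list from the queued files, MAX_DATA_SIZE
        # characters at a time, dropping each file once it is exhausted.
        while not self._lines and self._files: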
            data = self._files[0].read(MAX_DATA_SIZE)
            if data:
                self.push(data)
            else:
                del self._files[0]
        if not self._lines:
            if self._closed:
                return ''
            return NeedMoreData
        ...

    def push_file(self, data_file):
        """Push some new data from a file into this object."""
        self._files.append(data_file)

    ...


and then:

...
class FeedParser:
    ...
    def feed(self, data):
        """Push more data into the parser."""
        self._input.push(data)
        self._call_parse()

    def feed_from_file(self, data_file):
        """Push more data from a file into the parser."""
        self._input.push_file(data_file)
        self._call_parse()

    ...
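
To check that the queue-of-files idea behaves as described, here is a self-contained toy model of it. It does not patch the real email.feedparser.BufferedSubFile, whose internals are private. QueuedLineBuffer, its chunk_size parameter and the local NeedMoreData sentinel are hypothetical stand-ins. The model also handles one detail the sketch leaves implicit. A final line without a trailing newline is held back as a partial line, and it is flushed once the buffer is closed and all files are drained.

```python
import io

MAX_DATA_SIZE = 8192  # hypothetical chunk size, as in the proposal

NeedMoreData = object()  # local sentinel standing in for feedparser's


class QueuedLineBuffer:
    """Toy model of the proposed BufferedSubFile change (hypothetical).

    Lines pushed directly are served first; once they run out,
    readline() refills the buffer from queued files, one chunk at a time.
    """

    def __init__(self, chunk_size=MAX_DATA_SIZE):
        self._chunk_size = chunk_size
        self._partial = ''   # trailing text not yet ended by a newline
        self._lines = []     # complete lines, in reverse order
        self._files = []     # files still to be drained, in order
        self._closed = False

    def push(self, data):
        """Split data into lines; keep an unterminated tail as partial."""
        parts = (self._partial + data).splitlines(True)
        self._partial = ''
        if parts and not parts[-1].endswith('\n'):
            self._partial = parts.pop()
        # Newer lines go at the front, because readline() pops the end.
        self._lines[:0] = reversed(parts)

    def push_file(self, data_file):
        """Queue a file to be read lazily once pushed lines run out."""
        self._files.append(data_file)

    def close(self):
        """Mark the input as finished; no more push() calls expected."""
        self._closed = True

    def readline(self):
        # Refill from queued files until a complete line is available
        # or every file has been exhausted.
        while not self._lines and self._files:
            data = self._files[0].read(self._chunk_size)
            if data:
                self.push(data)
            else:
                del self._files[0]  # end of this file; move to the next
        # At true end of input, release a final newline-less line.
        if not self._lines and self._closed and self._partial:
            self._lines.append(self._partial)
            self._partial = ''
        if not self._lines:
            if self._closed:
                return ''
            return NeedMoreData
        return self._lines.pop()


buf = QueuedLineBuffer()
buf.push('first line\n')
buf.push_file(io.StringIO('a\nb\nc'))  # last line has no newline
buf.close()
lines = list(iter(buf.readline, ''))  # read until the '' end marker
```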
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
