Graham, I confess that it was I who brought up the idea of a wsgi.input iterator at the WSGI Open Space yesterday evening. :-) The discussion seemed to assume a file-like input object that a piece of middleware could read from and then "back up" or "rewind" before passing it down to the next layer. This seemed to have problems. It doesn't support the case where the middleware wants to alter the input, or pass it piecemeal down to the client as it comes in. And it means that the *entire* input stream has to be kept in memory for the lifetime of the whole request, in case the client reading it is not the "real client" at the bottom of the stack and some later layer will ask for the whole thing to be replayed.
So, I suggested placing the responsibility for rewinding and buffering on the middleware. You want to read 2k of the input to make a middleware decision before invoking the next layer down? Then read it, and pass along a fresh iterator that first yields that 2k and then starts yielding everything from the partially-read iterator. Or you can pass along a filter iterator that scans or changes the entire stream as it reads it from the upstream iterator.

But, having thought more about the idea, I think that your criticisms, Graham, are exactly on target. Iterators don't give the reader enough control over the chunks (lines or blocks) that get delivered as it reads. So at the very least we should indeed be looking at a file-like object; it's still easy to construct a file-like object that really streams from another file as the data comes in, and we could even provide shortcuts to build files from inline iterators or something.

And the idea that each piece of middleware does its *own* buffering might be a bad one too. One might naively store everything in RAM, another might put blocks on disk, another might run you out of /tmp space trying to do the same thing, even storing duplicates of the same data if we're not careful! The same 1MB initial block could wind up on disk two or three times if each piece of middleware thinks it's the one holding the cached copy to pass along to the bottom layer, which is reading 16k blocks at a time.

So what's left of my suggestion? I suggest that we *not* commit to unlimited rewinding of the input stream; that was my single real insight, and an uncontrollable iterator design gives up too much in order to achieve it. A file-like object is more appropriate, but we either need to make middleware do its own caching of partially-consumed data, *or* we need some way for middleware to signal whether it needs the older data kept.
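To make the two mechanisms above concrete, here is a minimal Python sketch: the middleware-side replay (read a little of the input, then hand the next layer an iterator that yields those bytes again before the rest of the stream), and a file-like object built on top of an inline iterator. It assumes the input arrives as an iterable of `bytes` chunks; the names `peek` and `IterFile` are hypothetical illustrations, not part of WSGI or of any framework.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def peek(chunks: Iterable[bytes], nbytes: int) -> Tuple[bytes, Iterator[bytes]]:
    """Hypothetical helper: consume whole chunks until at least `nbytes`
    have been read (or the stream ends), and return (head, replay).
    `replay` yields `head` again, then the rest of the original stream."""
    it = iter(chunks)
    consumed: List[bytes] = []
    total = 0
    while total < nbytes:
        try:
            chunk = next(it)
        except StopIteration:
            break
        consumed.append(chunk)
        total += len(chunk)
    head = b"".join(consumed)
    # Only the bytes actually read ahead are held; the remainder still
    # streams lazily from the partially-read iterator.
    replay = itertools.chain([head] if head else [], it)
    return head, replay


class IterFile:
    """Hypothetical read-only, file-like wrapper over an iterator of byte
    chunks. Only read() and readline() are sketched here."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._it = iter(chunks)
        self._buf = b""

    def _fill(self) -> bool:
        """Pull one non-empty chunk into the buffer; False at end of stream."""
        for chunk in self._it:
            if chunk:
                self._buf += chunk
                return True
        return False

    def read(self, size: int = -1) -> bytes:
        """Return up to `size` bytes, or everything that is left if size < 0."""
        if size < 0:
            while self._fill():
                pass
            data, self._buf = self._buf, b""
        else:
            while len(self._buf) < size and self._fill():
                pass
            data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self) -> bytes:
        """Return bytes up to and including the next newline (or to EOF)."""
        while b"\n" not in self._buf and self._fill():
            pass
        end = self._buf.find(b"\n")
        end = len(self._buf) if end < 0 else end + 1
        line, self._buf = self._buf[:end], self._buf[end:]
        return line


def upstream() -> Iterator[bytes]:
    # Stand-in for the server's input stream.
    yield b"POST body line 1\n"
    yield b"line 2\nline 3"


head, replay = peek(upstream(), 4)   # middleware inspects the start...
f = IterFile(replay)                 # ...and the next layer still sees it all
print(f.readline())
print(f.read(4))
print(f.read())
```

In this sketch `peek()` keeps only the chunks it actually consumed, so its memory use is bounded by how far the middleware reads ahead rather than by the size of the whole request. Note that it still illustrates the concern raised above: every piece of middleware that calls it does its own private buffering, with nothing coordinating those buffers.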
Could "input.bookmark()" signal that everything from this point on in the stream needs to be retained, in memory or on disk, so that it can be rewound to later, with the data released only after the bookmark is deleted?

    b = input.bookmark()
    input.read()
    ...
    input2 = b.file()
    del b

Or we could allow the "input" object to support cloning, where all data is cached from the clone that has read the least far to the one that has read the farthest:

    c = input.clone()
    input.read(100)
    # 100 bytes are now cached by the framework, in RAM or on disk or on
    # a USB keyfob or wherever this framework puts it. (Django will write
    # their own caching that's different from everyone else's.)
    c.read(100)
    # the bytes are released
    del c
    # Now that there's just one active clone, no buffering takes place.

That way one could "read ahead" on one's own input while passing the complete stream back down to the next level. This has the disadvantage that if a piece of middleware wants to keep the first 100MB and the last 100MB of a stream but throw out the middle, it has no way to do so without dropping back to its own caching scheme, which the framework can't coordinate with other schemes; but it seems to cover the majority of cases that I can think of.

Anyway: no unlimited caching, no unlimited rewind; that's my argument. Iterators were just one way of cleanly getting there, but probably, in the light of the next day, not a powerful enough way.

-- 
Brandon Craig Rhodes   bran...@rhodesmill.org   http://rhodesmill.org/brandon

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig