Graham Dumpleton wrote:
> Can the group mind provide some clarification on the following please.
>
> 1. The WSGI specification does not require that a WSGI
> adapter provide an EOF indicator if an attempt is made to
> read more data from wsgi.input than defined by request
> Content-Length.

This is not a problem when the Content-Length header is provided in the
request, because the application should never read more than
<Content-Length> bytes. RFC 2616 says "The presence of a message-body in
a request is signaled by the inclusion of a Content-Length or
Transfer-Encoding header field in the request's message-headers." If
those headers are missing, then the application has to assume there is
no message body, and the WSGI gateway is free to dispose of any message
body it can detect.

I do agree that the handling of chunked request bodies is not ideal; the
current wording implies that the gateway must buffer the entire chunked
request body until it can calculate the Content-Length, before calling
the application object. This pretty much defeats the purpose of chunked
encoding. On the other hand, it is a pretty minor issue because chunked
request bodies are very rare.

> Is though a WSGI adapter required to
> explicitly discard any request content which wasn't consumed
> or is the WSGI applications responsibility to ensure that all
> request content up to the length specified is always consumed?

Given the existing body of applications that ignore extraneous message
bodies, it only makes sense to put the burden on the gateway. In
particular, a request entity is allowed syntactically on a GET request,
but any such entity must not affect the semantics of the request--that
is, an application should always ignore it. I have also never seen a
WSGI application that attempts to consume request entities on a GET
request. It is pretty common to ignore the request entities on PUT and
POST requests too (e.g. for conditional requests).

> I have seen some reports to suggest that some WSGI
> adapter/servers do not discard unread content up to
> Content-Length, resulting in the problem that if Keep-Alive
> was enabled that the server may incorrectly try and interpret
> the remaining content as the header of the next request on
> that same socket connection.

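For concreteness, the kind of gateway-side draining under discussion
might look roughly like the sketch below. It is only an illustration
under stated assumptions: drain_request_body and its parameters are
hypothetical names, not anything defined by PEP 333, and a real gateway
would track how much the application actually read by wrapping
wsgi.input.

```python
import io


def drain_request_body(stream, content_length, already_read, chunk_size=8192):
    """Read and discard the part of the request body the application left unread.

    Hypothetical helper (not part of PEP 333): `stream` is the raw
    connection input, `content_length` is the declared body length and
    `already_read` is how many body bytes the application consumed.
    Returns the number of bytes discarded.
    """
    remaining = max(content_length - already_read, 0)
    discarded = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # client closed early; nothing more to skip
            break
        discarded += len(chunk)
        remaining -= len(chunk)
    return discarded


# Two requests on one keep-alive "connection"; the application ignored
# the 11-byte body of the first one.
conn = io.BytesIO(b"hello=world" + b"GET /next HTTP/1.1\r\n\r\n")
drain_request_body(conn, content_length=11, already_read=0)
next_request_line = conn.readline()  # now starts at the second request
```

Without the drain step, the gateway would try to parse "hello=world..."
as the next request line, which is exactly the failure described in the
reports.
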
If the WSGI gateway cannot detect the end of one request and the start
of the next one, regardless of what the application does, then it is
faulty. That is the primary reason that RFC 2616 requires Content-Length
or Transfer-Encoding headers on messages with entity bodies. The WSGI
spec could be more explicit, but I don't think anybody is going to stand
up and say "I refuse to parse requests correctly because PEP 333 doesn't
explicitly require me to." I think we just need to report these bugs to
the gateway authors and let (help) them fix them.

> 2. If a WSGI application sets a Content-Length in a response
> and then returns request content of a greater length, should
> the WSGI adapter attempt to discard any additional output
> beyond the length set by the application or just pass it
> through? What obligations do WSGI middleware have in this respect?
>
> If the answer is that the WSGI adapter shouldn't care and
> should just pass everything through, then would it be seen as
> at least prudent that the WSGI adapter log a warning message
> that the returned response content differs in length to the
> specified Content-Length? Same applies where a WSGI
> application finished successfully but didn't return as much
> output as it said it was going to.

If the application wants well-defined behavior, then it should always
ensure that it sends a response body that is exactly <Content-Length>
bytes long. That is because all the front-end web servers, proxy
servers, and client applications that process the response depend on the
response being compliant with RFC 2616. When the Content-Length header
is wrong, the results are unpredictable, regardless of what the WSGI
gateway tries to do.

When you have to choose between being compliant with RFC 2616 or being
compliant with PEP 333, always choose RFC 2616. Consequently, the server
is free to do whatever it wants when the Content-Length is wrong: it can
truncate overly long entities, or drop the connection entirely.
Such results are likely to occur somewhere along the way to the client
anyway. The application shouldn't expect a successful or even consistent
result. (Note that when I say "the Content-Length is wrong" I am not
referring to the case where the application does not include a
Content-Length header at all.)

> 3. Similarly, where a WSGI adapter supports wsgi.file_wrapper
> and the Content-Length header was set in the response, should
> the WSGI adapter send only at most that amount of data? This
> question applies whether or not the WSGI adapter is able to
> optimise the sending of the response because of the presence
> of fileno() or other platform specific feature which would
> facilitate such optimisations.

The specification is clear about this: "The semantics [...] should be
the same as if the application had returned iter(filelike.read, ''). In
other words, transmission should begin at the current position within
the "file" at the time that transmission begins, and continue until the
end is reached." However, I think this is truly an error in the
specification--the gateway should not be required to send more than
<Content-Length> bytes if the application set the Content-Length header.
Really, this is just a special case of the situation described above,
where the application is trying to send a larger (or smaller) body than
it claimed in the Content-Length header. Again, when you have to choose
between being compliant with RFC 2616 or being compliant with PEP 333,
always choose RFC 2616.

> 4. Where a WSGI adapter supports wsgi.file_wrapper and the
> Content-Length header was NOT set in the response, where
> optimisations are being performed and the WSGI adapter can
> (or must in order to send it) calculate the length of the
> output, can the WSGI adapter add its own Content-Length
> header indicating the actual amount of response content sent.

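To make the scenario in point 4 concrete, here is a minimal sketch of a
gateway computing the length itself. The helper name add_content_length
is hypothetical, and the sketch assumes the wrapped object is a regular
file with working fileno() and tell() methods; pipes, sockets and other
file-likes would need different handling.

```python
import os
import tempfile


def add_content_length(headers, filelike):
    """Return a header list with Content-Length added if the app omitted it.

    Hypothetical gateway helper; assumes `filelike` is a regular file
    supporting fileno() and tell().
    """
    if any(name.lower() == "content-length" for name, _ in headers):
        return list(headers)  # the application's own value is left alone
    size = os.fstat(filelike.fileno()).st_size
    # Transmission starts at the current position, so count from there.
    remaining = max(size - filelike.tell(), 0)
    return list(headers) + [("Content-Length", str(remaining))]


with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    f.seek(4)  # the application already positioned the file
    result = add_content_length([("Content-Type", "text/plain")], f)
```

Here the file holds 10 bytes and the current position is 4, so the
header added is ("Content-Length", "6").
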
PEP 333 already clearly states that the WSGI gateway can add a
Content-Length header whenever it wants to, if the application didn't
supply one: "[...T]he server or gateway may be able to either generate a
Content-Length header, or at least avoid the need to close the client
connection."

I do think that it is a good idea to include these clarifications in (an
addendum to) the WSGI spec, as these are all issues that are often
overlooked in implementations.

- Brian
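
P.S. As a rough illustration of the "truncate overly long entities"
option from points 2 and 3, a gateway could wrap the response iterable
along these lines. This is a sketch, not a recommendation: the name
limit_to_content_length is hypothetical, truncation is only one of the
behaviours a server is free to choose, and raising on a short body is
just one way to signal that the connection should be dropped.

```python
def limit_to_content_length(body, content_length):
    """Yield at most `content_length` bytes from a WSGI response iterable.

    Hypothetical helper. Extra output is silently discarded; a body that
    ends early raises RuntimeError so the caller can close the
    connection instead of leaving the client waiting for missing bytes.
    """
    remaining = content_length
    for chunk in body:
        if remaining <= 0:
            break  # everything past the declared length is dropped
        chunk = chunk[:remaining]
        remaining -= len(chunk)
        yield chunk
    if remaining > 0:
        raise RuntimeError("response body shorter than Content-Length")


# The application declared 8 bytes but produced 14.
truncated = list(limit_to_content_length([b"hello ", b"world", b"!!!"], 8))
```

The same wrapper works for a wsgi.file_wrapper result that is iterated
in blocks, since it only looks at the byte strings that come out.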