2008/11/28 Robert Brewer <[EMAIL PROTECTED]>:
> Brian Smith wrote:
>> Randy Syring wrote:
>> > Hopefully you can clarify something for me. Let's assume that the client does not use '100 Continue' but sends data immediately after sending the headers. If the server never reads the request content, what does that mean exactly? Does the data get transferred over the wire but then discarded, or does the client not get to send the data until the server reads the request body? I.e. the client tries to "send" it, but the content isn't actually transferred across the wire until the server reads it. I am just wondering if there is a buffer or queue or something between the server and the client that allows data to be transferred even if the server doesn't "read" the request body. Or, is it just like a straight pipe where one end (the client) can't push data through until the other end (the server) reads it?
>>
>> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in this scenario. The input and the output are buffered separately, and both of those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the non-blocking I/O logic needed to prevent deadlocks. I heard (but did not verify) that mod_fastcgi does not have this deadlocking problem. The sizes of the buffers determine the size of the inputs and outputs needed to cause a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.
>>
>> Therefore, for maximum portability, a WSGI application should ALWAYS consume the *whole* request body if it wants to avoid the deadlock using the reference WSGI adapter in PEP 333 or mod_wsgi.
>
> Indeed. This is covered in RFC 2616 Section 8.2.3:
>
>     If an origin server receives a request that does not include an
>     Expect request-header field with the "100-continue" expectation,
>     the request includes a request body, and the server responds
>     with a final status code before reading the entire request body
>     from the transport connection, then the server SHOULD NOT close
>     the transport connection until it has read the entire request,
>     or until the client closes the connection. Otherwise, the client
>     might not reliably receive the response message. However, this
>     requirement is not be construed as preventing a server from
>     defending itself against denial-of-service attacks, or from
>     badly broken client implementations.
>
> CherryPy's wsgiserver will read any remaining request body (which the application hasn't read) before sending response headers.
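A minimal sketch of an application that follows Brian's advice and consumes the whole body before responding, assuming a PEP 333-style environ in which `CONTENT_LENGTH` is either absent or a decimal string. `read_request_body` is a hypothetical helper, not part of any server or of the WSGI specification; it tracks how many bytes remain so that it never asks `wsgi.input` for data past the declared length, which matters on servers that do not simulate end-of-file.

```python
import io


def read_request_body(environ, chunk_size=8192):
    """Hypothetical helper: read the whole request body, never past CONTENT_LENGTH."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0  # malformed header: treat as no body
    stream = environ["wsgi.input"]
    chunks = []
    while remaining > 0:
        # Short-read the last chunk so we never block waiting for bytes
        # that belong to the next (pipelined) request.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # client closed the connection early
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def application(environ, start_response):
    # Drain the entire request body before generating any response,
    # so neither the input nor the output buffer can stall the other.
    body = read_request_body(environ)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"received %d bytes\n" % len(body)]
```

The explicit byte counting is the bookkeeping that a server-side end-of-file simulation would make unnecessary: the application has to compute `min(chunk_size, remaining)` itself because it cannot rely on `read()` returning an empty string at the end of the body.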
A WSGI application could technically want to send response headers and only then read the remaining request content. I don't believe there is anything in the WSGI specification which prevents that. If you are discarding the request content as soon as response headers are generated, that could be a problem for some use cases, even if they are obscure.

I can't tell from looking at the latest CherryPy WSGI server code, as it has changed since I last looked at it and I haven't yet had time to grok it and run some tests. Previously, though, in respect of where the WSGI specification says:

"""The server is not required to read past the client's specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point."""

the CherryPy WSGI server code chose NOT to simulate an end-of-file condition, because the amount of data read from wsgi.input was never tracked. This meant that if an application did try to read more content than was available and request pipelining was occurring, the read would hang, as it would never get back the empty string that is the normal end-of-file indication for a file-like object. If the code still behaves this way, then it wouldn't be possible for it to discard the remaining input, since how much had been read wasn't tracked.

Looking at the latest code, I do note the presence of a wrapper around the socket used for wsgi.input, but I haven't yet been able to work out whether it returns a traditional empty string as the end-of-file condition, or whether it will instead raise your MaxSizeExceeded exception and thus not be file-like in its behaviour. Can you perhaps explain what is going to happen when an attempt is made to read more content than is available, and whether it will actually raise an exception rather than just return an empty string as file-like objects would?
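The end-of-file simulation the specification permits needs little more than a byte counter. Below is a minimal sketch of a hypothetical `LimitedInput` wrapper (an illustration only, not CherryPy's actual wrapper, whose behaviour is exactly what is being asked about here). It assumes Python 3, where the empty string is `b""`, and an underlying stream offering file-like `read(size)` and `readline(size)`. Once `CONTENT_LENGTH` bytes have been consumed it returns `b""` rather than blocking or raising, and because it knows how much was read it can also discard whatever the application left unread.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper that simulates end-of-file at Content-Length."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = max(0, length)  # bytes of this request not yet read

    def _clamp(self, size):
        # Never ask the socket for bytes beyond this request's body.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # simulated end-of-file, as a file object would give
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        if self._remaining <= 0:
            return b""
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def drain(self, chunk_size=8192):
        """Discard unread body bytes; stops at the limit or on premature EOF."""
        while self.read(chunk_size):
            pass
```

With such a wrapper, `read()` with no argument returns the rest of the body, and a caller can loop on `read(8192)` until it gets `b""` without consulting `CONTENT_LENGTH`. One design option this enables is for the server to call `drain()` only after the application's response iterable has been exhausted, which leaves room for an application that reads input after sending headers.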
Personally, I think that part of the WSGI specification should be amended so that an end-of-file condition MUST be indicated using an empty string, just as with normal file-like objects. Just this one change would mean that one could call read() with no arguments and have it return all input, whereas at the moment the WSGI specification does not allow the argument to read() to be optional. This would actually negate the whole need for applications to even check/use CONTENT_LENGTH, except in situations where it matters, such as a 413 response or where how the application decides to process the content depends on its size.

That is, to get all request content you would just call read() with no argument. If you wanted to process it in chunks, you would just loop reading a set chunk size until an empty string is returned, and you wouldn't need to track how much had been read and short-read the last chunk. If applications worked this way, then one could handle mutating input filters that change the amount of request content, such as decompression of data, and could also handle chunked transfer encoding on request content in a reasonable way, without having to read it all in and buffer it just to work out CONTENT_LENGTH.

Up till now, the only major WSGI server (ignoring wsgiref perhaps) I knew of which didn't allow read() with no argument, or which didn't simulate end-of-file by returning an empty string, was the CherryPy WSGI server. Its code has now been changed, but I'm not sure whether it still does that or whether it has done something totally different from everything else by raising an exception instead.

Graham

_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com