Have cc'd this over to the web-sig list in case anyone wants to shoot me down. :-)

On 30/03/07, Robert Brewer <[EMAIL PROTECTED]> wrote:
> > Robert, was doing some testing with CherryPy WSGI server and noted
> > that if read() is called with no arguments on wsgi.input that it just
> > seems to hang indefinitely. Is there a problem here or have I managed
> > to stuff up a very simple test? It works okay when I explicitly specify
> > content length.
>
> That's right. We simply hand the (blocking, makefiled) socket to the app
> as wsgi.input. PEP 333 says:
>
>     "The server is not required to read past the client's
>     specified Content-Length, and is allowed to simulate
>     an end-of-file condition if the application attempts
>     to read past that point. The application should not
>     attempt to read more data than is specified by the
>     CONTENT_LENGTH variable."
>
> We chose to not simulate the EOF, requiring app authors do that for
> themselves (mostly to give apps more flexibility). Note that the app
> side of CherryPy handles this for you by default. But since the spec
> clearly places the responsibility of checking content-length on the
> application side, it seemed redundant to perform the check both on the
> app side and the server side.

As I believe I have pointed out on the Python web-sig list before, the
statement:

    "The application should not attempt to read more data than is
    specified by the CONTENT_LENGTH variable."

is actually a bit bogus. This is because a WSGI middleware component or
web server could be acting as an input filter and decompressing a gzip
content encoding on the request. Since it knows the size will change but
cannot know what the new size will be, except by buffering it all, it by
rights should remove CONTENT_LENGTH. This presents a problem for an
application, as there is then no CONTENT_LENGTH to rely on to know whether
it has read too much input. If you leave CONTENT_LENGTH intact, the
application may think it has read everything when there is in fact more.

Also, with chunked transfer encoding you will not have CONTENT_LENGTH either.
I know you read it all in and buffer it so you can calculate it, but that
prevents streaming with chunked encoding, where content length may be based
on a series of end-to-end communications. Thus, an application should really
just ignore CONTENT_LENGTH and successively call read() in some way until
it returns an empty string. It can't really work reliably in any other way.

I believe that the WSGI adapter should be required (not just allowed) to
simulate EOF if it believes that no more input is available for that
request. For example, it knows at a low level that CONTENT_LENGTH was valid
because no filtering has occurred by that point, or, with chunked encoding,
that the null block has been sent. The adapter is generally the only place
where this can be known.

The only time that CONTENT_LENGTH may be of interest to an application is
if it is acting as a proxy to a downstream web server, as it then needs to
put it in the downstream request. If there is no CONTENT_LENGTH, or chunked
transfer encoding is in use, it would be forced to use chunked encoding for
the downstream request.

FWIW, the conclusion I have come to is that when read() is called with no
arguments, rather than attempting to read all the input in one go based on
some content length, the adapter should transparently insert its own size
argument underneath. This size would be based on some block size deemed
likely to give the best performance for the technology being used. Thus
read() with no arguments would always return potentially partial data and
not all data. This is valid because the semantics of read() for a file-like
object are that one should call it until it returns an empty string as the
EOF indicator.

The WSGI PEP is ambiguous in that respect, as it says wsgi.input is a
file-like object but then says you aren't supposed to read more than
CONTENT_LENGTH and that an adapter doesn't have to simulate EOF. One may
say that this overrides the file-like object properties, but the WSGI way
will not work all the time.
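
To make the proposal concrete, here is a rough sketch of what I mean, not what CherryPy or any existing adapter actually does. The names EOFSimulatingInput, read_body and BLOCK_SIZE are made up for illustration, the block size is an arbitrary guess, and only the case where the adapter has a trustworthy Content-Length is covered (chunked decoding is left out). In Python 3 terms the "empty string" is an empty bytes object.

```python
import io

BLOCK_SIZE = 8192  # hypothetical default; pick whatever suits the server


class EOFSimulatingInput:
    """Adapter-side wrapper: never reads past the request body and
    reports EOF with an empty result once the body is exhausted."""

    def __init__(self, stream, content_length, block_size=BLOCK_SIZE):
        self._stream = stream
        self._remaining = content_length
        self._block_size = block_size

    def read(self, size=None):
        # read() with no argument is transparently capped at one block,
        # so it may return partial data rather than the whole body.
        if size is None or size < 0:
            size = self._block_size
        size = min(size, self._remaining)
        if size <= 0:
            return b""  # simulated EOF: the socket is not touched again
        data = self._stream.read(size)
        if not data:
            self._remaining = 0  # client went away before sending it all
        else:
            self._remaining -= len(data)
        return data


def read_body(wsgi_input):
    """Application side: ignore CONTENT_LENGTH, read until EOF."""
    chunks = []
    while True:
        chunk = wsgi_input.read()
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)
```

With a wrapper like this in place, an application can loop on read() without ever blocking on a socket that has no more data for the current request, and anything following the body stays unread on the underlying stream.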

Graham