Randy Syring wrote:
> Hopefully you can clarify something for me. Lets assume that the
> client does not use '100 Continue' but sends data immediately, after
> sending the headers. If the server never reads the request content,
> what does that mean exactly? Does the data get transferred over the
> wire but then discarded or does the client not get to send the data
> until the server reads the request body? I.e. the client tries to
> "send" it, but the content isn't actually transferred across the
> wire until the server reads it. I am just wondering if there
> is a buffer or queue or something between the server and the client
> that allows data to be transferred even if the server doesn't
> "read" the request body. Or, is it just like a straight pipe
> where one end (the client) can't push data through until the other
> end (the server) reads it.
Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in this scenario. The input and the output are buffered separately, and both of those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the non-blocking I/O logic needed to prevent deadlocks. I heard (but did not verify) that mod_fastcgi does not have this deadlocking problem.

The sizes of the buffers determine the sizes of the inputs and outputs needed to cause a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default. Therefore, for maximum portability, a WSGI application should ALWAYS consume the *whole* request body if it wants to avoid the deadlock when running under the reference WSGI adapter in PEP 333 or under mod_wsgi. Other WSGI gateways probably have similar issues.

It would be nice if there were a standard entry in the WSGI environment (e.g. "wsgi.may_ignore_request_body") that could be used to safely detect when we can skip the request body. It would be even nicer if WSGI gateways were updated to avoid this problem. However, that is easier said than done.

If you know C, it is relatively simple to modify mod_wsgi to use a different Apache<->daemon communication protocol so that the daemon mode works as you would expect (no deadlocks, proper 100-continue support, request body isn't read unless your application asks for it). A long time ago I had a patch that did this (among other things) but I don't think I have it any more.

However, once you get to that point, you still run into problems. If your goal is to avoid reading the request body, then you need to close the connection in your error response; otherwise, if the request was an HTTP/1.1 request, the entire request body still has to be read in order to process any requests that follow it in the request pipeline.
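As a rough sketch of the "always consume the whole request body" advice above, an application could drain wsgi.input before sending an error response. The names drain_request_body and reject_upload are made up for illustration, and the sketch assumes the client sent a valid Content-Length (chunked request bodies are not handled):

```python
def drain_request_body(environ, chunk_size=8192):
    """Read and discard whatever remains of the request body.

    Returns the number of bytes discarded. Assumption: the body length
    is given by CONTENT_LENGTH; a missing or invalid value is treated
    as an empty body.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    discarded = 0
    while remaining > 0:
        # Never ask for more than the declared length; some gateways
        # block if you read past the end of the body.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:  # client closed the connection early
            break
        discarded += len(chunk)
        remaining -= len(chunk)
    return discarded


def reject_upload(environ, start_response):
    """Hypothetical error path: consume the body first, then respond."""
    drain_request_body(environ)
    body = b"Request body too large.\n"
    start_response("413 Request Entity Too Large", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

This only sidesteps the deadlock; the upload still crosses the wire in full, and it does nothing about closing the connection.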
Unfortunately, a WSGI application doesn't have any way of signaling that the connection is to be closed; the WSGI specification forbids the WSGI application from returning the Connection header since it is hop-by-hop. And, even if there were such a mechanism, a poorly-coded client is likely to still cause a deadlock if the server doesn't read its full request. Make sure you test with all your targeted browsers. Consequently...

> > If you are using daemon mode however,
> > then the request content would always be read by Apache child worker
> > process, even if client asked for '100 Continue' response. This is
> > because the Apache child worker process will always proxy request
> > content to the daemon process.
> >
> Thats good to know. I think at this point I have talked myself into
> thinking that there is no good reason to handle it at the application
> level, but would appreciate any further feedback you might have.

...if your users will often attempt to upload large files that exceed your limits, it is best to mitigate the problem on the client side. First, document the file size limit clearly on the page where the upload happens. Second, implement a Flash-based and/or Java-based file upload control that can be used when the user has Flash installed (fall back to the regular control otherwise). With such an uploader, you can check the file size on the client and prevent these requests from even being made (in the typical case). You will still have to implement the validation logic on the server to guard against malicious use and against clients with JavaScript/Flash/Java disabled.

There are additional benefits to this approach (better UI, multi-file selection, compression, encryption, doesn't waste the user's time, saves bandwidth) but it comes with all the drawbacks inherent in Flash/Java/JavaScript.
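For the server-side validation that remains necessary, a minimal sketch of a size check based on the declared Content-Length might look like the following. MAX_UPLOAD_BYTES, declared_length and upload_too_large are hypothetical names, and the 10 MiB limit is only a placeholder:

```python
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical limit: 10 MiB


def declared_length(environ):
    """Return the declared Content-Length as an int, or None if it is
    absent, malformed, or negative."""
    raw = environ.get("CONTENT_LENGTH", "")
    try:
        length = int(raw)
    except (TypeError, ValueError):
        return None
    return length if length >= 0 else None


def upload_too_large(environ, limit=MAX_UPLOAD_BYTES):
    """True when the client declares a body larger than the limit."""
    length = declared_length(environ)
    return length is not None and length > limit
```

A request without a usable Content-Length passes this check, so an application relying on it would also have to count bytes while reading the body. And given the deadlock issue described earlier, rejecting an oversized request does not by itself make it safe to leave the body unread.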
Regards,
Brian