Re: [Web-SIG] Implementing File Upload Size Limits
2008/11/28 Robert Brewer [EMAIL PROTECTED]:

Brian Smith wrote:

Randy Syring wrote:

Hopefully you can clarify something for me. Let's assume that the client does not use '100 Continue' but sends data immediately after sending the headers. If the server never reads the request content, what does that mean exactly? Does the data get transferred over the wire but then discarded, or does the client not get to send the data until the server reads the request body? I.e. the client tries to send it, but the content isn't actually transferred across the wire until the server reads it. I am just wondering if there is a buffer or queue or something between the server and the client that allows data to be transferred even if the server doesn't read the request body. Or is it just like a straight pipe where one end (the client) can't push data through until the other end (the server) reads it?

Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in this scenario. The input and the output are buffered separately, and both of those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the non-blocking I/O logic needed to prevent deadlocks. I heard (but did not verify) that mod_fastcgi does not have this deadlocking problem. The sizes of the buffers determine the size of the inputs and outputs needed to cause a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default. Therefore, for maximum portability, a WSGI application should ALWAYS consume the *whole* request body if it wants to avoid the deadlock when using the reference WSGI adapter in PEP 333 or mod_wsgi.

Indeed.
This is covered in RFC 2616 Section 8.2.3:

If an origin server receives a request that does not include an Expect request-header field with the 100-continue expectation, the request includes a request body, and the server responds with a final status code before reading the entire request body from the transport connection, then the server SHOULD NOT close the transport connection until it has read the entire request, or until the client closes the connection. Otherwise, the client might not reliably receive the response message. However, this requirement is not to be construed as preventing a server from defending itself against denial-of-service attacks, or from badly broken client implementations.

CherryPy's wsgiserver will read any remaining request body (which the application hasn't read) before sending response headers.

A WSGI application could technically want to send response headers and only then read remaining request content. I don't believe there is anything in the WSGI specification which prevents that. If you are discarding the request content as soon as response headers are generated, that could technically be a problem for some use cases, even if they may be obscure.

I can't tell from looking at the latest CherryPy WSGI server code, as it has changed since I last looked at it and I haven't yet had time to grok it and run some tests, but previously, with respect to where the WSGI specification says:

The server is not required to read past the client's specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point.

the CherryPy WSGI server code chose NOT to simulate an end-of-file condition. This was the case because the amount of data read from wsgi.input was never tracked.
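To make the end-of-file simulation concrete, here is a minimal sketch of a wsgi.input wrapper that tracks how much has been read, so it can both return an empty result at Content-Length and discard whatever the application left unread. It is not CherryPy's code: the class name LimitedInput and the method discard_remaining are hypothetical, and it uses Python 3 bytes where the thread speaks of an empty string.

```python
import io


class LimitedInput:
    """Hypothetical file-like wrapper that never reads past Content-Length."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(content_length, 0)

    def read(self, size=-1):
        # Simulate end-of-file once the declared body has been consumed,
        # instead of blocking on the socket or touching a pipelined request.
        if self._remaining <= 0:
            return b""
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data

    def discard_remaining(self, chunk_size=8192):
        """Read and drop the unread part of the body; return the byte count."""
        discarded = 0
        while self._remaining > 0:
            data = self.read(chunk_size)
            if not data:  # the client closed the connection early
                break
            discarded += len(data)
        return discarded
```

Because the wrapper counts bytes, read() with no argument is safe: it stops at the declared length rather than running into the next request on the same connection.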
This meant that if the application did try to read more content than was available, and request pipelining was occurring, then the read would hang, as it would not get an empty string returned as would be normal for an end-of-file condition on a file-like object. If the code is still behaving this way, then it wouldn't be possible for it to discard remaining input, as how much was read wasn't tracked.

Looking at the latest code I do note the presence of a wrapper around the socket used for wsgi.input, but I haven't been able to work out yet whether it returns a traditional empty string as the end-of-file condition, or whether it is going to instead raise your MaxSizeExceeded exception and thus not be file-like in its behaviour. Can you perhaps explain what is going to happen when an attempt is made to read more content than was available, and whether it is actually going to raise an exception rather than just return an empty string as file-like objects would?

Personally I think that that part of the WSGI specification should be amended such that it is required that an end-of-file condition MUST be indicated using an empty string, just like with normal file-like objects. Just this one change would mean that one could call read() with no arguments and have it return all input, whereas at the moment the WSGI specification does not allow the argument to read() to be optional. This would actually negate the whole need for applications to even check/use CONTENT_LENGTH except for situations where it mattered such as 413 response or where
Re: [Web-SIG] Implementing File Upload Size Limits
Brian Smith wrote:

2008/11/26 Brian Smith [EMAIL PROTECTED]:

Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in this scenario.

It isn't 'many situations', it is a quite specific situation.

Right. I meant that it can happen quite often (every time) that situation occurs, depending on the characteristics of the application.

If you know C, it is relatively simple to modify mod_wsgi to use a different Apache-daemon communication protocol.

Depends on your definition of simple. It would be quite fiddly to do and get right, or one would have to rewrite a large amount of code. I wouldn't regard either as really that simple.

I did it by implementing the communication protocol that I had proposed on the mod_wsgi mailing list a while ago. It is straightforward to do, but it does take a lot of time to learn how mod_wsgi works in order to make the change, especially if you have never written an Apache module before.

- Brian

___
Web-SIG mailing list Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com
Re: [Web-SIG] Implementing File Upload Size Limits
I did find this:

http://wiki.pylonshq.com/display/pylonscookbook/A+Better+Way+To+Limit+File+Upload+Size

which was good, but it still leaves some unanswered questions:

* What if one is not using the paste http server?
* This method gives an unfriendly response. What would be the best method to propagate this error condition down to the app so that a message could be given to the user in the context of the form they had previously submitted (i.e. an error message under the input field reminding them of the max upload size and even possibly telling them how big the file was they uploaded)?

Thanks.

--
Randy Syring
RCS Computers Web Solutions
502-644-4776
http://www.rcs-comp.com

Whether, then, you eat or drink or whatever you do, do all to the glory of God. 1 Cor 10:31

Randy Syring wrote:

I am looking for opinions and thoughts on best practice for limiting file upload size. I have a few considerations:

* Ultimately, I would want my application with my method of handling forms to be able to give the user a message that the file size was too big. That means that, however the size is limited, just blanking out wsgi.input and setting content-length to zero doesn't seem correct. That would make it look like the form wasn't submitted with any data, I believe.
* Given the above, it seems that something would need to get put in the environment to tell middleware and the application that the file input was aborted, but what would be the best way of doing it? Should it be some kind of standard, or just dependent on your server or middleware?
* It seems best to implement this functionality as the very first middleware in the stack. Since other middleware read and manipulate wsgi.input, handling the upload size at the application level wouldn't prevent middleware from wasting resources dealing with a very large file. Is it possible to prevent the server from even accepting all the data (i.e. trying to save bandwidth and server resources) if the content-length is known to be too big? Or is the server required to take all the client's data regardless, even if it ends up going in the bit bucket?

I realize some of this is server specific, not WSGI specific, but I would be interested in knowing how the most popular servers handle this or what the HTTP specs require, if anyone knows.

Thanks in advance for any insight you might be able to provide.

--
Randy Syring
RCS Computers Web Solutions
502-644-4776
http://www.rcs-comp.com

Whether, then, you eat or drink or whatever you do, do all to the glory of God. 1 Cor 10:31
Re: [Web-SIG] Implementing File Upload Size Limits
2008/11/22 Randy Syring [EMAIL PROTECTED]:

I am looking for opinions and thoughts on best practice for limiting file upload size. I have a few considerations:

Ultimately, I would want my application with my method of handling forms to be able to give the user a message that the file size was too big. That means that, however the size is limited, just blanking out wsgi.input and setting content-length to zero doesn't seem correct. That would make it look like the form wasn't submitted with any data, I believe.

Given the above, it seems that something would need to get put in the environment to tell middleware and the application that the file input was aborted, but what would be the best way of doing it? Should it be some kind of standard, or just dependent on your server or middleware?

It seems best to implement this functionality as the very first middleware in the stack. Since other middleware read and manipulate wsgi.input, handling the upload size at the application level wouldn't prevent middleware from wasting resources dealing with a very large file. Is it possible to prevent the server from even accepting all the data (i.e. trying to save bandwidth and server resources) if the content-length is known to be too big? Or is the server required to take all the client's data regardless, even if it ends up going in the bit bucket?

I realize some of this is server specific, not WSGI specific, but I would be interested in knowing how the most popular servers handle this or what the HTTP specs require, if anyone knows.

Thanks in advance for any insight you might be able to provide.

If you use Apache/mod_wsgi to host your WSGI application, the best way of handling this is to use the Apache LimitRequestBody directive for the appropriate context. This will result in Apache returning a HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client.
If you need a custom error document for that response type, use the Apache ErrorDocument directive to specify the URL of a handler which would generate it.

Except for the custom error document, if delegated to the WSGI application, doing it this way results in it all being handled by Apache/mod_wsgi, and your WSGI application will not even be invoked. The request body content would also not even be read by Apache at all. Do note that whether this avoids the client sending the request body input depends on whether the client was expecting a '100 Continue' response before it sent the data. Most web browsers, I believe, still don't use the '100 Continue' response.

This would be the preferred solution for Apache/mod_wsgi, as it is handled at the lowest levels and it is guaranteed that request content wouldn't be read at that point. It is, however, taking control out of your application.

For Apache/mod_wsgi, if you do not do it this way but instead validate content length in the WSGI application and have the WSGI application return a HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then whether the request content gets read depends on whether you are using embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't read the input and just returns the error response, the request content wouldn't be read at all.

If you are using daemon mode, however, then the request content would always be read by the Apache child worker process, even if the client asked for a '100 Continue' response. This is because the Apache child worker process will always proxy request content to the daemon process.

Anyway, that is how things are for Apache/mod_wsgi.

Graham
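For the alternative described above, where the WSGI application itself validates the content length, a minimal sketch might look like the following. The limit MAX_BODY_BYTES and the response bodies are hypothetical, and treating a missing or malformed CONTENT_LENGTH as zero is an assumption made only to keep the example short. The 413 branch deliberately returns without touching wsgi.input; whether the server then reads the body depends on the server, as discussed above.

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical application-wide limit


def application(environ, start_response):
    # CONTENT_LENGTH may be absent, empty or malformed; this sketch
    # treats all of those cases as "no body".
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0

    if length > MAX_BODY_BYTES:
        # Reject before reading anything from wsgi.input.
        body = b"Upload too large.\n"
        start_response("413 Request Entity Too Large", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    # Within the limit: read exactly the declared number of bytes.
    data = environ["wsgi.input"].read(length)
    body = ("Received %d bytes.\n" % len(data)).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```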
Re: [Web-SIG] Implementing File Upload Size Limits
[forgot to copy list]

Graham Dumpleton wrote:

2008/11/22 Randy Syring [EMAIL PROTECTED]:

I am looking for opinions and thoughts on best practice for limiting file upload size. I have a few considerations:

snip

If you use Apache/mod_wsgi to host your WSGI application, the best way of handling this is to use the Apache LimitRequestBody directive for the appropriate context. This will result in Apache returning a HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If you need a custom error document for that response type, use the Apache ErrorDocument directive to specify the URL of a handler which would generate it.

Graham,

Thank you for your response. What you noted above does seem to be the lowest level solution possible if you are using Apache. I suppose using an error document that is part of the application would at least allow me to serve a specific page from my application that could detail the error. If I wanted to get fancy, each time a form with a file input element was sent to a user, I could save that path in a special variable in the user's session. My error page could then look for that value in the user session and, if present, load the correct form, giving the user an error message noting that the file uploaded was too big. The downfall of that approach is that the form comes back empty. It might be better to just have the error page give them some details and encourage them to use the back button, in which case the form's fields would hopefully still be filled in.

Except for the custom error document, if delegated to the WSGI application, doing it this way results in it all being handled by Apache/mod_wsgi, and your WSGI application will not even be invoked. The request body content would also not even be read by Apache at all. Do note that whether this avoids the client sending the request body input depends on whether the client was expecting a '100 Continue' response before it sent the data.
Most web browsers, I believe, still don't use the '100 Continue' response. This would be the preferred solution for Apache/mod_wsgi, as it is handled at the lowest levels and it is guaranteed that request content wouldn't be read at that point. It is, however, taking control out of your application.

Hopefully you can clarify something for me. Let's assume that the client does not use '100 Continue' but sends data immediately after sending the headers. If the server never reads the request content, what does that mean exactly? Does the data get transferred over the wire but then discarded, or does the client not get to send the data until the server reads the request body? I.e. the client tries to send it, but the content isn't actually transferred across the wire until the server reads it. I am just wondering if there is a buffer or queue or something between the server and the client that allows data to be transferred even if the server doesn't read the request body. Or is it just like a straight pipe where one end (the client) can't push data through until the other end (the server) reads it?

I agree that it does take control out of the application. From a usability perspective, the best solution IMO would be for the user to get the form back and have a red error message under the input field indicating the file size uploaded was too big and giving them the max file size allowed. However, on second thought, that may not be true. As noted above, because the entire request body was rejected, the form loaded would have none of the information they submitted, and most users would probably think they have to fill out the whole form again. Probably better to just give them a non-form error page and let them use the back button (or even provide a link that uses javascript to go back) and in so doing hopefully salvage the time they put into the form.

I suppose, though, that two different kinds of file size limits need to be thought through.
The first limit would be an application-wide limit that is set for security/resource reasons. That, I believe, is what we have been discussing up to this point. I am just realizing that it would also be fine to limit upload sizes at the application level and give more user-friendly error messages. So I might decide on a 10MB application-wide upload limit, but I might also restrict free accounts and paid accounts to 256k and 5MB respectively. As long as a user uploads something less than 10MB, they get a friendly in-line error message. If they upload over 10MB, we handle that at the Apache level and send them to a custom error page.

For Apache/mod_wsgi, if you do not do it this way but instead validate content length in the WSGI application and have the WSGI application return a HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then whether the request content gets read depends on whether you are using embedded mode or daemon mode of mod_wsgi. If you use embedded mode, so long as your WSGI application doesn't read the input and just returns the error
[Web-SIG] Implementing File Upload Size Limits
I am looking for opinions and thoughts on best practice for limiting file upload size. I have a few considerations:

* Ultimately, I would want my application with my method of handling forms to be able to give the user a message that the file size was too big. That means that, however the size is limited, just blanking out wsgi.input and setting content-length to zero doesn't seem correct. That would make it look like the form wasn't submitted with any data, I believe.
* Given the above, it seems that something would need to get put in the environment to tell middleware and the application that the file input was aborted, but what would be the best way of doing it? Should it be some kind of standard, or just dependent on your server or middleware?
* It seems best to implement this functionality as the very first middleware in the stack. Since other middleware read and manipulate wsgi.input, handling the upload size at the application level wouldn't prevent middleware from wasting resources dealing with a very large file. Is it possible to prevent the server from even accepting all the data (i.e. trying to save bandwidth and server resources) if the content-length is known to be too big? Or is the server required to take all the client's data regardless, even if it ends up going in the bit bucket?

I realize some of this is server specific, not WSGI specific, but I would be interested in knowing how the most popular servers handle this or what the HTTP specs require, if anyone knows.

Thanks in advance for any insight you might be able to provide.

--
Randy Syring
RCS Computers Web Solutions
502-644-4776
http://www.rcs-comp.com

Whether, then, you eat or drink or whatever you do, do all to the glory of God. 1 Cor 10:31
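One way the "first middleware in the stack" idea could be sketched is shown below. Everything named here is an assumption for illustration: the class UploadLimitMiddleware and the environ key "upload_limit.exceeded" are hypothetical, not any kind of standard. The sketch only inspects CONTENT_LENGTH; it blanks wsgi.input but also sets a flag, so the application can tell an aborted upload apart from an empty form. It does not decide what happens to the unread body on the wire, which stays server specific.

```python
import io


class UploadLimitMiddleware:
    """Hypothetical outermost middleware that flags oversized request bodies."""

    FLAG_KEY = "upload_limit.exceeded"  # hypothetical environ key, not a standard

    def __init__(self, app, max_bytes):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        # A missing or malformed CONTENT_LENGTH is treated as zero here.
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0

        if length > self.max_bytes:
            # Record the declared size so the application can build a
            # friendly message, then hide the body from inner middleware.
            environ[self.FLAG_KEY] = length
            environ["wsgi.input"] = io.BytesIO(b"")
            environ["CONTENT_LENGTH"] = "0"

        return self.app(environ, start_response)
```

An application wrapped as UploadLimitMiddleware(app, max_bytes=10 * 1024 * 1024) could then check environ.get("upload_limit.exceeded") and render its own error page that mentions both the limit and the declared size.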