Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-27 Thread Graham Dumpleton
2008/11/28 Robert Brewer [EMAIL PROTECTED]:
 Brian Smith wrote:
 Randy Syring wrote:
  Hopefully you can clarify something for me.  Let's assume that the
  client does not use '100 Continue' but sends data immediately, after
  sending the headers.  If the server never reads the request content,
  what does that mean exactly?  Does the data get transferred over the
  wire but then discarded or does the client not get to send the data
  until the server reads the request body?  I.e. the client tries to
  send it, but the content isn't actually transferred across the
  wire until the server reads it.  I am just wondering if there
  is a buffer or queue or something between the server and the client
  that allows data to be transferred even if the server doesn't
  read the request body.  Or, is it just like a straight pipe
  where one end (the client) can't push data through until the other
  end (the server) reads it.

 Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
 this scenario. The input and the output are buffered separately, and both of
 those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the
 non-blocking I/O logic needed to prevent deadlocks. I heard (but did not
 verify) that mod_fastcgi does not have this deadlocking problem. The sizes
 of the buffers determine the size of the inputs and outputs needed to cause
 a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.
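
To make the buffering point concrete, here is a rough sketch using a local
socket pair rather than a real HTTP connection. The peer never reads, yet the
kernel still accepts a bounded amount of data before the sender would block.
The function name and its parameters are made up for illustration, and the
number printed depends entirely on the platform's buffer sizes.

```python
import socket


def bytes_accepted_without_reader(chunk_size=4096, max_bytes=64 * 1024 * 1024):
    """Send on a connected socket pair whose peer never reads.

    Returns how many bytes the kernel accepted before the sender
    would have had to block.
    """
    sender, receiver = socket.socketpair()
    try:
        sender.setblocking(False)
        chunk = b"x" * chunk_size
        accepted = 0
        while accepted < max_bytes:
            try:
                # send() may accept fewer bytes than offered.
                accepted += sender.send(chunk)
            except BlockingIOError:
                # Buffers are full; a blocking sender would stall here
                # until the peer starts reading.
                break
        return accepted
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print("bytes accepted with no reader:", bytes_accepted_without_reader())
```

So it is neither a straight pipe nor an unlimited queue: some data crosses
over and sits in buffers, and after that the sender waits.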

 Therefore, for maximum portability, a WSGI application should ALWAYS consume
 the *whole* request body if it wants to avoid the deadlock using the
 reference WSGI adapter in PEP 333 or mod_wsgi.
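
A minimal sketch of what consuming the whole body could look like, assuming
the application has not read any of wsgi.input yet and that the client sent a
Content-Length. The helper name drain_request_body is hypothetical. It reads
in bounded chunks and never asks for more than the declared length, so it
does not depend on the server simulating end-of-file.

```python
def drain_request_body(environ, chunk_size=64 * 1024):
    """Read and discard the request body; return the number of bytes dropped.

    Assumes nothing has been read from wsgi.input yet. If part of the body
    was already consumed, CONTENT_LENGTH overstates what is left.
    """
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0  # unparseable header: treat as no body
    stream = environ["wsgi.input"]
    discarded = 0
    while remaining > 0:
        data = stream.read(min(chunk_size, remaining))
        if not data:
            break  # client went away before sending everything
        discarded += len(data)
        remaining -= len(data)
    return discarded
```

Note that draining a very large upload still costs the bandwidth and time of
receiving it, so this trades resource use for avoiding the deadlock.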

 Indeed. This is covered in RFC 2616 Section 8.2.3:

If an origin server receives a request that does not include an
Expect request-header field with the 100-continue expectation,
the request includes a request body, and the server responds
with a final status code before reading the entire request body
from the transport connection, then the server SHOULD NOT close
the transport connection until it has read the entire request,
or until the client closes the connection. Otherwise, the client
might not reliably receive the response message. However, this
requirement is not be construed as preventing a server from
defending itself against denial-of-service attacks, or from
badly broken client implementations.

 CherryPy's wsgiserver will read any remaining request body (which the
 application hasn't read) before sending response headers.

A WSGI application could technically want to send response headers and
only then read remaining request content. I don't believe there is
anything in the WSGI specification which prevents that. If you are
discarding the request content as soon as response headers are
generated, that could technically be a problem for some use cases,
even if they may be obscure.

I can't tell from looking at the latest CherryPy WSGI server code, as it
has been changed since I last looked at it and I haven't yet had time to
grok it and run some tests, but previously, in respect of where the WSGI
specification says:

The server is not required to read past the client's specified
Content-Length, and is allowed to simulate an end-of-file condition if
the application attempts to read past that point.

the CherryPy WSGI server code chose NOT to simulate an end-of-file
condition. This was the case because the amount of data read from
wsgi.input was never tracked. This meant that if an application did try
to read more content than was available and request pipelining was
occurring, then the read would hang, as it would not get an empty string
returned as would be normal for an end-of-file condition on a file-like
object.

If the code is still behaving this way, then it wouldn't be possible
for it to discard remaining input as how much was read wasn't tracked.
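
For illustration only, here is roughly what tracking the amount read and
simulating end-of-file could look like. This is not CherryPy's code; the
class name LengthLimitedInput is made up, and in Python 3 the "empty string"
is an empty bytes object.

```python
class LengthLimitedInput:
    """File-like wrapper that stops at Content-Length and then reports EOF."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(0, content_length)

    @property
    def remaining(self):
        """Bytes of the declared body that have not been read yet."""
        return self._remaining

    def _clamp(self, size):
        # Never ask the underlying stream for more than is left.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # simulated end-of-file
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        if self._remaining <= 0:
            return b""  # simulated end-of-file
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line

    def __iter__(self):
        # Yield lines until readline() reports end-of-file.
        return iter(self.readline, b"")
```

Because the wrapper knows how much is left, a server could also use the
remaining count to discard unread input after the application returns.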

Looking at the latest code I do note the presence of a wrapper around the
socket used for wsgi.input, but I haven't been able to work out yet
whether it returns a traditional empty string as the end-of-file
condition, or whether it is going to instead raise your
MaxSizeExceeded exception and thus not be file-like in its behaviour.

Can you perhaps explain what is going to happen when an attempt is
made to read more content than was available, and whether it is
actually going to raise an exception rather than just return an empty
string as file-like objects would?

Personally I think that part of the WSGI specification should be
amended so that an end-of-file condition MUST be indicated using an
empty string, just as with normal file-like objects. Just this one
change would mean that one could call read() with no arguments and have
it return all input, whereas at the moment the WSGI specification does
not allow the argument to read() to be optional.

This would actually negate the whole need for applications to even
check/use CONTENT_LENGTH except for situations where it mattered such
as 413 response or where 

Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-26 Thread Brian Smith
Brian Smith wrote:
 2008/11/26 Brian Smith [EMAIL PROTECTED]:
  Under Apache CGI or mod_wsgi, in many situations you will get a
  deadlock in this scenario. 
 
 It isn't 'many situations', it is a quite specific situation.

Right. I meant that it can happen quite often (every time) that situation
occurs, depending on the characteristics of the application.
 
  If you know C, it is relatively simple to modify mod_wsgi to use a
  different Apache-daemon communication protocol 
 
 Depends on your definition of simple. It would be quite fiddly to do
 and get right, or one would have to rewrite a large amount of code. I
 wouldn't regard either as really that simple.

I did it by implementing the communication protocol that I had proposed on
the mod_wsgi mailing list a while ago. It is straightforward to do, but it
does take a lot of time to learn how mod_wsgi works in order to make the
change, especially if you have never written an Apache module before.

- Brian


___
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: 
http://mail.python.org/mailman/options/web-sig/archive%40mail-archive.com


Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-22 Thread Randy Syring

I did find this:

http://wiki.pylonshq.com/display/pylonscookbook/A+Better+Way+To+Limit+File+Upload+Size

Which was good, but still leaves some unanswered questions:

   * What if one is not using the paste http server?
   * This method gives an unfriendly response.  What would be the best
 method to propagate this error condition down to the app so that a
 message could be given to the user in the context of the form they
 had previously submitted (i.e. an error message under the input
 field reminding them of the max upload size and even possibly
 telling them how big the file was they uploaded).

Thanks.

--
Randy Syring
RCS Computers  Web Solutions
502-644-4776
http://www.rcs-comp.com

Whether, then, you eat or drink or 
whatever you do, do all to the glory

of God. 1 Cor 10:31



Randy Syring wrote:
I am looking for opinions and thoughts on best practice for limiting 
file upload size.  I have a few considerations:


* Ultimately, I would want my application with my method of
  handling forms to be able to give the user a message that the
  file size was too big.  That means that however, the size is
  limited, just blanking out wsgi.input and setting content-length
  to zero doesn't seem correct.  That would make it look like the
  form wasn't submitted with any data I believe.
* Given the above, it seems that something would need to get put
  in the environment to tell middleware and the application that
  the file input was aborted, but what would be the best way for
  doing it?  Should it be some kind of standard, or just dependent
  on your server or middleware?
* It seems best to implement this functionality as the very first
  middleware in the stack.  Since other middleware read and
  manipulate wsgi.input, handling the upload size at the
  application level wouldn't prevent middleware from wasting
  resources dealing with a very large file.

Is it possible to prevent the server from even accepting all the data 
(i.e. trying to save bandwidth and server resources) if the 
content-length is known to be too big?  Or is the server required to 
take all the client's data regardless, even if it ends up going in the 
bit bucket?  I realize some of this is server specific, not WSGI 
specific, but I would be interested in knowing how the most popular 
servers handle this or what the HTTP specs require if anyone knows.


Thanks in advance for any insight you might be able to provide.
  





Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-22 Thread Graham Dumpleton
2008/11/22 Randy Syring [EMAIL PROTECTED]:
 I am looking for opinions and thoughts on best practice for limiting file
 upload size.  I have a few considerations:

 Ultimately, I would want my application with my method of handling forms to
 be able to give the user a message that the file size was too big.  That
 means that however, the size is limited, just blanking out wsgi.input and
 setting content-length to zero doesn't seem correct.  That would make it
 look like the form wasn't submitted with any data I believe.
 Given the above, it seems that something would need to get put in the
 environment to tell middleware and the application that the file input was
 aborted, but what would be the best way for doing it?  Should it be some
 kind of standard, or just dependent on your server or middleware?
 It seems best to implement this functionality as the very first middleware
 in the stack.  Since other middleware read and manipulate wsgi.input,
 handling the upload size at the application level wouldn't prevent middleware
 from wasting resources dealing with a very large file.

 Is it possible to prevent the server from even accepting all the data (i.e.
 trying to save bandwidth and server resources) if the content-length is
 known to be too big?  Or is the server required to take all the client's
 data regardless, even if it ends up going in the bit bucket?  I realize some
 of this is server specific, not WSGI specific, but I would be interested in
 knowing how the most popular servers handle this or what the HTTP specs
 require if anyone knows.

 Thanks in advance for any insight you might be able to provide.

If you use Apache/mod_wsgi to host your WSGI application, the best way
of handling this is to use the Apache LimitRequestBody directive for
the appropriate context. This will result in Apache returning an
HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
you need a custom error document for that response type, use the Apache
ErrorDocument directive to specify the URL of a handler which would
generate it.

Except for the custom error document if delegated to the WSGI
application, doing it this way results in it all being handled by
Apache/mod_wsgi and your WSGI application will not even be invoked.
The request body content would also not even be read by Apache at all.
Do note that whether this avoids the client sending the request body
input depends on whether the client was expecting a '100 Continue'
response before it sends the data. I believe most web browsers still
don't use '100 Continue'.

This would be the preferred solution for Apache/mod_wsgi as it is
handled at lowest levels and guaranteed that request content wouldn't
be read at that point. It is however taking control out of your
application.

For Apache/mod_wsgi, if you do not do it this way but instead validate
content length in the WSGI application and have the WSGI application
return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
whether the request content gets read depends on whether you are using
embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't
read the input and just returns the error response, the request
content wouldn't be read at all. If you are using daemon mode however,
then the request content would always be read by Apache child worker
process, even if client asked for '100 Continue' response. This is
because the Apache child worker process will always proxy request
content to the daemon process.
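
As a sketch of the in-application check described above, something along
these lines would return the 413 without touching wsgi.input. The limit value
and the names MAX_BODY_BYTES and limited_app are placeholders for
illustration. It only looks at the declared Content-Length, so a request
without that header is not covered.

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # placeholder limit, adjust to taste


def limited_app(environ, start_response):
    """Reject oversized requests based on the declared Content-Length."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # unparseable header: treated as an empty body here

    if length > MAX_BODY_BYTES:
        # Respond without reading any of the request body.
        body = b"Upload too large.\n"
        start_response("413 Request Entity Too Large", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    # Within the limit: read exactly the declared number of bytes.
    data = environ["wsgi.input"].read(length)
    body = ("Received %d bytes\n" % len(data)).encode("ascii")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Whether the unread body is then left on the wire, read by the server, or
proxied anyway depends on the hosting mechanism, as described above.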

Anyway, that is how things are for Apache/mod_wsgi.

Graham


Re: [Web-SIG] Implementing File Upload Size Limits

2008-11-22 Thread Randy Syring

[forgot to copy list]

Graham Dumpleton wrote:

2008/11/22 Randy Syring [EMAIL PROTECTED]:
  

I am looking for opinions and thoughts on best practice for limiting file
upload size.  I have a few considerations:

snip


If you use Apache/mod_wsgi to host your WSGI application, the best way
of handling this is to use the Apache LimitRequestBody directive for
the appropriate context. This will result in Apache returning an
HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
you need a custom error document for that response type, use the Apache
ErrorDocument directive to specify the URL of a handler which would
generate it.
  

Graham,

Thank you for your response.  What you noted above does seem to be the
lowest level solution possible if you are using apache.  I suppose using
an error document that is part of the application would at least allow
me to serve a specific page from my application that could detail the
error.  If I wanted to get fancy, each time a form with an input element
was sent to a user, I could save that path in a special variable in the
user's session.  My error page could then look for that value in the
user session and if present, load the correct form, giving the user an
error message noting that the file uploaded was too big.  The downfall
to that approach is that the form comes back empty.  It might be better
to just have the error page give them some details and encourage them to
use the back button, in which case the form's fields would hopefully
still be filled in.

Except for the custom error document if delegated to the WSGI
application, doing it this way results in it all being handled by
Apache/mod_wsgi and your WSGI application will not even be invoked.
The request body content would also not even be read by Apache at all.
Do note that whether this avoids the client sending the request body
input depends on whether the client was expecting a '100 Continue'
response before it sends the data. I believe most web browsers still
don't use '100 Continue'.

This would be the preferred solution for Apache/mod_wsgi as it is
handled at lowest levels and guaranteed that request content wouldn't
be read at that point. It is however taking control out of your
application.
  

Hopefully you can clarify something for me.  Let's assume that the client
does not use '100 Continue' but sends data immediately, after sending
the headers.  If the server never reads the request content, what does
that mean exactly?  Does the data get transferred over the wire but then
discarded or does the client not get to send the data until the server
reads the request body?  I.e. the client tries to send it, but the
content isn't actually transferred across the wire until the server
reads it.  I am just wondering if there is a buffer or queue or
something between the server and the client that allows data to be
transferred even if the server doesn't read the request body.  Or, is
it just like a straight pipe where one end (the client) can't push data
through until the other end (the server) reads it.

I agree that it does take control out of the application.  From a
usability perspective, the best solution IMO would be for the user to
get the form back and have a red error message under the input field
indicating the file size uploaded was too big and giving them the max
file size allowed.  However, on second thought, that may not be true.
As noted above, because the entire request body was rejected, the form
loaded would have none of the information they submitted and most users
would probably think they have to fill out the whole form again.
Probably better to just give them a non-form error page and let them use
the back button (or even provide a link that uses javascript to go back)
and in so doing hopefully salvage the time they put into the form.

I suppose, though, that two different kinds of file size limits need to
be thought through.  The first limit would be an application wide limit
that is set for security/resource reasons.  That, I believe, is what we
have been discussing up to this point.  I am just realizing that it
would also be fine to limit upload sizes at the application level and
give more user-friendly error messages.  So I might decide on a 10MB
application-wide upload limit, but I might also restrict free accounts
and paid accounts to 256k and 5MB respectively.  As long as a user
uploads something less than 10MB, they get a friendly in-line error
message.  If they upload over 10MB, we handle that at the apache level
and send them to a custom error page.
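
The two-tier idea can be sketched as a small decision function. The numbers
are the ones mentioned above; the names SITE_LIMIT, PLAN_LIMITS and
classify_upload are made up for illustration, and "size" here would be the
declared Content-Length of the whole request, not just the file part.

```python
SITE_LIMIT = 10 * 1024 * 1024      # hard limit for security/resource reasons
PLAN_LIMITS = {
    "free": 256 * 1024,            # per-account limits checked by the app
    "paid": 5 * 1024 * 1024,
}


def classify_upload(size, plan):
    """Decide how an upload of `size` bytes should be handled.

    Returns "reject" when the site-wide limit is exceeded (handled before
    the application, e.g. by a custom error page), "inline_error" when only
    the account limit is exceeded, and "accept" otherwise.
    """
    if size > SITE_LIMIT:
        return "reject"
    if size > PLAN_LIMITS[plan]:
        return "inline_error"
    return "accept"
```

For example, a 1MB upload from a free account would get the inline error,
while the same upload from a paid account would be accepted.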

For Apache/mod_wsgi, if you do not do it this way but instead validate
content length in the WSGI application and have the WSGI application
return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
whether the request content gets read depends on whether you are using
embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't
read the input and just returns the error 

[Web-SIG] Implementing File Upload Size Limits

2008-11-21 Thread Randy Syring
I am looking for opinions and thoughts on best practice for limiting 
file upload size.  I have a few considerations:


   * Ultimately, I would want my application with my method of handling
 forms to be able to give the user a message that the file size was
 too big.  That means that however, the size is limited, just
 blanking out wsgi.input and setting content-length to zero doesn't
 seem correct.  That would make it look like the form wasn't
 submitted with any data I believe.
   * Given the above, it seems that something would need to get put in
 the environment to tell middleware and the application that the
 file input was aborted, but what would be the best way for doing
 it?  Should it be some kind of standard, or just dependent on your
 server or middleware?
   * It seems best to implement this functionality as the very first
 middleware in the stack.  Since other middleware read and
 manipulate wsgi.input, handling the upload size at the application
 level wouldn't prevent middleware from wasting resources dealing
 with a very large file.
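
To make the middleware idea concrete, here is a rough sketch of what such a
first-in-the-stack wrapper might look like. The environ key
"example.upload_too_large" is invented for this example and is not any kind
of standard; the class name is also hypothetical. The sketch only flags the
condition and hides the body from everything downstream. It does not decide
what happens to the unread data on the connection, which is server specific.

```python
import io

TOO_LARGE_KEY = "example.upload_too_large"  # invented key, not a standard


class UploadLimitMiddleware:
    """Flag oversized requests in environ instead of silently blanking them."""

    def __init__(self, app, max_bytes):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        try:
            declared = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            declared = 0  # unparseable header: treated as no body
        if declared > self.max_bytes:
            # Record the declared size so the application can tell
            # "upload aborted" apart from "form submitted with no data".
            environ[TOO_LARGE_KEY] = declared
            environ["wsgi.input"] = io.BytesIO(b"")
            environ["CONTENT_LENGTH"] = "0"
        return self.app(environ, start_response)
```

The application can then check for the key and render the form with a
message that mentions both the limit and the size the client declared.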

Is it possible to prevent the server from even accepting all the data 
(i.e. trying to save bandwidth and server resources) if the 
content-length is known to be too big?  Or is the server required to 
take all the client's data regardless, even if it ends up going in the 
bit bucket?  I realize some of this is server specific, not WSGI 
specific, but I would be interested in knowing how the most popular 
servers handle this or what the HTTP specs require if anyone knows.


Thanks in advance for any insight you might be able to provide.

