RE: ap_setup_client_block and Content-Length

2008-02-15 Thread Brian Smith
Nick Kew wrote:
 Charles Fry [EMAIL PROTECTED] wrote:
 
  Hi,
  
  I desire to access a request's Content-Length from an input filter.
 
 When it exists, you can get it with
 apr_table_get(r->headers_in, "Content-Length")
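For illustration, here is a minimal sketch of turning that header value into a number. It assumes the string comes from a call like the one above, and apr_table_get() returns NULL when the header is absent. parse_content_length() is a hypothetical helper, not an httpd or APR function. The result is only what the client declared, which may no longer match what a given filter actually receives.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a Content-Length header value.
 * Returns the declared length, or -1 if the header is missing
 * (NULL/empty) or is not a plain non-negative decimal integer.
 * The value is only a hint: upstream input filters may change
 * the body length without updating the header. */
long long parse_content_length(const char *value)
{
    char *end;
    long long n;

    if (value == NULL || *value == '\0')
        return -1;                 /* header not present */
    if (*value < '0' || *value > '9')
        return -1;                 /* reject signs, spaces, junk */
    errno = 0;
    n = strtoll(value, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return -1;                 /* overflow or trailing junk */
    return n;
}
```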

There is no way to get an accurate content length if other input filters
have modified the request body's length ahead of yours. For example, try
putting mod_deflate's input filter before yours in the filter chain,
send a Content-Encoding: deflate request, and notice that mod_deflate
changed the input length without updating Content-Length or anything
else you can use to retrieve how much data you will actually get.

The *only* way to reliably tell how much input is available to your
filter/content-handler is to read all the input until you get to the EOS
bucket or until ap_get_client_block returns 0, counting the bytes along
the way. 
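Here is a minimal sketch of that counting loop, written as plain C so it compiles without httpd headers. read_block_fn, struct fake_body and fake_read() are hypothetical stand-ins. They assume ap_get_client_block()'s convention: it returns the number of bytes read, 0 at the end of the body, or -1 on error. In a real handler the reader would be ap_get_client_block(r, buf, sizeof(buf)), called after ap_setup_client_block() and ap_should_client_block(). In a filter you would instead loop over ap_get_brigade() until an EOS bucket (APR_BUCKET_IS_EOS) turns up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader with ap_get_client_block()-style semantics:
 * returns bytes copied into buf, 0 at end of body, -1 on error. */
typedef long (*read_block_fn)(void *ctx, char *buf, size_t bufsiz);

/* Read the whole (already filtered) body and count its bytes.
 * Returns the total, or -1 if the reader reports an error. */
long long count_body_bytes(read_block_fn read_block, void *ctx)
{
    char buf[8192];
    long long total = 0;
    long n;

    assert(read_block != NULL);
    while ((n = read_block(ctx, buf, sizeof buf)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}

/* Test double: serves a fixed string in chunks of at most 3 bytes. */
struct fake_body {
    const char *data;
    size_t pos;
};

long fake_read(void *ctx, char *buf, size_t bufsiz)
{
    struct fake_body *fb = ctx;
    size_t left = strlen(fb->data) - fb->pos;
    size_t n = left < 3 ? left : 3;

    if (n > bufsiz)
        n = bufsiz;
    memcpy(buf, fb->data + fb->pos, n);
    fb->pos += n;
    return (long)n;
}
```

Because the loop consumes the body, code that needs the length before acting on the data has to keep what it reads rather than discard it.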

- Brian



RE: Reading of input after headers sent and 100-continue.

2008-01-30 Thread Brian Smith
 

 -----Original Message-----
 From: Graham Dumpleton [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 29, 2008 4:29 PM
 To: modules-dev@httpd.apache.org
 Subject: Reading of input after headers sent and 100-continue.
 
 The HTTP output filter will send a 100 result back to a 
 client when the first attempt to read input occurs and an 
 Expect header with 100-continue was received. I.e., from
 http_filters.c we have:
 
 if ((ctx->state == BODY_CHUNK ||
     (ctx->state == BODY_LENGTH && ctx->remaining > 0)) &&
     f->r->expecting_100 && f->r->proto_num >= HTTP_VERSION(1,1)) {

This is from ap_http_filter(). If you look at http_core.c, you can see
that it is registered as an input filter, not an output filter. So, if
you never read from the input brigade, the 100 continue will never be
sent. I'm not sure if the module needs to just ignore the input brigade,
or actively throw it away, though.
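The quoted condition can be modelled as a standalone predicate. This makes explicit when the interim response is triggered. It is a sketch, not httpd code: the enum is a trimmed copy of the filter's body states, and the HTTP_VERSION() macro mirrors the one in httpd.h (1000 * major + minor). Because the check lives in the input filter, it is only evaluated when something actually reads the body.

```c
#include <assert.h>
#include <stdbool.h>

/* Same encoding as httpd's HTTP_VERSION() macro. */
#define HTTP_VERSION(major, minor) (1000 * (major) + (minor))

/* Trimmed, hypothetical copy of the input filter's body states. */
enum body_state { BODY_NONE, BODY_LENGTH, BODY_CHUNK };

/* True when a body read would make the input filter emit an interim
 * 100 (Continue): a body is still pending, the client sent
 * Expect: 100-continue, and it speaks HTTP/1.1 or later. */
bool would_send_100(enum body_state state, long long remaining,
                    bool expecting_100, int proto_num)
{
    bool body_pending = state == BODY_CHUNK ||
                        (state == BODY_LENGTH && remaining > 0);

    return body_pending && expecting_100 &&
           proto_num >= HTTP_VERSION(1, 1);
}
```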

 The problem then is if only after having sent some response 
 content and triggering the response headers to be sent one 
 actually goes to read the input, then the HTTP output filter 
 above is still sending the 100 status response string. In 
 other words, the 100 response status string is appearing in 
 the middle of the actual response content.

Doctor, it hurts when I do this! :)

If a module is sending a response before a 100 continue has been sent,
then it shouldn't read from the input brigade, because it is going
against the HTTP spec. 

 My question then is, what should a handler do if it is trying 
 to generate response content (non buffered), before having 
 attempted to read any input, ie., what is the correct way to 
 stop Apache from still sending the 100 status response for the
 100-continue header? I know that setting r->expecting_100 to
 0 at time that first response content is being sent will 
 prevent it, but is there something else that should be done
 instead?

Since ap_http_filter is an input filter only, it should be enough to
just avoid reading from the input brigade. (AFAICT, anyway.)

 BTW, this is partly theoretical in that I have no actual code
 that is doing this, but technically in systems like 
 mod_python or mod_wsgi where one doesn't know what the Python 
 application code running on top is doing, a user could 
 trigger this situation.

The module can provide an interface to the input and output brigades
that prevents the application from doing this. mod_wsgi is doing this
already. As I mentioned on the Web-SIG list, it is difficult to have a
uniform, automatic mechanism for doing this for all request methods, or
even a uniform way of doing it for a particular method. So, it basically
has to be left up to the handler/application.

- Brian



RE: Reading of input after headers sent and 100-continue.

2008-01-30 Thread Brian Smith
Graham Dumpleton wrote:
 Effectively, if a 200 response came back, it seems to suggest 
 that the client still should send the request body, just that 
 it 'SHOULD NOT wait for an indefinite period'. It doesn't say 
 explicitly for the client that it shouldn't still send the 
 request body if another response code comes back.

This behavior is to support servers that don't understand the Expect:
header. 

Basically, if the server responds with a 100, the client must send the
request body. If the server responds with a 4xx or 5xx, the client must
not send the request body. If the server responds with a 2xx or a 3xx,
then the client must send (the rest of) the request body, on the
assumption that the server doesn't understand Expect:. To be
completely compliant, a server should always respond with a 100 in front
of a 2xx or 3xx, I guess. Thanks for clarifying that for me. The
rules make sense after all.
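The client-side rules in this message reduce to a small decision function. This is a sketch of the summary given here, not of any particular client library. enum client_action and on_response() are hypothetical, and treating any other code as "keep waiting" is an assumption of the sketch.

```c
#include <assert.h>

/* Hypothetical client actions after sending Expect: 100-continue. */
enum client_action {
    SEND_BODY,        /* transmit (the rest of) the request body */
    DONT_SEND_BODY,   /* final error status: withhold the body */
    KEEP_WAITING      /* assumption: not a status this sketch acts on */
};

enum client_action on_response(int status)
{
    if (status == 100)
        return SEND_BODY;             /* server asked for the body */
    if (status >= 200 && status <= 399)
        return SEND_BODY;             /* server may not understand Expect: */
    if (status >= 400 && status <= 599)
        return DONT_SEND_BODY;        /* request already rejected */
    return KEEP_WAITING;              /* other 1xx or out of range */
}
```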

 So technically, if the client has to still send the request 
 content, something could still read it. It would not be ideal 
 that there is a delay depending on what the client does, but 
 would still be possible from what I read of this section.

You are right. To avoid confusion, you should probably force mod_wsgi to
send a 100-continue in front of any 2xx or 3xx response.

 It MUST NOT perform the requested method if it returns a final status
 code.

The implication is that the only time it will avoid sending a 100 is
when it is sending a 4xx, and it should never perform the requested
method if it already said the method failed. The only excuse for not
sending a 100 is that you don't know about Expect: 100-continue. But,
that can't be true if you are reading this part of the spec!

 If it responds with a final status
 code, it MAY close the transport connection or it MAY continue
 to read and discard the rest of the request.

If the client receives a 2xx or 3xx without a 100 first, it has to send
the request body (well, depending on which 3xx it is, that is not true).
But, the server doesn't have to read it! But, again, the assumption is
that the server will only send a response without a 100 if it is a 4xx
or 5xx.

 It seems by what you are saying that if 100-continue is 
 present this wouldn't be allowed, and that to ensure correct 
 behaviour the handler would have to read at least some of the 
 request body before sending back the response headers.

You are right, I was wrong. 

  Since ap_http_filter is an input filter only, it should be enough to
  just avoid reading from the input brigade. (AFAICT, anyway.)
 
 In other words, block the handler from reading and potentially
 raise an error in the process. But to be fair and
 consistent, you would have to apply the same rule even if
 100-continue isn't present. Whether that would break some 
 existing code in doing that is the concern I have, even if it 
 is some simple test program that just echos back the request 
 body as the response body.

Technically, even if the server returns a 4xx, it can still read the
request body, but it might not get anything or it might only get part of
it. I guess the change needed in the WSGI spec is to say that
the gateway must not send the 100 continue if it has already sent some
headers, and that it should send a 100 continue before any 2xx or 3xx
code, which is basically what James Knight suggested (sorry James). The
gateway must indicate EOF if only a partial request body was received. I
don't think the gateway should be required to provide any of the partial
request content on a 4xx, though.
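Here is a minimal sketch of the server-side half of that proposal. It assumes the gateway writes status lines itself and keeps a few per-request flags. struct expect_state and write_response_head() are hypothetical. Header fields are omitted, and the "indicate EOF on a partial body" part of the proposal is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-request gateway state. */
struct expect_state {
    bool expecting_100;   /* request carried Expect: 100-continue */
    bool continue_sent;   /* interim 100 already written */
    bool headers_sent;    /* final status already written */
};

/* Write the response head for a final status into out. Prepend an
 * interim 100 Continue before a 2xx/3xx if the client is still
 * waiting for one; never send one in front of a 4xx/5xx, and never
 * after the final head has gone out. Returns bytes written, or -1
 * if out is too small or the final status was already sent. */
int write_response_head(struct expect_state *st, int status,
                        const char *reason, char *out, size_t outsz)
{
    bool need_100;
    int n = 0, m;

    assert(st != NULL && reason != NULL && out != NULL && outsz > 0);
    if (st->headers_sent)
        return -1;                    /* too late for another head */

    need_100 = st->expecting_100 && !st->continue_sent &&
               status >= 200 && status <= 399;
    if (need_100) {
        n = snprintf(out, outsz, "HTTP/1.1 100 Continue\r\n\r\n");
        if (n < 0 || (size_t)n >= outsz)
            return -1;
    }
    m = snprintf(out + n, outsz - (size_t)n, "HTTP/1.1 %d %s\r\n",
                 status, reason);
    if (m < 0 || (size_t)m >= outsz - (size_t)n)
        return -1;

    /* Update state only once both writes have succeeded. */
    if (need_100)
        st->continue_sent = true;
    st->headers_sent = true;
    return n + m;
}
```

With Expect: 100-continue pending, a 200 goes out as "HTTP/1.1 100 Continue\r\n\r\n" followed by "HTTP/1.1 200 OK\r\n". A 417 goes out with no interim response at all.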

- Brian