Re: [RFC] Guide to writing output filters

2007-03-23 Thread Joe Orton
On Mon, Mar 19, 2007 at 10:15:03PM +0100, Ruediger Pluem wrote:
[on passing empty brigades and mod_cache]
 Once we detect that we have a fresh content entity in the cache, the 
 quick handler of mod_cache starts the filter chain by calling
 
 ap_pass_brigade(r-output_filters, out);
 
 where out is an empty brigade created by the quick handler. The brigade gets 
 filled
 with the cached content by the CACHE_OUT filter down in the chain.

I think that was simply abuse of the filter interface, and fixing 
mod_cache was correct.  If a filter was passed a non-empty brigade, it 
may legitimately buffer the contents of that brigade without passing it 
on down the chain.  So to then require explicitly that an *empty* 
brigade *must* be passed down makes little sense IMO.

...
 - Procesing buckets: I think with mmap enabled a file bucket will morph into
   a mmap bucket and the remaining file bucket. I think the heap bucket will
   only be created if mmap is turned off. But I agree that this  possibly 
  introduces
   too much complexity to the example and distracts the reader from the 
  important
   point.

  Yeah.  It's an implementation detail, and the risk is that if it gets 
  documented, people will rely on it somehow.
 
 Why can't you rely on this? Isn't that the publicly defined behaviour 
 of a file bucket when MMAP is enabled?

Simply that there's nothing in the definition of a FILE bucket which 
states it will always morph into MMAP under conditions X, Y, Z.

joe


Re: [RFC] Guide to writing output filters

2007-03-19 Thread Joe Orton
Thanks a lot for the review!

On Sat, Mar 17, 2007 at 04:30:24PM +0100, Ruediger Pluem wrote:
 Some comments from my side:
 
 - Passing empty brigades: While I agree that a filter should never create
   an empty brigade and pass it down the chain, I think it actually should
   pass an empty brigade down the chain if it gets one passed instead of
   simply returning. For reasons why see
 
 http://issues.apache.org/bugzilla/show_bug.cgi?id=40090
 http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL 
 PROTECTED]

This is interesting, but I don't really understand how the problem 
described happens.  How is ap_finalize_request_protocol() getting called 
before a response has been sent?  (there is some symmetry between that 
problem and PR 38014)

 - apr_brigade_destroy: I think the danger about using apr_brigade_destroy is
   to call it and *reuse* this brigade afterwards, because in this case the
   removed pool cleanup can cause a memory leak. 

Right - and that should be true of every brigade which a filter uses 
(the passed-in brigade should not be destroyed; any brigades the filter 
itself creates *must* be long-lived and re-used).  So there's really no 
case in which to recommend use of apr_brigade_destroy() for a filter.  
It's almost always safer to use _cleanup.

I'll add a note about using _cleanup() to this section.

 - Procesing buckets: I think with mmap enabled a file bucket will morph into
   a mmap bucket and the remaining file bucket. I think the heap bucket will
   only be created if mmap is turned off. But I agree that this possibly 
 introduces
   too much complexity to the example and distracts the reader from the 
 important
   point.

Yeah.  It's an implementation detail, and the risk is that if it gets 
documented, people will rely on it somehow.
 
 - Filtering brigades:
   - Although this is not my opinion I know that there had been some 
 discussions in the
 past that no examples should be given on how you should *not* do things.

I was wary of doing this too, but it's such a common mistake that I 
thought it to explain explicitly why it's bad.

 - Maintaining state:
   - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of
 f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ?

I vaguely prefer sizeof without the brackets where valid ;)

   - IMHO ap_save_brigade can operate on an existing brigade. So this can
 be a brigade created once per invocation. I agree with the warning that
 especially on PIPE buckets ap_save_brigade can consume quite a lot of
 memory. Plus ap_save_brigade does a blocking read on the bucket.

Good points, I'll add some text.

joe


Re: [RFC] Guide to writing output filters

2007-03-19 Thread Joe Orton
Again, thanks for the review! 

On Sat, Mar 17, 2007 at 07:44:02AM -0400, Jeff Trawick wrote:
 I guess I'm confused about the up/down direction convention for output
 filters?  I thought passing the next output filter is down and
 returning to the prior input filter is up?

My confusion - I always think about filters in terms of the call stack, 
where ap_pass_brigade passes brigades up the stack.

 More detail about error handling would be invaluable.  Something that
 has been a thorn in the past has been the two types of status and
 understanding the limited relationship:
 
 apr_status_t as returned by a filter
 HTTP status as returned by a handler
 
 What can a filter do to influence HTTP status (set r-status on first
 invocation for a response?)?  What will be logged if a filter returns
 non-APR_SUCCESS?

Excellent questions for which I lack excellent answers :)

Also: is it better to use the elusive HTTP error bucket rather than 
setting r-status?  What should filters do with apr_bucket_read() 
errors?  Must filters _cleanup() the brigade before returning an error?  
(there is a bug in bugzilla caused by that one IIRC)

joe


Re: [RFC] Guide to writing output filters

2007-03-19 Thread Ruediger Pluem


On 03/19/2007 03:06 PM, Joe Orton wrote:
 Thanks a lot for the review!
 
 On Sat, Mar 17, 2007 at 04:30:24PM +0100, Ruediger Pluem wrote:
 
Some comments from my side:

- Passing empty brigades: While I agree that a filter should never create
  an empty brigade and pass it down the chain, I think it actually should
  pass an empty brigade down the chain if it gets one passed instead of
  simply returning. For reasons why see

http://issues.apache.org/bugzilla/show_bug.cgi?id=40090
http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL 
PROTECTED]
 
 
 This is interesting, but I don't really understand how the problem 
 described happens.  How is ap_finalize_request_protocol() getting called 
 before a response has been sent?  (there is some symmetry between that 

Once we detect that we have a fresh content entity in the cache, the quick 
handler
of mod_cache starts the filter chain by calling

ap_pass_brigade(r-output_filters, out);

where out is an empty brigade created by the quick handler. The brigade gets 
filled
with the cached content by the CACHE_OUT filter down in the chain.
In the original code path and PR configuration mod_deflate was the first
filter in r-output_filters which caused the filter chain to return immediately 
and before
it reached the CACHE_OUT filter (mod_deflate detected an empty brigade and 
returned
APR_SUCCESS). This caused the quick handler of mod_cache to be left with OK, 
which triggers
ap_finalize_request_protocol. The correct fix in this situation of cause was to 
remove all
filters from r-output_filters until we hit mod_cache's CACHE_OUT filter, 
because this
is the position in the filter chain where the content went to the cache during 
caching.

 problem and PR 38014)

I think PR 38014 is somewhat different. The ap_http_filter simply was not 
completely aware
that it could be called in two situations:

- During request processing.
- After request processing in order to cleanup the connection.


 
 
- apr_brigade_destroy: I think the danger about using apr_brigade_destroy is
  to call it and *reuse* this brigade afterwards, because in this case the
  removed pool cleanup can cause a memory leak. 
 
 
 Right - and that should be true of every brigade which a filter uses 
 (the passed-in brigade should not be destroyed; any brigades the filter 
 itself creates *must* be long-lived and re-used).  So there's really no 
 case in which to recommend use of apr_brigade_destroy() for a filter.  

The only possible one from my perspective is on a brigade that was created
by the filter as long lived state brigade in the case that the filter either
sees an EOS bucket or decides to remove itself from the filter chain for some
other reason. But even in this case _cleanup is sufficient. We can only save
walking thru the brigades pool cleanup once the pool used for its creation
gets destroyed.

 It's almost always safer to use _cleanup.

Agreed. Seeing an apr_brigade_destroy in a filter should ring an alarm bell.

 
 I'll add a note about using _cleanup() to this section.
 
 
- Procesing buckets: I think with mmap enabled a file bucket will morph into
  a mmap bucket and the remaining file bucket. I think the heap bucket will
  only be created if mmap is turned off. But I agree that this possibly 
 introduces
  too much complexity to the example and distracts the reader from the 
 important
  point.
 
 
 Yeah.  It's an implementation detail, and the risk is that if it gets 
 documented, people will rely on it somehow.

Why can't you rely on this? Isn't that the publicly defined behaviour of a file 
bucket
when MMAP is enabled?

Regards

Rüdiger


Re: [RFC] Guide to writing output filters

2007-03-17 Thread Ruediger Pluem


On 03/16/2007 11:55 PM, Joe Orton wrote:
 http://people.apache.org/~jorton/output-filters.html
 
 How does this look?  Anything missed out, anything that doesn't make 
 sense?  I think this covers most of the major problems in output filters 
 which keep coming up.

Thanks for doing this. It looks very good to me, especially as it gives
us a set of rules and best practises even though I think there might be
a discussion on the details.

Some comments from my side:

- Passing empty brigades: While I agree that a filter should never create
  an empty brigade and pass it down the chain, I think it actually should
  pass an empty brigade down the chain if it gets one passed instead of
  simply returning. For reasons why see

http://issues.apache.org/bugzilla/show_bug.cgi?id=40090
http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED]

- apr_brigade_destroy: I think the danger about using apr_brigade_destroy is
  to call it and *reuse* this brigade afterwards, because in this case the
  removed pool cleanup can cause a memory leak. I think it is ok to call it
  and never *reuse* it. So the remark of destroyed brigades containg buckets
  sounds a bit misleading to me. It can only contain buckets if it is reused
  after it was destroyed. But I agree that especially on passed brigades it
  is not always possible to find out whether this is reused and as it does
  not get us actual advantages apr_brigade_cleanup should be used instead
  in 99% of all cases. Maybe worth mentioning apr_brigade_cleanup in this 
context.

- Procesing buckets: I think with mmap enabled a file bucket will morph into
  a mmap bucket and the remaining file bucket. I think the heap bucket will
  only be created if mmap is turned off. But I agree that this possibly 
introduces
  too much complexity to the example and distracts the reader from the important
  point.

- Filtering brigades:
  - Although this is not my opinion I know that there had been some discussions 
in the
past that no examples should be given on how you should *not* do things.
  - There is one word too much (use / consume) in
in contrast, the implementation below will use consume a fixed
  - I guess it should be APR_BRIGADE_INSERT_HEAD(tmpbb, e); instead of
APR_BRIGADE_INSERT_HEAD(tmpbb);

- Maintaining state:
  - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of
f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ?
  - IMHO ap_save_brigade can operate on an existing brigade. So this can
be a brigade created once per invocation. I agree with the warning that
especially on PIPE buckets ap_save_brigade can consume quite a lot of
memory. Plus ap_save_brigade does a blocking read on the bucket.

- Ten rules for output filters
  - 1. See my comments above
  - 6. It needs to be apr_brigade_cleanup instead of apr_brigade_clear.

 
 I'd also like to add a simple buffering filter which does things right 
 and can be used as a reference; all other in-tree filters are either too 
 complicated (filters/*, http/* etc) or too awful (experimental/*).  Any 
 objections?

Sounds good to me.  Just did not have the time to review it so far.

Regards

Rüdiger



Re: [RFC] Guide to writing output filters

2007-03-17 Thread Jeff Trawick

On 3/16/07, Joe Orton [EMAIL PROTECTED] wrote:

http://people.apache.org/~jorton/output-filters.html


I guess I'm confused about the up/down direction convention for output
filters?  I thought passing the next output filter is down and
returning to the prior input filter is up?


a few examples:



Generally, all metadata buckets should be passed up the filter chain

by an output filter.
down the filter chain


Filters can create FLUSH buckets and pass these up the filter chain if desired.

down the filter chain


An output filter should never pass an empty brigade up the filter chain.

down the filter chain


/* Pass brigade upstream. */
  rv = ap_pass_brigade(f-next, tmpbb);


/* Pass brigade downstream. */

minor grammatical tweaks:


the number of times a filter is invoked for single response

for a single response


The PIPE  bucket type is an example of a bucket type has an

indeterminate length;
a bucket type which has an indeterminate length

Possible confusion:


# Output filters which read all the buckets in a brigade must process

a fixed number of buckets (or amount of data) at a time, to ensure
that memory consumption is not proportional to the size of the
content being filtered.

This may be unclear, since you're not referring to buckets received on
input to the filter but buckets returned by bucket-read (after
morphing).

Perhaps must process a fixed amount of data at a time is less
subject to incorrect interpretation?



How does this look?


It looks like a great start to me.

More detail about error handling would be invaluable.  Something that
has been a thorn in the past has been the two types of status and
understanding the limited relationship:

apr_status_t as returned by a filter
HTTP status as returned by a handler

What can a filter do to influence HTTP status (set r-status on first
invocation for a response?)?  What will be logged if a filter returns
non-APR_SUCCESS?


Re: [RFC] Guide to writing output filters

2007-03-17 Thread Nick Kew
On Fri, 16 Mar 2007 22:55:21 +
Joe Orton [EMAIL PROTECTED] wrote:

 http://people.apache.org/~jorton/output-filters.html
 
 How does this look?  Anything missed out, anything that doesn't make 
 sense?  I think this covers most of the major problems in output
 filters which keep coming up.

On the broad picture ...

If we have this in the documentation, people will take it as definitive.
So it had better be clear about its exact scope: namely, direct
manipulation of buckets and brigades.

Example: We know that the stdio-style buffered I/O (ap_fwrite et al)
is what you need for some filters.  People reading our documentation
may not know that.  To avoid the risk of misleading them, the document
should add an explicit note that this is not covered because it falls
outside your scope, not because there's anything wrong with it.

Or, better, an explicit statement of scope at the beginning, making
it clear that there are many aspects of filtering that fall outside
the scope of it.  Protocol handling, to name one more.

 I'd also like to add a simple buffering filter which does things
 right and can be used as a reference; all other in-tree filters are
 either too complicated (filters/*, http/* etc) or too awful
 (experimental/*).  Any objections?

I think I have a better idea.  But I think I'd rather hack it up
(round tuits permitting) than discuss it here in the abstract.

Actually if I'm right, your proposed example will still help,
as it'll be a useful contrast.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/


Re: [RFC] Guide to writing output filters

2007-03-17 Thread Ruediger Pluem

On 03/16/2007 11:55 PM, Joe Orton wrote:
 http://people.apache.org/~jorton/output-filters.html
 
 How does this look?  Anything missed out, anything that doesn't make 
 sense?  I think this covers most of the major problems in output filters 
 which keep coming up.

Thanks for doing this. It looks very good to me, especially as it gives
us a set of rules and best practises even though I think there might be
a discussion on the details.

Some comments from my side:

- Passing empty brigades: While I agree that a filter should never create
  an empty brigade and pass it down the chain, I think it actually should
  pass an empty brigade down the chain if it gets one passed instead of
  simply returning. For reasons why see

http://issues.apache.org/bugzilla/show_bug.cgi?id=40090
http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED]

- apr_brigade_destroy: I think the danger about using apr_brigade_destroy is
  to call it and *reuse* this brigade afterwards, because in this case the
  removed pool cleanup can cause a memory leak. I think it is ok to call it
  and never *reuse* it. So the remark of destroyed brigades containg buckets
  sounds a bit misleading to me. It can only contain buckets if it is reused
  after it was destroyed. But I agree that especially on passed brigades it
  is not always possible to find out whether this is reused and as it does
  not get us actual advantages apr_brigade_cleanup should be used instead
  in 99% of all cases. Maybe worth mentioning apr_brigade_cleanup in this 
context.

- Procesing buckets: I think with mmap enabled a file bucket will morph into
  a mmap bucket and the remaining file bucket. I think the heap bucket will
  only be created if mmap is turned off. But I agree that this possibly 
introduces
  too much complexity to the example and distracts the reader from the important
  point.

- Filtering brigades:
  - Although this is not my opinion I know that there had been some discussions 
in the
past that no examples should be given on how you should *not* do things.
  - There is one word too much (use / consume) in
in contrast, the implementation below will use consume a fixed
  - I guess it should be APR_BRIGADE_INSERT_HEAD(tmpbb, e); instead of
APR_BRIGADE_INSERT_HEAD(tmpbb);

- Maintaining state:
  - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of
f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ?
  - IMHO ap_save_brigade can operate on an existing brigade. So this can
be a brigade created once per invocation. I agree with the warning that
especially on PIPE buckets ap_save_brigade can consume quite a lot of
memory. Plus ap_save_brigade does a blocking read on the bucket.

- Ten rules for output filters
  - 1. See my comments above
  - 6. It needs to be apr_brigade_cleanup instead of apr_brigade_clear.

 
 I'd also like to add a simple buffering filter which does things right 
 and can be used as a reference; all other in-tree filters are either too 
 complicated (filters/*, http/* etc) or too awful (experimental/*).  Any 
 objections?

Sounds good to me.  Just did not have the time to review it so far.

Regards

Rüdiger




Re: [RFC] Guide to writing output filters

2007-03-17 Thread Niklas Edmundsson

On Sat, 17 Mar 2007, Ruediger Pluem wrote:


On 03/16/2007 11:55 PM, Joe Orton wrote:

http://people.apache.org/~jorton/output-filters.html

How does this look?  Anything missed out, anything that doesn't make
sense?  I think this covers most of the major problems in output filters
which keep coming up.


Thanks for doing this. It looks very good to me, especially as it gives
us a set of rules and best practises even though I think there might be
a discussion on the details.


As a not-so-clued person on httpd internals I have to whole-hearedly 
agree and add a Bravo! to this effort.


httpd is seriously lacking on the devel-docco-front, meaning that the 
little in-tree documentation and examples that exists is generally 
outdated or broken, and out-of-tree docco doesn't count in this 
regard. This is truly a step in the right direction.


Now, if only someone clued could have a go at the existing pages that 
says this should be improved/updated/written life would be bliss :)


And yes, I know that writing documentation is a drag. However, in the 
long run it pays off. Really.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Is virus a 'micro' organism?
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=