Re: [RFC] Guide to writing output filters
On Mon, Mar 19, 2007 at 10:15:03PM +0100, Ruediger Pluem wrote: [on passing empty brigades and mod_cache] Once we detect that we have a fresh content entity in the cache, the quick handler of mod_cache starts the filter chain by calling ap_pass_brigade(r-output_filters, out); where out is an empty brigade created by the quick handler. The brigade gets filled with the cached content by the CACHE_OUT filter down in the chain. I think that was simply abuse of the filter interface, and fixing mod_cache was correct. If a filter was passed a non-empty brigade, it may legitimately buffer the contents of that brigade without passing it on down the chain. So to then require explicitly that an *empty* brigade *must* be passed down makes little sense IMO. ... - Procesing buckets: I think with mmap enabled a file bucket will morph into a mmap bucket and the remaining file bucket. I think the heap bucket will only be created if mmap is turned off. But I agree that this possibly introduces too much complexity to the example and distracts the reader from the important point. Yeah. It's an implementation detail, and the risk is that if it gets documented, people will rely on it somehow. Why can't you rely on this? Isn't that the publicly defined behaviour of a file bucket when MMAP is enabled? Simply that there's nothing in the definition of a FILE bucket which states it will always morph into MMAP under conditions X, Y, Z. joe
Re: [RFC] Guide to writing output filters
Thanks a lot for the review! On Sat, Mar 17, 2007 at 04:30:24PM +0100, Ruediger Pluem wrote: Some comments from my side: - Passing empty brigades: While I agree that a filter should never create an empty brigade and pass it down the chain, I think it actually should pass an empty brigade down the chain if it gets one passed instead of simply returning. For reasons why see http://issues.apache.org/bugzilla/show_bug.cgi?id=40090 http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED] This is interesting, but I don't really understand how the problem described happens. How is ap_finalize_request_protocol() getting called before a response has been sent? (there is some symmetry between that problem and PR 38014) - apr_brigade_destroy: I think the danger about using apr_brigade_destroy is to call it and *reuse* this brigade afterwards, because in this case the removed pool cleanup can cause a memory leak. Right - and that should be true of every brigade which a filter uses (the passed-in brigade should not be destroyed; any brigades the filter itself creates *must* be long-lived and re-used). So there's really no case in which to recommend use of apr_brigade_destroy() for a filter. It's almost always safer to use _cleanup. I'll add a note about using _cleanup() to this section. - Procesing buckets: I think with mmap enabled a file bucket will morph into a mmap bucket and the remaining file bucket. I think the heap bucket will only be created if mmap is turned off. But I agree that this possibly introduces too much complexity to the example and distracts the reader from the important point. Yeah. It's an implementation detail, and the risk is that if it gets documented, people will rely on it somehow. - Filtering brigades: - Although this is not my opinion I know that there had been some discussions in the past that no examples should be given on how you should *not* do things. I was wary of doing this too, but it's such a common mistake that I thought it to explain explicitly why it's bad. - Maintaining state: - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ? I vaguely prefer sizeof without the brackets where valid ;) - IMHO ap_save_brigade can operate on an existing brigade. So this can be a brigade created once per invocation. I agree with the warning that especially on PIPE buckets ap_save_brigade can consume quite a lot of memory. Plus ap_save_brigade does a blocking read on the bucket. Good points, I'll add some text. joe
Re: [RFC] Guide to writing output filters
Again, thanks for the review! On Sat, Mar 17, 2007 at 07:44:02AM -0400, Jeff Trawick wrote: I guess I'm confused about the up/down direction convention for output filters? I thought passing the next output filter is down and returning to the prior input filter is up? My confusion - I always think about filters in terms of the call stack, where ap_pass_brigade passes brigades up the stack. More detail about error handling would be invaluable. Something that has been a thorn in the past has been the two types of status and understanding the limited relationship: apr_status_t as returned by a filter HTTP status as returned by a handler What can a filter do to influence HTTP status (set r-status on first invocation for a response?)? What will be logged if a filter returns non-APR_SUCCESS? Excellent questions for which I lack excellent answers :) Also: is it better to use the elusive HTTP error bucket rather than setting r-status? What should filters do with apr_bucket_read() errors? Must filters _cleanup() the brigade before returning an error? (there is a bug in bugzilla caused by that one IIRC) joe
Re: [RFC] Guide to writing output filters
On 03/19/2007 03:06 PM, Joe Orton wrote: Thanks a lot for the review! On Sat, Mar 17, 2007 at 04:30:24PM +0100, Ruediger Pluem wrote: Some comments from my side: - Passing empty brigades: While I agree that a filter should never create an empty brigade and pass it down the chain, I think it actually should pass an empty brigade down the chain if it gets one passed instead of simply returning. For reasons why see http://issues.apache.org/bugzilla/show_bug.cgi?id=40090 http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED] This is interesting, but I don't really understand how the problem described happens. How is ap_finalize_request_protocol() getting called before a response has been sent? (there is some symmetry between that Once we detect that we have a fresh content entity in the cache, the quick handler of mod_cache starts the filter chain by calling ap_pass_brigade(r-output_filters, out); where out is an empty brigade created by the quick handler. The brigade gets filled with the cached content by the CACHE_OUT filter down in the chain. In the original code path and PR configuration mod_deflate was the first filter in r-output_filters which caused the filter chain to return immediately and before it reached the CACHE_OUT filter (mod_deflate detected an empty brigade and returned APR_SUCCESS). This caused the quick handler of mod_cache to be left with OK, which triggers ap_finalize_request_protocol. The correct fix in this situation of cause was to remove all filters from r-output_filters until we hit mod_cache's CACHE_OUT filter, because this is the position in the filter chain where the content went to the cache during caching. problem and PR 38014) I think PR 38014 is somewhat different. The ap_http_filter simply was not completely aware that it could be called in two situations: - During request processing. - After request processing in order to cleanup the connection. - apr_brigade_destroy: I think the danger about using apr_brigade_destroy is to call it and *reuse* this brigade afterwards, because in this case the removed pool cleanup can cause a memory leak. Right - and that should be true of every brigade which a filter uses (the passed-in brigade should not be destroyed; any brigades the filter itself creates *must* be long-lived and re-used). So there's really no case in which to recommend use of apr_brigade_destroy() for a filter. The only possible one from my perspective is on a brigade that was created by the filter as long lived state brigade in the case that the filter either sees an EOS bucket or decides to remove itself from the filter chain for some other reason. But even in this case _cleanup is sufficient. We can only save walking thru the brigades pool cleanup once the pool used for its creation gets destroyed. It's almost always safer to use _cleanup. Agreed. Seeing an apr_brigade_destroy in a filter should ring an alarm bell. I'll add a note about using _cleanup() to this section. - Procesing buckets: I think with mmap enabled a file bucket will morph into a mmap bucket and the remaining file bucket. I think the heap bucket will only be created if mmap is turned off. But I agree that this possibly introduces too much complexity to the example and distracts the reader from the important point. Yeah. It's an implementation detail, and the risk is that if it gets documented, people will rely on it somehow. Why can't you rely on this? Isn't that the publicly defined behaviour of a file bucket when MMAP is enabled? Regards Rüdiger
Re: [RFC] Guide to writing output filters
On 03/16/2007 11:55 PM, Joe Orton wrote: http://people.apache.org/~jorton/output-filters.html How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up. Thanks for doing this. It looks very good to me, especially as it gives us a set of rules and best practises even though I think there might be a discussion on the details. Some comments from my side: - Passing empty brigades: While I agree that a filter should never create an empty brigade and pass it down the chain, I think it actually should pass an empty brigade down the chain if it gets one passed instead of simply returning. For reasons why see http://issues.apache.org/bugzilla/show_bug.cgi?id=40090 http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED] - apr_brigade_destroy: I think the danger about using apr_brigade_destroy is to call it and *reuse* this brigade afterwards, because in this case the removed pool cleanup can cause a memory leak. I think it is ok to call it and never *reuse* it. So the remark of destroyed brigades containg buckets sounds a bit misleading to me. It can only contain buckets if it is reused after it was destroyed. But I agree that especially on passed brigades it is not always possible to find out whether this is reused and as it does not get us actual advantages apr_brigade_cleanup should be used instead in 99% of all cases. Maybe worth mentioning apr_brigade_cleanup in this context. - Procesing buckets: I think with mmap enabled a file bucket will morph into a mmap bucket and the remaining file bucket. I think the heap bucket will only be created if mmap is turned off. But I agree that this possibly introduces too much complexity to the example and distracts the reader from the important point. - Filtering brigades: - Although this is not my opinion I know that there had been some discussions in the past that no examples should be given on how you should *not* do things. - There is one word too much (use / consume) in in contrast, the implementation below will use consume a fixed - I guess it should be APR_BRIGADE_INSERT_HEAD(tmpbb, e); instead of APR_BRIGADE_INSERT_HEAD(tmpbb); - Maintaining state: - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ? - IMHO ap_save_brigade can operate on an existing brigade. So this can be a brigade created once per invocation. I agree with the warning that especially on PIPE buckets ap_save_brigade can consume quite a lot of memory. Plus ap_save_brigade does a blocking read on the bucket. - Ten rules for output filters - 1. See my comments above - 6. It needs to be apr_brigade_cleanup instead of apr_brigade_clear. I'd also like to add a simple buffering filter which does things right and can be used as a reference; all other in-tree filters are either too complicated (filters/*, http/* etc) or too awful (experimental/*). Any objections? Sounds good to me. Just did not have the time to review it so far. Regards Rüdiger
Re: [RFC] Guide to writing output filters
On 3/16/07, Joe Orton [EMAIL PROTECTED] wrote: http://people.apache.org/~jorton/output-filters.html I guess I'm confused about the up/down direction convention for output filters? I thought passing the next output filter is down and returning to the prior input filter is up? a few examples: Generally, all metadata buckets should be passed up the filter chain by an output filter. down the filter chain Filters can create FLUSH buckets and pass these up the filter chain if desired. down the filter chain An output filter should never pass an empty brigade up the filter chain. down the filter chain /* Pass brigade upstream. */ rv = ap_pass_brigade(f-next, tmpbb); /* Pass brigade downstream. */ minor grammatical tweaks: the number of times a filter is invoked for single response for a single response The PIPE bucket type is an example of a bucket type has an indeterminate length; a bucket type which has an indeterminate length Possible confusion: # Output filters which read all the buckets in a brigade must process a fixed number of buckets (or amount of data) at a time, to ensure that memory consumption is not proportional to the size of the content being filtered. This may be unclear, since you're not referring to buckets received on input to the filter but buckets returned by bucket-read (after morphing). Perhaps must process a fixed amount of data at a time is less subject to incorrect interpretation? How does this look? It looks like a great start to me. More detail about error handling would be invaluable. Something that has been a thorn in the past has been the two types of status and understanding the limited relationship: apr_status_t as returned by a filter HTTP status as returned by a handler What can a filter do to influence HTTP status (set r-status on first invocation for a response?)? What will be logged if a filter returns non-APR_SUCCESS?
Re: [RFC] Guide to writing output filters
On Fri, 16 Mar 2007 22:55:21 + Joe Orton [EMAIL PROTECTED] wrote: http://people.apache.org/~jorton/output-filters.html How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up. On the broad picture ... If we have this in the documentation, people will take it as definitive. So it had better be clear about its exact scope: namely, direct manipulation of buckets and brigades. Example: We know that the stdio-style buffered I/O (ap_fwrite et al) is what you need for some filters. People reading our documentation may not know that. To avoid the risk of misleading them, the document should add an explicit note that this is not covered because it falls outside your scope, not because there's anything wrong with it. Or, better, an explicit statement of scope at the beginning, making it clear that there are many aspects of filtering that fall outside the scope of it. Protocol handling, to name one more. I'd also like to add a simple buffering filter which does things right and can be used as a reference; all other in-tree filters are either too complicated (filters/*, http/* etc) or too awful (experimental/*). Any objections? I think I have a better idea. But I think I'd rather hack it up (round tuits permitting) than discuss it here in the abstract. Actually if I'm right, your proposed example will still help, as it'll be a useful contrast. -- Nick Kew Application Development with Apache - the Apache Modules Book http://www.apachetutor.org/
Re: [RFC] Guide to writing output filters
On 03/16/2007 11:55 PM, Joe Orton wrote: http://people.apache.org/~jorton/output-filters.html How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up. Thanks for doing this. It looks very good to me, especially as it gives us a set of rules and best practises even though I think there might be a discussion on the details. Some comments from my side: - Passing empty brigades: While I agree that a filter should never create an empty brigade and pass it down the chain, I think it actually should pass an empty brigade down the chain if it gets one passed instead of simply returning. For reasons why see http://issues.apache.org/bugzilla/show_bug.cgi?id=40090 http://mail-archives.apache.org/mod_mbox/httpd-dev/200607.mbox/[EMAIL PROTECTED] - apr_brigade_destroy: I think the danger about using apr_brigade_destroy is to call it and *reuse* this brigade afterwards, because in this case the removed pool cleanup can cause a memory leak. I think it is ok to call it and never *reuse* it. So the remark of destroyed brigades containg buckets sounds a bit misleading to me. It can only contain buckets if it is reused after it was destroyed. But I agree that especially on passed brigades it is not always possible to find out whether this is reused and as it does not get us actual advantages apr_brigade_cleanup should be used instead in 99% of all cases. Maybe worth mentioning apr_brigade_cleanup in this context. - Procesing buckets: I think with mmap enabled a file bucket will morph into a mmap bucket and the remaining file bucket. I think the heap bucket will only be created if mmap is turned off. But I agree that this possibly introduces too much complexity to the example and distracts the reader from the important point. - Filtering brigades: - Although this is not my opinion I know that there had been some discussions in the past that no examples should be given on how you should *not* do things. - There is one word too much (use / consume) in in contrast, the implementation below will use consume a fixed - I guess it should be APR_BRIGADE_INSERT_HEAD(tmpbb, e); instead of APR_BRIGADE_INSERT_HEAD(tmpbb); - Maintaining state: - f-ctx = state = apr_palloc(sizeof(*state), f-r-pool); instead of f-ctx = state = apr_palloc(sizeof *state, f-r-pool); ? - IMHO ap_save_brigade can operate on an existing brigade. So this can be a brigade created once per invocation. I agree with the warning that especially on PIPE buckets ap_save_brigade can consume quite a lot of memory. Plus ap_save_brigade does a blocking read on the bucket. - Ten rules for output filters - 1. See my comments above - 6. It needs to be apr_brigade_cleanup instead of apr_brigade_clear. I'd also like to add a simple buffering filter which does things right and can be used as a reference; all other in-tree filters are either too complicated (filters/*, http/* etc) or too awful (experimental/*). Any objections? Sounds good to me. Just did not have the time to review it so far. Regards Rüdiger
Re: [RFC] Guide to writing output filters
On Sat, 17 Mar 2007, Ruediger Pluem wrote: On 03/16/2007 11:55 PM, Joe Orton wrote: http://people.apache.org/~jorton/output-filters.html How does this look? Anything missed out, anything that doesn't make sense? I think this covers most of the major problems in output filters which keep coming up. Thanks for doing this. It looks very good to me, especially as it gives us a set of rules and best practises even though I think there might be a discussion on the details. As a not-so-clued person on httpd internals I have to whole-hearedly agree and add a Bravo! to this effort. httpd is seriously lacking on the devel-docco-front, meaning that the little in-tree documentation and examples that exists is generally outdated or broken, and out-of-tree docco doesn't count in this regard. This is truly a step in the right direction. Now, if only someone clued could have a go at the existing pages that says this should be improved/updated/written life would be bliss :) And yes, I know that writing documentation is a drag. However, in the long run it pays off. Really. /Nikke -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED] --- Is virus a 'micro' organism? =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=