RE: [Patch] Async write completion for the full connection filter stack
-----Original Message-----
From: Jim Jagielski [mailto:j...@jagunet.com]
Sent: Monday, 8 September 2014 21:31
To: dev@httpd.apache.org
Subject: Re: [Patch] Async write completion for the full connection filter stack

> Another consideration: We now have the idea of a master and slave
> connection, and maybe something there would also help...
>
> FWIW: I like using an empty bucket conceptually since it should be easy
> and quick to check.

Agreed, but I think that from a design perspective using the empty brigade
gives it a side effect that does not immediately jump out at you,
especially if you are just developing modules.

Taking the idea below further, we might need some kind of advisor API for
filters that tells them how much data they should consume to avoid
buffering too much, and how much they can send down the chain without
ending up in a blocking write. How much buffering is advised could be set
by a configuration directive.

Regards

Rüdiger

> On Sep 8, 2014, at 2:53 PM, Ruediger Pluem rpl...@apache.org wrote:
>
>> Wouldn't it make more sense, instead of using an empty brigade, to
>> create yet another metabucket that signals write completion? It could
>> also contain information on how much data to send down the chain for
>> individual filters if they e.g. send heap or transient buckets.
>> Otherwise how should they know?
>>
>> If you have a filter that has a large file bucket set aside, and it
>> transforms it e.g. to a heap bucket during its processing because it
>> changes the data in it, I guess it doesn't make sense for it to send
>> everything once it gets triggered for write completion, as we would end
>> up in a blocking write in the core filter. But if it knows how much
>> room is left in the core filter buffer, it could try to send just that
>> much and thus avoid blocking writes.
>>
>> And if there is no room left in the buffer, or if what is left is too
>> small for the filter to operate on, the filter could just pass the
>> bucket down the chain; if it ended up in the core output filter, the
>> core output filter would just try to write what it has buffered.
>>
>> Regards
>>
>> Rüdiger
>
> Jim Jagielski wrote:
>
>> Gotcha... +1
>>
>> On Sep 8, 2014, at 11:29 AM, Graham Leggett minf...@sharp.fm wrote:
>>
>>> On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:
>>>
>>>> This is pretty cool... haven't played too much with it, but via
>>>> inspection I like the implementation.
Re: [Patch] Async write completion for the full connection filter stack
On Mon, 2014-09-08 at 17:25 +0200, Graham Leggett wrote:

> Ideally, filters should do this, but generally they don’t:
>
>     /* Do nothing if asked to filter nothing. */
>     if (APR_BRIGADE_EMPTY(bb)) {
>         return ap_pass_brigade(f->next, bb);
>     }

Why on Earth should filters want to do that, as opposed to:

> Some filters, like mod_deflate, do this:
>
>     /* Do nothing if asked to filter nothing. */
>     if (APR_BRIGADE_EMPTY(bb)) {
>         return APR_SUCCESS;
>     }

or similar variants?

> In these cases ap_pass_brigade() is never called, so we detect this by
> keeping a marker that is changed on every call to ap_pass_brigade(). If
> the marker wasn’t changed during the call to the filter, we compensate
> by calling each downstream filter until the marker is changed, or we
> run out of filters.

Yes. The logic is that we call ap_pass_brigade when there's something to
pass. Not when there's nothing: that would just look like superfluous
overhead.

If you have a reason to propagate an immediate event regardless of that
logic, surely that's the business of a FLUSH bucket. Then the question
becomes: is it ever right to absorb (or buffer) and fail to propagate a
FLUSH? You seem instead to be ascribing FLUSH semantics to an empty
brigade!

As a filter developer, it's my business to pass a brigade when:
1) I'm ready to pass data.
2) I encounter EOS, when I must finish up and propagate it.
3) I am explicitly signalled to FLUSH whatever I can.

What am I missing? Do we have a need to refine the FLUSH bucket type?
Maybe an EVENT bucket carrying an event descriptor?

--
Nick Kew
Re: [Patch] Async write completion for the full connection filter stack
On 09 Sep 2014, at 10:58 AM, Nick Kew n...@webthing.com wrote:

>> Ideally, filters should do this, but generally they don’t:
>>
>>     /* Do nothing if asked to filter nothing. */
>>     if (APR_BRIGADE_EMPTY(bb)) {
>>         return ap_pass_brigade(f->next, bb);
>>     }
>
> Why on Earth should filters want to do that, as opposed to:
>
>> Some filters, like mod_deflate, do this:
>>
>>     /* Do nothing if asked to filter nothing. */
>>     if (APR_BRIGADE_EMPTY(bb)) {
>>         return APR_SUCCESS;
>>     }
>
> or similar variants?

Because if they did, the compensation code in ap_pass_brigade() wouldn’t
be necessary.

>> In these cases ap_pass_brigade() is never called, so we detect this by
>> keeping a marker that is changed on every call to ap_pass_brigade(). If
>> the marker wasn’t changed during the call to the filter, we compensate
>> by calling each downstream filter until the marker is changed, or we
>> run out of filters.
>
> Yes. The logic is that we call ap_pass_brigade when there's something to
> pass. Not when there's nothing: that would just look like superfluous
> overhead.
>
> If you have a reason to propagate an immediate event regardless of that
> logic, surely that's the business of a FLUSH bucket. Then the question
> becomes: is it ever right to absorb (or buffer) and fail to propagate a
> FLUSH? You seem instead to be ascribing FLUSH semantics to an empty
> brigade!

To be clear, an empty brigade does _not_ mean flush, not even slightly.
Flush means “stop everything and perform this potentially expensive task
to completion right now”, and is the exact opposite of what we’re trying
to achieve.

> As a filter developer, it's my business to pass a brigade when:
> 1) I'm ready to pass data.
> 2) I encounter EOS, when I must finish up and propagate it.
> 3) I am explicitly signalled to FLUSH whatever I can.
>
> What am I missing? Do we have a need to refine the FLUSH bucket type?
> Maybe an EVENT bucket carrying an event descriptor?

In a synchronous world where it doesn’t matter how long a unit of work
takes, sure.

In an async world, where you need to break up long-running tasks into
short-running ones so that others get a chance to have their data sent in
the same thread, this doesn’t work. Filters need to be able to yield and
set aside data when they’re given too much to process, just like the core
filter can - but right now they can’t, because such a filter will never
get called again: upstream has no data to send.

Regards,
Graham
—
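The marker-based compensation described above can be modelled in a few
lines of self-contained C. This is a toy, not the actual httpd code: all
types and names here (filter, pass_brigade, wake_filters) are invented,
and an int stands in for a brigade.

```c
#include <assert.h>
#include <stddef.h>

#define EMPTY 0  /* stands in for an empty brigade */

static unsigned long pass_marker; /* bumped on every pass_brigade() */

typedef struct filter {
    int (*func)(struct filter *f, int brigade);
    struct filter *next;
    int woken; /* did this filter see the wake-up call? */
} filter;

static int pass_brigade(filter *next, int brigade)
{
    pass_marker++; /* the marker that ap_pass_brigade() would change */
    if (next && next->func) {
        return next->func(next, brigade);
    }
    return 0;
}

/* The "ideal" filter: passes even an empty brigade downstream. */
static int polite_filter(filter *f, int brigade)
{
    f->woken = 1;
    return pass_brigade(f->next, brigade);
}

/* A mod_deflate-style filter: returns early on empty input, so
 * downstream filters would starve without compensation. */
static int swallowing_filter(filter *f, int brigade)
{
    f->woken = 1;
    if (brigade == EMPTY) {
        return 0; /* "APR_SUCCESS": never calls pass_brigade() */
    }
    return pass_brigade(f->next, brigade);
}

/* Wake the whole chain: if a filter did not move the marker, call the
 * next filter down ourselves, until one does or we run out of filters. */
static void wake_filters(filter *chain)
{
    filter *f = chain;
    while (f) {
        unsigned long before = pass_marker;
        f->func(f, EMPTY);
        if (pass_marker != before) {
            break; /* the rest of the chain was reached normally */
        }
        f = f->next; /* compensate: invoke the downstream filter */
    }
}
```

With two swallowing filters ahead of a polite one, a single
wake_filters() call still reaches all three, which is the starvation fix
being discussed.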
Re: [Patch] Async write completion for the full connection filter stack
On 08 Sep 2014, at 8:53 PM, Ruediger Pluem rpl...@apache.org wrote:

> Wouldn't it make more sense, instead of using an empty brigade, to
> create yet another metabucket that signals write completion? It could
> also contain information on how much data to send down the chain for
> individual filters if they e.g. send heap or transient buckets.
> Otherwise how should they know?
>
> If you have a filter that has a large file bucket set aside, and it
> transforms it e.g. to a heap bucket during its processing because it
> changes the data in it, I guess it doesn't make sense for it to send
> everything once it gets triggered for write completion, as we would end
> up in a blocking write in the core filter. But if it knows how much
> room is left in the core filter buffer, it could try to send just that
> much and thus avoid blocking writes.
>
> And if there is no room left in the buffer, or if what is left is too
> small for the filter to operate on, the filter could just pass the
> bucket down the chain; if it ended up in the core output filter, the
> core output filter would just try to write what it has buffered.

I spent a lot of time going down this path of having a dedicated
metabucket, and quickly got bogged down in complexity. The key problem
was “what does a filter actually do when it gets one?” - it was unclear,
and it made my head bleed. That makes life hard for module authors, and
that is bad. As I recall there were also broken filters out there that
only knew about FLUSH and EOS buckets (eg ap_http_chunk_filter()).

The problem we’re trying to solve is one of starvation - no filter can
set aside data for later (except core, via the NULL hack), because there
is no guarantee that it will ever be called again later. You have to
write it now, or potentially write it never.

The start of the solution is to ensure filters aren’t starved: if you
have data in the output filters - and obviously you have no idea which
filters have setaside data - you need a way to wake them all up. The
simplest and least disruptive way is to pass them all an empty brigade;
job done. We’ve got precedent for this - we’ve been sending NULL to the
core filter to achieve the same thing; we want something that works with
any filter.

The second part of the problem is filters biting off more than they can
chew. Example: give mod_ssl a 1GB file bucket and mod_ssl won’t yield
until that entire 1GB file has been sent, for the reason (now solved)
above. The next step to enable write completion is to teach filters like
mod_ssl to yield when handling large quantities of data.

The core filter has an algorithm to yield, including various checks for
flow control and sanity with respect to file handles. If a variant of
this algorithm could be exposed generically and made available to
critical filters like mod_ssl, we’ll crack write completion.

Regards,
Graham
—
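The "yield instead of biting off more than you can chew" idea can be
sketched as a per-invocation byte budget: handle at most N bytes, set the
remainder aside, and report whether a re-invocation (on the next
write-completion wake-up) is needed. This is a self-contained toy under
assumed names; the real core filter logic also accounts for file buckets
and flow control.

```c
#include <assert.h>
#include <stddef.h>

#define YIELD_BUDGET 16384 /* bytes per invocation: an assumed knob */

typedef struct {
    size_t len;  /* bytes still pending (set aside between calls) */
    size_t sent; /* total bytes "written" downstream so far */
} toy_filter_ctx;

/* Process at most YIELD_BUDGET bytes, then yield.
 * Returns 1 if setaside data remains (caller must invoke us again on
 * the next wake-up), 0 once the pending data is drained. */
static int toy_filter_run(toy_filter_ctx *ctx)
{
    size_t chunk = ctx->len < YIELD_BUDGET ? ctx->len : YIELD_BUDGET;

    /* "send" chunk bytes downstream (omitted), set aside the rest */
    ctx->sent += chunk;
    ctx->len  -= chunk;

    return ctx->len > 0;
}
```

A 40000-byte payload then takes three short invocations instead of one
long blocking one, which is exactly the behaviour wanted from filters
like mod_ssl above.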
Re: [Patch] Async write completion for the full connection filter stack
-----Original Message-----
From: Graham Leggett [mailto:minf...@sharp.fm]
Sent: Tuesday, 9 September 2014 17:45
To: dev@httpd.apache.org
Subject: Re: [Patch] Async write completion for the full connection filter stack

> On 08 Sep 2014, at 8:53 PM, Ruediger Pluem rpl...@apache.org wrote:
>
>> Wouldn't it make more sense, instead of using an empty brigade, to
>> create yet another metabucket that signals write completion? It could
>> also contain information on how much data to send down the chain for
>> individual filters if they e.g. send heap or transient buckets.
>> Otherwise how should they know?
>>
>> If you have a filter that has a large file bucket set aside, and it
>> transforms it e.g. to a heap bucket during its processing because it
>> changes the data in it, I guess it doesn't make sense for it to send
>> everything once it gets triggered for write completion, as we would
>> end up in a blocking write in the core filter. But if it knows how
>> much room is left in the core filter buffer, it could try to send just
>> that much and thus avoid blocking writes.
>>
>> And if there is no room left in the buffer, or if what is left is too
>> small for the filter to operate on, the filter could just pass the
>> bucket down the chain; if it ended up in the core output filter, the
>> core output filter would just try to write what it has buffered.
>
> I spent a lot of time going down this path of having a dedicated
> metabucket, and quickly got bogged down in complexity. The key problem
> was "what does a filter actually do when it gets one?" - it was unclear

Don't we have the same problem with an empty brigade? Some filters are
not going to handle it as we expect. Hence the additional logic in
ap_pass_brigade. I guess the minimum behavior we need from every filter
is to ignore it and pass it on.

> and it made my head bleed. That makes life hard for module authors and
> that is bad. As I recall there were also broken filters out there that
> only knew about FLUSH and EOS buckets (eg ap_http_chunk_filter()).

We already have additional metabuckets like error buckets or EOR. So I
don't see an issue with creating a new one. Any filter not passing on a
meta bucket that it does not understand - or at least trying to process
it - is simply broken.

> The problem we're trying to solve is one of starvation - no filter can
> set aside data for later (except core, via the NULL hack), because
> there is no guarantee that it will ever be called again later. You have
> to write it now, or potentially write it never.
>
> The start of the solution is to ensure filters aren't starved: if you
> have data in the output filters - and obviously you have no idea which
> filters have setaside data - you need a way to wake them all up. The
> simplest and least disruptive way is to pass them all an empty brigade,
> job done. We've got precedent for this - we've been sending NULL to the
> core filter to achieve the same

But this is *our* filter, and that will not hit any custom filters. So we
can play this kind of hacky game there.

> thing, we want something that works with any filter.

Yes, and this is the reason why I still believe a meta bucket is better.

> The second part of the problem is filters biting off more than they can
> chew. Example: give mod_ssl a 1GB file bucket and mod_ssl won't yield
> until that entire 1GB file has been sent, for the reason (now solved)
> above. The next step to enable write completion is to teach filters
> like mod_ssl to yield when handling large quantities of data.
>
> The core filter has an algorithm to yield, including various checks for
> flow control and sanity with respect to file handles. If a variant of
> this algorithm could be exposed generically and made available to
> critical filters like mod_ssl, we'll crack write completion.

See my other post. I proposed some kind of advisor API that tells a
filter how much it should write to avoid buffering too much and consuming
too much memory, and how much it could write to likely avoid a blocking
write. As this will not always be accurate, I call it an advisor API.

Regards

Rüdiger
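The advisor API proposed in this thread might look something like the
following self-contained sketch. Everything here is hypothetical: no such
API exists in httpd, and the names (write_advice, advise_write), the
halving heuristic, and the buffer limit are all invented for
illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t should_write; /* advised amount: keeps buffering bounded */
    size_t can_write;    /* amount unlikely to cause a blocking write */
} write_advice;

/* buffered:     bytes already queued in the core output filter.
 * buffer_limit: configured ceiling, e.g. set by a directive as the
 *               thread suggests. */
static write_advice advise_write(size_t buffered, size_t buffer_limit)
{
    write_advice a;

    /* room left before the core filter would have to block */
    a.can_write = buffered < buffer_limit ? buffer_limit - buffered : 0;

    /* be more conservative about what a filter *should* write: fill
     * only half the remaining room, so other work on the event loop
     * gets a turn (the advice is explicitly allowed to be inexact) */
    a.should_write = a.can_write / 2;

    return a;
}
```

A filter holding a large setaside brigade would consult this before each
pass: write up to should_write bytes, and simply return (rather than
block) when the advice drops to zero.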