Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-07 Thread Graham Leggett
On 05 Oct 2015, at 5:13 PM, Eric Covener  wrote:

> It seems like
> extending that to handlers (and implicitly, request filters) is more
> the  "can't be done" part.

Having had a chance to look at the handlers in more detail, I suspect the 
problem is that async support wasn’t extended far enough. You can return 
SUSPENDED from a handler, but only on the main request. If you’re in an 
internal redirect or a subrequest, SUSPENDED is treated as an error.

Request filters are already asynchronous; they aren’t a problem. A caller can 
only take advantage of the async nature of the input filters if the caller 
itself is asynchronous.

It should be possible to use an async input filter from an async output filter 
simply by changing the sense of the connection and letting the core deal with 
it. This is next for me: I plan to fix any bugs that stop this from being possible.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Yann Ylavic
On Sun, Oct 4, 2015 at 1:40 PM, Graham Leggett  wrote:
>
> The next bit of this is the ability to safely suspend a filter.
>
> A suspended filter is a filter that is waiting for some external event, a 
> callback of some kind, some timer that it might have set, and in the mean 
> time doesn’t want to be kicked. From the perspective of 
> ap_filter_should_yield() the filter has data in it and a well behaved 
> upstream filter should wait before sending anything further.
>
> The idea behind suspending filters over and above connections is that if two 
> separate filters want to suspend a connection, what stops them from stomping 
> on one another?

Why suspend a filter rather than the calling handler (on EAGAIN/EWOULDBLOCK)?

>
> I am thinking of the sharp end of mod_proxy_http (and friends) becoming an 
> async filter that suspends or not based on whether data is available on the 
> other connection. In the process, mod_proxy becomes asynchronous.

It seems that suspending the handlers would avoid rewriting them as filters.
Or is your plan to go for something like (mod_)serf?

Regards,
Yann.


Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Jim Jagielski
I was wondering: how much of what we want to do would be easier
if we decided to make serf and/or mod_lua and/or libmill
dependencies for 2.6/3.0? We could leverage Lua (or libmill)
as libs implementing some level of coroutines and serf for
a more organic bucket impl.

Of course, this could also result in the issue we currently
have w/ APR in which APR itself lags behind what httpd itself
needs and we end up add stuff to httpd anyway...

H...


Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Graham Leggett
On 05 Oct 2015, at 1:13 PM, Yann Ylavic  wrote:

> Why suspend a filter rather than the calling handler (on EAGAIN/EWOULDBLOCK)?

Primarily because there are many filters, but only one handler. What I have in 
mind is combining an async bandwidth-limitation filter with an async handler. 
Right now you can’t do that, and you should be able to; that’s the point of the 
filter stack.

> It seems that suspending the handlers would avoid rewriting them as filters.
> Or is your plan to go for something like (mod_)serf?

Suspending the handler is an option, but it’s less than ideal if the handler 
suspends the connection while there is data buffered in the filter stack. In 
this case data writes will be delayed until the source of the data pushes more 
data through after the connection is woken up again. Ideally outbound data 
should be pulled out the stack, not pushed into it.

Regards,
Graham
—



Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Graham Leggett
On 05 Oct 2015, at 3:41 PM, Jim Jagielski  wrote:

> Of course, this could also result in the issue we currently
> have w/ APR in which APR itself lags behind what httpd itself
> needs and we end up add stuff to httpd anyway...
> 
> H...

This was my approach:

I wanted the moon on a stick.

What exactly constituted a moon on a stick? I wanted all of the following:

- I wanted to make httpd asynchronous, and decisively allow httpd to join the 
list of servers that can support hundreds of thousands of concurrent requests 
across the board, not just as a special case for specific content (files).

- I didn’t want to wait. This means I wanted whatever change I made to be 
backport-able to httpd v2.4. I don’t care to wait 5 years for httpd v2.6 to 
finally appear as the default version in stable OS distros.

- I wanted to work with what was already there. That meant making the existing 
ap_filter API and the existing buckets work with backwards compatible changes 
to httpd and no changes to apr/apr-util.

- I wanted a mechanism that was not brittle. This is probably the hardest part 
of async programming: how do you prevent something spinning or hanging? When 
the new mechanism is used incorrectly, the thread just becomes synchronous 
until the data is written and everything eventually works anyway, which is 
existing behaviour and not the end of the world.

I am aware that some people have said that it couldn’t be done in httpd’s 
design. I figured I would test that hypothesis by trying anyway; the worst that 
could happen was that they were right, and the server would at least be improved 
as best it could. I believe however that there is a good chance the moon on a 
stick has been achieved.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Eric Covener
On Mon, Oct 5, 2015 at 10:37 AM, Graham Leggett  wrote:
> On 05 Oct 2015, at 4:25 PM, Eric Covener  wrote:
>
>> If content/request filters will now sometimes run on a different
>> thread, is this a special kind of major MMN bump?
>
>
> Content/request filters won’t run on a different thread, no.
>
Sorry, dunno how I got turned around on that -- clear from the first email.


Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Eric Covener
On Sun, Sep 7, 2014 at 9:31 PM, Graham Leggett  wrote:
> The test suite passes with this patch, I need some more eyeballs to verify I 
> didn’t miss anything.

If content/request filters will now sometimes run on a different
thread, is this a special kind of major MMN bump?


Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Eric Covener
On Mon, Oct 5, 2015 at 10:57 AM, Graham Leggett  wrote:
>
> What exactly constituted a moon on a stick? I wanted all of the following:
>
> - I wanted to make httpd asynchronous, and decisively allow httpd to join the 
> list of servers that can support hundreds of thousands of concurrent requests 
> across the board, not just as a special case for specific content (files).
>
> - I didn’t want to wait. This means I wanted whatever change I made to be 
> backport-able to httpd v2.4. I don’t care to wait 5 years for httpd v2.6 to 
> finally appear as the default version in stable OS distros.
>
> - I wanted to work with what was already there. That meant making the 
> existing ap_filter API and the existing buckets work with backwards 
> compatible changes to httpd and no changes to apr/apr-util.
>
> - I wanted a mechanism that was not brittle. This is probably the hardest 
> part of async programming, how do you prevent something spinning or hanging? 
> When the new mechanism is used wrong the thread just becomes synchronous 
> until the data is written and eventually works anyway, which is existing 
> behaviour and not the end of the world.
>
> I am aware that some people have said that it couldn’t be done in httpd’s 
> design. I figured let me test that hypothesis by trying anyway, the worst 
> that could happen was that they were right but the server could at least be 
> improved as best it could. I believe however that there is a good chance the 
> moon on a stick has been achieved.

Is it that common for so much of the server to be tied up in write
completion, or just a very big problem for some systems? Most of my
experience is the opposite problem -- slow (java!) backends rather
than clients not keeping up with the firehose. It seems like
extending that to handlers (and implicitly, request filters) is more
the "can't be done" part.


Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Graham Leggett
On 05 Oct 2015, at 4:25 PM, Eric Covener  wrote:

> If content/request filters will now sometimes run on a different
> thread, is this a special kind of major MMN bump?


Content/request filters won’t run on a different thread, no.

The only filters that will do so are filters that have been updated to use the 
new async API, and right now that’s mod_ssl and the core network filter (which 
always had this capability).

If it were to become a problem, the whole patch can be disabled by forcing 
ap_filter_should_yield() to return false. I don’t believe we need a switch at 
this stage, but it is there should it be needed.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2015-10-05 Thread Jim Jagielski

> On Oct 4, 2015, at 9:59 AM, Tim Bannister  wrote:
> 
> On 4 Oct 2015, at 12:40, Graham Leggett  wrote:
>> 
>> The next bit of this is the ability to safely suspend a filter.
> …
>> I am thinking of the sharp end of mod_proxy_http (and friends) becoming an 
>> async filter that suspends or not based on whether data is available on the 
>> other connection. In the process, mod_proxy becomes asynchronous.
> 
> Also super cool mojo.

++1!




Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Graham Leggett
On 05 Oct 2015, at 5:13 PM, Eric Covener  wrote:

> Is it that common for so much of the server to be tied up in write
> completion, or just a very big problem for some systems? Most of my
> experience is the opposite  problem --  slow (java!) backends rather
> than clients not keeping up with the firehose.

The server is not in write completion nearly often enough. Until now mod_ssl 
couldn’t enter write completion at all, because it had no way to yield without 
writing the entire response successfully to the network. As soon as you added 
SSL, the server became synchronous.

The key problem to solve was the “mod_cache problem”. How does a backend write 
into mod_cache at full speed and then go away as soon as possible, while a slow 
client eventually reads the response? We can now (soon, when mod_cache is 
taught to be async like mod_ssl is) do that, and that is very cool.

The slow clients that become a killer are those on dodgy mobile phone 
connections, which end up occupying a slot for far longer than they should. If 
that slot is tied through via the proxy to a slow java app, that sucks even more.

>  It seems like
> extending that to handlers (and implicitly, request filters) is more
> the  "can't be done" part.

I’ve been working from right to left, from the network filter backwards to the 
handlers, and the solution I’ve explored is “if it can’t be done in a handler, 
make it possible to be done in a filter instead”.

The idea is that the handler sets up the filter stack, and then sends down some 
data to kick things off (or no data like mod_cache does). At that point the 
handler exits and we’re now in write completion. The filter now (soon) has the 
power to suspend the connection if we’re waiting for someone else (like a 
connection from the proxy, a timer callback, whatever), and also has the power 
to switch the “sense” of a connection from a write to a read. So the output 
filter might switch the sense to read and then leave, and then when the read 
event is triggered and we’re kicked, try to do a read on the input filter stack.

I suspect that a handler can do the same thing right now: switch the sense to 
read and then suspend itself. On the next read event, the handler will be woken 
up, and off we go, ready to read once. Rinse, repeat until we’re done reading.

Regards,
Graham
—



Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Jim Jagielski
Thx (somewhat behind) :)

> On Oct 5, 2015, at 12:24 PM, Graham Leggett  wrote:
> 
> On 05 Oct 2015, at 6:10 PM, Jim Jagielski  wrote:
> 
>> Is there any reason to not fold this into trunk and start playing
>> around?
> 
> I committed it yesterday:
> 
> http://svn.apache.org/r1706669
> http://svn.apache.org/r1706670
> 
> Regards,
> Graham
> —
> 



Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Graham Leggett
On 05 Oct 2015, at 6:10 PM, Jim Jagielski  wrote:

> Is there any reason to not fold this into trunk and start playing
> around?

I committed it yesterday:

http://svn.apache.org/r1706669
http://svn.apache.org/r1706670

Regards,
Graham
—



Re: Serf and mod_lua and libmill (Was: Re: [Patch] Async write completion for the full connection filter stack)

2015-10-05 Thread Jim Jagielski
Achieved!!

Once again: this is some super cool mojo!!!

> On Oct 5, 2015, at 10:57 AM, Graham Leggett  wrote:
> 
> On 05 Oct 2015, at 3:41 PM, Jim Jagielski  wrote:
> 
>> Of course, this could also result in the issue we currently
>> have w/ APR in which APR itself lags behind what httpd itself
>> needs and we end up add stuff to httpd anyway...
>> 
>> H...
> 
> This was my approach:
> 
> I wanted the moon on a stick.
> 
> What exactly constituted a moon on a stick? I wanted all of the following:
> 
> - I wanted to make httpd asynchronous, and decisively allow httpd to join the 
> list of servers that can support hundreds of thousands of concurrent requests 
> across the board, not just as a special case for specific content (files).
> 
> - I didn’t want to wait. This means I wanted whatever change I made to be 
> backport-able to httpd v2.4. I don’t care to wait 5 years for httpd v2.6 to 
> finally to appear as the default version in stable OS distros.
> 
> - I wanted to work with what was already there. That meant making the 
> existing ap_filter API and the existing buckets work with backwards 
> compatible changes to httpd and no changes to apr/apr-util.
> 
> - I wanted a mechanism that was not brittle. This is probably the hardest 
> part of async programming, how do you prevent something spinning or hanging? 
> When the new mechanism is used wrong the thread just becomes synchronous 
> until the data is written and eventually works anyway, which is existing 
> behaviour and not the end of the world.
> 
> I am aware that some people have said that it couldn’t be done in httpd’s 
> design. I figured let me test that hypothesis by trying anyway, the worst 
> that could happen was that they were right but the server could at least be 
> improved as best it could. I believe however that there is a good chance the 
> moon on a stick has been achieved.
> 
> Regards,
> Graham
> —
> 



Re: [Patch] Async write completion for the full connection filter stack

2015-10-04 Thread Graham Leggett
On 28 Sep 2015, at 2:11 PM, Jim Jagielski  wrote:

> This is super cool magic mojo.

The next bit of this is the ability to safely suspend a filter.

A suspended filter is a filter that is waiting for some external event, a 
callback of some kind, some timer that it might have set, and in the mean time 
doesn’t want to be kicked. From the perspective of ap_filter_should_yield() the 
filter has data in it and a well behaved upstream filter should wait before 
sending anything further.

The idea behind suspending filters over and above connections is that if two 
separate filters want to suspend a connection, what stops them from stomping on 
one another?

Patch to follow.

I am thinking of the sharp end of mod_proxy_http (and friends) becoming an 
async filter that suspends or not based on whether data is available on the 
other connection. In the process, mod_proxy becomes asynchronous.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2015-10-04 Thread Tim Bannister
On 4 Oct 2015, at 12:40, Graham Leggett  wrote:
> 
> The next bit of this is the ability to safely suspend a filter.
…
> I am thinking of the sharp end of mod_proxy_http (and friends) becoming an 
> async filter that suspends or not based on whether data is available on the 
> other connection. In the process, mod_proxy becomes asynchronous.

Also super cool mojo.

-- 
Tim Bannister – is...@c8h10n4o2.org.uk



Re: [Patch] Async write completion for the full connection filter stack

2015-10-03 Thread Graham Leggett
On 28 Sep 2015, at 12:18 PM, Graham Leggett  wrote:

> Well, cracked the async problem for all file buckets and non-morphing 
> buckets, morphing buckets will currently still be flushed.
> 
> The solution to that seems to be to try to keep a morphing bucket from reaching 
> the connection filters. A morphing bucket can safely be set aside in a 
> request filter without any problems with pools. If a filter existed at the 
> end of the request filters that detected morphing buckets and read them 
> before sending them downstream, that filter could safely set aside the rest 
> of the morphing bucket without being forced to read the whole thing in (and 
> thus block). If the filter received a flush the whole thing gets sent and we 
> block, but that’s what flush means so that’s fine.
> 
> Patch to follow.

Here it is.

Regards,
Graham
--


httpd-async-fullstack-ssl7.patch
Description: Binary data




Re: [Patch] Async write completion for the full connection filter stack

2015-09-28 Thread Jim Jagielski
This is super cool magic mojo.

> On Sep 27, 2015, at 2:41 PM, Graham Leggett  wrote:
> 
> Hi all,
> 
> I think I have cracked the async problem for both request and connection 
> output filters with this patch.
> 
> It provides three new functions:
> 
> - ap_filter_reinstate_brigade() - Used at the start of a filter, brigades 
> that were set aside earlier are reinstated for sending.
> 
> - ap_filter_should_yield() - Returns true if downstream filters have set 
> aside data. A filter would typically respond by setting aside the data it is 
> working with and returning in the expectation of being called again.
> 
> - ap_filter_setaside_brigade() - Used at the end of a filter, any brigades 
> that were not processed can be set aside to continue the job when called 
> later.
> 
> The magic happens in the MPM itself. The first time 
> ap_filter_setaside_brigade() is called the filter is added to a hashtable and 
> a cleanup is registered to have the filter removed when the request and/or 
> connection is cleared. The MPM iterates through this hashtable, and sends 
> empty brigades to any filter with setaside data.
> 
> Key to this technique is that existing filters remain unaffected - in the 
> absence of any changes to a filter the whole brigade will be processed and 
> sent downstream, and existing behaviour is maintained. Same with FLUSH 
> buckets - as expected, flush behaviour remains unchanged.
> 
> If however the filters in the chain are able to setaside buckets, they can 
> defer themselves to the write completion phase which in turn can take full 
> advantage of the event MPM. These filters will be expected to handle an empty 
> brigade to “kick” them back into life and continue the writing of their 
> setaside data. As soon as their setaside brigade becomes empty the kicks then 
> stop. All filters with setaside data get kicked exactly once, so none of the 
> filters should get starved. Filters that are removed from the stack get their 
> setaside emptied, and so become ineligible for kicks. When the pool cleanup 
> gets triggered, the filter is permanently removed from the connection and 
> disappears.
> 
> Over and above the network filter the first filter to be modified is mod_ssl, 
> and this will allow files served over SSL to take advantage of write 
> completion. Another filter that will benefit from using the above calls is 
> mod_deflate.
> 
> I have an additional goal of adding an ap_filter_suspend() method that will 
> allow us to suspend a filter for a given period of time or until some 
> callback is triggered, but that will be a separate patch.
> 
> Regards,
> Graham
> --
> 



Re: [Patch] Async write completion for the full connection filter stack

2015-09-28 Thread Graham Leggett
On 27 Sep 2015, at 8:41 PM, Graham Leggett  wrote:

> I think I have cracked the async problem for both request and connection 
> output filters with this patch.

Well, cracked the async problem for all file buckets and non-morphing buckets, 
morphing buckets will currently still be flushed.

The solution to that seems to be to try to keep a morphing bucket from reaching 
the connection filters. A morphing bucket can safely be set aside in a request 
filter without any problems with pools. If a filter existed at the end of the 
request filters that detected morphing buckets and read them before sending 
them downstream, that filter could safely set aside the rest of the morphing 
bucket without being forced to read the whole thing in (and thus block). If the 
filter received a flush the whole thing gets sent and we block, but that’s what 
flush means so that’s fine.

Patch to follow.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2015-09-27 Thread Graham Leggett
Hi all,

I think I have cracked the async problem for both request and connection output 
filters with this patch.

It provides three new functions:

- ap_filter_reinstate_brigade() - Used at the start of a filter, brigades that 
were set aside earlier are reinstated for sending.

- ap_filter_should_yield() - Returns true if downstream filters have set aside 
data. A filter would typically respond by setting aside the data it is working 
with and returning in the expectation of being called again.

- ap_filter_setaside_brigade() - Used at the end of a filter, any brigades that 
were not processed can be set aside to continue the job when called later.

The magic happens in the MPM itself. The first time 
ap_filter_setaside_brigade() is called the filter is added to a hashtable and a 
cleanup is registered to have the filter removed when the request and/or 
connection is cleared. The MPM iterates through this hashtable, and sends empty 
brigades to any filter with setaside data.

Key to this technique is that existing filters remain unaffected - in the 
absence of any changes to a filter the whole brigade will be processed and sent 
downstream, and existing behaviour is maintained. Same with FLUSH buckets - as 
expected, flush behaviour remains unchanged.

If however the filters in the chain are able to setaside buckets, they can 
defer themselves to the write completion phase which in turn can take full 
advantage of the event MPM. These filters will be expected to handle an empty 
brigade to “kick” them back into life and continue the writing of their 
setaside data. As soon as their setaside brigade becomes empty the kicks then 
stop. All filters with setaside data get kicked exactly once, so none of the 
filters should get starved. Filters that are removed from the stack get their 
setaside emptied, and so become ineligible for kicks. When the pool cleanup 
gets triggered, the filter is permanently removed from the connection and 
disappears.

Over and above the network filter the first filter to be modified is mod_ssl, 
and this will allow files served over SSL to take advantage of write 
completion. Another filter that will benefit from using the above calls is 
mod_deflate.

I have an additional goal of adding an ap_filter_suspend() method that will 
allow us to suspend a filter for a given period of time or until some callback 
is triggered, but that will be a separate patch.

Regards,
Graham
--


httpd-async-fullstack-ssl6.patch
Description: Binary data


RE: [Patch] Async write completion for the full connection filter stack

2014-09-15 Thread Plüm, Rüdiger, Vodafone Group


> -----Original Message-----
> From: Yann Ylavic [mailto:ylavic@gmail.com]
> Sent: Sonntag, 14. September 2014 22:19
> To: httpd
> Subject: Re: [Patch] Async write completion for the full connection filter
> stack
>
> On Sat, Sep 13, 2014 at 9:23 PM, Ruediger Pluem rpl...@apache.org wrote:
>
>> How about to require that the caller of ap_filter_setaside_brigade just
>> hands over a non NULL bucket_brigade as
>> buffered_bb (so changing apr_bucket_brigade ** to apr_bucket_brigade *)
>> and that it should handle *deferred_write_pool
>> as opaque structure and that it should not allocate from it?
>
> Can that be enforced by the API as I proposed?

The opaque structure? Likely.

Regards

Rüdiger



Re: [Patch] Async write completion for the full connection filter stack

2014-09-15 Thread Graham Leggett
On 13 Sep 2014, at 9:23 PM, Ruediger Pluem rpl...@apache.org wrote:

> How about to require that the caller of ap_filter_setaside_brigade just hands 
> over a non NULL bucket_brigade as
> buffered_bb (so changing apr_bucket_brigade ** to apr_bucket_brigade *) and 
> that it should handle *deferred_write_pool
> as opaque structure and that it should not allocate from it? Or we request 
> the caller to provide a non NULL
> deferred_write_pool as well (so changing apr_pool_t ** to apr_pool_t *) and 
> warn the caller that the pool might be
> cleared during the call of ap_filter_setaside_brigade. Hence solving all 
> lifetime issues would be in the responsibility
> of the caller.

There is an issue I hadn’t considered - the effect of calling 
ap_remove_output_filter().

If we remove the filter with buckets having been set aside, the 
c->data_in_output_filters will remain unchanged and off-by-one, leading to a 
spin (it will never count back down to zero). This is a nasty bug to have to 
chase down.

In the attached patch I have investigated adding both buffered_bb and the 
deferred_write_pool to ap_filter_t, and having ap_remove_output_filter() “do 
the right thing” when called. Both of these can be optionally set by the 
caller, but are otherwise not handled by the caller; the filter API worries 
about cleaning up, and about ensuring that c->data_in_output_filters is 
always accurate.

I have also introduced an optimisation: if this is a request filter, we use 
r->pool to set aside buckets, while if we’re a connection filter we use the 
deferred write pool.

Looks like a similar thing needs to be done with input filters.

Regards,
Graham
—


httpd-async-fullstack-ssl4.patch
Description: Binary data


Re: [Patch] Async write completion for the full connection filter stack

2014-09-15 Thread Graham Leggett
On 15 Sep 2014, at 2:13 PM, Graham Leggett minf...@sharp.fm wrote:

> There is an issue I hadn’t considered - the effect of calling 
> ap_remove_output_filter().
>
> If we remove the filter with buckets having been set aside, the 
> c->data_in_output_filters will remain unchanged and off-by-one, leading to a 
> spin (it will never count back down to zero). This is a nasty bug to have to 
> chase down.
>
> In the attached patch I have investigated adding both buffered_bb and the 
> deferred_write_pool to ap_filter_t, and having ap_remove_output_filter() “do 
> the right thing” when called. Both of these can be optionally set by the 
> caller, but are otherwise not handled by the caller, the filter API worries 
> about cleaning up, and worries about ensuring that c->data_in_output_filters 
> is always accurate.
>
> I have also introduced an optimisation where if this is a request filter, we 
> use r->pool to setaside buckets, while if we’re a connection filter we use 
> the deferred write pool.
>
> Looks like a similar thing needs to be done with input filters.

Another refinement - deferred_write_pool is now just deferred_pool, and we only 
manipulate the data_in_output_filters when removing an output filter.

Regards,
Graham
—


httpd-async-fullstack-ssl5.patch
Description: Binary data


Re: [Patch] Async write completion for the full connection filter stack

2014-09-14 Thread Yann Ylavic
On Sat, Sep 13, 2014 at 7:03 PM, Graham Leggett minf...@sharp.fm wrote:
> On 12 Sep 2014, at 4:57 PM, Yann Ylavic ylavic@gmail.com wrote:
>
>> Makes sense, however ap_core_output_filter() and
>> ssl_io_filter_output() *know* that their buffered_bb does not contain
>> such bucket (exclusively filled by ap_filter_setaside_brigade()), in
>> that the original code (still in svn) is more optimal since it does
>> not walk through the same brigade multiple times (doing an immediate
>> nonblocking write with what it has and returning).
>
> I suppose if we clearly document the behaviour it should be fine. We already
> force the return of should_write if an empty brigade was passed in (as in, “no
> more coming for now, just write it”).

If an empty brigade is given but buffered_bb (controlled and hence
possibly mangled by the user) contains a flush bucket, we will end up
writing up to this bucket in blocking mode and the remaining (if any)
in nonblocking, while the original code does it all nonblocking
(because it knows its buffered_bb is ok).


>> To avoid that and the buffered_bb lifetime issue discussed with
>> Rüdiger, I'm thinking of an opaque type (a struct containing the
>> buffered_bb and the deferred_pool) that could be passed to both
>> ap_filter_setaside_brigade() and ap_filter_reinstate_brigade().
>> That way no user could modify the content of the brigade or play with
>> the pool (opaque), so that ap_filter_setaside_brigade() could
>> create/clear the deferred_pool as it wishes, and
>> ap_filter_reinstate_brigade() could be sure buffered_bb contains only
>> suitable buckets.
>
> Are we not just trying to solve a dangling pointer to a brigade by replacing
> it with a dangling pointer to apr_filter_aside_t?

I don't think so, the only way to free an apr_filter_aside_t (outside
util_filter.c) is to clear c->pool, which any sane filter will never
do (no need to document, that's httpd's job).

Either the apr_filter_aside_t* is NULL, or its fields are solely
controlled by ap_filter_setaside_brigade() /
ap_filter_reinstate_brigade(). Do the right thing in these functions
and everything is fine.

Regards,
Yann.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-14 Thread Yann Ylavic
On Sat, Sep 13, 2014 at 9:23 PM, Ruediger Pluem rpl...@apache.org wrote:

> How about to require that the caller of ap_filter_setaside_brigade just hands 
> over a non NULL bucket_brigade as
> buffered_bb (so changing apr_bucket_brigade ** to apr_bucket_brigade *) and 
> that it should handle *deferred_write_pool
> as opaque structure and that it should not allocate from it?

Can that be enforced by the API as I proposed?

> Or we request the caller to provide a non NULL
> deferred_write_pool as well (so changing apr_pool_t ** to apr_pool_t *) and 
> warn the caller that the pool might be
> cleared during the call of ap_filter_setaside_brigade. Hence solving all 
> lifetime issues would be in the responsibility
> of the caller.

IMHO that would complicate the caller's job/code, probably doing
what's in ap_filter_setaside_brigade() now (use a subpool to avoid the
leak, manage the cleanup...). The function would become a simple
ap_save_brigade() call (and data_in_output_filters++), no real gain...

Regards,
Yann.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-13 Thread Graham Leggett
On 12 Sep 2014, at 7:56 AM, Ruediger Pluem rpl...@apache.org wrote:

 Agreed - as rpluem suggested a pool cleanup should be registered for this, 
 I’ve added it.
 
 I know that I proposed this, but rethinking I see one risk with the cleanup. 
 Take the following code
 with long_living_apr_bucket_brigade_ptr being NULL in the beginning:
 
 apr_bucket_brigade *stack_variable;
 
 stack_variable = long_living_apr_bucket_brigade_ptr;
 ap_save_brigade(f, &stack_variable, &bb, *deferred_write_pool);
 long_living_apr_bucket_brigade_ptr = stack_variable;
 
 I guess in this case this could cause unpredictable behaviour as the stack of 
 the calling function is likely to be
 repurposed when the pool dies. Do we see that as something that we need to 
 consider or do we see this as a bug on
 caller side? If we see it as a bug on caller side we should probably 
 document our expectations.

Hmmm…

It does somewhat limit the caller if we force them to put the pointer somewhere 
not-on-the-stack.

Maybe we should remove the cleanup and document the expectation that the caller 
must clean it up. In most cases the caller is allocating the space from the 
parent pool anyway, so it all gets cleaned up regardless.
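
The hazard described above can be modelled outside httpd with a toy pool that runs cleanup callbacks on clear. Everything below (pool_t, null_out_brigade_ptr, and friends) is invented for illustration, not real APR/httpd API:

```c
#include <stdlib.h>

/* Toy stand-in for an APR-style pool with cleanup callbacks; every name
 * here is invented for illustration, none of it is real APR/httpd API. */
typedef struct cleanup {
    void (*fn)(void *);
    void *data;
    struct cleanup *next;
} cleanup_t;

typedef struct pool {
    cleanup_t *cleanups;
} pool_t;

void pool_register_cleanup(pool_t *p, void (*fn)(void *), void *data)
{
    cleanup_t *c = malloc(sizeof *c);
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
}

void pool_clear(pool_t *p)
{
    while (p->cleanups) {
        cleanup_t *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);          /* cleanup writes through the stored address */
        free(c);
    }
}

/* The cleanup under discussion: null out the caller's brigade pointer. */
void null_out_brigade_ptr(void *data)
{
    *(void **)data = NULL;
}

void *long_living_brigade_ptr;   /* long-lived slot, e.g. a filter ctx field */
static int fake_brigade;

void setaside_then_clear(void)
{
    pool_t deferred = { NULL };
    long_living_brigade_ptr = &fake_brigade;
    pool_register_cleanup(&deferred, null_out_brigade_ptr,
                          &long_living_brigade_ptr);
    /* Had we registered the address of the stack_variable instead, and had
     * that frame already returned, this clear would write into a dead
     * stack frame - the unpredictable behaviour described above. */
    pool_clear(&deferred);
}
```

With the pointer in long-lived storage the cleanup is well defined: after the clear the stored pointer reads back as NULL. With a stack variable of a returned function it is undefined behaviour, which is exactly why the expectation on the caller needs documenting.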

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2014-09-13 Thread Ruediger Pluem


On 09/13/2014 07:05 PM, Graham Leggett wrote:
 On 12 Sep 2014, at 7:56 AM, Ruediger Pluem rpl...@apache.org wrote:
 
 Agreed - as rpluem suggested a pool cleanup should be registered for this, 
 I’ve added it.

 I know that I proposed this, but rethinking I see one risk with the cleanup. 
 Take the following code
 with long_living_apr_bucket_brigade_ptr being NULL in the beginning:

 apr_bucket_brigade *stack_variable;

 stack_variable = long_living_apr_bucket_brigade_ptr;
 ap_save_brigade(f, &stack_variable, &bb, *deferred_write_pool);
 long_living_apr_bucket_brigade_ptr = stack_variable;

 I guess in this case this could cause unpredictable behaviour as the stack 
 of the calling function is likely to be
 repurposed when the pool dies. Do we see that as something that we need to 
 consider or do we see this as a bug on
 caller side? If we see it as a bug on caller side we should probably 
 document our expectations.
 
 Hmmm…
 
 It does somewhat limit the caller if we force them to put the pointer 
 somewhere not-on-the-stack.
 
 Maybe we should remove the cleanup and document the expectation that the 
 caller must clean it up. In most cases the caller is allocating the space 
 from the parent pool anyway, so it all gets cleaned up regardless.
 

How about requiring that the caller of ap_filter_setaside_brigade just hands
over a non-NULL bucket_brigade as buffered_bb (so changing apr_bucket_brigade **
to apr_bucket_brigade *), and that it should handle *deferred_write_pool as an
opaque structure and not allocate from it? Or we request the caller to provide
a non-NULL deferred_write_pool as well (so changing apr_pool_t ** to
apr_pool_t *) and warn the caller that the pool might be cleared during the
call of ap_filter_setaside_brigade, hence solving all lifetime issues would be
the responsibility of the caller.

Regards

Rüdiger


Re: [Patch] Async write completion for the full connection filter stack

2014-09-12 Thread Yann Ylavic
On Thu, Sep 11, 2014 at 5:20 PM, Graham Leggett minf...@sharp.fm wrote:
 On 11 Sep 2014, at 2:12 PM, Yann Ylavic ylavic@gmail.com wrote:

 I would still do the flush check anyway - the ap_filter_reinstate_brigade() 
 has
 no idea what happened outside this function, we may have flushed buckets,
 we may not have and setaside those flush buckets for whatever reason, and
 we want the behaviour to remain predictable.

Makes sense, however ap_core_output_filter() and
ssl_io_filter_output() *know* that their buffered_bb does not contain
such bucket (exclusively filled by ap_filter_setaside_brigade()), in
that the original code (still in svn) is more optimal since it does
not walk through the same brigade multiple times (doing an immediate
nonblocking write with what it has and returning).

To avoid that and the buffered_bb lifetime issue discussed with
Rüdiger, I'm thinking of an opaque type (a struct containing the
buffered_bb and the deferred_pool) that could be passed to both
ap_filter_setaside_brigade() and ap_filter_reinstate_brigade().
That way no user could modify the content of the brigade or play with
the pool (opaque), so that ap_filter_setaside_brigade() could
create/clear the deferred_pool as it wishes, and
ap_filter_reinstate_brigade() could be sure buffered_bb contains only
suitable buckets.

Something like :

# util_filter.h
typedef struct ap_filter_aside ap_filter_aside_t;

AP_DECLARE(apr_status_t) ap_filter_setaside_brigade(ap_filter_t *f,
                                                    ap_filter_aside_t **aside,
                                                    apr_bucket_brigade *bb);

AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
                                            ap_filter_aside_t *aside,
                                            apr_bucket_brigade *bb,
                                            apr_bucket **flush_upto);

# util_filter.c
struct ap_filter_aside
{
apr_bucket_brigade *bb;
apr_pool_t *pool;
};

AP_DECLARE(apr_status_t) ap_filter_setaside_brigade(ap_filter_t *f,
                                                    ap_filter_aside_t **aside,
                                                    apr_pool_t
                                                        **deferred_write_pool,
                                                    apr_bucket_brigade *bb)
{
    if (!APR_BRIGADE_EMPTY(bb)) {
        if (!*aside) {
            *aside = apr_palloc(f->c->pool, sizeof **aside);
            apr_pool_create(&(*aside)->pool, f->c->pool);
            apr_pool_tag((*aside)->pool, "deferred_write");
            (*aside)->bb = NULL;
        }
        if (!(*aside)->bb || APR_BRIGADE_EMPTY((*aside)->bb)) {
            f->c->data_in_output_filters++;
        }
        if (bb != (*aside)->bb) {
            return ap_save_brigade(f, &(*aside)->bb, &bb, (*aside)->pool);
        }
    }
    else if ((*aside)->pool) {
        /*
         * There are no more requests in the pipeline. We can just clear the
         * pool.
         */
        (*aside)->bb = NULL;
        apr_pool_clear((*aside)->pool);
    }
    return APR_SUCCESS;
}

AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
                                            ap_filter_aside_t *aside,
                                            apr_bucket_brigade *bb,
                                            apr_bucket **flush_upto)
{
    apr_bucket *bucket, *next;
    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
    int eor_buckets_in_brigade, morphing_bucket_in_brigade;
    int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
    int empty = APR_BRIGADE_EMPTY(bb);

    if ((*aside)->bb && !APR_BRIGADE_EMPTY((*aside)->bb)) {
        APR_BRIGADE_PREPEND(bb, (*aside)->bb);
        f->c->data_in_output_filters--;
        if (empty) {
            return 1;
        }
    }

    ...
}

# core_filters and mod_ssl
use an ap_filter_aside_t* instead of buffered_bb and deferred_write_pool.

WDYT?


Re: [Patch] Async write completion for the full connection filter stack

2014-09-12 Thread Yann Ylavic
This was just a POC, with coding mistakes...

On Fri, Sep 12, 2014 at 4:57 PM, Yann Ylavic ylavic@gmail.com wrote:
 [...]
 AP_DECLARE(apr_status_t) ap_filter_setaside_brigade(ap_filter_t *f,
                                                     ap_filter_aside_t **aside,
                                                     apr_pool_t
                                                         **deferred_write_pool,

Forgot to remove this arg.

 apr_bucket_brigade *bb)
 [...]
 AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
                                             ap_filter_aside_t *aside,
                                             apr_bucket_brigade *bb,
                                             apr_bucket **flush_upto)
 {
 [...]
 if ((*aside)->bb && !APR_BRIGADE_EMPTY((*aside)->bb)) {
 APR_BRIGADE_PREPEND(bb, (*aside)->bb);

Of course here, read aside-> instead of (*aside)->.

 f->c->data_in_output_filters--;
 if (empty) {
 return 1;
 }
 }

 ...
 }

Regards,
Yann.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-12 Thread Yann Ylavic
On Fri, Sep 12, 2014 at 4:57 PM, Yann Ylavic ylavic@gmail.com wrote:
 AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
                                             ap_filter_aside_t *aside,
                                             apr_bucket_brigade *bb,
                                             apr_bucket **flush_upto)
 {
 apr_bucket *bucket, *next;
 apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 int eor_buckets_in_brigade, morphing_bucket_in_brigade;
 int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
 int empty = APR_BRIGADE_EMPTY(bb);

 if ((*aside)->bb && !APR_BRIGADE_EMPTY((*aside)->bb)) {
 APR_BRIGADE_PREPEND(bb, (*aside)->bb);
 f->c->data_in_output_filters--;
 if (empty) {
 return 1;
 }
 }

The (empty) check could even be done outside the block, in any case.


 ...
 }


Re: [Patch] Async write completion for the full connection filter stack

2014-09-12 Thread Yann Ylavic
On Fri, Sep 12, 2014 at 4:57 PM, Yann Ylavic ylavic@gmail.com wrote:
 AP_DECLARE(apr_status_t) ap_filter_setaside_brigade(ap_filter_t *f,
                                                     ap_filter_aside_t **aside,
                                                     apr_bucket_brigade *bb)
 {
 if (!APR_BRIGADE_EMPTY(bb)) {
 if (!*aside) {
 *aside = apr_palloc(f->c->pool, sizeof **aside);
 apr_pool_create(&(*aside)->pool, f->c->pool);
 apr_pool_tag((*aside)->pool, "deferred_write");
 (*aside)->bb = NULL;

To avoid a (small) possible leak here, maybe replace this by :

  apr_pool_t *p;
  apr_pool_create(&p, f->c->pool);
  apr_pool_tag(p, "deferred_write");
  *aside = apr_palloc(p, sizeof **aside);
  (*aside)->bb = NULL;
  (*aside)->pool = p;

 }
 if (!(*aside)->bb || APR_BRIGADE_EMPTY((*aside)->bb)) {
 f->c->data_in_output_filters++;
 }
 if (bb != (*aside)->bb) {
 return ap_save_brigade(f, &(*aside)->bb, &bb, (*aside)->pool);
 }
 }
 else if ((*aside)->pool) {
 /*
  * There are no more requests in the pipeline. We can just clear the
  * pool.
  */
 (*aside)->bb = NULL;
 apr_pool_clear((*aside)->pool);

And this by :

  apr_pool_t *p = (*aside)->pool;
  /*
   * There are no more requests in the pipeline. We can just clear the
   * pool.
   */
  apr_pool_clear(p);
  *aside = apr_palloc(p, sizeof **aside);
  (*aside)->bb = NULL;
  (*aside)->pool = p;

 }
 return APR_SUCCESS;
 }

That wouldn't allow the use of *aside in an f->c->pool cleanup (post),
but this seems unlikely.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Tim Bannister
On 10 Sep 2014, at 18:19, Jim Jagielski j...@jagunet.com wrote:

 On Sep 10, 2014, at 12:07 PM, Graham Leggett minf...@sharp.fm wrote:
 
 Having thought long and hard about this, giving filters an opportunity to 
 write has nothing to do with either data or metadata, we just want to give 
 the filter an opportunity to write. “Dear filter, please wake up and if 
 you’ve setaside data please write it, otherwise do nothing, thanks”. Empty 
 brigade means “do nothing new”.
 
 Hey filter, here is an empty brigade, meaning I have no data for you; if you 
 have setaside data, now is the time to push it thru.

If someone is new to writing filters, I think that's a bit of a gotcha. An 
empty brigade meaning “do nothing new” is easy to code for and unsurprising.

I'm thinking about how I'd help httpd cope with a filter that didn't know about 
this special behaviour. I would introduce a new bucket: NUDGE, and use that to 
wake up filters.

If a filter doesn't understand NUDGE, this could cause a hang from the client's 
point of view. To avoid this, I'd have a configurable timeout after which a 
NUDGE-d filter would get a FLUSH bucket - say 30 seconds as a default, nice and 
high because FLUSH is an expensive operation. Server admins could drop this if 
they know that they have a problematic filter.

If a filter gets a NUDGE and returns APR_SUCCESS I think it makes sense to 
NUDGE the next downstream filter.


A new metadata bucket has all sorts of compatibility issues that need thinking 
about (and I don't know enough about). Changing the meaning of “empty brigade” 
also has compatibility issues but they will show up much later than build time.

-- 
Tim Bannister – is...@jellybaby.net



RE: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Plüm , Rüdiger , Vodafone Group


From: Graham Leggett [mailto:minf...@sharp.fm]
Sent: Donnerstag, 11. September 2014 06:40
To: dev@httpd.apache.org
Subject: Re: [Patch] Async write completion for the full connection filter stack

On 11 Sep 2014, at 1:51 AM, Yann Ylavic 
ylavic@gmail.commailto:ylavic@gmail.com wrote:


 +else if (*deferred_write_pool) {
 +/*
 + * There are no more requests in the pipeline. We can just clear the
 + * pool.
 + */

 Shouldn't *buffered_bb be set to NULL here when *deferred_write_pool
 == (*buffered_bb)->p, or more generally
 apr_pool_is_ancestor(*deferred_write_pool, (*buffered_bb)->p). We
 can't leave a dangling pointer in this case.

 +apr_pool_clear(*deferred_write_pool);

Hmmm... this came from the original code.
We can't set buffered_bb to NULL unless we are sure we created buffered_bb, and 
this isn't necessarily the case. In the core filter, buffered_bb is created 
when the connection is created.

How about doing a

apr_brigade_cleanup(*buffered_bb)

before the apr_pool_clear?

Regards

Rüdiger




RE: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Plüm , Rüdiger , Vodafone Group


 -Original Message-
 From: Yann Ylavic [mailto:ylavic@gmail.com]
 Sent: Donnerstag, 11. September 2014 09:52
 To: httpd
 Subject: Re: [Patch] Async write completion for the full connection filter
 stack
 
 On Thu, Sep 11, 2014 at 6:39 AM, Graham Leggett minf...@sharp.fm wrote:
  On 11 Sep 2014, at 1:51 AM, Yann Ylavic ylavic@gmail.com wrote:
 
  +else if (*deferred_write_pool) {
  +/*
  + * There are no more requests in the pipeline. We can just
 clear the
  + * pool.
  + */
 
  Shouldn't *buffered_bb be set to NULL here when *deferred_write_pool
  == (*buffered_bb)->p, or more generally
  apr_pool_is_ancestor(*deferred_write_pool, (*buffered_bb)->p). We
  can't leave a dangling pointer in this case.
 
  +apr_pool_clear(*deferred_write_pool);
 
  Hmmm… this came from the original code.
 
 
  We can’t set buffered_bb to NULL unless we are sure we created
 buffered_bb, and this isn’t necessarily the case. In the core filter,
 buffered_bb is created when the connection is created.

Another question is if we should clear a pool we haven't created.
If we don't, then apr_pool_is_ancestor(*deferred_write_pool, (*buffered_bb)->p)
should be a safe bet that we created the bucket brigade.
Another approach below.

 
 If we or any filter has created the brigade on *deferred_write_pool
 (or an ancestor, I'm only talking about this case), it is dead now
 (unaccessible), and IMO should be marked as so.
 In the case the caller let the function handle the brigade (as we do),
 using NULL the first time only, (s)he expects to not crash whenever
 (s)he come back with it after a clear (which (s)he isn't supposed to
 know about).

Agree. Maybe if buffered_bb is NULL we should add a pool clean up to 
*deferred_write_pool that sets *buffered_bb to NULL

 
 Another solution would be to not require a pool at all, something like :

I fear that this creates high memory consumption / temporary memory leaks.

Regards

Rüdiger



Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Yann Ylavic
On Thu, Sep 11, 2014 at 9:51 AM, Yann Ylavic ylavic@gmail.com wrote:
 On Thu, Sep 11, 2014 at 6:39 AM, Graham Leggett minf...@sharp.fm wrote:
 I don’t follow - that means something different as I’m reading it. We must 
 only signal that we have less data in the output filters if we actually 
 originally had data in the output filters, ie buffered_bb existed and wasn’t 
 empty.

 You're right, I misread ap_core_output_filter() where I thought
 ap_filter_reinstate_brigade() wasn't called when
 APR_BRIGADE_EMPTY(bb).

 Since ap_filter_reinstate_brigade() can be called with an empty bb,
 ap_core_output_filter() can (still) be simplified though, ie :

  should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
                                             &flush_upto);
 if (APR_BRIGADE_EMPTY(bb)) {
 return APR_SUCCESS;
 }

 if (flush_upto != NULL) {
 ...
 }

 if (should_write) {
 ...
 }

 flush_upto will always be NULL with an empty bb.

Again I missed something...

This code is indeed what I'd like ap_core_output_filter() (or any
filter) to be able to do, but it won't work without a slight change in
ap_filter_reinstate_brigade(), which is :

AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
                                            apr_bucket_brigade *buffered_bb,
                                            apr_bucket_brigade *bb,
                                            apr_bucket **flush_upto)
{
    apr_bucket *bucket, *next;
    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
    int eor_buckets_in_brigade, morphing_bucket_in_brigade;
    int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);

    *flush_upto = NULL;

    if ((buffered_bb != NULL) && !APR_BRIGADE_EMPTY(buffered_bb)) {
        int flush = APR_BRIGADE_EMPTY(bb);
        APR_BRIGADE_PREPEND(bb, buffered_bb);
        f->c->data_in_output_filters--;
        if (flush) {
            return 1;
        }
    }

    ...
}

The new thing is the "flush" check to return 1 immediately when an
empty bb is given.
In this case the intent of the caller is to use what we have ("push it
thru" would say Jim), we don't need to walk the buffered_bb again.

That's, I think, what the original code did in ap_core_output_filter()
to send_brigade_nonblocking() immediately in this case, and what the
current patch misses.

Thoughts?


Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Yann Ylavic
On Thu, Sep 11, 2014 at 2:12 PM, Yann Ylavic ylavic@gmail.com wrote:

 AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
 apr_bucket_brigade *buffered_bb,
 apr_bucket_brigade *bb,
 apr_bucket **flush_upto)
 {
 apr_bucket *bucket, *next;
 apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 int eor_buckets_in_brigade, morphing_bucket_in_brigade;
 int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);

 *flush_upto = NULL;

Oops, this is an unrelated change, but probably good to have too.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Graham Leggett
On 11 Sep 2014, at 9:51 AM, Yann Ylavic ylavic@gmail.com wrote:

 If we or any filter has created the brigade on *deferred_write_pool
 (or an ancestor, I'm only talking about this case), it is dead now
 (unaccessible), and IMO should be marked as so.
 In the case the caller let the function handle the brigade (as we do),
 using NULL the first time only, (s)he expects to not crash whenever
 (s)he come back with it after a clear (which (s)he isn't supposed to
 know about).

Agreed - as rpluem suggested a pool cleanup should be registered for this, I’ve 
added it.

 The core and mod_ssl deferred_write_pool won't be needed anymore.

The reason the deferred write pool exists is that connections might live for a 
very long time. Any memory allocated during that time will accumulate and won’t 
be cleaned up until the connection is closed. The deferred write pool allows us 
to temporarily create data during the connection and then clean it out as soon 
as we no longer need it.
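
The lifetime argument can be sketched with a toy arena allocator. The names and the 1KB-per-write figure below are invented for the sketch, not APR API (an APR pool likewise reclaims everything on apr_pool_clear()):

```c
#include <stdlib.h>

/* Toy arena to illustrate the lifetime argument; invented for this
 * sketch, not the APR pool API. */
typedef struct block { struct block *next; } block_t;
typedef struct arena { block_t *blocks; size_t bytes_alive; } arena_t;

void *arena_alloc(arena_t *a, size_t n)
{
    block_t *b = malloc(sizeof *b + n);   /* header + payload */
    b->next = a->blocks;
    a->blocks = b;
    a->bytes_alive += n;
    return b + 1;
}

void arena_clear(arena_t *a)
{
    while (a->blocks) {
        block_t *b = a->blocks;
        a->blocks = b->next;
        free(b);
    }
    a->bytes_alive = 0;
}

/* Each deferred write sets aside ~1KB of state, cleared as soon as the
 * pipeline drains: memory stays bounded however long the connection lives. */
size_t peak_with_deferred_pool(int writes)
{
    arena_t conn = { NULL, 0 }, deferred = { NULL, 0 };
    size_t peak = 0;
    int i;
    for (i = 0; i < writes; i++) {
        arena_alloc(&deferred, 1024);
        if (conn.bytes_alive + deferred.bytes_alive > peak)
            peak = conn.bytes_alive + deferred.bytes_alive;
        arena_clear(&deferred);   /* "no more requests in the pipeline" */
    }
    arena_clear(&conn);
    return peak;
}

/* Same traffic allocated straight from the connection arena: it can only
 * grow until the connection closes. */
size_t peak_without_deferred_pool(int writes)
{
    arena_t conn = { NULL, 0 };
    size_t peak = 0;
    int i;
    for (i = 0; i < writes; i++) {
        arena_alloc(&conn, 1024);
        if (conn.bytes_alive > peak)
            peak = conn.bytes_alive;
    }
    arena_clear(&conn);
    return peak;
}
```

Over a thousand deferred writes the subpool version peaks at one write's worth of memory, while allocating from the connection pool accumulates all of it, which is the point being made here.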

 You're right, I misread ap_core_output_filter() where I thought
 ap_filter_reinstate_brigade() wasn't called when
 APR_BRIGADE_EMPTY(bb).
 
 Since ap_filter_reinstate_brigade() can be called with an empty bb,
 ap_core_output_filter() can (still) be simplified though, ie :
 
should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
&flush_upto);
if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}
 
if (flush_upto != NULL) {
...
}
 
if (should_write) {
...
}
 
 flush_upto will always be NULL with an empty bb.

This is indeed true - I’ve simplified it as above.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Graham Leggett
On 11 Sep 2014, at 2:12 PM, Yann Ylavic ylavic@gmail.com wrote:

 Again I missed something...
 
 This code is indeed what I'd like ap_core_output_filter() (or any
 filter) to be able to do, but it won't work without a slight change in
 ap_filter_reinstate_brigade(), which is :
 
 AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
apr_bucket_brigade *buffered_bb,
apr_bucket_brigade *bb,
apr_bucket **flush_upto)
 {
apr_bucket *bucket, *next;
apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
int eor_buckets_in_brigade, morphing_bucket_in_brigade;
 int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
 
*flush_upto = NULL;
 
 if ((buffered_bb != NULL) && !APR_BRIGADE_EMPTY(buffered_bb)) {
 int flush = APR_BRIGADE_EMPTY(bb);
 APR_BRIGADE_PREPEND(bb, buffered_bb);
 f->c->data_in_output_filters--;
if (flush) {
return 1;
}
}
 
...
 }
 
 The new thing is the "flush" check to return 1 immediately when an
 empty bb is given.
 In this case the intent of the caller is to use what we have ("push it
 thru" would say Jim), we don't need to walk the buffered_bb again.
 
 That's, I think, what the original code did in ap_core_output_filter()
 to send_brigade_nonblocking() immediately in this case, and what the
 current patch misses.
 
 Thoughts?

We need this yes - if we’ve been called with an empty brigade and we still have
a buffered brigade that is “too small to write” we want to just write that
remaining brigade regardless of its size, otherwise we spin.
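
This decision reduces to a small rule, modelled here as a toy function. THRESHOLD_MIN_WRITE is the name used by the patch; the value and the function itself are assumptions for illustration, not httpd code:

```c
#include <stddef.h>

/* Toy model of the write decision under discussion; THRESHOLD_MIN_WRITE
 * is the patch's name for the minimum write size, value assumed. */
#define THRESHOLD_MIN_WRITE 4096

int should_write_now(size_t buffered_bytes, size_t incoming_bytes)
{
    if (incoming_bytes == 0) {
        /* write-completion poke with an empty brigade: push the buffered
         * remainder whatever its size, or we spin forever */
        return buffered_bytes > 0;
    }
    /* normal pass: keep buffering until the minimum write size is reached */
    return buffered_bytes + incoming_bytes >= THRESHOLD_MIN_WRITE;
}
```

With 100 buffered bytes, a normal pass keeps buffering, but a write-completion call (incoming 0) writes immediately, which is the "force the should_write" behaviour the patch adds.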

I would still do the flush check anyway - the ap_filter_reinstate_brigade() has 
no idea what happened outside this function, we may have flushed buckets, we 
may not have and setaside those flush buckets for whatever reason, and we want 
the behaviour to remain predictable.

I’ve updated the patch to force the should_write if the original brigade was 
empty.

Regards,
Graham
—


httpd-async-fullstack-ssl3.patch
Description: Binary data


Re: [Patch] Async write completion for the full connection filter stack

2014-09-11 Thread Ruediger Pluem


On 09/11/2014 05:20 PM, Graham Leggett wrote:
 On 11 Sep 2014, at 9:51 AM, Yann Ylavic ylavic@gmail.com wrote:
 
 If we or any filter has created the brigade on *deferred_write_pool
 (or an ancestor, I'm only talking about this case), it is dead now
 (unaccessible), and IMO should be marked as so.
 In the case the caller let the function handle the brigade (as we do),
 using NULL the first time only, (s)he expects to not crash whenever
 (s)he come back with it after a clear (which (s)he isn't supposed to
 know about).
 
 Agreed - as rpluem suggested a pool cleanup should be registered for this, 
 I’ve added it.

I know that I proposed this, but rethinking I see one risk with the cleanup. 
Take the following code
with long_living_apr_bucket_brigade_ptr being NULL in the beginning:

apr_bucket_brigade *stack_variable;

stack_variable = long_living_apr_bucket_brigade_ptr;
ap_save_brigade(f, &stack_variable, &bb, *deferred_write_pool);
long_living_apr_bucket_brigade_ptr = stack_variable;

I guess in this case this could cause unpredictable behaviour as the stack of 
the calling function is likely to be
repurposed when the pool dies. Do we see that as something that we need to 
consider or do we see this as a bug on
caller side? If we see it as a bug on caller side we should probably document 
our expectations.


Regards

Rüdiger




Re: [Patch] Async write completion for the full connection filter stack

2014-09-10 Thread Graham Leggett
On 09 Sep 2014, at 9:19 PM, Plüm, Rüdiger, Vodafone Group 
ruediger.pl...@vodafone.com wrote:

 I spent a lot of time going down this path of having a dedicated
 metabucket, and quickly got bogged down in complexity. The key problem
 was what does a filter actually do when you get one, it was unclear
 
 Don't we have the same problem with an empty brigade? Some filters are not 
 going
 to handle it as we expect. Hence the additional logic in ap_pass_brigade.
 I guess the minimum behavior we need to get from every filter is to ignore 
 and pass on.

We don’t have the same problem with an empty brigade, no. Empty brigades “fail 
safe” in that a naive filter will simply try and loop past the brigade, exiting 
immediately having done nothing.
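
Both halves of this argument can be shown with a toy filter loop. Every name below is invented for illustration (real brigades are APR ring lists, not arrays):

```c
#include <stddef.h>

/* Toy brigade as a counted array; only the control flow matters here. */
typedef enum { DATA, FLUSH, NOOP } bucket_type;

typedef struct {
    const bucket_type *buckets;
    size_t n;
} toy_brigade;

/* A naive filter loops over whatever it is given.  Handed an empty
 * brigade, the loop body never runs and it exits having done nothing -
 * the "fail safe" property.  Handed an unknown metadata bucket (NOOP
 * here), the same naivety silently drops it - the meta-bucket hazard. */
size_t naive_filter(const toy_brigade *bb)
{
    size_t processed = 0;
    size_t i;
    for (i = 0; i < bb->n; i++) {
        if (bb->buckets[i] != NOOP) {
            processed++;   /* pretend to transform and pass the bucket */
        }
    }
    return processed;
}
```

The empty brigade costs one loop-condition check and nothing else, while a new metadata bucket relies on every filter knowing to pass it on.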

Meta buckets are inherently unsafe in comparison, because if a module doesn’t 
pass the meta bucket downstream for whatever buggy reason, the server will 
start to spin. We could use compensation code to stop this, but the 
compensation code would be complex. Testing for an empty brigade is cheap and 
trivial, looping through a brigade to search for NOOP buckets so as to trigger 
bypass of the compensation code is not so trivial.

 and it made my head bleed. That makes life hard for module authors and
 that is bad. As I recall there were also broken filters out there that
 only knew about FLUSH and EOS buckets (eg ap_http_chunk_filter()).
 
 We already have additional metabuckets like error buckets or EOR.
 So I don't see an issue creating a new one. Any filter not passing a meta 
 bucket
 that it does not understand or even try to process it is simply broken. 

Agreed that such a filter is broken, but such filters exist (the chunk filter 
being one example, if I looked harder I would find others) and we need to work 
within this limitation.

Having thought long and hard about this, giving filters an opportunity to write 
has nothing to do with either data or metadata, we just want to give the filter 
an opportunity to write. “Dear filter, please wake up and if you’ve setaside 
data please write it, otherwise do nothing, thanks”. Empty brigade means “do 
nothing new”.

 The problem we're trying to solve is one of starvation - no filters can
 set aside data for later (except core via the NULL hack), because there
 is no guarantee that they'll ever be called again later. You have to
 write it now, or potentially write it never. The start of the solution
 is ensure filters aren't starved: if you have data in the output filters
 - and obviously you have no idea which filters have setaside data - you
 need a way to wake them all up. The simplest and least disruptive way is
 to pass them all an empty brigade, job done. We've got precedent for
 this - we've been sending NULL to the core filter to achieve the same
 
 But this is *our* filter and it will not hit any custom filters. So we can
 do this kind of hacky game here.
 
 thing, we want something that works with any filter.
 
 Yes, and this is the reason why I still believe a meta bucket is better.

I suspect it is not completely clear how the patch works.

Previously, in the core, we handled write completion by passing NULL to the 
last filter in the chain, the core output filter. This worked fine for simple 
requests, but as soon as another filter was involved - primarily mod_ssl - 
there was no write completion. The work of mod_ssl had to be completed in its 
entirety before write completion would kick in, and by that point it was too 
late to be useful.

The fix for this problem starts by including the entire connection filter chain 
into write completion, not just the last filter. That means every connection 
filter both httpd internal and external can take advantage of write completion, 
not just the core filter.

So what do you pass during this write completion phase? Obviously not data, 
you’ve finished passing the data already when the handler exited. We can’t pass 
NULL like the core filter has, because all filters would just segfault. This 
leaves an obvious choice - an empty brigade.

We have a new problem - not all filters pass data downstream for one of many 
reasons. Maybe they’re buggy. Maybe they’re buffering data. Maybe that’s what 
the filter believes it was asked to do. So we must compensate for this by 
“waking up” filters downstream of the filter that didn’t write. All we need to 
do to those filters is wake them up, that’s it.

At this point we’re still not done - we need cooperation from filters in order 
to support write completion, a filter must yield and setaside the data it was 
given to be called back later, just like the core does. This gives other 
requests an opportunity to get CPU time. It turns out the algorithm to achieve 
this yield is built into the core filter, so the next step is to generalise the 
algorithm and make it available to any filter, starting with mod_ssl.

The attached patch does that.
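
In outline, a cooperating connection filter then takes this shape (pseudocode only; the ctx fields and exact call shapes are assumptions based on the discussion above, not the patch verbatim):

```
my_connection_output_filter(f, bb):
    /* reinstate anything set aside on a previous pass */
    should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
                                               &flush_upto)

    if flush_upto != NULL:
        write everything up to flush_upto, blocking

    if should_write:
        write bb nonblocking, stopping on EWOULDBLOCK

    /* whatever could not be written now is set aside; the core calls us
       again during write completion with an empty brigade */
    return ap_filter_setaside_brigade(f, &ctx->buffered_bb,
                                      &ctx->deferred_write_pool, bb)
```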

Regards,
Graham
—


httpd-async-fullstack-ssl.patch
Description: Binary data


Re: [Patch] Async write completion for the full connection filter stack

2014-09-10 Thread Jim Jagielski

On Sep 10, 2014, at 12:07 PM, Graham Leggett minf...@sharp.fm wrote:
 
 Having thought long and hard about this, giving filters an opportunity to 
 write has nothing to do with either data or metadata, we just want to give 
 the filter an opportunity to write. “Dear filter, please wake up and if 
 you’ve setaside data please write it, otherwise do nothing, thanks”. Empty 
 brigade means “do nothing new”.
 

Hey filter, here is an empty brigade, meaning I have no
data for you; if you have setaside data, now is the time
to push it thru.



Re: [Patch] Async write completion for the full connection filter stack

2014-09-10 Thread Yann Ylavic
On Wed, Sep 10, 2014 at 6:07 PM, Graham Leggett minf...@sharp.fm wrote:

 The attached patch does that.


Hi Graham,

nice patch, thank you.

A few comments below...

+AP_DECLARE(apr_status_t) ap_filter_setaside_brigade(ap_filter_t *f,
+
apr_bucket_brigade **buffered_bb,
+apr_pool_t
**deferred_write_pool,
+apr_bucket_brigade *bb)
+{
+if (bb == NULL) {

Is this test really needed? This does not seem to be possible with your
patch, and now this is part of the API, a crash is often better in
this case (like for any other ap_filter_*() function which is given a
NULL bb, or the ap[r] ones in general which tend -sanely- to crash
where the code is faulty).

+return APR_SUCCESS;
+}
+if (!APR_BRIGADE_EMPTY(bb)) {
+if (APR_BRIGADE_EMPTY((*buffered_bb))) {
+f->c->data_in_output_filters++;
+}
+if (bb != *buffered_bb) {
+if (!(*deferred_write_pool)) {
+apr_pool_create(deferred_write_pool, f->c->pool);
+apr_pool_tag(*deferred_write_pool, "deferred_write");
+}
+return ap_save_brigade(f, buffered_bb, &bb,
+*deferred_write_pool);
+}
+}
+else if (*deferred_write_pool) {
+/*
+ * There are no more requests in the pipeline. We can just clear the
+ * pool.
+ */

Shouldn't *buffered_bb be set to NULL here when *deferred_write_pool
== (*buffered_bb)->p, or more generally
apr_pool_is_ancestor(*deferred_write_pool, (*buffered_bb)->p). We
can't leave a dangling pointer in this case.

+apr_pool_clear(*deferred_write_pool);
+}
+return APR_SUCCESS;
+}
+
+AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
+apr_bucket_brigade *buffered_bb,
+apr_bucket_brigade *bb,
+apr_bucket **flush_upto)
+{
+apr_bucket *bucket, *next;
+apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+int eor_buckets_in_brigade, morphing_bucket_in_brigade;
+int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
+
+if ((buffered_bb != NULL) && !APR_BRIGADE_EMPTY(buffered_bb)) {
+APR_BRIGADE_PREPEND(bb, buffered_bb);
+f->c->data_in_output_filters--;

Maybe we could use instead :

 if (!APR_BRIGADE_EMPTY(bb)) {
    f->c->data_in_output_filters--;
 }
 APR_BRIGADE_PREPEND(bb, buffered_bb);

so that it remains symmetric with ap_filter_setaside_brigade() above
(which increments c->data_in_output_filters).

It also has the advantage to simplify the calling code in
ap_core_output_filter(), where we could simply do :

if ((ctx->buffered_bb != NULL) && !APR_BRIGADE_EMPTY(ctx->buffered_bb)) {
should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
&flush_upto);
}
else if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}

if (flush_upto != NULL) {
...
}

if (should_write) {
...
}

removing the "empty" variable dance and the duplicated
send_brigade_nonblocking() code.

+}
+
[...]
+
+bytes_in_brigade = 0;
+non_file_bytes_in_brigade = 0;
+eor_buckets_in_brigade = 0;
+morphing_bucket_in_brigade = 0;
+
+for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
+ bucket = next) {
+next = APR_BUCKET_NEXT(bucket);
+
[...]
+
+if (APR_BUCKET_IS_FLUSH(bucket)
+|| non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+|| morphing_bucket_in_brigade
+|| eor_buckets_in_brigade > MAX_REQUESTS_IN_PIPELINE) {
+/* this segment of the brigade MUST be sent before returning. */
+
[...]
+/*
+ * Defer the actual blocking write to avoid doing many writes.
+ */
+if (flush_upto) {
+*flush_upto = next;
+}
+
+bytes_in_brigade = 0;
+non_file_bytes_in_brigade = 0;
+eor_buckets_in_brigade = 0;
+morphing_bucket_in_brigade = 0;

By resetting the counters here, and mainly bytes_in_brigade, the value
returned by the function indicates whether we should write
(nonblocking) the bytes *after* the flush_upto bucket, ie. not
including the write (blocking) we should do for the bytes *before*
this bucket.
This is not clear in the description of the function (util_filter.h),
particularly because flush_upto can be passed NULL.
Does it make sense to call it without flush_upto? How would the caller
know about the preceding bytes?

+}
+}
+
[...]
+
+return bytes_in_brigade >= THRESHOLD_MIN_WRITE;
+}
+
+AP_DECLARE(apr_status_t) ap_filter_should_yield(ap_filter_t *f)
+{

Shouldn't this function return an int?

+return f->c->data_in_output_filters;
+}
+

Regards,
Yann.


Re: [Patch] Async write completion for the full connection filter stack

2014-09-10 Thread Yann Ylavic
On Thu, Sep 11, 2014 at 1:51 AM, Yann Ylavic ylavic@gmail.com wrote:
 On Wed, Sep 10, 2014 at 6:07 PM, Graham Leggett minf...@sharp.fm wrote:

 +AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
 +apr_bucket_brigade *buffered_bb,
 +apr_bucket_brigade *bb,
 +apr_bucket **flush_upto)
 +{
 +apr_bucket *bucket, *next;
 +apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 +int eor_buckets_in_brigade, morphing_bucket_in_brigade;
 +int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
 +
 +if ((buffered_bb != NULL) && !APR_BRIGADE_EMPTY(buffered_bb)) {
 +APR_BRIGADE_PREPEND(bb, buffered_bb);
 +f->c->data_in_output_filters--;

 Maybe we could use instead :

  if (!APR_BRIGADE_EMPTY(bb)) {
 f->c->data_in_output_filters--;
  }
  APR_BRIGADE_PREPEND(bb, buffered_bb);

 so that it remains symmetric with ap_filter_setaside_brigade() above
 (which increments c->data_in_output_filters).

 It also has the advantage to simplify the calling code in
 ap_core_output_filter(), where we could simply do :

 if ((ctx->buffered_bb != NULL) && !APR_BRIGADE_EMPTY(ctx->buffered_bb)) {
 should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
 &flush_upto);
 }
 else if (APR_BRIGADE_EMPTY(bb)) {
 return APR_SUCCESS;
 }

Or even simpler :

should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
 &flush_upto);
if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}


 if (flush_upto != NULL) {
 ...
 }

 if (should_write) {
 ...
 }

 removing the empty variable dance and the duplicated
 send_brigade_nonblocking() code.

 +}


Re: [Patch] Async write completion for the full connection filter stack

2014-09-10 Thread Graham Leggett
On 11 Sep 2014, at 1:51 AM, Yann Ylavic ylavic@gmail.com wrote:

 +{
 +if (bb == NULL) {
 
 Is this test really needed? This does not seem to be possible with your
 patch, and now this is part of the API, a crash is often better in
 this case (like for any other ap_filter_*() function which is given a
 NULL bb, or the ap[r] ones in general which tend -sanely- to crash
 where the code is faulty).

The test doesn’t seem to be needed no, that’s legacy.

 +else if (*deferred_write_pool) {
 +/*
 + * There are no more requests in the pipeline. We can just clear the
 + * pool.
 + */
 
 Shouldn't *buffered_bb be set to NULL here when *deferred_write_pool
 == (*buffered_bb)->p, or more generally
 apr_pool_is_ancestor(*deferred_write_pool, (*buffered_bb)->p). We
 can't leave a dangling pointer in this case.
 
 +apr_pool_clear(*deferred_write_pool);

Hmmm… this came from the original code.



We can’t set buffered_bb to NULL unless we are sure we created buffered_bb, and 
this isn’t necessarily the case. In the core filter, buffered_bb is created 
when the connection is created.

 +}
 +return APR_SUCCESS;
 +}
 +
 +AP_DECLARE(int) ap_filter_reinstate_brigade(ap_filter_t *f,
 +apr_bucket_brigade *buffered_bb,
 +apr_bucket_brigade *bb,
 +apr_bucket **flush_upto)
 +{
 +apr_bucket *bucket, *next;
 +apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 +int eor_buckets_in_brigade, morphing_bucket_in_brigade;
 +int loglevel = ap_get_conn_module_loglevel(f->c, APLOG_MODULE_INDEX);
 +
 +if ((buffered_bb != NULL) && !APR_BRIGADE_EMPTY(buffered_bb)) {
 +APR_BRIGADE_PREPEND(bb, buffered_bb);
 +f->c->data_in_output_filters--;
 
 Maybe we could use instead :
 
 if (!APR_BRIGADE_EMPTY(bb)) {
f->c->data_in_output_filters--;
 }
 APR_BRIGADE_PREPEND(bb, buffered_bb);

 so that it remains symmetric with ap_filter_setaside_brigade() above
 (which increments c->data_in_output_filters).

I don’t follow - that means something different as I’m reading it. We must only 
signal that we have less data in the output filters if we actually originally 
had data in the output filters, ie buffered_bb existed and wasn’t empty.

 It also has the advantage to simplify the calling code in
 ap_core_output_filter(), where we could simply do :
 
if ((ctx->buffered_bb != NULL) && !APR_BRIGADE_EMPTY(ctx->buffered_bb)) {
should_write = ap_filter_reinstate_brigade(f, ctx->buffered_bb, bb,
&flush_upto);
}
else if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}
 
if (flush_upto != NULL) {
...
}
 
if (should_write) {
...
}
 
 removing the empty variable dance and the duplicated
 send_brigade_nonblocking() code.
 
 +}
 +
 [...]
 +
 +bytes_in_brigade = 0;
 +non_file_bytes_in_brigade = 0;
 +eor_buckets_in_brigade = 0;
 +morphing_bucket_in_brigade = 0;
 +
 +for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
 + bucket = next) {
 +next = APR_BUCKET_NEXT(bucket);
 +
 [...]
 +
 +if (APR_BUCKET_IS_FLUSH(bucket)
 +|| non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
 +|| morphing_bucket_in_brigade
 +|| eor_buckets_in_brigade > MAX_REQUESTS_IN_PIPELINE) {
 +/* this segment of the brigade MUST be sent before returning. */
 +
 [...]
 +/*
 + * Defer the actual blocking write to avoid doing many writes.
 + */
 +if (flush_upto) {
 +*flush_upto = next;
 +}
 +
 +bytes_in_brigade = 0;
 +non_file_bytes_in_brigade = 0;
 +eor_buckets_in_brigade = 0;
 +morphing_bucket_in_brigade = 0;
 
 By resetting the counters here, and mainly bytes_in_brigade, the value
 returned by the function indicates whether we should write
 (nonblocking) the bytes *after* the flush_upto bucket, ie. not
 including the write (blocking) we should do for the bytes *before*
 this bucket.
 This is not clear in the description of the function (util_filter.h),
 particularly because flush_upto can be passed NULL.
 Does it make sense to call it without flush_upto? How would the caller
 know about the preceding bytes?

True. I’ve made flush_upto mandatory.

 
 +}
 +}
 +
 [...]
 +
 +return bytes_in_brigade >= THRESHOLD_MIN_WRITE;
 +}
 +
 +AP_DECLARE(apr_status_t) ap_filter_should_yield(ap_filter_t *f)
 +{
 
 Shouldn't this function return an int?

It should. Fixed.

Regards,
Graham
—


httpd-async-fullstack-ssl2.patch
Description: Binary data


RE: [Patch] Async write completion for the full connection filter stack

2014-09-09 Thread Plüm , Rüdiger , Vodafone Group


 -Original Message-
 From: Jim Jagielski [mailto:j...@jagunet.com]
 Sent: Montag, 8. September 2014 21:31
 To: dev@httpd.apache.org
 Subject: Re: [Patch] Async write completion for the full connection filter
 stack
 
 Another consideration: We now have the idea of a master
 and slave connection, and maybe something there would
 also help...
 
 FWIW: I like using an empty bucket conceptually since it should
 be ez and quick to check.

Agreed, but from a design perspective I think using the empty brigade is a side
effect we assign to it that is not immediately obvious, especially if you are
just developing modules.
Thinking this through further, we might need some kind of advisor API for the
filters that tells them how much data they should consume to avoid buffering too
much, and how much they can send down the chain without ending up in a blocking
write.
How much buffering is advised could be set by a configuration directive.

Regards

Rüdiger

 On Sep 8, 2014, at 2:53 PM, Ruediger Pluem rpl...@apache.org wrote:
 
  Wouldn't it make more sense instead of using an empty brigade to create
 yet another metabucket that signals write
  completion? It could also contain information how much data to send down
 the chain for single filters if they e.g. send
  heap or transient buckets. Otherwise how should they know?
  If you have a filter that has a large file bucket set aside and it does
 transform it e.g. to a heap bucket during its
  processing because it changes data on it I guess it doesn't make sense
 if it does send all stuff once it gets triggered
  for write completion as we would end up in a blocking write then in the
 core filter. But if it knows how much is left in
  the core filter buffer it could try to just send this and thus avoid
 blocking writes. And if there is no room left in
  the buffer or if what is left is too small for the filter to operate on
 it, the filter could just pass the bucket down
  the chain and if it would end up in the core output filter, the core
 output filter would just try to write what it has
  buffered.
 
 
  Regards
 
  Rüdiger
 
  Jim Jagielski wrote:
  Gotcha... +1
  On Sep 8, 2014, at 11:29 AM, Graham Leggett minf...@sharp.fm wrote:
 
  On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:
 
  This is pretty cool... haven't played too much with it, but
  via inspection I like the implementation.
 
 



Re: [Patch] Async write completion for the full connection filter stack

2014-09-09 Thread Nick Kew
On Mon, 2014-09-08 at 17:25 +0200, Graham Leggett wrote:

 Ideally, filters should do this, but generally they don’t:
 
 /* Do nothing if asked to filter nothing. */
 if (APR_BRIGADE_EMPTY(bb)) {
 return ap_pass_brigade(f->next, bb);
 }

Why on Earth should filters want to do that, as opposed to:

 Some filters, like mod_deflate, do this:
 
 /* Do nothing if asked to filter nothing. */
 if (APR_BRIGADE_EMPTY(bb)) {
 return APR_SUCCESS;
 }

or similar variants?

 In these cases ap_pass_brigade() is never called, so we detect this by 
 keeping a marker that is changed on every call to ap_pass_brigade(). If the 
 marker wasn’t changed during the call to the filter, we compensate by calling 
 each downstream filter until the marker is changed, or we run out of filters.

Yes.  The logic is that we call ap_pass_brigade when there's
something to pass.  Not when there's nothing: that would just
look like superfluous overhead.

If you have a reason to propagate an immediate event regardless
of that logic, surely that's the business of a FLUSH bucket.
Then the question becomes, is it ever right to absorb (or buffer)
and fail to propagate a FLUSH?  You seem instead to be ascribing
FLUSH semantics to an empty brigade!

As a filter developer, it's my business to pass a brigade when:
 1) I'm ready to pass data.
 2) I encounter EOS, when I must finish up and propagate it.
 3) I am explicitly signalled to FLUSH whatever I can.
What am I missing?  Do we have a need to refine the FLUSH
bucket type?  Maybe an EVENT bucket carrying an event descriptor?

-- 
Nick Kew



Re: [Patch] Async write completion for the full connection filter stack

2014-09-09 Thread Graham Leggett

On 09 Sep 2014, at 10:58 AM, Nick Kew n...@webthing.com wrote:

 Ideally, filters should do this, but generally they don’t:
 
/* Do nothing if asked to filter nothing. */
if (APR_BRIGADE_EMPTY(bb)) {
return ap_pass_brigade(f->next, bb);
}
 
 Why on Earth should filters want to do that, as opposed to:
 
 Some filters, like mod_deflate, do this:
 
/* Do nothing if asked to filter nothing. */
if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}
 
 or similar variants?

Because if they did, the compensation code in ap_pass_brigade() wouldn’t be 
necessary.

 In these cases ap_pass_brigade() is never called, so we detect this by 
 keeping a marker that is changed on every call to ap_pass_brigade(). If the 
 marker wasn’t changed during the call to the filter, we compensate by 
 calling each downstream filter until the marker is changed, or we run out of 
 filters.
 
 Yes.  The logic is that we call ap_pass_brigade when there's
 something to pass.  Not when there's nothing: that would just
 look like superfluous overhead.
 
 If you have a reason to propagate an immediate event regardless
 of that logic, surely that's the business of a FLUSH bucket.
 Then the question becomes, is it ever right to absorb (or buffer)
 and fail to propagate a FLUSH?  You seem instead to be ascribing
 FLUSH semantics to an empty brigade!

To be clear, an empty brigade does _not_ mean flush, not even slightly.

Flush means “stop everything and perform this potentially expensive task to 
completion right now”, and is the exact opposite of what we’re trying to 
achieve.

 As a filter developer, it's my business to pass a brigade when:
 1) I'm ready to pass data.
 2) I encounter EOS, when I must finish up and propagate it.
 3) I am explicitly signalled to FLUSH whatever I can.
 What am I missing?  Do we have a need to refine the FLUSH
 bucket type?  Maybe an EVENT bucket carrying an event descriptor?

In a synchronous world where it doesn’t matter how long a unit of work takes, 
sure.

In an async world, where you need to break up long-running tasks into short
running ones so that others get a chance to have their data sent on the same
thread, this doesn’t work. Filters need to be able to yield and set aside data
when they’re given too much data to process, just like the core filter can - but
right now they can’t, because a filter that sets data aside will never get
called again when upstream has no data to send.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2014-09-09 Thread Graham Leggett
On 08 Sep 2014, at 8:53 PM, Ruediger Pluem rpl...@apache.org wrote:

 Wouldn't it make more sense instead of using an empty brigade to create yet 
 another metabucket that signals write
 completion? It could also contain information how much data to send down the 
 chain for single filters if they e.g. send
 heap or transient buckets. Otherwise how should they know?
 If you have a filter that has a large file bucket set aside and it does 
 transform it e.g. to a heap bucket during its
 processing because it changes data on it I guess it doesn't make sense if it 
 does send all stuff once it gets triggered
 for write completion as we would end up in a blocking write then in the core 
 filter. But if it knows how much is left in
 the core filter buffer it could try to just send this and thus avoid blocking
 writes. And if there is no room left in
 the buffer or if what is left is too small for the filter to operate on it, 
 the filter could just pass the bucket down
 the chain and if it would end up in the core output filter, the core output 
 filter would just try to write what it has
 buffered.

I spent a lot of time going down this path of having a dedicated metabucket, 
and quickly got bogged down in complexity. The key problem was “what does a 
filter actually do when you get one”, it was unclear and it made my head bleed. 
That makes life hard for module authors and that is bad. As I recall there were 
also broken filters out there that only knew about FLUSH and EOS buckets (eg 
ap_http_chunk_filter()).

The problem we’re trying to solve is one of starvation - no filters can set 
aside data for later (except core via the NULL hack), because there is no 
guarantee that they’ll ever be called again later. You have to write it now, or 
potentially write it never. The start of the solution is to ensure filters aren’t
starved: if you have data in the output filters - and obviously you have no 
idea which filters have setaside data - you need a way to wake them all up. The 
simplest and least disruptive way is to pass them all an empty brigade, job 
done. We’ve got precedent for this - we’ve been sending NULL to the core filter 
to achieve the same thing, we want something that works with any filter.

The second part of the problem is filters biting off more than they can chew. 
Example: give mod_ssl a 1GB file bucket and mod_ssl won’t yield until that 
entire 1GB file has been sent for the reason (now solved) above. The next step 
to enable write completion is to teach filters like mod_ssl to yield when 
handling large quantities of data.

The core filter has an algorithm to yield, including various checks for flow 
control and sanity with respect to file handles. If a variant of this algorithm 
could be exposed generically and made available to critical filters like 
mod_ssl, we’ll crack write completion.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Jim Jagielski
This is pretty cool... haven't played too much with it, but
via inspection I like the implementation.

One question:

===
--- server/util_filter.c    (revision 1622873)
+++ server/util_filter.c    (working copy)
@@ -566,6 +566,16 @@
 {
 if (next) {
 apr_bucket *e;
+unsigned int activity;
+int empty = APR_BRIGADE_EMPTY(bb);
+apr_status_t status;
...
+while (APR_SUCCESS == status && empty) {
+next = next->next;
+if (next && next->c->activity == activity) {
+status = next->frec->filter_func.out_func(next, bb);
+}
+else {
+break;
+}
+}
+
+return status;

Why does while check for empty? Wouldn't it be faster to
wrap that while in an 'if (empty)'? It would save an
additional check on a var that doesn't change on each loop
thru the while().


Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Graham Leggett
On 08 Sep 2014, at 7:50 AM, Nick Kew n...@webthing.com wrote:

 +/* No problems found, and were we sent an empty brigade, and
 + * did this empty brigade not get passed on by a filter to the next
 + * filter in the chain? Compensate by passing the empty brigade to
 + * the next filter, so every filter gets a turn to write. This
 + * occurs during write completion.
 + */
 
 You mean this code can only ever be entered during write completion?

In theory yes, but practically only when the brigade is empty and there is 
nothing to write.

 Else how does this play with a filter that legitimately/necessarily 
 buffers everything - e.g. XSLT?

Right now, that filter will carry on doing that.

 Or a brigade containing a single metadata bucket and no data,
 introduced by one filter as a signal to another?

Then the brigade wouldn’t be empty.

Basically what the code does is ensure that every filter in the chain gets 
called and therefore has an opportunity to write cached data, regardless of 
whether one filter has elected to not call ap_pass_brigade(). The filters are
called in order, and no filters are missed.

Ideally, filters should do this, but generally they don’t:

/* Do nothing if asked to filter nothing. */
if (APR_BRIGADE_EMPTY(bb)) {
return ap_pass_brigade(f->next, bb);
}

Some filters, like mod_deflate, do this:

/* Do nothing if asked to filter nothing. */
if (APR_BRIGADE_EMPTY(bb)) {
return APR_SUCCESS;
}

Others do the same thing by virtue of looping across the brigade bucket by 
bucket and then exiting - no buckets, we just have an exit.

In these cases ap_pass_brigade() is never called, so we detect this by keeping 
a marker that is changed on every call to ap_pass_brigade(). If the marker 
wasn’t changed during the call to the filter, we compensate by calling each 
downstream filter until the marker is changed, or we run out of filters.

One thing to think about is whether to do this unconditionally - in other
words send an empty brigade downstream when no downstream ap_pass_brigade() was
detected, regardless of whether the incoming brigade was empty or not. This
covers the case where a buffering filter like the one you describe is reading
in data from request N+1, while data in the core from request N is still being
streamed over the network (and isn’t, because the core filter never gets called
until the buffering filter is done). That way request N isn’t delayed waiting
for request N+1 to finish. To ensure that we don’t waste cycles we can do this
when c->data_in_output_filters is non-zero. We can also turn
c->data_in_output_filters from a switch usable by the core filter only into a
nested counter usable by any filter, so any filter can set aside buckets for any
reason, and we’ll do the right thing. Patch attached.

Regards,
Graham
—


httpd-async-fullstack2.patch
Description: Binary data


Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Graham Leggett
On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:

 This is pretty cool... haven't played too much with it, but
 via inspection I like the implementation.
 
 One question:
 
 ===
 --- server/util_filter.c  (revision 1622873)
 +++ server/util_filter.c  (working copy)
 @@ -566,6 +566,16 @@
 {
 if (next) {
 apr_bucket *e;
 +unsigned int activity;
 +int empty = APR_BRIGADE_EMPTY(bb);
 +apr_status_t status;
 ...
 +while (APR_SUCCESS == status && empty) {
 +next = next->next;
 +if (next && next->c->activity == activity) {
 +status = next->frec->filter_func.out_func(next, bb);
 +}
 +else {
 +break;
 +}
 +}
 +
 +return status;
 
 Why does while check for empty? Wouldn't it be faster to
 wrap that while in an 'if (empty)'? It would save an
 additional check on a var that doesn't change on each loop
 thru the while().

It would - that was originally an if that became a while when I realised we had 
to compensate for multiple filters not calling ap_pass_brigade(), instead of 
just one.

The new patch takes out empty completely and instead tracks 
c->data_in_output_filters, which could change during the loop.

Regards,
Graham
—



Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Jim Jagielski
Gotcha... +1
On Sep 8, 2014, at 11:29 AM, Graham Leggett minf...@sharp.fm wrote:

 On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:
 
 This is pretty cool... haven't played too much with it, but
 via inspection I like the implementation.
 
 One question:
 
 ===
 --- server/util_filter.c (revision 1622873)
 +++ server/util_filter.c (working copy)
 @@ -566,6 +566,16 @@
 {
if (next) {
apr_bucket *e;
 +unsigned int activity;
 +int empty = APR_BRIGADE_EMPTY(bb);
 +apr_status_t status;
 ...
 +while (APR_SUCCESS == status && empty) {
 +next = next->next;
 +if (next && next->c->activity == activity) {
 +status = next->frec->filter_func.out_func(next, bb);
 +}
 +else {
 +break;
 +}
 +}
 +
 +return status;
 
 Why does while check for empty? Wouldn't it be faster to
 wrap that while in an 'if (empty)'? It would save an
 additional check on a var that doesn't change on each loop
 thru the while().
 
 It would - that was originally an if that became a while when I realised we 
 had to compensate for multiple filters not calling ap_pass_brigade(), instead 
 of just one.
 
 The new patch takes out empty completely and instead tracks 
 c->data_in_output_filters, which could change during the loop.
 
 Regards,
 Graham
 —
 



Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Ruediger Pluem
Wouldn't it make more sense instead of using an empty brigade to create yet 
another metabucket that signals write
completion? It could also contain information how much data to send down the 
chain for single filters if they e.g. send
heap or transient buckets. Otherwise how should they know?
If you have a filter that has a large file bucket set aside and it does 
transform it e.g. to a heap bucket during its
processing because it changes data on it I guess it doesn't make sense if it 
does send all stuff once it gets triggered
for write completion as we would end up in a blocking write then in the core 
filter. But if it knows how much is left in
the core filter buffer it could try to just send this and thus avoid blocking
writes. And if there is no room left in
the buffer or if what is left is too small for the filter to operate on it, the 
filter could just pass the bucket down
the chain and if it would end up in the core output filter, the core output 
filter would just try to write what it has
buffered.


Regards

Rüdiger

Jim Jagielski wrote:
 Gotcha... +1
 On Sep 8, 2014, at 11:29 AM, Graham Leggett minf...@sharp.fm wrote:
 
 On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:

 This is pretty cool... haven't played too much with it, but
 via inspection I like the implementation.



Re: [Patch] Async write completion for the full connection filter stack

2014-09-08 Thread Jim Jagielski
Another consideration: We now have the idea of a master
and slave connection, and maybe something there would
also help...

FWIW: I like using an empty bucket conceptually since it should
be ez and quick to check.
On Sep 8, 2014, at 2:53 PM, Ruediger Pluem rpl...@apache.org wrote:

 Wouldn't it make more sense instead of using an empty brigade to create yet 
 another metabucket that signals write
 completion? It could also contain information how much data to send down the 
 chain for single filters if they e.g. send
 heap or transient buckets. Otherwise how should they know?
 If you have a filter that has a large file bucket set aside and it does 
 transform it e.g. to a heap bucket during its
 processing because it changes data on it I guess it doesn't make sense if it 
 does send all stuff once it gets triggered
 for write completion as we would end up in a blocking write then in the core 
 filter. But if it knows how much is left in
 the core filter buffer it could try to just send this and thus avoid blocking
 writes. And if there is no room left in
 the buffer or if what is left is too small for the filter to operate on it, 
 the filter could just pass the bucket down
 the chain and if it would end up in the core output filter, the core output 
 filter would just try to write what it has
 buffered.
 
 
 Regards
 
 Rüdiger
 
 Jim Jagielski wrote:
 Gotcha... +1
 On Sep 8, 2014, at 11:29 AM, Graham Leggett minf...@sharp.fm wrote:
 
 On 08 Sep 2014, at 3:50 PM, Jim Jagielski j...@jagunet.com wrote:
 
 This is pretty cool... haven't played too much with it, but
 via inspection I like the implementation.
 
 



Re: [Patch] Async write completion for the full connection filter stack

2014-09-07 Thread Nick Kew
On Mon, 2014-09-08 at 03:31 +0200, Graham Leggett wrote:

Hmm, maybe I'm just being thick 'cos I should be asleep
at this hour, but ...

 +/* No problems found, and were we sent an empty brigade, and
 + * did this empty brigade not get passed on by a filter to the next
 + * filter in the chain? Compensate by passing the empty brigade to
 + * the next filter, so every filter gets a turn to write. This
 + * occurs during write completion.
 + */

You mean this code can only ever be entered during write completion?

Else how does this play with a filter that legitimately/necessarily 
buffers everything - e.g. XSLT?

Or a brigade containing a single metadata bucket and no data,
introduced by one filter as a signal to another?

-- 
Nick Kew