Re: core_output, files and setaside

2016-05-04 Thread Graham Leggett
On 04 May 2016, at 3:22 PM, Stefan Eissing  wrote:

> file_bucket_setaside() currently does not care about the refcount. A setaside 
> affects *all* shared file buckets, wherever they currently reside: it moves 
> the contained apr_file_t into the filter's deferred pool, core_output 
> eventually clears that pool, and the file is closed via the cleanup of the 
> file (not the bucket).

That’s broken; we need to fix it so it works properly.

> While dup'ing the file descriptor would work, it seems overly costly in this 
> case. What is the case?
> 
> The output filter already makes the distinction whether filter->r is set or 
> not. When filter->r is set, it uses filter->r->pool to setaside buckets it 
> wants to keep around. This is safe since it knows that some time in the 
> future, an EOR bucket will come along and cleanup - at the right time.
> 
> HTTP/2 has a similar bucket lifetime case: only now, there are several 
> requests ongoing and interleaved onto the one master connection. But the 
> basic assumption still holds: there will be some kind of EOR bucket that 
> correctly manages the lifetimes of the buckets that came before it.
> 
> But the output filter does not know this and, even if it did, it would not 
> know which pool to set aside which bucket into.

That’s expected - during requests, we set aside into the request pool, where 
requests are one shot and you’re done. During connections however we cannot use 
the connection pool, because the connection pool lives potentially for a very 
long time. This is why the deferred pool exists.

None of this matters though, buckets should work correctly in both cases.

> One way for a generic approach is a new META bucket: POOL_LIFE that carries a 
> pool or NULL. Its contract is: all buckets that follow me have the lifetime 
> of my pool (at least) and this holds until another POOL_LIFE bucket comes 
> along. Pools announced in this way are promised to only disappear after some 
> kind of EOR or FLUSH has been sent.

This breaks the contract of pools - every bucket has a pool, and now there 
would be a second mechanism that duplicates this first mechanism, and as soon 
as there is a mismatch we crash.

Let’s just fix the original bug - make sure that file buckets behave correctly 
when setaside+refcount is used at the same time.

Regards,
Graham
—



Re: core_output, files and setaside

2016-05-04 Thread Stefan Eissing

> Am 04.05.2016 um 13:49 schrieb Graham Leggett :
> 
> On 04 May 2016, at 11:13 AM, Stefan Eissing  
> wrote:
> 
>> The problem is not the apr_bucket_destroy(). The file bucket setaside calls 
>> apr_file_setaside() in core_output on a deferred pool, and then core_output 
>> clears that pool. This invalidates all still existing file buckets using 
>> that apr_file.
> 
> This scenario should still work properly; it shouldn’t cause anything to 
> break.
> 
> First off, file buckets need reference counting to make sure that only on the 
> last removal of the bucket the file is closed (it may already do this, I 
> haven’t looked, but then if it did do this properly it should work).
> 
> Next, if a file bucket is setaside, but the reference count isn’t one (in 
> other words, other file buckets exist pointing at the same file descriptor in 
> other places), and the pool we’re setting aside into isn’t the same or a 
> child pool, we should dup the file descriptor during the setaside.
> 
> The typical scenario for the deferred pool should be the first scenario above.

file_bucket_setaside() currently does not care about the refcount. A setaside 
affects *all* shared file buckets, wherever they currently reside: it moves the 
contained apr_file_t into the filter's deferred pool, core_output eventually 
clears that pool, and the file is closed via the cleanup of the file (not the 
bucket).

While dup'ing the file descriptor would work, it seems overly costly in this 
case. What is the case?

The output filter already makes the distinction whether filter->r is set or not. 
When filter->r is set, it uses filter->r->pool to setaside buckets it wants to 
keep around. This is safe since it knows that some time in the future, an EOR 
bucket will come along and cleanup - at the right time.

HTTP/2 has a similar bucket lifetime case: only now, there are several requests 
ongoing and interleaved onto the one master connection. But the basic 
assumption still holds: there will be some kind of EOR bucket that correctly 
manages the lifetimes of the buckets that came before it.

But the output filter does not know this and, even if it did, it would not know 
which pool to set aside which bucket into.

So.

One way for a generic approach is a new META bucket: POOL_LIFE that carries a 
pool or NULL. Its contract is: all buckets that follow me have the lifetime of 
my pool (at least) and this holds until another POOL_LIFE bucket comes along. 
Pools announced in this way are promised to only disappear after some kind of 
EOR or FLUSH has been sent.

Given that requests R1, R2, R3 with pools P1, P2, P3 are under way, a brigade 
passed to core output would look like this:

  [PL P1][R1 DATA][R1 DATA][PL P3][R3 DATA][PL P2][R2 EOR][PL P1][R1 DATA][PL NULL]...

where the PL buckets would switch the setaside pool used by core_output. If no 
POOL_LIFE is seen, or if the contained pool is NULL, the current defaults 
(r->pool / deferred) are used.

Using pools instead of higher level, protocol specific entities (such as 
request_rec) should make that flexible enough for different use cases.
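To make the proposed contract concrete, here is a small self-contained C sketch (hypothetical types; POOL_LIFE and these names do not exist in APR) of how core_output could track the active setaside pool while walking such a brigade:

```c
#include <stddef.h>

typedef struct pool { int id; } pool_t;        /* stand-in for apr_pool_t */

typedef enum { B_DATA, B_POOL_LIFE } bucket_kind;

typedef struct {
    bucket_kind kind;
    pool_t *pool;   /* B_POOL_LIFE only: pool for following buckets, or NULL */
} bucket;

/* Walk a brigade and return the pool the last data bucket would be set
 * aside into; `deferred` is the default when no PL pool is active. */
pool_t *setaside_pool_for_last(const bucket *bb, size_t n, pool_t *deferred)
{
    pool_t *current = deferred;
    pool_t *result = deferred;
    for (size_t i = 0; i < n; i++) {
        if (bb[i].kind == B_POOL_LIFE)
            current = bb[i].pool ? bb[i].pool : deferred;  /* [PL NULL] resets */
        else
            result = current;   /* data bucket inherits the active pool */
    }
    return result;
}
```

The point of the sketch is only the state machine: each PL bucket switches the pool, and [PL NULL] falls back to the current defaults.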

Thoughts?

-Stefan



Re: core_output, files and setaside

2016-05-04 Thread Graham Leggett
On 04 May 2016, at 11:13 AM, Stefan Eissing  
wrote:

> The problem is not the apr_bucket_destroy(). The file bucket setaside calls 
> apr_file_setaside() in core_output on a deferred pool, and then core_output 
> clears that pool. This invalidates all still existing file buckets using that 
> apr_file.

This scenario should still work properly; it shouldn’t cause anything to break.

First off, file buckets need reference counting to make sure that only on the 
last removal of the bucket the file is closed (it may already do this, I 
haven’t looked, but then if it did do this properly it should work).

Next, if a file bucket is setaside, but the reference count isn’t one (in other 
words, other file buckets exist pointing at the same file descriptor in other 
places), and the pool we’re setting aside into isn’t the same or a child pool, 
we should dup the file descriptor during the setaside.

The typical scenario for the deferred pool should be the first scenario above.

Regards,
Graham
—



Re: core_output, files and setaside

2016-05-04 Thread Stefan Eissing

> Am 04.05.2016 um 11:09 schrieb Graham Leggett :
> 
> On 04 May 2016, at 10:45 AM, Stefan Eissing  
> wrote:
> 
>> I have been wrong before...but...
>> 
>> mod_http2 needs to send out a file response:
>> 1. it starts with the response body brigade: [file:0-len][eos]
>> 2. it sends the first 16K frame by splitting the file bucket: 
>>  -> passing to core output: [heap:frame header][file:0-16k]
>>  -> remaining body:  [file:16K-len][eos]
>> 3. core_output decides to setaside:
>>  -> setaside (deferred pool): [heap:frame header][file:0-16k]
>>  -> remaining body:  [file:16K-len][eos]
>> 4. core_output sends and, sometimes, clears the deferred pool
>>  -> which closes the file descriptor
>> 5. next 16K frame: [heap:frame header][file:16k-32K] results in APR_EBADF
> 
> This smells wrong - if you split a file bucket (and there is nothing wrong 
> with splitting a file bucket) you should end up with two file buckets, and 
> destroying the first file bucket while the second file bucket still exists 
> shouldn’t cause the second file bucket descriptor to close.

The problem is not the apr_bucket_destroy(). The file bucket setaside calls 
apr_file_setaside() in core_output on a deferred pool, and then core_output 
clears that pool. This invalidates all still existing file buckets using that 
apr_file.

At least, that is my reading of what happens.

> Regards,
> Graham
> —
> 



Re: core_output, files and setaside

2016-05-04 Thread Graham Leggett
On 04 May 2016, at 10:45 AM, Stefan Eissing  
wrote:

> I have been wrong before...but...
> 
> mod_http2 needs to send out a file response:
> 1. it starts with the response body brigade: [file:0-len][eos]
> 2. it sends the first 16K frame by splitting the file bucket: 
>   -> passing to core output: [heap:frame header][file:0-16k]
>   -> remaining body:  [file:16K-len][eos]
> 3. core_output decides to setaside:
>   -> setaside (deferred pool): [heap:frame header][file:0-16k]
>   -> remaining body:  [file:16K-len][eos]
> 4. core_output sends and, sometimes, clears the deferred pool
>   -> which closes the file descriptor
> 5. next 16K frame: [heap:frame header][file:16k-32K] results in APR_EBADF

This smells wrong - if you split a file bucket (and there is nothing wrong with 
splitting a file bucket) you should end up with two file buckets, and 
destroying the first file bucket while the second file bucket still exists 
shouldn’t cause the second file bucket descriptor to close.

Regards,
Graham
—



core_output, files and setaside

2016-05-04 Thread Stefan Eissing
I have been wrong before...but...

mod_http2 needs to send out a file response:
1. it starts with the response body brigade: [file:0-len][eos]
2. it sends the first 16K frame by splitting the file bucket: 
   -> passing to core output: [heap:frame header][file:0-16k]
   -> remaining body:  [file:16K-len][eos]
3. core_output decides to setaside:
   -> setaside (deferred pool): [heap:frame header][file:0-16k]
   -> remaining body:  [file:16K-len][eos]
4. core_output sends and, sometimes, clears the deferred pool
   -> which closes the file descriptor
5. next 16K frame: [heap:frame header][file:16k-32K] results in APR_EBADF
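The failure mode in steps 4 and 5 can be reproduced with plain POSIX calls (a sketch of the same descriptor-lifetime problem, not APR code):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a read after the shared descriptor was closed fails with
 * EBADF, mirroring the deferred-pool cleanup closing the file while the
 * remaining body bucket still points at it. */
int shared_fd_close_demo(void)
{
    int fd = open("/dev/null", O_RDONLY);   /* stands in for the response file */
    if (fd < 0)
        return -1;
    char buf[4];
    read(fd, buf, sizeof(buf));             /* first 16K frame: fine */
    close(fd);                              /* step 4: pool cleanup closes it */
    return (read(fd, buf, sizeof(buf)) == -1 && errno == EBADF);  /* step 5 */
}
```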

What is different from HTTP/1?
a) HTTP/1 sends the body in one pass when it can; file buckets are never split 
intentionally.
b) HTTP/1 sends with a request set in filter->r, so the filter brigade saving 
uses not a new, "deferred" pool, but the request pool. That means the file 
bucket setaside becomes a NOP.

b) is what mod_http2 currently cannot emulate. There is no request_rec on the 
master connection, and yet the buckets belong to an HTTP/2 stream whose memory 
pool is under control.

2.4.x seems to have similar handling in its output.

What to do?
I. Not sending file buckets is safe, but performance will suffer severely.
II. Sending a new kind of "stream file" bucket that knows how to handle 
setaside better would make it safe. However, the sendfile() code in core will 
not trigger on those.
III. Set aside the file into conn_rec->pool and call apr_file_close() explicitly 
at end of stream. This will work, but wastes a small amount of conn_rec->pool 
for each file.

I will go ahead and try III, but if you have another idea, I'm all ears.

-Stefan