Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-20 Thread Joe Orton
On Thu, Feb 19, 2009 at 10:00:50PM +0100, Ruediger Pluem wrote:
 On 02/19/2009 12:32 PM, Joe Orton wrote:
...
  @@ -497,13 +500,17 @@
                   next = APR_BUCKET_NEXT(bucket);
               }
               bytes_in_brigade += bucket->length;
  -            if (!APR_BUCKET_IS_FILE(bucket)) {
  +            if (APR_BUCKET_IS_FILE(bucket)) {
  +                num_files_in_brigade++;
  +            }
  +            else {
                   non_file_bytes_in_brigade += bucket->length;
               }
           }
       }
   
  -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
  +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
  +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
 
 If the 16 FDs were split over more than one brigade and the
 brigades before us were set aside, the FDs already belong to the wrong pool
 (the connection pool). Deleting a file bucket doesn't close the FD it uses.

Not sure what the concern is there - this loop is iterating over the 
concatenation of the buffered brigade and the new brigade (right?), so 
it will count the total number of buckets which are potentially left 
buffered after this c_o_f invocation terminates.

w.r.t. MMAP buckets: the 64K bytes limit will apply here, since FILE 
only morphs into MMAP if the file size is > 8K.  (And no, theoretically, 
the fd from which an MMAP bucket was derived is not needed after the 
mmap() call, but I don't think the fds actually get closed earlier than 
pool closure, normally)

w.r.t. multiple FILE buckets for a single fd: failing to buffer as much 
as possible will not be the end of the world, but I suppose that case is 
common, so we could cope with it with something like this.  (completely 
untested)

Index: server/core_filters.c
===================================================================
--- server/core_filters.c   (revision 734690)
+++ server/core_filters.c   (working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,8 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
+    apr_file_t *last_file_seen;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,9 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+    last_file_seen = NULL;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +502,20 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket)
+                && (last_file_seen == NULL
+                    || last_file_seen != ((apr_bucket_file *)bucket->data)->fd)) {
+                num_files_in_brigade++;
+                last_file_seen = ((apr_bucket_file *)bucket->data)->fd;
+            }
+            else {
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */


Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-19 Thread Joe Orton
On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
 On 02/18/2009 11:16 AM, Joe Orton wrote:
 There is still a nasty issue with the trunk code that can cause you to
 run out of FDs, as the new non-blocking core output filter has some trouble
 setting aside the file buckets to the correct pool (it puts them in the
 connection pool, which is too long-lived and bad).
 
 See thread starting at
 
 http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e
 
 I admit that I lost momentum on this, but this definitely needs to be fixed.

Yes, that looks broken.

Something like the patch below should fix it, though I'm not sure it's 
really a good idea to be allowing a pathological case of 16 fds consumed 
per client.  It might present some interesting tuning challenges; 
hitting fd limits before hitting num(client) limits.
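
(For rough illustration with made-up numbers: 256 clients each pinning 
the worst-case 16 fds is 256 * 16 = 4096 descriptors, past a typical 
1024-descriptor ulimit long before any 256-client limit is reached.)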

Index: core_filters.c
===================================================================
--- core_filters.c  (revision 734690)
+++ core_filters.c  (working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,7 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +467,8 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +500,17 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket)) {
+                num_files_in_brigade++;
+            }
+            else {
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */



Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-19 Thread Ruediger Pluem


On 02/19/2009 12:32 PM, Joe Orton wrote:
 On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
 On 02/18/2009 11:16 AM, Joe Orton wrote:
 There is still a nasty issue with the trunk code that can cause you to
 run out of FDs, as the new non-blocking core output filter has some trouble
 setting aside the file buckets to the correct pool (it puts them in the
 connection pool, which is too long-lived and bad).

 See thread starting at

 http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e

 I admit that I lost momentum on this, but this definitely needs to be fixed.
 
 Yes, that looks broken.
 
 Something like the patch below should fix it, though I'm not sure it's 
 really a good idea to be allowing a pathological case of 16 fds consumed 
 per client.  It might present some interesting tuning challenges; 
 hitting fd limits before hitting num(client) limits.

Thanks for the patch and sorry for picking on the details :-)

 
 Index: core_filters.c
 ===================================================================
 --- core_filters.c  (revision 734690)
 +++ core_filters.c  (working copy)
 @@ -367,6 +367,7 @@
  
  #define THRESHOLD_MIN_WRITE 4096
  #define THRESHOLD_MAX_BUFFER 65536
 +#define THRESHOLD_MAX_FILES 16
  
  /* Optional function coming from mod_logio, used for logging of output
   * traffic
 @@ -380,7 +381,7 @@
      core_output_filter_ctx_t *ctx = net->out_ctx;
      apr_bucket_brigade *bb;
      apr_bucket *bucket, *next;
 -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
  
      /* Fail quickly if the connection has already been aborted. */
      if (c->aborted) {
 @@ -466,6 +467,8 @@
  
      bytes_in_brigade = 0;
      non_file_bytes_in_brigade = 0;
 +    num_files_in_brigade = 0;
 +
      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
           bucket = next) {
          next = APR_BUCKET_NEXT(bucket);
 @@ -497,13 +500,17 @@
                  next = APR_BUCKET_NEXT(bucket);
              }
              bytes_in_brigade += bucket->length;
 -            if (!APR_BUCKET_IS_FILE(bucket)) {
 +            if (APR_BUCKET_IS_FILE(bucket)) {
 +                num_files_in_brigade++;

Hm, how do you know that the FD in this bucket is different from the one
in the last file bucket(s)? It may be possible that we have file buckets
in the brigade that represent different parts of the same file, as a filter
might have put something in between them in the form of heap buckets or
something like that.
Furthermore, what about MMAP buckets? Don't they have an open FD as well?

Regards

Rüdiger



Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-19 Thread Ruediger Pluem


On 02/19/2009 12:32 PM, Joe Orton wrote:
 On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
 On 02/18/2009 11:16 AM, Joe Orton wrote:
 There is still a nasty issue with the trunk code that can cause you to
 run out of FDs, as the new non-blocking core output filter has some trouble
 setting aside the file buckets to the correct pool (it puts them in the
 connection pool, which is too long-lived and bad).

 See thread starting at

 http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e

 I admit that I lost momentum on this, but this definitely needs to be fixed.
 
 Yes, that looks broken.
 
 Something like the patch below should fix it, though I'm not sure it's 
 really a good idea to be allowing a pathological case of 16 fds consumed 
 per client.  It might present some interesting tuning challenges; 
 hitting fd limits before hitting num(client) limits.
 
 Index: core_filters.c
 ===================================================================
 --- core_filters.c  (revision 734690)
 +++ core_filters.c  (working copy)
 @@ -367,6 +367,7 @@
  
  #define THRESHOLD_MIN_WRITE 4096
  #define THRESHOLD_MAX_BUFFER 65536
 +#define THRESHOLD_MAX_FILES 16
  
  /* Optional function coming from mod_logio, used for logging of output
   * traffic
 @@ -380,7 +381,7 @@
      core_output_filter_ctx_t *ctx = net->out_ctx;
      apr_bucket_brigade *bb;
      apr_bucket *bucket, *next;
 -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
 +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
  
      /* Fail quickly if the connection has already been aborted. */
      if (c->aborted) {
 @@ -466,6 +467,8 @@
  
      bytes_in_brigade = 0;
      non_file_bytes_in_brigade = 0;
 +    num_files_in_brigade = 0;
 +
      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
           bucket = next) {
          next = APR_BUCKET_NEXT(bucket);
 @@ -497,13 +500,17 @@
                  next = APR_BUCKET_NEXT(bucket);
              }
              bytes_in_brigade += bucket->length;
 -            if (!APR_BUCKET_IS_FILE(bucket)) {
 +            if (APR_BUCKET_IS_FILE(bucket)) {
 +                num_files_in_brigade++;
 +            }
 +            else {
                  non_file_bytes_in_brigade += bucket->length;
              }
          }
      }
  
 -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
 +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
 +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {

If the 16 FDs were split over more than one brigade and the
brigades before us were set aside, the FDs already belong to the wrong pool
(the connection pool). Deleting a file bucket doesn't close the FD it uses.

Regards

Rüdiger




FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-18 Thread Joe Orton
On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
 On 02/16/2009 02:13 PM, Joe Orton wrote:
  Why is it invalid use of the filtering/buckets API to close the file 
  after sending the FILE-containing brigade up the filter stack?  
  
  It seems counter-intuitive to me that *anything* you do after the 
  ap_pass_brigade() call should make a difference to what was passed up 
  the filter stack.  But I suppose this is the case; even most of the 
  memory-backed bucket types don't duplicate referenced memory in the 
  setaside method.
 
 I guess this is not needed in most cases and is not done for performance
 reasons. But maybe we can change the setaside for file buckets to do
 an additional dup on the fd.

The choices seem to be:

1) declare the FLUSH to be necessary in this case as you said 
originally ;) - and document appropriately

2) rewrite/reinvent all the setaside methods to dup/copy as appropriate

3) declare the core output filter broken and fix it

Of these, (2) and (3) will presumably have some negative performance 
impact, though I'm not sure to what degree.  Not sure (2) is even going 
to be possible, e.g. with MMAP buckets; scary stuff.

Looking at the cray-zee non-blocking core output filter on the trunk, it 
will setaside with wild abandon unless it hits a FLUSH.  I think all 
these options are pretty distasteful but (1) seems like the only 
feasible route.  What do you think?  

Would be good to get some more opinions on this.

Regards, Joe


Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

2009-02-18 Thread Ruediger Pluem


On 02/18/2009 11:16 AM, Joe Orton wrote:
 On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
 On 02/16/2009 02:13 PM, Joe Orton wrote:
 Why is it invalid use of the filtering/buckets API to close the file 
 after sending the FILE-containing brigade up the filter stack?  

 It seems counter-intuitive to me that *anything* you do after the 
 ap_pass_brigade() call should make a difference to what was passed up 
 the filter stack.  But I suppose this is the case; even most of the 
 memory-backed bucket types don't duplicate referenced memory in the 
 setaside method.
 I guess this is not needed in most cases and is not done for performance
 reasons. But maybe we can change the setaside for file buckets to do
 an additional dup on the fd.
 
 The choices seem to be:
 
 1) declare the FLUSH to be necessary in this case as you said 
 originally ;) - and document appropriately
 
 2) rewrite/reinvent all the setaside methods to dup/copy as appropriate
 
 3) declare the core output filter broken and fix it
 
 Of these (2) and (3) will presumably have some negative performance 
 impact though I'm not sure to what degree.  Not sure (2) is even going 
 to be possible, e.g. with MMAP buckets, scary stuff.

Yes, I don't know how MMAP would work with this, so dup might be a bad
approach in that regard. In any case it looks like it would cause a lot
of trouble until it works stably again :-)

 
 Looking at the cray-zee non-blocking core output filter on the trunk, it 
 will setaside with wild abandon unless it hits a FLUSH.  I think all 

There is still a nasty issue with the trunk code that can cause you to
run out of FDs, as the new non-blocking core output filter has some trouble
setting aside the file buckets to the correct pool (it puts them in the
connection pool, which is too long-lived and bad).

See thread starting at

http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e

I admit that I lost momentum on this, but this definitely needs to be fixed.

 these options are pretty distasteful but (1) seems like the only 
 feasible route.  What do you think?  

I guess at least for the moment (1) is the way to go, exactly for the
reasons you stated.

Regards

Rüdiger


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-17 Thread Mladen Turk

Graham Dumpleton wrote:

2009/2/17 Mladen Turk mt...@apache.org:

Graham Dumpleton wrote:

2009/2/17 Joe Orton jor...@redhat.com:

I did used to perform a dup, but was told that this would cause
problems with file locking. Specifically was told:

I'm getting lost here.  What has file locking got to do with it?  Does
mod_wsgi rely on file locking somehow?

I'm lost as well :)


Consider:

  fd1 = 

  lock(fd1)

  fd2 = dup(fd1)

  close(fd2) # will release the lock under some lock APIs even though
not last reference to underlying file object

  write(fd1) # lock has already been released so not guaranteed to be the only writer

  close(fd1)

At least that is how I understand it from what is being explained to
me and pointed out in various documentation.

So, if fd2 is the file descriptor created for the file bucket in Apache,
if it gets closed before the application later wants to write to the file
through fd1, then the application has lost the exclusive ownership
acquired by way of the lock, and something else could have acquired the
lock and started modifying the file on the basis that it had exclusive
ownership at that time.



Well, like I said, that won't work, nor is it portable
(e.g., apr_os_file_t is a HANDLE on win32).

What you will need is code that will take the Python
object and invoke the Python file API to feed the apr_bucket
(basically writing the apr_bucket_python_file).
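
For illustration, a minimal sketch of such a bucket type (the
apr_bucket_type_t layout and the *_notimpl helpers are real APR API;
the python_file_* names and stub bodies are hypothetical):

    #include "apr_buckets.h"

    typedef struct {
        apr_bucket_refcount refcount;
        void *pyfile;   /* stand-in for the PyObject*, reference held */
    } python_file_bucket_data;

    static void python_file_destroy(void *data)
    {
        python_file_bucket_data *d = data;
        if (apr_bucket_shared_destroy(d)) {
            /* Py_DECREF(d->pyfile) would go here, under the GIL */
            apr_bucket_free(d);
        }
    }

    static apr_status_t python_file_read(apr_bucket *b, const char **str,
                                         apr_size_t *len,
                                         apr_read_type_e block)
    {
        /* would call the file object's read() and morph b into a heap
         * bucket holding the returned data */
        *str = NULL;
        *len = 0;
        return APR_ENOTIMPL;
    }

    static const apr_bucket_type_t apr_bucket_type_python_file = {
        "PYTHON_FILE", 5, APR_BUCKET_DATA,
        python_file_destroy,
        python_file_read,
        apr_bucket_setaside_notimpl,  /* the hard part, per this thread */
        apr_bucket_split_notimpl,
        apr_bucket_copy_notimpl
    };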

However, the simplest thing might be an intermediate temp file, in
which case httpd could reference the file name, not the file
object itself. Not sure how that would work with a dynamic file,
since apr and python might use different platform locking
mechanisms.

Regards
--
^(TM)


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-17 Thread Graham Dumpleton
2009/2/17 Mladen Turk mt...@apache.org:
 Graham Dumpleton wrote:

 2009/2/17 Joe Orton jor...@redhat.com:

 I did used to perform a dup, but was told that this would cause
 problems with file locking. Specifically was told:

 I'm getting lost here.  What has file locking got to do with it?  Does
 mod_wsgi rely on file locking somehow?


 I'm lost as well :)

Consider:

  fd1 = 

  lock(fd1)

  fd2 = dup(fd1)

  close(fd2) # will release the lock under some lock APIs even though
not last reference to underlying file object

  write(fd1) # lock has already been released so not guaranteed to be the only writer

  close(fd1)

At least that is how I understand it from what is being explained to
me and pointed out in various documentation.

So, if fd2 is the file descriptor created for the file bucket in Apache,
if it gets closed before the application later wants to write to the file
through fd1, then the application has lost the exclusive ownership
acquired by way of the lock, and something else could have acquired the
lock and started modifying the file on the basis that it had exclusive
ownership at that time.

 In WSGI applications, it is possible for the higher level Python web
 application to pass back a file object reference for the response with
 the intent that the WSGI adapter use any optimised methods available
 for sending it back as response. This is where file buckets come into
 the picture to begin with.

 Now it looks that you are trying to intermix the third party
 maintained native OS file descriptors and file buckets.
 You can create the apr_file_t from apr_os_file_t

Which is what it does. Simplified code below:

  apr_os_file_t fd = -1;
  apr_file_t *tmpfile = NULL;

  fd = PyObject_AsFileDescriptor(filelike);

  apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, self->r->pool);

 (Think you'll have platform portability issues there)

The optimisation is only supported on UNIX systems.

 but the major problem would be to ensure the life cycle
 of the object, since Python has it's own GC and httpd has
 it's pool.
 IMHO you will need a new apr_bucket provider written in
 Python and C for something like that.

CPython uses reference counting. What is referred to as GC in Python
is actually just a mechanism that kicks in under certain circumstances
to break cycles between reference counted objects.

Having a special bucket type which holds a reference to the Python
file object will not help anyway. This is because the close() method
of the Python file object can be called prior to the file bucket being
destroyed. This closing of the Python file object would occur before
the delayed write of file bucket resulting due to the EOS
optimisation. So, same problem as when using naked file descriptor.

Also, using a special bucket type opens another can of worms. This is
because multiple interpreters are supported, as well as multithreading.
Thus it would be necessary to track the named interpreter in use
within the bucket, reacquire the lock on the interpreter being used,
and ensure thread state is correctly reinstated. Although it is
possible to do, it gets a bit messy.

Holding onto the file descriptor to allow the optimisation isn't
really desirable for other reasons as well. This is because the WSGI
specification effectively requires the response content to have been
flushed out to the client before the final call back into the
application to clean things up. In that final call back, where the
application performs cleanup and closes stuff like files, it could
technically rewrite the content of the file. If Apache has not
finished writing out the contents of the file, presuming the Python
file object hadn't been closed, then Apache would end up writing
different content to what was expected, and possibly truncated content
if the file were resized.

In summary, you need a way of knowing that when you flush
something it really has been flushed and that Apache is all done
with it.

Graham


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-17 Thread Graham Dumpleton
2009/2/17 Mladen Turk mt...@apache.org:
 Graham Dumpleton wrote:

 2009/2/17 Mladen Turk mt...@apache.org:

 Graham Dumpleton wrote:

 2009/2/17 Joe Orton jor...@redhat.com:

 I did used to perform a dup, but was told that this would cause
 problems with file locking. Specifically was told:

 I'm getting lost here.  What has file locking got to do with it?  Does
 mod_wsgi rely on file locking somehow?

 I'm lost as well :)

 Consider:

  fd1 = 

  lock(fd1)

  fd2 = dup(fd1)

  close(fd2) # will release the lock under some lock APIs even though
 not last reference to underlying file object

  write(fd1) # lock has already been released so not guaranteed to be the
 only writer

  close(fd1)

 At least that is how I understand it from what is being explained to
 me and pointed out in various documentation.

 So, if fd2 is the file descriptor created for the file bucket in Apache,
 if it gets closed before the application later wants to write to the file
 through fd1, then the application has lost the exclusive ownership
 acquired by way of the lock, and something else could have acquired the
 lock and started modifying the file on the basis that it had exclusive
 ownership at that time.


 Well, like said that won't work, neither is portable
 (eg, apr_os_file_t is HANDLE on win32)

I already said I only support the optimisation on UNIX. I don't care
about Windows.

 What you will need is the code that will take the Python
 object and invoke Python file api feeding the apr_bucket.
 (Basically writing the apr_bucket_python_file).

As I already tried to explain, even for the case of the bucket being
used to hold a reference to the Python object, that will not work
because of the guarantees that WSGI applications require regarding
data needing to be flushed.

 However the simplest thing might be an intermediate temp file, in
 which case httpd could reference the file name not the file
 object itself.

Which would likely be slower than using the existing fallback
streaming mechanism, which reads the file into memory in blocks and
pushes them through as transient buckets.
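
For context, that fallback amounts to something like this sketch
('r' as the request_rec and 'fp' as an open FILE* are assumed; this is
illustrative, not mod_wsgi's actual code):

    char buf[8192];
    size_t n;
    apr_bucket_alloc_t *ba = r->connection->bucket_alloc;
    apr_bucket_brigade *bb = apr_brigade_create(r->pool, ba);

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        APR_BRIGADE_INSERT_TAIL(bb,
            apr_bucket_transient_create(buf, n, ba));
        /* flush so the transient data is consumed before buf is reused */
        APR_BRIGADE_INSERT_TAIL(bb, apr_bucket_flush_create(ba));
        if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS)
            break;
        apr_brigade_cleanup(bb);
    }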

 Not sure how that would work with a dynamic file
 since apr and python might use different platform locking
 mechanisms.

Python uses operating system locking mechanisms, just like the APR
library would.

Graham


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Joe Orton
On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
...
 What the end result of the code is, is that if you have a file bucket
 getting this far where length of file is less than 8000 and an EOS
 follows it, then the actual file bucket is held over rather than data
 being read and buffered. This is as commented is to avoid doing an
 mmap+memcpy. What it means though is that the file descriptor within
 the file bucket must be maintained and cannot be closed as soon as
 ap_pass_brigade() has been called.

The call to:

ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);

in that code path should result in the FILE bucket and the contained fd 
being dup()ed.  (Though if that is failing, you wouldn't know because of 
the lack of error checking)

You say:

 For me this is an issue as the file descriptor has been supplied from
 a special object returned by a higher level application and it would
 be hard to maintain the file as open beyond the life of the request,
 up till end of keep alive or a subsequent request over same
 connection. Doing a dup on the file descriptor is also not necessarily
 an option.

can you explain why a dup shouldn't work?

Regards, Joe


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Ruediger Pluem


On 02/16/2009 11:07 AM, Joe Orton wrote:
 On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
 ...
 What the end result of the code is, is that if you have a file bucket
 getting this far where length of file is less than 8000 and an EOS
 follows it, then the actual file bucket is held over rather than data
 being read and buffered. This is as commented is to avoid doing an
 mmap+memcpy. What it means though is that the file descriptor within
 the file bucket must be maintained and cannot be closed as soon as
 ap_pass_brigade() has been called.
 
 The call to:
 
 ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
 
 in that code path should result in the FILE bucket and the contained fd 
 being dup()ed.  (Though if that is failing, you wouldn't know because of 
 the lack of error checking)

Are you sure?

apr_file_setaside does not dup the fd.

Regards

Rüdiger



Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Graham Dumpleton
2009/2/16 Joe Orton jor...@redhat.com:
 On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
 ...
 What the end result of the code is, is that if you have a file bucket
 getting this far where length of file is less than 8000 and an EOS
 follows it, then the actual file bucket is held over rather than data
 being read and buffered. This is as commented is to avoid doing an
 mmap+memcpy. What it means though is that the file descriptor within
 the file bucket must be maintained and cannot be closed as soon as
 ap_pass_brigade() has been called.

 The call to:

    ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);

 in that code path should result in the FILE bucket and the contained fd
 being dup()ed.  (Though if that is failing, you wouldn't know because of
 the lack of error checking)

 You say:

 For me this is an issue as the file descriptor has been supplied from
 a special object returned by a higher level application and it would
 be hard to maintain the file as open beyond the life of the request,
 up till end of keep alive or a subsequent request over same
  connection. Doing a dup on the file descriptor is also not necessarily
 an option.

 can you explain why a dup shouldn't work?

I did used to perform a dup, but was told that this would cause
problems with file locking. Specifically was told:

I am not sure, but it looks like mod_wsgi duplicates the original file
descriptor, sends the file's data, closes the duplicated file
descriptor, and then calls the iterable's close method. That may be
problematic for applications that are using fcntl-based locking; the
closing of the duplicate file descriptor will release the lock before
the iterable's close method gets a chance to execute, so the close
method can no longer depend on the file remaining locked. Ideally, the
duplicate file descriptor shouldn't be closed until *after* the
iterable's close() method has been called.

I never researched it properly as I didn't have time at that point, so I
simply believed what I was being told. A quick check of the flock manual
page says:

Locks created by flock() are associated with an open file table
entry. This means that duplicate file descriptors (created by, for
example, fork(2) or dup(2)) refer to the same lock, and this lock may
be modified or released using any of these descriptors. Furthermore,
the lock is released either by an explicit LOCK_UN operation on any of
these duplicate descriptors, or when all such descriptors have been
closed.

So for flock() there is no problem. As for the fcntl locking the person
mentioned, I haven't yet been able to find any description of the
behaviour of locks in the context of dup'd file descriptors.

If you know of a source of information which explains what happens for
fcntl locking and dup'd file descriptors, that would help clear things
up. In the meantime I'll keep looking for any information about it.
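
The behaviour the quoted advice describes would look like this in code
(illustrative only; per POSIX, fcntl() record locks are owned per
process, so closing any descriptor for the file releases the lock):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd1 = open("/tmp/lockdemo", O_RDWR | O_CREAT, 0600);
        struct flock fl = { 0 };
        fl.l_type = F_WRLCK;     /* exclusive write lock */
        fl.l_whence = SEEK_SET;  /* l_start/l_len 0 = whole file */

        fcntl(fd1, F_SETLK, &fl);

        int fd2 = dup(fd1);      /* e.g. the fd handed to a file bucket */
        close(fd2);              /* releases the fcntl() lock as well! */

        /* fd1 is still open, but the lock is gone: another process may
         * now lock and modify the file while we think we own it. */
        close(fd1);
        return 0;
    }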

Graham


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Joe Orton
On Mon, Feb 16, 2009 at 12:34:26PM +0100, Ruediger Pluem wrote:
 On 02/16/2009 11:07 AM, Joe Orton wrote:
  The call to:
  
  ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
  
  in that code path should result in the FILE bucket and the contained fd 
  being dup()ed.  (Though if that is failing, you wouldn't know because of 
  the lack of error checking)
 
 Are you sure?
 
 apr_file_setaside does not dup the fd.

Oh, good point.  This feels like one of those cases where some core 
assumptions about the filtering/bucket API start melting away underneath 
me.

Why is it invalid use of the filtering/buckets API to close the file 
after sending the FILE-containing brigade up the filter stack?  

It seems counter-intuitive to me that *anything* you do after the 
ap_pass_brigade() call should make a difference to what was passed up 
the filter stack.  But I suppose this is the case; even most of the 
memory-backed bucket types don't duplicate referenced memory in the 
setaside method.

I've always thought of FLUSH as a *hint* towards the desired network 
behaviour.  To say a FLUSH is required here is to model FLUSH as a 
*mandate* on the bucket/object lifetimes.  That's a huge difference.

Indeed, if I go read some API docs in the headers, which is always good 
for a laugh/cry moment:

apr_bucket_eos_make = All filters should flush at this point.

apr_bucket_flush_make = This indicates that filters should flush their
   data.  There is **no guarantee** that they will flush it, but this is 
   the best we can do.  (my emphasis)

So EOS is a better flush than FLUSH, right?

Regards, Joe


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Joe Orton
On Mon, Feb 16, 2009 at 10:52:15PM +1100, Graham Dumpleton wrote:
 2009/2/16 Joe Orton jor...@redhat.com:
  You say:
 
  For me this is an issue as the file descriptor has been supplied from
  a special object returned by a higher level application and it would
  be hard to maintain the file as open beyond the life of the request,
  up till end of keep alive or a subsequent request over same
  connection. Doing a dup on the file decriptor is also not necessarily
  an option.
 
  can you explain why a dup shouldn't work?
 
 I did used to perform a dup, but was told that this would cause
 problems with file locking. Specifically was told:

I'm getting lost here.  What has file locking got to do with it?  Does 
mod_wsgi rely on file locking somehow?

joe


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Ruediger Pluem


On 02/16/2009 02:13 PM, Joe Orton wrote:

 
 Why is it invalid use of the filtering/buckets API to close the file 
 after sending the FILE-containing brigade up the filter stack?  
 
 It seems counter-intuitive to me that *anything* you do after the 
 ap_pass_brigade() call should make a difference to what was passed up 
 the filter stack.  But I suppose this is the case; even most of the 
 memory-backed bucket types don't duplicate referenced memory in the 
 setaside method.

I guess this is not needed in most cases and is not done for performance
reasons. But maybe we can change the setaside for file buckets to do
an additional dup on the fd.
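
For illustration, a minimal sketch of what that could look like,
loosely modelled on APR's file bucket internals (not the real
file_bucket_setaside; error handling and mmap handling elided):

    static apr_status_t file_bucket_setaside_dup(apr_bucket *b,
                                                 apr_pool_t *pool)
    {
        apr_bucket_file *a = b->data;
        apr_file_t *dup_fd = NULL;
        apr_status_t rv;

        /* apr_file_dup() duplicates the OS descriptor into 'pool',
         * unlike apr_file_setaside(), which migrates the apr_file_t
         * without dup()ing the underlying fd. */
        rv = apr_file_dup(&dup_fd, a->fd, pool);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        a->fd = dup_fd;
        a->readpool = pool;
        return APR_SUCCESS;
    }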

 So EOS is a better flush than FLUSH, right?

IMHO no. EOS can cause data to remain in the chain (for optimizing the
pipeline case) whereas FLUSH in almost every case should really get
the buckets processed. Yeah, I know the words "almost every case" are bad,
but I know that some filters are an exception to the "almost every case" :-(.

Regards

Rüdiger





Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Graham Dumpleton
2009/2/17 Joe Orton jor...@redhat.com:
 On Mon, Feb 16, 2009 at 10:52:15PM +1100, Graham Dumpleton wrote:
 2009/2/16 Joe Orton jor...@redhat.com:
  You say:
 
  For me this is an issue as the file descriptor has been supplied from
  a special object returned by a higher level application and it would
  be hard to maintain the file as open beyond the life of the request,
  up till end of keep alive or a subsequent request over same
  connection. Doing a dup on the file descriptor is also not necessarily
  an option.
 
  can you explain why a dup shouldn't work?

 I did used to perform a dup, but was told that this would cause
 problems with file locking. Specifically was told:

 I'm getting lost here.  What has file locking got to do with it?  Does
 mod_wsgi rely on file locking somehow?

The mod_wsgi package is a gateway for hosting Python web applications.
It is a modern version of mod_python, but implements the generic/portable
Python WSGI interface rather than being Apache-specific. It is what all
the Python people ditching mod_python are moving to.

In WSGI applications, it is possible for the higher level Python web
application to pass back a file object reference for the response with
the intent that the WSGI adapter use any optimised methods available
for sending it back as response. This is where file buckets come into
the picture to begin with.

Now, this file object was created by the Python application and it is
still the owner of it. If it is a file that it had first been
modifying, and it needed exclusivity on that, it could well have used
file locks on it. Because locks are involved, the order in which the
file's contents are used for the response and the closure and unlocking
of the file are important.

It appears that fcntl locking on some platforms has the behaviour that
if a file descriptor is dup'd, closure of the first reference to the
file will cause release of the lock. That is, the lock is not released
only when the last reference to the file is closed. Problems can
therefore arise if you have to dup the file descriptor, because if
the dup'd file descriptor gets closed before the Python application has
finished with the file object, possibly involving it having to modify
the file after the contents are sent, something else could lock it and
start modifying the file before it was done.

In simple terms, the mod_wsgi module doesn't internally open the file
in the first place. Instead it comes from a higher level application
and one can't do things at the lower level that would change the state
of something like the locks associated with the file.

Graham


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-16 Thread Mladen Turk

Graham Dumpleton wrote:

2009/2/17 Joe Orton jor...@redhat.com:

I did used to perform a dup, but was told that this would cause
problems with file locking. Specifically was told:



I'm getting lost here.  What has file locking got to do with it?  Does
mod_wsgi rely on file locking somehow?




I'm lost as well :)



In WSGI applications, it is possible for the higher level Python web
application to pass back a file object reference for the response with
the intent that the WSGI adapter use any optimised methods available
for sending it back as response. This is where file buckets come into
the picture to begin with.



Now it looks like you are trying to intermix third-party-maintained
native OS file descriptors and file buckets.
You can create the apr_file_t from apr_os_file_t
(think you'll have platform portability issues there),
but the major problem would be to ensure the life cycle
of the object, since Python has its own GC and httpd has
its pools.
IMHO you will need a new apr_bucket provider written in
Python and C for something like that.


Regards
--
^(TM)


Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-14 Thread Ruediger Pluem


On 02/14/2009 12:25 AM, Graham Dumpleton wrote:

 
 I know that I can circumvent the EOS optimisation by inserting a flush
 bucket, but based on documentation it isn't guaranteed that a flush
 bucket will always propagate down the filter chain and actually push
 out data.

IMHO using a flush bucket would be the way to go here. This should work
fine.

Regards

Rüdiger



Problems with EOS optimisation in ap_core_output_filter() and file buckets.

2009-02-13 Thread Graham Dumpleton
In ap_core_output_filter() there exists the code starting with:

/* Completed iterating over the brigade, now determine if we want
 * to buffer the brigade or send the brigade out on the network.
 *
 * Save if we haven't accumulated enough bytes to send, the connection
 * is not about to be closed, and:
 *
 *   1) we didn't see a file, we don't have more passes over the
 *  brigade to perform,  AND we didn't stop at a FLUSH bucket.
 *  (IOW, we will save plain old bytes such as HTTP headers)
 * or
 *   2) we hit the EOS and have a keep-alive connection
 *  (IOW, this response is a bit more complex, but we save it
 *   with the hope of concatenating with another response)
 */
if (nbytes + flen < AP_MIN_BYTES_TO_WRITE
    && !AP_BUCKET_IS_EOC(last_e)
    && ((!fd && !more && !APR_BUCKET_IS_FLUSH(last_e))
        || (APR_BUCKET_IS_EOS(last_e)
            && c->keepalive == AP_CONN_KEEPALIVE))) {

This is some sort of optimisation which, in the case of a keep-alive
connection, will hold over sending data out on the connection until later
if an EOS is present and the amount of data is less than the nominal
minimum bytes to send.

Later in this section of code it has:

/* Do a read on each bucket to pull in the
 * data from pipe and socket buckets, so
 * that we don't leave their file descriptors
 * open indefinitely.  Do the same for file
 * buckets, with one exception: allow the
 * first file bucket in the brigade to remain
 * a file bucket, so that we don't end up
 * doing an mmap+memcpy every time a client
 * requests a <8KB file over a keepalive
 * connection.
 */
if (APR_BUCKET_IS_FILE(bucket) && !file_bucket_saved) {
    file_bucket_saved = 1;
}
else {
    const char *buf;
    apr_size_t len = 0;
    rv = apr_bucket_read(bucket, &buf, &len,
                         APR_BLOCK_READ);
    if (rv != APR_SUCCESS) {
        ap_log_cerror(APLOG_MARK, APLOG_ERR, rv,
                      c, "core_output_filter:"
                      " Error reading from bucket.");
        return HTTP_INTERNAL_SERVER_ERROR;
    }
}

What the end result of the code is, is that if you have a file bucket
getting this far where the length of the file is less than 8000 and an EOS
follows it, then the actual file bucket is held over rather than its data
being read and buffered. This, as commented, is to avoid doing an
mmap+memcpy. What it means, though, is that the file descriptor within
the file bucket must be maintained and cannot be closed as soon as
ap_pass_brigade() has been called.
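
To make the failure concrete, the pattern looks something like this
sketch (hypothetical names; not mod_wsgi's actual code):

    bb = apr_brigade_create(r->pool, c->bucket_alloc);
    apr_brigade_insert_file(bb, file, 0, len, r->pool);
    APR_BRIGADE_INSERT_TAIL(bb, apr_bucket_eos_create(c->bucket_alloc));
    ap_pass_brigade(r->output_filters, bb);
    apr_file_close(file);  /* too early: with the EOS optimisation the
                            * core filter may still hold the FILE bucket
                            * for a deferred write */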

For me this is an issue, as the file descriptor has been supplied from
a special object returned by a higher level application, and it would
be hard to maintain the file as open beyond the life of the request,
up till the end of keep-alive or a subsequent request over the same
connection. Doing a dup on the file descriptor is also not necessarily
an option.

The end result then is that later, when the file bucket is processed, the
file descriptor has already been closed and one gets the error:

  (9)Bad file descriptor: core_output_filter: writing data to the network

I know that I can circumvent the EOS optimisation by inserting a flush
bucket (see the sketch after the doc comment below), but based on the
documentation it isn't guaranteed that a flush bucket will always
propagate down the filter chain and actually push out data.

/**
 * Create a flush bucket.  This indicates that filters should flush their
 * data.  There is no guarantee that they will flush it, but this is the
 * best we can do.
 * @param list The freelist from which this bucket should be allocated
 * @return The new bucket, or NULL if allocation failed
 */
APU_DECLARE(apr_bucket *) apr_bucket_flush_create(apr_bucket_alloc_t *list);
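
The circumvention amounts to something like this (variable names
assumed from typical handler code):

    APR_BRIGADE_INSERT_TAIL(bb, apr_bucket_flush_create(c->bucket_alloc));
    APR_BRIGADE_INSERT_TAIL(bb, apr_bucket_eos_create(c->bucket_alloc));
    rv = ap_pass_brigade(r->output_filters, bb);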

How can one guarantee that the file bucket will actually be flushed
and not held over by a filter?

If it gets to the core output filter, another way to avoid the EOS
optimisation is to forcibly set keepalive to AP_CONN_CLOSE, which for
the application concerned is probably reasonable, but is obviously a
bit of a hack.
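
(In code, that hack is a single assignment before passing the brigade,
e.g. r->connection->keepalive = AP_CONN_CLOSE; from a handler.)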

Finally, my problem only arises because I insert an EOS after the file
bucket and before calling ap_pass_brigade(). If one uses ap_send_fd(),
it doesn't insert an EOS before calling ap_pass_brigade(), with something
else obviously inserting the EOS later. If ap_pass_brigade() is called
without an EOS the first time and only later with an EOS, will that help
in ensuring it is flushed out, or can