Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On Thu, Feb 19, 2009 at 10:00:50PM +0100, Ruediger Pluem wrote:
> On 02/19/2009 12:32 PM, Joe Orton wrote:
> ...
>> @@ -497,13 +500,17 @@
>>                  next = APR_BUCKET_NEXT(bucket);
>>              }
>>              bytes_in_brigade += bucket->length;
>> -            if (!APR_BUCKET_IS_FILE(bucket)) {
>> +            if (APR_BUCKET_IS_FILE(bucket)) {
>> +                num_files_in_brigade++;
>> +            }
>> +            else {
>>                  non_file_bytes_in_brigade += bucket->length;
>>              }
>>          }
>>      }
>>
>> -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
>> +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
>> +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
>
> If the 16 FDs were split over more than one brigade, and the brigades
> before us were set aside, the FDs already belong to the wrong pool (the
> connection pool). Deleting a file bucket doesn't close the FD it uses.

Not sure what the concern is there - this loop is iterating over the
concatenation of the buffered brigade and the new brigade (right?), so it
will count the total number of file buckets which are potentially left
buffered after this c_o_f invocation terminates.

w.r.t. MMAP buckets: the 64K bytes limit will apply here, since FILE only
morphs into MMAP if the file size is > 8K. (And no, theoretically, the fd
from which an MMAP bucket was derived is not needed after the mmap()
call, but I don't think the fds actually get closed earlier than pool
closure, normally.)

w.r.t. multiple FILE buckets for a single fd: failing to buffer as much
as possible would not be the end of the world, but I suppose that case is
common, so we could cope with it something like this.  (Completely
untested.)

Index: server/core_filters.c
===================================================================
--- server/core_filters.c	(revision 734690)
+++ server/core_filters.c	(working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,8 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
+    apr_file_t *last_file_seen;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,9 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+    last_file_seen = NULL;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +502,20 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket)
+                && (last_file_seen == NULL
+                    || last_file_seen != ((apr_bucket_file *)bucket->data)->fd)) {
+                num_files_in_brigade++;
+                last_file_seen = ((apr_bucket_file *)bucket->data)->fd;
+            }
+            else {
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */
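For reference, the fd comparison in the patch above reaches through the
bucket's shared apr_bucket_file structure (abbreviated here from
apr_buckets.h, comments trimmed); buckets produced by splitting one FILE
bucket share this structure via the refcount, which is why the pointer
comparison works:

    struct apr_bucket_file {
        apr_bucket_refcount refcount;  /* number of buckets sharing this */
        apr_file_t *fd;                /* the file this bucket refers to */
        apr_pool_t *readpool;          /* pool for structures created on
                                        * read from this bucket */
    #if APR_HAS_MMAP
        int can_mmap;                  /* whether reads may morph to MMAP */
    #endif
    };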
Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
> On 02/18/2009 11:16 AM, Joe Orton wrote:
> There is still a nasty issue with the trunk code that can cause you to
> run out of FDs, as the new non-blocking core output filter has some
> trouble setting aside the file buckets to the correct pool (it puts
> them in the connection pool, which is too long-living and bad). See the
> thread starting at
> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e
> I admit that I lost momentum on this, but this definitely needs to be
> fixed.

Yes, that looks broken.  Something like the patch below should fix it,
though I'm not sure it's really a good idea to be allowing a pathological
case of 16 fds consumed per client.  It might present some interesting
tuning challenges; hitting fd limits before hitting num(client) limits.

Index: core_filters.c
===================================================================
--- core_filters.c	(revision 734690)
+++ core_filters.c	(working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,7 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +467,8 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +500,17 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket)) {
+                num_files_in_brigade++;
+            }
+            else {
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */
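To put rough numbers on that tuning concern (illustrative figures, not
from the thread): with the common default limit of 1024 file descriptors
per process, the pathological case would hit the limit with only

    1024 fds / 16 set-aside fds per connection = 64 keepalive connections

well before a typical MaxClients setting is anywhere near reached, and
before counting listeners, log files, and other descriptors the server
already holds.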
Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On 02/19/2009 12:32 PM, Joe Orton wrote:
> On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
>> On 02/18/2009 11:16 AM, Joe Orton wrote:
>> There is still a nasty issue with the trunk code that can cause you to
>> run out of FDs, as the new non-blocking core output filter has some
>> trouble setting aside the file buckets to the correct pool (it puts
>> them in the connection pool, which is too long-living and bad). See
>> the thread starting at
>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e
>> I admit that I lost momentum on this, but this definitely needs to be
>> fixed.
>
> Yes, that looks broken.  Something like the patch below should fix it,
> though I'm not sure it's really a good idea to be allowing a
> pathological case of 16 fds consumed per client.  It might present some
> interesting tuning challenges; hitting fd limits before hitting
> num(client) limits.

Thanks for the patch, and sorry for picking on the details :-)

> Index: core_filters.c
> ===================================================================
> --- core_filters.c	(revision 734690)
> +++ core_filters.c	(working copy)
> @@ -367,6 +367,7 @@
>  
>  #define THRESHOLD_MIN_WRITE 4096
>  #define THRESHOLD_MAX_BUFFER 65536
> +#define THRESHOLD_MAX_FILES 16
>  
>  /* Optional function coming from mod_logio, used for logging of output
>   * traffic
> @@ -380,7 +381,7 @@
>      core_output_filter_ctx_t *ctx = net->out_ctx;
>      apr_bucket_brigade *bb;
>      apr_bucket *bucket, *next;
> -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
> +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
>  
>      /* Fail quickly if the connection has already been aborted. */
>      if (c->aborted) {
> @@ -466,6 +467,8 @@
>  
>      bytes_in_brigade = 0;
>      non_file_bytes_in_brigade = 0;
> +    num_files_in_brigade = 0;
> +
>      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
>           bucket = next) {
>          next = APR_BUCKET_NEXT(bucket);
> @@ -497,13 +500,17 @@
>                  next = APR_BUCKET_NEXT(bucket);
>              }
>              bytes_in_brigade += bucket->length;
> -            if (!APR_BUCKET_IS_FILE(bucket)) {
> +            if (APR_BUCKET_IS_FILE(bucket)) {
> +                num_files_in_brigade++;

Hm, how do you know that the FD in this bucket is different from the one
in the last file bucket(s)? It may be possible that we have file buckets
in the brigade that represent different parts of the same file, as a
filter might have put something in between them in the form of heap
buckets or something like that. Furthermore, what about MMAP buckets?
Don't they have an open FD as well?

Regards

Rüdiger
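As a concrete illustration of the scenario described above (hypothetical
filter code, not from the thread; error checking elided): after a split,
both halves are FILE buckets sharing one apr_bucket_file and hence one
descriptor, with an unrelated bucket sitting between them.

    /* 'bb' holds a single FILE bucket; split it at byte 4096.  Both
     * halves now share the same apr_bucket_file / apr_file_t. */
    apr_bucket *file_b = APR_BRIGADE_FIRST(bb);
    apr_bucket_split(file_b, 4096);
    apr_bucket *second_half = APR_BUCKET_NEXT(file_b);

    /* Insert a heap bucket between the halves, as an SSI-style filter
     * might: the brigade now carries two FILE buckets for one fd. */
    apr_bucket *heap_b = apr_bucket_heap_create("<!-- inserted -->", 17,
                                                NULL, bb->bucket_alloc);
    APR_BUCKET_INSERT_BEFORE(second_half, heap_b);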
Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On 02/19/2009 12:32 PM, Joe Orton wrote:
> On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
>> On 02/18/2009 11:16 AM, Joe Orton wrote:
>> There is still a nasty issue with the trunk code that can cause you to
>> run out of FDs, as the new non-blocking core output filter has some
>> trouble setting aside the file buckets to the correct pool (it puts
>> them in the connection pool, which is too long-living and bad). See
>> the thread starting at
>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e
>> I admit that I lost momentum on this, but this definitely needs to be
>> fixed.
>
> Yes, that looks broken.  Something like the patch below should fix it,
> though I'm not sure it's really a good idea to be allowing a
> pathological case of 16 fds consumed per client.  It might present some
> interesting tuning challenges; hitting fd limits before hitting
> num(client) limits.
>
> Index: core_filters.c
> ===================================================================
> --- core_filters.c	(revision 734690)
> +++ core_filters.c	(working copy)
> @@ -367,6 +367,7 @@
>  
>  #define THRESHOLD_MIN_WRITE 4096
>  #define THRESHOLD_MAX_BUFFER 65536
> +#define THRESHOLD_MAX_FILES 16
>  
>  /* Optional function coming from mod_logio, used for logging of output
>   * traffic
> @@ -380,7 +381,7 @@
>      core_output_filter_ctx_t *ctx = net->out_ctx;
>      apr_bucket_brigade *bb;
>      apr_bucket *bucket, *next;
> -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
> +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
>  
>      /* Fail quickly if the connection has already been aborted. */
>      if (c->aborted) {
> @@ -466,6 +467,8 @@
>  
>      bytes_in_brigade = 0;
>      non_file_bytes_in_brigade = 0;
> +    num_files_in_brigade = 0;
> +
>      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
>           bucket = next) {
>          next = APR_BUCKET_NEXT(bucket);
> @@ -497,13 +500,17 @@
>                  next = APR_BUCKET_NEXT(bucket);
>              }
>              bytes_in_brigade += bucket->length;
> -            if (!APR_BUCKET_IS_FILE(bucket)) {
> +            if (APR_BUCKET_IS_FILE(bucket)) {
> +                num_files_in_brigade++;
> +            }
> +            else {
>                  non_file_bytes_in_brigade += bucket->length;
>              }
>          }
>      }
>  
> -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
> +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
> +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {

If the 16 FDs were split over more than one brigade, and the brigades
before us were set aside, the FDs already belong to the wrong pool (the
connection pool). Deleting a file bucket doesn't close the FD it uses.

Regards

Rüdiger
FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
> On 02/16/2009 02:13 PM, Joe Orton wrote:
>> Why is it invalid use of the filtering/buckets API to close the file
>> after sending the FILE-containing brigade up the filter stack?
>> It seems counter-intuitive to me that *anything* you do after the
>> ap_pass_brigade() call should make a difference to what was passed up
>> the filter stack.  But I suppose this is the case; even most of the
>> memory-backed bucket types don't duplicate referenced memory in the
>> setaside method.
>
> I guess this is not needed in most cases and is not done for
> performance reasons. But maybe we can change the setaside for file
> buckets to do an additional dup on the fd.

The choices seem to be:

1) declare the FLUSH to be necessary in this case as you said originally
   ;) - and document appropriately

2) rewrite/reinvent all the setaside methods to dup/copy as appropriate

3) declare the core output filter broken and fix it

Of these, (2) and (3) will presumably have some negative performance
impact, though I'm not sure to what degree.  Not sure (2) is even going
to be possible, e.g. with MMAP buckets; scary stuff.

Looking at the cray-zee non-blocking core output filter on the trunk, it
will set aside with wild abandon unless it hits a FLUSH.  I think all
these options are pretty distasteful, but (1) seems like the only
feasible route.  What do you think?

Would be good to get some more opinions on this.

Regards, Joe
Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)
On 02/18/2009 11:16 AM, Joe Orton wrote:
> On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
>> On 02/16/2009 02:13 PM, Joe Orton wrote:
>>> Why is it invalid use of the filtering/buckets API to close the file
>>> after sending the FILE-containing brigade up the filter stack?
>>> It seems counter-intuitive to me that *anything* you do after the
>>> ap_pass_brigade() call should make a difference to what was passed up
>>> the filter stack.  But I suppose this is the case; even most of the
>>> memory-backed bucket types don't duplicate referenced memory in the
>>> setaside method.
>>
>> I guess this is not needed in most cases and is not done for
>> performance reasons. But maybe we can change the setaside for file
>> buckets to do an additional dup on the fd.
>
> The choices seem to be:
>
> 1) declare the FLUSH to be necessary in this case as you said
>    originally ;) - and document appropriately
>
> 2) rewrite/reinvent all the setaside methods to dup/copy as appropriate
>
> 3) declare the core output filter broken and fix it
>
> Of these, (2) and (3) will presumably have some negative performance
> impact, though I'm not sure to what degree.  Not sure (2) is even going
> to be possible, e.g. with MMAP buckets; scary stuff.

Yes, I don't know how MMAP would work with this, so dup might be a bad
approach in that regard. In any case it looks like it would cause a lot
of trouble until it was working stable again :-)

> Looking at the cray-zee non-blocking core output filter on the trunk,
> it will set aside with wild abandon unless it hits a FLUSH.  I think
> all

There is still a nasty issue with the trunk code that can cause you to
run out of FDs, as the new non-blocking core output filter has some
trouble setting aside the file buckets to the correct pool (it puts them
in the connection pool, which is too long-living and bad). See the
thread starting at
http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495ff1b9.9070...@kippdata.de%3e

I admit that I lost momentum on this, but this definitely needs to be
fixed.

> these options are pretty distasteful, but (1) seems like the only
> feasible route.  What do you think?

I guess at least for the moment (1) is the way to go, exactly for the
reasons you stated.

Regards

Rüdiger
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
Graham Dumpleton wrote:
> 2009/2/17 Mladen Turk <mt...@apache.org>:
>> Graham Dumpleton wrote:
>>> 2009/2/17 Joe Orton <jor...@redhat.com>:
>>>>> I did used to perform a dup, but was told that this would cause
>>>>> problems with file locking. Specifically was told:
>>>>
>>>> I'm getting lost here.  What has file locking got to do with it?
>>>> Does mod_wscgi rely on file locking somehow?
>>
>> I'm lost as well :)
>
> Consider:
>
>     fd1 = ...
>     lock(fd1)
>     fd2 = dup(fd1)
>     close(fd2)   # will release the lock under some lock APIs, even
>                  # though fd2 is not the last reference to the
>                  # underlying file object
>     write(fd1)   # the lock has already been released, so there is no
>                  # guarantee we are the only writer
>     close(fd1)
>
> At least that is how I understand it from what is being explained to me
> and pointed out in various documentation.
>
> So, if fd2 is the file descriptor created for the file bucket in
> Apache, and it gets closed before the application later wants to write
> to the file through fd1, then the application has lost the exclusive
> ownership it acquired by way of the lock, and something else could have
> acquired the lock and started modifying the file on the basis that it
> has exclusive ownership at that time.

Well, like I said, that won't work, and neither is it portable (e.g.
apr_os_file_t is a HANDLE on win32). What you would need is code that
takes the Python object and invokes the Python file API to feed the
apr_bucket (basically writing an apr_bucket_python_file). However, the
simplest thing might be an intermediate temp file, in which case httpd
could reference the file name, not the file object itself. Not sure how
that would work with a dynamic file, since apr and python might use
different platform locking mechanisms.

Regards
--
^(TM)
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
2009/2/17 Mladen Turk <mt...@apache.org>:
> Graham Dumpleton wrote:
>> 2009/2/17 Joe Orton <jor...@redhat.com>:
>>>> I did used to perform a dup, but was told that this would cause
>>>> problems with file locking. Specifically was told:
>>>
>>> I'm getting lost here.  What has file locking got to do with it?
>>> Does mod_wscgi rely on file locking somehow?
>
> I'm lost as well :)

Consider:

    fd1 = ...
    lock(fd1)
    fd2 = dup(fd1)
    close(fd2)   # will release the lock under some lock APIs, even
                 # though fd2 is not the last reference to the
                 # underlying file object
    write(fd1)   # the lock has already been released, so there is no
                 # guarantee we are the only writer
    close(fd1)

At least that is how I understand it from what is being explained to me
and pointed out in various documentation.

So, if fd2 is the file descriptor created for the file bucket in Apache,
and it gets closed before the application later wants to write to the
file through fd1, then the application has lost the exclusive ownership
it acquired by way of the lock, and something else could have acquired
the lock and started modifying the file on the basis that it has
exclusive ownership at that time.

>> In WSGI applications, it is possible for the higher level Python web
>> application to pass back a file object reference for the response,
>> with the intent that the WSGI adapter use any optimised methods
>> available for sending it back as the response. This is where file
>> buckets come into the picture to begin with.
>
> Now it looks like you are trying to intermix third-party-maintained
> native OS file descriptors and file buckets. You can create the
> apr_file_t from an apr_os_file_t

Which is what it does. Simplified code below:

    apr_os_file_t fd = -1;
    apr_file_t *tmpfile = NULL;

    fd = PyObject_AsFileDescriptor(filelike);
    apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, self->r->pool);

> (Think you'll have platform portability issues there)

The optimisation is only supported on UNIX systems.

> but the major problem would be to ensure the life cycle of the object,
> since Python has its own GC and httpd has its pool. IMHO you will need
> a new apr_bucket provider written in Python and C for something like
> that.

CPython uses reference counting. What is referred to as GC in Python is
actually just a mechanism that kicks in under certain circumstances to
break cycles between reference-counted objects.

Having a special bucket type which holds a reference to the Python file
object will not help anyway. This is because the close() method of the
Python file object can be called prior to the file bucket being
destroyed. This closing of the Python file object would occur before the
delayed write of the file bucket resulting from the EOS optimisation.
So, same problem as when using a naked file descriptor.

Also, using a special bucket type opens another can of worms. This is
because multiple interpreters are supported, as well as multithreading.
Thus it would be necessary to track the named interpreter in use within
the bucket, and to reacquire the lock on the interpreter being used and
ensure the thread state is correctly reinstated. Although possible to
do, it gets a bit messy.

Holding onto the file descriptor to allow the optimisation isn't really
desirable for other reasons as well. This is because the WSGI
specification effectively requires the response content to have been
flushed out to the client before the final call back into the
application to clean up things. In that final call back into the
application, where it performs cleanup and closes stuff like files, it
could technically rewrite the content of the file. If Apache has not
finished writing out the contents of the file, presuming the Python file
object hadn't been closed, then Apache would end up writing different
content to what was expected, and possibly truncated content if the file
were resized.

Summary: you need to have a way of knowing that when you flush
something, it really has been flushed and Apache is all done with it.

Graham
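A self-contained sketch of the flow Graham's simplified code is part of
(a hypothetical helper, not mod_wsgi source; assumes httpd/APR headers
and Python.h, and that the caller keeps the Python object open until
Apache is known to be done with the descriptor):

    /* Wrap a descriptor borrowed from a Python file-like object in a
     * FILE bucket and send it down the filter chain. */
    static apr_status_t send_filelike(request_rec *r, PyObject *filelike,
                                      apr_off_t length)
    {
        apr_os_file_t fd;
        apr_file_t *tmpfile = NULL;
        apr_bucket_brigade *bb;

        fd = PyObject_AsFileDescriptor(filelike);
        if (fd == -1) {
            return APR_EBADF;    /* not backed by a real descriptor */
        }
        apr_os_file_put(&tmpfile, &fd, APR_SENDFILE_ENABLED, r->pool);

        bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
        apr_brigade_insert_file(bb, tmpfile, 0, length, r->pool);
        APR_BRIGADE_INSERT_TAIL(bb,
            apr_bucket_eos_create(r->connection->bucket_alloc));

        /* The crux of this thread: after this call the core output
         * filter may still be holding the FILE bucket (EOS
         * optimisation), so the fd must not be closed yet. */
        return ap_pass_brigade(r->output_filters, bb);
    }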
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
2009/2/17 Mladen Turk <mt...@apache.org>:
> Graham Dumpleton wrote:
>> [the dup/locking example quoted above trimmed]
>
> Well, like I said, that won't work, and neither is it portable (e.g.
> apr_os_file_t is a HANDLE on win32)

I already said I only support the optimisation on UNIX. I don't care
about Windows.

> What you would need is code that takes the Python object and invokes
> the Python file API to feed the apr_bucket (basically writing an
> apr_bucket_python_file).

As I already tried to explain, even for the case of the bucket being
used to hold a reference to the Python object, that will not work,
because of the guarantees that WSGI applications require regarding data
needing to be flushed.

> However, the simplest thing might be an intermediate temp file, in
> which case httpd could reference the file name, not the file object
> itself.

Which would likely be slower than using the existing fallback streaming
mechanism, which reads the file into memory in blocks and pushes them
through as transient buckets.

> Not sure how that would work with a dynamic file, since apr and python
> might use different platform locking mechanisms.

Python uses operating system locking mechanisms, just like the APR
library would.

Graham
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
> ...
> What the end result of the code is, is that if you have a file bucket
> getting this far where the length of the file is less than 8000 and an
> EOS follows it, then the actual file bucket is held over rather than
> the data being read and buffered. As commented, this is to avoid doing
> an mmap+memcpy. What it means though is that the file descriptor within
> the file bucket must be maintained and cannot be closed as soon as
> ap_pass_brigade() has been called.

The call to:

    ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);

in that code path should result in the FILE bucket and the contained fd
being dup()ed.  (Though if that is failing, you wouldn't know, because of
the lack of error checking.)

You say:

> For me this is an issue as the file descriptor has been supplied from a
> special object returned by a higher level application, and it would be
> hard to maintain the file as open beyond the life of the request, up
> till the end of keep-alive or a subsequent request over the same
> connection. Doing a dup on the file descriptor is also not necessarily
> an option.

Can you explain why a dup shouldn't work?

Regards, Joe
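A sketch of the missing error checking Joe alludes to (hedged; assumes
the variables f, ctx, b, and c in scope in the 2.2-era core output
filter):

    apr_status_t rv;

    /* Check the result of setting the brigade aside instead of
     * silently ignoring it. */
    rv = ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
    if (rv != APR_SUCCESS) {
        ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
                      "core_output_filter: error setting aside brigade");
        return rv;
    }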
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On 02/16/2009 11:07 AM, Joe Orton wrote:
> On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
>> ...
>> What the end result of the code is, is that if you have a file bucket
>> getting this far where the length of the file is less than 8000 and an
>> EOS follows it, then the actual file bucket is held over rather than
>> the data being read and buffered. As commented, this is to avoid doing
>> an mmap+memcpy. What it means though is that the file descriptor
>> within the file bucket must be maintained and cannot be closed as soon
>> as ap_pass_brigade() has been called.
>
> The call to:
>
>     ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
>
> in that code path should result in the FILE bucket and the contained fd
> being dup()ed.  (Though if that is failing, you wouldn't know, because
> of the lack of error checking.)

Are you sure? apr_file_setaside does not dup the fd.

Regards

Rüdiger
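For contrast, a hedged sketch of the two APR calls at issue (fd is an
open apr_file_t* and c->pool the longer-lived connection pool, assumed
for illustration):

    apr_file_t *kept = NULL;

    /* What the FILE bucket's setaside actually uses: reparents the same
     * apr_file_t (and the same OS descriptor) into the new pool. */
    apr_file_setaside(&kept, fd, c->pool);

    /* What a dup()ing setaside would do instead: a second, independent
     * OS descriptor whose lifetime is tied to the new pool. */
    apr_file_dup(&kept, fd, c->pool);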
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
2009/2/16 Joe Orton <jor...@redhat.com>:
> On Sat, Feb 14, 2009 at 10:25:08AM +1100, Graham Dumpleton wrote:
>> ...
>> What the end result of the code is, is that if you have a file bucket
>> getting this far where the length of the file is less than 8000 and an
>> EOS follows it, then the actual file bucket is held over rather than
>> the data being read and buffered. As commented, this is to avoid doing
>> an mmap+memcpy. What it means though is that the file descriptor
>> within the file bucket must be maintained and cannot be closed as soon
>> as ap_pass_brigade() has been called.
>
> The call to:
>
>     ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
>
> in that code path should result in the FILE bucket and the contained fd
> being dup()ed.  (Though if that is failing, you wouldn't know, because
> of the lack of error checking.)
>
> You say:
>
>> For me this is an issue as the file descriptor has been supplied from
>> a special object returned by a higher level application, and it would
>> be hard to maintain the file as open beyond the life of the request,
>> up till the end of keep-alive or a subsequent request over the same
>> connection. Doing a dup on the file descriptor is also not necessarily
>> an option.
>
> Can you explain why a dup shouldn't work?

I did used to perform a dup, but was told that this would cause problems
with file locking. Specifically I was told:

    I am not sure, but it looks like mod_wsgi duplicates the original
    file descriptor, sends the file's data, closes the duplicated file
    descriptor, and then calls the iterable's close method. That may be
    problematic for applications that are using fcntl-based locking; the
    closing of the duplicate file descriptor will release the lock
    before the iterable's close method gets a chance to execute, so the
    close method can no longer depend on the file remaining locked.
    Ideally, the duplicate file descriptor shouldn't be closed until
    *after* the iterable's close() method has been called.

I never researched it properly as I didn't have time at that point, so I
simply believed what I was being told. A quick check of the flock manual
page says:

    Locks created by flock() are associated with an open file table
    entry. This means that duplicate file descriptors (created by, for
    example, fork(2) or dup(2)) refer to the same lock, and this lock
    may be modified or released using any of these descriptors.
    Furthermore, the lock is released either by an explicit LOCK_UN
    operation on any of these duplicate descriptors, or when all such
    descriptors have been closed.

So for flock() there is no problem. As to fcntl locking, which the
person mentioned, I haven't yet been able to find any description of the
behaviour of locks in the context of dup'd file descriptors. If you know
of a source of information which explains what happens for fcntl locking
and dup'd file descriptors, that would help clear things up. In the mean
time I'll keep looking for any information about it.

Graham
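For what it's worth, POSIX does specify the fcntl() behaviour the quoted
advice worries about: a process's record locks on a file are released
when *any* descriptor for that file is closed by that process. A minimal
demonstration (illustrative, not from the thread):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd1 = open("data", O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 0 };

        fcntl(fd1, F_SETLK, &fl);  /* write-lock the whole file via fd1 */

        int fd2 = dup(fd1);
        close(fd2);                /* releases the fcntl lock: POSIX
                                    * drops all of this process's record
                                    * locks on the file when *any*
                                    * descriptor for it is closed */

        /* fd1 is still open, but the file is no longer locked. */
        close(fd1);
        return 0;
    }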
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On Mon, Feb 16, 2009 at 12:34:26PM +0100, Ruediger Pluem wrote:
> On 02/16/2009 11:07 AM, Joe Orton wrote:
>> The call to:
>>
>>     ap_save_brigade(f, &ctx->b, &b, ctx->deferred_write_pool);
>>
>> in that code path should result in the FILE bucket and the contained
>> fd being dup()ed.  (Though if that is failing, you wouldn't know,
>> because of the lack of error checking.)
>
> Are you sure? apr_file_setaside does not dup the fd.

Oh, good point.  This feels like one of those cases where some core
assumptions about the filtering/bucket API start melting away underneath
me.

Why is it invalid use of the filtering/buckets API to close the file
after sending the FILE-containing brigade up the filter stack?

It seems counter-intuitive to me that *anything* you do after the
ap_pass_brigade() call should make a difference to what was passed up the
filter stack.  But I suppose this is the case; even most of the
memory-backed bucket types don't duplicate referenced memory in the
setaside method.

I've always thought of FLUSH as a *hint* towards the desired network
behaviour.  To say a FLUSH is required here is to model FLUSH as a
*mandate* on the bucket/object lifetimes.  That's a huge difference.

Indeed, if I go read some API docs in the headers, which is always good
for a laugh/cry moment:

    apr_bucket_eos_make: "All filters should flush at this point."

    apr_bucket_flush_make: "This indicates that filters should flush
    their data.  There is **no guarantee** that they will flush it, but
    this is the best we can do."

(my emphasis)

So EOS is a better flush than FLUSH, right?

Regards, Joe
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On Mon, Feb 16, 2009 at 10:52:15PM +1100, Graham Dumpleton wrote:
> 2009/2/16 Joe Orton <jor...@redhat.com>:
>> You say:
>>
>>> For me this is an issue as the file descriptor has been supplied from
>>> a special object returned by a higher level application, and it would
>>> be hard to maintain the file as open beyond the life of the request,
>>> up till the end of keep-alive or a subsequent request over the same
>>> connection. Doing a dup on the file descriptor is also not
>>> necessarily an option.
>>
>> Can you explain why a dup shouldn't work?
>
> I did used to perform a dup, but was told that this would cause
> problems with file locking. Specifically I was told:

I'm getting lost here.  What has file locking got to do with it?  Does
mod_wscgi rely on file locking somehow?

joe
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On 02/16/2009 02:13 PM, Joe Orton wrote:
> Why is it invalid use of the filtering/buckets API to close the file
> after sending the FILE-containing brigade up the filter stack?
>
> It seems counter-intuitive to me that *anything* you do after the
> ap_pass_brigade() call should make a difference to what was passed up
> the filter stack.  But I suppose this is the case; even most of the
> memory-backed bucket types don't duplicate referenced memory in the
> setaside method.

I guess this is not needed in most cases and is not done for performance
reasons. But maybe we can change the setaside for file buckets to do an
additional dup on the fd.

> So EOS is a better flush than FLUSH, right?

IMHO no. EOS can cause data to remain in the chain (for optimizing the
pipeline case), whereas FLUSH in almost every case should really get the
buckets processed. Yeah, I know the words "almost every case" are bad,
but I know that some filters are an exception from the "almost every
case" :-(.

Regards

Rüdiger
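A rough sketch of the "additional dup" idea (a hypothetical variant of
the FILE bucket's setaside method, not a tested patch; the real method
in apr_buckets_file.c uses apr_file_setaside, which reparents rather
than dups):

    static apr_status_t file_bucket_setaside_dup(apr_bucket *b,
                                                 apr_pool_t *pool)
    {
        apr_bucket_file *a = b->data;
        apr_file_t *fd = NULL;
        apr_status_t rv;

        /* A second OS descriptor owned by the longer-lived pool: the
         * caller may now close its own fd without invalidating the
         * bucket... */
        rv = apr_file_dup(&fd, a->fd, pool);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        a->fd = fd;
        a->readpool = pool;
        return APR_SUCCESS;
        /* ...at the cost of one dup() per setaside, plus the
         * fcntl-locking caveats discussed elsewhere in this thread. */
    }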
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
2009/2/17 Joe Orton <jor...@redhat.com>:
> On Mon, Feb 16, 2009 at 10:52:15PM +1100, Graham Dumpleton wrote:
>> 2009/2/16 Joe Orton <jor...@redhat.com>:
>>> [...] can you explain why a dup shouldn't work?
>>
>> I did used to perform a dup, but was told that this would cause
>> problems with file locking. Specifically I was told:
>
> I'm getting lost here.  What has file locking got to do with it?  Does
> mod_wscgi rely on file locking somehow?

The mod_wsgi package is a gateway for hosting Python web applications: a
modern version of mod_python, but implementing the generic/portable
Python WSGI interface rather than being Apache-specific. It is what all
the Python people ditching mod_python are moving to.

In WSGI applications, it is possible for the higher level Python web
application to pass back a file object reference for the response, with
the intent that the WSGI adapter use any optimised methods available for
sending it back as the response. This is where file buckets come into
the picture to begin with.

Now, this file object was created by the Python application, and it is
still the owner of it. If it is a file that it had first been modifying,
and it needed exclusivity on that, it could well have used file locks on
it. Because locks are involved, the order in which the file's contents
are used for the response, and the closure and unlocking of the file,
are important.

It appears that fcntl locking on some platforms has the behaviour that
if a file descriptor is dup'd, closure of the first reference to the
file will cause release of the lock. That is, the lock will not be
released only when the last reference to the file is closed. Problems
can therefore arise if you have to dup the file descriptor, because if
the dup'd file descriptor gets closed before the Python application has
finished with the file object, possibly involving it having to modify
the file after the contents are sent, something else could lock it and
start modifying the file before it was done.

In simple terms, the mod_wsgi module doesn't internally open the file in
the first place. Instead it comes from a higher level application, and
one can't do things at the lower level that would change the state of
something like the locks associated with the file.

Graham
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
Graham Dumpleton wrote:
> 2009/2/17 Joe Orton <jor...@redhat.com>:
>>> I did used to perform a dup, but was told that this would cause
>>> problems with file locking. Specifically I was told:
>>
>> I'm getting lost here.  What has file locking got to do with it?
>> Does mod_wscgi rely on file locking somehow?

I'm lost as well :)

> In WSGI applications, it is possible for the higher level Python web
> application to pass back a file object reference for the response,
> with the intent that the WSGI adapter use any optimised methods
> available for sending it back as the response. This is where file
> buckets come into the picture to begin with.

Now it looks like you are trying to intermix third-party-maintained
native OS file descriptors and file buckets. You can create the
apr_file_t from an apr_os_file_t (think you'll have platform portability
issues there), but the major problem would be to ensure the life cycle
of the object, since Python has its own GC and httpd has its pools. IMHO
you will need a new apr_bucket provider written in Python and C for
something like that.

Regards
--
^(TM)
Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.
On 02/14/2009 12:25 AM, Graham Dumpleton wrote:
> I know that I can circumvent the EOS optimisation by inserting a flush
> bucket, but based on the documentation it isn't guaranteed that a flush
> bucket will always propagate down the filter chain and actually push
> out data.

IMHO using a flush bucket would be the way to go here. This should work
fine.

Regards

Rüdiger
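For concreteness, the circumvention under discussion looks like this (a
sketch; bb and r assumed in scope):

    apr_status_t rv;

    /* Append FLUSH before EOS so the core output filter's EOS
     * optimisation never sees an unflushed FILE bucket it could set
     * aside past the life of the request. */
    APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_flush_create(r->connection->bucket_alloc));
    APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_eos_create(r->connection->bucket_alloc));
    rv = ap_pass_brigade(r->output_filters, bb);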
Problems with EOS optimisation in ap_core_output_filter() and file buckets.
In ap_core_output_filter() there exists the code starting with:

    /* Completed iterating over the brigade, now determine if we want
     * to buffer the brigade or send the brigade out on the network.
     *
     * Save if we haven't accumulated enough bytes to send, the connection
     * is not about to be closed, and:
     *
     *   1) we didn't see a file, we don't have more passes over the
     *      brigade to perform,  AND we didn't stop at a FLUSH bucket.
     *      (IOW, we will save plain old bytes such as HTTP headers)
     * or
     *   2) we hit the EOS and have a keep-alive connection
     *      (IOW, this response is a bit more complex, but we save it
     *       with the hope of concatenating with another response)
     */
    if (nbytes + flen < AP_MIN_BYTES_TO_WRITE
        && !AP_BUCKET_IS_EOC(last_e)
        && ((!fd && !more && !APR_BUCKET_IS_FLUSH(last_e))
            || (APR_BUCKET_IS_EOS(last_e)
                && c->keepalive == AP_CONN_KEEPALIVE))) {

This is an optimisation which, in the case of a keep-alive connection,
will hold over sending data out on the connection until later, if an EOS
is present and the amount of data is less than the nominal minimum
number of bytes to send.

Later in this section of code it has:

            /* Do a read on each bucket to pull in the
             * data from pipe and socket buckets, so
             * that we don't leave their file descriptors
             * open indefinitely.  Do the same for file
             * buckets, with one exception: allow the
             * first file bucket in the brigade to remain
             * a file bucket, so that we don't end up
             * doing an mmap+memcpy every time a client
             * requests an 8KB file over a keepalive
             * connection.
             */
            if (APR_BUCKET_IS_FILE(bucket) && !file_bucket_saved) {
                file_bucket_saved = 1;
            }
            else {
                const char *buf;
                apr_size_t len = 0;
                rv = apr_bucket_read(bucket, &buf, &len,
                                     APR_BLOCK_READ);
                if (rv != APR_SUCCESS) {
                    ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
                                  "core_output_filter: Error reading "
                                  "from bucket.");
                    return HTTP_INTERNAL_SERVER_ERROR;
                }
            }

The end result of this code is that if you have a file bucket getting
this far, where the length of the file is less than 8000 and an EOS
follows it, then the actual file bucket is held over rather than the
data being read and buffered. As the comment says, this is to avoid
doing an mmap+memcpy. What it means, though, is that the file descriptor
within the file bucket must be maintained and cannot be closed as soon
as ap_pass_brigade() has been called.

For me this is an issue, as the file descriptor has been supplied from a
special object returned by a higher level application, and it would be
hard to maintain the file as open beyond the life of the request, up
till the end of keep-alive or a subsequent request over the same
connection. Doing a dup on the file descriptor is also not necessarily
an option.

The end result then is that later, when the file bucket is processed,
the file descriptor has already been closed and one gets the error:

    (9)Bad file descriptor: core_output_filter: writing data to the network

I know that I can circumvent the EOS optimisation by inserting a flush
bucket, but based on the documentation it isn't guaranteed that a flush
bucket will always propagate down the filter chain and actually push out
data:

    /**
     * Create a flush bucket.  This indicates that filters should flush their
     * data.  There is no guarantee that they will flush it, but this is the
     * best we can do.
     * @param list The freelist from which this bucket should be allocated
     * @return The new bucket, or NULL if allocation failed
     */
    APU_DECLARE(apr_bucket *) apr_bucket_flush_create(apr_bucket_alloc_t *list);

How can one guarantee that the file bucket will actually be flushed and
not held over by a filter?
If it gets to the core output filter, another way to avoid the EOS
optimisation is to forcibly set keepalive to AP_CONN_CLOSE, which for
the application concerned is probably reasonable, but is obviously a bit
of a hack.

Finally, my problem only arises because I insert an EOS after the file
bucket and before calling ap_pass_brigade(). If one uses ap_send_fd(),
it doesn't insert an EOS before calling ap_pass_brigade(), with
something else obviously inserting the EOS later. If ap_pass_brigade()
is called without an EOS the first time and only later with an EOS, will
that help in ensuring it is flushed out, or can