Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-05 Thread Jim Jagielski


On Jan 4, 2009, at 11:57 AM, Rainer Jung wrote:



Here's the gdb story:

When the content file gets opened, its cleanup is correctly  
registered with the request pool. Later in core_filters.c at the end  
of function ap_core_output_filter() line 528 we call  
setaside_remaining_output().


This goes down the stack via ap_save_brigade(),  
file_bucket_setaside() to apr_file_setaside(). This kills the  
cleanup for the request pool and adds it instead to the transaction  
(=connection) pool. There we are.


2.2.x has a different structure, although I can also see two calls  
to ap_save_brigade() in ap_core_output_filter(), but they use  
different pools as new targets, namely a deferred_write_pool and an  
input_pool, respectively.




Uggg... so we need to do the 'same' with the 2.3/2.4 arch
as well...



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 01:51, Ruediger Pluem wrote:


On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a
small file, so doing about 2000 requests per second.


What is the exact size of the file?


It is the index.html, via URL /, so size is 45 Bytes.

Configuration is very close to original, except for:

40c40
< Listen myhost:8000
---
> Listen 80

455,456c455,456
< EnableMMAP off
< EnableSendfile off
---
> #EnableMMAP off
> #EnableSendfile off

(because installation is on NFS, but the problem also occurs with those 
switches on)


The following Modules are loaded:

LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authn_anon_module modules/mod_authn_anon.so
LoadModule authn_core_module modules/mod_authn_core.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
LoadModule authz_user_module modules/mod_authz_user.so
LoadModule authz_owner_module modules/mod_authz_owner.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule access_compat_module modules/mod_access_compat.so
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule env_module modules/mod_env.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule cern_meta_module modules/mod_cern_meta.so
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule ident_module modules/mod_ident.so
LoadModule usertrack_module modules/mod_usertrack.so
LoadModule unique_id_module modules/mod_unique_id.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule version_module modules/mod_version.so
LoadModule mime_module modules/mod_mime.so
LoadModule unixd_module modules/mod_unixd.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule asis_module modules/mod_asis.so
LoadModule info_module modules/mod_info.so
LoadModule suexec_module modules/mod_suexec.so
LoadModule vhost_alias_module modules/mod_vhost_alias.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
LoadModule imagemap_module modules/mod_imagemap.so
LoadModule actions_module modules/mod_actions.so
LoadModule speling_module modules/mod_speling.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so

To reproduce, you must use KeepAlive, and MaxKeepAliveRequests 
(default: 100) times the concurrency must exceed the maximum number of 
FDs. Even without exceeding it, you can run httpd -X and look at 
/proc/PID/fd during the test run. You should notice a huge number of 
fds, all pointing to the index.html.


Regards,

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 14:14, Ruediger Pluem wrote:


On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a
small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.


Can you check whether you run into the same problem on 2.2.x with a file of 
size 257 bytes?


I tried on the same type of system with event MPM and 2.2.11. Can't 
reproduce even with content file of size 257 bytes.


The same file with trunk immediately reproduces the problem.

Will try your patch/hack next.

Thanks

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Ruediger Pluem


On 01/04/2009 03:26 PM, Rainer Jung wrote:
 On 04.01.2009 14:14, Ruediger Pluem wrote:

 On 01/04/2009 11:24 AM, Rainer Jung wrote:
 On 04.01.2009 01:51, Ruediger Pluem wrote:
 On 01/04/2009 12:49 AM, Rainer Jung wrote:
 On 04.01.2009 00:36, Paul Querna wrote:
 Rainer Jung wrote:
 During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
 many open files. I used strace and the problem looks like this:

 - The test case is using ab with HTTP keep alive, concurrency 20
 and a
 small file, so doing about 2000 requests per second.
 What is the exact size of the file?
 It is the index.html, via URL /, so size is 45 Bytes.

 Can you check whether you run into the same problem on 2.2.x with a file
 of size 257 bytes?
 
 I tried on the same type of system with event MPM and 2.2.11. Can't
 reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM
and the number of concurrent requests from ab.

Regards

Rüdiger



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Ruediger Pluem


On 01/04/2009 03:48 PM, Rainer Jung wrote:
 On 04.01.2009 15:40, Ruediger Pluem wrote:

 On 01/04/2009 03:26 PM, Rainer Jung wrote:
 On 04.01.2009 14:14, Ruediger Pluem wrote:
 On 01/04/2009 11:24 AM, Rainer Jung wrote:
 On 04.01.2009 01:51, Ruediger Pluem wrote:
 On 01/04/2009 12:49 AM, Rainer Jung wrote:
 On 04.01.2009 00:36, Paul Querna wrote:
 Rainer Jung wrote:
 During testing 2.3.1 I noticed a lot of errors of type EMFILE:
 Too
 many open files. I used strace and the problem looks like this:

 - The test case is using ab with HTTP keep alive, concurrency 20
 and a
 small file, so doing about 2000 requests per second.
 What is the exact size of the file?
 It is the index.html, via URL /, so size is 45 Bytes.
 Can you check whether you run into the same problem on 2.2.x with a file
 of size 257 bytes?
 I tried on the same type of system with event MPM and 2.2.11. Can't
 reproduce even with content file of size 257 bytes.

 Possibly you need to increase the number of threads per process with
 event MPM
 and the number of concurrent requests from ab.
 
 I increased the maximum KeepAlive requests and the KeepAlive timeout a
 lot, and during a longer-running test I always see exactly as many open
 FDs for the content file in /proc/PID/fd as the ab concurrency. So it
 seems the FDs always get closed before the next request on the
 connection is handled.
 
 After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
 prefork or worker.

IMHO this cannot happen with prefork on 2.2.x, so I guess it is not worth 
testing. It still confuses me that this happens on trunk, as it looks 
like ab does not do pipelining.

Regards

Rüdiger



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 16:22, Rainer Jung wrote:

On 04.01.2009 15:56, Ruediger Pluem wrote:


On 01/04/2009 03:48 PM, Rainer Jung wrote:

On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE:
Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20
and a
small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you check whether you run into the same problem on 2.2.x with a file of
size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't
reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with
event MPM
and the number of concurrent requests from ab.

I increased the maximum KeepAlive requests and the KeepAlive timeout a
lot, and during a longer-running test I always see exactly as many open
FDs for the content file in /proc/PID/fd as the ab concurrency. So it
seems the FDs always get closed before the next request on the
connection is handled.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
prefork or worker.


 IMHO this cannot happen with prefork on 2.2.x, so I guess it is not
 worth testing. It still confuses me that this happens on trunk, as it
 looks like ab does not do pipelining.


The strace log shows that the sequence really is:

- new connection

- read request
- open file
- send response
- log request

with the request sequence repeated a lot of times (maybe as long as
KeepAlive is active), and then there are a lot of close() calls for the
content files. I'm not sure about the exact thing that triggers the close.

So I don't necessarily see pipelining (in the sense of sending more
requests before responses return) being necessary.

I tested your patch (worker, trunk): It does not help. I then added an
error log statement directly after the requests++ and it shows this
number is always 1.


I can now even reproduce without load: simply open a connection and send 
hand-crafted KeepAlive requests via telnet. The file descriptors are 
kept open as long as the connection is alive. I'll run under the 
debugger to see what the stack looks like when the file gets closed.


Since the logging is done much earlier (directly after each request), the 
problem does not seem to be directly related to EOR. It looks like 
somehow the close-file cleanup does not run when the request pool is 
destroyed, or maybe it is registered with the connection pool. gdb should 
help.


More later.

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 15:40, Ruediger Pluem wrote:


On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20
and a
small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you check whether you run into the same problem on 2.2.x with a file of
size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't
reproduce even with content file of size 257 bytes.


Possibly you need to increase the number of threads per process with event MPM
and the number of concurrent requests from ab.


I increased the maximum KeepAlive requests and the KeepAlive timeout a 
lot, and during a longer-running test I always see exactly as many open 
FDs for the content file in /proc/PID/fd as the ab concurrency. So it 
seems the FDs always get closed before the next request on the 
connection is handled.


After testing the patch, I'll try it again with 257 bytes on 2.2.11 with 
prefork or worker.


Regards,

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Ruediger Pluem


On 01/04/2009 12:49 AM, Rainer Jung wrote:
 On 04.01.2009 00:36, Paul Querna wrote:
 Rainer Jung wrote:
 During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
 many open files. I used strace and the problem looks like this:

 - The test case is using ab with HTTP keep alive, concurrency 20 and a
 small file, so doing about 2000 requests per second.
 MaxKeepAliveRequests=100 (Default)

 - the file leading to EMFILE is the static content file, which can be
 observed to be open more than 1000 times in parallel although ab
 concurrency is only 20

 - From looking at the code it seems the file is closed during a
 cleanup function associated with the request pool, which is triggered by
 an EOR bucket

 Now what happens under KeepAlive is that the content files are kept
 open longer than the handling of the request, more precisely until the
 closing of the connection. So when MaxKeepAliveRequests*Concurrency >
 MaxNumberOfFDs we run out of file descriptors.

 I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
 Event, Worker and Prefork. I didn't yet have the time to retest with
 2.2.

 It should only happen in 2.3.x/trunk because the EOR bucket is a new
 feature to let MPMs do async writes once the handler has finished
 running.

 And yes, this sounds like a nasty bug.
 
 I verified I can't reproduce with the same platform and 2.2.11.
 
 Not sure I understand the EOR asynchronicity well enough to analyze the
 root cause.

Can you try the following patch please?

Index: server/core_filters.c
===================================================================
--- server/core_filters.c   (Revision 731238)
+++ server/core_filters.c   (Arbeitskopie)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define MAX_REQUESTS_QUEUED 10
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -381,6 +382,7 @@
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
     apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    int requests;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,7 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    requests = 0;
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -501,11 +504,22 @@
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
+        else if (bucket->type == &ap_bucket_type_eor) {
+            /*
+             * Count the number of requests still queued in the brigade.
+             * Pipelining of a high number of small files can cause
+             * a high number of open file descriptors, which if it happens
+             * on many threads in parallel can cause us to hit the OS limits.
+             */
+            requests++;
+        }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if ((non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER)
+        || (requests > MAX_REQUESTS_QUEUED)) {
         /* ### Writing the entire brigade may be excessive; we really just
-         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
+         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER or
+         * ### under MAX_REQUESTS_QUEUED
          */
         apr_status_t rv = send_brigade_blocking(net->client_socket, bb,
                                                 &(ctx->bytes_written), c);

This is still some sort of a hack, but maybe helpful for understanding whether
this is the problem.

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 15:04, Ruediger Pluem wrote:


On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a
small file, so doing about 2000 requests per second.
MaxKeepAliveRequests=100 (Default)

- the file leading to EMFILE is the static content file, which can be
observed to be open more than 1000 times in parallel although ab
concurrency is only 20

- From looking at the code it seems the file is closed during a
cleanup function associated with the request pool, which is triggered by
an EOR bucket

Now what happens under KeepAlive is that the content files are kept
open longer than the handling of the request, more precisely until the
closing of the connection. So when MaxKeepAliveRequests*Concurrency >
MaxNumberOfFDs we run out of file descriptors.

I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
Event, Worker and Prefork. I didn't yet have the time to retest with
2.2.

It should only happen in 2.3.x/trunk because the EOR bucket is a new
feature to let MPMs do async writes once the handler has finished
running.

And yes, this sounds like a nasty bug.

I verified I can't reproduce with the same platform and 2.2.11.

Not sure I understand the EOR asynchronicity well enough to analyze the
root cause.


Can you try the following patch please?


Here's the gdb story:

When the content file gets opened, its cleanup is correctly registered 
with the request pool. Later in core_filters.c at the end of function 
ap_core_output_filter() line 528 we call setaside_remaining_output().


This goes down the stack via ap_save_brigade(), file_bucket_setaside() 
to apr_file_setaside(). This kills the cleanup for the request pool and 
adds it instead to the transaction (=connection) pool. There we are.


2.2.x has a different structure, although I can also see two calls to 
ap_save_brigade() in ap_core_output_filter(), but they use different 
pools as new targets, namely a deferred_write_pool and an input_pool, 
respectively.


So now we know how it happens, but I don't have an immediate idea how 
to solve it.


Regards,

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 17:57, Rainer Jung wrote:

When the content file gets opened, its cleanup is correctly registered
with the request pool. Later in core_filters.c at the end of function
ap_core_output_filter() line 528 we call setaside_remaining_output().


...


2.2.x has a different structure, although I can also see two calls to
ap_save_brigade() in ap_core_output_filter(), but they use different
pools as new targets, namely a deferred_write_pool and an input_pool,
respectively.


And the code already contains the appropriate hint:

static void setaside_remaining_output(...)
{
...
if (make_a_copy) {
/* XXX should this use a separate deferred write pool, like
 * the original ap_core_output_filter?
 */
ap_save_brigade(f, &(ctx->buffered_bb), bb, c->pool);
...
}


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Rainer Jung

On 04.01.2009 15:56, Ruediger Pluem wrote:


On 01/04/2009 03:48 PM, Rainer Jung wrote:

On 04.01.2009 15:40, Ruediger Pluem wrote:

On 01/04/2009 03:26 PM, Rainer Jung wrote:

On 04.01.2009 14:14, Ruediger Pluem wrote:

On 01/04/2009 11:24 AM, Rainer Jung wrote:

On 04.01.2009 01:51, Ruediger Pluem wrote:

On 01/04/2009 12:49 AM, Rainer Jung wrote:

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE:
Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20
and a
small file, so doing about 2000 requests per second.

What is the exact size of the file?

It is the index.html, via URL /, so size is 45 Bytes.

Can you check whether you run into the same problem on 2.2.x with a file
of size 257 bytes?

I tried on the same type of system with event MPM and 2.2.11. Can't
reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with
event MPM
and the number of concurrent requests from ab.

I increased the maximum KeepAlive requests and the KeepAlive timeout a
lot, and during a longer-running test I always see exactly as many open
FDs for the content file in /proc/PID/fd as the ab concurrency. So it
seems the FDs always get closed before the next request on the
connection is handled.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
prefork or worker.


IMHO this cannot happen with prefork on 2.2.x, so I guess it is not worth 
testing. It still confuses me that this happens on trunk, as it looks 
like ab does not do pipelining.


The strace log shows that the sequence really is:

- new connection

- read request
- open file
- send response
- log request

with the request sequence repeated a lot of times (maybe as long as 
KeepAlive is active), and then there are a lot of close() calls for the 
content files. I'm not sure about the exact thing that triggers the close.


So I don't necessarily see pipelining (in the sense of sending more 
requests before responses return) being necessary.


I tested your patch (worker, trunk): It does not help. I then added an 
error log statement directly after the requests++ and it shows this 
number is always 1.


Regards,

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Ruediger Pluem


On 01/04/2009 11:24 AM, Rainer Jung wrote:
 On 04.01.2009 01:51, Ruediger Pluem wrote:

 On 01/04/2009 12:49 AM, Rainer Jung wrote:
 On 04.01.2009 00:36, Paul Querna wrote:
 Rainer Jung wrote:
 During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
 many open files. I used strace and the problem looks like this:

 - The test case is using ab with HTTP keep alive, concurrency 20 and a
 small file, so doing about 2000 requests per second.

 What is the exact size of the file?
 
 It is the index.html, via URL /, so size is 45 Bytes.

Can you check whether you run into the same problem on 2.2.x with a file of 
size 257 bytes?

Regards

Rüdiger



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-04 Thread Ruediger Pluem


On 01/04/2009 06:28 PM, Rainer Jung wrote:
 On 04.01.2009 17:57, Rainer Jung wrote:
 When the content file gets opened, its cleanup is correctly registered
 with the request pool. Later in core_filters.c at the end of function
 ap_core_output_filter() line 528 we call setaside_remaining_output().
 
 ...
 
 2.2.x has a different structure, although I can also see two calls to
 ap_save_brigade() in ap_core_output_filter(), but they use different
 pools as new targets, namely a deferred_write_pool and an input_pool,
 respectively.
 
 And the code already contains the appropriate hint:
 
 static void setaside_remaining_output(...)
 {
 ...
 if (make_a_copy) {
 /* XXX should this use a separate deferred write pool, like
  * the original ap_core_output_filter?
  */
 ap_save_brigade(f, &(ctx->buffered_bb), bb, c->pool);
 ...
 }
 

Thanks for the analysis and good catch. Maybe I'll have a look into this by
tomorrow.

Regards

Rüdiger



Problem with file descriptor handling in httpd 2.3.1

2009-01-03 Thread Rainer Jung
During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too many 
open files. I used strace and the problem looks like this:


- The test case is using ab with HTTP keep alive, concurrency 20 and a 
small file, so doing about 2000 requests per second. 
MaxKeepAliveRequests=100 (Default)


- the file leading to EMFILE is the static content file, which can be 
observed to be open more than 1000 times in parallel although ab 
concurrency is only 20


- From looking at the code it seems the file is closed during a cleanup 
function associated with the request pool, which is triggered by an EOR bucket


Now what happens under KeepAlive is that the content files are kept open 
longer than the handling of the request, more precisely until the 
closing of the connection. So when MaxKeepAliveRequests*Concurrency > 
MaxNumberOfFDs we run out of file descriptors.


I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with Event, 
Worker and Prefork. I didn't yet have the time to retest with 2.2.


With Event and Worker I also get crashes (more precisely, httpd processes 
stopping) due to apr_socket_accept() also returning EMFILE.


Regards,

Rainer



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-03 Thread Paul Querna

Rainer Jung wrote:
During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too many 
open files. I used strace and the problem looks like this:


- The test case is using ab with HTTP keep alive, concurrency 20 and a 
small file, so doing about 2000 requests per second. 
MaxKeepAliveRequests=100 (Default)


- the file leading to EMFILE is the static content file, which can be 
observed to be open more than 1000 times in parallel although ab 
concurrency is only 20


- From looking at the code it seems the file is closed during a cleanup 
function associated with the request pool, which is triggered by an EOR 
bucket


Now what happens under KeepAlive is that the content files are kept open 
longer than the handling of the request, more precisely until the 
closing of the connection. So when MaxKeepAliveRequests*Concurrency > 
MaxNumberOfFDs we run out of file descriptors.


I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with Event, 
Worker and Prefork. I didn't yet have the time to retest with 2.2.


It should only happen in 2.3.x/trunk because the EOR bucket is a new 
feature to let MPMs do async writes once the handler has finished running.


And yes, this sounds like a nasty bug.

-Paul



Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-03 Thread Rainer Jung

On 04.01.2009 00:36, Paul Querna wrote:

Rainer Jung wrote:

During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
many open files. I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a
small file, so doing about 2000 requests per second.
MaxKeepAliveRequests=100 (Default)

- the file leading to EMFILE is the static content file, which can be
observed to be open more than 1000 times in parallel although ab
concurrency is only 20

- From looking at the code it seems the file is closed during a
cleanup function associated with the request pool, which is triggered by
an EOR bucket

Now what happens under KeepAlive is that the content files are kept
open longer than the handling of the request, more precisely until the
closing of the connection. So when MaxKeepAliveRequests*Concurrency >
MaxNumberOfFDs we run out of file descriptors.

I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
Event, Worker and Prefork. I didn't yet have the time to retest with 2.2.


It should only happen in 2.3.x/trunk because the EOR bucket is a new
feature to let MPMs do async writes once the handler has finished running.

And yes, this sounds like a nasty bug.


I verified I can't reproduce with the same platform and 2.2.11.

Not sure I understand the EOR asynchronicity well enough to analyze the 
root cause.


Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

2009-01-03 Thread Ruediger Pluem


On 01/04/2009 12:49 AM, Rainer Jung wrote:
 On 04.01.2009 00:36, Paul Querna wrote:
 Rainer Jung wrote:
 During testing 2.3.1 I noticed a lot of errors of type EMFILE: Too
 many open files. I used strace and the problem looks like this:

 - The test case is using ab with HTTP keep alive, concurrency 20 and a
 small file, so doing about 2000 requests per second.

What is the exact size of the file?

Regards

Rüdiger