Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-28 Thread Colm MacCarthaigh
On Tue, Jan 27, 2004 at 02:24:46PM -0500, Jeff Trawick wrote:
 I'm testing with this patch currently (so far so good):

Same here, I've applied the patch, and right now have 1 hour's uptime,
which is 12 times more than I've ever had with worker before.
Looks like that was it. Where do I send the beer?

Oh, and if someone is committing to worker.c, do me a favour and
add a zero to MAX_SERVER_LIMIT there too ;)

 Index: server/mpm/worker/worker.c
 ===================================================================
 RCS file: /home/cvs/httpd-2.0/server/mpm/worker/worker.c,v
 retrieving revision 1.145
 diff -u -r1.145 worker.c
 --- server/mpm/worker/worker.c  27 Jan 2004 15:19:58 -0000  1.145
 +++ server/mpm/worker/worker.c  27 Jan 2004 19:20:10 -0000
 @@ -1441,7 +1441,8 @@
                  ++idle_thread_count;
              }
          }
 -        if (any_dead_threads && totally_free_length < idle_spawn_rate
 +        if (any_dead_threads && totally_free_length < idle_spawn_rate
 +            && free_length < MAX_SPAWN_RATE
              && (!ps->pid               /* no process in the slot */
                  || ps->quiescing)) {   /* or at least one is going away */
              if (all_dead_threads) {
 
 

-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-28 Thread Colm MacCarthaigh
On Wed, Jan 28, 2004 at 10:40:54AM +0000, Colm MacCarthaigh wrote:
 On Tue, Jan 27, 2004 at 02:24:46PM -0500, Jeff Trawick wrote:
  I'm testing with this patch currently (so far so good):
 
 Same here, I've applied the patch, and right now have 1 hour's uptime,
 which is 12 times more than I've ever had with worker before.
 Looks like that was it. Where do I send the beer?

Still going well, no problems since the patch, but after all that,
I can finally benchmark worker vs. prefork properly:
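
(The exact ab command line isn't given anywhere in the thread; judging
by the concurrency level and request count reported below, it was
presumably something along the lines of:

    ab -n 10000 -c 40 http://ftp.heanet.ie/

with everything else left at the defaults.)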

Prefork:

Server Software:        Apache/2.0.48
Server Hostname:        ftp.heanet.ie
Server Port:            80

Document Path:          /
Document Length:        2841 bytes

Concurrency Level:      40
Time taken for tests:   31.223 seconds
Complete requests:      10000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      29910000 bytes
HTML transferred:       28410000 bytes
Requests per second:    320.28 [#/sec] (mean)
Time per request:       124.89 [ms] (mean)
Time per request:       3.12 [ms] (mean, across all concurrent requests)
Transfer rate:          957.95 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    1.5      0    51
Processing:    17   124  192.2     54  1692
Waiting:       14   124  192.2     54  1691
Total:         17   124  192.3     55  1693

Percentage of the requests served within a certain time (ms)
  50%     55
  66%     67
  75%     84
  80%    108
  90%    337
  95%    547
  98%    706
  99%    861
 100%   1693 (last request)

Worker:

Server Software:        Apache/2.0.48
Server Hostname:        ftp.heanet.ie
Server Port:            80

Document Path:          /
Document Length:        2841 bytes

Concurrency Level:      40
Time taken for tests:   39.907 seconds
Complete requests:      10000
Failed requests:        0
Broken pipe errors:     0
Total transferred:      29910000 bytes
HTML transferred:       28410000 bytes
Requests per second:    250.58 [#/sec] (mean)
Time per request:       159.63 [ms] (mean)
Time per request:       3.99 [ms] (mean, across all concurrent requests)
Transfer rate:          749.49 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     0    0.5      0    17
Processing:     1   159  187.8     90  1540
Waiting:        1   159  187.8     90  1540
Total:          1   159  187.8     90  1540

Percentage of the requests served within a certain time (ms)
  50%     90
  66%    124
  75%    176
  80%    206
  90%    345
  95%    590
  98%    865
  99%    968
 100%   1540 (last request)

Oh well :)

-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-28 Thread Jeff Trawick
Colm MacCarthaigh wrote:
On Tue, Jan 27, 2004 at 02:24:46PM -0500, Jeff Trawick wrote:

I'm testing with this patch currently (so far so good):
Same here, I've applied the patch, and right now have 1 hour's uptime,
which is 12 times more than I've ever had with worker before.
Looks like that was it. Where do I send the beer?
cool!

hold off on the beer for now...  I couldn't go through a keg before it went 
flat, and the cans and bottles don't really do it for me, draft device or not...

Oh, and if someone is committing to worker.c, do me a favour and
add a zero to MAX_SERVER_LIMIT there too ;)
yup; MAX_THREAD_LIMIT should be increased too, as I have come close enough to 
20,000 ThreadsPerChild on AIX before to wonder if I could/would hit it...  I'll 
make a note to go through the Unix MPMs soon-ish and give some more breathing room.



Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-27 Thread Jeff Trawick
worker MPM stack corruption in parent:

	int free_slots[MAX_SPAWN_RATE];

...

            /* great! we prefer these, because the new process can
             * start more threads sooner.  So prioritize this slot
             * by putting it ahead of any slots with active threads.
             *
             * first, make room by moving a slot that's potentially
             * still in use to the end of the array
             */
NEW CODE->  ap_assert(free_length < MAX_SPAWN_RATE);
            free_slots[free_length] =
                free_slots[totally_free_length];

[Tue Jan 27 12:20:19 2004] [crit] [Tue Jan 27 12:20:19 2004] file
worker.c, line 1590, assertion "free_length < MAX_SPAWN_RATE" failed
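
To spell out the failure mode, here is a reduced, self-contained sketch of
my reading of the code above - NOT the real worker.c logic; the helper and
its arguments are invented for illustration:

#define MAX_SPAWN_RATE 32            /* size of the stack array above */

/* The caller's free_slots[] has exactly MAX_SPAWN_RATE entries.  The old
 * guard bounded totally_free_length, but nothing bounded free_length,
 * which is the index actually written first.
 */
static void record_totally_free_slot(int child_slot, int free_slots[],
                                     int *free_length,
                                     int *totally_free_length)
{
    /* make room at the front by moving a partly-free slot to the end;
     * if *free_length == MAX_SPAWN_RATE, this write lands one past the
     * end of free_slots[] and corrupts the parent's stack - exactly
     * what the ap_assert() above catches */
    free_slots[*free_length] = free_slots[*totally_free_length];
    free_slots[*totally_free_length] = child_slot;
    ++*totally_free_length;
    ++*free_length;
}

The follow-up patch closes this off by refusing to take the branch at all
unless free_length < MAX_SPAWN_RATE still holds.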



Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-27 Thread Jeff Trawick
Jeff Trawick wrote:
worker MPM stack corruption in parent:

int free_slots[MAX_SPAWN_RATE];

...

/* great! we prefer these, because the new process can
 * start more threads sooner.  So prioritize this slot
 * by putting it ahead of any slots with active threads.
 *
 * first, make room by moving a slot that's potentially
 * still in use to the end of the array
 */
NEW CODE->  ap_assert(free_length < MAX_SPAWN_RATE);
            free_slots[free_length] =
                free_slots[totally_free_length];

[Tue Jan 27 12:20:19 2004] [crit] [Tue Jan 27 12:20:19 2004] file
worker.c, line 1590, assertion "free_length < MAX_SPAWN_RATE" failed
I'm testing with this patch currently (so far so good):

Index: server/mpm/worker/worker.c
===================================================================
RCS file: /home/cvs/httpd-2.0/server/mpm/worker/worker.c,v
retrieving revision 1.145
diff -u -r1.145 worker.c
--- server/mpm/worker/worker.c  27 Jan 2004 15:19:58 -0000  1.145
+++ server/mpm/worker/worker.c  27 Jan 2004 19:20:10 -0000
@@ -1441,7 +1441,8 @@
                 ++idle_thread_count;
             }
         }
-        if (any_dead_threads && totally_free_length < idle_spawn_rate
+        if (any_dead_threads && totally_free_length < idle_spawn_rate
+            && free_length < MAX_SPAWN_RATE
             && (!ps->pid               /* no process in the slot */
                 || ps->quiescing)) {   /* or at least one is going away */
             if (all_dead_threads) {


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-27 Thread Ben Laurie
Colm MacCarthaigh wrote:

On Mon, Jan 26, 2004 at 06:28:03PM +0000, Colm MacCarthaigh wrote:

I'd love to find out what's causing your worker failures. Are you using
any thread-unsafe modules or libraries?
Not to my knowledge, I wasn't planning to do this till later, but
I've bumped to 2.1, I'll try out the forensic_id and backtrace
modules right now, and see how that goes. 


*sigh*, forensic_id didn't catch it, backtrace didn't catch it,
whatkilledus didn't catch it, all tried individually. The parent just
dumps core; the children live on, serve their content and log their
request and then drop off one by one. No incomplete requests,
no backtrace or other exception info thrown into any log.
The corefile is as useful as ever: unbacktraceable. Suggestions welcome!
mod_log_forensic?

--
http://www.apache-ssl.org/ben.html   http://www.thebunker.net/
There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit. - Robert Woodruff


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Aaron Bannert
On Thu, Jan 15, 2004 at 04:04:38PM +0000, Colm MacCarthaigh wrote:
 There were other changes co-incidental to that, like going to 12Gb
 of RAM, which certainly helped, so it's hard to narrow it down too
 much.

OK, with 18,000 or so child processes (all in the run queue), what does
your load look like? Also, what kind of memory footprint are you seeing?

 I don't use worker because it still dumps an un-backtraceable corefile
 within about 5 minutes for me. I still have no idea why, though plenty
 of corefiles. I haven't tried a serious analysis yet, because I've been
 moving house, but I hope to get to it soon. Moving to worker would be
 a good thing :)

I'd love to find out what's causing your worker failures. Are you using
any thread-unsafe modules or libraries?

-aaron


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Colm MacCarthaigh
On Mon, Jan 26, 2004 at 10:09:20AM -0800, Aaron Bannert wrote:
 On Thu, Jan 15, 2004 at 04:04:38PM +0000, Colm MacCarthaigh wrote:
  There were other changes co-incidental to that, like going to 12Gb
  of RAM, which certainly helped, so it's hard to narrow it down too
  much.
 
 OK, with 18,000 or so child processes (all in the run queue), what does
 your load look like? Also, what kind of memory footprint are you seeing?

At the time, we were seeing a load of between 8 and 15, varying like a
sawtooth waveform. It would climb and climb, then there'd be a steady sharp
decrease and the cycle would start again. At one point I mistakenly compiled
the Linux pre-empt options into the kernel, and that made things very
interesting. Load was much more radical in its mood-swings then.

There would be points when it would slow to a crawl, and the amount
of data we shipped was down - we only managed to peak at 200Mbit/sec
during the heaviest part of it. Our daily peak is about 380Mbit,
but hopefully we'll be more ready next time. I've managed to commission
the second server, and move the updates to it; see:

http://ftp.heanet.ie/about/

for an idea of the architecture. As for memory footprint, it wasn't
too bad; I actually put the system into 4Gb mode to avoid bounce-buffering
- something I hadn't fully mapped out yet. We were using all of the RAM,
but that's not unusual for us; we aggressively cache as much of the
filesystem as XFS lets us. All of the Apache instances added up to
about 165Mb of RAM.

  I don't use worker because it still dumps an un-backtraceable corefile
  within about 5 minutes for me. I still have no idea why, though plenty
  of corefiles. I haven't tried a serious analysis yet, because I've been
  moving house, but I hope to get to it soon. Moving to worker would be
  a good thing :)
 
 I'd love to find out what's causing your worker failures. Are you using
 any thread-unsafe modules or libraries?

Not to my knowledge, I wasn't planning to do this till later, but
I've bumped to 2.1, I'll try out the forensic_id and backtrace
modules right now, and see how that goes. 

-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Colm MacCarthaigh
On Mon, Jan 26, 2004 at 06:28:03PM +0000, Colm MacCarthaigh wrote:
  I'd love to find out what's causing your worker failures. Are you using
  any thread-unsafe modules or libraries?
 
 Not to my knowledge, I wasn't planning to do this till later, but
 I've bumped to 2.1, I'll try out the forensic_id and backtrace
 modules right now, and see how that goes. 

*sigh*, forensic_id didn't catch it, backtrace didn't catch it,
whatkilledus didn't catch it, all tried individually. The parent just
dumps core; the children live on, serve their content and log their
request and then drop off one by one. No incomplete requests,
no backtrace or other exception info thrown into any log.

The corefile is as useful as ever: unbacktraceable. Suggestions welcome!

-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Glenn
On Mon, Jan 26, 2004 at 07:37:23PM +0000, Colm MacCarthaigh wrote:
 On Mon, Jan 26, 2004 at 06:28:03PM +0000, Colm MacCarthaigh wrote:
   I'd love to find out what's causing your worker failures. Are you using
   any thread-unsafe modules or libraries?
  
  Not to my knowledge, I wasn't planning to do this till later, but
  I've bumped to 2.1, I'll try out the forensic_id and backtrace
  modules right now, and see how that goes. 
 
 *sigh*, forensic_id didn't catch it, backtrace didn't catch it,
 whatkilledus didn't catch it, all tried individually. The parent just
 dumps core; the children live on, serve their content and log their
 request and then drop off one by one. No incomplete requests,
 no backtrace or other exception info thrown into any log.
 
 The corefile is as useful as ever: unbacktraceable. Suggestions welcome!

Have you tried setting up a signal handler for SIGSEGV and calling
   kill(getpid(), SIGSTOP);
in the signal handler?  After attaching to the process with gdb, send
a CONT signal to the process from another terminal.  It's worth a shot.
(Is the process dying from SIGSEGV or some other signal?
Does the core file tell you?)
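
Something along these lines, say - a minimal sketch of the idea, untested
against httpd, and the function names are mine:

#include <signal.h>
#include <unistd.h>

/* On SIGSEGV, freeze the process instead of letting it die, so you can
 * attach with "gdb /path/to/httpd <pid>"; when you're done poking
 * around, run "kill -CONT <pid>" from another terminal.
 */
static void segv_stop(int sig)
{
    kill(getpid(), SIGSTOP);

    /* once continued, restore the default action and re-raise so a
     * core file still gets written */
    signal(sig, SIG_DFL);
    kill(getpid(), sig);
}

/* install this early in the parent process */
static void install_segv_stop(void)
{
    struct sigaction sa;
    sa.sa_handler = segv_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, NULL);
}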


Can you get a tcpdump of the traffic leading up to the crash?
(Yeah, I know it would be a lot.)

If you can get a tcpdump, and can then replay the traffic and
reproduce it, more of us can look at this.

Cheers,
Glenn


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Jeff Trawick
Colm MacCarthaigh wrote:
On Mon, Jan 26, 2004 at 06:28:03PM +0000, Colm MacCarthaigh wrote:

I'd love to find out what's causing your worker failures. Are you using
any thread-unsafe modules or libraries?
Not to my knowledge, I wasn't planning to do this till later, but
I've bumped to 2.1, I'll try out the forensic_id and backtrace
modules right now, and see how that goes. 


*sigh*, forensic_id didn't catch it,
forensic_id is just for a crash in the child

backtrace didn't catch it,
whatkilledus didn't catch it, all tried individually.
disable the check for geteuid()==0 and see if you get a backtrace?
The exception hook purposefully doesn't run as root (I assume your parent is
running as root)



Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-26 Thread Colm MacCarthaigh
On Mon, Jan 26, 2004 at 04:25:58PM -0500, Jeff Trawick wrote:
 *sigh*, forensic_id didn't catch it,
 
 forensic_id is just for a crash in the child

I know, but I couldn't rule out a crash in the child being a root cause
... until now; it doesn't look like it's triggered by a particular URI
anyway.

 backtrace didn't catch it,
 whatkilledus didn't catch it, all tried individually.
 
 disable the check for geteuid()==0 and see if you get a backtrace?
 The exception hook purposefully doesn't run as root (I assume your parent is
 running as root)

No problem, first thing tomorrow :)

-- 
Colm MacCárthaighPublic Key: [EMAIL PROTECTED]


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-15 Thread gregames
Colm MacCarthaigh wrote:
Not entirely serious, but today, we actually hit this, in production :)
The hardware, a dual 2GHz Xeon with 12GB of RAM running Linux 2.6.1-rc2,
coped and remained responsive. So 20,000 may no longer be outside the realms
of what administrators reasonably desire to have.

-#define MAX_SERVER_LIMIT 20000
+#define MAX_SERVER_LIMIT 100000
dang!

Committed a limit of 200000.

A couple of observations:

* I don't think you could do this with an early 2.4 kernel on i386, because
it would eat up kernel memory with LDTs, assuming APR thinks it can support
threads. Not sure about current kernels from popular distros.

* Should I assume you tried worker but it uses too much CPU?  If so, is it a 
small percentage more or really really bad?

Thanks,
Greg


Re: [PATCH] raise MAX_SERVER_LIMIT

2004-01-15 Thread Colm MacCarthaigh
On Thu, Jan 15, 2004 at 10:49:43AM -0500, [EMAIL PROTECTED] wrote:
 -#define MAX_SERVER_LIMIT 20000
 +#define MAX_SERVER_LIMIT 100000
 
 dang!
 
 Committed a limit of 200000.
 
 A couple of observations:
 
 * I don't think you could do this with an early 2.4 kernel on i386, because
 it would eat up kernel memory with LDTs, assuming APR thinks it can support
 threads. Not sure about current kernels from popular distros.

I'm running 2.6.1-mm2 now, and things are much, much better. We got away
with it - just about - with 2.6.1 vanilla, but -mm2 has improved the
stability a lot. Peaked at just over 18,000 with 2.6.1-mm2 so far, and
it was a lot more bearable.

We were having major stability problems with 2.4, and as soon as we
went to 2.6 we found out why - we were getting a lot more client
requests than we thought, but they were queuing. We hit 20,000 within
2 days of going to 2.6.1-rc2. 

There were other changes co-incidental to that, like going to 12Gb
of RAM, which certainly helped, so it's hard to narrow it down too
much.

 * Should I assume you tried worker but it uses too much CPU?  If so, is it 
 a small percentage more or really really bad?

I don't use worker because it still dumps an un-backtraceable corefile
within about 5 minutes for me. I still have no idea why, though plenty
of corefiles. I haven't tried a serious analysis yet, because I've been
moving house, but I hope to get to it soon. Moving to worker would be
a good thing :)

If I get time, I'll compile the forensic logging module and see
whether it has a request-specific trigger I can replicate for
testing. I have worker running for months on end on the exact same
platform with much lower yield rates, so I suspect the problem is
only being triggered by the sheer volume of requests.

-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]


[PATCH] raise MAX_SERVER_LIMIT

2004-01-07 Thread Colm MacCarthaigh

Not entirely serious, but today, we actually hit this, in production :)
The hardware, a dual 2GHz Xeon with 12GB of RAM running Linux 2.6.1-rc2,
coped and remained responsive. So 20,000 may no longer be outside the realms
of what administrators reasonably desire to have.

Index: server/mpm/prefork/prefork.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/server/mpm/prefork/prefork.c,v
retrieving revision 1.286
diff -u -u -r1.286 prefork.c
--- server/mpm/prefork/prefork.c  1 Jan 2004 13:26:25 -0000  1.286
+++ server/mpm/prefork/prefork.c  7 Jan 2004 21:24:55 -0000
@@ -123,7 +123,7 @@
  * some sort of compile-time limit to help catch typos.
  */
 #ifndef MAX_SERVER_LIMIT
-#define MAX_SERVER_LIMIT 20000
+#define MAX_SERVER_LIMIT 100000
 #endif
 
 #ifndef HARD_THREAD_LIMIT
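
For anyone wondering how you hit the cap at all: the compile-time constant
only matters once ServerLimit/MaxClients in httpd.conf are pushed past it,
and as I understand it httpd then logs a startup warning and lowers
ServerLimit back down to MAX_SERVER_LIMIT. A hypothetical prefork config of
the sort in question - the directive names are real, the numbers are only
illustrative:

<IfModule prefork.c>
    StartServers         64
    MinSpareServers      64
    MaxSpareServers     512
    ServerLimit       25000
    MaxClients        25000
</IfModule>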
-- 
Colm MacCárthaigh                         Public Key: [EMAIL PROTECTED]