Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-09-04 Thread pqf
Hi, all
I took some times to setup a stress test for mod_fcgid, to see is there any 
other concurrency bottleneck left, it seems fine to me.

I did a stress test with callgrind on mod_fcgid, this is the big picture: 
http://people.apache.org/~pqf/mod_fcgid_performance/performance.png and this is 
the output of callgrind: 
http://people.apache.org/~pqf/mod_fcgid_performance/callgrind.out
As we can see, most cpu time are used in ap_create_environment() + 
init_environment(). ap_create_environment() copy environment vars to a char[][] 
buffer, and init_environment() copy char[][] to a char[] buffer for FCGI 
protocol. I have to keep using ap_create_environment() to be compatible with 
other cgi modules(cgid, cgi...), I will not shortcut these two calls for less 
cpu utils.

And I make a strace trace to the system calls, to make sure there is no obvious 
IO throughput bottleneck, and no luck either

4622  13:17:16.737888 ... futex resumed ) = 0
4622  13:17:16.737920 futex(0x95ce65c, FUTEX_WAKE, 1 unfinished ...
4622  13:17:16.738038 ... futex resumed ) = 0
4622  13:17:16.740735 gettimeofday({1346217436, 740787}, NULL) = 0
4622  13:17:16.740835 getsockname(12, {sa_family=AF_INET6, 
sin6_port=htons(8080), inet_pton(AF_INET6, :::127.0.0.1, si
n6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
4622  13:17:16.741008 fcntl64(12, F_GETFL) = 0x2 (flags O_RDWR)
4622  13:17:16.741056 fcntl64(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0
4622  13:17:16.741824 gettimeofday({1346217436, 741849}, NULL) = 0
4622  13:17:16.741890 gettimeofday({1346217436, 741911}, NULL) = 0
4622  13:17:16.741969 read(12, GET /fcgi-bin/foo.pl HTTP/1.0\r\nH..., 8000) = 
97
4622  13:17:16.742049 gettimeofday({1346217436, 742070}, NULL) = 0
4622  13:17:16.742108 gettimeofday({1346217436, 742129}, NULL) = 0
4622  13:17:16.742165 gettimeofday({1346217436, 742185}, NULL) = 0
4622  13:17:16.74 gettimeofday({1346217436, 742242}, NULL) = 0
4622  13:17:16.742278 gettimeofday({1346217436, 742299}, NULL) = 0
4622  13:17:16.742345 gettimeofday({1346217436, 742367}, NULL) = 0
4622  13:17:16.742404 gettimeofday({1346217436, 742424}, NULL) = 0
4622  13:17:16.742467 stat64(/usr/local/apache2/htdocs/fcgi-bin/foo.pl, 
{st_mode=S_IFREG|0755, st_size=229, ...}) = 0
4622  13:17:16.742697 semop(4292662, 0x17234c, 1) = 0
4622  13:17:16.742750 semop(4292662, 0x172358, 1) = 0
4622  13:17:16.742804 socket(PF_FILE, SOCK_STREAM, 0) = 13
4622  13:17:16.742864 connect(13, {sa_family=AF_FILE, 
path=/usr/local/apache2/logs/fcgidsock/4617.11}, 110) = 0
4622  13:17:16.743028 fcntl64(13, F_GETFL) = 0x2 (flags O_RDWR)
4622  13:17:16.743074 fcntl64(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
4622  13:17:16.743118 gettimeofday({1346217436, 743138}, NULL) = 0
4622  13:17:16.743174 writev(13, [{\1\1\0\1\0\10\0\0, 8}, 
{\0\1\0\0\0\0\0\0, 8}, {\1\4\0\1\3U\0\0, 8}, {\t\30UNIQUE_ID
UD2l3MCoqlsAABILpXoAA..., 853}, {\1\4\0\1\0\0\0\0, 8}, {\1\5\0\1\0\0\0\0, 
8}], 6) = 893
4622  13:17:16.743337 read(13, 0xa6f4c5b8, 8192) = -1 EAGAIN (Resource 
temporarily unavailable)
4622  13:17:16.743386 poll([{fd=13, events=POLLIN, revents=POLLIN}], 1, 4) 
= 1
4622  13:17:16.743789 read(13, \1\6\0\1\3\266\2\0Content-Type: text/plain..., 
8192) = 984
4622  13:17:16.743879 gettimeofday( unfinished ...
4622  13:17:16.743968 ... gettimeofday resumed {1346217436, 743943}, NULL) = 0
4622  13:17:16.744037 close(13) = 0
4622  13:17:16.744101 semop(4292662, 0x17234c, 1) = 0
4622  13:17:16.744149 semop(4292662, 0x172358, 1) = 0
4622  13:17:16.744209 gettimeofday({1346217436, 744231}, NULL) = 0
4622  13:17:16.744266 gettimeofday({1346217436, 744287}, NULL) = 0
4622  13:17:16.744343 writev(12, [{HTTP/1.1 200 OK\r\nDate: Wed, 29 A..., 
154}, {CONTEXT_DOCUMENT_ROOT=\/usr/loca..., 92
4}], 2 unfinished ...
4622  13:17:16.745033 ... writev resumed ) = 1078
4622  13:17:16.745076 gettimeofday({1346217436, 745098}, NULL) = 0
4622  13:17:16.745148 times({tms_utime=1416, tms_stime=1464, tms_cutime=0, 
tms_cstime=0}) = 499844951
4622  13:17:16.745255 gettimeofday( unfinished ...
4622  13:17:16.745800 ... gettimeofday resumed {1346217436, 745791}, NULL) = 0
4622  13:17:16.746013 shutdown(12, 1 /* send */ unfinished ...
4622  13:17:16.746554 ... shutdown resumed ) = 0
4622  13:17:16.746625 gettimeofday( unfinished ...
4622  13:17:16.746684 ... gettimeofday resumed {1346217436, 746658}, NULL) = 0
4622  13:17:16.746765 epoll_ctl(11, EPOLL_CTL_ADD, 12, {EPOLLIN, 
{u32=158210764, u64=158210764}} unfinished ...
4622  13:17:16.746833 ... epoll_ctl resumed ) = 0
4622  13:17:16.747587 gettimeofday( unfinished ...
4622  13:17:16.747904 ... gettimeofday resumed {1346217436, 747649}, NULL) = 0
4622  13:17:16.748672 futex(0x95ce690, FUTEX_WAIT, 176390, NULL unfinished ...



2012-09-05



pqf



发件人:pqf
发送时间:2012-08-28 09:34
主题:回复: Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:

So what can mod_fcgid do in this overloaded?
1. mod_fcgid get a request
2. mod_fcgid can't apply a free slot

Re: Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-28 Thread Lazy
2012/8/28 pqf p...@mailtech.cn:
 So what can mod_fcgid do in this overloaded?
 1. mod_fcgid get a request
 2. mod_fcgid can't apply a free slot of FCGI handler
 3. mod_fcgid send a spawn request to PM
 4. PM deny the request(for too much process already)
 5. Now
for( i=1; i64; i++)
   {
  a) mod_fcgid delay a while, and then send another spawn request to PM
 and try apply free slot again.
  b) mod_fcgid send another spawn request at once, even the last request
 is denied.
  c) ??
 (now is a, b maybe not a good idea, any new idea?)
   }

 I think the bottleneck is too much request, too less FCGI handler. httpd(or
 mod_fcgid) either drop client connections or delay a while, there is no
 other way out?

My idea is to add a availability number to each class. If a wait fails
it will be decreased or increased if wait is successful. Let's say we
want to wait max 100 times 250ms


int connected=0;

   for(i=0; !connected  i = (get_clas_avail()/MAX_AVAIL)*100; i++) {
/* Apply a process slot */
bucket_ctx-procnode = apply_free_procnode(r, fcgi_request);

/* Send a spawn request if I can't get a process slot */
/* procmgr_send_spawn_cmd() return APR_SUCCESS if a
process is created */
if( !bucket_ctx-procnode 
(procmgr_send_spawn_cmd(fcgi_request, r) != APR_SUCCESS) )
apr_sleep(apr_time_from_msec(250));
else
/* Apply a process slot */
bucket_ctx-procnode = apply_free_procnode(r, fcgi_request);

if (bucket_ctx-procnode) {
if (proc_connect_ipc(bucket_ctx-procnode,
 bucket_ctx-ipc) != APR_SUCCESS) {
proc_close_ipc(bucket_ctx-ipc);
bucket_ctx-procnode-diewhy = FCGID_DIE_CONNECT_ERROR;
return_procnode(r, bucket_ctx-procnode, 1 /* has
error */ );
bucket_ctx-procnode = NULL;
decrease_avail();
}else {
increase_avail();
connected=1;
}
}
}

if (!connected) {
decrease_avail();
ap_log_rerror(APLOG_MARK, APLOG_WARNING, 0, r,
  mod_fcgid: can't apply process slot for %s,
  cmd_conf-cmdline);
return HTTP_SERVICE_UNAVAILABLE;
}

decrease_avail() might halve availability each time called.


Availability should be dynamic maybe controlled by processmanager and
returned to the threads handling connections by
procmgr_send_spawn_cmd(), it can depend on total number of denied
spawn requests for specific class in a similar way as score, without
this connections will be 503 not sooner then 25 seconds, which is
still IMHO to long. Another improvement would bo to make wait time
shorter for not overloaded classes to keep the penalty of denied spawn
as low as possible.


I plan to work on that later.



 Another question. Is it necessary to call procmgr_init_spawn_cmd() from
 inside the for loop ?
 I took a brief look, it seems not necessary. I will move it out of loop and
 test.

 2012-08-28
 
 pqf
 
 发件人:Lazy
 发送时间:2012-08-27 21:47
 主题:Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693
 收件人:devdev@httpd.apache.org
 抄送:

 2012/8/16 pqf p...@mailtech.cn:
 How about this:
 1. procmgr_post_spawn_cmd() now return a status code from PM, so process
 handler now know the spawn request is denyed or not.
 2. if a new process is created, no sleep is needed.
 3. if no process is created, sleep a while

 sorry for the late reply,

 in the old code there ware no sleep() between procmgr_post_spawn_cmd()
 and apply_free_procnode()

 sleep() was invoked only if there ware no free procnode.

 This happened only if we ware denied spawning new process or in some
 cases if some other thread managed to use that procnode before us.

 Your change adresses cases if some other thread stole our newly
 spawned fcgi process, old code was waiting 1s before trying to spawn
 another/recheck, new code doesn't, I guess this is the orginal issue
 in stress tests when total number of simultaneous connections doesn't
 exceed max fcgi processes. But when spawning is denied recovery time
 is still long 1s.


 I was refering to cases when spawn is denied.

 If a vhost is overloaded or someone added sleep(60) in the code,
 mod_fcgid blocks on all request to that vhost
 for over a minute and it is possible to occupy 1000 threads using
 under 20 new connections to slow vhost
 per second. This can be mitingated by adding avaiability which will
 impact time spend on waiting for free process. Overloaded vhost will
 start to drop connections faster preventing the web-server reaching
 MaxClients
 limit.

 Another question. Is it necessary to call procmgr_init_spawn_cmd()
 from inside the for loop ?



 2012-08-16
 
 pqf

Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-27 Thread Lazy
2012/8/16 pqf p...@mailtech.cn:
 How about this:
 1. procmgr_post_spawn_cmd() now return a status code from PM, so process
 handler now know the spawn request is denyed or not.
 2. if a new process is created, no sleep is needed.
 3. if no process is created, sleep a while

sorry for the late reply,

in the old code there ware no sleep() between procmgr_post_spawn_cmd()
and apply_free_procnode()

sleep() was invoked only if there ware no free procnode.

This happened only if we ware denied spawning new process or in some
cases if some other thread managed to use that procnode before us.

Your change adresses cases if some other thread stole our newly
spawned fcgi process, old code was waiting 1s before trying to spawn
another/recheck, new code doesn't, I guess this is the orginal issue
in stress tests when total number of simultaneous connections doesn't
exceed max fcgi processes. But when spawning is denied recovery time
is still long 1s.


I was refering to cases when spawn is denied.

If a vhost is overloaded or someone added sleep(60) in the code,
mod_fcgid blocks on all request to that vhost
for over a minute and it is possible to occupy 1000 threads using
under 20 new connections to slow vhost
per second. This can be mitingated by adding avaiability which will
impact time spend on waiting for free process. Overloaded vhost will
start to drop connections faster preventing the web-server reaching
MaxClients
limit.

Another question. Is it necessary to call procmgr_init_spawn_cmd()
from inside the for loop ?



 2012-08-16
 
 pqf
 
 发件人:Lazy
 发送时间:2012-08-16 16:47
 主题:Re: Re: mod_fcgid concurrency bottleneck, issue#53693
 收件人:devdev@httpd.apache.org
 抄送:

 2012/8/16 pqf p...@mailtech.cn:
 Hi, Michal
 My solution do add availability to each class, which is the
 procmgr_post_spawn_cmd() call in each loop do.
 The sleep() call is intrudused for a stress test without warm up time, in
 this case, mod_fcgid will create more processes than a slow start one(each
 process handler can't apply a free slot on the very begining, so send a
 request to process manager to create one, it's easy to reach the max # of
 process limit while httpd startup, but the idle process will be killed
 later), the sleep() call is a little like a server side warm up delay.
 But since someone said remove this sleep(), the server work fine without
 bottleneck(Maybe he didn't notise the warm up issue?), so I thought remove
 the sleep() is a good idea. But reduce the time of sleep() is fine to me
 too.

 I was referring to the case where all processes are busy, without
 sleep(), handle_request() wil quickly send spawn requsts, whith will
 be denyed by process menager, with sleep() handle_request() will
 always wait quite a long time,
 occupying slots

 --
 Michal Grzedzicki




回复: Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-27 Thread pqf
So what can mod_fcgid do in this overloaded?
1. mod_fcgid get a request
2. mod_fcgid can't apply a free slot of FCGI handler
3. mod_fcgid send a spawn request to PM
4. PM deny the request(for too much process already)
5. Now
   for( i=1; i64; i++)
  {
 a) mod_fcgid delay a while, and then send another spawn request to PM and 
try apply free slot again.
 b) mod_fcgid send another spawn request at once, even the last request is 
denied.
 c) ??
(now is a, b maybe not a good idea, any new idea?)
  }

I think the bottleneck is too much request, too less FCGI handler. httpd(or 
mod_fcgid) either drop client connections or delay a while, there is no other 
way out?


Another question. Is it necessary to call procmgr_init_spawn_cmd() from inside 
the for loop ? 
I took a brief look, it seems not necessary. I will move it out of loop and 
test.

2012-08-28



pqf



发件人:Lazy
发送时间:2012-08-27 21:47
主题:Re: Re: Re: mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:

2012/8/16 pqf p...@mailtech.cn: 
 How about this: 
 1. procmgr_post_spawn_cmd() now return a status code from PM, so process 
 handler now know the spawn request is denyed or not. 
 2. if a new process is created, no sleep is needed. 
 3. if no process is created, sleep a while 

sorry for the late reply, 

in the old code there ware no sleep() between procmgr_post_spawn_cmd() 
and apply_free_procnode() 

sleep() was invoked only if there ware no free procnode. 

This happened only if we ware denied spawning new process or in some 
cases if some other thread managed to use that procnode before us. 

Your change adresses cases if some other thread stole our newly 
spawned fcgi process, old code was waiting 1s before trying to spawn 
another/recheck, new code doesn't, I guess this is the orginal issue 
in stress tests when total number of simultaneous connections doesn't 
exceed max fcgi processes. But when spawning is denied recovery time 
is still long 1s. 


I was refering to cases when spawn is denied. 

If a vhost is overloaded or someone added sleep(60) in the code, 
mod_fcgid blocks on all request to that vhost 
for over a minute and it is possible to occupy 1000 threads using 
under 20 new connections to slow vhost 
per second. This can be mitingated by adding avaiability which will 
impact time spend on waiting for free process. Overloaded vhost will 
start to drop connections faster preventing the web-server reaching 
MaxClients 
limit. 

Another question. Is it necessary to call procmgr_init_spawn_cmd() 
from inside the for loop ? 


 
 2012-08-16 
  
 pqf 
  
 发件人:Lazy 
 发送时间:2012-08-16 16:47 
 主题:Re: Re: mod_fcgid concurrency bottleneck, issue#53693 
 收件人:devdev@httpd.apache.org 
 抄送: 
 
 2012/8/16 pqf p...@mailtech.cn: 
 Hi, Michal 
 My solution do add availability to each class, which is the 
 procmgr_post_spawn_cmd() call in each loop do. 
 The sleep() call is intrudused for a stress test without warm up time, in 
 this case, mod_fcgid will create more processes than a slow start one(each 
 process handler can't apply a free slot on the very begining, so send a 
 request to process manager to create one, it's easy to reach the max # of 
 process limit while httpd startup, but the idle process will be killed 
 later), the sleep() call is a little like a server side warm up delay. 
 But since someone said remove this sleep(), the server work fine without 
 bottleneck(Maybe he didn't notise the warm up issue?), so I thought remove 
 the sleep() is a good idea. But reduce the time of sleep() is fine to me 
 too. 
 
 I was referring to the case where all processes are busy, without 
 sleep(), handle_request() wil quickly send spawn requsts, whith will 
 be denyed by process menager, with sleep() handle_request() will 
 always wait quite a long time, 
 occupying slots 
 
 -- 
 Michal Grzedzicki 
 
 

Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-16 Thread William A. Rowe Jr.
On 8/15/2012 8:23 PM, pqf wrote:
 But since someone said remove this sleep(), the server work fine without 
 bottleneck(Maybe
 he didn't notise the warm up issue?), so I thought remove the sleep() is a 
 good idea. But
 reduce the time of sleep() is fine to me too.
 Any other idea?

Can we introduce a blocking call on 'pipe ready' based on the child spawn
being ready to accept data?  The equivalent of a ping-test?


Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-16 Thread pqf
procmgr_post_spawn_cmd(maybe it should be renamed to procmgr_send_spawn_cmd) 
will keep blocking while PM is creating a new process, after 
procmgr_post_spawn_cmd() return, there should be a process ready there waiting 
for accept.

2012-08-16



pqf



发件人:William A. Rowe Jr.
发送时间:2012-08-16 15:23
主题:Re: mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:pqfp...@mailtech.cn

On 8/15/2012 8:23 PM, pqf wrote: 
 But since someone said remove this sleep(), the server work fine without 
 bottleneck(Maybe 
 he didn't notise the warm up issue?), so I thought remove the sleep() is a 
 good idea. But 
 reduce the time of sleep() is fine to me too. 
 Any other idea? 

Can we introduce a blocking call on 'pipe ready' based on the child spawn 
being ready to accept data?  The equivalent of a ping-test? 

Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-16 Thread Lazy
2012/8/16 pqf p...@mailtech.cn:
 Hi, Michal
 My solution do add availability to each class, which is the
 procmgr_post_spawn_cmd() call in each loop do.
 The sleep() call is intrudused for a stress test without warm up time, in
 this case, mod_fcgid will create more processes than a slow start one(each
 process handler can't apply a free slot on the very begining, so send a
 request to process manager to create one, it's easy to reach the max # of
 process limit while httpd startup, but the idle process will be killed
 later), the sleep() call is a little like a server side warm up delay.
 But since someone said remove this sleep(), the server work fine without
 bottleneck(Maybe he didn't notise the warm up issue?), so I thought remove
 the sleep() is a good idea. But reduce the time of sleep() is fine to me
 too.

I was referring to the case where all processes are busy, without
sleep(), handle_request() wil quickly send spawn requsts, whith will
be denyed by process menager, with sleep() handle_request() will
always wait quite a long time,
occupying slots

-- 
Michal Grzedzicki


回复: Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-16 Thread pqf
How about this:
1. procmgr_post_spawn_cmd() now return a status code from PM, so process 
handler now know the spawn request is denyed or not.
2. if a new process is created, no sleep is needed.
3. if no process is created, sleep a while

2012-08-16



pqf



发件人:Lazy
发送时间:2012-08-16 16:47
主题:Re: Re: mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:

2012/8/16 pqf p...@mailtech.cn: 
 Hi, Michal 
 My solution do add availability to each class, which is the 
 procmgr_post_spawn_cmd() call in each loop do. 
 The sleep() call is intrudused for a stress test without warm up time, in 
 this case, mod_fcgid will create more processes than a slow start one(each 
 process handler can't apply a free slot on the very begining, so send a 
 request to process manager to create one, it's easy to reach the max # of 
 process limit while httpd startup, but the idle process will be killed 
 later), the sleep() call is a little like a server side warm up delay. 
 But since someone said remove this sleep(), the server work fine without 
 bottleneck(Maybe he didn't notise the warm up issue?), so I thought remove 
 the sleep() is a good idea. But reduce the time of sleep() is fine to me 
 too. 

I was referring to the case where all processes are busy, without 
sleep(), handle_request() wil quickly send spawn requsts, whith will 
be denyed by process menager, with sleep() handle_request() will 
always wait quite a long time, 
occupying slots 

--  
Michal Grzedzicki 

Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-15 Thread pqf
Hi, all
I prefer the following solution, can anyone review it? 
procmgr_post_spawn_cmd() will be blocked until process manager create a new 
fcgid process, the worst case is someone else take the new created process 
before I do, and I have to post another spawn command to PM again. The extreme 
case is loop FCGID_APPLY_TRY_COUNT but get no process slot.


Index: fcgid_bridge.c
===
--- fcgid_bridge.c  (revision 1373226)
+++ fcgid_bridge.c  (working copy)
@@ -30,7 +30,7 @@
 #include fcgid_spawn_ctl.h
 #include fcgid_protocol.h
 #include fcgid_bucket.h
-#define FCGID_APPLY_TRY_COUNT 2
+#define FCGID_APPLY_TRY_COUNT 4
 #define FCGID_REQUEST_COUNT 32
 #define FCGID_BRIGADE_CLEAN_STEP 32
 
@@ -447,19 +447,13 @@
 if (bucket_ctx-procnode)
 break;
 
-/* Avoid sleeping the very first time through if there are no
-   busy processes; the problem is just that we haven't spawned
-   anything yet, so waiting is pointless */
-if (i  0 || j  0 || count_busy_processes(r, fcgi_request)) {
-apr_sleep(apr_time_from_sec(1));
-
-bucket_ctx-procnode = apply_free_procnode(r, fcgi_request);
-if (bucket_ctx-procnode)
-break;
-}
-
 /* Send a spawn request if I can't get a process slot */
 procmgr_post_spawn_cmd(fcgi_request, r);
+
+/* Try again */
+bucket_ctx-procnode = apply_free_procnode(r, fcgi_request);
+if (bucket_ctx-procnode)
+break;
 }
 
 /* Connect to the fastcgi server */
2012-08-15



pqf



发件人:Mike M
发送时间:2012-08-15 04:29
主题:mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:

Ahoy! 

With mod_fcgid I've observed a performance bottleneck in which concurrent 
requests to a virtualhost where the number of concurrent requests exceed 
the number of available fcgid-spawned processes result in a significant 
delay in responding to requests. 

Details of this situation, how to reproduce, various scenarios tried, etc 
are detailed in the issue submission [1]. 

In file  modules/fcgid/fcgid_bridge.c, there is this section of code: 

if (i  0 || j  0 || count_busy_processes(r, fcgi_request)) { 
apr_sleep(apr_time_from_sec(1)); 

If I change the sleep time from 1 second to 0 seconds (or, comment out 
this section of code entirely), the bottleneck goes away.  It seems like 
the more appropriate action here would be to turn the sleep time into a 
configurable value, with the current 1s value as the default. 

I presume this code is meant to be a way to help defer spawn requests so 
that a server is not over-whelmed with spin-up related I/O. 

My questions are: 
- Is this intended to be a throttling mechanism and if so, is there a more 
efficient way to handle this throttling? 
- If I outright reduce that sleep delay to 0 (or comment out this code), 
what potential or expected problems am I introducing to mod_fcgid? 


[1] https://issues.apache.org/bugzilla/show_bug.cgi?id=53693 


--- 
Mike M 

You can't learn in school what the world is going to do next year. 

Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-15 Thread Lazy
2012/8/15 pqf p...@mailtech.cn:
 Hi, all
 I prefer the following solution, can anyone review it?
 procmgr_post_spawn_cmd() will be blocked until process manager create a new
 fcgid process, the worst case is someone else take the new created process
 before I do, and I have to post another spawn command to PM again. The
 extreme case is loop FCGID_APPLY_TRY_COUNT but get no process slot.


 Index: fcgid_bridge.c
 ===
 --- fcgid_bridge.c  (revision 1373226)
 +++ fcgid_bridge.c  (working copy)
 @@ -30,7 +30,7 @@
  #include fcgid_spawn_ctl.h
  #include fcgid_protocol.h
  #include fcgid_bucket.h
 -#define FCGID_APPLY_TRY_COUNT 2
 +#define FCGID_APPLY_TRY_COUNT 4
  #define FCGID_REQUEST_COUNT 32
  #define FCGID_BRIGADE_CLEAN_STEP 32

 @@ -447,19 +447,13 @@
  if (bucket_ctx-procnode)
  break;

 -/* Avoid sleeping the very first time through if there are no
 -   busy processes; the problem is just that we haven't spawned
 -   anything yet, so waiting is pointless */
 -if (i  0 || j  0 || count_busy_processes(r, fcgi_request)) {
 -apr_sleep(apr_time_from_sec(1));
 -
 -bucket_ctx-procnode = apply_free_procnode(r,
 fcgi_request);
 -if (bucket_ctx-procnode)
 -break;
 -}
 -
  /* Send a spawn request if I can't get a process slot */
  procmgr_post_spawn_cmd(fcgi_request, r);
 +
 +/* Try again */
 +bucket_ctx-procnode = apply_free_procnode(r, fcgi_request);
 +if (bucket_ctx-procnode)
 +break;
  }

  /* Connect to the fastcgi server */


if You get rid of sleep apache will not wait for free process if all
of them are busy, this will lead to 503 errors

currently mod_fcgid waits FCGID_APPLY_TRY_COUNT * FCGID_REQUEST_COUNT
* 1 second and this is usually 64 seconds, this means if You have an
overloaded vhost with low FcgidMaxProcessesPerClass it can bring whole
server down, each thread waits 64 seconds so it doesn't take long
before all threads are occupied.

In my setup we lowerd the wait time and FCGID_REQUEST_COUNT to lower
the impact of overloaded class but I think
the best solution will be to add availability to each class. Total
wait time will be related to it (by changing sleep time, and
FCGID_APPLY_TRY_COUNT). If request is unsuccessful availability will
be halved so next time the wait time will be shorter. This way
congested class will get 0% availability, and new connections will
instantly get 503 it there are no free slots. A successful wait will
increase availability.


Regards,

Michal Grzedzicki


Re: Re: mod_fcgid concurrency bottleneck, issue#53693

2012-08-15 Thread pqf
Hi, Michal
My solution do add availability to each class, which is the 
procmgr_post_spawn_cmd() call in each loop do.
The sleep() call is intrudused for a stress test without warm up time, in this 
case, mod_fcgid will create more processes than a slow start one(each process 
handler can't apply a free slot on the very begining, so send a request to 
process manager to create one, it's easy to reach the max # of process limit 
while httpd startup, but the idle process will be killed later), the sleep() 
call is a little like a server side warm up delay.
But since someone said remove this sleep(), the server work fine without 
bottleneck(Maybe he didn't notise the warm up issue?), so I thought remove the 
sleep() is a good idea. But reduce the time of sleep() is fine to me too.
Any other idea?

Regards,

Ryan Pan

2012-08-16



pqf



发件人:Lazy
发送时间:2012-08-15 21:48
主题:Re: mod_fcgid concurrency bottleneck, issue#53693
收件人:devdev@httpd.apache.org
抄送:

2012/8/15 pqf p...@mailtech.cn: 
 Hi, all 
 I prefer the following solution, can anyone review it? 
 procmgr_post_spawn_cmd() will be blocked until process manager create a new 
 fcgid process, the worst case is someone else take the new created process 
 before I do, and I have to post another spawn command to PM again. The 
 extreme case is loop FCGID_APPLY_TRY_COUNT but get no process slot. 
 
 
 Index: fcgid_bridge.c 
 === 
 --- fcgid_bridge.c  (revision 1373226) 
 +++ fcgid_bridge.c  (working copy) 
 @@ -30,7 +30,7 @@ 
  #include fcgid_spawn_ctl.h 
  #include fcgid_protocol.h 
  #include fcgid_bucket.h 
 -#define FCGID_APPLY_TRY_COUNT 2 
 +#define FCGID_APPLY_TRY_COUNT 4 
  #define FCGID_REQUEST_COUNT 32 
  #define FCGID_BRIGADE_CLEAN_STEP 32 
 
 @@ -447,19 +447,13 @@ 
  if (bucket_ctx-procnode) 
  break; 
 
 -/* Avoid sleeping the very first time through if there are no 
 -   busy processes; the problem is just that we haven't spawned 
 -   anything yet, so waiting is pointless */ 
 -if (i  0 || j  0 || count_busy_processes(r, fcgi_request)) { 
 -apr_sleep(apr_time_from_sec(1)); 
 - 
 -bucket_ctx-procnode = apply_free_procnode(r, 
 fcgi_request); 
 -if (bucket_ctx-procnode) 
 -break; 
 -} 
 - 
  /* Send a spawn request if I can't get a process slot */ 
  procmgr_post_spawn_cmd(fcgi_request, r); 
 + 
 +/* Try again */ 
 +bucket_ctx-procnode = apply_free_procnode(r, fcgi_request); 
 +if (bucket_ctx-procnode) 
 +break; 
  } 
 
  /* Connect to the fastcgi server */ 


if You get rid of sleep apache will not wait for free process if all 
of them are busy, this will lead to 503 errors 

currently mod_fcgid waits FCGID_APPLY_TRY_COUNT * FCGID_REQUEST_COUNT 
* 1 second and this is usually 64 seconds, this means if You have an 
overloaded vhost with low FcgidMaxProcessesPerClass it can bring whole 
server down, each thread waits 64 seconds so it doesn't take long 
before all threads are occupied. 

In my setup we lowerd the wait time and FCGID_REQUEST_COUNT to lower 
the impact of overloaded class but I think 
the best solution will be to add availability to each class. Total 
wait time will be related to it (by changing sleep time, and 
FCGID_APPLY_TRY_COUNT). If request is unsuccessful availability will 
be halved so next time the wait time will be shorter. This way 
congested class will get 0% availability, and new connections will 
instantly get 503 it there are no free slots. A successful wait will 
increase availability. 


Regards, 

Michal Grzedzicki