Thank you so much for the reply. Here are a couple of examples. I'm not
completely sure my symptoms match, though the pstacks do look very similar
to my untrained eye.
Here is a two-day-old child:
27743: /usr/local/apache2/bin/httpd -k start
----------------- lwp# 1 / thread# 1 --------------------
ff00a42c lwp_wait (3, ffbff804)
ff001e88 _thrp_join (3, 0, ffbff86c, 1, ff0b2780, ffbff804) + 38
ff214544 apr_thread_join (ffbff8ec, 32eea8, 7, 0, dc328, b15e0) + c
0008c43c join_workers (0, fe3aa8, 8bfcc, 32ec30, 0, 1) + ec
0008c790 child_main (2, 8b31c, 0, feee2a40, ff0b2840, ff0b2780) + 270
0008c970 make_child (c7800, 2, 0, c8800, c7000, c8400) + 128
0008d1b4 ap_mpm_run (fe4100f8, e, 0, 1, 27, 1) + 754
000343c0 main (d6218, d8190, ffbffc54, c7800, c7800, 0) + 79c
00033754 _start (0, 0, 0, 0, 0, 0) + 5c
----------------- lwp# 3 / thread# 3 --------------------
ff0058d4 lwp_park (0, 0, 0)
fefff6e8 cond_wait_queue (32ecc8, 32ec98, 0, 0, 0, 0) + 4c
fefffd30 cond_wait (32ecc8, 32ec98, 0, 0, fe460a40, 0) + 10
fefffd6c pthread_cond_wait (32ecc8, 32ec98, 0, 0, 32ec98, 0) + 8
0008e674 ap_queue_pop (32ec78, fe30bf1c, fe30bf18, 4, 0, 32ee40) + 64
0008be1c worker_thread (32eea8, 2, fe460a40, c8400, c8400, 0) + 10c
ff21440c dummy_worker (32eea8, 0, 0, fe460a40, ff214400, 1) + c
ff005850 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 4 / thread# 4 --------------------
ff0058d4 lwp_park (0, 0, 0)
fefff6e8 cond_wait_queue (32ecc8, 32ec98, 0, 0, 0, 0) + 4c
fefffd30 cond_wait (32ecc8, 32ec98, 0, 0, fe461240, 11692d8) + 10
fefffd6c pthread_cond_wait (32ecc8, 32ec98, 0, 0, 32ec98, 0) + 8
0008e674 ap_queue_pop (32ec78, fe20bf1c, fe20bf18, 0, 0, 32ee40) + 64
0008be1c worker_thread (32eec8, 2, fe461240, c8400, c8400, 4) + 10c
ff21440c dummy_worker (32eec8, 0, 0, fe461240, ff214400, 1) + c
ff005850 _lwp_start (0, 0, 0, 0, 0, 0)
...and several more in lwp_park.
And here's another one that's a day old, but looks different (including lots of
jk references):
7934: /usr/local/apache2/bin/httpd -k start
----------------- lwp# 1 / thread# 1 --------------------
ff00a42c lwp_wait (6, ffbff80c)
ff001e88 _thrp_join (6, 0, ffbff874, 1, ff0b2780, ffbff80c) + 38
ff214544 apr_thread_join (ffbff8f4, 28e228, 2, 0, 1, b1600) + c
0008c43c join_workers (c, 3c5f38, 8bfcc, 28df50, 0, 1) + ec
0008c790 child_main (0, 8b31c, 0, feee2a40, ff0b2840, ff0b2780) + 270
0008c970 make_child (c7800, 0, 0, c8800, c7000, c8400) + 128
0008d1b4 ap_mpm_run (fe4100f8, e, 0, 1, 26, 1) + 754
000343c0 main (d6218, d8190, ffbffc5c, c7800, c7800, 0) + 79c
00033754 _start (0, 0, 0, 0, 0, 0) + 5c
----------------- lwp# 6 / thread# 6 --------------------
ff00a14c read (15, fe00a908, 4)
fe4a87dc jk_tcp_socket_recvfull (15, fe00a908, 4, 2e4bf8, 510, 4ec) + 74
fe4c3088 ajp_connection_tcp_get_message (35f130, 35f168, 2e4bf8, 361188, 2000, 2064) + 44
fe4c5588 ajp_get_reply (361168, fe00bb50, 2e4bf8, 35f130, fe00aa70, 2028) + 9c
fe4c9304 ajp_service (361168, fe00bb50, 2e4bf8, fe00ab38, 1, c00) + 22b8
fe4a1234 jk_handler (23c, 35e740, 3f4390, 1, 13, 3544c8) + 9e4
00047534 ap_run_handler (3f40a0, 0, 11, 3e7028, 3f5a08, 0) + 3c
000479c0 ap_invoke_handler (3f40a0, 9d000, 3f40a0, 0, fe410028, 0) + c0
00073aa4 ap_process_request (3f40a0, 3, 4, 3f40a0, c8420, 21d8d8) + 160
00070b34 ap_process_http_connection (3d52e8, 3d5038, 3d5038, 3, c8420, 211980) + 10c
0004dce8 ap_run_process_connection (3d52e8, 3d5038, 3d5038, 3, 3d52e0, 3d7068) + 3c
0008bf1c worker_thread (28e228, 0, fe462240, c8400, c8400, c) + 20c
ff21440c dummy_worker (28e228, 0, 0, fe462240, ff214400, 1) + c
ff005850 _lwp_start (0, 0, 0, 0, 0, 0)
----------------- lwp# 7 / thread# 7 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 8 / thread# 8 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 9 / thread# 9 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 10 / thread# 10 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 11 / thread# 11 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 12 / thread# 12 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 13 / thread# 13 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 14 / thread# 14 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
...and so on...
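For comparing dumps like the two above, here is a minimal sketch in Python that tallies what each thread in a pstack dump is doing. It assumes the layout shown in this message (a "lwp# N / thread# N" header per thread, frames listed innermost first, and a "** zombie" line for exited threads that were never joined). The names summarize_pstack, THREAD_HEADER, FRAME and SAMPLE are made up for illustration and are not part of Apache, APR or mod_jk.

```python
import re
from collections import Counter

# Header that pstack prints before each thread, e.g.
# "----------------- lwp# 6 / thread# 6 --------------------"
THREAD_HEADER = re.compile(r"^-+\s*lwp#\s*\d+\s*/\s*thread#\s*\d+\s*-+$")
# Stack frame: hex address, function name, opening parenthesis,
# e.g. "ff00a14c read (15, fe00a908, 4)"
FRAME = re.compile(r"^[0-9a-f]+\s+(\w+)\s*\(")


def summarize_pstack(text):
    """Count threads per state in one pstack dump.

    A thread is labelled "zombie" if its section contains a zombie marker;
    otherwise it is labelled with its innermost (first listed) frame.
    """
    states = Counter()
    in_thread = False
    label = None
    for raw in text.splitlines():
        line = raw.strip()
        if THREAD_HEADER.match(line):
            if in_thread:
                states[label or "unknown"] += 1
            in_thread = True
            label = None
            continue
        if not in_thread:
            continue  # skip the "PID: command" line before the first thread
        if "zombie" in line:
            label = "zombie"
        elif label is None:
            match = FRAME.match(line)
            if match:
                label = match.group(1)
    if in_thread:
        states[label or "unknown"] += 1
    return states


# Abbreviated excerpt of the second dump above (hypothetical variable name).
SAMPLE = """\
7934: /usr/local/apache2/bin/httpd -k start
----------------- lwp# 1 / thread# 1 --------------------
ff00a42c lwp_wait (6, ffbff80c)
ff001e88 _thrp_join (6, 0, ffbff874, 1, ff0b2780, ffbff80c) + 38
----------------- lwp# 6 / thread# 6 --------------------
ff00a14c read (15, fe00a908, 4)
fe4a87dc jk_tcp_socket_recvfull (15, fe00a908, 4, 2e4bf8, 510, 4ec) + 74
----------------- lwp# 7 / thread# 7 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
----------------- lwp# 8 / thread# 8 --------------------
ff214400 dummy_worker(), exit value = 0x00000000
** zombie (exited, not detached, not yet joined) **
"""

print(summarize_pstack(SAMPLE))
```

Run against a full dump, a child like 7934 would show one thread in lwp_wait, one in read, and the rest as zombies, which makes it easy to see that a single worker blocked in a read is what the joining thread is still waiting on.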
If anyone has the time to confirm that my case is a match, I'd be very
grateful. Either way, this patch looks promising!
Thank you VERY MUCH!
-Chris
-----Original Message-----
From: Rainer Jung [mailto:[email protected]]
Sent: Saturday, June 22, 2013 12:31 PM
To: [email protected]
Subject: Re: Abandoned apache children with mod_jk
On 21.06.2013 19:47, Chris Boyce wrote:
> Hello,
>
> I'm running apache 2.2.24 (worker MPM) with mod_jk 1.2.37 under Solaris 11,
> compiled as follows (from config.log):
>
> --with-included-apr --with-mpm=worker --enable-so --enable-rewrite
> --enable-headers --enable-proxy --enable-proxy-http --enable-expires
> --enable-nonportable-atomics=yes --disable-include --disable-autoindex
> --disable-imap --disable-userdir CC=/usr/sfw/bin/gcc
>
> We are running Tomcat 7.0.32.
>
> Since moving to Solaris 11 I'm noticing over time that apache children are
> getting left in an idle state (and usually not showing up on the scoreboard
> at all) when doing graceful restarts. If I do a hard restart, the error_log
> notes that the process had to be forcibly killed:
>
> [Wed May 15 11:41:24 2013] [warn] child process 10057 still did not exit, sending a SIGTERM
> [Wed May 15 11:41:26 2013] [error] child process 10057 still did not exit, sending a SIGKILL
>
> If I let apache go unchecked, it will eventually stop passing traffic
> completely and a hard restart is required. Example ps output looks like this:
>
> nobody 24429 20925 0 11:43:59 ? 0:02 /usr/local/apache2/bin/httpd -k start
> nobody 9750 20925 0 23:59:02 ? 0:00 /usr/local/apache2/bin/httpd -k start
> nobody 20925 2440 0 May 15 ? 3:07 /usr/local/apache2/bin/httpd -k start
> nobody 24689 20925 0 11:47:52 ? 0:00 /usr/local/apache2/bin/httpd -k start
> nobody 24628 20925 0 11:46:18 ? 0:01 /usr/local/apache2/bin/httpd -k start
> nobody 24428 20925 0 11:43:39 ? 0:02 /usr/local/apache2/bin/httpd -k start
>
> Note PID 9750 is lingering, doing nothing according to pfiles and truss, and
> its timestamp coincides with the last graceful restart (log rotation). Two
> main differences between this web server and ones that are working include:
>
> a) This is Solaris 11 (vs. Solaris 10)
> b) I have hardened apache by putting it in a Solaris 11 zone, and I'm
> starting apache as the "nobody" user with the net_privaddr privilege so it
> can function as the parent process. It talks to Tomcat on another zone and
> everything works great (other than the problem described here).
>
> Apache has permission to write to /logs, and /logs/apache2 is where I set
> these:
>
> JkLogFile /logs/apache2/mod_jk.log
> JkShmFile /logs/apache2/jk-runtime-status
>
> And this:
> PidFile /logs/apache2/run/httpd.pid
>
>
> Can anyone think of a reason why children are not being recycled or getting
> stranded like this over successive graceful restarts? We do use multiple
> listeners, so I don't know if I'm dealing with a locking/mutex/serialization
> type of issue. I'm not a C programmer. There seems to be little info out
> there for Solaris platforms that's recent.
>
> I'd be happy to post more info if needed. I appreciate your time.
What does "pstack" show for such an abandoned child?
Maybe another occurrence of
https://issues.apache.org/bugzilla/show_bug.cgi?id=49504.
Regards,
Rainer
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]