Hi Anthony, On 16.04.2009 01:49, Anthony J. Biacco wrote: > A month or so ago I posted that I was having problems with mod_jk > (1.2.27) getting a pong response back from tomcat (6.0.18) in responses > to a ping. Apache is 2.2.11 with worker mpm. > > I have a little more information now and am hoping with help I can solve > the problem so I can keep the ping timeout low and get rid of the error. > Error does not show up when ping timeout is >= 25000.
It sounds like you can easily reproduce on a test system? If yes, despite what I'll write below, increasing the log level to trace presents even more info for us developers, so in case we do another test, please adjust the log level. But debug is fine for now. > My tomcat connectionTimeout is set to 60000 > My jkwatchdoginterval is set to 60 > My relevant workers.properties is: > worker.template.connection_pool_timeout=60 > worker.template.reply_timeout=20000 Not related to your ping observation, but if you really want to set your reply_timeout that small, consider using max_reply_timeouts too. Otherwise one single long running request will throw the whole worker out of the balancing for some time (due to the reply_timeout indicating a failed node). > worker.template.socket_timeout=10 I'm not very much in favor of the socket_timeout, but well, if you think you need it. Just for the sake of completeness, please check, whether having no socket_timeout changes anything about your problem. > worker.template.socket_connect_timeout=5000 > worker.template.ping_mode=A > worker.template.ping_timeout=10000 > > The mod_jk error is: First of all, we note that the process id and thread id in all those log lines is 23222:1085466944. So all lines come from the same thread. There are no lines mixed between two or more concurrent things going on. The log lines reflect some sequential thing going on. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > wc_maintain::jk_worker.c (339): Maintaining worker app-01 We start doing internal maintenance, i.e. checking for idle connections, doing ping (if it is the watchdog thread and interval ping is active - it is for your config), adjust some load numbers etc. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > ajp_maintain::jk_ajp_common.c (3081): reached pool min size 13 from 25 > cache slots OK, our connection pool will not be checked for more idle connections, because we already reached the configured (=default) minimum. I think here's the culprit. If connections would have been closed due to idleness (connection_pool_timeout), there would have been a message of type "cleaning pool slot=XX elapsed NN in MM". NN would be the idle time, MM how long it took to close the connection. Checking the idleness stopps as soon as we reach the allowed min pool size, which is immediately the case here. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > ajp_connection_tcp_send_message::jk_ajp_common.c (1070): sending to > ajp13 pos=4 len=5 max=16 We are doing a Cping/Cpong test > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > ajp_connection_tcp_send_message::jk_ajp_common.c (1070): 0000 12 34 > 00 01 0A 00 00 00 00 00 00 00 00 00 00 00 - .4.............. Send the Cping packet over the wire > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > jk_shutdown_socket::jk_connect.c (681): About to shutdown socket 52 This is a consequence of the line marked with (*) Shutting down the connection. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > jk_shutdown_socket::jk_connect.c (732): Shutdown socket 52 and read 0 > lingering bytes Yep, we shut it down. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [info] > ajp_connection_tcp_get_message::jk_ajp_common.c (1143): (app-01) can't > receive the response header message from tomcat, tomcat > (10.10.10.16:8009) has forced a connection close for socket 52 (*) We failed to read the Cpoong response. Note that we are still in the same second, so the wait time did not expire. We must have had some other failure. My expectation: Tomcat closed the connection due to connectionTimeout. mod_jk didn't check the connection_pool_timeout for this connection, because our pool is already small enough (13 conns). Try setting connection_pool_minsize to "0" and check, whether the observation goes away. Even with minsize "0", it can happen rarely, because Tomcat closes the connection at the moment the connectionTimeout is reached, whereas mod_jk needs to run the internal maintenance (once a minute) to check the idle connection timeouts. On a more busy server this happens fewer times, because we try to reuse the connections used most recently, so idle connections tend to not get reused and then detecting them every now and then is enough. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [info] > ajp_handle_cping_cpong::jk_ajp_common.c (876): awaited reply cpong, not > received Yeah, we saw that in the debug lines. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [info] > ajp_maintain::jk_ajp_common.c (3101): (app-01) failed sending request, > socket -1 keepalive cping/cpong failure (errno=0) Yeah. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > ajp_reset_endpoint::jk_ajp_common.c (743): (app-01) resetting endpoint > with sd = 4294967295 (socket shutdown) Yeah. > [Wed Apr 15 17:25:06 2009] [23222:1085466944] [debug] > ajp_maintain::jk_ajp_common.c (3126): pinged 1 sockets in 0 seconds from > 25 pool slots OK. So all we can see is info level messages. Our usual rule is: as long as you only see info, there's no problem. Any [error] should be analyzed and understood, and the infos can be helpful then. Infos by themselves should be harmless. If my analysis above is right, then yes, the ones you get are harmless. *But*: I don't fully understand, why the problem goes away by switching to a ping timeout of 25000 milliseconds. Maybe the 10 second socket_timeout hides the thing in this case [wild guess]. There is also a relation between the ping_timeout and which connections do get checked during maintenance (the connection_ping_interval). So a longer ping_timeout reduces also the frequency of ping checks, so it should also reduce the frequency of the info messages. Not sure whether the combination of all results in not getting them any longer. Go with connection_pool_minsize=0 and ignore very occasional info level messages. > I send a request to tomcat via apache/mod_jk. Request's mount is mapped > to worker, not loadbalancer. > After successful request, Tomcat manager status shows the socket in > stage 'R' and keeped alive socket count as 1. > After 60 seconds, Tomcat manager status shows keeped alive socket count > as 0, socket still in stage 'R' > Above error happens 2 minutes after a request is finished, I'm assuming > because the keep alive socket is not there anymore. You could have a look at "netstat -an | grep 8009" on the Apache server (or whatever your AJP port is) and see, how the status of the connections changes, e.g. from ESTABLISHED to CLOSE_WAIT (remote side has closed the connection) and TIME_WAIT (local side has also closed the connection). > So I set tomcat's connectionTimeout to 0 and repeat request. > This time Tomcat (after 60 seconds) doesn't reset the keeped alive > socket count to 0 (as expected), socket still in stage 'R'. > 2 minutes later from mod_jk I get: Again everything is happening in one thread serially. > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > wc_maintain::jk_worker.c (339): Maintaining worker app-01 Starting internal maintenance. > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_maintain::jk_ajp_common.c (3081): reached pool min size 13 from 25 > cache slots > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_connection_tcp_send_message::jk_ajp_common.c (1070): sending to > ajp13 pos=4 len=5 max=16 Send Cping > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_connection_tcp_send_message::jk_ajp_common.c (1070): 0000 12 34 > 00 01 0A 00 00 00 00 00 00 00 00 00 00 00 - .4.............. Putting it on the wire > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_connection_tcp_get_message::jk_ajp_common.c (1259): received from > ajp13 pos=0 len=1 max=16 Hey we immediately get an answer > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_connection_tcp_get_message::jk_ajp_common.c (1259): 0000 09 00 00 > 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................ Fine it's a Cpong packet > [Wed Apr 15 17:30:08 2009] [23251:1102088512] [debug] > ajp_maintain::jk_ajp_common.c (3126): pinged 1 sockets in 0 seconds from > 25 pool slots We saw that. Since Cping/Cpong worked, we do not even have an info message, only debug messages. > At this point tomcat manager status shows the socket in stage 'P' and > keeped alive socket count is 0. > Above mod_jk log repeats every 2 minutes. Tomcat manager status doesn't > change. Regards, Rainer --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org