I am having some trouble keeping a mod_jk setup stable.  At this point, I
feel like I am too far into trial and error mode and would like some help
figuring out how to identify the problem.

My current setup involves two Linux (RHEL 5) servers, each running two Tomcat
instances (6.0.20).  A third RHEL 5 box is running Apache (2.2.3) with
mod_jk (1.2.28).  I am using Terracotta to "cluster" the Tomcat sessions.

The problem that I am having is that under light load (and unfortunately,
intermittently), random nodes produce errors.  Typically these
errors indicate that mod_jk can no longer contact Tomcat (see excerpts
below).  In most cases, the user request just hangs (never returns).
So, it also appears that the errors are not causing a session failover --
though I need to confirm that again after my recent round of changes.  In
most cases, these nodes that are in error recover on their own.  However,
during the failure event, I get a bunch of unhappy users.  I am hoping to
find a way to make the nodes more stable and then address the fail-over
aspect.

I have tried different mod_jk parameters and think I have settled on a
decent set of them.  I have all of the garbage collection information
logging out and do not seem to have any gc events that are taking longer
than the request timeout.  I am gathering JVM and OS stats and do not see a
hardware constraint (memory, CPU, I/O).  So, I am at a bit of a loss on where
to look.

I am pasting in all of the relevant files/excerpts that I can think of.  I
appreciate any advice on what additional data to gather to shed light on
this problem (outright solutions are welcome too :)).

Please let me know if there is any other information that would be helpful.

Thanx,
LES


************* workers.properties **************
# Workers visible to Apache: the load balancer, the status worker, and cas
worker.list=lb,jkstatus,cas
# Shared template for the ajp13 balancer members
worker.template.type=ajp13
worker.template.retries=4
worker.template.lbfactor=1
worker.template.reply_timeout=300000
worker.template.max_reply_timeouts=4
worker.template.connection_pool_timeout=60
worker.template.ping_mode=A
#worker.template.socket_timeout=10
worker.template.socket_connect_timeout=10

worker.tomcat01-instance1.reference=worker.template
worker.tomcat01-instance1.host=tomcat01.barnhardt.local
worker.tomcat01-instance1.port=8009

worker.tomcat01-instance2.reference=worker.template
worker.tomcat01-instance2.host=tomcat01.barnhardt.local
worker.tomcat01-instance2.port=18009

worker.tomcat02-instance1.reference=worker.template
worker.tomcat02-instance1.host=tomcat02.barnhardt.local
worker.tomcat02-instance1.port=8009

worker.tomcat02-instance2.reference=worker.template
worker.tomcat02-instance2.host=tomcat02.barnhardt.local
worker.tomcat02-instance2.port=18009

worker.cas.type=ajp13
worker.cas.host=localhost
worker.cas.port=8009
worker.cas.lbfactor=1
worker.cas.connection_pool_timeout=600
worker.cas.socket_keepalive=1
worker.cas.socket_timeout=60

# Set properties for lb which use the other workers
worker.lb.type=lb
#worker.lb.method=B
worker.lb.sticky_session=True
worker.lb.balance_workers=tomcat01-instance1,tomcat01-instance2,tomcat02-instance1,tomcat02-instance2

# Define a 'jkstatus' worker using status
worker.jkstatus.type=status
***********************************************
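
For reference, here is a rough sketch of the arithmetic behind the hangs, using
the retries and reply_timeout values from the template above.  It assumes
(my reading of the mod_jk docs, not something I have verified) that "retries"
is the total number of attempts on a worker and that each attempt can wait the
full reply_timeout; retry sleeps and balancer failover are ignored.  The helper
names are made up for illustration.

```python
# Rough upper bound on how long one request can hang on a single worker.
# Assumption: each of the `retries` attempts waits the full reply_timeout.

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def worst_case_hang_seconds(props, prefix="worker.template"):
    """retries * reply_timeout, with reply_timeout given in milliseconds."""
    reply_timeout_ms = int(props[prefix + ".reply_timeout"])
    retries = int(props[prefix + ".retries"])
    return retries * reply_timeout_ms / 1000.0


# The two relevant lines copied from the workers.properties above.
TEMPLATE = """
worker.template.retries=4
worker.template.reply_timeout=300000
"""

if __name__ == "__main__":
    seconds = worst_case_hang_seconds(parse_properties(TEMPLATE))
    print("worst case: %.0f s (%.0f min)" % (seconds, seconds / 60))
```

If those assumptions hold, a single request could wait up to 20 minutes on one
node, which would fit the "never returns" behaviour.  The reply timeout errors
in the log excerpt for tomcat01-instance2 (07:19:21 and 07:24:23) are also
about five minutes apart, which matches reply_timeout=300000.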


****** Errors from log *******

//////This particular error(info) seems to happen constantly - is it a
normal operational thing?
[Mon May 24 10:22:56 2010] [26131:4045374208] [info]
ajp_send_request::jk_ajp_common.c (1496): (tomcat02-instance2) all endpoints
are disconnected, detected by connect check (1), cping (0), send (0)
[Mon May 24 11:55:21 2010] [2711:4045374208] [info]
ajp_send_request::jk_ajp_common.c (1496): (tomcat02-instance1) all endpoints
are disconnected, detected by connect check (1), cping (0), send (0)
[Mon May 24 13:08:25 2010] [27439:4045374208] [info]
ajp_send_request::jk_ajp_common.c (1496): (tomcat01-instance1) all endpoints
are disconnected, detected by connect check (1), cping (0), send (0)

////This error happens intermittently and seems to cause some of the
cluster problems I mentioned above
[Mon May 24 07:19:21 2010] [27432:4045374208] [error]
ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with
waiting reply from tomcat. Tomcat is down, stopped or network problems
(errno=110)
[Mon May 24 07:19:23 2010] [27432:4045374208] [info]
ajp_service::jk_ajp_common.c (2447): (tomcat01-instance2) sending request to
tomcat failed (recoverable), because of reply timeout (attempt=1)
[Mon May 24 07:24:23 2010] [27432:4045374208] [error]
ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with
waiting reply from tomcat. Tomcat is down, stopped or network problems
(errno=110)
[Mon May 24 07:24:25 2010] [27432:4045374208] [info]
ajp_service::jk_ajp_common.c (2447): (tomcat01-instance2) sending request to
tomcat failed (recoverable), because of reply timeout (attempt=2)

////I get this error occasionally, too
[Sun May 23 03:48:51 2010] [15814:4045374208] [info]
jk_open_socket::jk_connect.c (594): connect to 192.168.60.157:8009 failed
(errno=115)
[Sun May 23 03:48:51 2010] [15814:4045374208] [info]
ajp_connect_to_endpoint::jk_ajp_common.c (922): Failed opening socket to
(192.168.60.157:8009) (errno=115)
[Sun May 23 03:48:51 2010] [15814:4045374208] [error]
ajp_send_request::jk_ajp_common.c (1507): (tomcat02-instance1) connecting to
backend failed. Tomcat is probably not started or is listening on the wrong
port (errno=115)
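
One more piece of data I could gather for the connect failures above is how
long a plain TCP connect to each AJP port takes from the Apache box.  On Linux,
errno=115 is EINPROGRESS, which I take to mean the connect did not finish
within the allowed time.  If I am reading the mod_jk documentation correctly,
socket_connect_timeout is given in milliseconds, so the value of 10 above would
be very tight; that is an assumption on my part.  A minimal sketch of such a
probe (connect_time is a made-up helper; hosts and ports are the ones from
workers.properties):

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None
    elapsed = time.monotonic() - start
    sock.close()
    return elapsed


# Balancer members as listed in workers.properties.
BACKENDS = [
    ("tomcat01.barnhardt.local", 8009),
    ("tomcat01.barnhardt.local", 18009),
    ("tomcat02.barnhardt.local", 8009),
    ("tomcat02.barnhardt.local", 18009),
]

if __name__ == "__main__":
    for host, port in BACKENDS:
        elapsed = connect_time(host, port)
        if elapsed is None:
            print("%s:%d connect failed" % (host, port))
        else:
            print("%s:%d connected in %.1f ms" % (host, port, elapsed * 1000))
```

Run in a loop from cron, this would show whether connect times ever get close
to whatever limit mod_jk is actually applying.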

////Third time is a charm...another error for the hat trick
[Sat May 22 21:41:17 2010] [13933:4045374208] [info]
ajp_connection_tcp_get_message::jk_ajp_common.c (1150): (tomcat01-instance1)
can't receive the response header message from tomcat, network problems or
tomcat (192.168.60.156:8009) is down (errno=104)
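
To see whether the errors cluster on particular nodes, I could tally the mod_jk
log by worker, level, and errno.  The sketch below is only that: the regular
expressions are guesses based on the excerpts in this message, the errno names
are the Linux meanings of the three values seen here, and the function names
are made up.  It also rejoins entries that were wrapped across lines, as they
are in this mail.

```python
import re
from collections import Counter

# Linux meanings of the errno values that appear in the excerpts.
LINUX_ERRNO = {104: "ECONNRESET", 110: "ETIMEDOUT", 115: "EINPROGRESS"}

# [timestamp] [pid:tid] [level] message
ENTRY = re.compile(r"^\[([^\]]+)\] \[[^\]]+\] \[(\w+)\] (.*)$")
# Worker name in parentheses right after "file.c (line): "
WORKER = re.compile(r"\.c \(\d+\): \(([^)\s]+)\)")
ERRNO = re.compile(r"\(errno=(\d+)\)")


def join_wrapped(lines):
    """Rejoin entries wrapped over several lines; a new entry starts with '['."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def tally(lines):
    """Count entries per (worker, level, errno name); worker is '-' if absent."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = ENTRY.match(entry)
        if not match:
            continue
        level, message = match.group(2), match.group(3)
        worker = WORKER.search(message)
        err = ERRNO.search(message)
        code = int(err.group(1)) if err else None
        name = LINUX_ERRNO.get(code, code)
        counts[(worker.group(1) if worker else "-", level, name)] += 1
    return counts


# Two entries copied from the excerpts above, wrapped as in this mail.
SAMPLE = """\
[Mon May 24 07:19:21 2010] [27432:4045374208] [error]
ajp_get_reply::jk_ajp_common.c (1926): (tomcat01-instance2) Timeout with
waiting reply from tomcat. Tomcat is down, stopped or network problems
(errno=110)
[Sun May 23 03:48:51 2010] [15814:4045374208] [info]
jk_open_socket::jk_connect.c (594): connect to 192.168.60.157:8009 failed
(errno=115)
"""

if __name__ == "__main__":
    for key, count in sorted(tally(SAMPLE.splitlines()).items()):
        print(key, count)
```

Against the real log file I would pass the open file instead of SAMPLE, e.g.
tally(open(path)) with path pointing at the mod_jk log.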




