On 26.10.2010 01:05, Hannaoui, Mo wrote:
1. When there are > 30 HTTP connections, I see the error below almost
every 1 minute. As the traffic and the number of connections increase,
the frequency of error increases and the performance of the web
application that is being hosted on the system decreases.



[Mon Oct 25 20:59:42 2010][11224:3086337808] [info]
ajp_process_callback::jk_ajp_common.c (1882): Writing to client aborted
or client network problems

This error tells us that a problem was detected while sending the response back to the browser, most likely a connection abort or something similar.

This will happen every now and then when users do not wait for a response and instead proceed by clicking on other links. If it happens too often, it might indicate either that your application is not responsive enough, so users have a reason to start clicking while waiting, or that you have an infrastructural problem on the way back to the browser.

Note that the messages are only flagged as "[info]", because, as said, an occasional occurrence is not problematic.

If you want to decide whether this is happening due to bad performance, you should:

- add "%P %{tid}P %D" to your LogFormat for the Apache access log. This will log the process id, the thread id (for prefork MPM that's always "1") and the duration in microseconds. You can use the pid and tid to correlate with the jk log messages. In the jk log line it is "[11224:3086337808]", the irst number is the pid, the second the tid.

Note that the timestamp in the access log is when the request started, while the timestamp in the jk log is when the response was detected as broken. The delta should be roughly what is being logged as %D. Choose a couple of occurrences, find the counterparts in the access log and see whether they took especially long. You can also look at the URLs, the user agents, the client IPs etc., all via the access log.
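
As an illustration, such a LogFormat could look like this (the format name and log file path are just examples):

  LogFormat "%h %l %u %t \"%r\" %>s %b %D %P %{tid}P" combined_jk
  CustomLog logs/access_log combined_jk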

- add an access log to Tomcat and do not forget to add %D to the log pattern as well. Check whether the same, likely long-running requests also take long according to the Tomcat access log. Note that %D for Tomcat logs milliseconds, not microseconds as for Apache.
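
For example, an access log valve in Tomcat's (or JBoss Web's) server.xml, placed inside the Host element, could look roughly like this; directory and prefix are just placeholders:

  <Valve className="org.apache.catalina.valves.AccessLogValve"
         directory="logs" prefix="access_log." suffix=".txt"
         pattern="%h %l %u %t &quot;%r&quot; %s %b %D" />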

If you find many examples where Tomcat logs a short time and Apache a long time, then you likely have a network/firewall/load-balancer/whatever problem between Apache and the browser, especially if the file sizes are not huge. In that case Tomcat will be able to stream back to Apache, which will be able to put all of the response into the TCP send buffer, but Apache will nevertheless log the error if the content finally cannot be transmitted.

- next you can start sniffing to find out what the root cause actually was from the point of view of Apache, e.g. whether a reset was sent by the client. I did run into cases where security devices every now and then reset connections that they thought looked like an attack. That is easy to detect with a network sniff: in that case the MAC address from which the reset was sent was different from the MAC address that sent the rest of the connection packets.
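
A minimal sniffing sketch, assuming Linux, tcpdump and Apache listening on port 80; the -e flag prints the link-level (MAC) headers so you can see who sent the RST:

  tcpdump -n -e -i eth0 'tcp port 80 and tcp[tcpflags] & tcp-rst != 0'

In practice you would rather capture the full connections with -w to a pcap file and inspect them in Wireshark, since the filter above only shows the reset packets themselves.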

- finally you can try to work your way closer to the browser by doing sniffs further up the network.

2. The number of connections will suddenly surge from say 40 to 90 to
~200 in no time, at which point all I see in mod_jk.log is error
messages and the application either stops responding with the connection
refused or bad gateway error. To fix the problem the JBoss service
usually needs to be restarted. This surge is unpredictable and may happen
between 1 and 5 times in 24 hours.

This indicates a performance problem with the app (or GC problems).
Observed concurrency is roughly:

concurrency = requests per second * average response time

If the concurrency spikes, it is usually actually the response time that spikes. Add "%D" to the access logs to verify.
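
For example, 50 requests per second at an average response time of 0.5 seconds means roughly 25 requests in flight; if the average response time climbs to 4 seconds at the same request rate, you end up at about 200 concurrent requests, which is the kind of surge you describe.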

If so, start doing Java thread dumps to analyze what's happening in JBoss. Also look at per-thread CPU load using ps to check whether there are particular threads that take too much CPU. Finally check GC activity.
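
A rough sketch of those three checks on Linux with a Sun JDK (the pid 11224 is only used as an example):

  # thread dump, written to the JBoss console/stdout log
  kill -3 11224
  # or, if your JDK ships jstack:
  jstack 11224 > threads.txt

  # per-thread CPU usage; the LWP column is the native thread id,
  # which usually shows up as nid=0x... (in hex) in the thread dump
  ps -eLo pid,lwp,pcpu,comm | grep 11224

  # GC activity, sampled once per second
  jstat -gcutil 11224 1000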

I have read many posts and documents (including
http://kbase.redhat.com/faq/docs/DOC-15866 and used
http://lbconfig.appspot.com/ for base configurations) and changed the
configurations many times, but the problem continues to exist. I think
my current configuration is the worst version so far. It works well only
with low traffic.

Here's the current configuration:

--- workers.properties ----
...


worker.template.reply_timeout=30000

Might be a bit short. Check against your logged %D values. Please also add max_reply_timeouts to your load balancer.
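
For illustration only (the worker name "loadbalancer" is an assumption, use your actual names, and pick the reply_timeout based on your slowest legitimate requests):

  worker.template.reply_timeout=60000
  worker.loadbalancer.max_reply_timeouts=10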

...

worker.template.socket_timeout=10

I personally don't like the general socket_timeout. I do like the more fine-grained individual timeouts.
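
A sketch of such fine-grained settings for your template worker; the values are only examples, see the how-to linked below:

  worker.template.socket_connect_timeout=5000
  worker.template.ping_mode=A
  worker.template.ping_timeout=10000
  worker.template.connection_pool_timeout=600

If you set connection_pool_timeout, the AJP connector on the Tomcat side should get a matching connectionTimeout (in milliseconds, so 600000 here).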

The source download of mod_jk 1.2.30 contains a well-documented example configuration (1.2.28 does not). Further "official" notes about timeouts are available at:

http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

Regards,

Rainer
