On 26.10.2010 01:05, Hannaoui, Mo wrote:
1. When there are > 30 HTTP connections, I see the error below almost
every 1 minute. As the traffic and the number of connections increase,
the frequency of error increases and the performance of the web
application that is being hosted on the system decreases.



[Mon Oct 25 20:59:42 2010][11224:3086337808] [info]
ajp_process_callback::jk_ajp_common.c (1882): Writing to client aborted
or client network problems

This error tells us that a problem was detected while sending the response back to the browser, most likely a connection abort or something similar.

This will happen every now and then when users do not wait for a response and instead proceed by clicking on other links. If it happens too often, it might indicate either that your application is not responsive enough, so users have a reason to start clicking while waiting, or that you have an infrastructural problem on the way back to the browser.

Note that the messages are only flagged as "[info]", because, as said, an occasional occurrence is not problematic.

If you want to decide whether this is happening due to bad performance, you should:

- add "%P %{tid}P %D" to your LogFormat for the Apache access log. This will log the process id, the thread id (for prefork MPM that's always "1") and the duration in microseconds. You can use the pid and tid to correlate with the jk log messages. In the jk log line it is "[11224:3086337808]", the irst number is the pid, the second the tid.

Note that the timestamp in the access log is when the request started, while the timestamp in the jk log is when the response was detected as broken. The delta should be roughly what is being logged as %D. Choose a couple of occurrences, find the counterparts in the access log and see whether they took especially long. You can also look at the URLs, the user agents, the client IPs etc., all via the access log.
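
As an illustration, such a LogFormat could look like this (the format name and log file path are just examples):

  LogFormat "%h %l %u %t \"%r\" %>s %b %D %P %{tid}P" combined_jk
  CustomLog logs/access_log combined_jk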

- add an access log to Tomcat and do not forget to add %D to the log pattern as well. Check whether the same, likely long-running requests also take long according to the Tomcat access log. Note that %D for Tomcat logs milliseconds, not microseconds as for Apache.
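
For example, an access log valve in Tomcat's (or JBoss Web's) server.xml, placed inside the Host element, could look roughly like this; directory and prefix are just placeholders:

  <Valve className="org.apache.catalina.valves.AccessLogValve"
         directory="logs" prefix="access_log." suffix=".txt"
         pattern="%h %l %u %t &quot;%r&quot; %s %b %D" />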

If you find many examples where Tomcat logs a short time and Apache a long time, then you likely have a network/firewall/load-balancer/whatever problem between Apache and the browser, especially if the file sizes are not huge. In that case Tomcat will be able to stream back to Apache, which will be able to put all of the response into the TCP send buffer, but Apache will nevertheless log the error if the content finally cannot be transmitted.

- next you can start sniffing to find out what the root cause actually was from the point of view of Apache, e.g. whether a reset was sent by the client. I did run into cases where security devices every now and then reset connections that they thought looked like an attack. That is easy to detect with a network sniff: in that case the MAC address from which the reset was sent was different from the MAC address that sent the rest of the connection packets.
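
A minimal sniffing sketch, assuming Linux, tcpdump and Apache listening on port 80; the -e flag prints the link-level (MAC) headers so you can see who sent the RST:

  tcpdump -n -e -i eth0 'tcp port 80 and tcp[tcpflags] & tcp-rst != 0'

In practice you would rather capture the full connections with -w to a pcap file and inspect them in Wireshark, since the filter above only shows the reset packets themselves.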

- finally you can try to work your way closer to the browser by doing sniffs further up the network.

2. The number of connections will suddenly surge from say 40 to 90 to
~200 in no time, at which point all I see in mod_jk.log is error
messages and the application either stops responding with the connection
refused or bad gateway error. To fix the problem the JBoss service
usually needs to be restarted. This surge is unpredictable and may happen
between 1 and 5 times in 24 hours.

This indicates a performance problem with the app (or GC problems).
Observed concurrency is roughly:

concurrency = requests per second * average response time

If the concurrency spikes, it is usually actually the response time that spikes. Add "%D" to the access logs to verify.
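
For example, 50 requests per second at an average response time of 0.5 seconds means roughly 25 requests in flight; if the average response time climbs to 4 seconds at the same request rate, you end up at about 200 concurrent requests, which is the kind of surge you describe.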

If so, start doing Java thread dumps to analyze what's happening in JBoss. Also look at per-thread CPU load using ps to check whether there are particular threads that take too much CPU. Finally check GC activity.
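
A rough sketch of those three checks on Linux with a Sun JDK (the pid 11224 is only used as an example):

  # thread dump, written to the JBoss console/stdout log
  kill -3 11224
  # or, if your JDK ships jstack:
  jstack 11224 > threads.txt

  # per-thread CPU usage; the LWP column is the native thread id,
  # which usually shows up as nid=0x... (in hex) in the thread dump
  ps -eLo pid,lwp,pcpu,comm | grep 11224

  # GC activity, sampled once per second
  jstat -gcutil 11224 1000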

I have read many posts and documents (including
http://kbase.redhat.com/faq/docs/DOC-15866 and used
http://lbconfig.appspot.com/ for base configurations) and changed the
configurations many times, but the problem continues to exist. I think
my current configuration is the worst version so far. It works well only
with low traffic.

Here's the current configuration:

--- workers.properties ----
...


worker.template.reply_timeout=30000

Might be a bit short. Check against your logged %D values. Please also add max_reply_timeouts to your load balancer.
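
For illustration only (the worker name "loadbalancer" is an assumption, use your actual names, and pick the reply_timeout based on your slowest legitimate requests):

  worker.template.reply_timeout=60000
  worker.loadbalancer.max_reply_timeouts=10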

...

worker.template.socket_timeout=10

I personally don't like the general socket_timeout. I do like the more fine-grained individual timeouts.
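
A sketch of such fine-grained settings for your template worker; the values are only examples, see the how-to linked below:

  worker.template.socket_connect_timeout=5000
  worker.template.ping_mode=A
  worker.template.ping_timeout=10000
  worker.template.connection_pool_timeout=600

If you set connection_pool_timeout, the AJP connector on the Tomcat side should get a matching connectionTimeout (in milliseconds, so 600000 here).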

The source download of mod_jk 1.2.30 contains a well-documented example configuration (1.2.28 does not). Further "official" notes about timeouts are available at:

http://tomcat.apache.org/connectors-doc/generic_howto/timeouts.html

Regards,

Rainer
