Hi Rainer.

Thanks very much for the time you took to analyse the case below.
However, I am still puzzled :

This is indeed a system where there appear to be problems at the level of the connection between the client workstations/browser, and the distant server. Those are the "client write" errors. We are in the process (right, with the network people), of analysing/discussing that (automated tests, Wireshark etc..).

However, it is also a single host with both Apache and mod_jk and Tomcat on the same host. So what I am troubled about is the "cping/cpong" and "sendfull" kind of messages. Are they not strictly between mod_jk and Tomcat ? And if yes, considering these are on the same host, do these messages also indicate that something else is wrong ?

André


Rainer Jung wrote:
Hi André,

On 22.04.2009 18:29, André Warnier wrote:
As (maybe) part of another issue which I am still trying to track down
with the concerned network people (client write errors, Sample 2 below),

Before commenting Sample 1: Although we generally respond to the "client
errors" topic with "Users press reload or click on another link before
answer returns - check whether your responses take to long", we now
observed two cases, where there were real network problems (I generally
avoid to bash the network people, but it is still possible that the
trouble comes from the network):

- once a bad cable
- once a bad driver for a network card used by a VMware guest system
(problems with TCP offloading). TCP offloading problems seem to become a
new trend :(

I find the following kind of messages regularly in the mod_jk logfile
(Sample 1).
I just want to know if this indicates a problem somewhere, or if these
are normal occurrences.
These [info] messages do not always come as often as shown below, but
seem to come with some regularity nevertheless.

Basically, my question is whether this indicates that there might be a
discrepancy between front-end and back(-)end or something of the kind,
or if there is some other parameter to adjust to make these go away.

Apache 2.x prefork MPM
mod_jk 1.2.27
Tomcat 5.5.x
under RHEL, single host

JkLogLevel  info
and
worker.ajp13.port=8009
worker.ajp13.host=localhost
worker.ajp13.type=ajp13
# new options since 1.2.27 :
worker.ajp13.ping_mode=A
(single worker, no load balancing)

back-end Connector :
<Connector port="8009"
 enableLookups="false" redirectPort="8443" protocol="AJP/1.3"
 maxThreads="150" minSpareThreads="25" maxSpareThreads="100"
backlog="100" />

The configuration contains no means for shutting down idle workers.
Neither mod_jk (connection_pool_timeout) nor Tomcat (connectTimeout)
will close idle connections. So it's possible, that an intermediate
system, e.g. firewall drops connections.

Sample 1 :
--------

[Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.007701
4897
[Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.004344
1457
[Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.101926
1945
[Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.322939
296975
[Tue Apr 21 13:35:14 2009] POST /myapp/normen/servlet.myapp 200 0.032625
1576
[Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.003084
4176

It would be nicer to put the JK access logging into the access log and
add the pid (process id) there, so you could identify exaclty which
access log line relates to which JK info/error message block. not
critical though.

[Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 16 returned -32 (errno=32)
[Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

As you already noticed, errno=32 is a broken pipe error. Sending the
cping packet uses the write() system call. The linux man pages for
write(2) on my systems say:

EPIPE  fd is connected to a pipe or socket whose reading end is closed.
 When this happens the writing process will also receive a SIGPIPE
signal.  (Thus, the write return value is seen only if the program
catches, blocks or ignores this signal.)

So we should have "reading end is closed", which is Tomcat, so the
connection should bw in CLOSE_WAIT from the point of view of mod_jk. I
would expect a RST send by a firewall or similar between mod_jk and
Tomcat to also result in EPIPE.

[Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.010538
1366
[Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002768
2507
[Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002658
1999
[Tue Apr 21 13:35:17 2009] POST /myapp/normen/servlet.myapp 200 0.060688
21560
[Tue Apr 21 13:35:19 2009] POST /myapp/normen/servlet.myapp 200 0.017692
23635
[Tue Apr 21 13:35:22 2009] POST /myapp/normen/servlet.myapp 200 1.032645
21400
[Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 16 returned -32 (errno=32)
[Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

Same again.

[Tue Apr 21 13:35:34 2009] POST /myapp/normen/servlet.myapp 200 0.071825
20753
[Tue Apr 21 13:35:36 2009] POST /myapp/normen/servlet.myapp 200 0.115782
14137
[Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 1 returned -32 (errno=32)
[Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

Again

[Tue Apr 21 13:35:56 2009] POST /myapp/normen/servlet.myapp 200 0.040233
20753
[Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 16 returned -32 (errno=32)
[Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

Again

[Tue Apr 21 13:35:57 2009] POST /myapp/normen/servlet.myapp 200 0.036396
12264
[Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.292587
1409
[Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.010833
6729
[Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.015750
3484
[Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.032983
2915

Note that all 4 cases were logged by a diffrent PID (process id), and in
the prefork case, each process has its own pool of size 1. So 4
different processes experienced a connection closed by the backend or
something in between. If you would log the JK access data and the PID in
the apache access log, you could verify, when those processes used the
backend connection the last time previously to verify the idleness theory.

Sample 2 : (for illustration of the other issue only)
--------
[Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.016737
6729
[Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.019133
3484
[Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.010458
2915
[Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.112756
18897
[Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 1 returned -32 (errno=32)
[Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

The same for pid 29831

[Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 17 returned -32 (errno=32)
[Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

pid 30401

[Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
socket 16 returned -32 (errno=32)
[Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
[Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
socket -1 keepalive cping/cpong failure (errno=32)

pid 29827

[Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
ajp_process_callback::jk_ajp_common.c (1748): Writing to client aborted
or client network problems
[Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
ajp_service::jk_ajp_common.c (2407): (ajp13) sending request to tomcat
failed (unrecoverable), because of client write error (attempt=1)

Now a client write error, again pid 29831. Possibly, the cping write
error above too a long time to be detected and that's why the response
retrieved via a fresh connection wasn't consumed by the client? Although
the below response times seem to tell us, that the longest request in
the snippet here took 9 seconds, which is not that long.

There is no mod_jk internal relation between the cping send failure and
a client write error. It could only be in some indirect way, like both
being caused by a general network problem (occuring on the path to the
browser and on the path to the backend - but then why don't we see lots
of other communication errors as well?) or by some timing issue
triggering user behaviour - not very likely in light of the blow
duration lines.

[Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 5.210358
8184
[Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
jk_handler::mod_jk.c (2469): Aborting connection for worker=ajp13

This is a consequence of the client write error, so next time this
process will open a fresh connection to the backend.

[Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 0.363977
20753
[Tue Apr 21 13:15:54 2009] POST /myapp/normen/servlet.myapp 200 9.057331
20755
[Tue Apr 21 13:15:55 2009] POST /myapp/normen/servlet.myapp 200 0.215937
14166

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to