Hi André,

On 22.04.2009 18:29, André Warnier wrote:
> As (maybe) part of another issue which I am still trying to track down
> with the concerned network people (client write errors, Sample 2 below),

Before commenting Sample 1: Although we generally respond to the "client
errors" topic with "Users press reload or click on another link before
answer returns - check whether your responses take to long", we now
observed two cases, where there were real network problems (I generally
avoid to bash the network people, but it is still possible that the
trouble comes from the network):

- once a bad cable
- once a bad driver for a network card used by a VMware guest system
(problems with TCP offloading). TCP offloading problems seem to become a
new trend :(

> I find the following kind of messages regularly in the mod_jk logfile
> (Sample 1).
> I just want to know if this indicates a problem somewhere, or if these
> are normal occurrences.
> These [info] messages do not always come as often as shown below, but
> seem to come with some regularity nevertheless.
> 
> Basically, my question is whether this indicates that there might be a
> discrepancy between front-end and back(-)end or something of the kind,
> or if there is some other parameter to adjust to make these go away.
> 
> Apache 2.x prefork MPM
> mod_jk 1.2.27
> Tomcat 5.5.x
> under RHEL, single host
> 
> JkLogLevel  info
> and
> worker.ajp13.port=8009
> worker.ajp13.host=localhost
> worker.ajp13.type=ajp13
> # new options since 1.2.27 :
> worker.ajp13.ping_mode=A
> (single worker, no load balancing)
> 
> back-end Connector :
> <Connector port="8009"
>  enableLookups="false" redirectPort="8443" protocol="AJP/1.3"
>  maxThreads="150" minSpareThreads="25" maxSpareThreads="100"
> backlog="100" />

The configuration contains no means for shutting down idle workers.
Neither mod_jk (connection_pool_timeout) nor Tomcat (connectTimeout)
will close idle connections. So it's possible, that an intermediate
system, e.g. firewall drops connections.

> Sample 1 :
> --------
> 
> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.007701
> 4897
> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.004344
> 1457
> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.101926
> 1945
> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.322939
> 296975
> [Tue Apr 21 13:35:14 2009] POST /myapp/normen/servlet.myapp 200 0.032625
> 1576
> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.003084
> 4176

It would be nicer to put the JK access logging into the access log and
add the pid (process id) there, so you could identify exaclty which
access log line relates to which JK info/error message block. not
critical though.

> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 16 returned -32 (errno=32)
> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

As you already noticed, errno=32 is a broken pipe error. Sending the
cping packet uses the write() system call. The linux man pages for
write(2) on my systems say:

EPIPE  fd is connected to a pipe or socket whose reading end is closed.
 When this happens the writing process will also receive a SIGPIPE
signal.  (Thus, the write return value is seen only if the program
catches, blocks or ignores this signal.)

So we should have "reading end is closed", which is Tomcat, so the
connection should bw in CLOSE_WAIT from the point of view of mod_jk. I
would expect a RST send by a firewall or similar between mod_jk and
Tomcat to also result in EPIPE.

> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.010538
> 1366
> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002768
> 2507
> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002658
> 1999
> [Tue Apr 21 13:35:17 2009] POST /myapp/normen/servlet.myapp 200 0.060688
> 21560
> [Tue Apr 21 13:35:19 2009] POST /myapp/normen/servlet.myapp 200 0.017692
> 23635
> [Tue Apr 21 13:35:22 2009] POST /myapp/normen/servlet.myapp 200 1.032645
> 21400
> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 16 returned -32 (errno=32)
> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

Same again.

> [Tue Apr 21 13:35:34 2009] POST /myapp/normen/servlet.myapp 200 0.071825
> 20753
> [Tue Apr 21 13:35:36 2009] POST /myapp/normen/servlet.myapp 200 0.115782
> 14137
> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 1 returned -32 (errno=32)
> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

Again

> [Tue Apr 21 13:35:56 2009] POST /myapp/normen/servlet.myapp 200 0.040233
> 20753
> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 16 returned -32 (errno=32)
> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

Again

> [Tue Apr 21 13:35:57 2009] POST /myapp/normen/servlet.myapp 200 0.036396
> 12264
> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.292587
> 1409
> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.010833
> 6729
> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.015750
> 3484
> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.032983
> 2915

Note that all 4 cases were logged by a diffrent PID (process id), and in
the prefork case, each process has its own pool of size 1. So 4
different processes experienced a connection closed by the backend or
something in between. If you would log the JK access data and the PID in
the apache access log, you could verify, when those processes used the
backend connection the last time previously to verify the idleness theory.

> Sample 2 : (for illustration of the other issue only)
> --------
> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.016737
> 6729
> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.019133
> 3484
> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.010458
> 2915
> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.112756
> 18897
> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 1 returned -32 (errno=32)
> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

The same for pid 29831

> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 17 returned -32 (errno=32)
> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

pid 30401

> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for
> socket 16 returned -32 (errno=32)
> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query
> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info]
> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request,
> socket -1 keepalive cping/cpong failure (errno=32)

pid 29827

> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
> ajp_process_callback::jk_ajp_common.c (1748): Writing to client aborted
> or client network problems
> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
> ajp_service::jk_ajp_common.c (2407): (ajp13) sending request to tomcat
> failed (unrecoverable), because of client write error (attempt=1)

Now a client write error, again pid 29831. Possibly, the cping write
error above too a long time to be detected and that's why the response
retrieved via a fresh connection wasn't consumed by the client? Although
the below response times seem to tell us, that the longest request in
the snippet here took 9 seconds, which is not that long.

There is no mod_jk internal relation between the cping send failure and
a client write error. It could only be in some indirect way, like both
being caused by a general network problem (occuring on the path to the
browser and on the path to the backend - but then why don't we see lots
of other communication errors as well?) or by some timing issue
triggering user behaviour - not very likely in light of the blow
duration lines.

> [Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 5.210358
> 8184
> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info]
> jk_handler::mod_jk.c (2469): Aborting connection for worker=ajp13

This is a consequence of the client write error, so next time this
process will open a fresh connection to the backend.

> [Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 0.363977
> 20753
> [Tue Apr 21 13:15:54 2009] POST /myapp/normen/servlet.myapp 200 9.057331
> 20755
> [Tue Apr 21 13:15:55 2009] POST /myapp/normen/servlet.myapp 200 0.215937
> 14166

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to