Hi André, On 25.04.2009 13:58, André Warnier wrote: > Hi Rainer. > > Thanks very much for the time you took to analyse the case below. > However, I am still puzzled : > > This is indeed a system where there appear to be problems at the level > of the connection between the client workstations/browser, and the > distant server. Those are the "client write" errors. > We are in the process (right, with the network people), of > analysing/discussing that (automated tests, Wireshark etc..). > > However, it is also a single host with both Apache and mod_jk and Tomcat > on the same host. So what I am troubled about is the "cping/cpong" and > "sendfull" kind of messages. Are they not strictly between mod_jk and > Tomcat ? And if yes, considering these are on the same host, do these > messages also indicate that something else is wrong ?
They are strictly between Apache and Tomcat. "write" failed during sending cping from Apache to Tomcat overa previously established connection, and EPIPE should indicate a remote socket/connection close (or maybe reset?). Maybe Tomcat has a default connectTimeout? Not sure at the moment. Regards, Rainer > Rainer Jung wrote: >> Hi André, >> >> On 22.04.2009 18:29, André Warnier wrote: >>> As (maybe) part of another issue which I am still trying to track down >>> with the concerned network people (client write errors, Sample 2 below), >> >> Before commenting Sample 1: Although we generally respond to the "client >> errors" topic with "Users press reload or click on another link before >> answer returns - check whether your responses take to long", we now >> observed two cases, where there were real network problems (I generally >> avoid to bash the network people, but it is still possible that the >> trouble comes from the network): >> >> - once a bad cable >> - once a bad driver for a network card used by a VMware guest system >> (problems with TCP offloading). TCP offloading problems seem to become a >> new trend :( >> >>> I find the following kind of messages regularly in the mod_jk logfile >>> (Sample 1). >>> I just want to know if this indicates a problem somewhere, or if these >>> are normal occurrences. >>> These [info] messages do not always come as often as shown below, but >>> seem to come with some regularity nevertheless. >>> >>> Basically, my question is whether this indicates that there might be a >>> discrepancy between front-end and back(-)end or something of the kind, >>> or if there is some other parameter to adjust to make these go away. >>> >>> Apache 2.x prefork MPM >>> mod_jk 1.2.27 >>> Tomcat 5.5.x >>> under RHEL, single host >>> >>> JkLogLevel info >>> and >>> worker.ajp13.port=8009 >>> worker.ajp13.host=localhost >>> worker.ajp13.type=ajp13 >>> # new options since 1.2.27 : >>> worker.ajp13.ping_mode=A >>> (single worker, no load balancing) >>> >>> back-end Connector : >>> <Connector port="8009" >>> enableLookups="false" redirectPort="8443" protocol="AJP/1.3" >>> maxThreads="150" minSpareThreads="25" maxSpareThreads="100" >>> backlog="100" /> >> >> The configuration contains no means for shutting down idle workers. >> Neither mod_jk (connection_pool_timeout) nor Tomcat (connectTimeout) >> will close idle connections. So it's possible, that an intermediate >> system, e.g. firewall drops connections. >> >>> Sample 1 : >>> -------- >>> >>> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.007701 >>> 4897 >>> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.004344 >>> 1457 >>> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.101926 >>> 1945 >>> [Tue Apr 21 13:34:59 2009] GET /myapp/normen/servlet.myapp 200 0.322939 >>> 296975 >>> [Tue Apr 21 13:35:14 2009] POST /myapp/normen/servlet.myapp 200 0.032625 >>> 1576 >>> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.003084 >>> 4176 >> >> It would be nicer to put the JK access logging into the access log and >> add the pid (process id) there, so you could identify exaclty which >> access log line relates to which JK info/error message block. not >> critical though. >> >>> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 16 returned -32 (errno=32) >>> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:35:14 2009] [18753:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> As you already noticed, errno=32 is a broken pipe error. Sending the >> cping packet uses the write() system call. The linux man pages for >> write(2) on my systems say: >> >> EPIPE fd is connected to a pipe or socket whose reading end is closed. >> When this happens the writing process will also receive a SIGPIPE >> signal. (Thus, the write return value is seen only if the program >> catches, blocks or ignores this signal.) >> >> So we should have "reading end is closed", which is Tomcat, so the >> connection should bw in CLOSE_WAIT from the point of view of mod_jk. I >> would expect a RST send by a firewall or similar between mod_jk and >> Tomcat to also result in EPIPE. >> >>> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.010538 >>> 1366 >>> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002768 >>> 2507 >>> [Tue Apr 21 13:35:14 2009] GET /myapp/normen/servlet.myapp 200 0.002658 >>> 1999 >>> [Tue Apr 21 13:35:17 2009] POST /myapp/normen/servlet.myapp 200 0.060688 >>> 21560 >>> [Tue Apr 21 13:35:19 2009] POST /myapp/normen/servlet.myapp 200 0.017692 >>> 23635 >>> [Tue Apr 21 13:35:22 2009] POST /myapp/normen/servlet.myapp 200 1.032645 >>> 21400 >>> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 16 returned -32 (errno=32) >>> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:35:34 2009] [30410:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> Same again. >> >>> [Tue Apr 21 13:35:34 2009] POST /myapp/normen/servlet.myapp 200 0.071825 >>> 20753 >>> [Tue Apr 21 13:35:36 2009] POST /myapp/normen/servlet.myapp 200 0.115782 >>> 14137 >>> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 1 returned -32 (errno=32) >>> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:35:56 2009] [30413:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> Again >> >>> [Tue Apr 21 13:35:56 2009] POST /myapp/normen/servlet.myapp 200 0.040233 >>> 20753 >>> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 16 returned -32 (errno=32) >>> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:35:57 2009] [29827:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> Again >> >>> [Tue Apr 21 13:35:57 2009] POST /myapp/normen/servlet.myapp 200 0.036396 >>> 12264 >>> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.292587 >>> 1409 >>> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.010833 >>> 6729 >>> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.015750 >>> 3484 >>> [Tue Apr 21 13:36:36 2009] GET /myapp/normen/servlet.myapp 200 0.032983 >>> 2915 >> >> Note that all 4 cases were logged by a diffrent PID (process id), and in >> the prefork case, each process has its own pool of size 1. So 4 >> different processes experienced a connection closed by the backend or >> something in between. If you would log the JK access data and the PID in >> the apache access log, you could verify, when those processes used the >> backend connection the last time previously to verify the idleness >> theory. >> >>> Sample 2 : (for illustration of the other issue only) >>> -------- >>> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.016737 >>> 6729 >>> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.019133 >>> 3484 >>> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.010458 >>> 2915 >>> [Tue Apr 21 13:15:32 2009] GET /myapp/normen/servlet.myapp 200 0.112756 >>> 18897 >>> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 1 returned -32 (errno=32) >>> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:15:44 2009] [29831:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> The same for pid 29831 >> >>> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 17 returned -32 (errno=32) >>> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:15:44 2009] [30401:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> pid 30401 >> >>> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info] >>> ajp_connection_tcp_send_message::jk_ajp_common.c (1064): sendfull for >>> socket 16 returned -32 (errno=32) >>> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info] >>> ajp_handle_cping_cpong::jk_ajp_common.c (855): can't send cping query >>> [Tue Apr 21 13:15:49 2009] [29827:2537034048] [info] >>> ajp_maintain::jk_ajp_common.c (3046): (ajp13) failed sending request, >>> socket -1 keepalive cping/cpong failure (errno=32) >> >> pid 29827 >> >>> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info] >>> ajp_process_callback::jk_ajp_common.c (1748): Writing to client aborted >>> or client network problems >>> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info] >>> ajp_service::jk_ajp_common.c (2407): (ajp13) sending request to tomcat >>> failed (unrecoverable), because of client write error (attempt=1) >> >> Now a client write error, again pid 29831. Possibly, the cping write >> error above too a long time to be detected and that's why the response >> retrieved via a fresh connection wasn't consumed by the client? Although >> the below response times seem to tell us, that the longest request in >> the snippet here took 9 seconds, which is not that long. >> >> There is no mod_jk internal relation between the cping send failure and >> a client write error. It could only be in some indirect way, like both >> being caused by a general network problem (occuring on the path to the >> browser and on the path to the backend - but then why don't we see lots >> of other communication errors as well?) or by some timing issue >> triggering user behaviour - not very likely in light of the blow >> duration lines. >> >>> [Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 5.210358 >>> 8184 >>> [Tue Apr 21 13:15:49 2009] [29831:2537034048] [info] >>> jk_handler::mod_jk.c (2469): Aborting connection for worker=ajp13 >> >> This is a consequence of the client write error, so next time this >> process will open a fresh connection to the backend. >> >>> [Tue Apr 21 13:15:49 2009] POST /myapp/normen/servlet.myapp 200 0.363977 >>> 20753 >>> [Tue Apr 21 13:15:54 2009] POST /myapp/normen/servlet.myapp 200 9.057331 >>> 20755 >>> [Tue Apr 21 13:15:55 2009] POST /myapp/normen/servlet.myapp 200 0.215937 >>> 14166 >> >> Regards, >> >> Rainer >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org >> For additional commands, e-mail: users-h...@tomcat.apache.org >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > > -- kippdata informationstechnologie GmbH Tel: 0228 98549 -0 Bornheimer Str. 33a Fax: 0228 98549 -50 53111 Bonn www.kippdata.de HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417 Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann =============================== kippdata informationstechnologie GmbH Tel: +49 228 98549 -0 Bornheimer Str. 33a Fax: +49 228 98549 -50 D-53111 Bonn www.kippdata.de HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417 Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann --------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org For additional commands, e-mail: users-h...@tomcat.apache.org