Hi Christopher,

Thanks so much for your replies. I am responding with inline comments below.

________________________________________
From: Christopher Schultz [ch...@christopherschultz.net]
Sent: Monday, February 24, 2014 9:56 PM
To: Tomcat Users List
Subject: Re: tomcat 6 refuses mod_jk connections after server runs for a couple of days

Isaac,

On 2/24/14, 2:27 PM, Isaac Gonzalez wrote:
> Hello all,
>
> I'm running Tomcat 6.0.32 on CentOS 6 with 2 front-end Apache load
> balancers with a firewall in between the Tomcat and load balancers
> using mod_jk v. 1.2.37 under Apache 2.2.10 to connect the backend
> Tomcat. I have had this running OK for a few years but our user
> traffic has increased significantly. A few months ago, the Tomcat
> server seemed to refuse or not accept any new connections from
> either load balancer and required a restart on the Tomcat end, even
> though I could easily connect to Tomcat on port 8080 (manager). I
> can intermittently telnet to port 8009, but am denied a bit as well
> both inside and outside the firewall.
>
> I proceeded to split the Tomcats up into their own instances,
> hoping when this issue recurred that it would only affect a
> particular Tomcat app. It also gave our developers the ability to
> patch a single Tomcat app without downing all of our apps.
>
> Unfortunately, this issue has recurred several times and I have
> spent most of my days researching and digging for hope of someone
> with a similar experience that may have resolved it. Last Friday
> the problem was so bad, I had to completely restart the Tomcat
> server (reboot it).
>
> So far I am at a loss...I have installed psi-probe on all Tomcat
> instances to give me more in-depth analysis of Tomcat threads and
> related server metadata when the problem is occurring. I have made a
> few modifications to workers.properties, in particular to decrease
> the connection timeout as well as the Tomcat AJP connector from 10
> minutes to 5 minutes and added the ping timeout and socket timeout.
> I also increased my Apache prefork MPM client connections to 500 on
> each load balancer. Below are my relevant configs...any suggestions
> to help remedy this would help... I have also increased threads
> from 200 to 500 on all Tomcat instances.

I'd be interested to see a thread dump on a "stuck" Tomcat to see what it's doing. If it happens again, please take a thread dump (or, better yet, 3 or so, maybe 5-10 seconds apart) and post them back to the list.

http://wiki.apache.org/tomcat/HowTo#How_do_I_obtain_a_thread_dump_of_my_running_webapp_.3F

Isaac: OK, I will submit one. PSI Probe shows them all, but I have to click on each one at a time.

Does restarting the Tomcat instance fix everything, or do you have to also bounce httpd? What happens if you bounce only httpd?

Isaac: Restarting the Tomcat instance fixes it. Bouncing httpd has no effect.

After the "split", did both Tomcats appear to lock up simultaneously, or did only one of them have a problem and the other one stayed up?

Isaac: They all appear to lock up simultaneously if users try to access that JK mount point.

Do the lock-ups appear to be related to anything you can observe, such as particularly high load, etc.?

Isaac: I have seen the lock-up appear when we had some network latency and other network issues affecting all externally facing traffic at this datacenter. I have also seen it happen when there are database connectivity issues within the applications. Other times I have seen it appear with what may simply have been high load.

> Workers.properties:
>
> worker.list=jkstatus,server1,server2,server3,server4,server5,server6,server7,server8
>
> worker.jkstatus.type=status
>
> # Let's define some defaults
> worker.basic.port=8009
> worker.basic.type=ajp13
> worker.basic.socket_keepalive=True
> worker.basic.connection_pool_timeout=300
> worker.basic.ping_timeout=1000
> worker.basic.ping_mode=A
> worker.basic.socket_timeout=10
>
> worker.lb1.distance=0
> worker.lb1.reference=worker.basic
>
> worker.server1.host= server1hostname
> worker.server1.reference=worker.lb1
> worker.server2.host=server2hostname
> worker.server2.reference=worker.lb1
> worker.server3.host=server3hostname
> worker.server3.reference=worker.lb1
> worker.server4.host= server4hostname
> worker.server4.reference=worker.lb1
> worker.server5.host= server5hostname
> worker.server5.reference=worker.lb1
> worker.server6.host= server6hostname
> worker.server6.reference=worker.lb1
> worker.server7.host= server7hostname
> worker.server7.reference=worker.lb1
> worker.server8.host= server7hostname
> worker.server8.reference=worker.lb1

You didn't show any JkMounts in your httpd.conf file. What worker are you using? It sounded like you were load-balancing the servers, but your "lb1" worker does not have any balance_workers setting so it doesn't look like it's going to work.

Isaac: I am not load-balancing the Tomcat servers; I only have one. I do "load balance" the Apache front-end servers via DNS round-robin. My Jk settings are:

JkWorkersFile /etc/httpd/conf/workers.properties
JkLogFile logs/mod_jk.log
JkLogLevel info
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
# JkOptions indicate to send SSL KEY SIZE,
JkOptions +ForwardKeySize +ForwardURICompat -ForwardDirectories
JkRequestLogFormat "%w %V %T"
JkMount /appnam4escrubbed server4
JkMount /appnames4crubbed/* server4
JkMount /appname5scrubbed server5
JkMount /appname5scrubbed/* server5
JkMount /appname7scrubbed server7
JkMount /appname7scrubbed/* server7
JkMount /appname2scrubbed server2
JkMount /appname2scrubbed/* server2
JkMount /appname6scrubbed server6
JkMount /appname6scrubbed/* server6
JkMount /appname3scrubbed server3
JkMount /appname3scrubbed/* server3
JkMount /appname1scrubbed server1
JkMount /appname1scrubbed/* server1
JkMount /appname8scrubbed server8
JkMount /appname8scrubbed/* server8
JkMount /jkmanager/* jkstatus

> httpd.conf:
>
> KeepAlive Off
> MaxKeepAliveRequests 100
> KeepAliveTimeout 15
>
> # prefork MPM
> # StartServers: number of server processes to start
> # MinSpareServers: minimum number of server processes which are kept spare
> # MaxSpareServers: maximum number of server processes which are kept spare
> # ServerLimit: maximum value for MaxClients for the lifetime of the server
> # MaxClients: maximum number of server processes allowed to start
> # MaxRequestsPerChild: maximum number of requests a server process serves
> <IfModule prefork.c>
> StartServers 8
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 500
> MaxClients 500
> MaxRequestsPerChild 5000
> </IfModule>

It would be good to see your Jk* settings as well.

Isaac: See above.

> Tomcat server.xml:
>
> <!-- Define an AJP 1.3 Connector on port 8009 -->
> <Connector port="8009" address="x.x.x.x" protocol="AJP/1.3"
>            redirectPort="8443" connectionTimeout="300000" maxThreads="500" />

Why do you bother having a connectionTimeout on an AJP connection?
httpd should only send a request to you once the request line has been received from the client, so there isn't really any legitimate reason for an AJP request to time out before the request line comes through.

Isaac: I have seen many people with this same problem, and it has been suggested, even in the Apache connector documentation, that the connection timeout in the AJP connector should match the connection_pool_timeout in the workers.properties file. Here I have them both set to 5 minutes. They were set to 10 minutes before. I also recently added the socket_timeout in workers.properties after reading a helpful how-to from some JBoss documentation. It did not help with the problem at all.

IMO, you should define an <Executor> and share threads between your AJP connector and your HTTP connector (which you didn't show config for, but mentioned you had one above running on port 8080). Otherwise, your Tomcat configuration looks fine.

Isaac: Hmmm...here is the <Executor> part of my server.xml below, where the executor is commented out:

<Service name="Catalina">

  <!--The connectors can use a shared executor, you can define one or more named thread pools-->
  <!--
  <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
            maxThreads="150" minSpareThreads="4"/>
  -->

  <!-- A "Connector" represents an endpoint by which requests are received
       and responses are returned. Documentation at :
       Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
       Java AJP Connector: /docs/config/ajp.html
       APR (HTTP/AJP) Connector: /docs/apr.html
       Define a non-SSL HTTP/1.1 Connector on port 8080
  -->
  <Connector port="8080" protocol="HTTP/1.1" address="x.x.x.x"
             connectionTimeout="20000"
             redirectPort="8443" />

  <!-- A "Connector" using the shared thread pool-->
  <!--
  <Connector executor="tomcatThreadPool"
             port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             redirectPort="8443" />
  -->

If you get desperate, you can set mod_jk's log level to DEBUG and watch your disk fill up.
You'll get great information if/when things start to go south.

Isaac: I am very desperate, Chris, so I guess I have no choice. I'm unsure of how I should implement the executor, so any suggestions on that would be helpful.

-Isaac

-chris
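
For the executor question above, a minimal sketch of how a shared thread pool might be wired into server.xml. This is an illustration under assumptions, not a tested configuration: the pool name tomcatThreadPool and its namePrefix are taken from the commented-out stock example, maxThreads="500" simply mirrors the value already set on the AJP connector, and whether the AJP connector honors the executor attribute depends on the connector implementation in use, so check the Tomcat 6 AJP connector documentation before relying on it.

```xml
<Service name="Catalina">

  <!-- Shared thread pool. It must be declared before the Connectors that
       reference it. maxThreads="500" is an assumption mirroring the
       existing AJP connector setting. -->
  <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
            maxThreads="500" minSpareThreads="4"/>

  <!-- HTTP connector on 8080, now drawing threads from the shared pool -->
  <Connector executor="tomcatThreadPool"
             port="8080" protocol="HTTP/1.1" address="x.x.x.x"
             connectionTimeout="20000"
             redirectPort="8443" />

  <!-- AJP connector on 8009, sharing the same pool. Per-connector thread
       attributes such as maxThreads are left off because the executor's
       settings are expected to take their place. -->
  <Connector executor="tomcatThreadPool"
             port="8009" protocol="AJP/1.3" address="x.x.x.x"
             redirectPort="8443" connectionTimeout="300000" />

  <!-- Engine, Host, etc. unchanged -->

</Service>
```

With one pool behind both connectors, AJP and HTTP requests compete for the same threads, so a thread dump taken during a lock-up should show whether the catalina-exec- threads are all busy or blocked.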