Hi all,

Here's the situation as it stands today and what could be done to solve it. I'll try to keep this short.

Running configuration:

* Red Hat Enterprise Linux 3
* 1 x F5 load balancer and hardware SSL box
* 5 x Apache 1.3.33 / mod_jk 1.2.14
* 6 x JBoss 4.0.0 / Tomcat 5.0.28 using the AJP13 connector
* Oracle 9i

Our production environment hosts a number of applications, each with different load and usage patterns. Our problem is that it is difficult to find a web farm configuration that satisfies every application. For reasons I will not explain here, we cannot have a dedicated web farm for each application.

This is what we think is happening in our production environment, based on tests run in UAT (User Acceptance Tests) and on the Apache and Tomcat documentation. This is all pretty new to us, so if someone can provide hard facts, you are more than welcome.

1. The 1.3 generation of Apache web servers spawns a child process to handle an HTTP request. A child can process only one HTTP request at a time.

2. As the load on the web server increases, additional child processes are spawned to serve requests concurrently. The MaxClients directive limits how many child processes can be forked. It defaults to 256 but has been changed to 16384 in production. It seems that production really needs 16384: with 256, our web servers were rejecting connections and could not support the load generated by all of our clients.

3. To prevent latency, Apache keeps up to 100 spare child processes alive. Spare means that they are not serving requests. Once reached, that number of spare servers does not seem to decrease. This matches what we see in our UAT tests, where 201 threads remain active in Tomcat: 100 spare children connections * 2 web servers, plus the accept() thread.

4. If a request needs to be forwarded to Tomcat/JBoss (dynamic pages), the mod_jk module in the child process opens a socket connection to the AJP13 connector in Tomcat.

5. Tomcat accepts the connection and creates a thread to serve it. Connections are accepted up to a concurrent maximum of 1200. We set this upper value ourselves.

6. Tomcat rejects connections once the maximum is reached. JBoss 4.0.0 has a known issue where the server dies when the maximum is reached. This has been fixed in 4.0.2.

7. A connection could potentially be recycled by mod_jk (recycle_timeout) if no activity occurs on it. However, any request to Tomcat from any user whose session is bound to that Tomcat instance could go through the connection, thus keeping it active. Recycling does not seem to occur. We use a recycle_timeout value of 300.

8. Because the production web servers can each serve up to 16384 concurrent requests, a web server can open a practically unbounded number of connections to Tomcat and bring it down.

9. Tomcat can then become overloaded with connections. If a valid HTTP request comes through Apache and is routed to a child process that has not yet connected to Tomcat, the connection may be refused if Tomcat has already reached its 1200 limit.

10. In that case, mod_jk could potentially fail over to another Tomcat. The user would, however, lose his session.

11. The recycle_timeout and cache_size options are of no use to us because too many web server children are created to serve the company load. Requests targeted at our application can therefore take many different routes, which keeps all the connections alive.

12. We tried really small recycle_timeout values (e.g. 20) with no effect. A netstat shows that the connections remain ESTABLISHED.

13. The MaxRequestsPerChild setting is set to 0 in PROD. This means that Apache child processes never die unless the MaxSpareServers value is exceeded. Thus, at least 100 connections per web server will always remain connected to Tomcat.
A MaxRequestsPerChild value greater than 0 would at least guarantee that a child process eventually dies, freeing its Tomcat connections and releasing leaked memory back to the OS.

It's hard to see a path out of this one.

One solution would be to reduce the Apache MaxClients setting back to 256. A single Tomcat instance would then be hit by no more than 256 * 5 = 1280 connections (5 is the web farm size). Our current JVM settings (heap + thread stack sizes) would allow it. We would also need to bump our current 1200 limit a bit higher. However, this solution is not compatible with other applications that have really high loads.

A second option would be to patch mod_jk so that connections are dropped as soon as the response has been received from Tomcat. The drawbacks: it would prevent us from upgrading to new releases (unless we re-apply the modifications), it would introduce the risk of breaking something in this add-on, it would concentrate knowledge in the head of the person making the changes, and it would introduce yet another component for the prod people to know and manage. The overhead of creating a connection for each request is probably quite small but would need to be validated.

Finally, having our own web farm would be a solution. However, this goes against Production's master plan of having only one web farm for production.

Thank you all for your help!

Remy

Remy Gendron
Team Leader - Contingent
Taleo
T. 418.524.5665 x 1259
C. 418.809.8585
F. 418.524.8899
E. [EMAIL PROTECTED]
www.taleo.com
Talent Management Drives the Enterprise.
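
P.S. For anyone who wants to replay the connection arithmetic from this post, here is a rough Python sketch. It is only a back-of-the-envelope model, not a measurement. It assumes that each Apache child holds at most one AJP connection to a given Tomcat instance, which is what the 256 * 5 estimate implies. The function names are made up for illustration; only the numbers (MaxClients 256 or 16384, 100 spare servers, 5 web servers in production, 2 in UAT, the 1200 Tomcat limit) come from the post.

```python
# Back-of-the-envelope model of AJP connections reaching one Tomcat instance.
# Assumption (not verified): each Apache child process keeps at most one
# connection open to a given Tomcat. Function names are hypothetical.

TOMCAT_MAX_CONNECTIONS = 1200  # limit we configured on the AJP13 connector


def worst_case_connections(max_clients: int, web_servers: int) -> int:
    """Upper bound: every child on every web server is connected to this Tomcat."""
    return max_clients * web_servers


def idle_floor(max_spare_servers: int, web_servers: int) -> int:
    """Connections that stay open while idle: spare children never drop theirs."""
    return max_spare_servers * web_servers


if __name__ == "__main__":
    for max_clients in (256, 16384):
        total = worst_case_connections(max_clients, web_servers=5)
        status = "over" if total > TOMCAT_MAX_CONNECTIONS else "within"
        print(f"MaxClients={max_clients}: up to {total} connections "
              f"({status} the {TOMCAT_MAX_CONNECTIONS} limit)")

    # UAT: 2 web servers, plus Tomcat's own accept() thread
    print("UAT idle threads:", idle_floor(100, web_servers=2) + 1)
```

Under that assumption the bound is 1280 connections with MaxClients at 256 (hence the need to raise the 1200 limit slightly) and 81920 with MaxClients at 16384. The UAT line reproduces the 201 threads we observed.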