Hi all,

Here's the situation as it stands today and what can be done to solve
it. I'll try to keep this short.

Running configuration:

*       Running on Red Hat Enterprise Linux 3
*       1 x F5 load balancer and hardware SSL box
*       5 x Apache 1.3.33/mod_jk 1.2.14
*       6 x JBoss 4.0.0/Tomcat 5.0.28 using the AJP13 connector
*       Oracle 9i

Our production environment hosts a number of applications, each with
different load and usage patterns. Our problem comes from the fact that
it is difficult to find a web farm configuration that will satisfy every
application. For reasons I will not explain here, we cannot have a
dedicated web farm for each application.

This is what we think is happening in our production environment, based
on tests run in UAT (User Acceptance Testing) and on the Apache and
Tomcat documentation. This is all pretty new to us, so if someone can
provide hard facts, they are more than welcome.

1.      The 1.3 generation of Apache web servers handles HTTP requests in
child processes. Each child can process only one HTTP request at a time.
2.      As the load on the web server increases, additional child
processes are spawned to serve requests concurrently. The maximum number
of child processes is set by the MaxClients directive. It defaults to 256
but has been raised to 16384 in production, and production really seems
to need it: with 256, our web servers were rejecting connections and
could not support the load generated by all of our clients.
3.      To reduce latency, Apache keeps up to 100 spare child processes
alive (MaxSpareServers). Spare means that they are not serving requests.
Once reached, that number of spare servers does not seem to decrease.
This matches our UAT tests, where 201 threads remain active in Tomcat:
100 spare children x 2 web servers, plus the accept() thread.
4.      If a request needs to be forwarded to Tomcat/JBoss (dynamic
pages), the mod_jk module in the child process opens a socket connection
to the AJP13 connector in Tomcat.
5.      Tomcat will accept the connection and create a thread to serve
it. Connections will be accepted up to a concurrent maximum of 1200.
This upper value has been set by us. 
6.      Tomcat will reject connections when the maximum is reached.
JBoss 4.0.0 has a known issue where the server will die when the maximum
is reached. This has been fixed in 4.0.2. 
7.      A connection can potentially be recycled by mod_jk
(recycle_timeout) if no activity occurs on it. However, any request to
Tomcat from any user whose session is bound to that Tomcat instance can
go through the connection, keeping it active. Recycling does not seem to
occur. We use a recycle_timeout value of 300.
8.      Because the production web servers can serve up to 16384
concurrent requests, a single web server can potentially open up to
16384 connections to a Tomcat instance and bring it down.
9.      Tomcat can then become overloaded with connections. If a valid
HTTP request comes through Apache and is routed to a child process that
has not yet connected to Tomcat, the connection attempt will fail if
Tomcat has already reached its limit of 1200 connections.
10.     In that case, mod_jk could potentially fail over to another
Tomcat. The user would, however, lose his session.
11.     The recycle_timeout and cache_size options are of no use to us
because too many web server children are created to serve the company's
load. Requests for our application can therefore take many different
routes, keeping all the connections alive.
12.     We tried very small recycle_timeout values (e.g. 20) with no
effect. netstat shows that the connections remain ESTABLISHED.
13.     MaxRequestsPerChild is set to 0 in PROD, which means that Apache
child processes never die unless the number of idle children exceeds
MaxSpareServers. Thus, at least 100 connections per web server always
remain connected to Tomcat. A value greater than 0 would at least
guarantee that each child process eventually dies, closing its Tomcat
connections and returning leaked memory to the OS.
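To check our reasoning on items 3, 8 and 13, here is a back-of-the-envelope
model. It assumes what we believe is happening: each Apache child that has
forwarded a request keeps one persistent AJP13 connection to a given
Tomcat, Tomcat uses one thread per connection, and there is one extra
accept() thread. The function name is ours; this is a sketch of our
assumptions, not how mod_jk or Tomcat actually count anything.

```python
def tomcat_threads(children_per_server: int, web_servers: int) -> int:
    """Estimated threads on ONE Tomcat instance (our assumption):
    one per persistent AJP13 connection, plus one accept() thread."""
    return children_per_server * web_servers + 1


TOMCAT_LIMIT = 1200  # our configured maximum of concurrent connections

# UAT: 100 spare children (MaxSpareServers) on each of 2 web servers.
uat = tomcat_threads(100, 2)

# PROD floor: the 100 spare children on each of the 5 web servers
# never die because MaxRequestsPerChild is 0.
prod_floor = tomcat_threads(100, 5)

# PROD worst case: all MaxClients=16384 children on all 5 web servers
# have connected to the same Tomcat instance.
prod_worst = tomcat_threads(16384, 5)

print(f"UAT:             {uat}")         # 201
print(f"PROD floor:      {prod_floor}")  # 501
print(f"PROD worst case: {prod_worst} (limit is {TOMCAT_LIMIT})")  # 81921
```

The UAT figure matches the 201 threads we observe, which is why we trust
the model; the PROD worst case is roughly 68 times our limit.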
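Items 9 and 10 can be illustrated with a toy sticky-session router. It
assumes, as in mod_jk's sticky sessions, that the session ID ends with the
jvmRoute of the Tomcat that owns it (e.g. ABC123.node1). The worker names
and pick_worker are hypothetical, and for simplicity we take the first
worker still accepting connections, whereas the real mod_jk load balancer
uses its own selection rules.

```python
def pick_worker(session_id: str, workers: list[str],
                accepting: dict[str, bool]) -> tuple[str, bool]:
    """Route a request; return (worker, session_kept).

    session_kept is False when the request lands on a Tomcat other than
    the one named by the session's route suffix (so the session is lost).
    """
    _, _, route = session_id.rpartition(".")
    if route in workers and accepting.get(route, False):
        return route, True  # sticky route still works
    # Fail over: first Tomcat that is still below its connection limit.
    for worker in workers:
        if accepting.get(worker, False):
            return worker, False
    raise RuntimeError("no Tomcat instance is accepting connections")


workers = ["node1", "node2"]
# Hypothetical: node1 has reached its 1200-connection limit.
print(pick_worker("ABC123.node1", workers, {"node1": False, "node2": True}))
# ('node2', False)
```

The request is still served, but by a Tomcat that does not hold the
user's session.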
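The argument in items 7, 11 and 12 is that a connection is only recycled
if it stays idle longer than recycle_timeout, and that with many children
sharing the load it never does. A rough model, assuming requests for our
application are spread evenly over the children of one web server (real
Apache scheduling is not that regular) and using made-up load figures;
the function names are ours:

```python
def idle_gap_seconds(children: int, requests_per_second: float) -> float:
    """Average time between two requests on one child's Tomcat connection,
    assuming requests are spread evenly over `children` children."""
    return children / requests_per_second


def would_recycle(children: int, requests_per_second: float,
                  recycle_timeout: float) -> bool:
    """True if the connection typically sits idle past recycle_timeout."""
    return idle_gap_seconds(children, requests_per_second) > recycle_timeout


# Hypothetical load: 100 children, 10 requests/s for our application.
for timeout in (20, 300):
    print(timeout, would_recycle(100, 10.0, timeout))
# 20 False
# 300 False
```

Under this model a connection is only recycled when the load for our
application is very light (0.1 requests/s would give a 1000-second gap),
which would be consistent with even recycle_timeout=20 having no effect.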

It's hard to see a path out of this one. One solution would be to reduce
Apache's MaxClients setting back to 256. A single Tomcat instance would
then never receive more than 256 * 5 = 1280 connections (5 being the web
farm size). Our current JVM settings (heap + thread stack sizes) would
allow this, although we would also need to raise our current limit of
1200 a bit. However, this solution is not compatible with other
applications that have really high loads.
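The arithmetic behind this option, with the same numbers (the variable
names are ours):

```python
MAX_CLIENTS = 256    # proposed Apache MaxClients per web server
WEB_SERVERS = 5      # web farm size
TOMCAT_LIMIT = 1200  # current Tomcat connection maximum

# Worst case: every child on every web server holds one connection
# to the same Tomcat instance.
worst_case = MAX_CLIENTS * WEB_SERVERS
shortfall = worst_case - TOMCAT_LIMIT

print(worst_case)  # 1280
print(shortfall)   # 80
```

So the Tomcat limit would have to go to at least 1280, plus some margin.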

The second option would be to patch mod_jk so that connections are
closed as soon as the response has been received from Tomcat. The
drawbacks: it would prevent us from upgrading to new releases (unless we
re-apply the modifications), introduce the risk of breaking something in
this add-on, concentrate knowledge in the head of the person making the
changes, and introduce yet another component for the production people
to know and manage. The overhead of opening a new connection for each
request is probably quite small, but would need to be validated.
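To compare the two behaviours, a rough model: with persistent
connections, a Tomcat holds one connection per Apache child that has ever
talked to it; if mod_jk closed the connection after each response, it
would only hold the requests currently in flight, which by Little's law
is the arrival rate times the average response time. The function names
and load figures below are hypothetical, not measurements.

```python
def persistent_connections(children_per_server: int, web_servers: int) -> int:
    """Connections held when each child keeps its AJP13 connection open."""
    return children_per_server * web_servers


def per_request_connections(requests_per_second: float,
                            avg_response_seconds: float) -> float:
    """Average open connections if each one is closed after its response
    (Little's law: in-flight = arrival rate * time in system)."""
    return requests_per_second * avg_response_seconds


print(persistent_connections(100, 5))      # 500, even when idle
print(per_request_connections(200, 0.25))  # 50.0
```

The price of the second behaviour is one TCP connect per request, which
is exactly the overhead we would need to measure.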

Finally, having our own web farm would be a solution. However, this goes
against the Production master plan of having only one web farm for

Thank you all for your help!


Remy Gendron
Team Leader - Contingent
T. 418.524.5665 x 1259
C. 418.809.8585
F. 418.524.8899 
Talent Management Drives the Enterprise.
