[EMAIL PROTECTED] wrote:
Hi,
 
Hi, how's life back over that side of the pond? (I'm from Edinburgh myself)
 
I am looking to use Apache and Tomcat in my company's production web site.
Before I can convince management that this is a good idea I need some
information so I am confident.  If anyone can help I'll be very grateful.
We have used Apache+JServ in anger for some time now, and it is good stuff. We have load tested it to the limit, and it both degrades and recovers very gracefully. My only concern would be how robust the Tomcat code base is by comparison, given its youth.
 

First of all I'll give you a picture of the overall architecture that I want to
create:

1)  For fault-tolerance and scaleability we want to have several instances of
Apache running over a number of machines.  These will be load balanced by a pair
of Cisco Local Director boxes, because we already own a couple of them.  From a
brief look through the operating manual on Cisco's web site I get the impression
that Local Director cannot support "sticky" load-balancing.  Please correct me
if I'm wrong.

Most of these products do support sticky load balancing, and I am pretty sure that LocalDirector does. Beware - some of the cheaper systems do it with cookies, and so depend on the browser accepting them. The better ones use a hashtable of client IP's, much like a network switch does with MAC addresses.
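As a toy illustration of that technique (invented names and addresses; this is not LocalDirector's actual algorithm, just the general hash-the-client idea):

    // Toy sketch of IP-hash stickiness: the same client address always
    // maps to the same backend, with no cookies involved.
    public class StickyHash {
        public static void main(String[] args) {
            String[] backends = { "web1", "web2", "web3" };       // invented names
            String[] clients = { "10.0.0.1", "10.0.0.2", "10.0.0.1" };
            for (int i = 0; i < clients.length; i++) {
                // mask off the sign bit so the index is never negative
                int index = (clients[i].hashCode() & 0x7fffffff) % backends.length;
                System.out.println(clients[i] + " -> " + backends[index]);
            }
        }
    }

A real box would hash the raw 32-bit address and age entries out, but the routing property is the same: the client needs no cookie and keeps no state.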

We don't use one of these; we just use round-robin DNS across web servers with real IP's, and so far that suffices to get pretty even load balancing. The problem with a single LocalDirector is that it's a single point of failure; it could also in theory be a performance bottleneck if you have a LOT of traffic, so twin LD's sound better.

Even with a fair amount of our app being served via SSL, the web server load is pretty small.

One minus point of JServ vs. a commercial appserver like BEA WebLogic is that JServ relies on random distribution of new sessions for load balancing, whereas WebLogic will react dynamically to appserver loads. You could of course implement load management externally by using the jserv_shm file to throttle new session allocations.

 

2)  We will run several instances of Tomcat (version 3.2.1 most likely) on our
back-end servers.  These are a couple of Sun E10000s with 32 processors each.
These are partitioned up into domains, so that we end up with 12 processors on
each machine dedicated to running Tomcat.  Each domain has 2GB of RAM.  There
are other domains dedicated to databases, etc...

That's pretty serious iron (speaking as a former Cray sysadmin :-)

We just use a bunch of twin Pentium rackmount boxes running Linux for the appservers, and use Oracle on Sun's for the DB only. E10k's are pretty reliable, but I'd hate to lose half the appserver capacity in one go.

Of course, I'm a Scotsman, and your hardware probably cost more than we raised in our first venture capital round :-)

 

3)  Each instance of Apache will be using mod_jk in a "sticky" load-balanced
configuration.  Every instance of Apache (and so mod_jk) will have workers
defined for every instance of Tomcat on both E10000s.  In other words, a Tomcat
instance may receive a request from any instance of Apache.

The things I need to know :)

1)  Each request to our site will be assigned seemingly at random to one of the
Apache instances by the Cisco boxes.  Can the "sticky" part of mod_jk cope with
this?  i.e.  Will mod_jk pass the request to the correct Tomcat instance even
though it does not share context information with the other Apache instances?

Yes (at least, it works this way in mod_jserv), as long as they all use the same route labels.
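For concreteness, something along these lines in workers.properties on every Apache box (hostnames and ports are invented, and you should check the mod_jk workers how-to for the exact syntax in your release):

    # Identical on every Apache instance
    worker.list=loadbalancer

    worker.tomcat1.type=ajp13
    worker.tomcat1.host=e10k-a
    worker.tomcat1.port=8009
    worker.tomcat1.lbfactor=1

    worker.tomcat2.type=ajp13
    worker.tomcat2.host=e10k-b
    worker.tomcat2.port=8009
    worker.tomcat2.lbfactor=1

    worker.loadbalancer.type=lb
    worker.loadbalancer.balanced_workers=tomcat1,tomcat2

The stickiness works because each Tomcat appends its route label (its worker name) to the session id, and the lb worker sends follow-up requests back to the matching worker. I believe you set the route on each Tomcat instance to match its worker name (the jvmRoute setting in later versions); again, check the docs for 3.2.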
 

2)  What is the best way of running Tomcat on the large E10000s?  Should I have
just a few very large processes (say 2 x 800MB max heap) for Tomcat or should I
run many smaller processes (say 16 x 100MB max heap)?  Has anyone got any
experience with making Tomcat (or any Java based server) scale to machines of
this size and power?  I want to get good performance, but I don't want to
compromise stability.

Java VM's can have trouble garbage collecting efficiently with very large heaps. The conventional wisdom in the Java 1.1 days was to run appservers in green threads VM's, which meant one VM per CPU (minimum). Solaris nowadays has wonderful threading, so native threads in Java are very usable. Items to consider, in no particular order:

- what are the characteristics of your app? What is the memory footprint of a session? Are requests little or large (in terms of server CPU resources)? What rate will a typical user make requests at? (the public on the web will have a much slower pageview rate than telesales users on the LAN who know their app like the back of their hand). How many sessions will be active at once? Lots of light users and lots of little requests -> more, smaller VMs; a few power users -> fewer, larger VMs

- number of filehandles and I/O sockets each VM can (efficiently) handle; we found this was a limit on Linux with mod_jserv, where each request uses a dedicated TCP connection. Linux has a default limit of 1024 file handles per process, which we have increased (kernel rebuild). I know mod_jk re-uses connections, but I don't know if it multiplexes them.

- how much Java code or data is common and shared across all sessions (at minimum, Tomcat and your code will be)? If you have some, you'll have a copy in each VM. We cache database tables in the VM, so we have this replication overhead issue - the site-wide caches exist in each VM.

- can a single big VM efficiently manage that amount of RAM? Java 1.1 was horrendous at garbage collection; VM's would stall for *seconds* at a time. In Java 1.3 it's much more dynamic, but I'd fear what happens if it hits 700MB and decides to collect 600MB of dead objects in one pass :-) A background thread which wakes up every few seconds and calls System.gc() can mitigate this of course. Try a test program that just news and dereferences a ton of objects while making periodic calls to the system clock, and see if you get any large variation in the gap between clock values (a rough sketch appears after this list).

(Side question - what is the heap limit in Java 1.3? It used to be 256MB in 1.1)

- too many small VM's may thrash the CPU caches. On UltraSparcs they are larger (2 or 4MB) than on an Intel CPU (512KB) but it's still a consideration.

- too few large VM's will not efficiently use the CPU's. Under JDK 1.2 the maximum number of CPUs one VM would use efficiently on Solaris was five (no, I have no idea why not six or four). Don't know what the scaling is like for 1.3.

- how does the request marshalling in Tomcat itself scale? Most appservers keep the session table in a java.util.Hashtable for sync protection; will this cause the request dispatch to be a bottleneck in a very large VM? (there's a toy demo of the contention after this list)

- most important - make sure the heaps are sized so that the VM heap memory is never swapped by the OS; that means the total of the -mx sizes should be a chunk less than the system memory, since you want to allow for the OS, system services, the VM code and data segments and a bit of disk buffer cache.

If I had to haul a number out of thin air for a 12-CPU / 2GB system, I'd try starting with something like 8 VM's at -mx192M each, or 12 at -mx128M, and seeing how they run.
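(Sanity-checking that arithmetic: 8 x 192MB = 1536MB, and 12 x 128MB is also 1536MB, so either way roughly 512MB of the 2GB is left over for the OS, system services, the VM code and data segments and a bit of buffer cache, which fits the sizing rule above.)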
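The GC experiment suggested above might look something like this (a minimal probe rather than a proper benchmark; the 1KB object size and the 200ms threshold are plucked from the air):

    // Allocate and drop objects as fast as possible; any large gap
    // between consecutive clock samples is (mostly) GC stall time.
    public class GcPauseProbe {
        public static void main(String[] args) {
            long last = System.currentTimeMillis();
            long sum = 0;
            for (int i = 0; i < 100000000; i++) {
                byte[] junk = new byte[1024];  // becomes garbage on the next iteration
                sum += junk[0];                // touch it so it can't be optimised away
                if (i % 10000 == 0) {
                    long now = System.currentTimeMillis();
                    if (now - last > 200) {    // arbitrary "noticeable stall" threshold
                        System.out.println("stall of " + (now - last) + "ms at i=" + i);
                    }
                    last = now;
                }
            }
            System.out.println("done (" + sum + ")");
        }
    }

Run it with the same -mx you plan to use in production and watch how the stall lengths change as the heap grows.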
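And for the Hashtable point, a toy you can use to measure the ceiling (names and iteration counts invented; the key thing is that every get() below takes the same monitor, just as a shared session table would):

    import java.util.Hashtable;

    // 16 threads hammering one Hashtable: all lookups serialise on its lock.
    public class SessionTableContention {
        public static void main(String[] args) throws InterruptedException {
            final Hashtable sessions = new Hashtable();
            for (int i = 0; i < 10000; i++) {
                sessions.put("session" + i, new Object());
            }
            Thread[] workers = new Thread[16];
            long start = System.currentTimeMillis();
            for (int t = 0; t < workers.length; t++) {
                workers[t] = new Thread() {
                    public void run() {
                        for (int i = 0; i < 1000000; i++) {
                            sessions.get("session" + (i % 10000));  // synchronized
                        }
                    }
                };
                workers[t].start();
            }
            for (int t = 0; t < workers.length; t++) {
                workers[t].join();
            }
            System.out.println((System.currentTimeMillis() - start) + "ms");
        }
    }

Compare the wall time against a single thread doing the same total number of lookups; the difference is roughly your serialisation cost.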

 

3)  Has anyone used Tomcat in a mission critical environment?  I use it in
development all the time and I'm very happy with it, but I'd like to get some
impression of how it holds up.

Can't speak for Tomcat, but JServ is very solid.

I have also used ATG Dynamo and found it good, and heard good things about Websphere and Weblogic.

JWS is slow and is for development use only (Sun don't use it in anger themselves; I have deployed stuff on ATG for them).

JRun I have experienced issues with in the past, but that was on Windows :-)

 

4)  Are there any security considerations with this configuration?  We have
firewalls, intrusion detection kit, etc... for which we have a dedicated
maintenance team.  I'm really only interested in the software here.  Has anyone
used the new Java 2 security features with Tomcat?

As per any web application - validate all your input, don't trust the browser. Keep any security flags or state you need to rely on in the session data on the server side (a sketch follows below).
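For instance (a minimal sketch; the servlet and attribute names are invented, but the HttpSession calls are standard Servlet API):

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.*;

    public class CheckoutServlet extends HttpServlet {
        public void doPost(HttpServletRequest req, HttpServletResponse res)
                throws ServletException, IOException {
            // Trust only state the server put in the session itself; a hidden
            // form field or cookie saying "authenticated=true" can be forged.
            HttpSession session = req.getSession(false);
            if (session == null || session.getAttribute("authenticated") == null) {
                res.sendError(HttpServletResponse.SC_FORBIDDEN);
                return;
            }
            // ...validate every request parameter before using it...
        }
    }

The flag is set once by your login code after it has checked the credentials; nothing the browser sends back is taken on trust.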
 

Thank you very much indeed!

Cheerio, Woody.
