Magnus Hedemark wrote:
David McDowell wrote:
One and the same? Here's my idea. I'd like to use CentOS 4 if
possible to do this. I would like to have my web server mirrored on
another machine so that if one goes down, the site continues to run.
If I change a config on one machine, the config should change on the
mirrored machine. Is this running a cluster, or is this some other
kind of setup? Basically I have some time at work to play. Any good
resources for this kind of information? I want two servers to be
identical mirrors of one another, so that if one of the two goes down,
I'm still online. And if I repair the broken one, it can resync
itself so that the two machines are identical mirrors again.
Suggestions, links, etc.?
Since CentOS 4 is, despite what RHAT's lawyers say, technically pretty much RHEL 4, you can follow the RHEL 4 admin docs to see how clustering works there. The way we're doing it at $WORK requires access to a LUN on a SAN that is unmasked to both servers, though, so I'm not quite sure how you would pull it off without some sort of shared external SCSI or SAN storage.
I wanted to comment on this thread yesterday, but time did not allow. As I've been looking into this a lot lately, I'll quickly try to summarize the "clustering" options I've discovered thus far. I'm primarily familiar with high-availability (HA) clustering, as opposed to compute-performance clustering. The latter is geared towards parallel processing of individual data chunks, shared out for processing over something like Myrinet, Infiniband, or even Gig-E. It requires dedicated software which understands how to divide up the processing task, and isn't really what you're after. Having gotten that out of the way, on to HA clustering...
For clustering web traffic, Ultramonkey has been mentioned already, and it's a very nice bundling of heartbeat and the other tools from the linux-ha project with LVS (Linux Virtual Server). I have seen it used with some success in production, doing pretty much what you describe. If you need a group of web servers to appear as "one" web server which never goes down, it fits nicely: it essentially allows you to take a pair of very simple, low-powered machines and make a failover pair out of them. On that failover pair you run LVS, which directs traffic to the web server "farm" behind the pair. This provides a very redundant system, in that either of the front-end machines can fail, or any of the back-end machines can fail, and traffic will continue to be served accordingly. It doesn't, however, solve the problem of dividing up your application, if it's not currently capable of running on more than one server...
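To make the LVS half of that concrete, here's a minimal sketch of the underlying ipvsadm commands. The addresses, the round-robin scheduler, and the direct-routing choice are just assumptions for illustration; in practice Ultramonkey drives this for you via ldirectord and heartbeat, but this is roughly what happens underneath:

    # On the active director: create a virtual HTTP service on the
    # failover pair's shared IP, scheduling requests round-robin.
    ipvsadm -A -t 192.168.1.100:80 -s rr
    # Add two real web servers behind it, using direct routing (-g).
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.11:80 -g
    ipvsadm -a -t 192.168.1.100:80 -r 192.168.1.12:80 -g

When the active director dies, heartbeat moves the shared IP to its partner, where ldirectord maintains an equivalent LVS table, and the real servers never notice.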
Consider something other than web services. Something like Cyrus IMAP, for example, is notoriously difficult to set up for high availability. The traditional way to handle the problem is to use one big box, make it as redundant as possible, and hope it doesn't go down. Anything more complicated involves a "murder" of IMAP servers (I'm not making that name up), with a specially designed redirector and a lot more machines. Essentially, this becomes a configuration and management nightmare, unless you *really* need to scale well beyond what one box can serve up anyway. With Cyrus, you can't even share the backend storage over NFS, because its locking doesn't play well with NFS. So, how do you provide good email services?

Enter DRBD. DRBD is short for Distributed Replicated Block Device, and it's exactly what it sounds like: a method for creating a block device in Linux, and then mirroring all changes to that block device to a redundant block device on another computer over the network (usually over a dedicated Gig-E link). You format this block device with your journaling filesystem of choice (after all, a disk is just a block device to Linux). You then combine it with the aforementioned linux-ha project (with heartbeat and IP failover) in such a fashion that when the first machine fails, the second machine Shoots The Other Node In The Head (STONITH) to ensure that it's off, then mounts the redundant copy of the filesystem, starts the services, takes over the IP address, and starts serving up traffic, all within about 10-15 seconds.

With this way of doing things, you can serve up redundant NFS, redundant Cyrus, redundant Jabber, or whatever you need. Even basic clustering of an Apache server works well in this case, if all you need is failover without load balancing. It also somewhat reduces the complexity of the picture, as you don't have LVS involved monkeying (no pun intended) with the packets (which is only required for scaling much beyond two machines).
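As a rough sketch of what the DRBD half looks like (the hostnames, partitions, and addresses are made up for illustration, and the syntax is from the DRBD 0.7 era that fits CentOS 4, so check the docs for your version):

    # /etc/drbd.conf -- one mirrored resource, identical on both nodes
    resource r0 {
      protocol C;                  # fully synchronous replication
      on node1 {
        device    /dev/drbd0;
        disk      /dev/sda5;       # local partition to mirror
        address   10.0.0.1:7788;   # dedicated Gig-E crossover link
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sda5;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

The failover half is then a one-line resource group in heartbeat's haresources file, again with invented names and paths:

    # /etc/ha.d/haresources -- node1 normally owns these resources:
    # make DRBD primary, mount it, take the service IP, start Cyrus.
    node1 drbddisk::r0 \
          Filesystem::/dev/drbd0::/var/spool/imap::ext3 \
          10.0.0.100 cyrus-imapd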
There is of course the classic way of clustering, involving external shared storage. It's very similar to the DRBD description above (in fact the story should be told in reverse, as DRBD is based on this idea), where you have an external source of shared storage, such as a SAN fabric or a shared SCSI disk subsystem. Instead of having a dedicated Gigabit link to mirror the data back and forth, it's simply stored on media which both machines have access to. The obvious problem here is that Brocade switches and Fibre Channel disk controllers / disk enclosures aren't cheap equipment. :) Even external SCSI arrays are usually a bit of overkill for the task at hand, not to mention equally expensive.
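Either way, heartbeat itself needs very little configuration. A minimal /etc/ha.d/ha.cf sketch, again with invented node names and assuming the heartbeat runs over a dedicated crossover cable on eth1:

    # /etc/ha.d/ha.cf -- identical on both nodes
    keepalive 2        # seconds between heartbeat packets
    deadtime 10        # declare the peer dead after 10s of silence
    bcast eth1         # heartbeat over the dedicated crossover link
    auto_failback on   # resources move home when node1 recovers
    node node1
    node node2

You'd still want a real STONITH device (a network power switch or similar) layered on top of this before trusting it with shared or mirrored storage, for the reasons described above.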
Anyway, hopefully this has been a nice summary of the clustering options that are available, and what purposes they are best suited for. I'm by no means authoritative (or even timely, in this case), but maybe it's a useful starting point for someone.
Aaron S. Joyner
