On Thu, 10 Dec 2009, Ryan Lynch wrote:

> On Thu, Dec 10, 2009 at 08:25,  <[email protected]> wrote:
>> On Wed, 9 Dec 2009, Ryan Lynch wrote:
>
>>> In my initial response to David, I was a little ambiguous as to what
>>> "central" log servers means: The hosts RECEIVING syslog messages are
>>> all on different subnets, at different sites. I quoted the word
>>> "central" because the logs are centralized, not the servers
>>> themselves. Ideally, the servers are dispersed far and wide, for
>>> better survivability. So ClusterIP might still play a role, but it
>>> wouldn't be a drop-in replacement for a round-robin. I'm not sure,
>>> yet, how much of the initial, shared-nothing design I want to
>>> compromise--it's going to be a judgement call.
>>
>> unless you send copies of all your logs to all your servers do you really
>> get the survivability that you need? (I don't know what you are trying
>> for)
>
> That's a very good question. If we divide the log data across a pool
> of N log-receiving hosts, a catastrophic failure of one receiver would
> blow away (1/N) of our logs. And like RAID striping, the probability
> of a single-element failure increases as we add more receivers. I can
> mitigate that risk by building the receivers on redundant RAID arrays,
> using battery-backed cache and reliable journaling filesystems, but
> the risk of a single receiver losing data is still there. So it's not
> as simple as saying "let's not put all our eggs in one basket", if
> losing even a single egg is a problem. (Our logs aren't all quite that
> important, but we don't want to ignore the issue completely.) I have a
> couple of ideas, to start with.
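To put rough numbers on the striping comparison quoted above, here is a
minimal Python sketch. It assumes every receiver fails independently with
the same probability p over some period; p and the function names are
illustrative assumptions, not figures from this thread.

```python
def p_any_failure(n: int, p: float) -> float:
    """Chance that at least one of n receivers fails (independence assumed)."""
    return 1.0 - (1.0 - p) ** n


def fraction_lost_per_failure(n: int) -> float:
    """Share of the logs held by one receiver when data is split evenly."""
    return 1.0 / n


def expected_fraction_lost(n: int, p: float) -> float:
    """Expected share of logs lost: n receivers, each holding 1/n, each failing with p."""
    return n * fraction_lost_per_failure(n) * p


if __name__ == "__main__":
    p = 0.05  # hypothetical per-receiver failure probability
    for n in (1, 2, 5, 10):
        print(n, p_any_failure(n, p), fraction_lost_per_failure(n),
              expected_fraction_lost(n, p))
```

Under these assumptions, adding receivers makes some loss more likely and
each loss smaller, while the expected share of logs lost stays at p. That
is why splitting alone does not help when losing even a single egg is a
problem.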
one really interesting trick that you can pull with clusterIP is that you
can use the same IP on more than one cluster with UDP.

so assuming that you don't need load balancing and can use UDP for the
final hop, you could have two machines, both set to use clusterIP with a
node count of 1 and a node id of 1. both machines would see and process
all the logs. if one machine dies, the other still has the logs. as one
machine dies and is fixed, and later the other does the same, you end up
with all your logs there, but with neither machine having all of them.

I am doing something along these lines now. I have several clusters set
up, with each cluster using the same IP.

on one of these clusters I have a log searching application (splunk),
which needs many machines. so I have the 20 machines in this cluster set
up as 10 pairs of 2 machines. each pair has both machines configured to
receive the same 1/10 of the log data, which rsyslog then writes to disk.
when the logs roll, I configure the 'standby' box of each pair to throw
away its logs, and the 'active' box of each pair moves its logs both to
long-term storage on itself and to long-term storage on the standby box.

when a failover happens I may lose or duplicate a small amount of logs
(the log rotation isn't going to be at exactly the same instant on the
two machines), but this is close enough for my situation.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
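The pairing described in the message above can be sketched as a small
Python simulation. It only models the idea: every machine sees every UDP
packet sent to the shared IP and keeps the ones whose hash bucket matches
its own node number. CRC32 is a stand-in for whatever hash the kernel
actually uses, and the machine names and sender addresses are made up.

```python
import zlib


def bucket(source_ip: str, total_nodes: int) -> int:
    """Map a sender to a node number in 1..total_nodes (stand-in hash: CRC32)."""
    return zlib.crc32(source_ip.encode("ascii")) % total_nodes + 1


def receivers(source_ip, machines, total_nodes):
    """Names of the machines whose node number matches the sender's bucket."""
    want = bucket(source_ip, total_nodes)
    return [name for name, local_node in machines if local_node == want]


# Two machines, node count 1, both node id 1: each one keeps every message.
mirror = [("logA", 1), ("logB", 1)]

# 20 machines as 10 pairs: both members of a pair share one node number,
# so each pair keeps the same 1/10 of the senders.
pairs = [(f"log{2 * k + i:02d}", k + 1) for k in range(10) for i in (1, 2)]

if __name__ == "__main__":
    for ip in ("192.0.2.7", "192.0.2.8", "198.51.100.20"):
        print(ip, receivers(ip, mirror, 1), receivers(ip, pairs, 10))
```

Because both members of a pair carry the same node number, losing one
machine leaves its partner still receiving that pair's share of the
traffic, which is the property the active/standby rotation relies on.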

