On Thu, Dec 10, 2009 at 08:25, <[email protected]> wrote:
> On Wed, 9 Dec 2009, Ryan Lynch wrote:
>> I've learned something about DNS round-robin, in the past 24 hours:
>> Apparently, everybody hates it!
>
> actually, in my case it's more a matter that doing a DNS lookup is a
> fairly slow operation, so it's something to avoid if possible.
Makes sense to me.

>> In my initial response to David, I was a little ambiguous as to what
>> "central" log servers means: The hosts RECEIVING syslog messages are
>> all on different subnets, at different sites. I quoted the word
>> "central" because the logs are centralized, not the servers
>> themselves. Ideally, the servers are dispersed far and wide, for
>> better survivability. So ClusterIP might still play a role, but it
>> wouldn't be a drop-in replacement for a round-robin. I'm not sure,
>> yet, how much of the initial, shared-nothing design I want to
>> compromise--it's going to be a judgement call.
>
> unless you send copies of all your logs to all your servers do you really
> get the survivability that you need? (I don't know what you are trying
> for)

That's a very good question. If we divide the log data across a pool of
N log-receiving hosts, a catastrophic failure of one receiver would blow
away 1/N of our logs. And as with RAID striping, the probability of a
single-element failure increases as we add more receivers. I can
mitigate that risk by building the receivers on redundant RAID arrays,
using battery-backed cache and reliable journaling filesystems, but the
risk of a single receiver losing data is still there. So it's not as
simple as saying "let's not put all our eggs in one basket", if losing
even a single egg is a problem. (Our logs aren't all quite that
important, but we don't want to ignore the issue completely.)

I have a couple of ideas, to start with. My first idea is probably
overkill: secondary relaying between the log-receiving hosts. I could
configure the receivers' Rsyslog daemons to write edge-server log data
to a local file, but also to send a copy to another receiver. The
distribution of secondary relay targets could be statically assigned, or
I could imagine using another clustering mechanism to handle failover
conditions automatically.
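For concreteness, the secondary-relay idea might look something like the
sketch below on a receiver, in legacy Rsyslog directive syntax. The peer
address, port, and file path are all hypothetical, and the loop problem
is handled with a property-based filter on the message's source:

```
# Receiver-side sketch (legacy rsyslog syntax); the peer address
# 10.0.0.2, the port, and the file path are hypothetical.

# Write everything we receive -- from edge clients and from our
# peer receiver -- to local disk.
*.* /var/log/central/messages

# Also relay a copy over TCP (@@ = TCP, @ = UDP), but only for
# messages that did NOT come from the peer itself, so two
# receivers pointed at each other don't bounce the same messages
# back and forth forever.
:fromhost-ip, !isequal, "10.0.0.2" @@10.0.0.2:514
```

Statically pairing the receivers this way is the simple case; swapping
the hard-coded peer for a clustered address is where the failover
mechanisms discussed above would come in.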
We'd have to be careful of logging "loops", where two servers keep
sending the same log data back and forth to each other, but that's
pretty easy to avoid if you know to watch out for it. The real problem,
I think, is doubling the amount of storage, network bandwidth, disk
bandwidth, and CPU time that we would otherwise need to handle a single
copy of the edge logs in real time.

A second possibility is to periodically batch-compress/copy each
receiver's local files to another receiver, or to some other backup
location. We still use some extra CPU, network bandwidth, and storage,
but batch processing and compression will probably increase the
per-message efficiency. I think we might actually use more disk I/O,
though, because the batch processing needs a separate read operation to
grab the already-written files on disk (whereas real-time processing
already has the message in memory when it makes the second copy). We
could still lose some data to a receiver failure, from the interval
between batches, so it would be important to weigh the cost of data
loss against the lower efficiency of shorter intervals. (It's worth
noting, here, that this method could become *less* efficient than
real-time processing, if the interval gets too short.)

It might make sense to do both, in parallel, for different types of
data. Some of the logs (audit, auth, error) are low-volume, while
others (WWW access, mail info) generate stupid numbers of messages. And
our tolerance for data loss varies considerably: Nobody really cares if
we accidentally lose a fraction of our WWW access logs for some random
half-hour period, but some of the audit and auth logs need to be
guaranteed a little better. With the right Rsyslog configurations, we
could double-relay the low-volume, high-value data, and either
batch-backup the rest, or even just leave it be, for the most trivial
data.

>> RB suggested LVS. I actually have some firsthand experience with LVS,
>> having built some WWW server farms on it, and I love it--truly a
>> great piece of software. But in this situation, I wonder whether LVS
>> really adds anything. I'd have to dedicate at least 2 extra machines
>> to load-balancing duties, and take on the added complexity of the
>> ipvsadm, CARP, etc. configs. The dedicated/clustered LBs would have
>> to reside on a single subnet, though we could direct traffic to
>> syslog receivers anywhere.
>
> the load balancers themselves would need to be in one place, but could
> then direct the traffic to multiple sites.
>
> would it make sense to have fairly local relay boxes (possibly in a HA
> pair) that the local servers send their logs to, then these relay boxes
> forward the logs to more central servers (possibly multiple such servers
> in multiple locations for survivability)?

I actually like the local relay idea, and we could probably configure
things to address the log backup copy issue. After all, we ought to
have at least one (hopefully two) receivers at each site (in case we
lose a site, or an inter-site link). It would make sense for clients to
default to using the closest (local) log receivers, if they're
available.

I don't think there's a really simple "best" answer, here, and just
figuring out the optimal corner cases is going to take some thought.
There are a lot of possible combinations of LVS, ClusterIP, DNS
round-robin, and Rsyslog's own built-in failover mechanism, and right
now my brain feels like a tumble-dryer full of possibilities.

-Ryan
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
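P.S. The client side of the local-relay idea, prefer the closest
receiver and only fall back when it's unreachable, maps onto Rsyslog's
built-in failover mechanism roughly like the sketch below (legacy
directive syntax; both hostnames are hypothetical):

```
# Client-side sketch (legacy rsyslog syntax; hostnames are
# hypothetical): send to the local site's relay by default, and
# only forward to a remote receiver while the previous action is
# suspended (i.e. the local TCP connection is down).
*.* @@relay-local.example.com:514

$ActionExecOnlyWhenPreviousIsSuspended on
*.* @@relay-remote.example.com:514
$ActionExecOnlyWhenPreviousIsSuspended off
```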

