On Thu, Dec 10, 2009 at 08:25,  <[email protected]> wrote:
> On Wed, 9 Dec 2009, Ryan Lynch wrote:
>> I've learned something about DNS round-robin, in the past 24 hours:
>> Apparently, everybody hates it!
>
> actually, in my case it's more a matter that doing a DNS lookup is a
> fairly slow operation, so it's something to avoid if possible.

Makes sense to me.


>> In my initial response to David, I was a little ambiguous as to what
>> "central" log servers means: The hosts RECEIVING syslog messages are
>> all on different subnets, at different sites. I quoted the word
>> "central" because the logs are centralized, not the servers
>> themselves. Ideally, the servers are dispersed far and wide, for
>> better survivability. So ClusterIP might still play a role, but it
>> wouldn't be a drop-in replacement for a round-robin. I'm not sure,
>> yet, how much of the initial, shared-nothing design I want to
>> compromise--it's going to be a judgement call.
>
> unless you send copies of all your logs to all your servers do you really
> get the survivability that you need? (I don't know what you are trying
> for)

That's a very good question. If we divide the log data across a pool
of N log-receiving hosts, a catastrophic failure of one receiver would
blow away 1/N of our logs. And as with RAID striping, the probability
that *some* receiver fails increases as we add more of them: if each
receiver independently fails with probability p over a given period,
the chance of losing at least one receiver's share of the logs is
1 - (1 - p)^N, which grows with N. I can mitigate that risk by
building the receivers on redundant RAID arrays, using battery-backed
cache and reliable journaling filesystems, but the risk of a single
receiver losing data is still there. So it's not as simple as saying
"let's not put all our eggs in one basket", if losing even a single
egg is a problem. (Our logs aren't all quite that important, but we
don't want to ignore the issue completely.) I have a couple of ideas,
to start with.

My first idea is probably overkill: Secondary relaying between the
log-receiving hosts. I could configure the receivers' Rsyslog daemons
to write edge-server log data to local files, but then to also send a
copy to another receiver. The distribution of secondary relay targets
could be statically assigned, or I could imagine using another
clustering mechanism to handle failover conditions automatically.
We'd have to be careful of logging "loops", where two servers keep
sending the same log data back and forth to each other, but avoiding
that is pretty easy if you know to watch out for it. The real
problem, I think, is that we'd double the storage, network bandwidth,
disk bandwidth, and CPU time that we would otherwise need to handle a
single copy of the edge logs in real time.
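As a rough sketch, the double-write on each receiver could look
something like this in Rsyslog's legacy config syntax (the hostnames,
paths, and the choice of TCP are all invented for illustration, not
settled design):

```
# --- on hypothetical receiver "logrecv-a" ---

# Loop prevention: discard anything that came from the peer receiver,
# so the two boxes don't bounce the same messages back and forth.
:fromhost, isequal, "logrecv-b.example.com"    ~

# Write the edge-server logs to local disk...
*.*     /var/log/central/all.log

# ...and relay a second live copy to the peer over TCP ("@@").
*.*     @@logrecv-b.example.com:514
```

The discard rule has to come first, since rules are evaluated in
order; the mirror-image config on logrecv-b would filter on
logrecv-a's hostname instead.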

A second possibility is to periodically batch compress/copy each
receiver's local files to another receiver, or to some other backup
location. We still use some extra CPU, network bandwidth, and storage,
but batch processing and compression will probably increase the
per-message efficiency. I think we might actually use more disk IO,
though, because the batch processing needs a separate read operation
to grab the already-written files on disk (whereas real time
processing already has the message in memory when it makes the 2nd
copy). We could still lose some data to a receiver failure, from the
interval between batches, so it would be important to consider the
costs of data loss versus the lower efficiency with shorter intervals.
(It's worth noting, here, that this method could become *less*
efficient than real-time processing, if the interval gets too short.)

It might make sense to do both, in parallel, for different types of
data. Some of the logs (audit, auth, error) are low-volume, while
others (WWW access, mail info) generate stupid numbers of messages.
And our tolerance for data loss varies considerably: Nobody really
cares if we accidentally lose a fraction of our WWW access logs for
some random half-hour period, but some of the audit and auth logs
need stronger guarantees. With the right Rsyslog configurations, we
could double-relay the low-volume, high-value data, and either
batch-backup the rest, or just leave the most trivial data alone.
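In Rsyslog terms, that split might look roughly like this -- the
facility assignments (e.g. local5 for WWW access logs) and the peer
hostname are hypothetical:

```
# High-value, low-volume data: write locally AND relay a live copy.
auth,authpriv.*         /var/log/central/auth.log
auth,authpriv.*         @@logrecv-b.example.com:514

# High-volume, low-value data: local file only (the "-" prefix skips
# a sync after each write); the periodic batch job makes the copies.
mail.info               -/var/log/central/mail.log
local5.*                -/var/log/central/www-access.log
```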


>> RB suggested LVS as another possible alternative. I actually have some
>> firsthand experience with LVS, having built some WWW server farms on
>> it, and I love it--truly a great piece of software. But in this
>> situation, I wonder whether LVS really adds anything. I'd have to
>> dedicate at least 2 extra machines to load-balancing duties, and the
>> added complexity of the ipvsadm, CARP, etc. configs. The
>> dedicated/clustered LBs would have to reside on a single subnet,
>> though we could direct traffic to syslog receivers anywhere.
>
> the load balancers themselves would need to be in one place, but could
> then direct the traffic to multiple sites.
>
> would it make sense to have fairly local relay boxes (possibly in a HA
> pair) that the local servers send their logs to, then these relay boxes
> forward the logs to more central servers (possibly multiple such servers
> in multiple locations for survivability)?

I actually like the local relay idea, and we could probably configure
things to address the log backup copy issue. After all, we ought to
have at least one (hopefully two) receivers at each site (in case we
lose a site, or an inter-site link). It would make sense for clients
to default to using the closest (local) log receivers, if they're
available.
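Rsyslog's built-in failover support would cover that
"local-first" behavior on the clients without any extra cluster
machinery. A client-side sketch in legacy directive syntax, with
invented hostnames:

```
# Primary: the closest (site-local) receiver. Queue to disk if it's
# unreachable, and keep retrying it indefinitely.
$ActionQueueType LinkedList
$ActionQueueFileName fwdq
$ActionResumeRetryCount -1
*.*     @@logrecv-local.example.com:514

# Failover: only fires while the previous action is suspended,
# i.e. while the local receiver is down.
$ActionExecOnlyWhenPreviousIsSuspended on
*.*     @@logrecv-remote.example.com:514
$ActionExecOnlyWhenPreviousIsSuspended off
```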

I don't think there's a really simple "best" answer, here, and just
figuring out the optimal corner cases is going to take some thought.
There are a lot of possible combinations of LVS, ClusterIP, DNS
round-robin, and Rsyslog's own built-in failover mechanism, and right
now my brain feels like a tumble-dryer full of possibilities.

-Ryan
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com