In the doc example, the server "primary-syslog.example.com" is always tried first, unless it's down, followed by "secondary-1-syslog.example.com", and then finally "secondary-2-syslog.example.com" only if both of the others are down. Here's the doc in question, BTW:
* http://wiki.rsyslog.com/index.php/FailoverSyslogServer

1. Is it possible to configure Rsyslog to pick randomly from a list of remote syslog destinations instead of following the listed order? If I have a large population of log-generating hosts, I'd like to be able to distribute the total load amongst two or more "central" log receivers, with failover if a receiver dies. Assuming that we have a sound random-selection implementation, and a large enough population of hosts generating approximately equal amounts of log data, I should get an approximately even distribution across my receiver hosts. In the event that a receiver dies, its clients would randomly try another receiver, keeping the load close to even. Is this possible with the existing failover mechanism?

2. How does Rsyslog handle a round-robin DNS hostname (i.e., a single name that resolves to multiple IP addresses), if that hostname is a remote log destination? Does it just call gethostbyname()?

3. If Rsyslog does just call gethostbyname(), how does it handle multiple occurrences of the same hostname in the config file? Does it call gethostbyname() once for each instance of a duplicate name, or does it perform a single lookup and use the same result for all?

4. When and how does the failover code check for remote destination failures? Does it re-check failed servers after they're declared "failed"? If so, does Rsyslog re-check a failed server every time it sends a log message, or does it have some kind of polling timeout, or is the check interval determined by something else entirely? When a server fails, how long does it take Rsyslog to notice the failure and start re-directing messages to the next host? When a server recovers, does Rsyslog automatically notice the recovery, and if so, how long does it take to start re-directing messages to the primary host?

In case anybody's wondering, I've included some examples of what I'm thinking of trying, here.
Assume that there are three (3) central log receiver hosts, each with a unique DNS name ('log-alfa', 'log-bravo', and 'log-charlie') that points to its own IP. Also, assume that the round-robin DNS name 'log' points to all three hosts' IP addresses, and that all of my log-generating hosts (the clients) are configured to pick a random IP when calling gethostbyname() on a round-robin name.

    *.* @@log
    $ActionExecOnlyWhenPreviousIsSuspended on
    & @@log
    & @@log
    & /var/log/localbuffer
    $ActionExecOnlyWhenPreviousIsSuspended off

But if Rsyslog resolves all three instances of the name 'log' with a single call to gethostbyname(), this won't work. I could probably hack around it with some additional round-robin entries similar to 'log', but named 'log-all-1', 'log-all-2', 'log-all-3', etc., to force an independent lookup attempt for each:

    *.* @@log
    $ActionExecOnlyWhenPreviousIsSuspended on
    & @@log-all-1
    & @@log-all-2
    & @@log-all-3
    & @@log-all-4
    & @@log-all-5
    & @@log-all-6
    ...
    & /var/log/localbuffer
    $ActionExecOnlyWhenPreviousIsSuspended off

But this is pretty messy. First, there are a lot of extra DNS records to maintain, which is a real pain. Second, I have to use a lot of extra round-robin names (definitely more than 3) to maximize the probability that gethostbyname() will return at least one working server before reaching the end of the list. (gethostbyname() isn't aware of downed hosts, nor does it enforce balance in its responses, so it might return the same bad server several times in a row.)

It would be nice to have a random variant of the '$ActionExecOnlyWhenPreviousIsSuspended' functionality. As a hypothetical example I just cooked up off the top of my head:

    *.* @@log-alfa
    $ActionExecPickRandom on
    $ActionExecPickRandomRetryWait 1
    $ActionExecPickRandomRetryLimit -1
    & @@log-bravo
    & @@log-charlie
    $ActionExecPickRandom off

where Rsyslog randomly selects one of log-alfa, log-bravo, and log-charlie initially, and then makes another random pick at failover, etc.
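To put rough numbers on the two ideas above, here is a minimal Python sketch. It is not Rsyslog code. It assumes each DNS lookup returns one of the receivers uniformly at random and independently of the other lookups (the client behavior assumed earlier), and it models the hypothetical random-failover idea as "try the receivers in a shuffled order", which is only one possible reading of it. The function names are made up for illustration.

```python
import random
from collections import Counter


def prob_all_lookups_dead(total, dead, lookups):
    """Chance that every one of `lookups` independent, uniformly random
    picks among `total` receivers lands on one of the `dead` ones."""
    return (dead / total) ** lookups


def pick_receiver(receivers, is_up, rng):
    """Try the receivers in a random order and return the first one
    that is up, or None if all of them are down."""
    for name in rng.sample(receivers, len(receivers)):
        if is_up(name):
            return name
    return None


def simulate(receivers, down, clients, seed=0):
    """Count how many of `clients` hosts end up on each receiver when
    every host makes its own random pick with failover."""
    rng = random.Random(seed)
    return Counter(
        pick_receiver(receivers, lambda r: r not in down, rng)
        for _ in range(clients)
    )


if __name__ == "__main__":
    receivers = ["log-alfa", "log-bravo", "log-charlie"]

    # One of three receivers down: odds that N lookups all hit it.
    for n in (3, 6):
        print(n, "lookups:", prob_all_lookups_dead(3, 1, n))

    # Load spread with all receivers up, then with log-bravo down.
    print(simulate(receivers, down=set(), clients=3000))
    print(simulate(receivers, down={"log-bravo"}, clients=3000))
```

Under these assumptions, with one of three receivers down, three independent lookups all hit the dead one with probability (1/3)^3 = 1/27, or about 3.7%; six lookups bring that down to 1/729, or about 0.14%. The simulation only illustrates the expected load spread across a large client population; it says nothing about how Rsyslog actually resolves names or detects failures.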
I can think of all sorts of nifty options and configurable knobs that would come in handy here, too. I'm planning to try out my first two examples tomorrow, to see what works. If anybody has any comments, I'd love to hear them.

Ryan B. Lynch
[email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

