Rainer,

Thanks for the fast responses. I know a lot of this is speculative, and therefore low-priority, so I really appreciate the attention to such a niche item.

On Wed, Dec 9, 2009 at 04:53, Rainer Gerhards <[email protected]> wrote:
>> If so, does Rsyslog re-check a failed server every time it
>> sends a log message, or does it have some kind of polling timeout, or
>> is the check interval determined by something else entirely?
>
> It's an increasing interval (the longer it is down, the less frequently it is
> probed with some upper bound). I think (but not sure) you can configure the
> timers.

Is the timer configuration option documented somewhere? I don't remember seeing it on the Wiki config sample page, but I might have missed it, in the docs. (If you don't know off the top of your head, don't worry about it, I'll go check the source code.)

>> When a
>> server fails, how long does it take Rsyslog to notice the failure and
>> start re-directing messages to the next host?
>
> As soon as the socket call returns an error, what may not be very soon.
>
>> When a server recovers,
>> does Rsyslog automatically notice the recovery, and if so, how long
>> does it take to start re-directing messages to the primary host?
>
> see above - it depends on how long it was down. The lower bound is, I think,
> 30 seconds. (Except that when a failure happens, one retry is done
> immediately).

By the way, does this apply to RELP connections, or just plain TCP syslog?

>> In case anybody's wondering, I've include some examples of what I'm
>> thinking of trying, here. Assume that there are three (3) central log
>> receiver hosts, with each with a unique DNS name ('log-alfa',
>> 'log-bravo', and 'log-charlie') that points to its own IP. Also,
>> assume that the round-robin DNS name 'log' points to all three hosts'
>> IP addresses, and that all of my log-generating hosts (the clients)
>> are configured to pick a random IP when calling gethostbyname() on a
>> round-robin name.
>>
>> *.* @@log
>> $ActionExecOnlyWhenPreviousIsSuspended on
>> & @@log
>> & @@log
>> & /var/log/localbuffer
>> $ActionExecOnlyWhenPreviousIsSuspended off
>>
>> But if Rsyslog resolves all three instances of the name 'log' with a
>> single call to gethostbyname(), this won't work.
>
> Not with a single call, but you never know who else queries DNS in the mean
> time. So I'd say that configuration is at least unreliable. I strongly
> recommend against it.

Your recommendation is well taken--round-robin DNS comes with a lot of "gotchas". This is a pretty special case, though, since I can verify that the resolver actually caches all of the round-robin records, not just a random pick. So each individual host that calls gethostbyname() gets a random selection in response. But that cache behavior is not guaranteed at all sites, which might bite somebody else who wants to try this failover strategy.

Round-robin isn't a very well-implemented feature, especially on older platforms, and in my experience, it's not terribly reliable unless you have verified the operation of the client resolver, the resolver cache server, and probably the authoritative server, too. (Which is usually not the case, but I got lucky, here.)

In any event, I'm planning to monitor the client behavior pretty closely, and to keep a close eye on the distribution of clients across the Rsyslog servers. If it won't stay in balance, I may have to go back to the drawing board, anyway. I'll post about my experiences, when I have some data.

>> It would nice to have a random variant of the
>> '$ActionExecOnlyWhenPreviousIsSuspended' functionality.
>> As a
>> hypothetical example I just cooked up off the top of my head:
>>
>> *.* @@log-alfa
>> $ActionExecPickRandom on
>> $ActionExecPickRandomRetryWait 1
>> $ActionExecPickRandomRetryLimit -1
>> & @@log-bravo
>> & @@log-charlie
>> $ActionExecPickRandom off
>>
>> where Rsyslog randomly selects one of log-alfa, log-bravo, and
>> log-charlie, initially, and then makes another random pick at
>> failover, etc. I can think of all sorts of nifty options and
>> configurable knobs that would come in handy, here, too.
>>
>
> That's pretty interesting, but I think community demand is very low for this
> feature. So it is unlikely that I will find time to implement it in the
> foreseeable future (if others like that feature, too, please make yourself
> heard! It may influence priorities, what means that something else does not
> get done ;)).
>
> I'll provide more info on my schedule with a separate mail.

I'm curious about the level of demand, but I suspect you're probably right. As far as I can figure, this kind of "clustered-failover-with-load-distribution" would only be worth a lot of effort when you have (A) a large enough group of high-volume log-generating hosts to overwhelm a single log-receiver (hence the need for load-distribution), and (B) a strict no-loss requirement for the log data (hence the need for failover). Most sites probably don't have both those requirements.

But, then again, most of the sysadmin objections I hear to purely centralized logging are vague, uninformed variations on the theme of "That's not how I learned it, and it makes me uncomfortable". Historically, there were good reasons not to trust networked syslog, even over TCP. But Rsyslog is systematically dismantling those barriers, with RELP, queues, and failover. I can imagine that as first-hand sysadmin knowledge of these techniques spreads, we will see an inflection point in the adoption rate curve, and it will eventually become a standard practice.

But that's enough speculation.
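
For illustration only, the random-pick selection described above can be sketched in a few lines of Python. This is not Rsyslog code, and the `$ActionExecPickRandom*` directives remain purely hypothetical; the names `pick_destination` and `is_up` are made up for this sketch, and `is_up` stands in for whatever health signal a sender really has (for example, the result of the last socket call).

```python
import random

HOSTS = ["log-alfa", "log-bravo", "log-charlie"]
LOCAL_BUFFER = "/var/log/localbuffer"


def pick_destination(hosts, is_up, rng=None, fallback=LOCAL_BUFFER):
    """Pick a random reachable host, or the local buffer file if none is up.

    `is_up` is a hypothetical callable (host -> bool); it is an assumption
    of this sketch, not something Rsyslog exposes.
    """
    rng = rng or random.Random()
    candidates = list(hosts)      # copy, so the caller's list is untouched
    rng.shuffle(candidates)       # random order spreads clients across hosts
    for host in candidates:
        if is_up(host):
            return host
    return fallback               # every receiver is suspended
```

Calling `pick_destination(HOSTS, lambda h: h != "log-alfa")` would return either log-bravo or log-charlie, and calling it again at failover time gives the "another random pick" behavior. Retry waits and retry limits are deliberately left out.
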
Hopefully, some other Rsyslog users will be able to weigh in.

-Ryan
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

