Rainer,

Thanks for the fast responses. I know a lot of this is speculative, and
therefore low-priority, so I really appreciate the attention to such a
niche item.

On Wed, Dec 9, 2009 at 04:53, Rainer Gerhards <[email protected]> wrote:
>> If so, does Rsyslog re-check a failed server every time it
>> sends a log message, or does it have some kind of polling timeout, or
>> is the check interval determined by something else entirely?
>
> It's an increasing interval (the longer it is down, the less frequently it is
> probed with some upper bound). I think (but not sure) you can configure the
> timers.

Is the timer configuration option documented somewhere? I don't
remember seeing it on the Wiki config sample page, but I might have
missed it in the docs. (If you don't know off the top of your head,
don't worry about it; I'll go check the source code.)
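
For reference, this is how I picture an "increasing interval with an
upper bound", as a minimal Python sketch. The 30-second base is taken
from your note further down; the growth rule, the 10-attempt step, and
the 1800-second cap are my own assumptions for illustration, not
Rsyslog's actual timers.

```python
def probe_interval(attempt, base=30, step=10, cap=1800):
    """Return the wait in seconds before probe number `attempt` (1-based).

    Assumed rule (hypothetical, not taken from the Rsyslog source): the
    wait grows by one `base` every `step` failed attempts, and it never
    exceeds `cap`.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    multiplier = (attempt - 1) // step + 1
    return min(base * multiplier, cap)


if __name__ == "__main__":
    # Print the assumed wait for a few probe numbers.
    for attempt in (1, 10, 11, 25, 1000):
        print(attempt, probe_interval(attempt))
```

If the real behavior looks anything like this, the recovery delay after
a long outage would be bounded by the cap rather than by the length of
the outage.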


>> When a
>> server fails, how long does it take Rsyslog to notice the failure and
>> start re-directing messages to the next host?
>
> As soon as the socket call returns an error, what may not be very soon.
>
>> When a server recovers,
>> does Rsyslog automatically notice the recovery, and if so, how long
>> does it take to start re-directing messages to the primary host?
>
> see above - it depends on how long it was down. The lower bound is, I think,
> 30 seconds. (Except that when a failure happens, one retry is done
> immediately).

By the way, does this apply to RELP connections, or just plain TCP syslog?


>> In case anybody's wondering, I've included some examples of what I'm
>> thinking of trying, here. Assume that there are three (3) central log
>> receiver hosts, each with a unique DNS name ('log-alfa',
>> 'log-bravo', and 'log-charlie') that points to its own IP. Also,
>> assume that the round-robin DNS name 'log' points to all three hosts'
>> IP addresses, and that all of my log-generating hosts (the clients)
>> are configured to pick a random IP when calling gethostbyname() on a
>> round-robin name.
>>
>>     *.* @@log
>>     $ActionExecOnlyWhenPreviousIsSuspended on
>>     & @@log
>>     & @@log
>>     & /var/log/localbuffer
>>     $ActionExecOnlyWhenPreviousIsSuspended off
>>
>> But if Rsyslog resolves all three instances of the name 'log' with a
>> single call to gethostbyname(), this won't work.
>
> Not with a single call, but you never know who else queries DNS in the mean
> time. So I'd say that configuration is at least unreliable. I strongly
> recommend against it.

Your recommendation is well taken--round-robin DNS comes with a lot of
"gotchas". This is a pretty special case, though, since I can verify
that the resolver actually caches all of the round-robin records, not
just a random pick. So each individual host that calls
gethostbyname() gets a random selection in response.

But that cache behavior is not guaranteed at all sites, which might
bite somebody else who wants to try this failover strategy.
Round-robin isn't a very well-implemented feature, especially on older
platforms, and in my experience, it's not terribly reliable unless you
have verified the operation of the client resolver, the resolver cache
server, and probably the authoritative server, too. (Which is usually
not the case, but I got lucky, here.)
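
For anyone who wants to check their own resolver before trying this,
here is a rough Python sketch of the kind of test I mean: ask the
resolver for every address behind a name, then pick one at random. The
function names and the default port 514 are made up for illustration,
and this only shows what the local resolver returns, not what Rsyslog
itself does with the answer.

```python
import random
import socket


def resolve_all(name, port=514):
    """Return the sorted, de-duplicated IPs the resolver reports for `name`."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})


def pick_random(addresses, rng=random):
    """Pick one address uniformly at random, as the clients are assumed to."""
    addresses = list(addresses)
    if not addresses:
        raise ValueError("no addresses to pick from")
    return rng.choice(addresses)
```

Calling resolve_all('log') on a client should list all three receiver
IPs if the resolver really hands back the full record set; if it only
ever returns one address, the round-robin approach is not going to
behave the way I described.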

In any event, I'm planning to monitor the client behavior closely, and
to keep an eye on the distribution of clients across the Rsyslog
servers. If the distribution won't stay in balance, I may have to go
back to the drawing board, anyway. I'll post about my experiences,
when I have some data.


>> It would be nice to have a random variant of the
>> '$ActionExecOnlyWhenPreviousIsSuspended' functionality. As a
>> hypothetical example I just cooked up off the top of my head:
>>
>>     *.* @@log-alfa
>>     $ActionExecPickRandom on
>>     $ActionExecPickRandomRetryWait 1
>>     $ActionExecPickRandomRetryLimit -1
>>     & @@log-bravo
>>     & @@log-charlie
>>     $ActionExecPickRandom off
>>
>> where Rsyslog randomly selects one of log-alfa, log-bravo, and
>> log-charlie, initially, and then makes another random pick at
>> failover, etc. I can think of all sorts of nifty options and
>> configurable knobs that would come in handy, here, too.
>>
>
> That's pretty interesting, but I think community demand is very low for this
> feature. So it is unlikely that I will find time to implement it in the
> foreseeable future (if others like that feature, too, please make yourself
> heard! It may influence priorities, what means that something else does not
> get done ;)).
>
> I'll provide more info on my schedule with a separate mail.
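
In case it helps anyone else judge the idea, here is a rough Python
sketch of the selection behavior I had in mind. It is only a model of
the hypothetical '$ActionExecPickRandom' directives quoted above, not
an existing Rsyslog feature, and it leaves out the retry wait and retry
limit knobs entirely. The class and method names are invented for the
example.

```python
import random


class RandomFailover:
    """Pick one target at random; on failure, pick again among the others.

    Hypothetical model of the proposed behavior, not Rsyslog code.
    """

    def __init__(self, targets, rng=None):
        if not targets:
            raise ValueError("need at least one target")
        self.targets = list(targets)
        self.rng = rng or random.Random()
        # Initial random pick, so clients spread across the receivers.
        self.current = self.rng.choice(self.targets)

    def fail_over(self):
        """Treat the current target as failed and pick a different one."""
        others = [t for t in self.targets if t != self.current]
        if others:
            self.current = self.rng.choice(others)
        return self.current
```

With the three hosts from my example, each client would start on a
random receiver and move to one of the other two at random whenever its
current receiver fails.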

I'm curious about the level of demand, but I suspect you're probably
right. As far as I can figure, this kind of
"clustered-failover-with-load-distribution" would only be worth a lot
of effort when you have (A) a large enough group of high-volume
log-generating hosts to overwhelm a single log-receiver (hence the
need for load-distribution), and (B) a strict no-loss requirement for
the log data (hence the need for failover). Most sites probably don't
have both those requirements.

But, then again, most of the sysadmin objections I hear to purely
centralized logging are vague, uninformed variations on the theme of
"That's not how I learned it, and it makes me uncomfortable".
Historically, there were good reasons not to trust networked syslog,
even over TCP. But Rsyslog is systematically dismantling those
barriers, with RELP, queues, and failover. I can imagine that as
first-hand sysadmin knowledge of these techniques spreads, we will see
an inflection point in the adoption rate curve, and it will eventually
become a standard practice.

But that's enough speculation. Hopefully, some other Rsyslog users
will be able to weigh in.

-Ryan
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
