UDP is very reliable over a local network, and introducing the reliability of
TCP can actually backfire on you, because it means that problems on the consumer
propagate back. The question to ask is:
If something goes wrong with my logging, would I rather have everything stop,
or lose logs?
There are valid cases for having things stop, and you can also decide that you
can introduce enough buffering (with the rsyslog queues) that you can react
after a problem starts but before it gets to the point of halting systems.
I like to keep UDP syslog in the system at a couple of points to provide the
'slip' for when (not if) things go wrong.
Adding more reliability in each component doesn't always result in more
reliability for the system as a whole :-)
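To make the "stop or lose logs" trade-off concrete, here is a toy Python model, not rsyslog's actual queue code: a bounded buffer whose consumer has stalled, fed either UDP-style (drop when full) or TCP-style (block when full). The function name and the 0.05 s timeout standing in for "blocked forever" are illustrative assumptions.

```python
import queue


def send_with_stalled_consumer(messages, capacity, policy):
    """Push messages into a bounded buffer whose consumer has stopped reading.

    policy="drop":  UDP-like, discard when the buffer is full (lose logs)
    policy="block": TCP-like backpressure, the sender waits (everything stops)
    Returns (status, messages_buffered, messages_dropped).
    """
    buf = queue.Queue(maxsize=capacity)
    dropped = 0
    for m in messages:
        if policy == "drop":
            try:
                buf.put_nowait(m)
            except queue.Full:
                dropped += 1
        else:
            try:
                # A short timeout stands in for "blocked forever" so the demo ends.
                buf.put(m, timeout=0.05)
            except queue.Full:
                return ("sender stalled", buf.qsize(), dropped)
    return ("ok", buf.qsize(), dropped)
```

With a 10-message buffer, the drop policy keeps running but loses 90 of 100 messages, while the block policy halts the sender. A 1000-message buffer absorbs the same burst under either policy, which is the "enough buffering to react before halting" option.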
David Lang
On Wed, 23 Oct 2013, Robert J. McIntyre wrote:
Thanks for your suggestion... Definitely looking at a hardware/software RR
load balancer. In the meantime, also looking for other solutions using
existing technologies. Also looking at shifting to TCP and using rsyslog's
queuing. That would be my preferred approach for several reasons
(reliability, etc.).
Thanks!
Robert
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Lang
Sent: Wednesday, October 23, 2013 4:56 PM
To: rsyslog-users
Subject: Re: [rsyslog] Another approach to action load balancing
So are you sending the logs to these destinations via UDP syslog? If so, you
can put a round-robin load balancer between your syslog relay and the
destination, set rebindinterval to 1000 (or 10,000), and the load will be
spread evenly.
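A rough simulation of that setup, assuming the sender opens a new binding every `rebind_interval` messages and the balancer hands each new binding to the next destination in strict rotation. The destination names are hypothetical, and a real balancer's flow tracking may behave differently.

```python
from collections import Counter
from itertools import cycle


def spread(total_msgs, rebind_interval, destinations):
    """Count messages per destination when the sender rebinds every
    rebind_interval messages and a round-robin balancer assigns each new
    binding to the next destination in turn."""
    rotation = cycle(destinations)
    counts = Counter()
    current = None
    for i in range(total_msgs):
        if i % rebind_interval == 0:
            current = next(rotation)  # new binding -> next destination
        counts[current] += 1
    return dict(counts)
```

With 9000 messages and an interval of 1000, each of three destinations gets 3000; when the total is not a multiple of the interval, one destination ends up at most one interval ahead.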
The thing I would be worried about from your description is that these
systems may lose messages that arrive too fast, even if only 1000 arrive in
too short a period (as an example).
Several years ago I had an AIX server that, if it was sent too many TCP
connections, would ack the connection to the sender and then forget about
it. The firewalls between sender and receiver (which did not lose track of
the connection) overloaded and crashed. If the firewall had not crashed,
the senders would eventually have crashed (and in any case, they made no
progress servicing customers).
So I've seen cases where things react really badly to high volumes of
connections in a short time.
David Lang
On Wed, 23 Oct 2013, Robert McIntyre wrote:
Great questions, David. In a nutshell, it's all about load-balancing
the destinations. Say (for example) that my downstream recipients
have a cap of 20K messages per second; I'm trying to respect those.
There's some question (I'm following up on it) whether it would be OK
if I were to dump 60K into one for a second, then switch to the next
one for a second, and so forth, such that the *average* rate was
< 20K mps. I'm assuming that it won't be, so I need to continue to
work towards a truer distribution.
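The averages-versus-peaks concern can be checked with a little arithmetic. This sketch uses the illustrative numbers from the message (60K msgs/s total, 3 destinations, a 20K cap each) and counts what each destination receives in every whole second when traffic rotates once per time slice. The function is hypothetical and assumes perfectly even arrival.

```python
def per_second_load(total_rate, destinations, slices_per_sec, seconds):
    """Messages each destination receives in each whole second when the
    stream is handed round-robin to the next destination once per slice.
    Assumes perfectly even arrival at total_rate messages per second."""
    per_slice = total_rate // slices_per_sec
    load = [[0] * destinations for _ in range(seconds)]
    for k in range(seconds * slices_per_sec):
        second, dest = k // slices_per_sec, k % destinations
        load[second][dest] += per_slice
    return load
```

Per-second rotation averages 20K per destination over three seconds but peaks at 60K in the active second. Rotating 12 times per second gives exactly 20K to each destination every second, while 10 slices per second (not a multiple of 3) still leaves one destination at 24K in each second.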
In an ideal world, I'd be able to use TCP to forward to the
destinations, and just let standard TCP throttling take care of it.
But I'm stuck with UDP for now, so I can't do that. A true round-robin
would be the gold standard, but absent a "$messagecount" property, or
global variables (or a full-fledged load balancing feature, which I
offhand don't think rsyslog needs, rather, it needs to allow people to
implement their own via the above methods), I'm left pursuing these
avenues.
As for latency, I'm actually OK with several minutes of latency. Not
concerned about message order, either.
Thanks!!!
Robert
Date: Wed, 23 Oct 2013 16:01:11 -0700
From: [email protected]
To: [email protected]
Subject: Re: [rsyslog] Another approach to action load balancing
On Wed, 23 Oct 2013, Robert McIntyre wrote:
So, I've had decent luck with Pavel's suggestion
(field($timegenerated,':',3)), and it rotates around nicely based on the
second.
I'm trying a slightly different approach, though, to try to get
sub-second rotation.
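For readers unfamiliar with the trick: assuming `$timegenerated` is rendered in the RFC 3164 style, e.g. `Oct 23 16:01:11`, field 3 split on `:` is the seconds value. The Python below is only an analogue of that selection logic, assuming outputs are chosen by seconds modulo the number of outputs (the actual rsyslog conditions used were not posted). It also shows why this approach stops at one-second granularity: that timestamp has no fractional part.

```python
def pick_output(timegenerated, outputs):
    """Python analogue of selecting an output by field($timegenerated, ':', 3).

    rsyslog's field() numbers fields from 1, so field 3 is the seconds part
    of an RFC 3164-style stamp like 'Oct 23 16:01:11'. Python lists start
    at 0, hence index 2.
    """
    seconds = int(timegenerated.split(":")[2])
    return seconds % outputs  # hypothetical mapping: seconds modulo output count
```

Consecutive seconds 11, 12, 13 map to outputs 2, 0, 1 when there are three outputs.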
While I am in no way saying that you shouldn't continue trying to
solve the problem, I will ask whether you really need sub-second rotation
in practice.
If you can rotate between outputs, and give each output its own
action queue, with that queue having space for one second's worth of
logs, then load balancing across 3 outputs will give you 3x the
throughput at the cost of delaying logs up to ~3 seconds (if
everything is on the edge of overload).
If it takes N seconds for one output to process 1 second's worth of
logs, and you spread the traffic across M outputs (where M > N to
give you a little headroom just in case), then your most delayed log
will be delayed up to N seconds.
Going to rotation every 1/10 second will reduce that maximum delay to N/10 seconds.
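A small discrete-event sketch of this reasoning, under simplifying assumptions: messages arrive evenly at a hypothetical 1000 per second, each output serves them FIFO at 1/N of the arrival rate, and ownership of the stream rotates every `slice_len` seconds. The function and its parameters are illustrative, not rsyslog settings.

```python
def worst_delay(n_ratio, outputs, slice_len, rate=1000, total_time=30):
    """Worst-case queueing delay (seconds) for traffic rotated across outputs.

    n_ratio:   an output needs n_ratio seconds to process one second of logs (N)
    outputs:   number of outputs (M)
    slice_len: how long each output owns the stream before rotating (seconds)
    rate:      hypothetical arrival rate, messages per second, evenly spaced
    """
    per_slice = max(1, round(rate * slice_len))  # messages per rotation slice
    service = n_ratio / rate                      # seconds one output spends per message
    free_at = [0.0] * outputs                     # when each output clears its backlog
    worst = 0.0
    for i in range(rate * total_time):
        arrival = i / rate
        out = (i // per_slice) % outputs          # round-robin by slice
        done = max(arrival, free_at[out]) + service
        free_at[out] = done
        worst = max(worst, done - arrival)
    return worst
```

With N = 2 and M = 3, the worst simulated delay is about one second for per-second rotation and about a tenth of that for 1/10 second rotation, both inside the N-second bound. With a single output (M < N) the backlog grows for as long as the simulation runs.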
This will cause your logs to arrive at the destination in a different
order than they were in the first place, but any load balancing
scheme gives this potential.
The batch mode processing in rsyslog where each worker thread grabs
up to N messages from the main queue and works on them while another
worker thread grabs the next up to N messages results in the same
sort of thing (in this case N defaults to 128 or 256 for the main
queue, while a full second's worth of logs will almost certainly be a
much larger number :-)
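The reordering effect of batching can be seen with a toy model using Python threads. This is not rsyslog's implementation; the batch size, worker count, and the short random sleep standing in for processing time are assumptions. Each batch stays internally in order, but which batch is delivered first depends on thread scheduling.

```python
import queue
import random
import threading
import time


def run_workers(messages, n_workers=2, batch_size=128):
    """Toy model: workers pull up to batch_size messages at a time from a shared
    main queue and deliver each batch when they finish "processing" it.
    Returns the batches in the order they were delivered."""
    main_q = queue.Queue()
    for m in messages:
        main_q.put(m)

    dequeue_lock = threading.Lock()  # one worker dequeues a batch at a time
    out_lock = threading.Lock()      # protects the shared delivery list
    delivered = []

    def worker():
        while True:
            batch = []
            with dequeue_lock:
                while len(batch) < batch_size:
                    try:
                        batch.append(main_q.get_nowait())
                    except queue.Empty:
                        break
            if not batch:
                return  # main queue drained
            time.sleep(random.random() / 1000)  # stand-in for variable processing time
            with out_lock:
                delivered.append(batch)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return delivered
```

Running it a few times may show batches delivered out of sequence (for example, the batch starting at 128 landing before the one starting at 0), even though no message moves within its batch.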
When doing load balancing on the network layer, I try to set the
rebind interval large enough that it only rebalances once a second
(or a handful of times per second). When I put logs into something that
doesn't take a stream well (like Splunk), I write the logs to disk and
move them to the destination once a minute, and even that delay ends up
not being significant in practice.
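Choosing that value is a division, assuming the rebind interval counts messages (which is roughly how it behaves for UDP): divide the message rate by how often you want the balancer to reshuffle. The helper name is hypothetical.

```python
def rebind_interval(msgs_per_sec, rebalances_per_sec=1):
    """Messages to send between rebinds so a round-robin balancer reshuffles
    roughly rebalances_per_sec times per second (assumes the interval is
    counted in messages)."""
    return max(1, msgs_per_sec // rebalances_per_sec)
```

At a hypothetical 20K msgs/s, an interval of 20000 rebalances about once a second, and 5000 about four times a second.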
Yes, at lower traffic levels the balancing is choppy, with different
workers going visibly idle for a bit, but finer-grained balancing still
has the workers going idle, just for shorter periods that you are not
seeing due to the coarse granularity of your measurements :-)
If you balance per second across 10 outputs and see 8 of them idle
when looking at top with a refresh interval of 1 second, then change to
balancing every 1/10 second, you would still see 8 of them idle if
you were able to set top to a refresh interval of 1/10 second.
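The measurement point can be made concrete with a small model. Assumptions: N = 2, so each slice keeps its output busy for two slices; time is counted in integer ticks of 1/10 second; the helper is hypothetical. It counts how many of 10 outputs do any work during a sampling window.

```python
def busy_count(outputs, n_ratio, slice_ticks, window_start, window_ticks):
    """How many outputs do any work during a sampling window.

    Time is in integer ticks of 1/10 second. The slice starting at tick
    k*slice_ticks goes to output k % outputs, which stays busy for
    n_ratio slices (it needs n_ratio times as long to process the slice).
    """
    window_end = window_start + window_ticks
    busy = set()
    for k in range(window_end // slice_ticks + 1):
        start = k * slice_ticks
        end = (k + n_ratio) * slice_ticks
        if start < window_end and end > window_start:
            busy.add(k % outputs)
    return len(busy)
```

Per-second balancing sampled every second shows 2 busy and 8 idle; 1/10-second balancing sampled every 1/10 second shows the same 2 busy. Only when the finer balancing is sampled with the coarser 1-second window do all 10 appear busy.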
The real question to think about is what maximum overall delay
you can live with.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE
WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
LIKE THAT.