So are you sending the logs to these destinations via UDP syslog? If so, you can
put a round-robin load balancer between your syslog relay and the destination,
set rebindinterval to 1000 (or 10,000) and the load will be spread evenly.
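
To illustrate why the rebind interval spreads the load, here is a minimal Python sketch. It assumes the balancer treats each rebind as a new flow and hands it to the next backend in strict rotation; real balancers may behave differently for UDP. The function name, backend labels, and message counts are made up for the example.

```python
from itertools import cycle

def spread(total_messages, rebind_interval, backends):
    """Count messages per backend when the sender rebinds every
    rebind_interval messages and each new binding goes to the next
    backend in turn (assumed round-robin behaviour)."""
    counts = {b: 0 for b in backends}
    chooser = cycle(backends)
    current = next(chooser)
    for i in range(total_messages):
        # Rebind (and so move to the next backend) after every
        # rebind_interval messages.
        if i > 0 and i % rebind_interval == 0:
            current = next(chooser)
        counts[current] += 1
    return counts

# One second of traffic at a hypothetical 60K messages per second.
result = spread(60_000, 1_000, ["a", "b", "c"])
print(result)
```

With these numbers each backend ends up with the same share. A larger interval gives fewer, bigger chunks per backend, but the totals stay even as long as the interval is small compared to the traffic volume.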
The thing I would be worried about from your description is that these systems
may lose messages that arrive too fast, even if only 1000 arrive in too short a
period (as an example).
Several years ago I had an AIX server that, if it was sent too many TCP
connections, would ack the connection to the sender and then forget about it,
causing the firewalls in between sender and receiver (which did not lose track
of the connection) to overload and crash. If the firewall had not crashed, the
senders would eventually have crashed (and in any case made no progress
servicing customers).
So I've seen cases where things react really badly to high volumes of
connections in a short time.
David Lang
On Wed, 23 Oct 2013, Robert McIntyre wrote:
Great questions, David. In a nutshell, it's all about load-balancing the
destinations. Say (for example) that my downstream recipients have a cap of
20K messages per second, I'm trying to respect those. There's some question
(I'm following up on) whether if I were to dump 60K into one for a second,
then switch to the next one for a second, and so forth, such that the
*average* rate was < 20K mps that would be OK. I'm assuming that it won't be,
so I need to continue to work towards getting a more true distribution.
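
The difference between the average rate and the per-second peak can be made concrete with a small Python sketch. It assumes all traffic goes to exactly one destination for each whole second, in strict rotation; the function name and the 60K/3-destination numbers are illustrative only.

```python
def per_second_load(total_rate, destinations, seconds):
    """Return load[d][s]: messages destination d receives in second s
    when all traffic is sent to one destination per second, rotating."""
    load = [[0] * seconds for _ in range(destinations)]
    for s in range(seconds):
        load[s % destinations][s] = total_rate
    return load

load = per_second_load(60_000, 3, 6)
peak = max(max(row) for row in load)              # worst single second
average = [sum(row) / len(row) for row in load]   # per-destination mean
print(peak, average)
```

Each destination averages 20K per second, but in any second where it is active it sees the full 60K, which is the case that may or may not be acceptable to the recipients.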
In an ideal world, I'd be able to use TCP to forward to the destinations, and
just let standard TCP throttling take care of it. But, I'm stuck with UDP for
now, so can't do that. A true round-robin would be the gold standard, but
absent a "$messagecount" property, or global variables (or a full-fledged load
balancing feature, which I offhand don't think rsyslog needs, rather, it needs
to allow people to implement their own via the above methods), I'm left
pursuing these avenues.
As for latency, I'm actually OK with several minutes of latency. Not
concerned about message order, either.
Thanks!!!
Robert
Date: Wed, 23 Oct 2013 16:01:11 -0700
From: [email protected]
To: [email protected]
Subject: Re: [rsyslog] Another approach to action load balancing
On Wed, 23 Oct 2013, Robert McIntyre wrote:
So, I've had decent luck with Pavel's suggestion (field($timegenerated,':',3)),
and it rotates around nicely based on the second.
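
For readers unfamiliar with the trick, this is roughly what it amounts to, sketched in Python rather than rsyslog config. It assumes the timestamp looks like "Oct 23 16:01:11", so the third colon-separated field is the seconds value; the function name and destination labels are hypothetical.

```python
def pick_destination(timegenerated, destinations):
    """Choose a destination from the seconds part of the timestamp.
    The third ':'-separated field of 'Oct 23 16:01:11' is '11'."""
    seconds = int(timegenerated.split(":")[2])
    return destinations[seconds % len(destinations)]

dests = ["d0", "d1", "d2"]
first = pick_destination("Oct 23 16:01:11", dests)   # 11 % 3 == 2
second = pick_destination("Oct 23 16:01:12", dests)  # 12 % 3 == 0
print(first, second)
```

Because the seconds value runs from 0 to 59, the rotation is only perfectly even when the number of destinations divides 60.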
I'm trying a slightly different approach, though, to try to get sub-second
rotation.
While I am in no way saying that you shouldn't continue trying to solve the
problem, I will ask whether you really need the sub-second rotation in practice.
If you can rotate between outputs, and give each output its own action queue,
with that queue having space for one second's worth of logs, then load balancing
across 3 outputs will give you 3x the throughput at the cost of delaying logs up
to ~3 seconds (if everything is on the edge of overload).
If it takes N seconds for one output to process 1 second's worth of logs, and
you spread the traffic across M outputs (where M > N to give you a little
headroom just in case), then your most delayed log will be delayed up to N
seconds. Going to rotation every 1/10 second will change the delay to N/10.
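
The arithmetic above can be written out as a short Python sketch. This is only a restatement of the estimate in the paragraph, not a model of rsyslog's queues: the worst-case delay is taken to be N times the rotation interval, and the outputs keep up only if M > N. Function names and the sample values are made up.

```python
def worst_case_delay(n, rotation_interval):
    """Worst-case delay in seconds when an output needs n seconds to
    process one second's worth of logs and receives one
    rotation_interval's worth of logs per turn."""
    return n * rotation_interval

def keeps_up(n, m):
    """True if m outputs leave headroom, i.e. m > n."""
    return m > n

per_second = worst_case_delay(3, 1.0)   # rotate once per second
per_tenth = worst_case_delay(3, 0.1)    # rotate every 1/10 second
print(per_second, per_tenth, keeps_up(3, 4))
```

Shortening the rotation interval shrinks the worst-case delay in proportion, but it does not change how many outputs are needed to keep up.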
This will cause your logs to arrive at the destination in a different order than
they were in the first place, but any load balancing scheme gives this
potential.
The batch mode processing in rsyslog where each worker thread grabs up to N
messages from the main queue and works on them while another worker thread grabs
the next up to N messages results in the same sort of thing (in this case N
defaults to 128 or 256 for the main queue, while a full second's worth of logs
will almost certainly be a much larger number :-)
When doing load balancing on the network layer, I try to set the rebind interval
large enough that it only rebalances once a second (or a handful of times per
second) and when I put logs into something that doesn't take a stream well (like
Splunk), I write the logs to disk and move them to the destination once a
minute, and even that ends up not being really significant in practice.
Yes, at lower traffic levels the balancing is choppy with different workers
going visibly idle for a bit, but finer-grained balancing still has the workers
going idle, just for shorter periods that you are not seeing due to the coarse
granularity of your measurements :-)
If you balance per second across 10 outputs and see 8 of them idle when looking
at top with a refresh interval of 1, then after changing to balancing every
1/10 second you would still see 8 of them idle if you were able to set top to a
refresh interval of 1/10.
The real question to think about is what is the maximum overall delay that you
can live with.
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.