On Thu, 16 Dec 2010, Dražen Kačar wrote:
[email protected] wrote:
I haven't done tests on a recent version, but about a year ago I
did extensive testing with UDP (early 5.x days)
make sure you disable name lookups.
I did.
I found that I was able to keep up with ~gig-E wire speed (380K logs/sec
with ~250 byte log messages) as long as the peak did not exceed the
memory queue size. At the time, rsyslog would process messages from the
queue at between 30K and 80K messages/sec (depending on the
configuration). I was testing this by using tcpreplay to send the
packets, 1,000,000 packets per test. In about 100 such tests I did not
experience _any_ lost messages.
I believe it can be done without message loss. However, in my setup there
are other programs on the machine which can use a lot of CPU time, so
rsyslog might not be scheduled on a CPU fast enough. There is one CPU with
four cores, one of them will be taken by a SCHED_FIFO process belonging to
clusterware whenever it needs it and I'd like to do the same arrangement
for rsyslog's UDP receiver thread. That leaves two cores for everything
else, which should be perfectly fine. (There is a potential issue with
memory in case the machine starts swapping, but I'm not trying to solve
that right now.)
this is where increasing the kernel buffers can save you, by giving you
more leeway before rsyslog _needs_ cpu time to avoid loss.
the other thing that you can do is to boost the priority of rsyslog with
nice. ideally you would do this only to the thread that is handling the
UDP reception, but I don't know a good way to find the pid of that thread.
on my dedicated rsyslog boxes I boost the priority by 5 or 10 to make sure
that housekeeping (and reporting) processes don't block reception.
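As a sketch of both tweaks together (sizes and the nice offset are illustrative, and the write operations require root):

```shell
# Inspect the current default and maximum socket receive buffers.
cat /proc/sys/net/core/rmem_default /proc/sys/net/core/rmem_max

# Raise them (illustrative sizes: 8 MB default, 16 MB cap).
sysctl -w net.core.rmem_default=8388608 || echo "need root to change"
sysctl -w net.core.rmem_max=16777216   || echo "need root to change"

# Boost rsyslogd's priority by 5 (more negative nice = higher priority).
renice -n -5 -p "$(pidof rsyslogd)" || echo "rsyslogd not running / need root"
```

Note that renice acts on the whole process; per-thread boosting, as David says, would need a way to find the UDP thread's id first.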
I also don't have any control over the sending side, meaning it's very
hard for me to figure out what the peak is (or will be, once all senders
start sending).
make sure that you write the log with the sender's timestamp in the log
message, then you can find the peak by just using cut or awk to parse out
the timestamp and run it through sort |uniq -c |sort -n |tail to find the
10 most common seconds. even if it takes rsyslog a long time to output the
logs, this will still tell you the number sent (and therefore received) in
a second.
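For logs whose lines begin with a classic syslog timestamp ("Dec 16 14:44:22 host ..."), the pipeline David describes looks like this (demonstrated on a tiny inline sample; point it at your real log file instead):

```shell
# Keep only the per-second timestamp (fields 1-3), count duplicates,
# and print the busiest seconds last. For a real log, replace the
# printf sample with e.g.:  cat /var/log/messages
printf '%s\n' \
  'Dec 16 00:00:01 host msg-a' \
  'Dec 16 00:00:01 host msg-b' \
  'Dec 16 00:00:02 host msg-c' \
  | awk '{ print $1, $2, $3 }' | sort | uniq -c | sort -n | tail
```

Because sort -n orders by the count column, the busiest second comes out on the last line.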
Is there any way to get some statistics from rsyslog? It would be nice if
it could write some counters in a file every hour or so. For example: the
number of messages received by each of the input plugins and the number of
messages written by each of the output plugins.
there is a stats module that was created fairly recently.
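Presumably that module is impstats (an assumption on my part; it appeared around the 5.7 line, so check your version's documentation). A minimal legacy-format fragment that writes counters hourly might look like:

```
$ModLoad impstats
$PStatInterval 3600
$PStatSeverity 7
syslog.=debug /var/log/rsyslog-stats.log
```

Here 3600 seconds gives the hourly interval asked about, and severity 7 (debug) plus the matching filter routes the counter messages to their own file.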
Full SNMP support would be even nicer, but that's reasonably easy to
implement if all the counters exist. (There might be some issues with
Net-SNMP and multithreaded programs, though.)
I do have network receive buffers set to larger than normal values.
Yes, that would seem necessary.
So I don't think that the problem is on the receiver side.
Well, my receiver side has other processes competing for the CPU. I think
that can make a difference.
yes, if you don't get enough cpu time to process incoming logs before
overflowing the kernel buffers, the logs will be lost; boosting priority
and increasing buffer size will let you keep up even when other things are
running.
one other performance tweak: there is an option to tell rsyslog not to
look up the time for every message, but only do it every N messages when
there are a lot of messages to process. setting this can help (the biggest
win is at the beginning: 2 saves 1/2 the lookups compared to 1, 4 saves
3/4 the lookups, 10 saves 9/10 the lookups, etc)
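If the option in question is imudp's time-requery setting (again an assumption; verify against your version's docs), the legacy-format directive would look like:

```
$ModLoad imudp
$UDPServerTimeRequery 8
$UDPServerRun 514
```

With an illustrative value of 8, the system time is fetched once per 8 messages during bursts, trading a little timestamp precision for throughput.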
David Lang
Rainer has been doing some tests recently on the sending side that may
indicate some issues. I haven't had time to get my test systems setup
again to try more tests.
David Lang
On Wed, 15 Dec 2010, Dražen Kačar wrote:
I have rsyslog 5.6.2 which is a destination for a lot of traffic which is
currently using UDP as a transport protocol. We intend to switch some
sources to TCP, but UDP will still be used for a large chunk of data and I
can't do anything about that.
Current tests show that we lose 100-400 lines in every 100000 sent. That
isn't much and we could live with it, but I'd like to do something about
it, nevertheless.
My guess is that the kernel buffer fills up, rsyslog doesn't manage to
make room fast enough and then the kernel drops packets. I'm on CentOS
5.5. Kernel log shows lines like:
Dec 8 14:44:22 ssm1 kernel: Initalizing network drop monitor service
Dec 13 15:09:26 ssm1 kernel: Initalizing network drop monitor service
Curiously, the times do not match the ones for which I'm certain that the
packet loss occurred. But even so, the log shows that packet drop is
occurring on the system.
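To confirm that the drops really are socket-buffer overflows rather than some other path, the kernel's own UDP counters can be sampled (Linux-specific; watch whether the error columns grow between samples):

```shell
# Raw kernel UDP counters: the second "Udp:" line holds the values for
# the header fields on the first (InDatagrams, NoPorts, InErrors, ...).
# On kernels that expose it, a growing RcvbufErrors column means
# datagrams were dropped before rsyslog could read them.
grep '^Udp:' /proc/net/snmp
```

Sampling this before and after a test burst and diffing the InErrors/RcvbufErrors values pins the loss to the receiver's socket buffer.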
As far as I know, two things can be done.
1. Increase the UDP receive buffer size. That should allow the system to
handle a bigger spike load than it's handling now, but that's about it.
There will always be some spike load which is too large for any given
buffer size.
I didn't manage to find a configuration option for this. I can increase
the system default, but I dislike doing that because it affects all
sockets on the system. If rsyslog really doesn't have this configuration
option, I'm willing to write a patch.
2. Put a thread that handles UDP socket in real-time class (or the closest
equivalent that a given OS has). The POSIX way for this is to use
sched_setscheduler() with SCHED_FIFO. There could be other non-portable
interfaces of interest (like priocntl() on Solaris), but I won't discuss
them right now.
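For experimenting before any patch exists, util-linux's chrt can apply SCHED_FIFO from the shell. The caveat is that it targets whichever task id you give it, so pointing it at the main pid affects the whole rsyslogd process, not just the UDP receiver thread:

```shell
# Show the FIFO/RR priority ranges this kernel supports.
chrt -m

# Move rsyslogd into SCHED_FIFO at priority 10 (root only), then
# confirm the policy and priority now in effect.
pid=$(pidof rsyslogd) \
  && chrt --fifo -p 10 "$pid" \
  && chrt -p "$pid" \
  || echo "rsyslogd not running or no permission"
```

A built-in configuration parameter would still be preferable, since only rsyslog itself knows which thread owns the UDP socket.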
The patch should add two configuration parameters:
* one to define scheduling policy (SCHED_FIFO, SCHED_RR, etc.), let's
call it $<object>QueueThreadSchedulingPolicy (although I'd like a
shorter name)
* and another to define sched_priority, let's call it
$<object>QueueThreadSchedulingPriority
In case of SCHED_SPORADIC (and maybe other non-portable policies on some
operating systems) there are additional parameters that can be set, so
maybe the second parameter would turn out to be the string with several
numbers that needs to be parsed. Or there could be additional
configuration options introduced. I don't have a position about that and
would like to follow the existing rsyslog policy regarding configuration
options. Except that I don't know what that policy is, so I'd appreciate
if someone can enlighten me.
Anyway, the idea is to have a high priority thread (or a small number of
threads) sitting on a UDP socket whose sole purpose in life is to
transfer the messages from UDP socket to another queue(s), as fast as
possible. By default (if $<object>QueueThreadSchedulingPolicy is defined),
the number of threads put in that scheduling policy can be
$<object>QueueWorkerThreads, i.e. the minimum number of worker threads.
There could be an additional parameter to define the number of such
threads, but I'm not sure how to go about it.
There could be no imposed limit, so if an administrator puts 128 in the
config file, 128 threads it is. But this is potentially dangerous. Or there
could be an rsyslog-imposed restriction that the number of such threads
cannot exceed the number defined in $<object>QueueWorkerThreads, which is
less dangerous. I'm personally leaning toward the second option.
Now, this whole thing is dangerous and must be implemented and used with
care and understanding. Currently I don't understand rsyslog queues well
enough, which is why I'm writing this mail. The UDP thread in FIFO class
should be used just to pick the packet from the socket and place it on
another action queue. Let's say we have the following configuration:
$Ruleset indata
$RulesetCreateMainQueue on
$InputUDPServerBindRuleset indata
$UDPServerRun 514
$MainMsgQueueThreadSchedulingPolicy fifo
$MainMsgQueueThreadSchedulingPriority 10
$ActionName local1
$ActionQueueType LinkedList
$ActionQueueFileName local1
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueDiscardSeverity 8
local1.* |/path/to/fifo
& /path/to/file
If I understand things correctly, there are two queues here. The first one
is ruleset indata queue with the UDP input module and the second one is
the action queue for the destinations (FIFO and file). I'm not sure about
the following:
1. After a $Ruleset directive followed by "$RulesetCreateMainQueue on",
are $MainMsgQueue* directives applied to that ruleset queue? It's not very
intuitive.
2. Say we add a TCP listener (or any other input module) to the above
configuration and we want UDP processing thread(s) to have SCHED_FIFO
priority, but other input handling threads to stay at the default
priority. I'm not sure if that can be specified with one ruleset. If two
rulesets must be specified, then I suppose the action directives must be
duplicated for each ruleset, which isn't very readable and
maintenance-friendly. (This is without taking omruleset plugin into
consideration).
3. The action queue is disk assisted. Which thread writes to disk when the
need arises? The one from the ruleset indata queue or the one from the
action queue?
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com