I haven't done tests on a recent version, but about a year ago (early 5.x days) I did extensive testing with UDP.

Make sure you disable name (DNS) lookups.

I found that I was able to keep up with roughly gigabit Ethernet wire speed (380K logs/sec with ~250-byte log messages) as long as the peak did not exceed the memory queue size. At the time, rsyslog would process messages from the queue at between 30K and 80K messages/sec (depending on the configuration). I tested this by using tcpreplay to send the packets, 1,000,000 packets per test. In about 100 such tests I did not experience _any_ lost messages.

I do have network receive buffers set to larger than normal values.

So I don't think that the problem is on the receiver side.

Rainer has been doing some tests recently on the sending side that may indicate some issues. I haven't had time to get my test systems set up again to run more tests.

David Lang

On Wed, 15 Dec 2010, Dražen Kačar wrote:

I have rsyslog 5.6.2 as the destination for a lot of traffic that
currently uses UDP as its transport protocol. We intend to switch some
sources to TCP, but UDP will still be used for a large chunk of the data
and I can't do anything about that.

Current tests show that we lose 100-400 lines in every 100,000 sent. That
isn't much and we could live with it, but I'd like to do something about
it nevertheless.

My guess is that the kernel buffer fills up, rsyslog doesn't manage to
make room fast enough, and then the kernel drops packets. I'm on CentOS
5.5. The kernel log shows lines like:

Dec  8 14:44:22 ssm1 kernel: Initalizing network drop monitor service
Dec 13 15:09:26 ssm1 kernel: Initalizing network drop monitor service

Curiously, the times do not match the ones at which I'm certain the
packet loss occurred. But even so, the log shows that packet drops are
occurring on the system.

As far as I know, two things can be done.

1. Increase the UDP receive buffer size. That should let the system
handle bigger load spikes than it handles now, but that's about it.
There will always be a spike too large for any given buffer size.

I didn't manage to find a configuration option for this. I can increase
the system default, but I dislike doing that because it affects all
sockets on the system. If rsyslog really doesn't have this configuration
option, I'm willing to write a patch.

2. Put the thread that handles the UDP socket in the real-time scheduling
class (or the closest equivalent a given OS has). The POSIX way to do this
is sched_setscheduler() with SCHED_FIFO. There are other non-portable
interfaces of interest (like priocntl() on Solaris), but I won't discuss
them right now.

The patch should add two configuration parameters:
 * one to define scheduling policy (SCHED_FIFO, SCHED_RR, etc.), let's
   call it $<object>QueueThreadSchedulingPolicy (although I'd like a
   shorter name)
 * and another to define sched_priority, let's call it
   $<object>QueueThreadSchedulingPriority

In the case of SCHED_SPORADIC (and maybe other non-portable policies on some
operating systems) there are additional parameters that can be set, so
maybe the second parameter would turn out to be a string with several
numbers that needs to be parsed. Or additional configuration options could
be introduced. I don't have a position on that and would like to follow
the existing rsyslog policy regarding configuration options. Except that I
don't know what that policy is, so I'd appreciate it if someone could
enlighten me.

Anyway, the idea is to have a high-priority thread (or a small number of
threads) sitting on a UDP socket whose sole purpose in life is to
transfer messages from the UDP socket to one or more other queues, as fast
as possible. By default (if $<object>QueueThreadSchedulingPolicy is
defined), the number of threads put in that scheduling policy could be
$<object>QueueWorkerThreads, i.e. the minimum number of worker threads.
There could be an additional parameter to define the number of such
threads, but I'm not sure how to go about it.

There could be no imposed limit, so if an administrator puts 128 in the
config file, 128 threads it is. But this is potentially dangerous. Or
rsyslog could impose a restriction that the number of such threads cannot
exceed the number defined in $<object>QueueWorkerThreads, which is less
dangerous. I'm personally leaning toward the second option.


Now, this whole thing is dangerous and must be implemented and used with
care and understanding. Currently I don't understand rsyslog queues well
enough, which is why I'm writing this mail. The UDP thread in the FIFO class
should only pick the packet up from the socket and place it on
another action queue. Let's say we have the following configuration:

$ModLoad imudp

$Ruleset indata
$RulesetCreateMainQueue on

# the ruleset binding applies to listeners defined after it
$InputUDPServerBindRuleset indata
$UDPServerRun 514

# proposed directives (they do not exist in rsyslog 5.6.2)
$MainMsgQueueThreadSchedulingPolicy fifo
$MainMsgQueueThreadSchedulingPriority 10

$ActionName local1
$ActionQueueType LinkedList
$ActionQueueFileName local1
$ActionResumeRetryCount -1
$ActionQueueSaveOnShutdown on
$ActionQueueDiscardSeverity 8

local1.*        |/path/to/fifo
&           /path/to/file


If I understand things correctly, there are two queues here. The first one
is the indata ruleset queue, fed by the UDP input module, and the second one
is the action queue for the destinations (FIFO and file). I'm not sure about
the following:

1. After a $Ruleset directive followed by "$RulesetCreateMainQueue on",
are $MainMsgQueue* directives applied to that ruleset queue? It's not very
intuitive.

2. Say we add a TCP listener (or any other input module) to the above
configuration and we want UDP processing thread(s) to have SCHED_FIFO
priority, but other input handling threads to stay at the default
priority. I'm not sure if that can be specified with one ruleset. If two
rulesets must be specified, then I suppose the action directives must be
duplicated for each ruleset, which isn't very readable and
maintenance-friendly. (This is without taking omruleset plugin into
consideration).

3. The action queue is disk-assisted. Which thread writes to disk when the
need arises: the one from the indata ruleset queue or the one from the
action queue?


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
