On Tue, 11 Jun 2013, Xuri Nagarin wrote:

First, David's suggestion of using a variable like "set $!vendor =
field($msg, 124, 2)" worked really well in reducing CPU utilization. My
packet loss is now zero :)
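
For anyone following the thread, the change amounts to computing the fields once and reusing the variables in every filter. A rough sketch (124 is the decimal code for the '|' delimiter in CEF; the field numbers are from my stream and may differ in yours):

----------xxxxxxxxxxxxxxx----------
set $!vendor = field($msg, 124, 2);
set $!product = field($msg, 124, 3);
# each filter now compares a cached variable instead of
# re-running the field extraction for every test
if $!product contains "ASA" then { ... }
----------xxxxxxxxxxxxxxx----------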

On Fri, Jun 7, 2013 at 4:59 AM, Rainer Gerhards <[email protected]>wrote:

On Thu, Jun 6, 2013 at 10:04 PM, David Lang <[email protected]> wrote:

On Thu, 6 Jun 2013, Xuri Nagarin wrote:

I am trying to translate a log flow from syslog-ng to rsyslog.

I have a few syslog sources that collect logs from all over the
infrastructure and deliver them to central syslog servers, which in turn
deliver the logs to different consumers.

At the moment, I have two consumers - one reads flat files and the other
listens on TCP ports. For both, I need to tell them what type of event I am
delivering to them - Linux auth, firewall event, or web log.

Something like this:

LogFireHose ====> SyslogNG/RSyslog (parse and redirect events by type)
                      ||==> If (Cisco ASA)     write to FS (Cisco/ASA)     & TCP DstHost:5000
                      ||==> If (Apache Access) write to FS (Apache/Access) & TCP DstHost:5001
                      ||==> If (DNS logs)      write to FS (Bind/DNS)      & TCP DstHost:5002


In Syslog-NG, every incoming message (in CEF format) is subject to a parser
that splits the log message into eight fields. Fields 2 and 3, which are the
vendor and product type, are used to generate a template like
"/var/log/$vendor/$product/logfile".

Delivering events, by type, to a specific network destination requires
filters, and I have 30+ different vendor/product combinations. So I end up
with 30+ log() statements, each with its own filter logic.
----------xxxxxxxxxxxxxxx----------
filter f1 (if $product contains "ASA")
filter f2 (if $product contains "ACCESS")
filter f3 (if $product contains "DNS")
...
filter f35 (if field3 contains "blah")

log (src=tcp; filter f1;  dst=/var/log/$vendor/$product/logfile; dst=remotehost:5000)
log (src=tcp; filter f2;  dst=/var/log/$vendor/$product/logfile; dst=remotehost:5001)
...
log (src=tcp; filter f35; dst=/var/log/$vendor/$product/logfile; dst=remotehost:5030)
----------xxxxxxxxxxxxxxx----------


In RSyslog, I have so far written the logic to write to the filesystem like
this:
--------------xxxxxxxxxxxxx---------------
template(name="cefdynfile" type="string"
         string="/var/log/joe/%msg:F,124:2%/%msg:F,124:3%/logfile")

ruleset(name="tcpcef") {
    if $syslogtag == "CEF:" then {
        action(type="omfile" FileOwner="joe" FileGroup="joe"
               DirOwner="joe" DirGroup="joe" DirCreateMode="0755"
               FileCreateMode="0644" DynaFile="cefdynfile")
        stop
    }
}

module(load="imtcp")   # needs to be done just once
input(type="imtcp" port="514" ruleset="tcpcef")
--------------xxxxxxxxxxxxx---------------

Now, I am thinking how do I add rules for delivering events to tcp
destinations.

I could expand the "tcpcef" ruleset and add more "if condition then
{action()}" statements to it, OR I could write multiple rulesets, one for
each filter, like

ruleset(name="f1") { if $msg contains "blah" then action() }
ruleset(name="f2") { if $msg contains "foo" then action() }

and then call these rulesets from the "tcpcef" ruleset:

ruleset(name="tcpcef") {
    call f1
    call f2
    ...
    call fx
}

So two questions (1) Does this seem like a good way to parse/route
messages?


Well, if you are getting your messages in CEE format, it would seem to me to
make more sense to invoke the mmjsonparse module to extract all the fields.
That way you aren't doing string searches; you can just do an if on the
particular variable that you are looking for. I would expect this to be
faster once you have a few different filters.
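
A rough sketch of what that could look like, assuming the messages carry a
CEE/JSON payload containing a product field (the variable name here is
hypothetical):

----------xxxxxxxxxxxxxxx----------
module(load="mmjsonparse")   # parses CEE/JSON payloads into $! variables
action(type="mmjsonparse")
if $parsesuccess == "OK" then {
    # compare a parsed variable instead of doing string searches
    if $!product == "ASA" then { ... }
}
----------xxxxxxxxxxxxxxx----------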



I never understood why in syslog-ng you need to name an if-condition and
then use that name. In any case, in rsyslog you should just use the
conditions naturally where they are needed. Creating extra rulesets and
calling them (without reuse) is definitely the wrong way to go.
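
In other words, a single ruleset with plain conditions is enough. A sketch,
reusing the ports and field numbers from the original post:

----------xxxxxxxxxxxxxxx----------
ruleset(name="tcpcef") {
    set $!product = field($msg, 124, 3);   # extract once, reuse below
    if $!product contains "ASA" then {
        action(type="omfile" dynaFile="cefdynfile")
        action(type="omfwd" target="remotehost" port="5000" protocol="tcp")
        stop
    }
    if $!product contains "ACCESS" then {
        action(type="omfile" dynaFile="cefdynfile")
        action(type="omfwd" target="remotehost" port="5001" protocol="tcp")
        stop
    }
}
----------xxxxxxxxxxxxxxx----------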


I haven't looked at the code, but from troubleshooting a TCP packet-collapse
issue extensively with Syslog-NG, I came to the conclusion that it
associates all actions related to a particular source with one thread. So,
like me, if you have one BIG source and multiple filters written to parse
that one big stream, then all of that gets associated with one thread/core.
That eventually leads to core saturation, and packets are left unreaped in
the kernel TCP buffers. I tried to isolate the problem by using a ramdisk
and essentially saw the same problem. Now, this is all conjecture based on
several days of monitoring all sorts of OS parameters, and I may have
written the config very wrong in Syslog-NG. But the same logic ported to
RSyslog is working really well, with zero packet-collapse errors.

rsyslog has a very different architecture.

the receiving thread is only responsible for receiving the message, parsing out the standard properties, and putting the resulting message into the main queue.

Other threads then do the work of pulling messages out of that queue, applying rules to them, formatting the output, and delivering the messages.

This makes troubleshooting lost messages a bit more interesting as you have to figure out if the messages are being lost because the input thread can't keep up or the output thread can't keep up.

if it's the output threads, there are lots of things that you can do to try and deal with the issues.

one thing is to create additional queues so that instead of trying to deal with all the outputs in one thread, you have a thread that just moves messages from the main queue to a couple of output (action) queues. Then each output queue has its own threads working to output the messages from that queue.

Rsyslog allows the output threads to process multiple messages at a time (batch size), which greatly reduces the locking on the queues.
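
As a sketch, giving one output its own queue with batching looks roughly
like this (the sizes are illustrative placeholders, not tuning advice):

----------xxxxxxxxxxxxxxx----------
action(type="omfwd" target="remotehost" port="5000" protocol="tcp"
       queue.type="LinkedList"         # give this action its own in-memory queue
       queue.workerThreads="2"         # dedicated worker threads for this output
       queue.dequeueBatchSize="256")   # pull messages in batches to cut queue locking
----------xxxxxxxxxxxxxxx----------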

Reducing the amount of work done to deliver the messages (through better rulesets) is always something to look at.


To be honest, I used to prefer Syslog-NG over RSyslog because (1) I found the
config easier to write/understand with NG, and (2) the documentation is better.

But it is hard to argue with performance :D

Now that I've gotten the hang of the config syntax, I am comfortable with
RSyslog too, but the way config variables work still leaves a lot to be
desired in terms of documentation and example configs.

There are a lot of bits of examples around, but it's hard to find them when you don't know roughly what you're looking for. Rainer concentrates on getting the code written, and documentation is definitely secondary. We can always use more help :-)

I'm writing a series of articles for ;login on logging, and will be doing a bunch of examples as I go along. I'll keep everyone posted as things get written/published.


 (2) Which way is better for multi-threading? I read that each ruleset gets
its own queue and thread, so I am thinking that defining multiple rulesets
and then calling them from a master ruleset might offer better performance.


There's overhead to rulesets as well.


actions can have their own queues as well. But with the simple file writer,
it totally makes no sense to create separate queues. Writing to files is so
fast, there is no need to de-couple it from the main processing (except,
maybe, when writing to NFS or some other network destination). If you become
CPU-bound, it is usually sufficient to increase the max number of worker
threads for the main queue. This queue will handle all you need.
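
With the legacy directives, that would look something like the following
(the numbers are placeholders to tune, not recommendations):

----------xxxxxxxxxxxxxxx----------
$MainMsgQueueSize 100000
$MainMsgQueueWorkerThreads 4
----------xxxxxxxxxxxxxxx----------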


In my use case, for every inbound event three things happen:
1. It needs to be parsed and classified based on certain fields/content.
2. Based on classification, it gets written to disk in its own unique
folder/file structure. Here, I have Splunk tailing the log file. For each
log file, Splunk needs to be told the type of application it is reading so
that the corresponding index can be created on the Splunk side.

Rather than having Splunk tail the file (large files are still not great on ext* filesystems, in spite of the improvements over the years), I use one of two approaches:

1. I write the file and then roll the file every minute, when I roll it I move the file to a Splunk sinkhole directory. Splunk reads the file, then once it's processed, Splunk deletes the file.

2. I output the file to a named pipe that Splunk reads as its input. Barring a little kernel buffering, once the message is written to the pipe, I can consider that Splunk has the message. (Since Splunk is not crash-safe anyway, some logs will be lost if it crashes; for that matter, rsyslog will lose logs if it crashes under normal configurations.)
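
For the named-pipe approach, the rsyslog side is just the built-in ompipe
output. A sketch (the FIFO path is made up, and it has to be created with
mkfifo before rsyslog starts):

----------xxxxxxxxxxxxxxx----------
# write one application's stream to a FIFO that Splunk monitors as an input
action(type="ompipe" pipe="/var/run/splunk/asa.pipe")
----------xxxxxxxxxxxxxxx----------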

I like to deliver the logs via syslog all the way to the Splunk Indexer machines and only hand it to Splunk at that point. This also simplifies the configuration of my central box as it can just forward all the logs to the Splunk box, and the rsyslog on the Splunk box has a simpler config as it can ignore all messages not going into Splunk.

3. Based on classification, it gets forwarded over TCP to a Hadoop/Flume
listener. Each application type has a Flume agent associated with it that
writes the event to its own folder/file in HDFS. So I have 30+ application
types, and there is an agent/port associated with each app type.

Decoupling each action rule from the event-receiver thread and from each
filter rule allows rsyslog to adapt to however a destination behaves. For
example, if a Flume agent crashes and stops listening, or slows down, it
should not stop rsyslog from reading events from senders or from applying
filters to incoming events. Basically, isolate the different components from
each other in case of various I/O failures.
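
For that kind of isolation, each forwarding action can get its own
disk-assisted queue with infinite retries, so a dead Flume agent only stalls
its own queue. A sketch (host, port, and sizes are placeholders):

----------xxxxxxxxxxxxxxx----------
action(type="omfwd" target="flumehost" port="5001" protocol="tcp"
       action.resumeRetryCount="-1"    # retry forever instead of discarding
       queue.type="LinkedList"
       queue.size="100000"
       queue.filename="flume5001"      # spill to disk when the memory queue fills
       queue.saveOnShutdown="on")      # persist queued messages across restarts
----------xxxxxxxxxxxxxxx----------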

the question is just how finely you should decouple everything. Each action queue you create eats more memory. It's a balancing act between decoupling everything and resource consumption.


Remember that premature optimization is the root of all evil. Set it up
the simple way (single ruleset) and only look at changing it if you find
that it isn't fast enough.

Using the json parser can speed things up, but more importantly, it makes
the rules easier to write and cleaner.


As a reply to a later message: I'll see if I can quickly hack together a
plugin that does field extraction for all fields at once. That would offer
better performance, BUT only if you actually NEED the MAJORITY of the
fields. In any case, CEF is popular enough to warrant such functionality,
and it has been on my wishlist for quite a while - no promise though!


Actually, we are moving away from CEF because HP ArcSight is such a PITA
:-) Today, all our sources log into ArcSight Loggers, the Loggers send a
stream to syslog, and then syslog distributes to Splunk and Hadoop. Once we
kill the Loggers, all sources will feed directly into syslog. Then I will
have to parse messages and extract PROGNAME to make routing decisions. So I
do not foresee a need to read ALL or the MAJORITY of the fields. I am hoping
PROGNAME and maybe one more field will be sufficient. What I will need is
some way to sanity-check the event so I can avoid passing bad messages or
events on to my consumers (Splunk and Hadoop).

anything that puts messages into some sort of 'standard' is good :-)

CEF seems to just have a handful of fields, but some of the 'fields' then contain complex structures of their own. I think you will be using many/most of the CEF fields, but not needing to parse out the subfields.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
