On Tue, 11 Jun 2013, Xuri Nagarin wrote:

First, David's suggestion of using a variable like "set $!vendor =
field($msg, 124, 2)" worked really well in reducing CPU utilization. My
packet loss is now zero :)
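
For anyone following the thread, the change amounts to computing the fields once and reusing the variables in every filter. A rough sketch (124 is the decimal code for the '|' delimiter in CEF; the field numbers are from my stream and may differ in yours):

----------xxxxxxxxxxxxxxx----------
set $!vendor = field($msg, 124, 2);
set $!product = field($msg, 124, 3);
# each filter now compares a cached variable instead of
# re-running the field extraction for every test
if $!product contains "ASA" then { ... }
----------xxxxxxxxxxxxxxx----------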

On Fri, Jun 7, 2013 at 4:59 AM, Rainer Gerhards <[email protected]>wrote:

On Thu, Jun 6, 2013 at 10:04 PM, David Lang <[email protected]> wrote:

On Thu, 6 Jun 2013, Xuri Nagarin wrote:

I am trying to translate a log flow from syslog-ng to rsyslog.

I have a few syslog sources that collect logs from all over the
infrastructure and deliver them to central syslog servers, which in turn
deliver the logs to different consumers.

At the moment, I have two consumers - one reads flat files and the other
listens on TCP ports. For both, I need to tell them what type of event I am
delivering to them - Linux auth, firewall event, or web log.

Something like this:

LogFireHose ====> SyslogNG/RSyslog (parse and redirect events by type)
                      ||==> If (Cisco ASA)     write to FS (Cisco/ASA)     & TCP DstHost:5000
                      ||==> If (Apache Access) write to FS (Apache/Access) & TCP DstHost:5001
                      ||==> If (DNS logs)      write to FS (Bind/DNS)      & TCP DstHost:5002


In Syslog-NG, every incoming message (in CEF format) is subject to a parser
that splits the log message into eight fields. Fields 2 and 3, which are the
vendor and product type, are used to generate a template like
"/var/log/$vendor/$product/logfile".

Delivering events, by type, to a specific network destination requires
filters, and I have 30+ different vendor/product combinations. So I end up
with 30+ log() statements, each with its own filter logic.
----------xxxxxxxxxxxxxxx----------
filter f1 (if $product contains "ASA")
filter f2 (if $product contains "ACCESS")
filter f3 (if $product contains "DNS")
...
filter f35 (if field3 contains "blah")

log (src=tcp; filter f1;  dst=/var/log/$vendor/$product/logfile; dst=remotehost:5000)
log (src=tcp; filter f2;  dst=/var/log/$vendor/$product/logfile; dst=remotehost:5001)
...
log (src=tcp; filter f35; dst=/var/log/$vendor/$product/logfile; dst=remotehost:5030)
----------xxxxxxxxxxxxxxx----------


In RSyslog, I have so far written the logic to write to the filesystem like
this:
--------------xxxxxxxxxxxxx---------------
template(name="cefdynfile" type="string"
         string="/var/log/joe/%msg:F,124:2%/%msg:F,124:3%/logfile")

ruleset(name="tcpcef") {
    if $syslogtag == "CEF:" then {
        action(type="omfile" FileOwner="joe" FileGroup="joe"
               DirOwner="joe" DirGroup="joe" DirCreateMode="0755"
               FileCreateMode="0644" DynaFile="cefdynfile")
        stop
    }
}

module(load="imtcp")   # needs to be done just once
input(type="imtcp" port="514" ruleset="tcpcef")
--------------xxxxxxxxxxxxx---------------

Now, I am thinking how do I add rules for delivering events to tcp
destinations.

I could expand the "tcpcef" ruleset and add more "if condition then
{action()}" statements to it, OR I could write multiple rulesets, one for
each filter, like

ruleset(name="f1") { if $msg contains "blah" then action() }
ruleset(name="f2") { if $msg contains "foo" then action() }

and then call these rulesets from the "tcpcef" ruleset:

ruleset(name="tcpcef") {
    call f1
    call f2
    ...
    call fx
}

So two questions (1) Does this seem like a good way to parse/route
messages?


Well, if you are getting your messages in CEE format, it would seem to me to
make more sense to invoke the mmjsonparse module to extract all the fields.
That way you aren't doing string searches; you can just do an if on the
particular variable that you are looking for. I would expect this to be
faster once you have a few different filters.
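
A rough sketch of what that could look like, assuming the messages carry a
CEE/JSON payload containing a product field (the variable name here is
hypothetical):

----------xxxxxxxxxxxxxxx----------
module(load="mmjsonparse")   # parses CEE/JSON payloads into $! variables
action(type="mmjsonparse")
if $parsesuccess == "OK" then {
    # compare a parsed variable instead of doing string searches
    if $!product == "ASA" then { ... }
}
----------xxxxxxxxxxxxxxx----------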



I never understood why in syslog-ng you need to name an if-condition and
then use that name. In any case, in rsyslog you should just use the
conditions naturally where they are needed. Creating extra rulesets and
calling them (without reuse) is definitely the wrong way to go.
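
In other words, a single ruleset with plain conditions is enough. A sketch,
reusing the ports and field numbers from the original post:

----------xxxxxxxxxxxxxxx----------
ruleset(name="tcpcef") {
    set $!product = field($msg, 124, 3);   # extract once, reuse below
    if $!product contains "ASA" then {
        action(type="omfile" dynaFile="cefdynfile")
        action(type="omfwd" target="remotehost" port="5000" protocol="tcp")
        stop
    }
    if $!product contains "ACCESS" then {
        action(type="omfile" dynaFile="cefdynfile")
        action(type="omfwd" target="remotehost" port="5001" protocol="tcp")
        stop
    }
}
----------xxxxxxxxxxxxxxx----------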


I haven't looked at the code, but from troubleshooting a TCP packet-collapse
issue extensively with Syslog-NG, I came to the conclusion that it
associates all actions related to a particular source with one thread. So,
like me, if you have one BIG source and multiple filters written to parse
that one big stream, then all of that gets associated with one thread/core.
That eventually leads to core saturation, and packets are left unreaped in
the kernel TCP buffers. I tried to isolate the problem by using a ramdisk
and essentially saw the same problem. Now, this is all conjecture based on
several days of monitoring all sorts of OS parameters, and I may have
written the config very wrong in Syslog-NG. But the same logic ported to
RSyslog is working really well, with zero packet-collapse errors.

rsyslog has a very different architecture.

the receiving thread is only responsible for receiving the message, parsing out the standard properties, and putting the resulting message into the main queue.

Other threads then do the work of pulling messages out of that queue, applying rules to them, formatting the output, and delivering the messages.

This makes troubleshooting lost messages a bit more interesting as you have to figure out if the messages are being lost because the input thread can't keep up or the output thread can't keep up.

if it's the output threads, there are lots of things that you can do to try and deal with the issues.

one thing is to create additional queues so that instead of trying to deal with all the outputs in one thread, you have a thread that just moves messages from the main queue to a couple of output (action) queues. Then each output queue has its own threads working to output the messages from that queue.

Rsyslog allows the output threads to process multiple messages at a time (batch size), which greatly reduces the locking on the queues.
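
As a sketch, giving one output its own queue with batching looks roughly
like this (the sizes are illustrative placeholders, not tuning advice):

----------xxxxxxxxxxxxxxx----------
action(type="omfwd" target="remotehost" port="5000" protocol="tcp"
       queue.type="LinkedList"         # give this action its own in-memory queue
       queue.workerThreads="2"         # dedicated worker threads for this output
       queue.dequeueBatchSize="256")   # pull messages in batches to cut queue locking
----------xxxxxxxxxxxxxxx----------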

Reducing the amount of work done to deliver the messages (through better rulesets) is always something to look at.


To be honest, I used to prefer Syslog-NG over RSyslog because (1) I found the
config easier to write/understand with NG, and (2) the documentation is better.

But it is hard to argue with performance :D

Now that I've gotten the hang of the config syntax, I am comfortable with
RSyslog too, but the way config variables work still leaves a lot to be
desired in terms of documentation and example configs.

There are a lot of bits of examples around, but it's hard to find them when you don't know roughly what you're looking for. Rainer concentrates on getting the code written, and documentation is definitely secondary. We can always use more help :-)

I'm writing a series of articles for ;login on logging, and will be doing a bunch of examples as I go along. I'll keep everyone posted as things get written/published.


 (2) Which way is better for multi-threading? I read that each ruleset gets
its own queue and thread, so I am thinking that defining multiple rulesets
and then calling them from a master ruleset might offer better performance.


There's overhead to rulesets as well.


actions can have their own queues as well. But with the simple file writer,
it totally makes no sense to create separate queues. Writing to files is so
fast, there is no need to de-couple it from the main processing (except,
maybe, when writing to NFS or some other network destination). If you become
CPU-bound, it is usually sufficient to increase the max number of worker
threads for the main queue. This queue will handle all you need.
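
With the legacy directives, that would look something like the following
(the numbers are placeholders to tune, not recommendations):

----------xxxxxxxxxxxxxxx----------
$MainMsgQueueSize 100000
$MainMsgQueueWorkerThreads 4
----------xxxxxxxxxxxxxxx----------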


In my use case, for every inbound event three things happen:
1. It needs to be parsed and classified based on certain fields/content.
2. Based on classification, it gets written to disk in its own unique
folder/file structure. Here, I have Splunk tailing the log file. For each
log file, Splunk needs to be told the type of application it is reading so
that the corresponding index can be created on the Splunk side.

Rather than having Splunk tail the file (large files are still not great on ext* filesystems, in spite of the improvements over the years), I use one of two approaches:

1. I write the file and then roll the file every minute, when I roll it I move the file to a Splunk sinkhole directory. Splunk reads the file, then once it's processed, Splunk deletes the file.

2. I output the file to a named pipe that Splunk reads as its input. Barring a little kernel buffering, once the message is written to the pipe, I can consider that Splunk has the message. (Since Splunk is not crash-safe anyway, some logs will be lost if it crashes; for that matter, rsyslog will lose logs if it crashes under normal configurations.)
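
For the named-pipe approach, the rsyslog side is just the built-in ompipe
output. A sketch (the FIFO path is made up, and it has to be created with
mkfifo before rsyslog starts):

----------xxxxxxxxxxxxxxx----------
# write one application's stream to a FIFO that Splunk monitors as an input
action(type="ompipe" pipe="/var/run/splunk/asa.pipe")
----------xxxxxxxxxxxxxxx----------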

I like to deliver the logs via syslog all the way to the Splunk Indexer machines and only hand it to Splunk at that point. This also simplifies the configuration of my central box as it can just forward all the logs to the Splunk box, and the rsyslog on the Splunk box has a simpler config as it can ignore all messages not going into Splunk.

3. Based on classification, it gets forwarded over TCP to a Hadoop/Flume
listener. Each application type has a Flume agent associated with it that
writes the event to its own folder/file in HDFS. So I have 30+ application
types, and there is an agent/port associated with each app type.

Decoupling each action rule from the event-receiver thread and from each
filter rule allows rsyslog to adapt to however a destination behaves. For
example, if a Flume agent crashes and stops listening, or slows down, it
should not stop rsyslog from reading events from senders or from applying
filters to incoming events. Basically, isolate the different components from
each other in case of various I/O failures.
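
For that kind of isolation, each forwarding action can get its own
disk-assisted queue with infinite retries, so a dead Flume agent only stalls
its own queue. A sketch (host, port, and sizes are placeholders):

----------xxxxxxxxxxxxxxx----------
action(type="omfwd" target="flumehost" port="5001" protocol="tcp"
       action.resumeRetryCount="-1"    # retry forever instead of discarding
       queue.type="LinkedList"
       queue.size="100000"
       queue.filename="flume5001"      # spill to disk when the memory queue fills
       queue.saveOnShutdown="on")      # persist queued messages across restarts
----------xxxxxxxxxxxxxxx----------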

the question is just how finely you should decouple everything. Each action queue you create eats more memory. It's a balancing act between decoupling everything and resource consumption.


Remember that premature optimization is the root of all evil. Set it up
the simple way (single ruleset) and only look at changing it if you find
that it isn't fast enough.

Using the json parser can speed things up, but more importantly, it makes
the rules easier to write and cleaner.


As a reply to a later message: I'll see if I can quickly hack together a
plugin that does field extraction for all fields at once. That would offer
better performance, BUT only if you actually NEED the MAJORITY of the
fields. In any case, CEF is popular enough to warrant such functionality,
and it has been on my wishlist for quite a while - no promise though!


Actually, we are moving away from CEF because HP ArcSight is such a PITA
:-) Today, all our sources log into ArcSight Loggers, the Loggers send a
stream to syslog, and then syslog distributes to Splunk and Hadoop. Once we
kill the Loggers, all sources will feed directly into syslog. Then I will
have to parse messages and extract PROGNAME to make routing decisions. So I
do not foresee a need to read ALL or the MAJORITY of the fields. I am hoping
PROGNAME and maybe one more field will be sufficient. What I will need is
some way to sanity-check the event so I can avoid passing bad messages or
events on to my consumers (Splunk and Hadoop).

anything that puts messages into some sort of 'standard' is good :-)

CEF seems to just have a handful of fields, but some of the 'fields' then contain complex structures of their own. I think you will be using many/most of the CEF fields, but not needing to parse out the subfields.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
