just FYI: I merged the mmfields module yesterday, but the doc still needs to be written. I'll see that I get a bit of this done today ... but more RELP enhancements are demanding my attention, so I need to see how I juggle all that... ;)
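In the meantime, a rough sketch of how it is meant to be used - untested, and the parameter name (separator) and the $!f<n> field names are my assumptions until the doc is written:

--------------xxxxxxxxxxxxx---------------
module(load="mmfields")          # load the field-extraction module once

# inside the ruleset, split the raw message on "|" (the CEF delimiter)
# in a single pass:
action(type="mmfields" separator="|")

# the pieces should then be addressable as $!f1, $!f2, $!f3, ...
set $!vendor  = $!f2;            # assumption: 1-based field numbering
set $!product = $!f3;
--------------xxxxxxxxxxxxx---------------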
Rainer

On Wed, Jun 12, 2013 at 1:04 AM, David Lang <[email protected]> wrote:
> On Tue, 11 Jun 2013, Xuri Nagarin wrote:
>
>> First, David's suggestion of using a variable like "set $!vendor =
>> field($msg, 124, 2)" worked really well in reducing CPU utilization. My
>> packet loss is now zero :)
>>
>> On Fri, Jun 7, 2013 at 4:59 AM, Rainer Gerhards <[email protected]> wrote:
>>
>>> On Thu, Jun 6, 2013 at 10:04 PM, David Lang <[email protected]> wrote:
>>>
>>>> On Thu, 6 Jun 2013, Xuri Nagarin wrote:
>>>>
>>>>> I am trying to translate a log flow from syslog-ng to rsyslog.
>>>>>
>>>>> I have a few syslog sources that collect logs from all over the
>>>>> infrastructure and deliver them to central syslog servers, which in
>>>>> turn deliver the logs to different consumers.
>>>>>
>>>>> At the moment, I have two consumers - one reads flat files and the
>>>>> other listens on TCP ports. For both, I need to tell them what type
>>>>> of event I am delivering - Linux auth, firewall event, or web log.
>>>>>
>>>>> Something like this:
>>>>>
>>>>> LogFireHose ====> SyslogNG/RSyslog ====> (parse and redirect events by type)
>>>>>   ||==> If (Cisco ASA) write to "FS (Cisco/ASA) & TCP DstHost: 5000"
>>>>>   ||==> If (Apache Access) write to "FS (Apache/Access) & TCP DstHost: 5001"
>>>>>   ||==> If (DNS logs) write to "FS (Bind/DNS) & TCP DstHost: 5002"
>>>>>
>>>>> In Syslog-NG, every incoming message (in CEF format) is run through a
>>>>> parser that splits the log message into eight fields. Fields 2 and 3,
>>>>> which are vendor and product type, are used to generate a template
>>>>> like "/var/log/$vendor/$product/logfile".
>>>>>
>>>>> Delivering events, by type, to a specific network destination
>>>>> requires filters, and I have 30+ different vendor/product
>>>>> combinations. So I end up with 30+ log() statements, each with its
>>>>> own filter logic:
>>>>>
>>>>> ----------xxxxxxxxxxxxxxx----------
>>>>> filter f1 (if $product contains "ASA")
>>>>> filter f2 (if $product contains "ACCESS")
>>>>> filter f3 (if $product contains "DNS")
>>>>> ...
>>>>> filter f35 (if field3 contains "blah")
>>>>>
>>>>> log (src=tcp; filter f1; dst=/var/log/$vendor/$product/logfile; dst=remotehost:5000)
>>>>> log (src=tcp; filter f2; dst=/var/log/$vendor/$product/logfile; dst=remotehost:5001)
>>>>> ...
>>>>> log (src=tcp; filter fx; dst=/var/log/$vendor/$product/logfile; dst=remotehost:5030)
>>>>> ----------xxxxxxxxxxxxxxx----------
>>>>>
>>>>> In RSyslog, I have so far written the logic to write to the
>>>>> filesystem like this:
>>>>>
>>>>> --------------xxxxxxxxxxxxx---------------
>>>>> template(name="cefdynfile" type="string"
>>>>>          string="/var/log/joe/%msg:F,124:2%/%msg:F,124:3%/logfile")
>>>>>
>>>>> ruleset(name="tcpcef") {
>>>>>     if $syslogtag == "CEF:" then {
>>>>>         action(type="omfile" FileOwner="joe" FileGroup="joe"
>>>>>                DirOwner="joe" DirGroup="joe" DirCreateMode="0755"
>>>>>                FileCreateMode="0644" DynaFile="cefdynfile")
>>>>>         stop
>>>>>     }
>>>>> }
>>>>>
>>>>> module(load="imtcp")   # needs to be done just once
>>>>> input(type="imtcp" port="514" ruleset="tcpcef")
>>>>> --------------xxxxxxxxxxxxx---------------
>>>>>
>>>>> Now I am thinking about how to add rules for delivering events to the
>>>>> TCP destinations.
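One possible shape for that, sketched against the config above (the ports and the "contains" matches are illustrative only; 124 is the decimal ASCII code for "|", the CEF field delimiter):

--------------xxxxxxxxxxxxx---------------
ruleset(name="tcpcef") {
    if $syslogtag == "CEF:" then {
        # extract vendor/product once into variables, per David's
        # suggestion, instead of re-running field extraction per filter
        set $!vendor  = field($msg, 124, 2);
        set $!product = field($msg, 124, 3);

        action(type="omfile" DynaFile="cefdynfile")

        # route to the per-type TCP consumers
        if $!product contains "ASA" then
            action(type="omfwd" target="remotehost" port="5000" protocol="tcp")
        else if $!product contains "ACCESS" then
            action(type="omfwd" target="remotehost" port="5001" protocol="tcp")
        # ... one branch per vendor/product combination
        stop
    }
}
--------------xxxxxxxxxxxxx---------------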
>>>>> I could expand the "tcpcef" ruleset and add more "if condition then
>>>>> {action()}" statements to it, OR I can write multiple rulesets, one
>>>>> for each filter, like
>>>>>
>>>>> "rule f1 { if $msg contains "blah" then action()}"
>>>>> "rule f2 { if $msg contains "foo" then action()}"
>>>>>
>>>>> and then call these rules from the "tcpcef" ruleset:
>>>>>
>>>>> ruleset(name="tcpcef") {
>>>>>     call f1
>>>>>     call f2
>>>>>     ...
>>>>>     call fx
>>>>> }
>>>>>
>>>>> So, two questions: (1) Does this seem like a good way to parse/route
>>>>> messages?
>>>>
>>>> Well, if you are getting your messages in the CEE format, it would
>>>> seem to me to make more sense to invoke the mmjson parsing module to
>>>> extract all the fields. That way you aren't doing string searches; you
>>>> can just do an if on the particular variable that you are looking for.
>>>> I would expect this to be faster once you have a few different
>>>> filters.
>>>
>>> I never understood why in syslog-ng you need to name an if-condition
>>> and then use that name. In any case, in rsyslog you should just
>>> naturally use the conditions as they are needed. Creating extra
>>> rulesets and calling them (without reuse) is definitely the wrong way
>>> to go.
>>
>> I haven't looked at the code, but from troubleshooting a TCP
>> packet-collapse issue extensively with Syslog-NG, I came to the
>> conclusion that it associates all actions related to a particular source
>> with one thread. So if, like me, you have one BIG source and multiple
>> filters written to parse that one big stream, then all of that gets
>> associated with one thread/core. That eventually leads to core
>> saturation, and packets are left unreaped in the kernel TCP buffers. I
>> tried to isolate the problem by using a ramdisk and essentially saw the
>> same problem. Now, this is all conjecture based on several days of
>> monitoring all sorts of OS parameters, and I may have written the
>> Syslog-NG config very wrong. But the same logic ported to RSyslog is
>> working really well, with zero packet-collapse errors.
>
> rsyslog has a very different architecture.
>
> The receiving thread is only responsible for receiving the message,
> parsing out the standard properties, and putting the resulting message
> into the main queue.
>
> Other threads then do the work of pulling messages out of that queue,
> applying rules to them, formatting the output, and outputting the
> message.
>
> This makes troubleshooting lost messages a bit more interesting, as you
> have to figure out whether the messages are being lost because the input
> thread can't keep up or because the output threads can't keep up.
>
> If it's the output threads, there are lots of things that you can do to
> try and deal with the issue.
>
> One thing is to create additional queues, so that instead of trying to
> handle all the outputs in one thread, you have a thread that just moves
> messages from the main queue to a couple of output (action) queues. Each
> output queue will then have its own threads working to output the
> messages from that queue.
>
> Rsyslog also allows the output threads to process multiple messages at a
> time (batch size), which greatly reduces the locking on the queues.
>
> Reducing the amount of work done to deliver the messages (through better
> rulesets) is always something to look at.
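What such a dedicated action queue could look like (queue sizing and retry settings here are illustrative, not a recommendation):

--------------xxxxxxxxxxxxx---------------
# give this one forwarding action its own in-memory queue, so a slow or
# unreachable receiver stalls only this queue, not main-queue processing
action(type="omfwd" target="remotehost" port="5000" protocol="tcp"
       queue.type="linkedList"         # dedicated async action queue
       queue.size="50000"              # max messages held in memory
       queue.workerThreads="2"         # threads draining this queue
       action.resumeRetryCount="-1")   # retry forever rather than discard
--------------xxxxxxxxxxxxx---------------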
>> To be honest, I used to prefer Syslog-NG over RSyslog because (1) I
>> found the NG config easier to write/understand and (2) the documentation
>> is better. But it is hard to argue with performance :D
>>
>> Now that I've gotten the hang of the config syntax, I am comfortable
>> with RSyslog too, but the way config variables work still leaves a lot
>> to be desired in terms of documentation and example configs.
>
> There are a lot of bits of examples around, but it's hard to find them
> when you don't know roughly what you're looking for. Rainer concentrates
> on getting the code written, and documentation is definitely secondary.
> We can always use more help :-)
>
> I'm writing a series of articles for ;login: on logging and will be doing
> a bunch of examples as I go along. I'll keep everyone posted as things
> get written/published.
>
>>>>> (2) Which way is better for multi-threading? I read that each ruleset
>>>>> gets its own queue and thread, so I am thinking defining multiple
>>>>> rulesets and then calling them from a master ruleset might offer
>>>>> better performance.
>>>>
>>>> There's overhead to rulesets as well.
>>>
>>> Actions can have their own queues as well. But with the simple file
>>> writer, it makes no sense at all to create extra rulesets. Writing to
>>> files is so fast, there is no need to de-couple it from main processing
>>> (except, maybe, if writing to NFS or some other network destination).
>>> If you become CPU-bound, it is totally sufficient to increase the max
>>> number of worker threads for the main queue. That queue will handle all
>>> you need.
>>
>> In my use case, for every inbound event three things happen:
>>
>> 1. It needs to be parsed and classified based on certain fields/content.
>>
>> 2. Based on classification, it gets written to disk in its own unique
>> folder/file structure. Here, I have Splunk tailing the log file. For
>> each log file, Splunk needs to be told the type of application it is
>> reading, so the corresponding index can be created on the Splunk side.
>
> Rather than having Splunk tail the file (large files are still not great
> on ext* filesystems, in spite of the improvements over the years), I use
> one of two approaches:
>
> 1. I write the file and then roll it every minute; when I roll it, I move
> the file to a Splunk sinkhole directory. Splunk reads the file, and once
> it is processed, Splunk deletes it.
>
> 2. I output to a named pipe that Splunk reads as its input. Barring a
> little kernel buffering, once the message is written to the pipe I can
> consider that Splunk has the message (and since Splunk is not crash-safe
> anyway, if it crashes, some logs will be lost; for that matter, rsyslog
> will lose logs if it crashes under normal configurations).
>
> I like to deliver the logs via syslog all the way to the Splunk indexer
> machines and only hand them to Splunk at that point. This also simplifies
> the configuration of my central box, as it can just forward all the logs
> to the Splunk box, and the rsyslog on the Splunk box has a simpler config
> as it can ignore all messages not going into Splunk.
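Approach 2 needs only a couple of lines on the rsyslog side (the pipe path is illustrative, and the fifo has to be created up front, outside rsyslog):

--------------xxxxxxxxxxxxx---------------
# create the fifo once, e.g.:  mkfifo /var/spool/splunk.pipe
# ompipe is a built-in output module, so no module(load=...) is needed
action(type="ompipe" pipe="/var/spool/splunk.pipe")
--------------xxxxxxxxxxxxx---------------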
>> 3. Based on classification, it gets forwarded over TCP to a Hadoop/Flume
>> listener. Each application type has a flume agent associated with it
>> that writes the event to its own folder/file in HDFS. So I have 30+
>> application types, and there is an agent/port associated with each app
>> type.
>>
>> Decoupling each action rule from the event-receiver thread, and each
>> filter rule from the others, allows rsyslog to adapt to however a
>> destination behaves. For example, if a flume agent crashes and stops
>> listening, or slows down, it should not keep rsyslog from reading events
>> from senders or applying filters to incoming events. Basically, isolate
>> the different components from each other in case of various IO failures.
>
> The question is just how finely you should decouple everything. Each
> action queue you create eats more memory. It's a balancing act between
> decoupling everything and resource consumption.
>
>>>> Remember that premature optimization is the root of all evil. Set it
>>>> up the simple way (single ruleset) and only look at changing it if you
>>>> find that it isn't fast enough.
>>>>
>>>> Using the json parser can speed things up, but more importantly, it
>>>> makes the rules easier to write and cleaner.
>>>
>>> As a reply to a later message: I'll see if I can quickly hack together
>>> a plugin that does field extraction for all fields at once. That would
>>> offer better performance, BUT only if you actually NEED the MAJORITY of
>>> the fields. In any case, CEF is popular enough to deserve such
>>> functionality, and it has been on my wishlist for quite a while - no
>>> promise, though!
>>
>> Actually, we are moving away from CEF because HP ArcSight is such a PITA
>> :-) Today, all our sources log into ArcSight Loggers, the Loggers send a
>> stream to Syslog, and Syslog then distributes to Splunk and Hadoop. Once
>> we kill the Loggers, all sources will feed directly into Syslog. Then I
>> will have to parse the messages and extract PROGNAME to make routing
>> decisions, so I do not foresee a need to read ALL or even the MAJORITY
>> of the fields. I am hoping PROGNAME and maybe one more will be
>> sufficient. What I will need is some way to sanity-check each event, so
>> I can avoid passing bad messages or events on to my consumers (Splunk
>> and Hadoop).
>
> Anything that puts messages into some sort of 'standard' is good :-)
>
> CEF seems to have just a handful of fields, but some of the 'fields' then
> contain complex structures of their own. I think you will be using
> many/most of the CEF fields, but not needing to parse out the subfields.
>
> David Lang
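For the PROGNAME-based routing Xuri describes above, a minimal sketch - the program names, ports, and fallback file are illustrative; $programname is rsyslog's standard property for the program part of the syslog tag:

--------------xxxxxxxxxxxxx---------------
if $programname == "apache" then {
    action(type="omfwd" target="remotehost" port="5001" protocol="tcp")
    stop
} else if $programname == "named" then {
    action(type="omfwd" target="remotehost" port="5002" protocol="tcp")
    stop
} else {
    # unknown program name: keep the event locally instead of passing a
    # questionable message on to the consumers
    action(type="omfile" file="/var/log/unclassified.log")
}
--------------xxxxxxxxxxxxx---------------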
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.
