On 2/21/2018 7:02 PM, David Lang wrote:
On Wed, 21 Feb 2018, deoren wrote:

On 2/20/2018 6:58 PM, David Lang wrote:
On 2/20/2018 6:39 PM, deoren wrote:

In this case, my specific goal is to look for log messages containing "SPECIFIC_PATTERN_HERE" (as shown in the sample log message) and, if a match is found, parse the message to pull out specific values. Those values are then used to generate a notification for our ticketing system (e.g., specific URL patterns indicate abuse that we need to review further before our vendor contacts us and threatens to cut off service). In this case we're not matching a possible range of patterns, but a very specific string that is known to us.

you don't need this two-stage approach (detect a pattern, then parse the log) with liblognorm. Instead you just create rules for all your logs that include the various patterns that you want to match, and liblognorm uses whichever one matches. The two-stage approach is needed with regexes because they are so expensive to evaluate, but since liblognorm rules are so fast, it makes far more sense to just define all the rules.
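
As a rough illustration of the single-pass approach described above, here is a minimal sketch of an rsyslog configuration that passes two rules to mmnormalize inline. The message layouts, field names (url, client, user), rule tags, template name, and output path are all hypothetical, since the actual sample log message is not shown in this thread; only the literal "SPECIFIC_PATTERN_HERE" comes from the discussion.

```
module(load="mmnormalize")

# Hypothetical template: writes the parsed fields as one JSON object per line.
template(name="normalized_json" type="string" string="%$!all-json%\n")

# Hypothetical rules: both message layouts and all field names are assumptions.
# liblognorm tries the rules together and uses whichever one matches.
action(type="mmnormalize"
       rule=["rule=abuse:SPECIFIC_PATTERN_HERE url=%url:word% client=%client:ipv4%",
             "rule=login:user %user:word% logged in from %client:ipv4%"])

# $parsesuccess is "OK" when a rule matched, "FAIL" otherwise.
if $parsesuccess == "OK" then {
    action(type="omfile" file="/var/log/normalized.json" template="normalized_json")
}
```

With a layout like this there is no separate "does it contain the pattern" test: a message that matches no rule simply leaves $parsesuccess set to "FAIL" and can be handled (or ignored) separately.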

Do you recommend running mmnormalize as close to the source as possible or on the primary receiver? I'm guessing the former so that the rules are run on the original source and not on content that may have been modified by other receivers in transit?

there are arguments both ways.

running it close to the source distributes the work (but if you run it on the machine that has the source, it is some extra load)

but the resulting json is typically a bit larger than the original message (not always, but typically) and so it can take more network bandwidth to send the result.

liblognorm is so fast you really have to use it to believe it. At $lastjob I had a 1400 line ruleset handling >100K logs/sec without the liblognorm effort being noticeable

Wow, that's pretty impressive. I may try employing mmnormalize in both locations to see which is easier to work with. I suspect that for some cases it would need to be run on the receiver to handle non-rsyslog clients (misc equipment for example).

Is mmnormalize primarily intended for content ingested via imfile or is it pretty standard to apply mmnormalize to all inputs? Perhaps just the inputs where you expect unstructured log content to be ingested?

it is very much NOT limited to imfile, it's the general purpose tool to convert unstructured log content to a normalized format.
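
To illustrate that point, here is a minimal sketch of applying mmnormalize to messages arriving over the network rather than from imfile, as might be done on a central receiver. The ruleset name, port, rulebase path, template name, and output path are assumptions chosen for the example, not values from this thread.

```
module(load="imtcp")
module(load="mmnormalize")

# Hypothetical template: dump the normalized fields as JSON.
template(name="normalized_json" type="string" string="%$!all-json%\n")

# Hypothetical ruleset: every message received by the bound input is
# run through the rulebase, regardless of what kind of device sent it.
ruleset(name="normalize_remote") {
    action(type="mmnormalize" rulebase="/etc/rsyslog.d/example.rulebase")
    if $parsesuccess == "OK" then {
        action(type="omfile" file="/var/log/remote-normalized.json"
               template="normalized_json")
    }
}

# Bind the TCP listener to the ruleset above (the port is an assumption).
input(type="imtcp" port="514" ruleset="normalize_remote")
```

The same ruleset could in principle be bound to other inputs as well, which is one way to cover non-rsyslog clients on the receiver side.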

Thanks for confirming. I've seen the two paired up in some guides I've looked over, so I began to wonder if that was the common scenario.