On Wed, 4 Dec 2013, Radu Gheorghe wrote:

Hi list :)

I'm trying to understand if mmnormalize is a good fit for parsing a high
traffic of logs, given the fact that events are really heterogeneous (think
log4j logs, apache logs, whatever logs are commonly produced).

My only frame of reference is Logstash's grok
filter<http://logstash.net/docs/1.2.2/filters/grok>,
which allows you to tag regular expressions in a dictionary, and then use
those tags to match fields from logs, and put them in a structured event.
Much like how you'd build a liblognorm rulebase.

If I got it right, the advantage of mmnormalize seems to be performance,
because it goes around using regular expressions. Not sure how this
actually works, though. Practically, it sounds like this comes at the
expense of flexibility: if I need to add a new "pattern" in liblognorm
(say, a new date format) I'd have to patch the library itself, no?

For a completely new type of data you would have to modify the library, but you seldom need to do that. When you are processing logs, all you really care about is that this string of characters is the date; you aren't parsing the date so that you can do calculations on it.

As long as you can say 'this string of characters is what I care about, and I'm going to label it "date"' you are in good shape.

mmnormalize is far better than regex engines for a couple of reasons.

1. full regex support requires supporting some very expensive types of expressions, even if you don't plan to use them. This costs.

2. regex engines almost always go down the list, does regex1 match, if not does regex2 match, if not does regex3 match, ....
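As a rough illustration of point 2, here is a minimal Python sketch of the "go down the list" approach. The patterns, tags and the `match_sequential` helper are made up for this example and are not taken from grok or any real configuration; the attempt counter is only there to show that the work grows with the number of rules.

```python
import re

# Hypothetical rule list: each rule is tried in order until one matches.
PATTERNS = [
    ("login", re.compile(r"login user=(?P<user>\S+) from (?P<ip>\S+)")),
    ("logout", re.compile(r"logout user=(?P<user>\S+)")),
]


def match_sequential(line):
    """Try every pattern in turn; return (tag, fields, attempts)."""
    attempts = 0
    for tag, pattern in PATTERNS:
        attempts += 1
        m = pattern.fullmatch(line)
        if m:
            return tag, m.groupdict(), attempts
    # No rule matched: every pattern in the list was tried.
    return None, {}, attempts
```

A line that matches only the last rule, or no rule at all, pays for every pattern ahead of it.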

mmnormalize, in comparison, compiles your config into a parse tree. It can then walk down the log message a character at a time, looking each character up in the parse tree, and when it comes to the end of the line it knows it has the correct match. So instead of being O(N) in the number of rules, matching is O(1) in the number of rules; the cost depends only on the (relatively) short length of the lines.
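The parse-tree idea can be sketched in a few lines of Python. This is a toy model and not liblognorm's actual implementation: the `Node` class, the `add_rule` and `match_line` helpers, and the rule format (a list of literal and field parts) are all assumptions made for illustration. A field here simply captures a run of non-space characters, and there is no backtracking.

```python
class Node:
    """One node of a toy parse tree (hypothetical, for illustration only)."""

    def __init__(self):
        self.literal = {}   # next character -> child Node
        self.field = None   # (field name, child Node), or None
        self.rule = None    # tag of the rule that ends at this node


def add_rule(root, tag, parts):
    """Merge one rule into the tree; rules with a common prefix share nodes."""
    node = root
    for kind, value in parts:
        if kind == "lit":
            for ch in value:
                node = node.literal.setdefault(ch, Node())
        else:
            # Toy limitation: one field per node; an existing field is reused.
            if node.field is None:
                node.field = (value, Node())
            node = node.field[1]
    node.rule = tag


def match_line(root, line):
    """Walk the line once; return (tag, fields), or None if no rule matches."""
    node, i, fields = root, 0, {}
    while i < len(line):
        if line[i] in node.literal:
            node = node.literal[line[i]]
            i += 1
        elif node.field is not None:
            name, child = node.field
            j = i
            while j < len(line) and not line[j].isspace():
                j += 1
            if j == i:
                return None  # a field must capture at least one character
            fields[name] = line[i:j]  # kept as an opaque string, not parsed
            node, i = child, j
        else:
            return None
    return (node.rule, fields) if node.rule is not None else None


root = Node()
add_rule(root, "login", [("lit", "login user="), ("field", "user"),
                         ("lit", " from "), ("field", "ip")])
add_rule(root, "logout", [("lit", "logout user="), ("field", "user")])
```

With these two rules loaded, `match_line(root, "logout user=bob")` follows the shared `log` prefix and then the `logout` branch. Adding more rules makes the tree larger but does not add steps to the walk over a given line.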

Speaking of scope, can liblognorm be enhanced to support parsing multiline
messages? This seems to be possible in grok:
https://logstash.jira.com/browse/LOGSTASH-692

Multiline logs cause all sorts of problems. In general you should avoid them, or collapse each multiline log into a single line when it enters your logging system, because too many things will break a multiline log into multiple logs. In some cases you can carefully configure everything to handle multiline logs, but it's very fragile and prevents you from using many tools and transport mechanisms.
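One way to collapse multiline logs before they enter the logging system is sketched below in Python. It assumes that continuation lines (for example stack trace frames) start with whitespace, which is only a heuristic and will not hold for every application; the `collapse_multiline` helper and the escaped `\n` separator are choices made for this example.

```python
def collapse_multiline(lines, sep="\\n"):
    """Fold indented continuation lines into the preceding record.

    Assumption: a line that starts with whitespace continues the previous
    record. Continuations are joined with a literal backslash-n marker.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if records and line[:1].isspace():
            records[-1] += sep + line.strip()
        else:
            records.append(line)
    return records


sample = [
    "ERROR boom\n",
    "  at Foo.bar(Foo.java:1)\n",
    "INFO ok\n",
]
collapsed = collapse_multiline(sample)
```

Each resulting record is a single line, so it can pass through line-oriented tools and transports without being split.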

For me, it's important to understand whether I should put effort in working
with mmnormalize and sponsor needed enhancements, or would sponsoring a new
"mmgrok" module be a better idea for my use-case. Because it looks like
grok is available as a C library as well:
https://github.com/jordansissel/grok

It's not clear what enhancements you think you need (other than multiline support, which, as I say, is problematic).

David Lang
