On Wed, 4 Dec 2013, Radu Gheorghe wrote:
Hi list :)
I'm trying to understand if mmnormalize is a good fit for parsing a high
traffic of logs, given the fact that events are really heterogeneous (think
log4j logs, apache logs, whatever logs are commonly produced).
My only frame of reference is Logstash's grok filter
<http://logstash.net/docs/1.2.2/filters/grok>,
which allows you to tag regular expressions in a dictionary, and then use
those tags to match fields from logs, and put them in a structured event.
Much like how you'd build a liblognorm rulebase.
If I got it right, the advantage of mmnormalize seems to be performance,
because it avoids regular expressions. I'm not sure how this
actually works, though. Practically, it sounds like this comes at the
expense of flexibility: if I need to add a new "pattern" in liblognorm
(say, a new date format) I'd have to patch the library itself, no?
For a completely new type of data you would have to modify the library, but you
seldom need to do that. When you are processing the logs, all you really care
about is that this string of characters is the date; you aren't parsing the date
so that you can do calculations on it.
As long as you can say 'this string of characters is what I care about, and I'm
going to label it "date"' you are in good shape.
mmnormalize is far better than regex engines for a couple of reasons.
1. Full regex support requires supporting some very expensive types of
expressions, even if you don't plan to use them. This costs.
2. Regex-based parsers almost always go down the list: does regex1 match? If
not, does regex2 match? If not, does regex3 match? ...
mmnormalize, in comparison, compiles your config into a parse tree. It can walk
down the log message a character at a time, looking each character up in the
parse tree, and when it comes to the end of the line it knows it has the correct
match. So instead of being O(N) in the number of rules, the cost is O(1) in the
number of rules and depends only on the (relatively) short length of the line.
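
To make the parse-tree idea concrete, here is a minimal Python sketch. It is not
liblognorm's actual code or data structure, just an illustration of the
technique: all rules are merged into one prefix tree, and a message is matched
in a single left-to-right walk. The names Node, add_rule and match, the
("field", name) tuple convention, and the single "word" field type (runs up to
the next space) are all made up for this example. It also does no backtracking,
which a real implementation would have to handle.

```python
class Node:
    """One node of the (hypothetical) parse tree shared by all rules."""
    def __init__(self):
        self.literals = {}  # next literal character -> child Node
        self.field = None   # (field_name, child Node) for a "word" field
        self.rule = None    # name of the rule that ends at this node, if any


def add_rule(root, name, parts):
    """Merge one rule into the tree.

    parts is a list of literal strings and ("field", field_name) tuples.
    Rules with a common prefix share the same nodes. Simplification: if two
    rules put a field at the same position, the first field name is kept.
    """
    node = root
    for part in parts:
        if isinstance(part, tuple):
            if node.field is None:
                node.field = (part[1], Node())
            node = node.field[1]
        else:
            for ch in part:
                node = node.literals.setdefault(ch, Node())
    node.rule = name


def match(root, msg):
    """Walk msg once; the work depends on len(msg), not on the rule count."""
    node, i, fields = root, 0, {}
    while i < len(msg):
        ch = msg[i]
        if ch in node.literals:
            # Literal text: follow the edge for this character.
            node = node.literals[ch]
            i += 1
        elif node.field is not None:
            # Field: take everything up to the next space as an opaque string.
            field_name, child = node.field
            j = i
            while j < len(msg) and msg[j] != " ":
                j += 1
            if j == i:
                return None
            fields[field_name] = msg[i:j]
            node, i = child, j
        else:
            return None
    return (node.rule, fields) if node.rule else None


root = Node()
add_rule(root, "login",
         ["user ", ("field", "user"), " logged in from ", ("field", "ip")])
add_rule(root, "logout", ["user ", ("field", "user"), " logged out"])

print(match(root, "user alice logged in from 10.0.0.1"))
print(match(root, "user bob logged out"))
```

Both rules share the nodes for "user ", the user field and " logged "; the walk
only branches at "in" versus "out". Adding a thousand more rules makes the tree
bigger but does not add a thousand more match attempts per message.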
Speaking of scope, can liblognorm be enhanced to support parsing multiline
messages? This seems to be possible in grok:
https://logstash.jira.com/browse/LOGSTASH-692
Multiline logs cause all sorts of problems. In general you should avoid them, or
collapse them into a single line when they enter your logging system, because
too many things will break a multiline log into multiple logs. In some cases you
can carefully configure everything to handle multiline logs, but it's very
fragile and prevents you from using many tools and transport mechanisms.
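
As a rough illustration of collapsing multiline logs before they enter the
logging system, here is a small Python sketch. It assumes that continuation
lines start with whitespace (as Java stack traces usually do) and that joining
them with a literal backslash-n is acceptable; both are assumptions you would
need to adapt to your logs. The function name collapse_multiline is made up for
this example.

```python
def collapse_multiline(lines, sep="\\n"):
    """Join continuation lines onto the record they belong to.

    Assumption: a line starting with a space or tab continues the previous
    record. sep is the two-character text backslash-n, not a real newline,
    so each record stays on one physical line.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if records and line[:1] in (" ", "\t"):
            records[-1] += sep + line.strip()
        else:
            records.append(line)
    return records


raw = [
    "ERROR something failed\n",
    "\tat Foo.bar(Foo.java:1)\n",
    "\tat Main.main(Main.java:9)\n",
    "INFO all good\n",
]
for record in collapse_multiline(raw):
    print(record)
```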
For me, it's important to understand whether I should put effort in working
with mmnormalize and sponsor needed enhancements, or would sponsoring a new
"mmgrok" module be a better idea for my use-case. Because it looks like
grok is available as a C library as well:
https://github.com/jordansissel/grok
It's not clear what enhancements you think you need (other than the multiline
support, which as I say is problematic).
David Lang