Re: [rsyslog] liblognorm vs grok

David Lang Wed, 04 Dec 2013 02:59:05 -0800

On Wed, 4 Dec 2013, Radu Gheorghe wrote:

> Hi list :)
>
> I'm trying to understand if mmnormalize is a good fit for parsing a high
> volume of logs, given that events are really heterogeneous (think
> log4j logs, apache logs, whatever logs are commonly produced).
>
> My only frame of reference is Logstash's grok filter
> (http://logstash.net/docs/1.2.2/filters/grok), which allows you to tag
> regular expressions in a dictionary, and then use those tags to match
> fields from logs and put them in a structured event. Much like how you'd
> build a liblognorm rulebase.
>
> If I got it right, the advantage of mmnormalize seems to be performance,
> because it avoids regular expressions. I'm not sure how this actually
> works, though. Practically, it sounds like this comes at the expense of
> flexibility: if I need to add a new "pattern" in liblognorm (say, a new
> date format) I'd have to patch the library itself, no?

For a completely new type of data you would have to modify the library, but
you seldom need to do that, because when you are processing the logs, all you
really care about is that this string of characters is the date; you aren't
parsing the date so that you can do calculations on it.

As long as you can say 'this string of characters is what I care about, and
I'm going to label it "date"', you are in good shape.
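To make the "just label the characters" point concrete, here is a minimal
sketch in Python. It is not liblognorm code, and the `%name%` pattern syntax
is a made-up toy loosely in the spirit of a rulebase; `compile_pattern` and
`match_pattern` are hypothetical names. A field simply captures the text up to
the next literal, so two different date formats are both handled without
teaching the matcher anything about dates.

```python
def compile_pattern(pattern):
    """Split a toy pattern such as "[%date%] %msg%" into tokens.

    Hypothetical syntax: "%name%" is a field, everything else is literal text.
    """
    parts = pattern.split("%")
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced '%' in pattern")
    tokens = []
    for i, part in enumerate(parts):
        if i % 2 == 0:  # even-indexed parts are literal text
            if part:
                tokens.append(("lit", part))
        else:  # odd-indexed parts are field names
            if tokens and tokens[-1][0] == "field":
                raise ValueError("two adjacent fields are ambiguous")
            tokens.append(("field", part))
    return tokens


def match_pattern(tokens, line):
    """Return {field: text} if the whole line matches, else None.

    A field is captured as an opaque string: everything up to the next
    literal (or to the end of the line for a trailing field). Nothing
    interprets the text, so any date format is just "characters".
    """
    fields = {}
    pos = 0
    for i, (kind, value) in enumerate(tokens):
        if kind == "lit":
            if not line.startswith(value, pos):
                return None
            pos += len(value)
        else:
            if i + 1 == len(tokens):
                end = len(line)  # trailing field takes the rest of the line
            else:
                end = line.find(tokens[i + 1][1], pos)  # up to the next literal
                if end == -1:
                    return None
            fields[value] = line[pos:end]
            pos = end
    return fields if pos == len(line) else None
```

For example, matching `"[04/Dec/2013:02:59:05 -0800] WARN disk almost full"`
against `compile_pattern("[%date%] %level% %msg%")` returns
`{'date': '04/Dec/2013:02:59:05 -0800', 'level': 'WARN', 'msg': 'disk almost full'}`;
an ISO 8601 timestamp in the same position is captured the same way.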


mmnormalize is far better than regex engines for a couple of reasons.

1. Full regex support requires supporting some very expensive types of
expressions, even if you don't plan to use them. That generality costs you.

2. Regex-based parsers almost always go down the list: does regex1 match? If
not, does regex2 match? If not, does regex3 match? ...

mmnormalize, in comparison, compiles your config into a parse tree, so it can
walk down the log message a character at a time, looking each character up in
the parse tree, and when it comes to the end of the line it knows it has the
correct match. So instead of being O(N) in the number of rules, the cost is
essentially independent of the number of rules and depends on the (relatively)
short length of the lines.
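A rough Python sketch of the contrast between point 2 and the parse-tree
approach. Everything here is an illustrative assumption rather than
liblognorm's implementation: the `%name%` rule syntax is a toy, `LinearMatcher`
and `ParseTree` are hypothetical names, and a field is simply a run of
non-space characters ending at a space or the end of the line. `LinearMatcher`
tries one regex per rule in order; `ParseTree` merges all rules into a single
prefix tree and walks the message character by character, so a prefix shared
by several rules (such as "user ") is examined once rather than once per rule.

```python
import re

# Hypothetical toy rule syntax: "%name%" marks a field; everything else is literal.
FIELD = re.compile(r"%(\w+)%")


def tokenize(rule):
    """Split a rule into single literal characters and ("field", name) tuples."""
    tokens = []
    pos = 0
    for m in FIELD.finditer(rule):
        tokens.extend(rule[pos:m.start()])  # one list element per literal character
        tokens.append(("field", m.group(1)))
        pos = m.end()
    tokens.extend(rule[pos:])
    return tokens


class LinearMatcher:
    """Point 2: try one regex per rule, in order, until one matches."""

    def __init__(self, rules):
        self.regexes = []
        for rule in rules:
            parts = []
            for tok in tokenize(rule):
                if isinstance(tok, tuple):
                    # Field: non-space run that must end at a space or end of line.
                    parts.append(f"(?P<{tok[1]}>[^ ]+)(?= |$)")
                else:
                    parts.append(re.escape(tok))
            self.regexes.append(re.compile("".join(parts)))

    def match(self, line):
        for rule_id, rx in enumerate(self.regexes):  # cost grows with the rule count
            m = rx.fullmatch(line)
            if m:
                return rule_id, m.groupdict()
        return None


class Node:
    def __init__(self):
        self.chars = {}      # literal character -> child Node
        self.fields = {}     # field name -> child Node
        self.rule_id = None  # index of the rule that ends at this node, if any


class ParseTree:
    """All rules merged into one prefix tree; shared prefixes are stored once."""

    def __init__(self, rules):
        self.root = Node()
        for rule_id, rule in enumerate(rules):
            node = self.root
            for tok in tokenize(rule):
                if isinstance(tok, tuple):
                    node = node.fields.setdefault(tok[1], Node())
                else:
                    node = node.chars.setdefault(tok, Node())
            node.rule_id = rule_id

    def match(self, line):
        # Depth-first walk with an explicit stack of
        # (node, position in line, fields captured so far).
        stack = [(self.root, 0, {})]
        while stack:
            node, pos, fields = stack.pop()
            if pos == len(line):
                if node.rule_id is not None:
                    return node.rule_id, fields
                continue
            # Field edges: capture up to the next space (or end of line).
            end = line.find(" ", pos)
            end = len(line) if end == -1 else end
            if end > pos:
                for name, child in node.fields.items():
                    stack.append((child, end, {**fields, name: line[pos:end]}))
            # Literal edge, pushed last so it is explored first.
            child = node.chars.get(line[pos])
            if child is not None:
                stack.append((child, pos + 1, fields))
        return None
```

With the rules `"user %name% logged in"`, `"user %name% logged out"` and
`"disk %dev% is %pct% full"`, both matchers return `(1, {'name': 'alice'})` for
`"user alice logged out"`. In this sketch the tree's work follows the length of
the line rather than the number of rules, except where it has to back up after
a literal branch dead-ends; liblognorm's real field types and tie-breaking are
richer than this.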

> Speaking of scope, can liblognorm be enhanced to support parsing multiline
> messages? This seems to be possible in grok:
> https://logstash.jira.com/browse/LOGSTASH-692

Multiline logs cause all sorts of problems. In general you should avoid them,
or collapse each multiline log into a single line when it gets into your
logging system, because too many things will break a multiline log into
multiple logs. In some cases you can carefully configure everything to handle
multiline logs, but it's very fragile and prevents you from using many tools
and transport mechanisms.
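Where multiline events can't be avoided at the source, one common heuristic for
collapsing them before they enter the logging system is to treat lines that
start with whitespace as continuations of the previous line, which fits the
indented "at ..." lines of a Java stack trace. A minimal Python sketch under
that assumption (`collapse_multiline` and its `sep` parameter are hypothetical):

```python
from typing import Iterable, Iterator


def collapse_multiline(lines: Iterable[str], sep: str = "\\n") -> Iterator[str]:
    """Yield one single-line event per logical record.

    Assumed continuation rule: a line starting with a space or tab belongs
    to the previous record. The default separator is a literal backslash-n,
    so the original line breaks stay visible without a real newline.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is not None and line[:1] in (" ", "\t"):
            current += sep + line.strip()  # append continuation to the open record
        else:
            if current is not None:
                yield current  # previous record is complete
            current = line
    if current is not None:
        yield current
```

This also shows the fragility mentioned above: a `Caused by:` line in a Java
stack trace is not indented, so this rule would emit it as a separate event.
The continuation rule has to be tailored to each log format.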

> For me, it's important to understand whether I should put effort into working
> with mmnormalize and sponsor needed enhancements, or whether sponsoring a new
> "mmgrok" module would be a better idea for my use-case. Because it looks like
> grok is available as a C library as well:
> https://github.com/jordansissel/grok

It's not clear what enhancements you think you need (other than the multiline
support, which, as I say, is problematic).


David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
