Re: [rsyslog] Fun with liblognorm / rsyslog

Rainer Gerhards Thu, 02 Dec 2010 07:24:20 -0800

Sorry, I need to be a bit brief as I have a conference call coming up later
today for which I need to prepare -- and I hope to fix some more things. So
the most important facts first, more later (maybe Monday...).


> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of Champ Clark III [Softwink]
> Sent: Thursday, December 02, 2010 4:00 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] Fun with liblognorm / rsyslog
> 
> On Thu, Dec 02, 2010 at 09:22:46AM +0100, Rainer Gerhards wrote:
> > Thanks for the nice review and instructions :)
> >
> > I have begun to work heavily on a message modification module for rsyslog
> > which will support liblognorm-style normalization inside rsyslog. In git
> > there already is a branch "lognorm", which I will hopefully complete and
> > merge into master soon. It provides some *very* interesting shortcuts of
> 
>       In the rsyslog git tree?  I have a specific reason for asking,
> which I'll get to later...

http://git.adiscon.com/?p=rsyslog.git;a=shortlog;h=refs/heads/lognorm

> 
> > pulling specific information out of syslog messages. I'll probably promote it
> > some more when it is available. IMHO it's the coolest and potentially most
> > valuable feature I have added in the past three years. Once I have enabled
> > tags in liblognorm/libee, you can even very easily classify log messages
> > based on their content.
> 
>       To be honest,  it's not only going to be a valuable addition to
> rsyslog,  but my software as well...  Not only that,  I think once
> people 'understand its function',   I could see many other projects
> benefiting from it!
> 

yup, that's why I made a lib and not solely a rsyslog component :)

> > I did a couple of bug fixes yesterday. Frequently pull from git ;)
> 
>       Oh!  I know it's a moving target! :)  I'll git it frequently!
> 
> > >   So,  that work nicely.  Nifty.  I made a few more 'complex'
> > > rules, and those worked fine as well.
> >
> > I added a capability to generate graphs of the actual call tree. I think this
> > is *very* useful. An article on how to do that will be posted soon to the web
> > site (will make sure a notification goes to the list).
> 
>       Yes,  please do!  I'd seen mention in the source about this,
> but didn't really dive into what you were trying to do there.

http://www.liblognorm.com/help/creating-a-graph-of-the-sampledb/

> 
> > > However,  if the rule is off a
> > > bit,  then you've got issues.  Here's what I mean..  Back on my example
> > > above..  If this:
> > >
> > > Dec  1 14:10:11 testbox ntpd[3821]: synchronized to 192.168.0.10
> > >
> > > changes to this:
> > >
> > > Dec  1 14:10:11 testbox ntpd[3821]: synchronized to 192.168.0.10,
> > > stratium 1
> >
> > Well, that's by intention. The normalizer must know exactly which message it
> > is dealing with. This is even more important when we use it for
> > classification. So these two messages are definitely different, and I would
> > consider it very dangerous to automatically merge them into a single one
> > (which would not be a problem from a purely implementation PoV). The more
> > fuzzy the recognition is, the higher is the chance of false recognition,
> > something that would be really, really bad in the context of normalization.
> 
>       I understand.   I merely meant that comment as an observation and
> wasn't criticizing it.  Once I played a little bit,  I actually thought,
> "oh,  actually that makes sense why it'd work like that!" :)
> >
> > > The 'normalizer' will call the ",stratium 1" part of the message as
> > > "unclassified".  However,  it doesn't appear that it'll grab the IP
> > > address,  tag, etc.
> > >
> > > Also,  I think the "real work" is going to be writing rules.  That's
> > > going to take some effort,  which I hope to assist with.
> >
> > Yes, that's definitely a lot of work -- and more than what a single person
> > can do. In order to make the normalizer really useful, we need a community
> > effort. If everyone contributes sample databases for their devices, we could
> > gain good results fast. But the key is getting enough momentum, so please
> > help spread the word!
> 
>       I already have been spreading the word.  Strangely,  I mentioned
> it to the OSSEC guys,  and they didn't seem that interested in it.
> That might change over time.  I've talked about it on the Sagan mailing
> list and people there seem to "get it".

I think many people still think "we can normalize in any case, let's just use
our regex approach". A very important point is speed. I will be very
interested in the practical rsyslog performance with that parser. But I think
we can outperform most other normalizers by orders of magnitude.

> 
>       I'm tinkering with the idea of adding some liblognorm code
> into Sagan (probably today).  A couple of things dawned on me.  Many log
> lines that Sagan detects as "hostile" won't need normalization.  Much of
> the information I need I already have.  However,   information from
> appliances like firewalls,  routers,  etc.  will.   So,  I'll probably
> add a 'normalize' flag into my rule set.  That way,   I'm only
> attempting to normalize log lines that I know I need critical
> information from. This way,  I don't waste CPU ticks on attempting
> to normalize log lines that don't need it.

I am following a similar approach with the new mmnormalize in rsyslog. You
call it when you need it.
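
A minimal sketch of that "call it when you need it" idea, in Python rather than rsyslog configuration or Sagan rule syntax. Everything here is hypothetical and for illustration only: the DetectionRule class, its 'normalize' flag, the substring trigger and the stand-in normalizer are assumptions, not Sagan's rule format or the mmnormalize interface.

```python
from dataclasses import dataclass


@dataclass
class DetectionRule:
    name: str
    needle: str       # substring that triggers the rule (hypothetical matching)
    normalize: bool   # hypothetical flag: run the normalizer on a hit?


def process(line, rules, normalizer):
    """Return (rule name, fields) for every rule the line triggers.

    The normalizer is only invoked for rules that ask for it, so lines
    that need no field extraction never pay for normalization.
    """
    hits = []
    for rule in rules:
        if rule.needle in line:
            fields = normalizer(line) if rule.normalize else None
            hits.append((rule.name, fields))
    return hits


calls = []


def fake_normalizer(line):
    """Stand-in for a real normalizer; it only records that it was called."""
    calls.append(line)
    return {"raw": line}


RULES = [
    DetectionRule("ssh-auth-failure", "authentication failure", normalize=False),
    DetectionRule("firewall-drop", "DROP", normalize=True),
]

ssh_hits = process("sshd[99]: authentication failure for root", RULES, fake_normalizer)
fw_hits = process("fw1 kernel: DROP IN=eth0 SRC=10.0.0.5", RULES, fake_normalizer)
print(ssh_hits)
print(fw_hits)
print("normalizer calls:", len(calls))
```

Only the firewall line reaches the normalizer; the ssh line is reported without any normalization work.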

> 
>       Another thing....  Much of the base information I'm already
> getting.  For example,  if I have this rule....
> 
> :%date:date-rfc3164% %host:word% %tag:char-to:\x3a%:synchronized to %ip:ipv4%
> 
>       Sagan already has the date, host, tag.  Really,  all I'd need is:
> 
> :synchronized to %ip:ipv4%
> 
>       This is my dilemma.

It is! And I am well aware of it. In rsyslog, I have the same issue. I am
thinking of something like a "common prefix" inside the sample db (maybe
rulebase is a better name, btw :)). That would be common to all rules, and
only the common prefix would need to be changed for different headers. It's
not 100% sorted out, and there is still enough work to do on the core engine
(needs more parsers, parser priority, str optimizations).
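
To make the "common prefix" idea concrete, here is a rough Python sketch. It is not how liblognorm works: the regex translation below is a toy stand-in that only mimics the rule syntax quoted above, and the names compile_rule, expand and normalize are made up for this example. The prefix is also written with a space after the colon, an assumption made so that it matches the ntpd sample line from earlier in the thread.

```python
import re

# Toy regex stand-ins for a few field parsers. This is an assumption for
# illustration; liblognorm does not use regexes for its parsers.
FIELD_PATTERNS = {
    "date-rfc3164": r"[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}",
    "word": r"\S+",
    "ipv4": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches %name:type% and %name:type:extra%
FIELD_RE = re.compile(r"%([^:%]+):([^:%]+)(?::([^%]*))?%")


def compile_rule(rule):
    """Translate a ':literal %name:type% ...' rule into an anchored regex."""
    body = rule[1:] if rule.startswith(":") else rule
    parts, pos = [], 0
    for m in FIELD_RE.finditer(body):
        parts.append(re.escape(body[pos:m.start()]))   # literal text
        name, ftype, extra = m.group(1), m.group(2), m.group(3)
        if ftype == "char-to":
            stop = extra.encode().decode("unicode_escape")  # "\x3a" -> ":"
            pattern = "[^" + re.escape(stop) + "]+"
        else:
            pattern = FIELD_PATTERNS[ftype]
        parts.append("(?P<%s>%s)" % (name, pattern))
        pos = m.end()
    parts.append(re.escape(body[pos:]))
    return re.compile("".join(parts) + r"\Z")   # the whole line must match


def expand(prefix, bodies):
    """Prepend one shared header prefix to every rule body."""
    return [compile_rule(":" + prefix + body) for body in bodies]


def normalize(rules, line):
    """Return the fields of the first matching rule, or None."""
    for rx in rules:
        m = rx.match(line)
        if m:
            return m.groupdict()
    return None


BODIES = ["synchronized to %ip:ipv4%"]
SYSLOG_PREFIX = r"%date:date-rfc3164% %host:word% %tag:char-to:\x3a%: "

full_rules = expand(SYSLOG_PREFIX, BODIES)  # for raw syslog lines
bare_rules = expand("", BODIES)             # header already parsed elsewhere

line = "Dec  1 14:10:11 testbox ntpd[3821]: synchronized to 192.168.0.10"
print(normalize(full_rules, line))
print(normalize(bare_rules, "synchronized to 192.168.0.10"))
```

The rule bodies are written once; only the prefix differs between a consumer that sees the raw syslog line and one that has already split off date, host and tag. Because the toy matcher anchors at the end of the line, the variant with trailing ", stratium 1" is rejected rather than partially matched, in line with the exact-match behaviour discussed earlier.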

>  Do I make my own 'Sagan' liblognorm rules,  which
> are a stripped down version of the liblognorm rules or do I just have
> Sagan rebuild the 'syslog string' so that the 'standard' liblognorm
> rules can be used?  I like the idea of Sagan's own rules,  but then that
> means keeping up with two liblognorm rule sets.   I don't like that.
> 
>       Lastly,  and this goes back to the beginning of this post.  It's
> not a huge deal,  but how are you going to handle dependencies with
> rsyslog?  That is,  in the end,  will people have to
> download/compile/install libestr,  libee,  liblognorm,  {insert other
> dependencies here for rsyslog},  then rsyslog?  While that's not a huge
> deal for me,  IMHO added dependencies 'turn off' users from using
> features.   Even though the 'dependencies' in question are usually
> trivial to install,   it adds yet another layer for end users to find
> issues with.   Considering the libraries in question (ee/lognorm/estr)
> are pretty small (at least now!),  would a one time package/build of
> something like 'liblognorm-complete' be possible?  I know all this seems
> silly,  but it's these little things IMHO that cause potential users
> to shift away from software.  I could be wrong,  and maybe I'm
> over analyzing the issue.

rsyslog v6 will need libestr and libee in any case. There is also a volunteer
who will package that. I try to keep dependencies as low as possible, but the
alternative would be to copy the same code into different projects. I don't
like that. I hope those folks who build packages will tie the right things
together (and I am *very* grateful for their excellent work!).

>       Oh.. one more thing.  Do you think it's too early to start
> writing liblognorm rules?

Depends... You will probably want to revisit the rules in a few weeks, when
we have more capabilities. But on the other hand, I need some experience with
building them, so that I know what does not work out. The current parsers are
extremely limited and some (word, char-to) are very generic. But if that
works, it will continue to work with new versions. When the classifier is
there (hopefully December), you will probably want to add classification tags
for easy filtering (if that matters for Sagan).

Rainer
> 
> --
>         Champ Clark III | Softwink, Inc | 800-538-9357 x 101
>                      http://www.softwink.com
> 
> GPG Key ID: 58A2A58F
> Key fingerprint = 7734 2A1C 007D 581E BDF7  6AD5 0F1F 655F 58A2 A58F
> If it wasn't for C, we'd be using BASI, PASAL and OBOL.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
