Hi David, as usual, many thanks for your great thoughts. I had a day of heavy hacking yesterday, thus the late response. See below...
> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Monday, October 11, 2010 10:33 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] liblognorm
>
> On Mon, 11 Oct 2010, Rainer Gerhards wrote:
>
> > I have just written another post on the normalization library. It looks
> > like the design tends to favor a split into two libraries:
> >
> > http://blog.gerhards.net/2010/10/splitting-up-normalization-library.html
>
> this seems like a good idea.
>
> there is a definite need for a good, efficient parsing tool that can be
> used for high volume sites. There are a lot of tools that heavily use
> regex matching, but those tend to collapse at high volumes.
>
> you can create your own parser with lex, yacc, bison, or flex, but the work
> needed to create the input config file for these (with their specific
> syntax) is daunting.
>
> a tool that could take its configuration in something that looks very
> similar to log lines (with some sort of syntax to show the variable part),
> that would then compile into something very efficient like the tools above
> would be very useful for a lot of different tools.

That's the basic idea, except that I do not intend to create e.g. lex source; rather, the engine will do that part itself. The main advantage is that this can be done dynamically. I think parsing will be possible in almost constant time, as long as all fields can be parsed via primitive types (which do not require much effort to back out of). I will be working on the parse tree over the next few days, so you'll hopefully be able to get an idea of it by looking at the code. At its heart, it simply is a radix tree, with constants and field syntaxes defining how the tree is traversed.

> this may just need to be a configuration generator for the tools listed
> above that can take the list of annotated lines and create the appropriate
> config file to build the parser.
> If this can accept regex lines and then compile them down to a parser tree
> it would be wonderful.

Regex is a different beast, because for it you need to create a full-blown DFA, which also explains the slowness of regexes. I will not tackle that beast. For some fields I will support regex matches, but performance suffers when they are used. The overall idea is that you usually do not need any regex/DFA at all.

> so once there is a high performance parser to pull the data apart, then
> the question is what to do with it.
>
> some people will want to write it to various places, others will want to
> make decisions based on what is matched.
>
> for those who are wanting to write the normalized output to various
> places, a plugin structure like rsyslog has (with the ability to format
> the messages based on the various properties that are discovered) is very
> appealing, and it may make a lot of sense to see what can be done to
> re-use that work. If so, there will need to be a 'format string' that
> creates the output with all the properties that are known tagged, but
> without including ones that didn't have any matches in this log message.

One thing I definitely intend to do is use the library in rsyslog. I envision a parser module that is based on the library. That also means rsyslog's core engine must be extended to support the additional fields, but that is something that can definitely be done. With that approach, no complex output engine is needed; one can simply use the existing rsyslog output plugins. And with a near O(1) algorithm, we can probably expect this to happen in real time even for very large traffic loads (though probably not for the largest ones). It is worth noting that the current parsers also have some limited backout needs, for example for the date and tag/hostname fields, so this kind of backing out can be done quickly.

> for those who are wanting to then implement logic based on what it gets,
> things get much more interesting.
> I suspect that the thing to do here will be to make the event normalization
> engine be something that can be a library included in other programs (in
> various languages), something so that you can have the config file be
> something along the lines of

That's actually the idea. An initial sketch of the API is already in git, and I hope to get a more readable, Doxygen-generated interface spec up later today.

> documentation (hopefully including a sample raw line)
> line-to-match
> function to call when matched
>
> there are a lot of programs out there written to do good and interesting
> stuff with lines that they receive. If there was an ability to replace
> their sequential 'does it match rule 1, does it match rule 2' logic with a
> more efficient parser it would be a huge win.
>
> I don't think you are wanting to tackle that portion of the task.

The library part, yes, but using it obviously requires a lot of changes in the applications themselves.

Rainer

>
> David Lang
>
> >
> > Rainer
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of Rainer Gerhards
> >> Sent: Monday, October 11, 2010 9:01 AM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] liblognorm vs. libeventnorm
> >>
> >> I would like to add as an argument pro liblognorm, that many people
> >> probably better understand what "log normalization" is whereas "event
> >> normalization" may sound strange. In that sense, liblognorm may be a
> >> better name. Feedback is appreciated.
> >>
> >> Rainer
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:rsyslog-
> >>> [email protected]] On Behalf Of Rainer Gerhards
> >>> Sent: Sunday, October 10, 2010 11:53 AM
> >>> To: rsyslog-users
> >>> Subject: [rsyslog] liblognorm vs. libeventnorm
> >>>
> >>> Hi all,
> >>>
> >>> I think I'll start with the libeventnorm name for the normalizing
> >>> library instead of liblognorm.
> >>> Reason here:
> >>>
> >>> http://blog.gerhards.net/2010/10/liblognorm-or-libeventnorm.html
> >>>
> >>> Further name suggestions or arguments are very welcome!
> >>>
> >>> Rainer
> >>> _______________________________________________
> >>> rsyslog mailing list
> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>> http://www.rsyslog.com
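To make the parse tree described in the reply above a bit more concrete (a tree whose traversal is driven by constants and field syntaxes), here is a small sketch. It is NOT the liblognorm code or API: the `%name:type%` rule syntax, the `word`/`number` field types and all function names are hypothetical. For brevity it uses a character-level prefix tree instead of a true radix tree (a radix tree would additionally merge single-child chains into multi-character edges). Constant characters are tried first; typed fields are tried next, and when a field branch dead-ends the matcher backs out to the last branching node. Under these assumptions, the matching cost depends mostly on the length of the line rather than on the number of rules.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical primitive field parsers: each takes (text, pos) and returns the
# end offset of the field, or None if the field cannot start at pos.
def parse_word(text: str, pos: int) -> Optional[int]:
    end = pos
    while end < len(text) and not text[end].isspace():
        end += 1
    return end if end > pos else None


def parse_number(text: str, pos: int) -> Optional[int]:
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return end if end > pos else None


PARSERS = {"word": parse_word, "number": parse_number}
FIELD_RE = re.compile(r"%(\w+):(\w+)%")  # hypothetical %name:type% syntax


@dataclass
class Node:
    literal_edges: Dict[str, "Node"] = field(default_factory=dict)  # char -> child
    field_edges: List[Tuple[str, str, "Node"]] = field(default_factory=list)
    rule_id: Optional[str] = None  # set when a rule ends at this node


def add_rule(root: Node, rule: str, rule_id: str) -> None:
    """Insert a sample line with %name:type% placeholders into the tree."""
    node, pos = root, 0
    for m in list(FIELD_RE.finditer(rule)) + [None]:
        # One literal edge per constant character; rules sharing a prefix
        # reuse the same nodes.
        literal = rule[pos:m.start()] if m else rule[pos:]
        for ch in literal:
            node = node.literal_edges.setdefault(ch, Node())
        if m is None:
            break
        name, ftype = m.group(1), m.group(2)
        if ftype not in PARSERS:
            raise ValueError(f"unknown field type: {ftype}")
        for fname, ft, child in node.field_edges:
            if (fname, ft) == (name, ftype):
                node = child  # reuse an identical field edge
                break
        else:
            child = Node()
            node.field_edges.append((name, ftype, child))
            node = child
        pos = m.end()
    node.rule_id = rule_id


def match(node: Node, text: str, pos: int = 0,
          captured: Optional[Dict[str, str]] = None
          ) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (rule_id, fields) for the first matching rule, else None."""
    captured = {} if captured is None else captured
    if pos == len(text) and node.rule_id is not None:
        return node.rule_id, dict(captured)
    # Constants first: a single dict lookup, no backing out needed.
    child = node.literal_edges.get(text[pos]) if pos < len(text) else None
    if child is not None:
        result = match(child, text, pos + 1, captured)
        if result is not None:
            return result
    # Then typed fields; if a branch dead-ends, back out and try the next one.
    for name, ftype, child in node.field_edges:
        end = PARSERS[ftype](text, pos)
        if end is not None:
            captured[name] = text[pos:end]
            result = match(child, text, end, captured)
            if result is not None:
                return result
            del captured[name]
    return None
```

Usage: build the tree once with `add_rule(root, "sshd: Accepted password for %user:word% from %ip:word% port %port:number%", "login-ok")` and similar calls, then call `match(root, line)` for each incoming line. The returned rule id is where the "function to call when matched" idea from the thread could hook in, for example via a dict mapping rule ids to handlers. The recursion depth grows with line length, so a production implementation would more likely iterate with an explicit stack.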

