> -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of [email protected] > Sent: Monday, March 01, 2010 8:39 PM > To: rsyslog-users > Subject: Re: [rsyslog] Feedback requested: inconsistency in config > statements > > On Mon, 1 Mar 2010, Rainer Gerhards wrote: > > > Hi David, > > > > thanks for the explanation, I now see where you are heading too. I > have not > > dug into all details, but find the post very interesting and > insightful. I'll > > archive it so that I have it at hand when I can take the route to > actually > > implement such an approach. All in all, it still looks rather complex > to me. > > I guess it would be useful to have a guy with experience in writing > compiler > > code optimizers at hand when I tackle that beast ;) > > you may also want to dig into yacc/bison as they are designed to help > people build parser trees.
I know yacc but not bison. Yacc creates decent single-threading parsers out of a fixed set of rules. I never thought about using them for message parsing, because of the ample limitations it has (most notably unable to run on multiple threads!). But we need to keep in mind that now are talking about the filter expressions. That is a different beast. The thing to look at probably would be the gcc code optimizer, but I have to admit that is too large an effort for me right now. A good (dragon?;)) book on compiler optimization technics for expression trees would probably the right path to take. > I am assuming that this is the type of thing that syslog-ng is doing to > be > able to efficiantly handle a lot of log processing. I hardly believe so. I don't see how this could work out well... > I currently have a script that looks for a couple hundred programnames > and > puts the data into ~50 different files depending on what that name is. > Given that the output side of rsyslog is currently the bottleneck, I > would > not want to try and implement that in rsyslog today, but if rsyslog > gained > the ability to handle this sort of ruleset efficiantly it would greatly > simplify my life. Rather than trying to optimize the whole expression engine, this could probably done very easy with a normalizing parser. As part of the normalization, it could also create a meta-property "filename" that than is used inside the rule engine. I'd assume that the effort to create that is by far less than optimizing the expression tree. *This* is part of my approach of log normalizing. Rianer > > David Lang > > > Rainer > > > >> -----Original Message----- > >> From: [email protected] [mailto:rsyslog- > >> [email protected]] On Behalf Of [email protected] > >> Sent: Monday, March 01, 2010 3:16 AM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] Feedback requested: inconsistency in config > >> statements > >> > >> On Sun, 28 Feb 2010, Rainer Gerhards wrote: > >> > >>> Hi David, > >>> > >>> yes, that makes much more sense to me. Let's call that filter > >> expression tree > >>> for the time being (the parse tree used for normalization has some > >> similar > >>> ideas, but a totally different scope and semantics). > >>> > >>> While I now see where you are heading, I fail to see how this could > >>> sufficiently easy be implemented for a real (and complex) > >> configuration. > >>> > >>> Let's look at this example conf: > >>> > >>> mail.* /Var/log/mail.log > >>> if programname startswith abc log to /abc > >>> if programname startswith 'acc' and MSG contains 'error' log to > /acc > >>> # we don't want debug messages from here on > >>> *.debug ~ > >>> if programname startswith 'acc' or (programname contains 'bde' and > >> (facility > >>>> 1 and > >>> facility < 5)) log to @1.1.1.1 > >>> if programname startswith bcd or MSG startswith '123' or MSG > contains > >>> 'error2' log to /bcd > >>> *.* /var/log/catchall > >>> > >>> I simply cannot envision how you would store this as a tree that > does > >> not > >>> look like what we have today. > >> > >> Ok, Here is what I came up with. > >> > >> 0-1,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> \programname startswith 'acc' log to @1.1.1.1 > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 0-1,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 2,0-6 > >> --log to /var/log/mail.log > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> | programname startswith 'acc' log to @1.1.1.1 > >> \programname startswith bcd log to /bcd > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 2,7 > >> --log to /var/log/mail.log > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 3-4,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> | programname startswith 'acc' log to @1.1.1.1 > >> \programname startswith bcd log to /bcd > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 3-4,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 5-23,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> \programname startswith 'acc' log to @1.1.1.1 > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 5-23,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> > >> This looks very ugly and is definantly more complicated, but I think > it > >> still ends up being a slight (but not drastic) win. > >> > >> for the initial ruleset you have to check a minimum of 4 tests and a > >> maximum of 13 tests (debug messages have 4-5 tests, all others have > 12- > >> 13) > >> > >> Assuming the facility/priority lookup can be a table lookup, for the > >> tree-based ruleset you have a minimum of 2 tests and a maximum of 7 > >> tests, > >> in each case one of the tests is a multi-option comparison which is > >> slightly more expensive than a single 'startswith' test, but should > be > >> cheaper than two 'startswith' tests. > >> > >> This definantly trades off start time complexity for runtime > >> efficiancy. > >> > >> David Lang > >> > >> > >> and here is the long, ugly process I went through to derive this. > >> > >> > >> > >> first normalize the tests (replace 'or' with two lines, change > facility > >> and priority to ranges of numbers) > >> > >> facility = 2 log to /Var/log/mail.log > >> programname startswith abc log to /abc > >> programname startswith 'acc' and MSG contains 'error' log to /acc > >> severity 0..6 and programname startswith 'acc' log to @1.1.1.1 > >> severity 0..6 and programname startswith bcd log to /bcd > >> facility 2..4 and severity 0..6 and programname contains 'bde' log > to > >> @1.1.1.1 > >> severity 0..6 and MSG startswith '123' log to/bcd > >> severity 0..6 and MSG contains 'error2' log to /bcd > >> severity 0..6 and log to /var/log/catchall > >> > >> I then add the full facility/priority ranges > >> > >> > >> now normalize the facility/severity tests (split overlaps) > >> > >> we have the following ranges for facility > >> 0-1 > >> 2 > >> 3-4 > >> 5-23 > >> > >> and for severity > >> 0-6 > >> 7 > >> > >> to shorten the lines in the e-mail I will letter the remaining > >> tests/actions > >> > >> A) log to /Var/log/mail.log > >> B) programname startswith abc log to /abc > >> C) programname startswith 'acc' and MSG contains 'error' log to /acc > >> D) programname startswith 'acc' log to @1.1.1.1 > >> E) programname startswith bcd log to /bcd > >> F) programname contains 'bde' log to @1.1.1.1 > >> G) MSG startswith '123' log to/bcd > >> H) MSG contains 'error2' log to /bcd > >> I) log to /var/log/catchall > >> > >> so the tree then becomes (straight translation no reordering) > >> > >> 2,0-6 A > >> 2,7 A > >> 0-1,0-6 B > >> 0-1,7 B > >> 2,0-6 B > >> 2,7 B > >> 3-4,0-6 B > >> 3-4,7 B > >> 5-23,0-6 B > >> 5-23,7 B > >> 0-1,0-6 C > >> 0-1,7 C > >> 2,0-6 C > >> 2,7 C > >> 3-4,0-6 C > >> 3-4,7 C > >> 5-23,0-6 C > >> 5-23,7 C > >> 0-1,0-6 D > >> 2,0-6 D > >> 3-4,0-6 D > >> 5-23,0-6 D > >> 2,0-6 E > >> 3-4,0-6 E > >> 0-1,0-6 F > >> 2,0-6 F > >> 3-4,0-6 F > >> 5-23,0-6 F > >> 0-1,0-6 G > >> 2,0-6 G > >> 3-4,0-6 G > >> 5-23,0-6 G > >> 0-1,0-6 H > >> 2,0-6 H > >> 3-4,0-6 H > >> 5-23,0-6 H > >> 0-1,0-6 I > >> 2,0-6 I > >> 3-4,0-6 I > >> 5-23,0-6 I > >> > >> now to order them > >> > >> 0-1,0-6 B > >> 0-1,0-6 C > >> 0-1,0-6 D > >> 0-1,0-6 F > >> 0-1,0-6 G > >> 0-1,0-6 H > >> 0-1,0-6 I > >> 0-1,7 B > >> 0-1,7 C > >> 2,0-6 A > >> 2,0-6 B > >> 2,0-6 I > >> 2,0-6 H > >> 2,0-6 G > >> 2,0-6 F > >> 2,0-6 E > >> 2,0-6 D > >> 2,0-6 C > >> 2,7 A > >> 2,7 B > >> 2,7 C > >> 3-4,0-6 B > >> 3-4,0-6 C > >> 3-4,0-6 D > >> 3-4,0-6 E > >> 3-4,0-6 F > >> 3-4,0-6 G > >> 3-4,0-6 H > >> 3-4,0-6 I > >> 3-4,7 B > >> 3-4,7 C > >> 5-23,0-6 B > >> 5-23,0-6 C > >> 5-23,0-6 D > >> 5-23,0-6 F > >> 5-23,0-6 G > >> 5-23,0-6 H > >> 5-23,0-6 I > >> 5-23,7 B > >> 5-23,7 C > >> > >> I then simplify this to > >> > >> 0-1,0-6 B,C,D,F,G,H,I > >> 0-1,7 B,C > >> 2,0-6 A,B,C,D,E,F,G,H,I > >> 2,7 A,B,C > >> 3-4,0-6 B,C,D,E,F,G,H,I > >> 3-4,7 B,C > >> 5-23,0-6 B,C,D,F,G,H,I > >> 5-23,7 B,C > >> > >> looking at the lettered tests > >> > >> A) log to /Var/log/mail.log > >> B) programname startswith abc log to /abc > >> C) programname startswith 'acc' and MSG contains 'error' log to /acc > >> D) programname startswith 'acc' log to @1.1.1.1 > >> E) programname startswith bcd log to /bcd > >> F) programname contains 'bde' log to @1.1.1.1 > >> G) MSG startswith '123' log to/bcd > >> H) MSG contains 'error2' log to /bcd > >> I) log to /var/log/catchall > >> > >> the only ones that can be combined are B-E > >> > >> so if you make tests > >> > >> J) B,C > >> K) B,C,D > >> L) B,C,D,E > >> > >> the results simplify to > >> > >> 0-1,0-6 K,F,G,H,I > >> 0-1,7 J > >> 2,0-6 A,L,F,G,H,I > >> 2,7 A,J > >> 3-4,0-6 L,F,G,H,I > >> 3-4,7 J > >> 5-23,0-6 K,F,G,H,I > >> 5-23,7 J > >> > >> expanded out to show all tests > >> > >> 0-1,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> \programname startswith 'acc' log to @1.1.1.1 > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 0-1,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 2,0-6 > >> --log to /var/log/mail.log > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> | programname startswith 'acc' log to @1.1.1.1 > >> \programname startswith bcd log to /bcd > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 2,7 > >> --log to /var/log/mail.log > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 3-4,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> | programname startswith 'acc' log to @1.1.1.1 > >> \programname startswith bcd log to /bcd > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 3-4,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> 5-23,0-6 > >> --programname startswith abc log to /abc > >> | programname startswith 'acc' and MSG contains 'error' log to > /acc > >> \programname startswith 'acc' log to @1.1.1.1 > >> --programname contains 'bde' log to @1.1.1.1 > >> --MSG startswith '123' log to/bcd > >> --MSG contains 'error2' log to /bcd > >> --log to /var/log/catchall > >> 5-23,7 > >> --programname startswith abc log to /abc > >> \programname startswith 'acc' and MSG contains 'error' log to > /acc > >> > >>> Rainer > >>>> -----Original Message----- > >>>> From: [email protected] [mailto:rsyslog- > >>>> [email protected]] On Behalf Of [email protected] > >>>> Sent: Sunday, February 28, 2010 2:02 PM > >>>> To: rsyslog-users > >>>> Subject: Re: [rsyslog] Feedback requested: inconsistency in config > >>>> statements > >>>> > >>>> On Sun, 28 Feb 2010, Rainer Gerhards wrote: > >>>> > >>>>> Quickly from the mobile: pri based filters require *excatly* one > >>>> single word (!) lookup currently. You can't beat that. > >>>>> > >>>>> I am not totally sure how you think about parse trees. In my > view, > >>>> parsin and filteing are two different things./stages. Maybe we > have > >>>> different concepts on our minds? > >>>> > >>>> I am probably using the wrong terminology here. > >>>> > >>>> say you have a set of rules that say > >>>> > >>>> if programname startswith abc log to /abc > >>>> if programname startswith acc log to /acc > >>>> if programname startswith acc log to @1.1.1.1 > >>>> if programname startswith bcd log to /bcd > >>>> > >>>> and assuming that these were the only possible programnames for > >>>> simplicity > >>>> in the explination. > >>>> > >>>> the filtering logic would go somthing like this > >>>> > >>>> the rules would compile into a tree > >>>> > >>>> -a-b -> /abc > >>>> -c -> /acc,@1.1.1.1 > >>>> -b -> /bcd > >>>> > >>>> receive programname abc > >>>> I have no facility/Pri filter rules > >>>> I have no time filter rules > >>>> I have no system filter rules > >>>> I have programname filter rules > >>>> look at the first character of the programname > >>>> it's "a", I have more than one rule that fits that. > >>>> look at the second character of the programname > >>>> it's a "b", There are no other decisions to make, > >>>> invoke the action /abc > >>>> > >>>> receive progranmane bcd > >>>> I have no facility/PRI filter rules > >>>> I have no time filter rules > >>>> I have no system filter rules > >>>> I have programname filter rules > >>>> look at the first character of the programname > >>>> it's a "b", There are no other decisions to make, > >>>> invoke the action /bcd > >>>> > >>>> receive programname acc > >>>> I have no facility/Pri filter rules > >>>> I have no time filter rules > >>>> I have no system filter rules > >>>> I have programname filter rules > >>>> look at the first character of the programname > >>>> it's "a", I have more than one rule that fits that. > >>>> look at the second character of the programname > >>>> it's a "c", There are no other decisions to make, > >>>> invoke the action /acc and the action @1.1.1.1 > >>>> > >>>> similarly the facility/pri filtering rules would be either > compiled > >>>> into a > >>>> tree, or (since it's 256 entries total) into a lookup table with > >>>> pointers > >>>> to the list of actions to take for that entry > >>>> > >>>> the idea is to spend the time at startup to create the tree that > >>>> represents the ruleset in order to speed up the processing of each > >>>> individual message. > >>>> > >>>> the real tree would be a bit more complicated than I describe here > >> as > >>>> it > >>>> needs to have 'anything else' entries between the known entries > >> (which > >>>> means that it would not be able to shortcut quite as much as I > >>>> describe), > >>>> and have provision for 'do a more complicated thing (like regex) > >> here > >>>> and > >>>> if it matches continue'. but except for regex matches, this would > >> make > >>>> processing the rules pretty much independant of how many rules > there > >>>> were > >>>> or how complex the ruleset is. > >>>> > >>>> there doe snot need to be one single programname filter tree, if > you > >>>> had a > >>>> rule that said > >>>> if pri = info and programname startswith def then def > >>>> > >>>> the pri tree/table would have an entry for info that would point > to > >> a > >>>> filter tree for programname that just had the check for def in it > >>>> > >>>> am I makeing any more sense now? > >>>> > >>>> David Lang > >>>> > >>>>> rainer > >>>>> > >>>>> ----- Urspr?ngliche Nachricht ----- > >>>>> Von: "[email protected]" <[email protected]> > >>>>> An: "rsyslog-users" <[email protected]> > >>>>> Gesendet: 28.02.10 11:28 > >>>>> Betreff: Re: [rsyslog] Feedback requested: inconsistency in > config > >>>> statements > >>>>> > >>>>> On Sun, 28 Feb 2010, Rainer Gerhards wrote: > >>>>> > >>>>>>> > >>>>>>> if the type is depriciated then the destination would be > another > >>>>>>> config_option (which you would then look up) > >>>>>>> > >>>>>>> if the type is 'ignore' then it would just be skipped over > >>>> (possibly > >>>>>>> with > >>>>>>> a warning being logged that the config line is being ignored) > >>>>>>> > >>>>>>> this would also clean up some of the current cases where a > >> boolean > >>>>>>> option > >>>>>>> doesn't honor on/off and true/false. > >>>>>> > >>>>>> True/false is NOT a valid boolean option. See > >>>>>> > >>>>>> > >>>> > >> > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/cfsysline.c;h=18 > >>>> 4c0d87 > >>>>>> 47c5dcbbd8230782ca096e477738efa9;hb=HEAD#l308 > >>>>> > >>>>> I was not remembering correctly then, maby it was yes/no vs > on/off. > >> I > >>>> know > >>>>> I ran into something where what was documented didn't work and I > >>>> changed > >>>>> it to another version of a boolean answer and it fixed the > problem. > >>>>> > >>>>>>> For the second half of the config language (telling rsyslog > what > >> to > >>>> do > >>>>>>> with the logs it has received) also has several variations in > >> place > >>>> and > >>>>>>> is > >>>>>>> showing issues. > >>>>>>> > >>>>>>> However I think that the right answer here is to end up > >>>> implementing > >>>>>>> something like the parsing trees and then compile the other > >> config > >>>>>>> language options into that tree to be consistant (and fast) > under > >>>> the > >>>>>>> covers, no matter which way the instructions are written > (except > >>>> when > >>>>>>> you > >>>>>>> have to use regex matches) > >>>>>> > >>>>>> I don't fully agree here with you. For example, the traditional > >> PRI > >>>> based > >>>>>> filter is lightning fast. I don't see any way it could be nearly > >> as > >>>> fast with > >>>>>> any unified approach. Right now, we have a "unified" filter > >>>> structure, but it > >>>>>> contains three filter cases, each one adding potential power at > >> the > >>>> price of > >>>>>> decreased speed. I think we need to keep different types of > >> filters > >>>> in order > >>>>>> to have some lightning-fast ones. But more of this could be done > >>>> under the > >>>>>> hood. > >>>>> > >>>>> My guess/expectation is that the tree-based parsing would be > about > >>>> the > >>>>> same speed as the traditional PRI based filter for choices made > >> based > >>>> on > >>>>> PRI, slowing down for other types of conditions only in that more > >> of > >>>> the > >>>>> message may need to be scanned (and if there is no selection at a > >>>> level, > >>>>> skipping that level should be lightning fast). As a result, a set > >> of > >>>>> filteres based soley on programname would be as fast as the > current > >>>> PRI > >>>>> filters. > >>>>> > >>>>> David Lang > >>>>> _______________________________________________ > >>>>> rsyslog mailing list > >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>>> http://www.rsyslog.com > >>>>> _______________________________________________ > >>>>> rsyslog mailing list > >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>>> http://www.rsyslog.com > >>>> _______________________________________________ > >>>> rsyslog mailing list > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>> http://www.rsyslog.com > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >>> > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

