Hi David, thanks for the explanation, I now see where you are heading too. I have not dug into all details, but find the post very interesting and insightful. I'll archive it so that I have it at hand when I can take the route to actually implement such an approach. All in all, it still looks rather complex to me. I guess it would be useful to have a guy with experience in writing compiler code optimizers at hand when I tackle that beast ;)
Rainer > -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of [email protected] > Sent: Monday, March 01, 2010 3:16 AM > To: rsyslog-users > Subject: Re: [rsyslog] Feedback requested: inconsistency in config > statements > > On Sun, 28 Feb 2010, Rainer Gerhards wrote: > > > Hi David, > > > > yes, that makes much more sense to me. Let's call that filter > expression tree > > for the time being (the parse tree used for normalization has some > similar > > ideas, but a totally different scope and semantics). > > > > While I now see where you are heading, I fail to see how this could > > sufficiently easy be implemented for a real (and complex) > configuration. > > > > Let's look at this example conf: > > > > mail.* /Var/log/mail.log > > if programname startswith abc log to /abc > > if programname startswith 'acc' and MSG contains 'error' log to /acc > > # we don't want debug messages from here on > > *.debug ~ > > if programname startswith 'acc' or (programname contains 'bde' and > (facility > >> 1 and > > facility < 5)) log to @1.1.1.1 > > if programname startswith bcd or MSG startswith '123' or MSG contains > > 'error2' log to /bcd > > *.* /var/log/catchall > > > > I simply cannot envision how you would store this as a tree that does > not > > look like what we have today. > > Ok, Here is what I came up with. > > 0-1,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > \programname startswith 'acc' log to @1.1.1.1 > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 0-1,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 2,0-6 > --log to /var/log/mail.log > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > | programname startswith 'acc' log to @1.1.1.1 > \programname startswith bcd log to /bcd > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 2,7 > --log to /var/log/mail.log > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 3-4,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > | programname startswith 'acc' log to @1.1.1.1 > \programname startswith bcd log to /bcd > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 3-4,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 5-23,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > \programname startswith 'acc' log to @1.1.1.1 > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 5-23,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > > This looks very ugly and is definantly more complicated, but I think it > still ends up being a slight (but not drastic) win. > > for the initial ruleset you have to check a minimum of 4 tests and a > maximum of 13 tests (debug messages have 4-5 tests, all others have 12- > 13) > > Assuming the facility/priority lookup can be a table lookup, for the > tree-based ruleset you have a minimum of 2 tests and a maximum of 7 > tests, > in each case one of the tests is a multi-option comparison which is > slightly more expensive than a single 'startswith' test, but should be > cheaper than two 'startswith' tests. > > This definantly trades off start time complexity for runtime > efficiancy. > > David Lang > > > and here is the long, ugly process I went through to derive this. > > > > first normalize the tests (replace 'or' with two lines, change facility > and priority to ranges of numbers) > > facility = 2 log to /Var/log/mail.log > programname startswith abc log to /abc > programname startswith 'acc' and MSG contains 'error' log to /acc > severity 0..6 and programname startswith 'acc' log to @1.1.1.1 > severity 0..6 and programname startswith bcd log to /bcd > facility 2..4 and severity 0..6 and programname contains 'bde' log to > @1.1.1.1 > severity 0..6 and MSG startswith '123' log to/bcd > severity 0..6 and MSG contains 'error2' log to /bcd > severity 0..6 and log to /var/log/catchall > > I then add the full facility/priority ranges > > > now normalize the facility/severity tests (split overlaps) > > we have the following ranges for facility > 0-1 > 2 > 3-4 > 5-23 > > and for severity > 0-6 > 7 > > to shorten the lines in the e-mail I will letter the remaining > tests/actions > > A) log to /Var/log/mail.log > B) programname startswith abc log to /abc > C) programname startswith 'acc' and MSG contains 'error' log to /acc > D) programname startswith 'acc' log to @1.1.1.1 > E) programname startswith bcd log to /bcd > F) programname contains 'bde' log to @1.1.1.1 > G) MSG startswith '123' log to/bcd > H) MSG contains 'error2' log to /bcd > I) log to /var/log/catchall > > so the tree then becomes (straight translation no reordering) > > 2,0-6 A > 2,7 A > 0-1,0-6 B > 0-1,7 B > 2,0-6 B > 2,7 B > 3-4,0-6 B > 3-4,7 B > 5-23,0-6 B > 5-23,7 B > 0-1,0-6 C > 0-1,7 C > 2,0-6 C > 2,7 C > 3-4,0-6 C > 3-4,7 C > 5-23,0-6 C > 5-23,7 C > 0-1,0-6 D > 2,0-6 D > 3-4,0-6 D > 5-23,0-6 D > 2,0-6 E > 3-4,0-6 E > 0-1,0-6 F > 2,0-6 F > 3-4,0-6 F > 5-23,0-6 F > 0-1,0-6 G > 2,0-6 G > 3-4,0-6 G > 5-23,0-6 G > 0-1,0-6 H > 2,0-6 H > 3-4,0-6 H > 5-23,0-6 H > 0-1,0-6 I > 2,0-6 I > 3-4,0-6 I > 5-23,0-6 I > > now to order them > > 0-1,0-6 B > 0-1,0-6 C > 0-1,0-6 D > 0-1,0-6 F > 0-1,0-6 G > 0-1,0-6 H > 0-1,0-6 I > 0-1,7 B > 0-1,7 C > 2,0-6 A > 2,0-6 B > 2,0-6 I > 2,0-6 H > 2,0-6 G > 2,0-6 F > 2,0-6 E > 2,0-6 D > 2,0-6 C > 2,7 A > 2,7 B > 2,7 C > 3-4,0-6 B > 3-4,0-6 C > 3-4,0-6 D > 3-4,0-6 E > 3-4,0-6 F > 3-4,0-6 G > 3-4,0-6 H > 3-4,0-6 I > 3-4,7 B > 3-4,7 C > 5-23,0-6 B > 5-23,0-6 C > 5-23,0-6 D > 5-23,0-6 F > 5-23,0-6 G > 5-23,0-6 H > 5-23,0-6 I > 5-23,7 B > 5-23,7 C > > I then simplify this to > > 0-1,0-6 B,C,D,F,G,H,I > 0-1,7 B,C > 2,0-6 A,B,C,D,E,F,G,H,I > 2,7 A,B,C > 3-4,0-6 B,C,D,E,F,G,H,I > 3-4,7 B,C > 5-23,0-6 B,C,D,F,G,H,I > 5-23,7 B,C > > looking at the lettered tests > > A) log to /Var/log/mail.log > B) programname startswith abc log to /abc > C) programname startswith 'acc' and MSG contains 'error' log to /acc > D) programname startswith 'acc' log to @1.1.1.1 > E) programname startswith bcd log to /bcd > F) programname contains 'bde' log to @1.1.1.1 > G) MSG startswith '123' log to/bcd > H) MSG contains 'error2' log to /bcd > I) log to /var/log/catchall > > the only ones that can be combined are B-E > > so if you make tests > > J) B,C > K) B,C,D > L) B,C,D,E > > the results simplify to > > 0-1,0-6 K,F,G,H,I > 0-1,7 J > 2,0-6 A,L,F,G,H,I > 2,7 A,J > 3-4,0-6 L,F,G,H,I > 3-4,7 J > 5-23,0-6 K,F,G,H,I > 5-23,7 J > > expanded out to show all tests > > 0-1,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > \programname startswith 'acc' log to @1.1.1.1 > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 0-1,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 2,0-6 > --log to /var/log/mail.log > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > | programname startswith 'acc' log to @1.1.1.1 > \programname startswith bcd log to /bcd > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 2,7 > --log to /var/log/mail.log > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 3-4,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > | programname startswith 'acc' log to @1.1.1.1 > \programname startswith bcd log to /bcd > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 3-4,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > 5-23,0-6 > --programname startswith abc log to /abc > | programname startswith 'acc' and MSG contains 'error' log to /acc > \programname startswith 'acc' log to @1.1.1.1 > --programname contains 'bde' log to @1.1.1.1 > --MSG startswith '123' log to/bcd > --MSG contains 'error2' log to /bcd > --log to /var/log/catchall > 5-23,7 > --programname startswith abc log to /abc > \programname startswith 'acc' and MSG contains 'error' log to /acc > > > Rainer > >> -----Original Message----- > >> From: [email protected] [mailto:rsyslog- > >> [email protected]] On Behalf Of [email protected] > >> Sent: Sunday, February 28, 2010 2:02 PM > >> To: rsyslog-users > >> Subject: Re: [rsyslog] Feedback requested: inconsistency in config > >> statements > >> > >> On Sun, 28 Feb 2010, Rainer Gerhards wrote: > >> > >>> Quickly from the mobile: pri based filters require *excatly* one > >> single word (!) lookup currently. You can't beat that. > >>> > >>> I am not totally sure how you think about parse trees. In my view, > >> parsin and filteing are two different things./stages. Maybe we have > >> different concepts on our minds? > >> > >> I am probably using the wrong terminology here. > >> > >> say you have a set of rules that say > >> > >> if programname startswith abc log to /abc > >> if programname startswith acc log to /acc > >> if programname startswith acc log to @1.1.1.1 > >> if programname startswith bcd log to /bcd > >> > >> and assuming that these were the only possible programnames for > >> simplicity > >> in the explination. > >> > >> the filtering logic would go somthing like this > >> > >> the rules would compile into a tree > >> > >> -a-b -> /abc > >> -c -> /acc,@1.1.1.1 > >> -b -> /bcd > >> > >> receive programname abc > >> I have no facility/Pri filter rules > >> I have no time filter rules > >> I have no system filter rules > >> I have programname filter rules > >> look at the first character of the programname > >> it's "a", I have more than one rule that fits that. > >> look at the second character of the programname > >> it's a "b", There are no other decisions to make, > >> invoke the action /abc > >> > >> receive progranmane bcd > >> I have no facility/PRI filter rules > >> I have no time filter rules > >> I have no system filter rules > >> I have programname filter rules > >> look at the first character of the programname > >> it's a "b", There are no other decisions to make, > >> invoke the action /bcd > >> > >> receive programname acc > >> I have no facility/Pri filter rules > >> I have no time filter rules > >> I have no system filter rules > >> I have programname filter rules > >> look at the first character of the programname > >> it's "a", I have more than one rule that fits that. > >> look at the second character of the programname > >> it's a "c", There are no other decisions to make, > >> invoke the action /acc and the action @1.1.1.1 > >> > >> similarly the facility/pri filtering rules would be either compiled > >> into a > >> tree, or (since it's 256 entries total) into a lookup table with > >> pointers > >> to the list of actions to take for that entry > >> > >> the idea is to spend the time at startup to create the tree that > >> represents the ruleset in order to speed up the processing of each > >> individual message. > >> > >> the real tree would be a bit more complicated than I describe here > as > >> it > >> needs to have 'anything else' entries between the known entries > (which > >> means that it would not be able to shortcut quite as much as I > >> describe), > >> and have provision for 'do a more complicated thing (like regex) > here > >> and > >> if it matches continue'. but except for regex matches, this would > make > >> processing the rules pretty much independant of how many rules there > >> were > >> or how complex the ruleset is. > >> > >> there doe snot need to be one single programname filter tree, if you > >> had a > >> rule that said > >> if pri = info and programname startswith def then def > >> > >> the pri tree/table would have an entry for info that would point to > a > >> filter tree for programname that just had the check for def in it > >> > >> am I makeing any more sense now? > >> > >> David Lang > >> > >>> rainer > >>> > >>> ----- Urspr?ngliche Nachricht ----- > >>> Von: "[email protected]" <[email protected]> > >>> An: "rsyslog-users" <[email protected]> > >>> Gesendet: 28.02.10 11:28 > >>> Betreff: Re: [rsyslog] Feedback requested: inconsistency in config > >> statements > >>> > >>> On Sun, 28 Feb 2010, Rainer Gerhards wrote: > >>> > >>>>> > >>>>> if the type is depriciated then the destination would be another > >>>>> config_option (which you would then look up) > >>>>> > >>>>> if the type is 'ignore' then it would just be skipped over > >> (possibly > >>>>> with > >>>>> a warning being logged that the config line is being ignored) > >>>>> > >>>>> this would also clean up some of the current cases where a > boolean > >>>>> option > >>>>> doesn't honor on/off and true/false. > >>>> > >>>> True/false is NOT a valid boolean option. See > >>>> > >>>> > >> > http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/cfsysline.c;h=18 > >> 4c0d87 > >>>> 47c5dcbbd8230782ca096e477738efa9;hb=HEAD#l308 > >>> > >>> I was not remembering correctly then, maby it was yes/no vs on/off. > I > >> know > >>> I ran into something where what was documented didn't work and I > >> changed > >>> it to another version of a boolean answer and it fixed the problem. > >>> > >>>>> For the second half of the config language (telling rsyslog what > to > >> do > >>>>> with the logs it has received) also has several variations in > place > >> and > >>>>> is > >>>>> showing issues. > >>>>> > >>>>> However I think that the right answer here is to end up > >> implementing > >>>>> something like the parsing trees and then compile the other > config > >>>>> language options into that tree to be consistant (and fast) under > >> the > >>>>> covers, no matter which way the instructions are written (except > >> when > >>>>> you > >>>>> have to use regex matches) > >>>> > >>>> I don't fully agree here with you. For example, the traditional > PRI > >> based > >>>> filter is lightning fast. I don't see any way it could be nearly > as > >> fast with > >>>> any unified approach. Right now, we have a "unified" filter > >> structure, but it > >>>> contains three filter cases, each one adding potential power at > the > >> price of > >>>> decreased speed. I think we need to keep different types of > filters > >> in order > >>>> to have some lightning-fast ones. But more of this could be done > >> under the > >>>> hood. > >>> > >>> My guess/expectation is that the tree-based parsing would be about > >> the > >>> same speed as the traditional PRI based filter for choices made > based > >> on > >>> PRI, slowing down for other types of conditions only in that more > of > >> the > >>> message may need to be scanned (and if there is no selection at a > >> level, > >>> skipping that level should be lightning fast). As a result, a set > of > >>> filteres based soley on programname would be as fast as the current > >> PRI > >>> filters. > >>> > >>> David Lang > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > > _______________________________________________ > > rsyslog mailing list > > http://lists.adiscon.net/mailman/listinfo/rsyslog > > http://www.rsyslog.com > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

