Hi David,

thanks for the explanation, I now see where you are heading too. I have not
dug into all details, but find the post very interesting and insightful. I'll
archive it so that I have it at hand when I can take the route to actually
implement such an approach. All in all, it still looks rather complex to me.
I guess it would be useful to have a guy with experience in writing compiler
code optimizers at hand when I tackle that beast ;)

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Monday, March 01, 2010 3:16 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] Feedback requested: inconsistency in config
> statements
> 
> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
> 
> > Hi David,
> >
> > yes, that makes much more sense to me. Let's call that filter
> expression tree
> > for the time being (the parse tree used for normalization has some
> similar
> > ideas, but a totally different scope and semantics).
> >
> > While I now see where you are heading, I fail to see how this could
> > sufficiently easy be implemented for a real (and complex)
> configuration.
> >
> > Let's look at this example conf:
> >
> > mail.* /Var/log/mail.log
> > if programname startswith abc log to /abc
> > if programname startswith 'acc' and MSG contains 'error' log to /acc
> > # we don't want debug messages from here on
> > *.debug ~
> > if programname startswith 'acc' or (programname contains 'bde' and
> (facility
> >> 1 and
> > facility < 5)) log to @1.1.1.1
> > if programname startswith bcd or MSG startswith '123' or MSG contains
> > 'error2' log to /bcd
> > *.* /var/log/catchall
> >
> > I simply cannot envision how you would store this as a tree that does
> not
> > look like what we have today.
> 
> Ok, Here is what I came up with.
> 
> 0-1,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>     \programname startswith 'acc' log to @1.1.1.1
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 0-1,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 2,0-6
>    --log to /var/log/mail.log
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>    | programname startswith 'acc' log to @1.1.1.1
>     \programname startswith bcd log to /bcd
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 2,7
>    --log to /var/log/mail.log
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 3-4,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>    | programname startswith 'acc' log to @1.1.1.1
>     \programname startswith bcd log to /bcd
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 3-4,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 5-23,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>     \programname startswith 'acc' log to @1.1.1.1
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 5-23,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 
> This looks very ugly and is definantly more complicated, but I think it
> still ends up being a slight (but not drastic) win.
> 
> for the initial ruleset you have to check a minimum of 4 tests and a
> maximum of 13 tests (debug messages have 4-5 tests, all others have 12-
> 13)
> 
> Assuming the facility/priority lookup can be a table lookup, for the
> tree-based ruleset you have a minimum of 2 tests and a maximum of 7
> tests,
> in each case one of the tests is a multi-option comparison which is
> slightly more expensive than a single 'startswith' test, but should be
> cheaper than two 'startswith' tests.
> 
> This definantly trades off start time complexity for runtime
> efficiancy.
> 
> David Lang
> 
> 
> and here is the long, ugly process I went through to derive this.
> 
> 
> 
> first normalize the tests (replace 'or' with two lines, change facility
> and priority to ranges of numbers)
> 
> facility = 2 log to /Var/log/mail.log
> programname startswith abc log to /abc
> programname startswith 'acc' and MSG contains 'error' log to /acc
> severity 0..6 and programname startswith 'acc' log to @1.1.1.1
> severity 0..6 and programname startswith bcd log to /bcd
> facility 2..4 and severity 0..6 and programname contains 'bde' log to
> @1.1.1.1
> severity 0..6 and MSG startswith '123' log to/bcd
> severity 0..6 and MSG contains 'error2' log to /bcd
> severity 0..6 and log to /var/log/catchall
> 
> I then add the full facility/priority ranges
> 
> 
> now normalize the facility/severity tests (split overlaps)
> 
> we have the following ranges for facility
> 0-1
> 2
> 3-4
> 5-23
> 
> and for severity
> 0-6
> 7
> 
> to shorten the lines in the e-mail I will letter the remaining
> tests/actions
> 
> A) log to /Var/log/mail.log
> B) programname startswith abc log to /abc
> C) programname startswith 'acc' and MSG contains 'error' log to /acc
> D) programname startswith 'acc' log to @1.1.1.1
> E) programname startswith bcd log to /bcd
> F) programname contains 'bde' log to @1.1.1.1
> G) MSG startswith '123' log to/bcd
> H) MSG contains 'error2' log to /bcd
> I) log to /var/log/catchall
> 
> so the tree then becomes (straight translation no reordering)
> 
> 2,0-6 A
> 2,7 A
> 0-1,0-6 B
> 0-1,7 B
> 2,0-6 B
> 2,7 B
> 3-4,0-6 B
> 3-4,7 B
> 5-23,0-6 B
> 5-23,7 B
> 0-1,0-6 C
> 0-1,7 C
> 2,0-6 C
> 2,7 C
> 3-4,0-6 C
> 3-4,7 C
> 5-23,0-6 C
> 5-23,7 C
> 0-1,0-6 D
> 2,0-6 D
> 3-4,0-6 D
> 5-23,0-6 D
> 2,0-6 E
> 3-4,0-6 E
> 0-1,0-6 F
> 2,0-6 F
> 3-4,0-6 F
> 5-23,0-6 F
> 0-1,0-6 G
> 2,0-6 G
> 3-4,0-6 G
> 5-23,0-6 G
> 0-1,0-6 H
> 2,0-6 H
> 3-4,0-6 H
> 5-23,0-6 H
> 0-1,0-6 I
> 2,0-6 I
> 3-4,0-6 I
> 5-23,0-6 I
> 
> now to order them
> 
> 0-1,0-6 B
> 0-1,0-6 C
> 0-1,0-6 D
> 0-1,0-6 F
> 0-1,0-6 G
> 0-1,0-6 H
> 0-1,0-6 I
> 0-1,7 B
> 0-1,7 C
> 2,0-6 A
> 2,0-6 B
> 2,0-6 I
> 2,0-6 H
> 2,0-6 G
> 2,0-6 F
> 2,0-6 E
> 2,0-6 D
> 2,0-6 C
> 2,7 A
> 2,7 B
> 2,7 C
> 3-4,0-6 B
> 3-4,0-6 C
> 3-4,0-6 D
> 3-4,0-6 E
> 3-4,0-6 F
> 3-4,0-6 G
> 3-4,0-6 H
> 3-4,0-6 I
> 3-4,7 B
> 3-4,7 C
> 5-23,0-6 B
> 5-23,0-6 C
> 5-23,0-6 D
> 5-23,0-6 F
> 5-23,0-6 G
> 5-23,0-6 H
> 5-23,0-6 I
> 5-23,7 B
> 5-23,7 C
> 
> I then simplify this to
> 
> 0-1,0-6 B,C,D,F,G,H,I
> 0-1,7 B,C
> 2,0-6 A,B,C,D,E,F,G,H,I
> 2,7 A,B,C
> 3-4,0-6 B,C,D,E,F,G,H,I
> 3-4,7 B,C
> 5-23,0-6 B,C,D,F,G,H,I
> 5-23,7 B,C
> 
> looking at the lettered tests
> 
> A) log to /Var/log/mail.log
> B) programname startswith abc log to /abc
> C) programname startswith 'acc' and MSG contains 'error' log to /acc
> D) programname startswith 'acc' log to @1.1.1.1
> E) programname startswith bcd log to /bcd
> F) programname contains 'bde' log to @1.1.1.1
> G) MSG startswith '123' log to/bcd
> H) MSG contains 'error2' log to /bcd
> I) log to /var/log/catchall
> 
> the only ones that can be combined are B-E
> 
> so if you make tests
> 
> J) B,C
> K) B,C,D
> L) B,C,D,E
> 
> the results simplify to
> 
> 0-1,0-6 K,F,G,H,I
> 0-1,7 J
> 2,0-6 A,L,F,G,H,I
> 2,7 A,J
> 3-4,0-6 L,F,G,H,I
> 3-4,7 J
> 5-23,0-6 K,F,G,H,I
> 5-23,7 J
> 
> expanded out to show all tests
> 
> 0-1,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>     \programname startswith 'acc' log to @1.1.1.1
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 0-1,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 2,0-6
>    --log to /var/log/mail.log
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>    | programname startswith 'acc' log to @1.1.1.1
>     \programname startswith bcd log to /bcd
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 2,7
>    --log to /var/log/mail.log
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 3-4,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>    | programname startswith 'acc' log to @1.1.1.1
>     \programname startswith bcd log to /bcd
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 3-4,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 5-23,0-6
>    --programname startswith abc log to /abc
>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>     \programname startswith 'acc' log to @1.1.1.1
>    --programname contains 'bde' log to @1.1.1.1
>    --MSG startswith '123' log to/bcd
>    --MSG contains 'error2' log to /bcd
>    --log to /var/log/catchall
> 5-23,7
>    --programname startswith abc log to /abc
>     \programname startswith 'acc' and MSG contains 'error' log to /acc
> 
> > Rainer
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of [email protected]
> >> Sent: Sunday, February 28, 2010 2:02 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] Feedback requested: inconsistency in config
> >> statements
> >>
> >> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
> >>
> >>> Quickly from the mobile: pri based filters require *excatly* one
> >> single word (!) lookup currently. You can't beat that.
> >>>
> >>> I am not totally sure how you think about parse trees. In my view,
> >> parsin and filteing are two different things./stages. Maybe we have
> >> different concepts on our minds?
> >>
> >> I am probably using the wrong terminology here.
> >>
> >> say you have a set of rules that say
> >>
> >> if programname startswith abc log to /abc
> >> if programname startswith acc log to /acc
> >> if programname startswith acc log to @1.1.1.1
> >> if programname startswith bcd log to /bcd
> >>
> >> and assuming that these were the only possible programnames for
> >> simplicity
> >> in the explination.
> >>
> >> the filtering logic would go somthing like this
> >>
> >> the rules would compile into a tree
> >>
> >> -a-b -> /abc
> >>    -c -> /acc,@1.1.1.1
> >> -b   -> /bcd
> >>
> >> receive programname abc
> >> I have no facility/Pri filter rules
> >> I have no time filter rules
> >> I have no system filter rules
> >> I have programname filter rules
> >>    look at the first character of the programname
> >>     it's "a", I have more than one rule that fits that.
> >>      look at the second character of the programname
> >>       it's a "b", There are no other decisions to make,
> >>         invoke the action /abc
> >>
> >> receive progranmane bcd
> >> I have no facility/PRI filter rules
> >> I have no time filter rules
> >> I have no system filter rules
> >> I have programname filter rules
> >>    look at the first character of the programname
> >>     it's a "b", There are no other decisions to make,
> >>       invoke the action /bcd
> >>
> >> receive programname acc
> >> I have no facility/Pri filter rules
> >> I have no time filter rules
> >> I have no system filter rules
> >> I have programname filter rules
> >>    look at the first character of the programname
> >>     it's "a", I have more than one rule that fits that.
> >>      look at the second character of the programname
> >>       it's a "c", There are no other decisions to make,
> >>         invoke the action /acc and the action @1.1.1.1
> >>
> >> similarly the facility/pri filtering rules would be either compiled
> >> into a
> >> tree, or (since it's 256 entries total) into a lookup table with
> >> pointers
> >> to the list of actions to take for that entry
> >>
> >> the idea is to spend the time at startup to create the tree that
> >> represents the ruleset in order to speed up the processing of each
> >> individual message.
> >>
> >> the real tree would be a bit more complicated than I describe here
> as
> >> it
> >> needs to have 'anything else' entries between the known entries
> (which
> >> means that it would not be able to shortcut quite as much as I
> >> describe),
> >> and have provision for 'do a more complicated thing (like regex)
> here
> >> and
> >> if it matches continue'. but except for regex matches, this would
> make
> >> processing the rules pretty much independant of how many rules there
> >> were
> >> or how complex the ruleset is.
> >>
> >> there doe snot need to be one single programname filter tree, if you
> >> had a
> >> rule that said
> >> if pri = info and programname startswith def then def
> >>
> >> the pri tree/table would have an entry for info that would point to
> a
> >> filter tree for programname that just had the check for def in it
> >>
> >> am I makeing any more sense now?
> >>
> >> David Lang
> >>
> >>> rainer
> >>>
> >>> ----- Urspr?ngliche Nachricht -----
> >>> Von: "[email protected]" <[email protected]>
> >>> An: "rsyslog-users" <[email protected]>
> >>> Gesendet: 28.02.10 11:28
> >>> Betreff: Re: [rsyslog] Feedback requested: inconsistency in config
> >> statements
> >>>
> >>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
> >>>
> >>>>>
> >>>>> if the type is depriciated then the destination would be another
> >>>>> config_option (which you would then look up)
> >>>>>
> >>>>> if the type is 'ignore' then it would just be skipped over
> >> (possibly
> >>>>> with
> >>>>> a warning being logged that the config line is being ignored)
> >>>>>
> >>>>> this would also clean up some of the current cases where a
> boolean
> >>>>> option
> >>>>> doesn't honor on/off and true/false.
> >>>>
> >>>> True/false is NOT a valid boolean option. See
> >>>>
> >>>>
> >>
> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/cfsysline.c;h=18
> >> 4c0d87
> >>>> 47c5dcbbd8230782ca096e477738efa9;hb=HEAD#l308
> >>>
> >>> I was not remembering correctly then, maby it was yes/no vs on/off.
> I
> >> know
> >>> I ran into something where what was documented didn't work and I
> >> changed
> >>> it to another version of a boolean answer and it fixed the problem.
> >>>
> >>>>> For the second half of the config language (telling rsyslog what
> to
> >> do
> >>>>> with the logs it has received) also has several variations in
> place
> >> and
> >>>>> is
> >>>>> showing issues.
> >>>>>
> >>>>> However I think that the right answer here is to end up
> >> implementing
> >>>>> something like the parsing trees and then compile the other
> config
> >>>>> language options into that tree to be consistant (and fast) under
> >> the
> >>>>> covers, no matter which way the instructions are written (except
> >> when
> >>>>> you
> >>>>> have to use regex matches)
> >>>>
> >>>> I don't fully agree here with you. For example, the traditional
> PRI
> >> based
> >>>> filter is lightning fast. I don't see any way it could be nearly
> as
> >> fast with
> >>>> any unified approach. Right now, we have a "unified" filter
> >> structure, but it
> >>>> contains three filter cases, each one adding potential power at
> the
> >> price of
> >>>> decreased speed. I think we need to keep different types of
> filters
> >> in order
> >>>> to have some lightning-fast ones. But more of this could be done
> >> under the
> >>>> hood.
> >>>
> >>> My guess/expectation is that the tree-based parsing would be about
> >> the
> >>> same speed as the traditional PRI based filter for choices made
> based
> >> on
> >>> PRI, slowing down for other types of conditions only in that more
> of
> >> the
> >>> message may need to be scanned (and if there is no selection at a
> >> level,
> >>> skipping that level should be lightning fast). As a result, a set
> of
> >>> filteres based soley on programname would be as fast as the current
> >> PRI
> >>> filters.
> >>>
> >>> David Lang
> >>> _______________________________________________
> >>> rsyslog mailing list
> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>> http://www.rsyslog.com
> >>> _______________________________________________
> >>> rsyslog mailing list
> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>> http://www.rsyslog.com
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to