On Sun, 28 Feb 2010, Rainer Gerhards wrote:
> Hi David,
>
> yes, that makes much more sense to me. Let's call that filter expression tree
> for the time being (the parse tree used for normalization has some similar
> ideas, but a totally different scope and semantics).
>
> While I now see where you are heading, I fail to see how this could
> sufficiently easy be implemented for a real (and complex) configuration.
>
> Let's look at this example conf:
>
> mail.* /Var/log/mail.log
> if programname startswith abc log to /abc
> if programname startswith 'acc' and MSG contains 'error' log to /acc
> # we don't want debug messages from here on
> *.debug ~
> if programname startswith 'acc' or (programname contains 'bde' and (facility
>> 1 and
> facility < 5)) log to @1.1.1.1
> if programname startswith bcd or MSG startswith '123' or MSG contains
> 'error2' log to /bcd
> *.* /var/log/catchall
>
> I simply cannot envision how you would store this as a tree that does not
> look like what we have today.
Ok, Here is what I came up with.
0-1,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
\programname startswith 'acc' log to @1.1.1.1
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
0-1,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
2,0-6
--log to /var/log/mail.log
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
| programname startswith 'acc' log to @1.1.1.1
\programname startswith bcd log to /bcd
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
2,7
--log to /var/log/mail.log
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
3-4,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
| programname startswith 'acc' log to @1.1.1.1
\programname startswith bcd log to /bcd
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
3-4,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
5-23,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
\programname startswith 'acc' log to @1.1.1.1
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
5-23,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
This looks very ugly and is definantly more complicated, but I think it
still ends up being a slight (but not drastic) win.
for the initial ruleset you have to check a minimum of 4 tests and a
maximum of 13 tests (debug messages have 4-5 tests, all others have 12-13)
Assuming the facility/priority lookup can be a table lookup, for the
tree-based ruleset you have a minimum of 2 tests and a maximum of 7 tests,
in each case one of the tests is a multi-option comparison which is
slightly more expensive than a single 'startswith' test, but should be
cheaper than two 'startswith' tests.
This definantly trades off start time complexity for runtime efficiancy.
David Lang
and here is the long, ugly process I went through to derive this.
first normalize the tests (replace 'or' with two lines, change facility
and priority to ranges of numbers)
facility = 2 log to /Var/log/mail.log
programname startswith abc log to /abc
programname startswith 'acc' and MSG contains 'error' log to /acc
severity 0..6 and programname startswith 'acc' log to @1.1.1.1
severity 0..6 and programname startswith bcd log to /bcd
facility 2..4 and severity 0..6 and programname contains 'bde' log to @1.1.1.1
severity 0..6 and MSG startswith '123' log to/bcd
severity 0..6 and MSG contains 'error2' log to /bcd
severity 0..6 and log to /var/log/catchall
I then add the full facility/priority ranges
now normalize the facility/severity tests (split overlaps)
we have the following ranges for facility
0-1
2
3-4
5-23
and for severity
0-6
7
to shorten the lines in the e-mail I will letter the remaining
tests/actions
A) log to /Var/log/mail.log
B) programname startswith abc log to /abc
C) programname startswith 'acc' and MSG contains 'error' log to /acc
D) programname startswith 'acc' log to @1.1.1.1
E) programname startswith bcd log to /bcd
F) programname contains 'bde' log to @1.1.1.1
G) MSG startswith '123' log to/bcd
H) MSG contains 'error2' log to /bcd
I) log to /var/log/catchall
so the tree then becomes (straight translation no reordering)
2,0-6 A
2,7 A
0-1,0-6 B
0-1,7 B
2,0-6 B
2,7 B
3-4,0-6 B
3-4,7 B
5-23,0-6 B
5-23,7 B
0-1,0-6 C
0-1,7 C
2,0-6 C
2,7 C
3-4,0-6 C
3-4,7 C
5-23,0-6 C
5-23,7 C
0-1,0-6 D
2,0-6 D
3-4,0-6 D
5-23,0-6 D
2,0-6 E
3-4,0-6 E
0-1,0-6 F
2,0-6 F
3-4,0-6 F
5-23,0-6 F
0-1,0-6 G
2,0-6 G
3-4,0-6 G
5-23,0-6 G
0-1,0-6 H
2,0-6 H
3-4,0-6 H
5-23,0-6 H
0-1,0-6 I
2,0-6 I
3-4,0-6 I
5-23,0-6 I
now to order them
0-1,0-6 B
0-1,0-6 C
0-1,0-6 D
0-1,0-6 F
0-1,0-6 G
0-1,0-6 H
0-1,0-6 I
0-1,7 B
0-1,7 C
2,0-6 A
2,0-6 B
2,0-6 I
2,0-6 H
2,0-6 G
2,0-6 F
2,0-6 E
2,0-6 D
2,0-6 C
2,7 A
2,7 B
2,7 C
3-4,0-6 B
3-4,0-6 C
3-4,0-6 D
3-4,0-6 E
3-4,0-6 F
3-4,0-6 G
3-4,0-6 H
3-4,0-6 I
3-4,7 B
3-4,7 C
5-23,0-6 B
5-23,0-6 C
5-23,0-6 D
5-23,0-6 F
5-23,0-6 G
5-23,0-6 H
5-23,0-6 I
5-23,7 B
5-23,7 C
I then simplify this to
0-1,0-6 B,C,D,F,G,H,I
0-1,7 B,C
2,0-6 A,B,C,D,E,F,G,H,I
2,7 A,B,C
3-4,0-6 B,C,D,E,F,G,H,I
3-4,7 B,C
5-23,0-6 B,C,D,F,G,H,I
5-23,7 B,C
looking at the lettered tests
A) log to /Var/log/mail.log
B) programname startswith abc log to /abc
C) programname startswith 'acc' and MSG contains 'error' log to /acc
D) programname startswith 'acc' log to @1.1.1.1
E) programname startswith bcd log to /bcd
F) programname contains 'bde' log to @1.1.1.1
G) MSG startswith '123' log to/bcd
H) MSG contains 'error2' log to /bcd
I) log to /var/log/catchall
the only ones that can be combined are B-E
so if you make tests
J) B,C
K) B,C,D
L) B,C,D,E
the results simplify to
0-1,0-6 K,F,G,H,I
0-1,7 J
2,0-6 A,L,F,G,H,I
2,7 A,J
3-4,0-6 L,F,G,H,I
3-4,7 J
5-23,0-6 K,F,G,H,I
5-23,7 J
expanded out to show all tests
0-1,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
\programname startswith 'acc' log to @1.1.1.1
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
0-1,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
2,0-6
--log to /var/log/mail.log
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
| programname startswith 'acc' log to @1.1.1.1
\programname startswith bcd log to /bcd
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
2,7
--log to /var/log/mail.log
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
3-4,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
| programname startswith 'acc' log to @1.1.1.1
\programname startswith bcd log to /bcd
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
3-4,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
5-23,0-6
--programname startswith abc log to /abc
| programname startswith 'acc' and MSG contains 'error' log to /acc
\programname startswith 'acc' log to @1.1.1.1
--programname contains 'bde' log to @1.1.1.1
--MSG startswith '123' log to/bcd
--MSG contains 'error2' log to /bcd
--log to /var/log/catchall
5-23,7
--programname startswith abc log to /abc
\programname startswith 'acc' and MSG contains 'error' log to /acc
> Rainer
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>> Sent: Sunday, February 28, 2010 2:02 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Feedback requested: inconsistency in config
>> statements
>>
>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
>>
>>> Quickly from the mobile: pri based filters require *excatly* one
>> single word (!) lookup currently. You can't beat that.
>>>
>>> I am not totally sure how you think about parse trees. In my view,
>> parsin and filteing are two different things./stages. Maybe we have
>> different concepts on our minds?
>>
>> I am probably using the wrong terminology here.
>>
>> say you have a set of rules that say
>>
>> if programname startswith abc log to /abc
>> if programname startswith acc log to /acc
>> if programname startswith acc log to @1.1.1.1
>> if programname startswith bcd log to /bcd
>>
>> and assuming that these were the only possible programnames for
>> simplicity
>> in the explination.
>>
>> the filtering logic would go somthing like this
>>
>> the rules would compile into a tree
>>
>> -a-b -> /abc
>> -c -> /acc,@1.1.1.1
>> -b -> /bcd
>>
>> receive programname abc
>> I have no facility/Pri filter rules
>> I have no time filter rules
>> I have no system filter rules
>> I have programname filter rules
>> look at the first character of the programname
>> it's "a", I have more than one rule that fits that.
>> look at the second character of the programname
>> it's a "b", There are no other decisions to make,
>> invoke the action /abc
>>
>> receive progranmane bcd
>> I have no facility/PRI filter rules
>> I have no time filter rules
>> I have no system filter rules
>> I have programname filter rules
>> look at the first character of the programname
>> it's a "b", There are no other decisions to make,
>> invoke the action /bcd
>>
>> receive programname acc
>> I have no facility/Pri filter rules
>> I have no time filter rules
>> I have no system filter rules
>> I have programname filter rules
>> look at the first character of the programname
>> it's "a", I have more than one rule that fits that.
>> look at the second character of the programname
>> it's a "c", There are no other decisions to make,
>> invoke the action /acc and the action @1.1.1.1
>>
>> similarly the facility/pri filtering rules would be either compiled
>> into a
>> tree, or (since it's 256 entries total) into a lookup table with
>> pointers
>> to the list of actions to take for that entry
>>
>> the idea is to spend the time at startup to create the tree that
>> represents the ruleset in order to speed up the processing of each
>> individual message.
>>
>> the real tree would be a bit more complicated than I describe here as
>> it
>> needs to have 'anything else' entries between the known entries (which
>> means that it would not be able to shortcut quite as much as I
>> describe),
>> and have provision for 'do a more complicated thing (like regex) here
>> and
>> if it matches continue'. but except for regex matches, this would make
>> processing the rules pretty much independant of how many rules there
>> were
>> or how complex the ruleset is.
>>
>> there doe snot need to be one single programname filter tree, if you
>> had a
>> rule that said
>> if pri = info and programname startswith def then def
>>
>> the pri tree/table would have an entry for info that would point to a
>> filter tree for programname that just had the check for def in it
>>
>> am I makeing any more sense now?
>>
>> David Lang
>>
>>> rainer
>>>
>>> ----- Urspr?ngliche Nachricht -----
>>> Von: "[email protected]" <[email protected]>
>>> An: "rsyslog-users" <[email protected]>
>>> Gesendet: 28.02.10 11:28
>>> Betreff: Re: [rsyslog] Feedback requested: inconsistency in config
>> statements
>>>
>>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
>>>
>>>>>
>>>>> if the type is depriciated then the destination would be another
>>>>> config_option (which you would then look up)
>>>>>
>>>>> if the type is 'ignore' then it would just be skipped over
>> (possibly
>>>>> with
>>>>> a warning being logged that the config line is being ignored)
>>>>>
>>>>> this would also clean up some of the current cases where a boolean
>>>>> option
>>>>> doesn't honor on/off and true/false.
>>>>
>>>> True/false is NOT a valid boolean option. See
>>>>
>>>>
>> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/cfsysline.c;h=18
>> 4c0d87
>>>> 47c5dcbbd8230782ca096e477738efa9;hb=HEAD#l308
>>>
>>> I was not remembering correctly then, maby it was yes/no vs on/off. I
>> know
>>> I ran into something where what was documented didn't work and I
>> changed
>>> it to another version of a boolean answer and it fixed the problem.
>>>
>>>>> For the second half of the config language (telling rsyslog what to
>> do
>>>>> with the logs it has received) also has several variations in place
>> and
>>>>> is
>>>>> showing issues.
>>>>>
>>>>> However I think that the right answer here is to end up
>> implementing
>>>>> something like the parsing trees and then compile the other config
>>>>> language options into that tree to be consistant (and fast) under
>> the
>>>>> covers, no matter which way the instructions are written (except
>> when
>>>>> you
>>>>> have to use regex matches)
>>>>
>>>> I don't fully agree here with you. For example, the traditional PRI
>> based
>>>> filter is lightning fast. I don't see any way it could be nearly as
>> fast with
>>>> any unified approach. Right now, we have a "unified" filter
>> structure, but it
>>>> contains three filter cases, each one adding potential power at the
>> price of
>>>> decreased speed. I think we need to keep different types of filters
>> in order
>>>> to have some lightning-fast ones. But more of this could be done
>> under the
>>>> hood.
>>>
>>> My guess/expectation is that the tree-based parsing would be about
>> the
>>> same speed as the traditional PRI based filter for choices made based
>> on
>>> PRI, slowing down for other types of conditions only in that more of
>> the
>>> message may need to be scanned (and if there is no selection at a
>> level,
>>> skipping that level should be lightning fast). As a result, a set of
>>> filteres based soley on programname would be as fast as the current
>> PRI
>>> filters.
>>>
>>> David Lang
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com