On Mon, 1 Mar 2010, Rainer Gerhards wrote:

> Hi David,
>
> thanks for the explanation, I now see where you are heading too. I have not
> dug into all details, but find the post very interesting and insightful. I'll
> archive it so that I have it at hand when I can take the route to actually
> implement such an approach. All in all, it still looks rather complex to me.
> I guess it would be useful to have a guy with experience in writing compiler
> code optimizers at hand when I tackle that beast ;)

you may also want to dig into yacc/bison as they are designed to help 
people build parser trees.

I am assuming that this is the type of thing that syslog-ng is doing to be 
able to efficiantly handle a lot of log processing.

I currently have a script that looks for a couple hundred programnames and 
puts the data into ~50 different files depending on what that name is. 
Given that the output side of rsyslog is currently the bottleneck, I would 
not want to try and implement that in rsyslog today, but if rsyslog gained 
the ability to handle this sort of ruleset efficiantly it would greatly 
simplify my life.

David Lang

> Rainer
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>> Sent: Monday, March 01, 2010 3:16 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] Feedback requested: inconsistency in config
>> statements
>>
>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
>>
>>> Hi David,
>>>
>>> yes, that makes much more sense to me. Let's call that filter
>> expression tree
>>> for the time being (the parse tree used for normalization has some
>> similar
>>> ideas, but a totally different scope and semantics).
>>>
>>> While I now see where you are heading, I fail to see how this could
>>> sufficiently easy be implemented for a real (and complex)
>> configuration.
>>>
>>> Let's look at this example conf:
>>>
>>> mail.* /Var/log/mail.log
>>> if programname startswith abc log to /abc
>>> if programname startswith 'acc' and MSG contains 'error' log to /acc
>>> # we don't want debug messages from here on
>>> *.debug ~
>>> if programname startswith 'acc' or (programname contains 'bde' and
>> (facility
>>>> 1 and
>>> facility < 5)) log to @1.1.1.1
>>> if programname startswith bcd or MSG startswith '123' or MSG contains
>>> 'error2' log to /bcd
>>> *.* /var/log/catchall
>>>
>>> I simply cannot envision how you would store this as a tree that does
>> not
>>> look like what we have today.
>>
>> Ok, Here is what I came up with.
>>
>> 0-1,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>     \programname startswith 'acc' log to @1.1.1.1
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 0-1,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 2,0-6
>>    --log to /var/log/mail.log
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>    | programname startswith 'acc' log to @1.1.1.1
>>     \programname startswith bcd log to /bcd
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 2,7
>>    --log to /var/log/mail.log
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 3-4,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>    | programname startswith 'acc' log to @1.1.1.1
>>     \programname startswith bcd log to /bcd
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 3-4,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 5-23,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>     \programname startswith 'acc' log to @1.1.1.1
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 5-23,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>>
>> This looks very ugly and is definantly more complicated, but I think it
>> still ends up being a slight (but not drastic) win.
>>
>> for the initial ruleset you have to check a minimum of 4 tests and a
>> maximum of 13 tests (debug messages have 4-5 tests, all others have 12-
>> 13)
>>
>> Assuming the facility/priority lookup can be a table lookup, for the
>> tree-based ruleset you have a minimum of 2 tests and a maximum of 7
>> tests,
>> in each case one of the tests is a multi-option comparison which is
>> slightly more expensive than a single 'startswith' test, but should be
>> cheaper than two 'startswith' tests.
>>
>> This definantly trades off start time complexity for runtime
>> efficiancy.
>>
>> David Lang
>>
>>
>> and here is the long, ugly process I went through to derive this.
>>
>>
>>
>> first normalize the tests (replace 'or' with two lines, change facility
>> and priority to ranges of numbers)
>>
>> facility = 2 log to /Var/log/mail.log
>> programname startswith abc log to /abc
>> programname startswith 'acc' and MSG contains 'error' log to /acc
>> severity 0..6 and programname startswith 'acc' log to @1.1.1.1
>> severity 0..6 and programname startswith bcd log to /bcd
>> facility 2..4 and severity 0..6 and programname contains 'bde' log to
>> @1.1.1.1
>> severity 0..6 and MSG startswith '123' log to/bcd
>> severity 0..6 and MSG contains 'error2' log to /bcd
>> severity 0..6 and log to /var/log/catchall
>>
>> I then add the full facility/priority ranges
>>
>>
>> now normalize the facility/severity tests (split overlaps)
>>
>> we have the following ranges for facility
>> 0-1
>> 2
>> 3-4
>> 5-23
>>
>> and for severity
>> 0-6
>> 7
>>
>> to shorten the lines in the e-mail I will letter the remaining
>> tests/actions
>>
>> A) log to /Var/log/mail.log
>> B) programname startswith abc log to /abc
>> C) programname startswith 'acc' and MSG contains 'error' log to /acc
>> D) programname startswith 'acc' log to @1.1.1.1
>> E) programname startswith bcd log to /bcd
>> F) programname contains 'bde' log to @1.1.1.1
>> G) MSG startswith '123' log to/bcd
>> H) MSG contains 'error2' log to /bcd
>> I) log to /var/log/catchall
>>
>> so the tree then becomes (straight translation no reordering)
>>
>> 2,0-6 A
>> 2,7 A
>> 0-1,0-6 B
>> 0-1,7 B
>> 2,0-6 B
>> 2,7 B
>> 3-4,0-6 B
>> 3-4,7 B
>> 5-23,0-6 B
>> 5-23,7 B
>> 0-1,0-6 C
>> 0-1,7 C
>> 2,0-6 C
>> 2,7 C
>> 3-4,0-6 C
>> 3-4,7 C
>> 5-23,0-6 C
>> 5-23,7 C
>> 0-1,0-6 D
>> 2,0-6 D
>> 3-4,0-6 D
>> 5-23,0-6 D
>> 2,0-6 E
>> 3-4,0-6 E
>> 0-1,0-6 F
>> 2,0-6 F
>> 3-4,0-6 F
>> 5-23,0-6 F
>> 0-1,0-6 G
>> 2,0-6 G
>> 3-4,0-6 G
>> 5-23,0-6 G
>> 0-1,0-6 H
>> 2,0-6 H
>> 3-4,0-6 H
>> 5-23,0-6 H
>> 0-1,0-6 I
>> 2,0-6 I
>> 3-4,0-6 I
>> 5-23,0-6 I
>>
>> now to order them
>>
>> 0-1,0-6 B
>> 0-1,0-6 C
>> 0-1,0-6 D
>> 0-1,0-6 F
>> 0-1,0-6 G
>> 0-1,0-6 H
>> 0-1,0-6 I
>> 0-1,7 B
>> 0-1,7 C
>> 2,0-6 A
>> 2,0-6 B
>> 2,0-6 I
>> 2,0-6 H
>> 2,0-6 G
>> 2,0-6 F
>> 2,0-6 E
>> 2,0-6 D
>> 2,0-6 C
>> 2,7 A
>> 2,7 B
>> 2,7 C
>> 3-4,0-6 B
>> 3-4,0-6 C
>> 3-4,0-6 D
>> 3-4,0-6 E
>> 3-4,0-6 F
>> 3-4,0-6 G
>> 3-4,0-6 H
>> 3-4,0-6 I
>> 3-4,7 B
>> 3-4,7 C
>> 5-23,0-6 B
>> 5-23,0-6 C
>> 5-23,0-6 D
>> 5-23,0-6 F
>> 5-23,0-6 G
>> 5-23,0-6 H
>> 5-23,0-6 I
>> 5-23,7 B
>> 5-23,7 C
>>
>> I then simplify this to
>>
>> 0-1,0-6 B,C,D,F,G,H,I
>> 0-1,7 B,C
>> 2,0-6 A,B,C,D,E,F,G,H,I
>> 2,7 A,B,C
>> 3-4,0-6 B,C,D,E,F,G,H,I
>> 3-4,7 B,C
>> 5-23,0-6 B,C,D,F,G,H,I
>> 5-23,7 B,C
>>
>> looking at the lettered tests
>>
>> A) log to /Var/log/mail.log
>> B) programname startswith abc log to /abc
>> C) programname startswith 'acc' and MSG contains 'error' log to /acc
>> D) programname startswith 'acc' log to @1.1.1.1
>> E) programname startswith bcd log to /bcd
>> F) programname contains 'bde' log to @1.1.1.1
>> G) MSG startswith '123' log to/bcd
>> H) MSG contains 'error2' log to /bcd
>> I) log to /var/log/catchall
>>
>> the only ones that can be combined are B-E
>>
>> so if you make tests
>>
>> J) B,C
>> K) B,C,D
>> L) B,C,D,E
>>
>> the results simplify to
>>
>> 0-1,0-6 K,F,G,H,I
>> 0-1,7 J
>> 2,0-6 A,L,F,G,H,I
>> 2,7 A,J
>> 3-4,0-6 L,F,G,H,I
>> 3-4,7 J
>> 5-23,0-6 K,F,G,H,I
>> 5-23,7 J
>>
>> expanded out to show all tests
>>
>> 0-1,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>     \programname startswith 'acc' log to @1.1.1.1
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 0-1,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 2,0-6
>>    --log to /var/log/mail.log
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>    | programname startswith 'acc' log to @1.1.1.1
>>     \programname startswith bcd log to /bcd
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 2,7
>>    --log to /var/log/mail.log
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 3-4,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>    | programname startswith 'acc' log to @1.1.1.1
>>     \programname startswith bcd log to /bcd
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 3-4,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>> 5-23,0-6
>>    --programname startswith abc log to /abc
>>    | programname startswith 'acc' and MSG contains 'error' log to /acc
>>     \programname startswith 'acc' log to @1.1.1.1
>>    --programname contains 'bde' log to @1.1.1.1
>>    --MSG startswith '123' log to/bcd
>>    --MSG contains 'error2' log to /bcd
>>    --log to /var/log/catchall
>> 5-23,7
>>    --programname startswith abc log to /abc
>>     \programname startswith 'acc' and MSG contains 'error' log to /acc
>>
>>> Rainer
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:rsyslog-
>>>> [email protected]] On Behalf Of [email protected]
>>>> Sent: Sunday, February 28, 2010 2:02 PM
>>>> To: rsyslog-users
>>>> Subject: Re: [rsyslog] Feedback requested: inconsistency in config
>>>> statements
>>>>
>>>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
>>>>
>>>>> Quickly from the mobile: pri based filters require *excatly* one
>>>> single word (!) lookup currently. You can't beat that.
>>>>>
>>>>> I am not totally sure how you think about parse trees. In my view,
>>>> parsin and filteing are two different things./stages. Maybe we have
>>>> different concepts on our minds?
>>>>
>>>> I am probably using the wrong terminology here.
>>>>
>>>> say you have a set of rules that say
>>>>
>>>> if programname startswith abc log to /abc
>>>> if programname startswith acc log to /acc
>>>> if programname startswith acc log to @1.1.1.1
>>>> if programname startswith bcd log to /bcd
>>>>
>>>> and assuming that these were the only possible programnames for
>>>> simplicity
>>>> in the explination.
>>>>
>>>> the filtering logic would go somthing like this
>>>>
>>>> the rules would compile into a tree
>>>>
>>>> -a-b -> /abc
>>>>    -c -> /acc,@1.1.1.1
>>>> -b   -> /bcd
>>>>
>>>> receive programname abc
>>>> I have no facility/Pri filter rules
>>>> I have no time filter rules
>>>> I have no system filter rules
>>>> I have programname filter rules
>>>>    look at the first character of the programname
>>>>     it's "a", I have more than one rule that fits that.
>>>>      look at the second character of the programname
>>>>       it's a "b", There are no other decisions to make,
>>>>         invoke the action /abc
>>>>
>>>> receive progranmane bcd
>>>> I have no facility/PRI filter rules
>>>> I have no time filter rules
>>>> I have no system filter rules
>>>> I have programname filter rules
>>>>    look at the first character of the programname
>>>>     it's a "b", There are no other decisions to make,
>>>>       invoke the action /bcd
>>>>
>>>> receive programname acc
>>>> I have no facility/Pri filter rules
>>>> I have no time filter rules
>>>> I have no system filter rules
>>>> I have programname filter rules
>>>>    look at the first character of the programname
>>>>     it's "a", I have more than one rule that fits that.
>>>>      look at the second character of the programname
>>>>       it's a "c", There are no other decisions to make,
>>>>         invoke the action /acc and the action @1.1.1.1
>>>>
>>>> similarly the facility/pri filtering rules would be either compiled
>>>> into a
>>>> tree, or (since it's 256 entries total) into a lookup table with
>>>> pointers
>>>> to the list of actions to take for that entry
>>>>
>>>> the idea is to spend the time at startup to create the tree that
>>>> represents the ruleset in order to speed up the processing of each
>>>> individual message.
>>>>
>>>> the real tree would be a bit more complicated than I describe here
>> as
>>>> it
>>>> needs to have 'anything else' entries between the known entries
>> (which
>>>> means that it would not be able to shortcut quite as much as I
>>>> describe),
>>>> and have provision for 'do a more complicated thing (like regex)
>> here
>>>> and
>>>> if it matches continue'. but except for regex matches, this would
>> make
>>>> processing the rules pretty much independant of how many rules there
>>>> were
>>>> or how complex the ruleset is.
>>>>
>>>> there doe snot need to be one single programname filter tree, if you
>>>> had a
>>>> rule that said
>>>> if pri = info and programname startswith def then def
>>>>
>>>> the pri tree/table would have an entry for info that would point to
>> a
>>>> filter tree for programname that just had the check for def in it
>>>>
>>>> am I makeing any more sense now?
>>>>
>>>> David Lang
>>>>
>>>>> rainer
>>>>>
>>>>> ----- Urspr?ngliche Nachricht -----
>>>>> Von: "[email protected]" <[email protected]>
>>>>> An: "rsyslog-users" <[email protected]>
>>>>> Gesendet: 28.02.10 11:28
>>>>> Betreff: Re: [rsyslog] Feedback requested: inconsistency in config
>>>> statements
>>>>>
>>>>> On Sun, 28 Feb 2010, Rainer Gerhards wrote:
>>>>>
>>>>>>>
>>>>>>> if the type is depriciated then the destination would be another
>>>>>>> config_option (which you would then look up)
>>>>>>>
>>>>>>> if the type is 'ignore' then it would just be skipped over
>>>> (possibly
>>>>>>> with
>>>>>>> a warning being logged that the config line is being ignored)
>>>>>>>
>>>>>>> this would also clean up some of the current cases where a
>> boolean
>>>>>>> option
>>>>>>> doesn't honor on/off and true/false.
>>>>>>
>>>>>> True/false is NOT a valid boolean option. See
>>>>>>
>>>>>>
>>>>
>> http://git.adiscon.com/?p=rsyslog.git;a=blob;f=runtime/cfsysline.c;h=18
>>>> 4c0d87
>>>>>> 47c5dcbbd8230782ca096e477738efa9;hb=HEAD#l308
>>>>>
>>>>> I was not remembering correctly then, maby it was yes/no vs on/off.
>> I
>>>> know
>>>>> I ran into something where what was documented didn't work and I
>>>> changed
>>>>> it to another version of a boolean answer and it fixed the problem.
>>>>>
>>>>>>> For the second half of the config language (telling rsyslog what
>> to
>>>> do
>>>>>>> with the logs it has received) also has several variations in
>> place
>>>> and
>>>>>>> is
>>>>>>> showing issues.
>>>>>>>
>>>>>>> However I think that the right answer here is to end up
>>>> implementing
>>>>>>> something like the parsing trees and then compile the other
>> config
>>>>>>> language options into that tree to be consistant (and fast) under
>>>> the
>>>>>>> covers, no matter which way the instructions are written (except
>>>> when
>>>>>>> you
>>>>>>> have to use regex matches)
>>>>>>
>>>>>> I don't fully agree here with you. For example, the traditional
>> PRI
>>>> based
>>>>>> filter is lightning fast. I don't see any way it could be nearly
>> as
>>>> fast with
>>>>>> any unified approach. Right now, we have a "unified" filter
>>>> structure, but it
>>>>>> contains three filter cases, each one adding potential power at
>> the
>>>> price of
>>>>>> decreased speed. I think we need to keep different types of
>> filters
>>>> in order
>>>>>> to have some lightning-fast ones. But more of this could be done
>>>> under the
>>>>>> hood.
>>>>>
>>>>> My guess/expectation is that the tree-based parsing would be about
>>>> the
>>>>> same speed as the traditional PRI based filter for choices made
>> based
>>>> on
>>>>> PRI, slowing down for other types of conditions only in that more
>> of
>>>> the
>>>>> message may need to be scanned (and if there is no selection at a
>>>> level,
>>>>> skipping that level should be lightning fast). As a result, a set
>> of
>>>>> filteres based soley on programname would be as fast as the current
>>>> PRI
>>>>> filters.
>>>>>
>>>>> David Lang
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to