On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards <[email protected]> wrote:
> 2015-03-12 16:41 GMT+01:00 David Lang <[email protected]>:
>
>> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>>
>> 2015-03-12 5:55 GMT+01:00 singh.janmejay <[email protected]>:
>>>
>>> On Thu, Mar 12, 2015 at 9:19 AM, David Lang <[email protected]> wrote:
>>>>
>>>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
>>>>>
>>>>> Tried re-ordering it? Put the one with /port first?
>>>>>>
>>>>>
>>>>> no, lognorm rules are not supposed to be order dependent, so I didn't
>>>>> try that (especially after finding things failing to parse with
>>>>> rsyslog that worked manually)
>>>>>
>>>>
>>>> In case of input strings being matching-rule-wise disjoint, you are
>>>> right, order won't matter. But when they are not disjoint, order does
>>>> matter, because the first one to match the string wins.
>>>>
>>>> Consider this rulebase:
>>>> rule=:%ip:ipv4%%last:rest%
>>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>>>>
>>>> If you write it the way I have above, you'll end up matching the first
>>>> rule (the one with %last:rest%) for input 10.20.30.40/5
>>>>
>>>> But if you write it this way:
>>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>>>> rule=:%ip:ipv4%%last:rest%
>>>>
>>>> You'll again end up matching the first one (now the one with /port).
>>>>
>>>
>>> This shouldn't happen. The theory is:
>>>
>>> Let i be the current index to be looked at in the line. If for i a parser
>>> is selected, parsers shall be tried first (in theory, according to parser
>>> ordering, but I think this is not yet fully implemented). If a parser
>>> fits, processing is advanced to the next tree node.
>>>
>>> If the node at i does not have a parser (or all parsers failed, I think
>>> [but not sure]), advance to the next node based on character match.

This is precisely what it does.

>>>
>>> The order of appearance of rules inside the rulebase should not affect
>>> this.

It doesn't for literal-subtree, but it does for field-subtree, because
field subtrees are inserted at the tail of the linked-list. This code
(https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394) adds
new subtrees at the end of the linked-list, which is what causes the
ordering-sensitive behaviour.

>>> If it does, it's either not yet implemented or a bug. This is also why I
>>> don't like the "rest" syntax - it always matches and thus terminates
>>> interpretation.
>>>
>>
>> I'll post a simple test case when I get into the office in a bit.
>>
>> In this particular case, it's failing to check other parsers when it hits
>> a failure and backs up.
>>
>> But there are other cases where multiple rules may match. stringto, rest,
>
>
> word, stringto are "last resort parsers", to be used only if anything else
> fails.
> rest IMHO should never be used, but I think I can propose something in the
> future that solves the need that comes with it (if there still is a need at
> that point).
>
>
>> iptables
>
>
> iptables is a different story, it's actually for a different type of logs -
> at least I think so now. I am unfortunately not prepared to discuss this
> right now, as I want to keep concentrated on the log structure analyzer. It
> doesn't help if I do a bit of everything without anything ever nearing
> completion ;)
>
>
>> are all things that can easily match a lot of data where other rules may
>> also match by having more specific listings. In such cases it should still
>> be deterministic which rule 'wins'. I can think of a few ways to define
>> this.
>>
>> 1. fewest parsers needed wins
>>
>> 2. most parsers needed wins

This is probably the closest simple approximation to best match. I was
thinking about this too.

>>
>> 3. ordering of parsers, where the 'greedier' ones are put last so they
>> only come into play if the more specific ones don't match.

We could assist it by setting relative weights etc. E.g. ipv4 gets weight
10, but rest gets only 1. Once we get the coefficients right, this can
probably be achieved (it's like a cost-based picker, run once the ptree has
been loaded, to sort all subtree lists by cost in one shot).

>>
>>
> That's the designed approach, and I am very sure it's the right one. As I
> said, it's at least not fully implemented.
>
> This also means we need many more specific parsers. I never get there,
> because of a) time shortage and b) lack of sufficient log samples. Where
> log samples is not a single line or two, but at least several thousands, so
> that I can evaluate false positives. While b) is still a very big problem
> to me, a) has been much relaxed thanks to the thesis work. Also, work on
> the semi-automatic rule creator looks promising. As it is a heuristic, the
> lack of log samples unfortunately is a very large stumbling block.
>
> Rainer
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
> sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T
> LIKE THAT.

--
Regards,
Janmejay
http://codehunk.wordpress.com
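
P.S. To make the weight idea from this thread a bit more concrete, here is a
rough toy sketch in Python. It is not liblognorm code and does not use its
API: the function names, the list-of-candidates model and the weight for
char-sep are all made up for illustration (only "ipv4 = 10" and "rest = 1"
come from the discussion). It models the field-subtree list that follows
%ip:ipv4% as a plain list tried front to back, shows that the insertion
order decides the winner, and then shows that a one-time sort by weight
makes the result independent of rule order.

```python
import re

# Hypothetical weights: ipv4=10 and rest=1 are from the thread,
# the char-sep value is an assumption picked for this sketch.
WEIGHTS = {"ipv4": 10, "char-sep": 5, "rest": 1}

def match_rest(s):
    """Toy stand-in for %last:rest%: always matches, takes everything."""
    return {"last": s}

def match_port(s):
    """Toy stand-in for %junk:char-sep:/%/%port:number% on the remainder."""
    m = re.fullmatch(r"([^/]*)/(\d+)", s)
    return {"junk": m.group(1), "port": m.group(2)} if m else None

def first_match(candidates, s):
    """Try candidates front to back; the first one that matches wins."""
    for _name, parse in candidates:
        result = parse(s)
        if result is not None:
            return result
    return None

def sort_by_weight(candidates):
    """One-shot reordering: higher weight (more specific) is tried first."""
    return sorted(candidates, key=lambda c: WEIGHTS[c[0]], reverse=True)

# The remainder of "10.20.30.40/5" after the ipv4 field has been consumed.
remainder = "/5"

# Same two rules, appended in the two different rulebase orders.
rest_first = [("rest", match_rest), ("char-sep", match_port)]
port_first = [("char-sep", match_port), ("rest", match_rest)]

print(first_match(rest_first, remainder))                  # rest wins
print(first_match(port_first, remainder))                  # /port rule wins
print(first_match(sort_by_weight(rest_first), remainder))  # /port rule wins
print(first_match(sort_by_weight(port_first), remainder))  # /port rule wins
```

Whether the real char-sep parser accepts an empty field the way this toy
regex does is something I have not checked; the point is only the ordering.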

