I haven't seen the reordering code yet, but the loading does preserve order.
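To make the order sensitivity concrete, here is a toy model in Python. It is only a sketch and not liblognorm's actual code: the `Node` class, the simplified parsers, the backtracking in `match`, and every weight other than ipv4=10 and rest=1 (the two figures floated in the thread quoted below) are assumptions made up for illustration. Field subtrees are appended at the tail of a list and tried in that order, so whichever rule was loaded first wins. The optional weight sort shows how a cost-based picker could remove that dependence.

```python
import re


def _regex(pattern):
    """Build a toy parser that returns the matched length or None."""
    rx = re.compile(pattern)

    def parse(text):
        m = rx.match(text)
        return m.end() if m else None

    return parse


def _char_sep(sep):
    """Toy char-sep: consume up to (not including) sep, or to end of text."""
    def parse(text):
        i = text.find(sep)
        return i if i >= 0 else len(text)

    return parse


# parser key -> (parser, weight); weights for number and char-sep are invented
PARSERS = {
    "ipv4": (_regex(r"\d{1,3}(?:\.\d{1,3}){3}"), 10),
    "number": (_regex(r"\d+"), 8),
    "char-sep:/": (_char_sep("/"), 3),
    "rest": (lambda text: len(text), 1),  # always matches, eats everything
}


class Node:
    def __init__(self):
        self.fields = []    # [(parser key, child)] in insertion order
        self.literals = {}  # literal character -> child
        self.rule = None    # rule name if a rule ends at this node


def add_rule(root, name, steps):
    node = root
    for step in steps:
        if step in PARSERS:
            for key, child in node.fields:
                if key == step:
                    break
            else:
                child = Node()
                node.fields.append((step, child))  # tail insertion
        else:
            child = node.literals.setdefault(step, Node())
        node = child
    node.rule = name


def sort_by_weight(node):
    """Put higher-weight (more specific) parsers first, recursively."""
    node.fields.sort(key=lambda f: -PARSERS[f[0]][1])
    for _, child in node.fields:
        sort_by_weight(child)
    for child in node.literals.values():
        sort_by_weight(child)


def match(node, text):
    if not text and node.rule:
        return node.rule
    for key, child in node.fields:  # field parsers first, in list order
        n = PARSERS[key][0](text)
        if n is not None:
            hit = match(child, text[n:])
            if hit:
                return hit
    child = node.literals.get(text[:1])  # then literal character match
    return match(child, text[1:]) if child else None


def build(rules, weighted=False):
    root = Node()
    for name, steps in rules:
        add_rule(root, name, steps)
    if weighted:
        sort_by_weight(root)
    return root


GENERIC = ("generic", ["ipv4", "rest"])
SPECIFIC = ("specific", ["ipv4", "char-sep:/", "/", "number"])
line = "10.20.30.40/5"

print(match(build([GENERIC, SPECIFIC]), line))                 # generic
print(match(build([SPECIFIC, GENERIC]), line))                 # specific
print(match(build([GENERIC, SPECIFIC], weighted=True), line))  # specific
```

With the sort applied once after loading, both rule orders give the same answer for this input, which is the property the weight proposal is after.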
It is still deterministic; it's just that the criterion is rule order (and the fact that it applies only to field-subtrees makes it slightly odd).

On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards wrote:
> 2015-03-12 18:16 GMT+01:00 singh.janmejay:
>
>> On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards wrote:
>> > 2015-03-12 16:41 GMT+01:00 David Lang:
>> >
>> >> On Thu, 12 Mar 2015, Rainer Gerhards wrote:
>> >>
>> >> 2015-03-12 5:55 GMT+01:00 singh.janmejay:
>> >>>
>> >>> On Thu, Mar 12, 2015 at 9:19 AM, David Lang wrote:
>> >>>>
>> >>>>> On Thu, 12 Mar 2015, singh.janmejay wrote:
>> >>>>>
>> >>>>> Tried re-ordering it? Put the one with /port first?
>> >>>>>>
>> >>>>>
>> >>>>> no, lognorm rules are not supposed to be order dependent, so I
>> >>>>> didn't try that (especially after finding things failing to parse
>> >>>>> with rsyslog that worked manually)
>> >>>>>
>> >>>>
>> >>>> In case of input strings being matching-rule-wise disjoint, you are
>> >>>> right, order won't matter. But when they are not disjoint, order does
>> >>>> matter, because the first one to match the string wins.
>> >>>>
>> >>>> Consider this rulebase:
>> >>>> rule=:%ip:ipv4%%last:rest%
>> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>> >>>>
>> >>>> If you write it the way I have above, you'll end up matching the
>> >>>> first rule for input 10.20.30.40/5
>> >>>>
>> >>>> But if you write it this way:
>> >>>> rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
>> >>>> rule=:%ip:ipv4%%last:rest%
>> >>>>
>> >>>> You'll end up matching the first one.
>> >>>>
>> >>>
>> >>> This shouldn't happen. The theory is:
>> >>>
>> >>> Let i be the current index to be looked at in the line. If for i a
>> >>> parser is selected, parsers shall be tried first (in theory,
>> >>> according to parser ordering, but I think this is not yet fully
>> >>> implemented). If a parser fits, processing is advanced to next tree
>> >>> node.
>> >>>
>> >>> If the node at i does not have a parser (or all parsers failed, I
>> >>> think [but not sure]), advance to next node based on character match.
>>
>> This is precisely what it does.
>>
>> >>>
>> >>> The order of appearance of rules inside the rulebase should not
>> >>> affect this.
>>
>> It doesn't for literal-subtree, but it does for field-subtree,
>> because they are inserted at the tail of the linked-list.
>>
>> This code (
>> https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
>> adds new subtrees at the end of linked-list, which is what causes the
>> ordering-sensitive behaviour.
>>
>
> OK, it seems like I overlooked this effect. I don't think it is good to
> have any order dependence. Anyways, the work I am carrying out will most
> probably lead to algorithmic changes and I'll re-evaluate that when I reach
> that point (not soon). Of course, I won't break anything that exists. If
> things diverge too much, I'll add an alternate library. But again, this
> needs to be seen and it is too early to think about this.
>
> On the ordering issue: are you sure that the order is always properly
> preserved? I never put any effort into it (as order was designed
> irrelevant) and some reordering (IIRC) happens intentionally (parser
> priorities).
>
> Rainer
>
>> >>> If it does, it's either not yet implemented or a bug. this is also
>> >>> why I don't like the "rest" syntax - it always matches and thus
>> >>> terminates interpretation.
>> >>>
>> >>
>> >> I'll post a simple test case when I get into the office in a bit.
>> >>
>> >> In this particular case, it's failing to check other parsers when it
>> >> hits a failure and backs up.
>> >>
>> >> But there are other cases where multiple rules may match. stringto,
>> >> rest,
>> >
>> > word, stringto are "last resort parsers", to be used only if anything
>> > else fails.
>> > rest IMHO should never be used, but I think I can propose something in
>> > the future that solves the need that comes with it (if there still is
>> > a need at that point).
>> >
>> >> iptables
>> >
>> > iptables is a different story, it's actually for a different type of
>> > logs - at least I think so now. I am unfortunately not prepared to
>> > discuss this right now, as I want to keep concentrated on the log
>> > structure analyzer. It doesn't help if I do a bit of everything
>> > without anything ever nearing completion ;)
>> >
>> >> are all things that can easily match a lot of data where other rules
>> >> may also match by having more specific listings. In such cases it
>> >> should still be deterministic which rule 'wins'. I can think of a few
>> >> ways to define this.
>> >>
>> >> 1. fewest parsers needed wins
>> >>
>> >> 2. most parsers needed wins
>>
>> This is probably the closest simple approximation to best match.
>>
>> I was thinking about this too.
>>
>> >>
>> >> 3. ordering of parsers, where the 'greedier' ones are put last so
>> >> they only come into play if the more specific ones don't match.
>>
>> We could assist it by setting relative weights etc. Eg. ipv4 gets
>> weight 10, but rest gets only 1 etc.
>>
>> Once we get the coefficients right, this can probably be achieved (it's
>> like a costing-based picker, run once ptree has been loaded to sort
>> all subtree lists by cost in one shot).
>>
>> >
>> > That's the designed approach, and I am very sure it's the right one.
>> > As I said, it's at least not fully implemented.
>> >
>> > This also means we need many more specific parsers. I never get there,
>> > because of a) time shortage and b) lack of sufficient log samples.
>> > Where log samples is not a single line or two, but at least several
>> > thousands, so that I can evaluate false positives. While b) is still a
>> > very big problem to me, a) has been much relaxed thanks to the thesis
>> > work. Also, work on the semi-automatic rule creator looks promising.
>> > As it is a heuristic, the lack of log samples unfortunately is a very
>> > large hindering block.
>> >
>> > Rainer
>> > _______________________________________________
>> > rsyslog mailing list
>> > http://lists.adiscon.net/mailman/listinfo/rsyslog
>> > http://www.rsyslog.com/professional-services/
>> > What's up with rsyslog? Follow https://twitter.com/rgerhards
>> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
>> > myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST
>> > if you DON'T LIKE THAT.
>>
>> --
>> Regards,
>> Janmejay
>> http://codehunk.wordpress.com

--
Regards,
Janmejay
http://codehunk.wordpress.com

