On Thu, 12 Mar 2015, singh.janmejay wrote:

I haven't seen the reordering code yet, but the loading does preserve order.

It still is deterministic, just that the criteria is rule-order (and
it being applicable only for field-subtrees makes it slightly odd).

This is definitely an issue.

Looking at my cisco.endpoint ruleset:

Originally I had:

rule=:%ip:ipv4%%tail:rest%
rule=:%ip:ipv4%/%port:number%%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)

After learning about the rest issue, I duplicated each line without the %tail:rest% at the end.

It still did not work unless I disabled the rules with rest in them.

So after the discussion on ordering, I tried reversing all the rules. It still didn't work, because the char-to matches better than the ipv4.

So for the moment I have the rules as:

rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%ip:ipv4%/%port:number%
rule=:%ip:ipv4%/%port:number%%tail:rest%
rule=:%ip:ipv4%
rule=:%ip:ipv4%%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%) (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number% (%label2:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%(%label1:char-to:)%)%tail:rest%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%
rule=:%iface:char-to:\x3a%\x3a%ip:ipv4%/%port:number%%tail:rest%

But I'm not sure whether this will really work without testing every specific case, because I don't know where the order is going to matter. The char-to may match in cases where the rest of the rule then fails to match, and it won't fall through to the shorter match.

Order dependency is not the right answer.

Why does this need to be added to the end of the tree rather than being positioned like any other rule component?

David Lang



On Thu, Mar 12, 2015 at 10:55 PM, Rainer Gerhards
<[email protected]> wrote:
2015-03-12 18:16 GMT+01:00 singh.janmejay <[email protected]>:

On Thu, Mar 12, 2015 at 9:29 PM, Rainer Gerhards
<[email protected]> wrote:
2015-03-12 16:41 GMT+01:00 David Lang <[email protected]>:

On Thu, 12 Mar 2015, Rainer Gerhards wrote:

 2015-03-12 5:55 GMT+01:00 singh.janmejay <[email protected]>:

 On Thu, Mar 12, 2015 at 9:19 AM, David Lang <[email protected]> wrote:

On Thu, 12 Mar 2015, singh.janmejay wrote:

 Tried re-ordering it? Put the one with /port first?



No, lognorm rules are not supposed to be order dependent, so I didn't try
that (especially after finding things failing to parse with rsyslog that
worked manually).


If the rules are disjoint, so that no input string can match more than
one of them, you are right, order won't matter. But when they are not
disjoint, order does matter, because the first rule to match the string wins.

Consider this rulebase:
rule=:%ip:ipv4%%last:rest%
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%

If you write it the way I have above, you'll end up matching the first
rule (the one with rest) for input 10.20.30.40/5

But if you write it this way:
rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
rule=:%ip:ipv4%%last:rest%

You'll end up matching the first one in this listing (the rule with /port).
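
The effect can be reproduced with a minimal Python sketch. This is a toy model, not liblognorm: the parser functions (`ipv4`, `number`, `rest`, `char_sep`, `literal`) and the flat "try each rule in order" loop are assumptions made only to illustrate first-match-wins behaviour, and the real library walks a parse tree rather than a list of rules.

```python
import re

IPV4 = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")
NUMBER = re.compile(r"\d+")

# Each toy parser returns the number of characters it consumes from the
# start of `text`, or None if it does not match.
def ipv4(text):
    m = IPV4.match(text)
    return m.end() if m else None

def number(text):
    m = NUMBER.match(text)
    return m.end() if m else None

def rest(text):
    return len(text)  # always succeeds and swallows everything left

def char_sep(sep):
    def parse(text):
        idx = text.find(sep)
        return len(text) if idx < 0 else idx  # may consume zero characters
    return parse

def literal(s):
    return lambda text: len(s) if text.startswith(s) else None

def match_rule(rule, line):
    """Apply the parsers of one rule in sequence; succeed only on a full match."""
    pos, fields = 0, {}
    for name, parser in rule:
        used = parser(line[pos:])
        if used is None:
            return None
        if name is not None:
            fields[name] = line[pos:pos + used]
        pos += used
    return fields if pos == len(line) else None

def first_match(rulebase, line):
    """Return the fields of the first rule that matches (assumed semantics)."""
    for rule in rulebase:
        fields = match_rule(rule, line)
        if fields is not None:
            return fields
    return None

# rule=:%ip:ipv4%%last:rest%
RULE_REST = [("ip", ipv4), ("last", rest)]
# rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
RULE_PORT = [("ip", ipv4), ("junk", char_sep("/")), (None, literal("/")), ("port", number)]

LINE = "10.20.30.40/5"
print(first_match([RULE_REST, RULE_PORT], LINE))  # {'ip': '10.20.30.40', 'last': '/5'}
print(first_match([RULE_PORT, RULE_REST], LINE))  # {'ip': '10.20.30.40', 'junk': '', 'port': '5'}
```

Both rules can match the whole line, so in this model the result depends only on which rule is listed first.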


This shouldn't happen. The theory is:

Let i be the current index to be looked at in the line. If a parser is
selected for i, parsers shall be tried first (in theory, according to parser
ordering, but I think this is not yet fully implemented). If a parser fits,
processing advances to the next tree node.

If the node at i does not have a parser (or all parsers failed, I think
[but am not sure]), advance to the next node based on character match.

This is precisely what it does.


The order of appearance of rules inside the rulebase should not affect
this.

It doesn't for literal-subtree, but it does for field-subtree,
because they are inserted at the tail of the linked-list.

This code (
https://github.com/rsyslog/liblognorm/blob/master/src/ptree.c#L394)
adds new subtrees at the end of linked-list, which is what causes the
ordering-sensitive behaviour.
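
A rough Python sketch of that effect follows. It is not the ptree.c data structure: the `Node` class, its `literals` dict and its `fields` list are hypothetical stand-ins, chosen only to show why tail insertion makes the field alternatives at a node follow rule order while literal children do not depend on it.

```python
class Node:
    def __init__(self):
        self.literals = {}  # next character -> child; looked up by key
        self.fields = []    # (parser name, child); scanned front to back

    def add_literal(self, ch):
        return self.literals.setdefault(ch, Node())

    def add_field(self, parser_name):
        for name, child in self.fields:
            if name == parser_name:
                return child  # share an existing subtree
        child = Node()
        self.fields.append((parser_name, child))  # tail insert: keeps load order
        return child


def load(rules):
    """Build a tree from rules given as lists of (kind, value) steps."""
    root = Node()
    for rule in rules:
        node = root
        for kind, value in rule:
            if kind == "field":
                node = node.add_field(value)
            else:
                node = node.add_literal(value)
    return root


def field_order_after_ip(rules):
    """Names of the field alternatives that follow the shared ipv4 field."""
    root = load(rules)
    after_ip = root.fields[0][1]
    return [name for name, _ in after_ip.fields]


# rule=:%ip:ipv4%%last:rest%
R_REST = [("field", "ipv4"), ("field", "rest")]
# rule=:%ip:ipv4%%junk:char-sep:/%/%port:number%
R_PORT = [("field", "ipv4"), ("field", "char-sep"), ("literal", "/"), ("field", "number")]

print(field_order_after_ip([R_REST, R_PORT]))  # ['rest', 'char-sep']
print(field_order_after_ip([R_PORT, R_REST]))  # ['char-sep', 'rest']
```

If the alternatives are then tried front to back, `rest` is attempted before `char-sep` in the first load order, which matches the behaviour described for the two-rule example.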


OK, it seems like I overlooked this effect. I don't think it is good to
have any order dependence. Anyway, the work I am carrying out will most
probably lead to algorithmic changes, and I'll re-evaluate that when I reach
that point (not soon). Of course, I won't break anything that exists. If
things diverge too much, I'll add an alternate library. But again, this
needs to be seen, and it is too early to think about this.

On the ordering issue: are you sure that the order is always properly
preserved? I never put any effort into it (as order was designed to be
irrelevant), and some reordering (IIRC) happens intentionally (parser
priorities).

Rainer


If it does, it's either not yet implemented or a bug. This is also why I
don't like the "rest" syntax - it always matches and thus terminates
interpretation.


I'll post a simple test case when I get into the office in a bit.

In this particular case, it's failing to check other parsers when it hits
a failure and backs up.

But there are other cases where multiple rules may match. stringto, rest,


word, stringto are "last resort parsers", to be used only if everything
else fails.
rest IMHO should never be used, but I think I can propose something in the
future that solves the need that comes with it (if there still is a need at
that point).


iptables


iptables is a different story, it's actually for a different type of logs -
at least I think so now. I am unfortunately not prepared to discuss this
right now, as I want to keep concentrated on the log structure analyzer. It
doesn't help if I do a bit of everything without anything ever nearing
completion ;)


are all things that can easily match a lot of data where other rules may
also match by having more specific listings. In such cases it should still
be deterministic which rule 'wins'. I can think of a few ways to define
this.

1. fewest parsers needed wins

2. most parsers needed wins

This is probably the closest simple approximation to best match.

I was thinking about this too.


3. ordering of parsers, where the 'greedier' ones are put last so they
only come into play if the more specific ones don't match.

We could assist it by setting relative weights etc. Eg. ipv4 gets
weight 10, but rest gets only 1 etc.

Once we get the coefficients right, this can probably be achieved (it's
like a costing-based picker, run once the ptree has been loaded, to sort
all subtree lists by cost in one shot).
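
As a rough illustration of the weighting idea, here is a small Python sketch. Only the weights for ipv4 (10) and rest (1) come from the text above; every other weight, the default value, and the `sort_by_cost` helper itself are hypothetical and not part of liblognorm.

```python
# Weights for ipv4 and rest are the ones suggested above.
WEIGHTS = {"ipv4": 10, "rest": 1}
# Hypothetical values for the remaining parsers, for illustration only.
WEIGHTS.update({"number": 8, "char-sep": 3, "char-to": 3})


def sort_by_cost(field_names, weights=WEIGHTS, default=5):
    """Order field alternatives so that more specific parsers are tried first.

    sorted() is stable, so alternatives with equal weight keep their
    original (load) order.
    """
    return sorted(field_names, key=lambda name: weights.get(name, default), reverse=True)


# The same set of alternatives, loaded in two different rule orders.
print(sort_by_cost(["rest", "char-sep"]))        # ['char-sep', 'rest']
print(sort_by_cost(["char-sep", "rest"]))        # ['char-sep', 'rest']
print(sort_by_cost(["rest", "ipv4", "number"]))  # ['ipv4', 'number', 'rest']
```

With distinct weights the result no longer depends on load order. Alternatives with equal weights (char-sep and char-to in this sketch) would still fall back to load order, so a real implementation would need some tie-breaking rule.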



That's the designed approach, and I am very sure it's the right one. As I
said, it's at least not fully implemented.

This also means we need many more specific parsers. I never got there,
because of a) time shortage and b) lack of sufficient log samples. By log
samples I mean not a single line or two, but at least several thousand, so
that I can evaluate false positives. While b) is still a very big problem
to me, a) has been much relaxed thanks to the thesis work. Also, work on
the semi-automatic rule creator looks promising. As it is a heuristic, the
lack of log samples unfortunately is a very large obstacle.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.



--
Regards,
Janmejay
http://codehunk.wordpress.com