Re: [rsyslog] New Pull request for liblognorm - additional mmnormalize functionality

Kendall Green Tue, 27 Jan 2015 23:27:07 -0800

>>Thoughts?

Thanks for the examples; I understand what you mean about missing fields.
I just want to clarify that, in what I've described, the label still exists
when a field is not populated, so it's the same sample taking on a different
shape, because the pattern changes depending on the field values.


>> - Does it make sense for users to pack unrelated samples in the same
>> rulebase?

Data input is stratified, so that each ruleset calls mmnormalize with a
rulebase definition specific to that data feed. Windows normalization uses a
separate rulebase from other operating systems and platforms.

The rulebase has the prefix for standard message objects (timestamp,
priority, syslogtag), and individual rules for each event id. Each rule
provides the category, a tag value for the event id number, and an annotation
to match string literals (since that data type doesn't exist).

When parsing the logs, unparsed-data identifies many messages that require a
rule for every possible combination of fields being populated or not; in
some cases the alternate value includes a character that conflicts with full
parsing of the rule.

So for big event logs with lots of fields, in some cases there are more than
10 rules for a single event id, not to mention a long time spent sampling to
discover which combinations of fields ever show up with a hyphen instead of
the expected value. Likewise, values may be quoted if they contain spaces and
unquoted if they are a single word, or be a space if null, or print "NULL" or
nil, or some other 'known' string value which doesn't match the
intended/initial data type.
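
To make the growth concrete, here is a minimal Python sketch that enumerates
one rule per populated/unpopulated combination. The field names, the types and
the "-" placeholder are assumptions for illustration only, and the output just
imitates the general shape of a rulebase line; it is not generated by or
checked against liblognorm.

```python
from itertools import product

# Hypothetical fields for one event id; names and types are illustrative only.
FIELDS = [("src_ip", "ipv4"), ("bytes", "number"), ("user", "word")]
NULL_MARKER = "-"  # assumed placeholder, as in the Apache example in this thread


def rule_variants(fields, null_marker=NULL_MARKER):
    """Return one rule body per populated/unpopulated combination of fields."""
    variants = []
    for populated in product([True, False], repeat=len(fields)):
        parts = [
            f"%{name}:{ftype}%" if is_set else null_marker
            for (name, ftype), is_set in zip(fields, populated)
        ]
        variants.append("rule=:" + " ".join(parts))
    return variants


variants = rule_variants(FIELDS)
print(len(variants))  # 2 ** 3 combinations for three optional fields
for v in variants:
    print(v)
```

With n such fields the list has 2**n entries, so ten optional fields would
already need 1024 rules for a single event id.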

TL;DR: no, the samples are for the same rule in the rulebase, and all are
related to the same type/log source/specific EventID.

>>     * The rulebase will be composed of several unrelated rules, making
>> it harder to read

The rulebase is already hard to read, as it is currently a mess with
multiple rules for a single event id.

There the 'or' ('|') type would resolve that; also, a lot of issues will be
cleared up once we are able to match with the new to-string feature. I like
that it's similar to the fields() extract in giving the option of char or
string separators. While building a Windows event log rulebase, I've had to
set special tags on each of the rules that are variations on a single EventID,
just to verify that each rule is actually being used and that patterns are not
accidentally overlapping between different EventIDs.

Not having string literals or char literals in the rulebase means having to
map tags and ontology annotations for that.

>> * Multiple parse-trees may have to be maintained in order to satisfy
>> all combinations of nullMarker (eg. a non-leaf field, marked for
>> null-handling in one sample, but not marked for it in the other) (so
>> matching will become O(n) in number of combinations). So it is some
>> dev-work and little bit of perf-overhead.

I'm not certain what you're referring to, but I understand that the number of
combinations per rule in a rulebase would affect performance. Do you mean,
for example, that hyphens could represent a nullMarker, and that the
nullMarkers would 'potentially' be on specified fields? I think it would
need to exist in rules for certain fields, not as a rulebase-level option, as
it would likely conflict with messages. That is different from a marker for
the word type option, or the contribution for op-quoted-string, but nullMarker
would probably be useful for CEF, where fields that are null are typically not
in the log...
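
As a rough illustration of what I mean by a nullMarker on certain fields only,
here is a small Python sketch. It is not how liblognorm works internally; the
rule layout, field names and validators are all made up for the example. A
field that carries a marker is skipped (its key is left out) when the token
equals the marker, while a field without a marker must still match its type.

```python
import ipaddress


def is_ipv4(token):
    """Return True if token parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(token)
        return True
    except ValueError:
        return False


# Hypothetical rule: (field name, validator, per-field null marker or None).
RULE = [
    ("client", is_ipv4, None),     # must always be a real address
    ("upstream", is_ipv4, "-"),    # may be replaced by "-"
    ("bytes", str.isdigit, "-"),   # may be replaced by "-"
]


def normalize(line, rule=RULE):
    """Match space-separated tokens against rule; return a dict or None."""
    tokens = line.split(" ")
    if len(tokens) != len(rule):
        return None
    event = {}
    for token, (name, valid, null_marker) in zip(tokens, rule):
        if null_marker is not None and token == null_marker:
            continue  # field is null: omit the key, keep matching
        if not valid(token):
            return None  # type mismatch: the whole rule fails
        event[name] = token
    return event


print(normalize("10.0.0.1 10.0.0.2 512"))
print(normalize("10.0.0.1 - 512"))
print(normalize("- - 512"))
```

The last call returns None because "client" has no marker, which is the point
of keeping the marker per field: a "-" in an unexpected position does not
silently match.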

Regards,
Kendall

On Tue, Jan 27, 2015 at 11:27 PM, singh.janmejay <[email protected]>
wrote:

> I see what you are thinking of, but there are some things that may be worth
> thinking about before we decide:
>
> - Does it make sense for users to pack unrelated samples in the same
> rulebase?
>
>   There are 3 problems with this:
>      * The tree will become large, and back-tracking several unrelated
> branches will be wasteful (a condition in the ruleset which calls the action
> will be much more efficient, assuming the test is not very complex)
>
>      * The rulebase will be composed of several unrelated rules, making it
> harder to read
>
>      * Multiple parse-trees may have to be maintained in order to satisfy
> all combinations of nullMarker (eg. a non-leaf field, marked for
> null-handling in one sample, but not marked for it in the other) (so
> matching will become O(n) in number of combinations). So it is some
> dev-work and little bit of perf-overhead.
>
> - The alternative is to set nullMarker at top level in a rulebase (instead
> of being able to change it for every sample).
>
>   But then the flexibility is slightly lowered.
>
> - If we go with an action-level param, it's useful in cases where one has a
> standard access-log format but load-balancer-level logs always have some
> fields (say upstream latency or upstream-ip) which app-layer access logs
> will not have.
>
>   This can use the same rulebase with nullMarker in one case, and without
> it in another.
>
> Thoughts?
>
> On Wed, Jan 28, 2015 at 11:13 AM, David Lang <[email protected]> wrote:
>
> > I'm thinking that it needs to only apply to part of a ruleset. I can't see
> > why you would use the same rulebase with different values overall, but I
> > can easily see a rulebase that covers more than one type of logs needing
> > different values for the different types of logs.
> >
> > remember that liblognorm is most effective if it has one ruleset to cover
> > everything you are looking at rather than doing other conditionals and then
> > picking which ruleset to use.
> >
> > David Lang
> >
> >
> > On Wed, 28 Jan 2015, singh.janmejay wrote:
> >
> >> I think action parameter is the most flexible place to have it at. Because
> >> same rulebase can be used with different values.
> >>
> >> Either module or rulebase level param will be less flexible compared to
> >> this.
> >>
> >> --
> >> Regards,
> >> Janmejay
> >>
> >> PS: Please blame the typos in this mail on my phone's uncivilized soft
> >> keyboard sporting it's not-so-smart-assist technology.
> >>
> >> On Jan 28, 2015 10:48 AM, "David Lang" <[email protected]> wrote:
> >>
> >>  On Wed, 28 Jan 2015, singh.janmejay wrote:
> >>>
> >>>> Ok, one way I can think of doing it: expose a parameter at action/module
> >>>> level which turns on defaulting and picks a default string.
> >>>>
> >>>> Eg.
> >>>>
> >>>> action(type="mmnormalize" nullMarker="-")
> >>>>
> >>>> Where nullMarker is a string (not a char).
> >>>>
> >>>> Whenever a "-" is encountered and a field is expected, it should skip the
> >>>> key (the key will not be present at all) and continue matching next token
> >>>> onwards.
> >>>> Thoughts?
> >>>>
> >>>>
> >>> This needs to be something in the liblognorm config, not in rsyslog.
> >>> different types of logs would have different nullMarker strings.
> >>>
> >>> with that adjustment, I think it's a good idea.
> >>>
> >>> David Lang
> >>>
> >>>  --
> >>>
> >>>> Regards,
> >>>> Janmejay
> >>>>
> >>>> PS: Please blame the typos in this mail on my phone's uncivilized soft
> >>>> keyboard sporting it's not-so-smart-assist technology.
> >>>>
> >>>> On Jan 28, 2015 6:38 AM, "David Lang" <[email protected]> wrote:
> >>>>
> >>>>  On Wed, 28 Jan 2015, singh.janmejay wrote:
> >>>>
> >>>>>
> >>>>>> Maybe it'll be useful to discuss what you want to achieve with such
> >>>>>> representations of sample. I mean if possible, take a few samples from
> >>>>>> your existing rulebase which you think highlight the problem(s) you are
> >>>>>> facing.
> >>>>>>
> >>>>>>
> >>>>> I think the example is the Apache logs, where Apache either puts a value,
> >>>>> or it puts a placeholder '-'
> >>>>>
> >>>>> if you want to capture a specific type (number or ip address for example),
> >>>>> you won't match a log entry that has a - in that field.
> >>>>>
> >>>>> If there are only a couple fields that are like this, you can list all the
> >>>>> combinations in the ruleset, but if you have a lot of fields like this, the
> >>>>> combinatorial explosion would make for a LOT of rules.
> >>>>>
> >>>>> So I don't think he really needs a generic 'or' allowing any types to be
> >>>>> combined as much as a way to say "this field could be this type or this
> >>>>> constant"
> >>>>>
> >>>>> David Lang
> >>>>> _______________________________________________
> >>>>> rsyslog mailing list
> >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>>> http://www.rsyslog.com/professional-services/
> >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> >>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> >>>>> DON'T LIKE THAT.
> >>>>>
>
>
>
> --
> Regards,
> Janmejay
> http://codehunk.wordpress.com