Folks, please bear with me. Right now, I can't comment in a way that makes sense, as I need to check with some third parties. Once I have done that, you'll understand. Please bear with me for a day or maybe a few.
Rainer

2015-01-28 8:26 GMT+01:00 Kendall Green <[email protected]>:
> >>Thoughts?
>
> Thanks for the examples; I understand what you mean about missing
> fields. I just want to clarify that, in what I've described, when a field
> is not populated the label still exists, so it's the same sample taking
> on a different shape, as the pattern changes depending on the field values.
>
> >> - Does it make sense for users to pack unrelated samples in the same
> >> rulebase?
>
> Data input is stratified, so each ruleset calls mmnormalize with a
> rulebase definition specific to that data feed. Windows normalization
> uses a separate rulebase from other operating systems and platforms.
>
> The rulebase has the prefix for standard message objects (timestamp,
> priority, syslogtag) and individual rules for each event ID. Each rule
> provides the category, a tag value for the event ID number, and an
> annotation to match string literals (since that data type doesn't exist).
>
> When parsing the logs, unparsed-data turns up many messages that would
> need rules for every possible combination of fields being populated or
> not, or, in some cases, the alternate value includes a character that
> conflicts with full parsing of the rule.
>
> So for big event logs with lots of fields, there are in some cases more
> than 10 rules for a single event ID, not to mention a long period of
> sampling to discover which combinations of fields ever show up with a
> hyphen instead of the expected value. Values may be quoted if they contain
> spaces and unquoted if they are a single word, a space if null, the
> string "NULL" or nil, or some other 'known' string value that doesn't
> match the intended/initial data type.
>
> TL;DR: no, the samples are for the same rule in the rulebase, and they
> relate to the type/log source/specific EventID.
>
> >> * The rulebase will be composed of several unrelated rules, making
> >> it harder to read
>
> The rulebase is already hard to read, as it is currently a mess with
> multiple rules for a single event ID.
>
> The 'or' ('|') type would resolve that, and a lot of issues will also be
> cleared up once we can match with the new to-string feature. I like that,
> similar to fields() extraction, it gives the option of char or string
> separators. While building a Windows event log rulebase, I've had to set
> special tags on each rule that is a variation on a single event ID, just
> to verify it's actually being used and that patterns aren't accidentally
> overlapping between different event IDs.
>
> Not having string literals or char literals in the rulebase means mapping
> tags and ontology annotations for that instead.
>
> >> * Multiple parse-trees may have to be maintained in order to satisfy
> >> all combinations of nullMarker (e.g. a non-leaf field, marked for
> >> null-handling in one sample, but not marked for it in the other) (so
> >> matching will become O(n) in number of combinations). So it is some
> >> dev-work and a little bit of perf-overhead.
>
> I'm not certain what you're referring to, but I understand that the
> number of combinations per rule in a rulebase would affect performance.
> Do you mean, for example, that hyphens could represent a nullMarker, and
> that nullMarkers would 'potentially' apply only to specified fields? I
> think it would need to exist in rules for certain fields, not as a
> rulebase option, as it would likely conflict with messages. This is
> different from the marker option for the word type in the
> op-quoted-string contribution, but a nullMarker would probably be useful
> for CEF, where fields that are null are typically not in the log...
>
> Regards,
> Kendall
>
> On Tue, Jan 27, 2015 at 11:27 PM, singh.janmejay <[email protected]>
> wrote:
>
> > I see what you are thinking of, but here are some things that may be
> > worth thinking about before we decide:
> >
> > - Does it make sense for users to pack unrelated samples in the same
> > rulebase?
> >
> > There are 3 problems with this:
> > * The tree will become large, and back-tracking through several
> > unrelated branches will be wasteful (a condition in the ruleset which
> > calls the action will be much more efficient, assuming the test is not
> > very complex)
> >
> > * The rulebase will be composed of several unrelated rules, making it
> > harder to read
> >
> > * Multiple parse-trees may have to be maintained in order to satisfy
> > all combinations of nullMarker (e.g. a non-leaf field, marked for
> > null-handling in one sample, but not marked for it in the other) (so
> > matching will become O(n) in number of combinations). So it is some
> > dev-work and a little bit of perf-overhead.
> >
> > - The alternative is to set nullMarker at top level in a rulebase
> > (instead of being able to change it for every sample).
> >
> > But then the flexibility is slightly lowered.
> >
> > - If we go with an action-level param, it's useful in cases where one
> > has a standard access-log format but load-balancer-level logs always
> > have some fields (say upstream latency or upstream-ip) which app-layer
> > access logs will not have.
> >
> > This can use the same rulebase with nullMarker in one case, and without
> > it in another.
> >
> > Thoughts?
> >
> > On Wed, Jan 28, 2015 at 11:13 AM, David Lang <[email protected]> wrote:
> >
> > > I'm thinking that it needs to only apply to part of a ruleset. I
> > > can't see why you would use the same rulebase with different values
> > > overall, but I can easily see a rulebase that covers more than one
> > > type of logs needing different values for the different types of logs.
> > >
> > > Remember that liblognorm is most effective if it has one ruleset to
> > > cover everything you are looking at, rather than doing other
> > > conditionals and then picking which ruleset to use.
> > >
> > > David Lang
> > >
> > >
> > > On Wed, 28 Jan 2015, singh.janmejay wrote:
> > >
> > >> I think an action parameter is the most flexible place to have it,
> > >> because the same rulebase can be used with different values.
> > >>
> > >> Either a module- or rulebase-level param will be less flexible
> > >> compared to this.
> > >>
> > >> --
> > >> Regards,
> > >> Janmejay
> > >>
> > >> PS: Please blame the typos in this mail on my phone's uncivilized soft
> > >> keyboard sporting its not-so-smart-assist technology.
> > >>
> > >> On Jan 28, 2015 10:48 AM, "David Lang" <[email protected]> wrote:
> > >>
> > >>> On Wed, 28 Jan 2015, singh.janmejay wrote:
> > >>>
> > >>>> Ok, one way I can think of doing it: expose a parameter at
> > >>>> action/module level which turns on defaulting and picks a default
> > >>>> string.
> > >>>>
> > >>>> Eg.
> > >>>>
> > >>>> action(type="mmnormalize" nullMarker="-")
> > >>>>
> > >>>> Where nullMarker is a string (not a char).
> > >>>>
> > >>>> Whenever a "-" is encountered and a field is expected, it should
> > >>>> skip the key (the key will not be present at all) and continue
> > >>>> matching from the next token onwards.
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>
> > >>> This needs to be something in the liblognorm config, not in rsyslog.
> > >>> Different types of logs would have different nullMarker strings.
> > >>>
> > >>> With that adjustment, I think it's a good idea.
> > >>>
> > >>> David Lang
> > >>>
> > >>>> --
> > >>>> Regards,
> > >>>> Janmejay
> > >>>>
> > >>>> On Jan 28, 2015 6:38 AM, "David Lang" <[email protected]> wrote:
> > >>>>
> > >>>>> On Wed, 28 Jan 2015, singh.janmejay wrote:
> > >>>>>
> > >>>>>> Maybe it'll be useful to discuss what you want to achieve with
> > >>>>>> such representations of samples. I mean, if possible, take a few
> > >>>>>> samples from your existing rulebase which you think highlight the
> > >>>>>> problem(s) you are facing.
> > >>>>>
> > >>>>> I think the example is the Apache logs, where Apache either puts a
> > >>>>> value, or it puts a placeholder '-'.
> > >>>>>
> > >>>>> If you want to capture a specific type (number or IP address, for
> > >>>>> example), you won't match a log entry that has a - in that field.
> > >>>>>
> > >>>>> If there are only a couple of fields like this, you can list all
> > >>>>> the combinations in the ruleset, but if you have a lot of fields
> > >>>>> like this, the combinatorial explosion would make for a LOT of rules.
> > >>>>>
> > >>>>> So I don't think he really needs a generic 'or' allowing any types
> > >>>>> to be combined as much as a way to say "this field could be this
> > >>>>> type or this constant".
> > >>>>>
> > >>>>> David Lang
> > >>>>> _______________________________________________
> > >>>>> rsyslog mailing list
> > >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> > >>>>> http://www.rsyslog.com/professional-services/
> > >>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> > >>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
> > >>>>> myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT
> > >>>>> POST if you DON'T LIKE THAT.
> >
> > --
> > Regards,
> > Janmejay
> > http://codehunk.wordpress.com
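To make the combinatorial explosion that Kendall and David describe concrete, here is a minimal Python sketch, assuming a rule is just a space-separated sequence of fields and that each nullable field shows up either as its typed value or as one of a few literal placeholders. The field names and the `%name:type%` notation are illustrative only (loosely modeled on liblognorm-style field syntax), not taken from a real rulebase, and `expand_variants` is a hypothetical helper.

```python
from itertools import product
from typing import Iterator, List, Sequence, Tuple

# Hypothetical fields of one event as (name, type); illustrative only.
FIELDS: List[Tuple[str, str]] = [
    ("client_ip", "ipv4"),
    ("user", "word"),
    ("status", "number"),
    ("bytes", "number"),
]

def expand_variants(fields: Sequence[Tuple[str, str]],
                    placeholders: Sequence[str] = ("-",)) -> Iterator[str]:
    """Yield one rule body per combination of 'typed value' vs. 'literal placeholder'."""
    # For every field, the alternatives are the typed field itself plus each placeholder.
    choices = [[f"%{name}:{ftype}%", *placeholders] for name, ftype in fields]
    for combo in product(*choices):
        yield " ".join(combo)

print(len(list(expand_variants(FIELDS))))                 # 16 == 2 ** 4
print(len(list(expand_variants(FIELDS, ("-", "NULL")))))  # 81 == 3 ** 4
```

In general, k nullable fields with m possible placeholder spellings need (m + 1)^k hand-written variants, which is what motivates the nullMarker and "type or constant" ideas discussed in the thread.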

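The nullMarker behaviour Janmejay proposes (when the marker is encountered where a field is expected, skip the key and continue matching from the next token) can be sketched in a few lines of Python. This is a deliberately simplified model, assuming whitespace-separated tokens and one type check per field instead of liblognorm's parse tree; `normalize`, `null_marker` and the checker functions are hypothetical names, not part of liblognorm or mmnormalize. Passing `null_marker=None` stands in for the "without nullMarker" case from the load-balancer/app-layer example, so the same pattern is reused with and without the marker.

```python
import ipaddress
from typing import Callable, Dict, List, Optional, Tuple

Check = Callable[[str], bool]

def is_number(tok: str) -> bool:
    return tok.isdigit()

def is_ipv4(tok: str) -> bool:
    try:
        ipaddress.IPv4Address(tok)
        return True
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False

def normalize(line: str, pattern: List[Tuple[str, Check]],
              null_marker: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Match a space-separated line against (field name, type check) pairs.

    If null_marker is set and a token equals it where a field is expected,
    the key is left out of the result and matching continues with the next
    token. Returns None if the line does not match.
    """
    tokens = line.split()
    if len(tokens) != len(pattern):
        return None
    result: Dict[str, str] = {}
    for tok, (name, check) in zip(tokens, pattern):
        if null_marker is not None and tok == null_marker:
            continue  # whole-string comparison, so "NULL" works as well as "-"
        if not check(tok):
            return None
        result[name] = tok
    return result

# Hypothetical load-balancer access-log tail: upstream IP, upstream latency, status.
LB_PATTERN = [("upstream_ip", is_ipv4), ("upstream_latency", is_number), ("status", is_number)]

print(normalize("10.0.0.7 12 200", LB_PATTERN))
print(normalize("- - 200", LB_PATTERN))                   # None: "-" is not an IPv4 address
print(normalize("- - 200", LB_PATTERN, null_marker="-"))  # {'status': '200'}
```

The marker is compared as a whole token, in line with the proposal that nullMarker be a string rather than a char.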

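David's alternative framing, a way to say "this field could be this type or this constant", attaches the placeholder to an individual field instead of a global setting, which also fits Kendall's preference for marking only certain fields. Below is a minimal Python sketch, assuming whitespace-separated tokens and a simplified subset of the Apache common log format (host, ident, user, status, bytes; the bracketed timestamp and quoted request are left out to keep tokenizing trivial). The helpers `typed`, `type_or_constant` and `parse` are hypothetical, not liblognorm field types.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A matcher returns (matched?, captured value or None).
Matcher = Callable[[str], Tuple[bool, Optional[str]]]

def is_number(tok: str) -> bool:
    return tok.isdigit()

def is_word(tok: str) -> bool:
    return tok != "" and not any(ch.isspace() for ch in tok)

def typed(check: Callable[[str], bool]) -> Matcher:
    """Field that must be a value of the given type."""
    def match(tok: str) -> Tuple[bool, Optional[str]]:
        return (True, tok) if check(tok) else (False, None)
    return match

def type_or_constant(check: Callable[[str], bool], constant: str) -> Matcher:
    """Field that is either a value of the given type or exactly `constant`."""
    def match(tok: str) -> Tuple[bool, Optional[str]]:
        if tok == constant:
            return True, None  # placeholder: the field matches but nothing is captured
        return (True, tok) if check(tok) else (False, None)
    return match

def parse(line: str, rule: List[Tuple[str, Matcher]]) -> Optional[Dict[str, str]]:
    tokens = line.split()
    if len(tokens) != len(rule):
        return None
    out: Dict[str, str] = {}
    for tok, (name, matcher) in zip(tokens, rule):
        ok, value = matcher(tok)
        if not ok:
            return None
        if value is not None:
            out[name] = value
    return out

# One rule covers all 2 ** 3 = 8 placeholder combinations of ident, user and bytes.
CLF_SUBSET = [
    ("host", typed(is_word)),
    ("ident", type_or_constant(is_word, "-")),
    ("user", type_or_constant(is_word, "-")),
    ("status", typed(is_number)),
    ("bytes", type_or_constant(is_number, "-")),  # Apache logs "-" when no bytes are sent
]

print(parse("127.0.0.1 - frank 200 2326", CLF_SUBSET))
print(parse("127.0.0.1 - - 304 -", CLF_SUBSET))  # {'host': '127.0.0.1', 'status': '304'}
```

Because the constant belongs to each field, different fields can use different placeholders, e.g. `type_or_constant(is_number, "NULL")` for a field that prints "NULL", as Kendall describes for some event log fields, without one global marker applying everywhere.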