Re: Normalization topology or separate normalization bolt for parsing topology

2017-05-03 Thread Nick Allen
> Clearly, a generic parser would be useful for the community not a type of parser that is highly customised for our noisy environment. Increasing the number of generic parsers for the community is definitely a good goal. I agree with you there. Could we achieve the same goal by making our

Re: Normalization topology or separate normalization bolt for parsing topology

2017-05-02 Thread Ali Nazemian
Hi Nick, I am happy to continue the development using the current architecture and embed the pre-parsing steps in the parser code. However, this would run counter to the goal of contributing to the Metron community to expand the range of supported devices. Clearly, a generic parser would be

Re: Normalization topology or separate normalization bolt for parsing topology

2017-05-02 Thread Nick Allen
Yes, and currently that normalization step is the Parsers. I am not saying the message has to be entirely clear and well-defined. But there is a minimum set of expectations that you must have of any data that you're ingesting. Once it meets that "minimum set", the parser should be able to

Re: Normalization topology or separate normalization bolt for parsing topology

2017-05-02 Thread Ali Nazemian
Hi Nick, The date could be corrupted for any number of reasons, and sometimes we have no control over the device. Obviously, it is not a big deal if we lose a <166> severity message, but it could be a different situation for a <161> severity message or an actual critical threat. However, I have mentioned those
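
For readers unfamiliar with the <166> and <161> values above: in syslog the number in angle brackets is the PRI value, defined as facility * 8 + severity. The following is a minimal sketch of decoding it; the function and list names are illustrative and are not part of Metron.

```python
from typing import Tuple

# Severity names indexed by severity code (0 is the most severe).
SEVERITY_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]


def decode_pri(pri: int) -> Tuple[int, int]:
    """Split a syslog PRI value into (facility, severity).

    PRI = facility * 8 + severity, so the valid range is 0..191.
    """
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)


for pri in (166, 161):
    facility, severity = decode_pri(pri)
    print(pri, facility, SEVERITY_NAMES[severity])
```

Both values share facility 20 (local4); 166 carries severity 6 (informational) while 161 carries severity 1 (alert), which is why losing the latter matters far more.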

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-27 Thread Ali Nazemian
Are you sure? The syslog_host name is way more complicated than something that can be a coincidence. I need to double check with one of the security device experts, but I thought it was some kind of noise. Yes, we do have more use cases that seem to be corrupted. For example, having duplicate IP

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-27 Thread Simon Elliston Ball
In that instance, you're looking at valid syslog which should be parsed as such. The repeat host is not really a host in syslog terms, it's an application name header which happens to be the same. This is definitely a parser bug which should be handled, esp since the header is perfectly RFC

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-27 Thread Ali Nazemian
I do agree there is a fair amount of overhead for using another bolt for this purpose. I am not pointing to the way of implementation. It might be a way of implementation to segregate two extension points without adding overhead; I haven't thought about it yet. However, the main issue is sometimes

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Simon Elliston Ball
Ali, Sounds very much like what you’re talking about when you say normalization, and what I would understand it as, is the process fulfilled by Stellar field transformation in the parser config. Agreed that some of these will be general, based on common Metron standard schema, but others will

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Ali Nazemian
Hi Simon, The reason I am asking for a specific normalisation step is that our normalisation is not a general use case which can be used by other users. It is completely bound to our application. The way we have fixed it, for now, is to add a normalisation step to the parser and

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Casey Stella
Yeah, we definitely don't want to rewrite parsing in Stellar. I would expect the job of the parser, however, to handle structural issues. In my mind, parsing is about transforming structures into fields and the role of the field transformations is to transform values. There's obvious overlap

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Casey Stella
Ok, this may be easier with a couple of examples: *Simple Example : Downstream Processing is Independent of Normalization* Pretend we have a data format that is CSV and the first field, let's call it 'input_dname' is supposed to be a domain name, but sometimes you get IP addresses. In the
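
The rest of this example is cut off in the archive preview, so the intended normalization is not known. As a minimal sketch of only the detection step it sets up (telling an IP address apart from a domain name in 'input_dname'), assuming a plain string value; the function names are hypothetical and this is not Metron or Stellar code:

```python
import ipaddress


def looks_like_ip(value: str) -> bool:
    """Return True if the value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value.strip())
        return True
    except ValueError:
        return False


def classify_input_dname(value: str) -> str:
    """Label the first CSV field so later steps can treat it accordingly."""
    return "ip" if looks_like_ip(value) else "domain"


for sample in ("example.com", "192.168.1.10", "2001:db8::1"):
    print(sample, classify_input_dname(sample))
```

Anything that does not parse as an address is treated as a domain name here; a real deployment would likely validate the domain as well.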

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Nick Allen
> For some reason, the incoming data do not look the way they should. In my mind that would be something for your parser to handle. On Wed, Apr 26, 2017 at 9:43 AM, Ali Nazemian wrote: > Having Stellar function for the normalization is very cool actually. > >

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Ali Nazemian
Having a Stellar function for the normalization is very cool actually. Casey, how are you going to deal with normalization after the parsing if that noise affects the parsing? For some reason, the incoming data do not look the way they should. On Wed, Apr 26, 2017 at 11:37 PM, Casey

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Casey Stella
Ok, that's another story. Hmm, we don't generally pre-parse because we try to not assume any particular format there (i.e. it could be strings, could be byte arrays). Maybe the right answer is to pass the raw, non-normalized data (best effort type of thing) through the parser and do the

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Ali Nazemian
Hi Casey, It is actually a pre-parse process, not a post-parse one. These types of noise affect the position of an attribute, for example, and give us a parsing exception. The timestamp example was not a good one because that is actually a post-parse exception. On Wed, Apr 26, 2017 at 11:28 PM, Casey
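
To illustrate why this kind of noise has to be handled before parsing, here is a minimal sketch under stated assumptions: a hypothetical four-field, whitespace-separated format and a hypothetical noise pattern (a token repeated back to back). Neither the format nor the pattern comes from the thread, and none of these names exist in Metron.

```python
import re

# Hypothetical field layout for a positional parser.
FIELDS = ("timestamp", "host", "src_ip", "action")

# Hypothetical noise: a whole token immediately repeated, e.g. "fw01 fw01".
# The lookarounds make sure only complete whitespace-delimited tokens match.
_REPEATED_TOKEN = re.compile(r"(?<!\S)(\S+)(?:\s+\1(?!\S))+")


def pre_clean(raw: str) -> str:
    """Collapse back-to-back repeated tokens before the line is parsed."""
    return _REPEATED_TOKEN.sub(r"\1", raw)


def parse_positional(line: str) -> dict:
    """Parse by position; any extra token shifts fields and must be rejected."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


raw = "1493200000 fw01 fw01 10.0.0.1 deny"
print(parse_positional(pre_clean(raw)))
```

Parsing `raw` directly raises a `ValueError` because the duplicated token shifts every later field by one position; no post-parse transformation can repair that, since the parser never produces a message to transform.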

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Otto Fowler
I’ve added this to the jira On April 26, 2017 at 09:28:54, Otto Fowler (ottobackwa...@gmail.com) wrote: What if you could implement your cleaning in Stellar functions, which would be in libraries that were loaded as plugins and available to all your parsers? my_field =

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Casey Stella
So, having Stellar operate on the whole message is definitely something that would be cool. That being said, it's also nice to motivate the construction of functions to do simple transformations/normalizations. That way, common useful capabilities may be reused in all the places Stellar is used

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Casey Stella
So, further transformation post-parse was one of the motivating reasons for Stellar (to do that transformation post-parse). Is there a capability that it's lacking that we can add to fit your use case? On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian wrote: > I've created a

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Ali Nazemian
I've created a Jira ticket regarding this feature. https://issues.apache.org/jira/browse/METRON-893 On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian wrote: > Currently, we are using plain regexes in the Java source code to handle > those situations. However, it would be

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Ali Nazemian
Currently, we are using plain regexes in the Java source code to handle those situations. However, it would be nice to have a separate bolt and deal with them separately. Yeah, I can create a Jira issue regarding that. The main reason I am asking for such a feature is the fact that lack of such a

Re: Normalization topology or separate normalization bolt for parsing topology

2017-04-26 Thread Otto Fowler
Hi, Are you doing this cleansing all in the parser or are you using any Stellar to do it? Can you create a jira? On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com) wrote: Hi all, We are facing certain use cases in Metron production that happen to be related to noisy stream.