> Clearly, a generic parser would be useful for the community, not a type of
parser that is highly customised for our noisy environment.
Increasing the number of generic parsers for the community is definitely a
good goal. I agree with you there.
Could we achieve the same goal by making our
I am happy to continue the development using the current architecture and
embed the pre-parsing steps in the parser code. However, this would go
against the policy of contributing to the Metron community to expand the
range of supported devices. Clearly, a generic parser would be
Yes, and currently that normalization step is performed by the Parsers.
I am not saying the message has to be entirely clean and well-defined. But
there is a minimum set of expectations that you must have of any data that
you're ingesting. Once it meets that "minimum set", the parser should be
The date could be corrupted for any reason, and sometimes we don't have
any control over the device. Obviously, it is not a big deal if we lose a <166>
severity message, but it could be a different situation for a <161>
severity message or an actual critical threat. However, I have mentioned those
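The <166> versus <161> comparison rests on how syslog encodes the PRI value: PRI = facility * 8 + severity (RFC 5424, also used by RFC 3164). Both values share facility 20 (local4); 166 carries severity 6 (Informational) while 161 carries severity 1 (Alert). A minimal decoding sketch; the sample messages are hypothetical and only their PRI prefix matters.

```java
// Sketch: decode a syslog PRI value such as "<166>" into facility and severity.
// Per RFC 5424 (and RFC 3164), PRI = facility * 8 + severity.
public class SyslogPri {
    // Severity names in RFC 5424 order: 0 = Emergency ... 7 = Debug.
    static final String[] SEVERITIES = {
        "Emergency", "Alert", "Critical", "Error",
        "Warning", "Notice", "Informational", "Debug"
    };

    // Extracts the numeric PRI from a leading "<NNN>" token.
    static int parsePri(String message) {
        int close = message.indexOf('>');
        if (!message.startsWith("<") || close < 2) {
            throw new IllegalArgumentException("missing PRI: " + message);
        }
        int pri = Integer.parseInt(message.substring(1, close));
        // Highest facility (23) * 8 + highest severity (7) = 191.
        if (pri < 0 || pri > 191) {
            throw new IllegalArgumentException("PRI out of range: " + pri);
        }
        return pri;
    }

    static int facility(int pri) { return pri / 8; }

    static int severity(int pri) { return pri % 8; }

    public static void main(String[] args) {
        // Hypothetical messages; only the PRI prefix is inspected.
        for (String msg : new String[] {"<166>example", "<161>example"}) {
            int pri = parsePri(msg);
            System.out.println("<" + pri + "> facility=" + facility(pri)
                + " severity=" + severity(pri)
                + " (" + SEVERITIES[severity(pri)] + ")");
        }
    }
}
```

Running it prints facility 20 for both messages, with Informational for <166> and Alert for <161>, which is why losing the second one matters more.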
Are you sure? The syslog_host name is way more complicated than something
that could be a coincidence. I need to double check with one of the security
device experts, but I thought it was some kind of noise.
Yes, we do have more use cases that seem to be corrupted. For example,
having duplicate IP
In that instance, you're looking at valid syslog, which should be parsed as
such. The repeated host is not really a host in syslog terms; it's an application
name header which happens to be the same. This is definitely a parser bug which
should be handled, especially since the header is perfectly RFC
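To illustrate the point about the repeated host: in an RFC 5424 header, HOSTNAME and APP-NAME are separate positional fields, so the same token appearing twice is legitimate. A minimal sketch that handles only the RFC 5424 layout (not RFC 3164); the sample message and the output field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: split an RFC 5424 header into its named fields.
// Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
public class Rfc5424Header {
    static Map<String, String> parseHeader(String line) {
        // Limit 7 keeps STRUCTURED-DATA and MSG together in the last element.
        String[] parts = line.split(" ", 7);
        int close = parts[0].indexOf('>');
        if (parts.length < 7 || !parts[0].startsWith("<") || close < 2) {
            throw new IllegalArgumentException("not an RFC 5424 header: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("pri", parts[0].substring(1, close));
        fields.put("version", parts[0].substring(close + 1));
        fields.put("timestamp", parts[1]);
        fields.put("hostname", parts[2]);
        fields.put("app_name", parts[3]);
        fields.put("procid", parts[4]);
        fields.put("msgid", parts[5]);
        fields.put("sd_and_msg", parts[6]);
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical message whose HOSTNAME and APP-NAME happen to be identical.
        String line = "<166>1 2017-04-26T09:00:00Z fw01 fw01 - - - example payload";
        Map<String, String> f = parseHeader(line);
        System.out.println(f.get("hostname") + " / " + f.get("app_name"));
    }
}
```

Because each field is taken by position, the duplicate token lands in two distinct fields instead of breaking the parse.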
I do agree there is a fair amount of overhead in using another bolt for
this purpose. I am not prescribing a particular implementation. There might be a
way to separate the two extension points without adding
overhead; I haven't thought about it yet. However, the main issue is
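One way to picture two extension points without an extra bolt is to compose a site-specific normalizer chain and a generic parser into a single call, so both run in the same bolt. This is only a sketch of that idea: every type name below (Normalizer, Parser, ComposedParser) is hypothetical and is not Metron's actual parser interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Sketch: normalize-then-parse composed in one call (hypothetical types).
public class ComposedParser {
    // Extension point 1: raw bytes in, cleaned bytes out.
    interface Normalizer extends UnaryOperator<byte[]> {}

    // Extension point 2: cleaned bytes in, named fields out.
    interface Parser extends Function<byte[], Map<String, Object>> {}

    private final List<Normalizer> normalizers;
    private final Parser parser;

    ComposedParser(List<Normalizer> normalizers, Parser parser) {
        this.normalizers = normalizers;
        this.parser = parser;
    }

    // Applies each normalizer in order, then the parser, with no extra hop.
    Map<String, Object> parse(byte[] raw) {
        byte[] current = raw;
        for (Normalizer n : normalizers) {
            current = n.apply(current);
        }
        return parser.apply(current);
    }

    // Builds a hypothetical example: trim whitespace, then read the first CSV column.
    static ComposedParser example() {
        Normalizer trim = b -> new String(b, StandardCharsets.UTF_8).trim()
            .getBytes(StandardCharsets.UTF_8);
        Parser csv = b -> {
            String[] cols = new String(b, StandardCharsets.UTF_8).split(",");
            Map<String, Object> fields = new LinkedHashMap<>();
            fields.put("input_dname", cols[0]);
            return fields;
        };
        return new ComposedParser(Arrays.asList(trim), csv);
    }

    public static void main(String[] args) {
        byte[] raw = "  example.com,80  ".getBytes(StandardCharsets.UTF_8);
        System.out.println(example().parse(raw));
    }
}
```

The generic parser stays reusable for the community, while the site-specific cleaning lives in a separately pluggable list.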
What you’re talking about when you say normalization, and what I would
understand it to be, sounds very much like the process fulfilled by Stellar field
transformations in the parser config. Agreed that some of these will be general,
based on the common Metron standard schema, but others will
The reason I am asking for a specific normalisation step is that
normalisation is not a general use case that other users can benefit
from. It is completely bound to our application. The way we have fixed
it, for now, is to add a normalisation step to the parser and
Yeah, we definitely don't want to rewrite parsing in Stellar. I would
expect the job of the parser, however, to handle structural issues. In my
mind, parsing is about transforming structures into fields, and the role of
the field transformations is to transform values. There's obvious overlap
Ok, this may be easier with a couple of examples:
*Simple Example: Downstream Processing is Independent of Normalization*
Pretend we have a data format that is CSV and the first field, let's call
it 'input_dname', is supposed to be a domain name, but sometimes you get IP
addresses. In the
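The structure/value split described above can be sketched for the 'input_dname' example: a structural step turns the CSV line into fields, and a separate value step deals with the IP-address case. Moving the IP into a field called 'input_ip' is a hypothetical choice made only for illustration (the original example is cut off before its resolution), as are the 'col_N' names for the other columns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: structural parsing first, then a value-level normalization.
public class DnameNormalization {
    // Structural step: split a CSV line into fields. Only the first column is
    // named ('input_dname'); the rest get hypothetical positional names.
    static Map<String, Object> parse(String line) {
        String[] cols = line.split(",", -1);
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("input_dname", cols[0].trim());
        for (int i = 1; i < cols.length; i++) {
            fields.put("col_" + i, cols[i].trim());
        }
        return fields;
    }

    // Dotted-quad IPv4 with each octet restricted to 0-255.
    private static final Pattern IPV4 = Pattern.compile(
        "((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)\\.){3}(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)");

    // Value step: if input_dname holds an IPv4 address, move it to the
    // hypothetical 'input_ip' field so input_dname only ever holds domains.
    static Map<String, Object> normalize(Map<String, Object> fields) {
        Object value = fields.get("input_dname");
        if (value instanceof String && IPV4.matcher((String) value).matches()) {
            fields.remove("input_dname");
            fields.put("input_ip", value);
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"example.com,allow", "10.0.0.1,allow"}) {
            System.out.println(normalize(parse(line)));
        }
    }
}
```

Here the parse itself never fails; only the meaning of one value is off, which is the kind of problem a post-parse transformation can fix.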
> For some reason, the incoming data does not look the way it should.
In my mind that would be something for your parser to handle.
On Wed, Apr 26, 2017 at 9:43 AM, Ali Nazemian wrote:
Having a Stellar function for the normalization is very cool, actually.
Casey, how are you going to deal with normalization after parsing if
that noise affects the parsing itself? For some reason, the incoming data does
not look the way it should.
On Wed, Apr 26, 2017 at 11:37 PM, Casey
Ok, that's another story. Hmm, we don't generally pre-parse because we
try not to assume any particular format there (i.e. it could be strings,
could be byte arrays). Maybe the right answer is to pass the raw,
non-normalized data (a best-effort type of thing) through the parser and do
It is actually a pre-parse process, not a post-parse one. These types of
noise affect the position of an attribute, for example, and give us parsing
exceptions. The timestamp example was not a good one, because that is
actually a post-parse exception.
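Why a post-parse fix cannot help here can be shown with a naive positional parser: one extra token shifts every later attribute, so parsing throws before any fields exist for a field transformation to work on. The space-delimited format and the duplicated-IP noise are hypothetical, loosely modelled on the "duplicate IP" case mentioned earlier in the thread.

```java
// Sketch: a positional parser breaks when noise shifts attribute positions.
public class PositionalShift {
    // Hypothetical format: "<src_ip> <port> <action>"; the port is token 1.
    static int portOf(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Throws if noise has shifted a non-numeric token into the port slot.
        return Integer.parseInt(tokens[1]);
    }

    public static void main(String[] args) {
        String clean = "10.0.0.1 22 allow";
        String noisy = "10.0.0.1 10.0.0.1 22 allow"; // hypothetical duplicated IP
        System.out.println("clean port: " + portOf(clean));
        try {
            portOf(noisy);
        } catch (NumberFormatException e) {
            System.out.println("noisy line failed to parse: " + e.getMessage());
        }
    }
}
```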
On Wed, Apr 26, 2017 at 11:28 PM, Casey
I’ve added this to the jira
On April 26, 2017 at 09:28:54, Otto Fowler (ottobackwa...@gmail.com) wrote:
What if you could implement your cleaning in Stellar functions, which would
be in libraries that were loaded as plugins and available to all your
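Otto's suggestion can be pictured as a small cleaning function exposed to Stellar. The sketch below is standalone and hypothetical: the class name and the choice to strip control characters are assumptions. It only mirrors the apply(List<Object>) entry point of Metron's BaseStellarFunction; a real plugin would also need the @Stellar annotation and Metron on the classpath, and the exact package should be checked against your Metron version.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a Stellar-style cleaning function (standalone, not registered with Stellar).
public class StripControlChars {
    // Mirrors the apply(List<Object>) shape; returns null for a missing argument.
    public Object apply(List<Object> args) {
        if (args == null || args.isEmpty() || args.get(0) == null) {
            return null;
        }
        String input = args.get(0).toString();
        // Removes ASCII control characters except tab (0x09) and newline (0x0A), then trims.
        return input.replaceAll("[\\x00-\\x08\\x0B-\\x1F\\x7F]", "").trim();
    }

    public static void main(String[] args) {
        Object cleaned = new StripControlChars().apply(Arrays.asList((Object) "fw01\u0007 allow "));
        System.out.println("[" + cleaned + "]");
    }
}
```

Packaging such functions as a library is what would let the same cleaning be reused wherever Stellar runs, rather than living inside one parser.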
So, having Stellar operate on the whole message is definitely something
that would be cool. That being said, it's also nice to motivate the
construction of functions to do simple transformations/normalizations.
That way, common useful capabilities may be reused all the places Stellar
So, further transformation post-parse was one of the motivating reasons for
Stellar (to do that transformation post-parse). Is there a capability that
it's lacking that we can add to fit your use case?
On Wed, Apr 26, 2017 at 9:24 AM, Ali Nazemian wrote:
I've created a Jira ticket regarding this feature.
On Wed, Apr 26, 2017 at 11:11 PM, Ali Nazemian
Currently, we are using plain regexes in the Java source code to handle
those situations. However, it would be nice to have a separate bolt and
deal with them there. Yeah, I can create a Jira issue regarding that.
The main reason I am asking for such a feature is the fact that lack of
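A regex-based pre-parse step of the kind Ali describes might look like the sketch below. It assumes, hypothetically, that an immediately repeated IPv4 token is noise in this particular feed; as the thread points out, a repeated token is not always noise (e.g. HOSTNAME equal to APP-NAME in syslog), so any such rule would need to be confirmed per device. It works on byte[] because parser input may arrive as bytes rather than strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Sketch: regex cleaning applied to raw bytes before parsing.
public class RegexPreParse {
    // Matches an IPv4-looking token followed by whitespace and the same token again.
    // The backreference \1 requires the second copy to be identical to the first.
    private static final Pattern DUPLICATE_IP = Pattern.compile(
        "\\b(\\d{1,3}(?:\\.\\d{1,3}){3})\\s+\\1\\b");

    // Decodes as UTF-8, collapses each duplicated IP into one copy, re-encodes.
    static byte[] clean(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        String cleaned = DUPLICATE_IP.matcher(text).replaceAll("$1");
        return cleaned.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] noisy = "10.0.0.1 10.0.0.1 22 allow".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(clean(noisy), StandardCharsets.UTF_8));
    }
}
```

Keeping this step separate from the parser is what would allow it to move into its own extension point (or bolt) later without touching the parsing logic.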
Are you doing this cleansing all in the parser or are you using any Stellar
to do it?
Can you create a jira?
On April 26, 2017 at 08:59:16, Ali Nazemian (alinazem...@gmail.com) wrote:
We are facing certain use cases in Metron production that happen to be
related to noisy stream.