Re: [DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing

2019-01-07 Thread Ali Nazemian
Just one thing to bear in mind: publishing an error may cause some
operational challenges, as it fills up the error topic as well as the Storm
logs, which may not be necessary. Wearing a Metron user hat, dropping a
message with a debug/trace-level log stating that the event was filtered
out makes sense. I guess if we want to make this really fancy, having the
flexibility to decide what happens next would be really nice to have, as
No. 2 and 3 would be required in some special cases (it makes things a bit
more complex, though). Of course, the default can be the drop with the ack.

Cheers,
Ali

On Thu, Dec 20, 2018 at 8:18 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Completely agreed on the acking. The reason I posed the question to begin
> with was because, while I believe dropping+acking is the correct
> functionality, I could see a few alternative patterns for handling this:
>
>    1. Require filtering to be handled by the message filter infrastructure
>    and publish an error to the error queue if field transformations such as
>    REGEX_SELECT violate this by dropping messages.
>    2. Default records to be written to enrichments, or handle per my
>    comments in #1
>    3. Default records to be written to the topic defined by outputTopic
>    (non-default version of #2)
>
> At any rate, we should fix the acking problem and then the dropped messages
> pattern makes sense to me. I've created a Jira to track it -
> https://issues.apache.org/jira/browse/METRON-1948.
>
> On Wed, Dec 19, 2018 at 12:43 PM Casey Stella  wrote:
>
> > We absolutely should be acking the dropped messages otherwise they'll be
> > in a replay loop.  Not acking is a flat-out bug IMO.
> >
> > On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > When a message is filtered by the message filtering mechanism, we
> > > explicitly drop the message (and presumably ack it in Storm), as
> > > explained here -
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> > > .
> > > When using the REGEX_SELECT field transformation (see here -
> > > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> > > )
> > > with the kafka.topicField option for parser-chaining, it's unclear to
> > > me whether we expect the same behavior (drop message, ack it). The
> > > interpretation I get from this example in the parser-chaining doc
> > > https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> > > suggests to me that the approach we take for messages with message
> > > filtering is the correct one, however in testing an example with
> > > dropped messages, we appear not to ack those dropped messages.
> > >
> > > Before I go creating a fix I thought it best to summarize and confirm
> > > my expectations on this functionality. Messages from a REGEX_SELECT that
> > > don't match a pattern, and therefore don't get a value assigned to their
> > > output topic value, should be dropped and acked.
> > >
> > > *Example:*
> > > {
> > > "parserClassName": "org.apache.metron.parsers.GrokParser",
> > > "sensorTopic": "myInTopic",
> > > ...
> > > "parserConfig": {
> > > ...,
> > > "kafka.topicField": "output_topic"
> > > },
> > > "fieldTransformations": [
> > > {
> > > "input": [
> > > "message"
> > > ],
> > > "output": [
> > > "output_topic"
> > > ],
> > > "transformation": "REGEX_SELECT",
> > > "config": {
> > > "world": "^Hello "
> > > }
> > > },
> > > ...
> > > }
> > >
> > > *Input Records:*
> > > "...sshd[32469]: Hello..."
> > > "...sshd[30432]: Bye..."
> > >
> > > *Output:*
> > > Kafka topic = "world" (as determined by the REGEX_SELECT pattern match
> > > that sets the "output_topic" property used by kafka.topicField)
> > > 1 record present
> > > contents of that record = our record with "Hello" in it
> > > 1 record is dropped ("Bye" record) and will not be forwarded any
> > > further through the pipeline.
> > >
> >
>


-- 
A.Nazemian


Re: [DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing

2018-12-19 Thread Michael Miklavcic
Completely agreed on the acking. The reason I posed the question to begin
with was because, while I believe dropping+acking is the correct
functionality, I could see a few alternative patterns for handling this:

   1. Require filtering to be handled by the message filter infrastructure
   and publish an error to the error queue if field transformations such as
   REGEX_SELECT violate this by dropping messages.
   2. Default records to be written to enrichments, or handle per my
   comments in #1
   3. Default records to be written to the topic defined by outputTopic
   (non-default version of #2)

At any rate, we should fix the acking problem and then the dropped messages
pattern makes sense to me. I've created a Jira to track it -
https://issues.apache.org/jira/browse/METRON-1948.
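
For illustration only, the drop+ack default and the three alternatives
listed above could be modeled as one setting that decides where an
unmatched message goes. This is a minimal sketch, not Metron code: the
UnroutedPolicy enum, the resolve_destination function, and the "error" and
"enrichments" topic names are all hypothetical placeholders.

```python
from enum import Enum


class UnroutedPolicy(Enum):
    """Hypothetical names for the options discussed in this thread."""
    DROP = "drop"                 # drop and ack (the proposed default)
    ERROR = "error"               # option 1: publish to the error queue
    ENRICHMENTS = "enrichments"   # option 2: fall back to enrichments
    OUTPUT_TOPIC = "outputTopic"  # option 3: fall back to outputTopic


def resolve_destination(topic, policy, output_topic=None,
                        error_topic="error",
                        enrichments_topic="enrichments"):
    """Return (destination, should_ack) for one message.

    topic is the value REGEX_SELECT assigned, or None when nothing matched.
    destination is None when the message is dropped. Every branch acks, so
    no policy leaves a message waiting to be replayed.
    """
    if topic is not None:
        return topic, True
    if policy is UnroutedPolicy.DROP:
        return None, True
    if policy is UnroutedPolicy.ERROR:
        return error_topic, True
    if policy is UnroutedPolicy.ENRICHMENTS:
        return enrichments_topic, True
    if policy is UnroutedPolicy.OUTPUT_TOPIC:
        if output_topic is None:
            raise ValueError("OUTPUT_TOPIC policy requires output_topic")
        return output_topic, True
    raise ValueError(f"unknown policy: {policy}")
```

Whatever the policy, the ack decision stays the same; only the destination
of the unmatched record changes.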

On Wed, Dec 19, 2018 at 12:43 PM Casey Stella  wrote:

> We absolutely should be acking the dropped messages otherwise they'll be in
> a replay loop.  Not acking is a flat-out bug IMO.
>
> On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > When a message is filtered by the message filtering mechanism, we
> > explicitly drop the message (and presumably ack it in Storm), as
> > explained here -
> > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> > .
> > When using the REGEX_SELECT field transformation (see here -
> > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> > )
> > with the kafka.topicField option for parser-chaining, it's unclear to me
> > whether we expect the same behavior (drop message, ack it). The
> > interpretation I get from this example in the parser-chaining doc
> > https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> > suggests to me that the approach we take for messages with message
> > filtering is the correct one, however in testing an example with dropped
> > messages, we appear not to ack those dropped messages.
> >
> > Before I go creating a fix I thought it best to summarize and confirm my
> > expectations on this functionality. Messages from a REGEX_SELECT that
> > don't match a pattern, and therefore don't get a value assigned to their
> > output topic value, should be dropped and acked.
> >
> > *Example:*
> > {
> > "parserClassName": "org.apache.metron.parsers.GrokParser",
> > "sensorTopic": "myInTopic",
> > ...
> > "parserConfig": {
> > ...,
> > "kafka.topicField": "output_topic"
> > },
> > "fieldTransformations": [
> > {
> > "input": [
> > "message"
> > ],
> > "output": [
> > "output_topic"
> > ],
> > "transformation": "REGEX_SELECT",
> > "config": {
> > "world": "^Hello "
> > }
> > },
> > ...
> > }
> >
> > *Input Records:*
> > "...sshd[32469]: Hello..."
> > "...sshd[30432]: Bye..."
> >
> > *Output:*
> > Kafka topic = "world" (as determined by the REGEX_SELECT pattern match
> > that sets the "output_topic" property used by kafka.topicField)
> > 1 record present
> > contents of that record = our record with "Hello" in it
> > 1 record is dropped ("Bye" record) and will not be forwarded any further
> > through the pipeline.
> >
>


Re: [DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing

2018-12-19 Thread Casey Stella
We absolutely should be acking the dropped messages otherwise they'll be in
a replay loop.  Not acking is a flat-out bug IMO.
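
To make the replay loop concrete, here is a toy model of at-least-once
delivery. It is a sketch under stated assumptions and does not use Storm's
API: run_spout, should_emit, ack_dropped, and max_deliveries are
hypothetical names, and the delivery cap exists only so the simulation
terminates.

```python
from collections import deque


def run_spout(messages, should_emit, ack_dropped, max_deliveries=5):
    """Toy model of at-least-once delivery.

    A message that is never acked is redelivered, capped here at
    max_deliveries so the simulation terminates. Returns the number of
    times each message was delivered.
    """
    deliveries = {m: 0 for m in messages}
    pending = deque(messages)
    while pending:
        msg = pending.popleft()
        deliveries[msg] += 1
        # Emitted messages are always acked; dropped ones only if asked to.
        acked = should_emit(msg) or ack_dropped
        if not acked and deliveries[msg] < max_deliveries:
            pending.append(msg)  # no ack arrived, so the message is replayed
    return deliveries


def is_hello(msg):
    """Stand-in for a filter that keeps only the "Hello" records."""
    return msg.startswith("Hello")


buggy = run_spout(["Hello a", "Bye b"], is_hello, ack_dropped=False)
fixed = run_spout(["Hello a", "Bye b"], is_hello, ack_dropped=True)
```

With ack_dropped=False the "Bye b" message keeps coming back until the cap
is reached, while with ack_dropped=True every message is delivered once.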

On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> When a message is filtered by the message filtering mechanism, we
> explicitly drop the message (and presumably ack it in Storm), as explained
> here -
>
> https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> .
> When using the REGEX_SELECT field transformation (see here -
>
> https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> )
> with the kafka.topicField option for parser-chaining, it's unclear to me
> whether we expect the same behavior (drop message, ack it). The
> interpretation I get from this example in the parser-chaining doc
>
> https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> suggests to me that the approach we take for messages with message
> filtering is the correct one, however in testing an example with dropped
> messages, we appear not to ack those dropped messages.
>
> Before I go creating a fix I thought it best to summarize and confirm my
> expectations on this functionality. Messages from a REGEX_SELECT that don't
> match a pattern, and therefore don't get a value assigned to their output
> topic value, should be dropped and acked.
>
> *Example:*
> {
> "parserClassName": "org.apache.metron.parsers.GrokParser",
> "sensorTopic": "myInTopic",
> ...
> "parserConfig": {
> ...,
> "kafka.topicField": "output_topic"
> },
> "fieldTransformations": [
> {
> "input": [
> "message"
> ],
> "output": [
> "output_topic"
> ],
> "transformation": "REGEX_SELECT",
> "config": {
> "world": "^Hello "
> }
> },
> ...
> }
>
> *Input Records:*
> "...sshd[32469]: Hello..."
> "...sshd[30432]: Bye..."
>
> *Output:*
> Kafka topic = "world" (as determined by the REGEX_SELECT pattern match that
> sets the "output_topic" property used by kafka.topicField)
> 1 record present
> contents of that record = our record with "Hello" in it
> 1 record is dropped ("Bye" record) and will not be forwarded any further
> through the pipeline.
>


[DISCUSS] Handling dropped messages in REGEX_SELECT with Kafka topic routing

2018-12-19 Thread Michael Miklavcic
When a message is filtered by the message filtering mechanism, we
explicitly drop the message (and presumably ack it in Storm), as explained
here -
https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered.
When using the REGEX_SELECT field transformation (see here -
https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration)
with the kafka.topicField option for parser-chaining, it's unclear to me
whether we expect the same behavior (drop the message, then ack it). The
example in the parser-chaining doc
https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
suggests to me that the approach we take for message filtering is the
correct one. However, in testing an example with dropped messages, we
appear not to ack those dropped messages.

Before I go creating a fix, I thought it best to summarize and confirm my
expectations on this functionality. Messages from a REGEX_SELECT that don't
match a pattern, and therefore don't get a value assigned to their output
topic field, should be dropped and acked.

*Example:*
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "myInTopic",
  ...
  "parserConfig": {
    ...,
    "kafka.topicField": "output_topic"
  },
  "fieldTransformations": [
    {
      "input": [
        "message"
      ],
      "output": [
        "output_topic"
      ],
      "transformation": "REGEX_SELECT",
      "config": {
        "world": "^Hello "
      }
    },
    ...
  ]
}

*Input Records:*
"...sshd[32469]: Hello..."
"...sshd[30432]: Bye..."

*Output:*
Kafka topic = "world" (as determined by the REGEX_SELECT pattern match that
sets the "output_topic" property used by kafka.topicField)
1 record present
contents of that record = our record with "Hello" in it
1 record is dropped ("Bye" record) and will not be forwarded any further
through the pipeline.
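
The expected behavior above can be sketched as a small simulation. This is
not Metron's implementation: regex_select and route are hypothetical
helpers, the "message" field is assumed to already hold the parsed payload
text, and the sketch uses a search-style match without claiming that is how
Metron applies the pattern.

```python
import re

# Stand-in for the REGEX_SELECT "config" block in the example:
# key = value to assign to the output field, value = pattern to test.
REGEX_SELECT_CONFIG = {"world": r"^Hello "}


def regex_select(text, config):
    """Return the key of the first pattern found in text, or None."""
    for value, pattern in config.items():
        if re.search(pattern, text):
            return value
    return None


def route(records, config, topic_field="output_topic"):
    """Split records into per-topic output and a dropped list.

    Every record, routed or dropped, is recorded as acked so that the
    source would not replay it.
    """
    routed, dropped, acked = {}, [], []
    for record in records:
        topic = regex_select(record["message"], config)
        if topic is None:
            dropped.append(record)  # no pattern matched: drop, but still ack
        else:
            record = dict(record, **{topic_field: topic})
            routed.setdefault(topic, []).append(record)
        acked.append(record["message"])
    return routed, dropped, acked


# Assumed payloads standing in for the two input records.
records = [{"message": "Hello from sshd"}, {"message": "Bye from sshd"}]
routed, dropped, acked = route(records, REGEX_SELECT_CONFIG)
```

Here routed holds one record under the "world" topic, dropped holds the
"Bye" record, and acked lists both messages.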