[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-18 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
Sure, actually I'll do a discuss thread when this all goes through.  That 
way I can try again to get @cestella to comment


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-18 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
Happy to help explain it, and it's really helpful to get feedback on it 
because it led to improved docs, so hopefully it makes sure things are clearer 
to anyone looking at this later.

I created tickets for:

- [Management UI changes for sensor 
aggregation](https://issues.apache.org/jira/browse/METRON-1678)
- [REST changes to support parser 
aggregation](https://issues.apache.org/jira/browse/METRON-1679)
- [Allow the option for intermediate kafka topics to be removed in 
aggregated sensors](https://issues.apache.org/jira/browse/METRON-1682)
- [Allow Ambari to accept quoted parser 
groups](https://issues.apache.org/jira/browse/METRON-1680)
- [Decouple the ParserBolt from the Parse execution 
logic](https://issues.apache.org/jira/browse/METRON-1681)

I held off on creating a ticket for the routing process applying a 
transform.  @ottobackwards Would you mind creating that, to make sure the use 
case and examples or assumption, etc. are what you expect?


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
All that being said I am a big +1 on this.  Great work @justinleet, thanks 
for taking the time to work it through my thick skull.


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
A mechanism for the routing process to apply a transform or some such. 
@cestella may have a better design idea.

What I would like us to do is remove the transport from the message where 
there are common wrappers.

Example:  All the message types delivered by syslog.   The parsers should 
not have to all parser syslog AND the message.  I imagine defining a 
transform/parser in the router that takes every message and transforms it ( 
parses syslog fields + structured data if 5424 into meta and MSG to the bytes ) 
and then passes it on to a parser that only needs to know the message.

That way we can have simpler parsers, and even support the same message 
when transported in different wrappers.

We can talk about if this is a function of parser chaining, such that each 
specialized parser will be second in a chain with the syslog unwrapped being 
first, or part of routing, but I think it is part of routing personally.



---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
@ottobackwards Is there anything else we want as follow-on?  Assuming we're 
good with this where its at (or at least close to), I'll go ahead and create 
those tickets.


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
@cestella I think we want at least one more to capture making this 
available in the management UI (I'd argue that should inform any REST API task, 
so I'm not sure if we want that also created as a ticket).


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/1099
  
Ok, so I want to capture the follow-on tasks:
* Change Ambari to accept quoted parser groups
* Decouple the ParserBolt from the Parse execution logic
* Allow the option for intermediate kafka topics to be removed

Did I miss anything?


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
@justinleet I am fine with that as a follow on, I would like the task or 
issue created. 


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
@ottobackwards I definitely think it's something we should consider adding 
and that's worthwhile.  I feel it's more of a follow-on task, and the follow-on 
is to allow for optionally direct routing. i.e. this PR generalizes what's 
already there in the ParserBolt and a follow-on is to provide that option to do 
direct routing in the aggregate case.

Do you feel that we need to provide that option in this PR, or is it fine 
to be left as a follow-on?


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
Ok @justinleet thanks for the diagram.  That really helps.  I did not see 
in the code how we were sending out to the sensor topic and then into the 
sensor, I though the bolt was just calling the parser ( which makes the sensor 
specific topic not make sense ).  I think that this flow is a valid case, one 
where the data could be coming from either an aggregated flow into the topic 
*or* straight to the topic.  My question looking at this, is why ( esp when 
people are asking about reduction of kafka overhead ) can't we have the 
*option* to eliminate the sensor topic completely?


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-17 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/1099
  
@ottobackwards I agree, our bolts tend to do the same magic set of 
incantations to set up stellar.  It'd be better to either try to infer that 
initialization where possible or to externalize that logic.  Agreed that it's 
not this PR that should do that, but I would expect that we should probably 
consider a separate JIRA to the effect:
* Decouple ParserBolt the parserbolt initialization and execution logic 
into a separate component from the Storm Bolt infrastructure (this will aid in 
NiFi or spark integration).

Thoughts?  If we agree, then I'll create the JIRA.


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-16 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
@justinleet the main things I saw that I would think of cutting down, or I 
though about looking into ( the idea may turn out to be bad ) are places where 
the bolt 'knows' a lot of weird or complicated initialization logic around the 
configurations or classes it uses, like what we do initializing Stellar, or in 
getComponentConfiguration.

I'm ok with a follow on,  I don't think this PR is creating that issue.



---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-16 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
@ottobackwards Is there anything we want to do in this PR about the 
ParserBolt? I agree that it's getting unwieldy, and if there's easy wins it's 
not a bad opportunity to fix it up.


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-14 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/1099
  
I have been on vacation, but will be reviewing Monday and Tuesday.  Please 
do not commit


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-13 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/1099
  
This looks good to me, I'm +1 by inspection, but I want to make sure enough 
time has passed so enough people can look at it.  I'll hold my +1 until EOD 
monday.


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-11 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
I ran through 
https://github.com/apache/metron/blob/master/use-cases/parser_chaining/README.md,
 but spinning it up as

```
$METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s 
cisco-6-302,cisco-5-304,pix_syslog_router
```

This results in indices (noting that I'd pushed the data to the topic a 
couple times, so the numbers won't line up directly if you run it):
```
[root@node1 ~]# curl -X GET "localhost:9200/_cat/indices?v"
health status index   uuid   pri 
rep docs.count docs.deleted store.size pri.store.size
yellow open   cisco-5-304_index_2018.07.11.18 z-MyPPEJSN6cur7FJbFORA   5   
1 450340.8kb340.8kb
yellow open   cisco-6-302_index_2018.07.11.18 vAFrok4sRpW49_RYt9RqbQ   5   
16600  1.4mb  1.4mb
...



---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-10 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
Kicking Travis


---


[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm

2018-07-10 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1099
  
As a heads up, this will have conflicts with 
https://github.com/apache/metron/pull/1084


---