[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 Sure, actually I'll do a discuss thread when this all goes through. That way I can try again to get @cestella to comment ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 Happy to help explain it, and it's really helpful to get feedback on it because it led to improved docs, so hopefully it makes sure things are clearer to anyone looking at this later. I created tickets for: - [Management UI changes for sensor aggregation](https://issues.apache.org/jira/browse/METRON-1678) - [REST changes to support parser aggregation](https://issues.apache.org/jira/browse/METRON-1679) - [Allow the option for intermediate kafka topics to be removed in aggregated sensors](https://issues.apache.org/jira/browse/METRON-1682) - [Allow Ambari to accept quoted parser groups](https://issues.apache.org/jira/browse/METRON-1680) - [Decouple the ParserBolt from the Parse execution logic](https://issues.apache.org/jira/browse/METRON-1681) I held off on creating a ticket for the routing process applying a transform. @ottobackwards Would you mind creating that, to make sure the use case and examples or assumption, etc. are what you expect? ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 All that being said I am a big +1 on this. Great work @justinleet, thanks for taking the time to work it through my thick skull. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 A mechanism for the routing process to apply a transform or some such. @cestella may have a better design idea. What I would like us to do is remove the transport from the message where there are common wrappers. Example: All the message types delivered by syslog. The parsers should not have to all parser syslog AND the message. I imagine defining a transform/parser in the router that takes every message and transforms it ( parses syslog fields + structured data if 5424 into meta and MSG to the bytes ) and then passes it on to a parser that only needs to know the message. That way we can have simpler parsers, and even support the same message when transported in different wrappers. We can talk about if this is a function of parser chaining, such that each specialized parser will be second in a chain with the syslog unwrapped being first, or part of routing, but I think it is part of routing personally. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 @ottobackwards Is there anything else we want as follow-on? Assuming we're good with this where its at (or at least close to), I'll go ahead and create those tickets. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 @cestella I think we want at least one more to capture making this available in the management UI (I'd argue that should inform any REST API task, so I'm not sure if we want that also created as a ticket). ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user cestella commented on the issue: https://github.com/apache/metron/pull/1099 Ok, so I want to capture the follow-on tasks: * Change Ambari to accept quoted parser groups * Decouple the ParserBolt from the Parse execution logic * Allow the option for intermediate kafka topics to be removed Did I miss anything? ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 @justinleet I am fine with that as a follow on, I would like the task or issue created. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 @ottobackwards I definitely think it's something we should consider adding and that's worthwhile. I feel it's more of a follow-on task, and the follow-on is to allow for optionally direct routing. i.e. this PR generalizes what's already there in the ParserBolt and a follow-on is to provide that option to do direct routing in the aggregate case. Do you feel that we need to provide that option in this PR, or is it fine to be left as a follow-on? ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 Ok @justinleet thanks for the diagram. That really helps. I did not see in the code how we were sending out to the sensor topic and then into the sensor, I though the bolt was just calling the parser ( which makes the sensor specific topic not make sense ). I think that this flow is a valid case, one where the data could be coming from either an aggregated flow into the topic *or* straight to the topic. My question looking at this, is why ( esp when people are asking about reduction of kafka overhead ) can't we have the *option* to eliminate the sensor topic completely? ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user cestella commented on the issue: https://github.com/apache/metron/pull/1099 @ottobackwards I agree, our bolts tend to do the same magic set of incantations to set up stellar. It'd be better to either try to infer that initialization where possible or to externalize that logic. Agreed that it's not this PR that should do that, but I would expect that we should probably consider a separate JIRA to the effect: * Decouple ParserBolt the parserbolt initialization and execution logic into a separate component from the Storm Bolt infrastructure (this will aid in NiFi or spark integration). Thoughts? If we agree, then I'll create the JIRA. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 @justinleet the main things I saw that I would think of cutting down, or I though about looking into ( the idea may turn out to be bad ) are places where the bolt 'knows' a lot of weird or complicated initialization logic around the configurations or classes it uses, like what we do initializing Stellar, or in getComponentConfiguration. I'm ok with a follow on, I don't think this PR is creating that issue. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 @ottobackwards Is there anything we want to do in this PR about the ParserBolt? I agree that it's getting unwieldy, and if there's easy wins it's not a bad opportunity to fix it up. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1099 I have been on vacation, but will be reviewing Monday and Tuesday. Please do not commit ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user cestella commented on the issue: https://github.com/apache/metron/pull/1099 This looks good to me, I'm +1 by inspection, but I want to make sure enough time has passed so enough people can look at it. I'll hold my +1 until EOD monday. ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 I ran through https://github.com/apache/metron/blob/master/use-cases/parser_chaining/README.md, but spinning it up as ``` $METRON_HOME/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s cisco-6-302,cisco-5-304,pix_syslog_router ``` This results in indices (noting that I'd pushed the data to the topic a couple times, so the numbers won't line up directly if you run it): ``` [root@node1 ~]# curl -X GET "localhost:9200/_cat/indices?v" health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open cisco-5-304_index_2018.07.11.18 z-MyPPEJSN6cur7FJbFORA 5 1 450340.8kb340.8kb yellow open cisco-6-302_index_2018.07.11.18 vAFrok4sRpW49_RYt9RqbQ 5 16600 1.4mb 1.4mb ... ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 Kicking Travis ---
[GitHub] metron issue #1099: METRON-1657: Parser aggregation in storm
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1099 As a heads up, this will have conflicts with https://github.com/apache/metron/pull/1084 ---