[DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
Hi All, Last week, I encountered some weirdness in the Enrichment topology. Doing some somewhat high-latency enrichment work, I noticed that at some point, data stopped flowing through the enrichment topology. I tracked down the problem to the join bolt. For those who aren't aware, we do a

[GitHub] metron issue #567: METRON-891: Changed Kafka API to Create a KafkaConsumer P...

2017-05-16 Thread jjmeyer0
Github user jjmeyer0 commented on the issue: https://github.com/apache/metron/pull/567 @merrimanr @justinleet is there anything else --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] metron pull request #567: METRON-891: Changed Kafka API to Create a KafkaCon...

2017-05-16 Thread jjmeyer0
GitHub user jjmeyer0 reopened a pull request: https://github.com/apache/metron/pull/567 METRON-891: Changed Kafka API to Create a KafkaConsumer Per Request ## Contributor Comments [Please place any comments here. A description of the problem/enhancement, how to reproduce the

[GitHub] metron issue #567: METRON-891: Changed Kafka API to Create a KafkaConsumer P...

2017-05-16 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/567 Looks good to me. Thanks @jjmeyer0.

[GitHub] metron issue #574: METRON-934: Component and task id are missing in the inde...

2017-05-16 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/574 Wow, great addition. We have definitely not tested the HDFS Writer sufficiently. ;)

[GitHub] metron issue #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/584 Same here, +1. Thanks a lot for the contribution, this makes a big difference.

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread ctramnitz
Github user ctramnitz commented on the issue: https://github.com/apache/metron/pull/531 dhcp also carries a client-id that is often (but not always and not reliably) the hostname. While not reliable, this is interesting information, especially since you don't have to perform

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116741150 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -102,14 +106,43 @@ public void

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116742092 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java --- @@ -28,6 +26,17 @@

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116741758 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -183,14 +219,14 @@ private void

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116739444 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -102,14 +106,43 @@ public void

[GitHub] metron pull request #574: METRON-934: Component and task id are missing in t...

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/metron/pull/574

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
I do want to say here, that I don't mean to sound the alarm and say that everything is broken. I would not characterize the topology as "broken" architecturally, but rather the lack of reporting when things go pear-shaped is a bug in implementation. With logging and documentation about the knobs

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron/pull/531 Is there enough interest for me to pursue support of this in #586? I could probably throw that together today.

Re: we currently have 31 PR’s that are not landed

2017-05-16 Thread zeo...@gmail.com
Assuming the unincubating process is almost completed (I don't know if that's true or not), I think there are some simple, obvious priorities based on our pending 0.4.0 release. Things like METRON-833, METRON-819, and METRON-953 should probably get finalized and merged in asap. Also, we have

[GitHub] metron pull request #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/metron/pull/584

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
The problem is that an enrichment type won't necessarily have a fixed performance characteristic. Take stellar enrichments, for instance. Doing an HBase call for one sensor vs doing simple string munging will have vastly differing performance. Both of them are functioning within the stellar

[GitHub] metron pull request #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread cestella
GitHub user cestella reopened a pull request: https://github.com/apache/metron/pull/584 METRON-950: Migrate storm-kafka-client to 1.1 ## Contributor Comments There are MAJOR performance issues with the storm-kafka-client. Throughput is roughly an order of magnitude faster in

[GitHub] metron pull request #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread cestella
Github user cestella closed the pull request at: https://github.com/apache/metron/pull/584

[GitHub] metron issue #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/584 @ctramnitz While we don't really test on 1.1 yet, this shouldn't have an impact on the topologies running there.

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
I would like to see us just migrate wholly to Stellar enrichments and remove the separate HBase and Geo enrichment bolts from the Enrichment topology. Stellar provides a user with much greater flexibility than the existing HBase and Geo enrichment bolts. A side effect of this would be to greatly

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread simonellistonball
Github user simonellistonball commented on the issue: https://github.com/apache/metron/pull/531 I'd love to see your bro PR expand for this, @JonZeolla. DHCP is a pretty key source, and Bro is a great way to extract it from taps. Let me know if there is anything I can do to help.

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116791055 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java --- @@ -28,6 +26,17 @@

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Nick, I’d tend to agree with you there. How about: If an enrichment fails / effectively times out, the join bolt emits the message before cache eviction (as Nick’s point 2), but also adds a field stub to indicate failed enrichment. This is then an indicator to an operator or investigator as
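A minimal sketch of the idea proposed above, written in Python for brevity rather than as Metron or Storm code: a join buffer that, when a message times out, emits whatever it has joined so far and adds a stub field naming the enrichments that never arrived. The class name `JoinBuffer`, the field name `enrichment.failed`, and the injectable clock are all hypothetical, chosen only for illustration.

```python
import time


class JoinBuffer:
    """Hypothetical join buffer: collects partial enrichment results per message key."""

    def __init__(self, expected, timeout_s, clock=time.monotonic):
        self.expected = set(expected)  # enrichments that must report back
        self.timeout_s = timeout_s
        self.clock = clock             # injectable so the timeout can be tested
        self.pending = {}              # key -> (start time, message, received names)

    def add_original(self, key, message):
        """Register the un-enriched message and start its timeout."""
        self.pending[key] = (self.clock(), dict(message), set())

    def add_result(self, key, enrichment, fields):
        """Merge one enrichment's fields; return the joined message once complete."""
        entry = self.pending.get(key)
        if entry is None:
            return None  # already emitted or expired
        _, message, received = entry
        message.update(fields)
        received.add(enrichment)
        if received >= self.expected:
            del self.pending[key]
            return message
        return None

    def expire(self):
        """Emit timed-out messages, each with a stub listing the missing enrichments."""
        now = self.clock()
        emitted = []
        for key in list(self.pending):
            started, message, received = self.pending[key]
            if now - started >= self.timeout_s:
                # Hypothetical stub field: lets an operator spot and replay the message.
                message["enrichment.failed"] = sorted(self.expected - received)
                emitted.append(message)
                del self.pending[key]
        return emitted
```

Emitting the partial message instead of silently dropping it keeps data flowing, and the stub gives downstream tooling something to count or alert on.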

[GitHub] metron issue #589: METRON-955: Make the default sync policy for HDFS Writer ...

2017-05-16 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/589 I'm +1 by inspection, thanks for the contribution.

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116795288 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java --- @@ -28,6 +26,17 @@

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread zeo...@gmail.com
The field stub also gives something that can potentially be used in the error dashboard (or similar) to graph, allowing failed enrichments to "shout" louder to the end user. Jon On Tue, May 16, 2017 at 12:34 PM Nick Allen wrote: > > but also adds a field stub to indicate

[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/585 I found some additional issues with error handling in the HDFSWriterCallback. So I fixed this to throw an IllegalArgumentException when the key is null, but that revealed further problems in our

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread cestella
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116791165 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -183,14 +219,14 @@ private void

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
Ah, yes. Makes sense and I can see the value in the parallelism that the split/join provides. Personally, I would like to see the code do the following. (1) Scream and shout when something in the cache expires. We have to make sure that it is blatantly obvious to a user what happened. We also

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
If we are timing out things from the cache, we have that latency already On May 16, 2017 at 12:09:32, Casey Stella (ceste...@gmail.com) wrote: We could definitely parallelize within the bolt, but you're right, it does break the storm model. I also like making things other people's problems

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Nick Allen
> but also adds a field stub to indicate failed enrichment. This is then an indicator to an operator or investigator as well that something is missing, and could drive things like replay of the message to retrospectively enrich when things have calmed down. Yes, I like the idea of a "field stub".

[GitHub] metron issue #567: METRON-891: Changed Kafka API to Create a KafkaConsumer P...

2017-05-16 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/567 I'm set, +1

[GitHub] metron issue #588: METRON-954: Create ability to change output topic of pars...

2017-05-16 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/588 Can we add this option to the README?

[GitHub] metron pull request #588: METRON-954: Create ability to change output topic ...

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/metron/pull/588

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread basvdl
Github user basvdl commented on the issue: https://github.com/apache/metron/pull/531 @nickwallen sometimes we are not able to grep DNS events from the customer server. In these cases we use DHCPDump. I've to admit, Bro is new to me, but it looks promising. If this can

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We still do use split/join even within stellar enrichments. Take for instance the following enrichment: { "enrichment" : { "fieldMap" : { "stellar" : { "config" : { "parallel-task-1" : { "my_field" : "PROFILE_GET()" },
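To illustrate the split/join pattern described above, here is a rough Python sketch, not Metron's implementation: each named group in the config becomes an independent sub-task, and the results are merged back into one message. The group name `parallel-task-1` comes from the snippet; `parallel-task-2`, the field `my_other_field`, and the lambdas standing in for Stellar expressions are assumptions, and a thread pool stands in for the parallelism that Storm would provide across bolt instances.

```python
from concurrent.futures import ThreadPoolExecutor


def split(message, groups):
    """Create one sub-task per config group, each with its own copy of the message."""
    return [(name, fields, dict(message)) for name, fields in groups.items()]


def run_task(task):
    """Evaluate every field expression in one group against the message."""
    _name, fields, message = task
    return {field: expr(message) for field, expr in fields.items()}


def join(message, partials):
    """Merge the partial results back into a single enriched message."""
    joined = dict(message)
    for partial in partials:
        joined.update(partial)
    return joined


def enrich(message, groups):
    """Split by group, run the groups concurrently, then join the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_task, split(message, groups)))
    return join(message, partials)


# Hypothetical groups; the lambdas are stand-ins for Stellar expressions.
groups = {
    "parallel-task-1": {"my_field": lambda m: "profile-for-" + m["ip"]},
    "parallel-task-2": {"my_other_field": lambda m: len(m["ip"])},
}
enriched = enrich({"ip": "10.0.0.1"}, groups)
```

The join step is where a slow group holds up the whole message, which is why the timeout behaviour of the join matters in this thread.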

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Otto Fowler
I am not sure that you can say we wouldn’t ‘need’ it. But we would not ‘have’ it rather. On May 16, 2017 at 11:59:42, Nick Allen (n...@nickallen.org) wrote: I would like to see us just migrate wholly to Stellar enrichments and remove the separate HBase and Geo enrichment bolts from the

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Simon Elliston Ball
Would you then parallelise within Stellar to handle things like multiple lookups? This feels like it would be breaking the storm model somewhat, and could lead to bad things with threads for example. Or would you think of doing something like the grouping Stellar uses today to parallelise

[GitHub] metron issue #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread ctramnitz
Github user ctramnitz commented on the issue: https://github.com/apache/metron/pull/584 Does this change have any impact on using Storm 1.1 (i.e. from HDP 2.6)?

we currently have 31 PR’s that are not landed

2017-05-16 Thread Otto Fowler
https://github.com/apache/metron/pulls This seems a little large given that I *think* we have been at around 19 or so consistently.

[GitHub] metron issue #574: METRON-934: Component and task id are missing in the inde...

2017-05-16 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/574 +1 by inspection

[GitHub] metron issue #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/584 +1. Data flowed through to ES as expected, and I was able to spin up pcap. Along with @justinleet having tested the other topologies, I'm happy with the results. Great work @cestella!

[GitHub] metron issue #567: METRON-891: Changed Kafka API to Create a KafkaConsumer P...

2017-05-16 Thread jjmeyer0
Github user jjmeyer0 commented on the issue: https://github.com/apache/metron/pull/567 @merrimanr @justinleet Accidentally clicked close/comment. Sorry about that. Anyway, does this look good to you all now that the licenses are fixed?

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116848618 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -183,14 +219,14 @@ private void

[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/585 Regarding the test data, it's not a sequence file in the format suitable for reading in PcapInspector. Depending on the test case, we construct the appropriate kafka representation. The value is

[GitHub] metron issue #588: METRON-954: Create ability to change output topic of pars...

2017-05-16 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/588 I'm +1. Thanks for adding this. @ottobackwards You need to see anything on this PR?

[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/585 I see that. re: keys and methods for retrieving and saving them. I'll save refactoring and cleaning that up to a separate PR.

[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread cestella
Github user cestella commented on the issue: https://github.com/apache/metron/pull/585 @mmiklavc it depends on which test case you're talking about. We have two modes of operation in the pcap topology and 2 test cases in the integration test and these are defined by the flux

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116823693 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java --- @@ -116,7 +117,11 @@ public

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread basvdl
Github user basvdl commented on the issue: https://github.com/apache/metron/pull/531 @nickwallen I agree that relying on a modified source is not ideal. However, with bro I'm not sure if you have all the functionality people wish for. If I'm correctly informed by the docs, bro

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/531 > So I would still like to discuss the opportunities of getting the original DHCPDump log format into Metron via NiFi. Sure, I think that sounds like another reasonable approach.

[GitHub] metron issue #581: METRON-844: Install Metron Management UI with Ambari MPac...

2017-05-16 Thread merrimanr
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/581 The Node.js repository setup has been moved outside of the MPack. In full dev this is now automated through the ambari-common Ansible task, which also handles other Ambari setup tasks. I

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread nickwallen
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/531 > If I'm correctly informed by the docs, bro will give you the IP and MAC relation, which differs from DHCPDump which captures IP and Hostname relations. Giving context to an IP by adding the

[GitHub] metron issue #584: METRON-950: Migrate storm-kafka-client to 1.1

2017-05-16 Thread justinleet
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/584 I ran up the main topologies last night and everything seemed to correlate correctly and ran without issue. Hadn't run up pcap yet, but it looks like @mmiklavc is working on validating it. Given

[GitHub] metron issue #531: METRON-854 create dhcp dump parser

2017-05-16 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron/pull/531 With bro there's also an option to [do a lookup](https://github.com/bro/bro/blob/master/src/bro.bif#L3431-L3458) and [add

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Christian Tramnitz
I’m glad you bring this up. This is a huge architectural difference from the original OpenSOC topology and one that we have been warned to take back then. To be perfectly honest, I don’t see the big performance improvement from parallel processing. If a specific enrichment is a little more i/o

Re: [DISCUSS] Enrichment Split/Join issues

2017-05-16 Thread Casey Stella
We could definitely parallelize within the bolt, but you're right, it does break the storm model. I also like making things other people's problems (it's called working "smart" not "hard", right? not laziness, surely. ;), but yeah, using windowing for this seems like it might introduce some

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116865536 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java --- @@ -102,14 +106,43 @@ public void

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116866077 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java --- @@ -116,7 +117,11 @@ public

[GitHub] metron pull request #589: METRON-955: Make the default sync policy for HDFS ...

2017-05-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/metron/pull/589

[GitHub] metron issue #586: METRON-508 Expand Elasticsearch templates to support the ...

2017-05-16 Thread JonZeolla
Github user JonZeolla commented on the issue: https://github.com/apache/metron/pull/586 Per @simonellistonball 's comments in #531 I added initial support for the native way that Bro handles tracking DHCP's Client ID field and updated the above instructions appropriately.

[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/585#discussion_r116883407 --- Diff: metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java --- @@ -28,6 +26,17 @@