[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 Yep, looks good, got my +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449 I think this is ready for final review. Come one, come all. Would love to get this closed out.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449

Looks great, quick question. If I submit a profile that looks like:
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'bro'",
      "init": { "count": "0" },
      "update": { "count": "count + 1" },
      "result": {
        "profile": "count",
        "triage": "{ 'blah' : count, 'zork' : 'zork'}"
      }
    }
  ]
}
```
Will I get messages in kafka that look like:
```
{"period.start":148823382
,"period":24803897
,"profile":"test"
,"blah":161
,"zork":"zork"
,"period.end":148823388
,"is_alert":"true"
,"entity":"global"
,"timestamp":1488233841600
,"source.type":"profiler"
}
```
I think that's an important aspect as people will probably want to submit multiple things to further triage or give context since they cannot send along our summary objects.

Also, if someone tries to submit something that JSON can't handle (like a stats object), will it get dropped or will an exception occur?
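A minimal sketch of the message assembly asked about above, assuming the triage map is simply merged into a flat envelope. The function names `is_json_friendly` and `build_triage_message` are hypothetical, only a subset of the envelope fields from the example message is shown, and raising an error on a non-JSON value is just one of the two behaviors the question raises; the thread does not say which one the Profiler implements.

```python
import json


def is_json_friendly(value):
    """True for numbers, strings, booleans, None, and maps/lists of those."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json_friendly(v)
                   for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return all(is_json_friendly(v) for v in value)
    return False


def build_triage_message(profile, entity, period, timestamp, triage_values):
    """Merge the evaluated triage map into a flat message (hypothetical helper)."""
    # Assumption: reject bad values loudly instead of silently dropping them.
    rejected = sorted(k for k, v in triage_values.items()
                      if not is_json_friendly(v))
    if rejected:
        raise ValueError("triage values not JSON-friendly: " + ", ".join(rejected))
    message = {
        "profile": profile,
        "entity": entity,
        "period": period,
        "timestamp": timestamp,
        "source.type": "profiler",
    }
    message.update(triage_values)
    return message


if __name__ == "__main__":
    msg = build_triage_message("test", "global", 24803897, 1488233841600,
                               {"blah": 161, "zork": "zork"})
    print(json.dumps(msg))
```

Keeping the check separate from the merge makes it easy to swap the `raise` for a "drop and log" policy if that turns out to be the preferred answer.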
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449 I made the required changes and updated the PR description to reflect that. Please take a look and review.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

> Am I missing something? Is there a way to define the topic dynamically while using the BulkMessageWriterBolt & KafkaMessageWriter classes unchanged?

Created [METRON-738](https://issues.apache.org/jira/browse/METRON-738) to track this 'wish list' enhancement.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 Ok, I'm ok with that. The writer should be more adaptable here and that shouldn't hold your PR up, agreed. Can we make it the enrichment queue, though?
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

> Ninja Edit: I think the kafka topic written to should be pulled from zookeeper...

@cestella I remember now why I settled on making the topic name a static configuration from the topology properties. Our `BulkMessageWriterBolt` & `KafkaMessageWriter` classes seem to have been designed to only work with statically defined topic names. I would have to change those core classes, if I want to set the topic name dynamically from Zookeeper. And I tend to get smacked around when coming close to core classes :)

Am I missing something? Is there a way to define the topic dynamically while using the `BulkMessageWriterBolt` & `KafkaMessageWriter` classes unchanged?

If not, I'd prefer to keep the topic name as a static property defined in the topology properties, at least for this PR.

My second choice would be to open a completely separate PR to update those core classes to accept dynamic topic names. It does not seem like a complex change, but those are core classes and will likely cause some debate.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 @nickwallen yes, I was
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

> Be backwards compatible with the current syntax.

This proposed syntax isn't directly backwards compatible. Were you assuming we would do a translation of sorts? Like translate this...
```
{
  "profiles": [
    {
      ...
      "result": "stats"
    }
  ]
}
```
To this...
```
{
  "profiles": [
    {
      ...
      "result": {
        "profile" : "stats"
      }
    }
  ]
}
```
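The translation being asked about can be sketched as follows, assuming a bare string `result` is treated as shorthand for a map with a single `profile` key. The function name `normalize_result` is hypothetical and not part of Metron.

```python
def normalize_result(result):
    """Translate the legacy string form of `result` into the map form.

    A bare string is assumed to mean "store this expression as the profile
    value"; a map is passed through unchanged (copied, so callers can mutate it).
    """
    if isinstance(result, str):
        return {"profile": result}
    if isinstance(result, dict):
        return dict(result)
    raise TypeError("result must be a string or a map, got "
                    + type(result).__name__)


if __name__ == "__main__":
    print(normalize_result("stats"))
    print(normalize_result({"profile": "stats", "triage": "{ 'mean' : STATS_MEAN(stats) }"}))
```

Doing the translation once, when the profile definition is loaded, would let the rest of the code handle only the map form.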
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user ottobackwards commented on the issue: https://github.com/apache/incubator-metron/pull/449 I'm sorry, no Nick I don't
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449 @cestella Thanks for laying out your other ideas for Medium and Longer term. We can open those up for community debate on separate JIRAs, but it was very worthwhile for you to begin laying those out here. They provided some good context. @ottobackwards Not sure which part your comment was in reference to. Do you have any concerns specifically with the "Near Term" items that I will tackle as part of this PR? I'd like to make sure we reach consensus on those first. We can debate the other items later (if that works for you.)
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

**Near Term:** I like it. I think we've converged on "near term". Yay! I will tackle these items as part of this PR.

> **Longer Term:** ... In this world, the profiler is simple, it just writes messages out to the indexing topology.

Great! Yep, this is what I was hoping to do when we first started batting this idea around.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user mmiklavc commented on the issue: https://github.com/apache/incubator-metron/pull/449 @nickwallen I agree with you about exposing implementation details via our API. I think it is better to name things according to feature/function, not the underlying implementation. @cestella I like the summary you provide for short/medium/long term solutions to this problem. I especially like the idea of being able to customize the writers used for the various endpoints.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user ottobackwards commented on the issue: https://github.com/apache/incubator-metron/pull/449 I think routing from stellar within routing in storm is confusing.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449

@nickwallen Yep, I see what you mean. I think we had different interpretations of "user focused." I think where I landed here on what I'd like can be broken down into a near-term, medium-term and a long-term vision for the profiler.

**Near Term**

In the near term, for this PR, we need the ability to:
* Write from the profiler to kafka so we can triage the output of the profiler
* Adjust the representations of the data the profiler writes based on the places it writes to
  * For HBase, any kryo serialized object
  * For Kafka, any fundamental structure (e.g. number, string) or Map of fundamental structures.
* Be backwards compatible with the current syntax.

Strictly speaking, I'll adopt your approach here of separating representation by its destination, where that destination is restricted to the possible destinations inside of Metron's current architecture (so-called *destination-focused*), rather than separating representation by its storage mechanism, where those storage mechanisms are restricted to the possible mechanisms that we support in Metron (so-called *writer-focused*).

In the following example, every tick, the following happens:
* 1 message is written to HBase with the stats function
* 1 message is written to Kafka with a message that looks like this:
```
{
  'profile' : 'test',
  'entity' : 'global',
  'mean' : ,
  'stddev' :
}
```
This looks like:
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "result": {
        "profile" : "stats",
        "triage" : "{ 'mean' : STATS_MEAN(stats), 'stddev' : STATS_SD(stats) }"
      }
    }
  ]
}
```

**Medium Term**

This gets expanded to allow for multiple elements written per profile.
In the following example, every tick, the following happens:
* 2 messages are written to HBase for profile `test`
  * entity: `global:stats`
  * entity: `global:count`
* 2 messages are written to Kafka, which look like this:
```
{
  'profile' : 'test',
  'entity' : 'global',
  'result_type' : 'baseline_stats',
  'mean' : ,
  'stddev' :
}
```
and
```
{
  'profile' : 'test',
  'entity' : 'global',
  'result_type' : 'kurtosis',
  'kurtosis' :
}
```
This looks like:
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "result": {
        "profile" : {
          "stats" : "stats",
          "count" : "STATS_COUNT(stats)"
        },
        "triage" : {
          "baseline_stats" : "{ 'mean' : STATS_MEAN(stats), 'stddev' : STATS_SD(stats) }",
          "kurtosis" : "STATS_KURTOSIS(stats)"
        }
      }
    }
  ]
}
```

**Longer Term**

This is where, in my mind, the writer-focused morphs into 'writer configuration' focused, which is to say, not just the transport, but also the destination. In this world, we can directly associate the representation of the things we're writing from the profiler with the destination.

Our point of configuration for new writers in Metron is the `MessageWriter` and `BulkMessageWriter` interfaces. We recently pulled out the configs into their own indexing configs, keyed by writer (kafka, elasticsearch, etc). Imagine that the writers are configured entirely there and that it's not writer-oriented, but use-case oriented. Instead of what we have now in the indexing config, we can make it:
```
{
  "writers" : {
    "kafka" : { "batchSize" : 1, "enabled" : true },
    "hbase_profile" : { "batchSize" : 5, "enabled" : true }
  },
  "endpoints" : {
    "triage" : { "writer" : "kafka", "queue" : "enrichments" },
    "profile" : { "writer" : "hbase_profile", "table" : "profile:P" }
  }
}
```
Here, the two forms merge into one because we can represent using our core abstractions the capability-driven design that you are focused on, @nickwallen.
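One way to read the proposed `writers`/`endpoints` config is as a two-step lookup: an endpoint name resolves to a writer plus endpoint-specific settings. The sketch below only illustrates that lookup; `resolve_endpoint` is a hypothetical helper, and returning `None` for a disabled writer is an assumption rather than anything stated in the proposal.

```python
INDEXING_CONFIG = {
    "writers": {
        "kafka": {"batchSize": 1, "enabled": True},
        "hbase_profile": {"batchSize": 5, "enabled": True},
    },
    "endpoints": {
        "triage": {"writer": "kafka", "queue": "enrichments"},
        "profile": {"writer": "hbase_profile", "table": "profile:P"},
    },
}


def resolve_endpoint(config, endpoint_name):
    """Combine an endpoint's settings with those of the writer it names."""
    endpoint = config["endpoints"][endpoint_name]
    writer_name = endpoint["writer"]
    writer = config["writers"][writer_name]
    # Assumption: a disabled writer means nothing is written for this endpoint.
    if not writer.get("enabled", False):
        return None
    resolved = {"writer": writer_name, "batchSize": writer["batchSize"]}
    # Copy the endpoint-specific settings (queue, table, ...) alongside.
    resolved.update({k: v for k, v in endpoint.items() if k != "writer"})
    return resolved


if __name__ == "__main__":
    print(resolve_endpoint(INDEXING_CONFIG, "triage"))
    print(resolve_endpoint(INDEXING_CONFIG, "profile"))
```

With this shape, a profile definition only ever names an endpoint such as `triage`, and the choice of Kafka or HBase stays in the indexing config.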
In this world, the profiler is simple, it just writes messages out to the indexing topology. The structure of the tuple looks like:
* message
* endpoint

The indexing topology
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

I think what I mean is a little different (but maybe I've missed your point.)

For example, when @james-sirota first reviewed this PR he was confused why we would send data to Kafka. He thought it was a replacement for HBase, rather than an addition to that. His mistake was totally understandable. Using terms like 'kafka' and 'hbase' forces the user to understand why they would want to send profile data to HBase and why they would want to send profile data to Kafka. It forces the user to know the implementation.

But I am saying that users should not need to know the implementation details of the Profiler. They should just tell us if they want the profile data stored for later and whether they want to triage the data from the Profiler.

So I am suggesting that to be "user focused" we use terms that focus on the functionality from the user's perspective, not terms based on how we've implemented the Profiler. A user would tell us to 'triage' or not; they would not tell us 'kafka' or not.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 Well, let me try to make the case that this is user-focused while being aware of the limitations of implementation. ;) The main aim for adaptability is to allow multiple representations to be stored in multiple datastores. The representation has a 1 to n relationship with the data store (maybe "writer" can be a list of writers or a single writer?). This makes the representation the top-level citizen, associated with a name describing what it is intended to be used for. Put simply, it's not that kafka can only handle JSON blobs, but rather it's that we need the kurtosis for the `kurtosis_triage` rule (by the way, `kurtosis_triage` should be part of the message constructed along with the `source.type` of `profiler`...maybe `profile.type`).
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449 I can see the value of the additional flexibility here. Of course, the flip side is that I am always worried about too much complexity, as you probably guessed. I don't know if your proposal gets us all the way there in regards to user-focused over implementation-focused terminology. Personally, I'd like to see profile definitions that are portable and work no matter where the Profiler is configured to persist data. But maybe that is a pipe dream. Your point on backwards compatibility is a good one.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449

@nickwallen Hmm, how about the following, with `profile` and `triage` here being entirely user specifiable to break up the various ways you write:
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "result": {
        "profile" : {
          "output" : "stats",
          "if" : "STATS_COUNT(stats) > 0",
          "writer" : "hbase"
        },
        "triage" : {
          "output" : "{ 'mean' : STATS_MEAN(stats), 'stddev' : STATS_SD(stats) }",
          "if" : "STATS_COUNT(stats) > 0 && STATS_SD(stats) > 10",
          "writer" : "kafka"
        }
      }
    }
  ]
}
```
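To make the per-output `if`/`writer` idea concrete, here is a rough sketch in which plain Python callables stand in for the Stellar expressions (this is not a Stellar evaluator). The state is simplified to a list of observed URL lengths, `statistics.pstdev` stands in for `STATS_SD` without any claim that the two compute the same estimator, and `select_outputs` is a hypothetical name.

```python
import statistics

# Each named output has an expression to emit, a guard, and a target writer.
RESULT_SPEC = {
    "profile": {
        "output": lambda lengths: list(lengths),
        "if": lambda lengths: len(lengths) > 0,
        "writer": "hbase",
    },
    "triage": {
        "output": lambda lengths: {
            "mean": statistics.mean(lengths),
            "stddev": statistics.pstdev(lengths),
        },
        # The count check comes first so pstdev never sees an empty list.
        "if": lambda lengths: len(lengths) > 0 and statistics.pstdev(lengths) > 10,
        "writer": "kafka",
    },
}


def select_outputs(result_spec, lengths):
    """Return (name, writer, value) for every output whose guard holds."""
    selected = []
    for name, spec in result_spec.items():
        if spec["if"](lengths):
            selected.append((name, spec["writer"], spec["output"](lengths)))
    return selected


if __name__ == "__main__":
    # Low variance: only the profile output is written.
    print(select_outputs(RESULT_SPEC, [10, 12, 11]))
    # High variance: the triage output fires as well.
    print(select_outputs(RESULT_SPEC, [1, 50, 100]))
```

The guard lets a noisy profile be stored on every tick while only unusual periods are sent on for triage.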
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

Still thinking through the implications, but it looks pretty clean and intuitive this way (at least more intuitive).
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "result": {
        "profile" : "stats",
        "triage" : "{ 'mean' : 'STATS_MEAN(stats)', 'stddev' : 'STATS_SD(stats)' }"
      }
    }
  ]
}
```
Maybe even get rid of "result" altogether?
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "profile" : "stats",
      "triage" : "{ 'mean' : 'STATS_MEAN(stats)', 'stddev' : 'STATS_SD(stats)' }"
    }
  ]
}
```
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

Outside the scope of your "multiple result" idea that I need to think more on...

The one thing I did not like about both approaches is the terminology. Kind of silly, but important for usability. What I mean is the use of the terms 'hbase' and 'kafka'. I think it doesn't make clear to the user why they would want to choose one over the other. I would really like to find non-implementation-specific terms that describe the function of each better.

Random thoughts...
* 'kafka' -> 'triage' or ??
* 'hbase' -> 'store' or 'profile' or ??
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 This probably motivates us to allow `OUTLIER_MAD_SCORE` to accept median and median absolute deviation as parameters rather than just the State object as well.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/449

> You seem to be sending every profile into kafka, not just the configured ones

Just for clarity, you can define the destination for each profile. It defaults to `"destination" : ["hbase", "kafka"]`. For example, if I only wanted to send to HBase:
```
{
  "profile": "profile-one-destination",
  "foreach": "ip_src_addr",
  "init": { "x": "0" },
  "update": { "x": "x + 1" },
  "result": "x",
  "destination": ["hbase"]
}
```
But that is just a side point. Your idea is really interesting. We've talked before about having multiple result values, which I think is super useful. I'll think on this a bit. Thanks for the feedback.
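The defaulting rule described in this comment can be sketched in a few lines, assuming a profile definition is loaded as a plain map. `DEFAULT_DESTINATIONS` and `destinations_for` are hypothetical names used only for illustration.

```python
# Default taken from the comment above: both destinations unless overridden.
DEFAULT_DESTINATIONS = ["hbase", "kafka"]


def destinations_for(profile):
    """Return the destinations for one profile definition.

    Falls back to the default when the `destination` field is absent.
    A copy is returned so callers cannot mutate the shared default.
    """
    return list(profile.get("destination", DEFAULT_DESTINATIONS))


if __name__ == "__main__":
    hbase_only = {
        "profile": "profile-one-destination",
        "foreach": "ip_src_addr",
        "init": {"x": "0"},
        "update": {"x": "x + 1"},
        "result": "x",
        "destination": ["hbase"],
    }
    print(destinations_for(hbase_only))
    print(destinations_for({"profile": "no-destination-field", "result": "x"}))
```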
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449 Also, I will point out, this sets us up architecturally in the future to pull writer configs from zookeeper and support a series of other writers for the output of the profiler.
[GitHub] incubator-metron issue #449: METRON-701 Triage Metrics Produced by the Profi...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/449

So, I think this is an interesting approach. My issues with it are:
* You seem to be sending every profile into kafka, not just the configured ones.
* You seem to be assuming that only one value is being sent into the telemetry and that it's the value that you store in HBase.

I'd recommend, rather, that you make the `result` field more complex by making it a map where the key is the source (e.g. "hbase" or "kafka"). This allows you to separate the storage structure by storage medium. You may, for instance, want to *STORE* a stats object in HBase, but only send along the mean and standard deviation.

Also, I'd recommend allowing `result` to be either a string (which would presume only hbase is supported) or a Map, which would explicitly specify the structure for just the sources you want to write to.

Here's a worked example config for maximum clarity (!):
```
{
  "profiles": [
    {
      "profile": "test",
      "foreach": "'global'",
      "onlyif": "source.type == 'squid'",
      "init": { "stats": "STATS_INIT()" },
      "update": { "stats": "STATS_ADD(stats, LENGTH(url))" },
      "result": {
        "hbase" : "stats",
        "kafka" : "{ 'mean' : STATS_MEAN(stats), 'stddev' : STATS_SD(stats) }"
      }
    }
  ]
}
```