[GitHub] metron issue #966: METRON-1493 Unhelpful Error Message When Assignment Expre...

2018-03-16 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/966
  
+1 by inspection.


---


[GitHub] metron issue #914: METRON-1397 Support for JSON Path and complex documents i...

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/914
  
I love this; sorry I didn't review it sooner, @ottobackwards. +1 by 
inspection.  This is great.


---


[GitHub] metron issue #959: METRON-1485 Upgrade vagrant for dev environments

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/959
  
+1 by inspection, thanks Jon!


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/964
  
Ok, created 
[METRON-1492](https://issues.apache.org/jira/browse/METRON-1492). 


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/964
  
@ottobackwards yeah, definitely; I think that's ultimately where we want to 
go.  The first step toward that is separating out these functions as I have in 
this PR.  The next is doing the Ambari work that will use them.  Shall I 
create a follow-on JIRA?


---


[GitHub] metron pull request #964: METRON-1491: The indexing topology restart logic i...

2018-03-15 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/964

METRON-1491: The indexing topology restart logic is wrong

## Contributor Comments
If either topology is down, Ambari shows all of Indexing as dead. Clicking 
start attempts to start them both and fails if either is still running. 
Furthermore, it appears to retry 3 times before finally failing the command.

To test this, kill one of the indexing topologies, then attempt to 
start indexing from Ambari.  Things should go smoothly.  Also, try restarting 
and starting in various states.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron METRON-1491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #964


commit 423e9a074492ae2071022b9a4adb347d2a109633
Author: cstella <cestella@...>
Date:   2018-03-15T14:27:09Z

METRON-1491: The indexing topology restart logic is wrong




---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-15 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174787158
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/SendToKafka.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Supplier;
+
+public class SendToKafka extends TimerTask {
--- End diff --

alright, done and done.


---


[GitHub] metron issue #958: METRON-1483: Create a tool to monitor performance of the ...

2018-03-14 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/958
  
Moving this conversation to the top.  I believe I have refactored the 
KafkaProducers appropriately.  Let me know if I missed something, @justinleet 


---


[GitHub] metron pull request #963: METRON-1490: Better error message when user specif...

2018-03-14 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/963

METRON-1490: Better error message when user specifies an enrichment type 
that doesn't exist

## Contributor Comments
If a user specifies an enrichment adapter name that doesn't exist (e.g. 
`hbaseEnrichment` vs `hbaseThreatIntel`), then we NPE rather than express the 
issue in the logs.

To test this, try to create an enrichment with an incorrect name.  For 
instance, an enrichment like so:
```
{
"enrichment": {
"fieldMap": {},
"fieldToTypeMap": {},
"config": {}
},
"threatIntel": {
"fieldMap": {
"hbaseEnrichment": ["ip_src_addr", "ip_dst_addr"]
},
"fieldToTypeMap": {
"ip_src_addr": ["malicious_ip"],
"ip_dst_addr": ["malicious_ip"]
},
"config": {},
"triageConfig": {
"riskLevelRules": [],
"aggregator": "MAX",
"aggregationConfig": {}
}
},
"configuration": {}
}
```
Because there is no enrichment adapter named `hbaseEnrichment` available 
for threat intel (the correct name is `hbaseThreatIntel`), an error should be 
noted in the logs and there should be a sensible message, such as 
`java.lang.IllegalStateException: Unable to find an adapter for 
hbaseEnrichment, possible adapters are: stellar,hbaseThreatIntel`, rather than an 
NPE.
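
For illustration only, here is a minimal Java sketch of the fail-fast lookup 
described above.  `AdapterRegistry`, `register` and `lookup` are hypothetical 
names, not Metron's actual classes; only the message format is taken from the 
description.  The idea is to check the lookup result at the point of use and 
report both the bad name and the valid choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an adapter registry; not Metron's actual class.
final class AdapterRegistry {
  // LinkedHashMap keeps registration order so the error message is stable.
  private final Map<String, Object> adapters = new LinkedHashMap<>();

  AdapterRegistry register(String name, Object adapter) {
    adapters.put(name, adapter);
    return this;
  }

  // Fail fast with the offending name and the valid choices
  // instead of handing back null and hitting an NPE later.
  Object lookup(String name) {
    Object adapter = adapters.get(name);
    if (adapter == null) {
      throw new IllegalStateException("Unable to find an adapter for " + name
          + ", possible adapters are: " + String.join(",", adapters.keySet()));
    }
    return adapter;
  }
}
```

With `stellar` and `hbaseThreatIntel` registered, `lookup("hbaseEnrichment")` 
throws an `IllegalStateException` carrying the message quoted above.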

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?


 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron METRON-1490

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #963


commit a0914b2db15ec94f98d2c82173c187bfc101bc04
Author: cstella <cestella@...>
Date:   2018-03-14T21:02:12Z

METRON-1490: Better error message when user specifies an enrichment type 
that doesn't exist




---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-14 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174478516
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/SendToKafka.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Supplier;
+
+public class SendToKafka extends TimerTask {
--- End diff --

Ah, so if I'm following you, making the `ThreadLocal` into 
`ThreadLocal<KafkaProducer<String, String>>` would solve the issue, correct?  
I'd prefer not to change to `SendToKafka<K, V>` because it implies that we might 
be sending something other than String in this implementation.  We would not, 
because the message generator sends Strings, and I'd prefer not to generalize 
that.
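
A rough sketch of that option, to make the trade-off concrete.  `Producer<K, V>` 
is a hypothetical stand-in for `KafkaProducer<K, V>` so the snippet compiles 
without the Kafka client, and `StringSender` is an illustrative name, not the 
class in this PR.  The type parameters live on the `ThreadLocal` only, so the 
sending class itself stays String-only.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for KafkaProducer<K, V>; an assumption for this sketch.
interface Producer<K, V> {
  void send(K key, V value);
}

// The class stays non-generic: it only ever sends String keys and values.
final class StringSender {
  // Parameterizing the ThreadLocal pins the producer type
  // without adding <K, V> to the class itself.
  private final ThreadLocal<Producer<String, String>> producer;

  StringSender(Supplier<Producer<String, String>> factory) {
    this.producer = ThreadLocal.withInitial(factory);
  }

  void send(String key, String value) {
    // Each calling thread lazily gets, and then reuses, its own producer.
    producer.get().send(key, value);
  }
}
```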


---


[GitHub] metron issue #942: METRON-1461: Modify the MIN, MAX Stellar methods to take ...

2018-03-13 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/942
  
So, I would suggest rather than accepting a list of stats objects, `MAX` 
and `MIN` accept one of the following:
* A StatisticsProvider object
* A list of comparables

Essentially, these two would both function:
* `MAX(STATS_ADD(null, 1, 2, 3))`
* `MAX([1, 2, 3])`
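
To make the suggestion concrete, a minimal Java sketch of the dispatch, under 
stated assumptions: `StatsLike` is a hypothetical stand-in for the 
StatisticsProvider mentioned above (only a `getMax()` method is assumed), and 
`MaxSketch.max` is an illustrative name rather than the Stellar function itself.

```java
import java.util.List;

// Hypothetical stand-in for a statistics object; only getMax() is assumed.
interface StatsLike {
  double getMax();
}

final class MaxSketch {
  // Accepts either a stats object or a list of mutually comparable values.
  @SuppressWarnings({"unchecked", "rawtypes"})
  static Object max(Object arg) {
    if (arg instanceof StatsLike) {
      return ((StatsLike) arg).getMax();
    }
    if (arg instanceof List) {
      Comparable best = null;
      for (Object v : (List<?>) arg) {
        if (v == null) {
          continue;  // skip nulls rather than failing on them
        }
        // Elements must be comparable to each other; mixed types
        // such as Integer and Double would fail in compareTo.
        Comparable c = (Comparable) v;
        if (best == null || c.compareTo(best) > 0) {
          best = c;
        }
      }
      return best;  // null for an empty list
    }
    throw new IllegalArgumentException(
        "Expected a stats object or a list of comparables");
  }
}
```

A `MIN` counterpart would follow the same shape with the comparison reversed.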


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186826
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/sampler/BiasedSampler.java
 ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.sampler;
+
+import com.google.common.base.Splitter;
+import com.google.common.collect.Iterables;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.TreeMap;
+
+public class BiasedSampler implements Sampler {
+  TreeMap<Double, Map.Entry<Integer, Integer>> discreteDistribution;
+  public BiasedSampler(List<Map.Entry<Integer, Integer>>  
discreteDistribution, int max) {
+this.discreteDistribution = createDistribution(discreteDistribution, 
max);
+  }
+
+  public static List<Map.Entry<Integer, Integer>> readDistribution(File 
distrFile) throws IOException {
+List<Map.Entry<Integer, Integer>> ret = new ArrayList<>();
+System.out.println("Using biased sampler with the following biases:");
+try(BufferedReader br = new BufferedReader(new FileReader(distrFile))) 
{
+  int sumLeft = 0;
+  int sumRight = 0;
+  for(String line = null;(line = br.readLine()) != null;) {
+if(line.startsWith("#")) {
+  continue;
+}
+Iterable<String> it = Splitter.on(",").split(line.trim());
+int left = Integer.parseInt(Iterables.getFirst(it, null));
+int right = Integer.parseInt(Iterables.getLast(it, null));
+System.out.println("\t" + left + "% of templates will comprise 
roughly " + right + "% of sample output");
+ret.add(new AbstractMap.SimpleEntry<>(left, right));
+sumLeft += left;
+sumRight += right;
+  }
+  if(sumLeft > 100 || sumRight > 100 ) {
+throw new IllegalStateException("Neither columns must sum to 
beyond 100.  " +
+"The first column is the % of templates. " +
+"The second column is the % of the sample that % of 
template occupies.");
+  }
+  else if(sumLeft < 100 && sumRight < 100) {
+int left = 100 - sumLeft;
+int right = 100 - sumRight;
+System.out.println("\t" + left + "% of templates will comprise 
roughly " + right + "% of sample output");
+ret.add(new AbstractMap.SimpleEntry<>(left, right));
+  }
+  return ret;
+}
+  }
+
+  private static TreeMap<Double, Map.Entry<Integer, Integer>>
+ createDistribution(List<Map.Entry<Integer, Integer>>  
discreteDistribution, int max) {
+TreeMap<Double, Map.Entry<Integer, Integer>> ret = new TreeMap<>();
+int from = 0;
+double weight = 0.0d;
+for(Map.Entry<Integer, Integer> kv : discreteDistribution) {
+  double pctVals = kv.getKey()/100.0;
+  int to = from + (int)(max*pctVals);
+  double pctWeight = kv.getValue()/100.0;
+  ret.put(weight, new AbstractMap.SimpleEntry<>(from, to));
+  weight += pctWeight;
+  from = to;
+}
+return ret;
+  }
+
+  @Override
+  public int sample(Random rng, int limit) {
+double weight = rng.nextDouble();
+Map.Entry<Integer, Integer> range = 
discreteDistribution.floorEntry(weight).getValue();
+return rng.nextInt(range.getValue() - range.getKey()) + range.getKey();
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186723
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/sampler/BiasedSampler.java
 ---
@@ -0,0 +1,95 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.sampler;
+
+import com.google.common.base.Splitter;
+import com.google.common.collect.Iterables;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.TreeMap;
+
+public class BiasedSampler implements Sampler {
+  TreeMap<Double, Map.Entry<Integer, Integer>> discreteDistribution;
+  public BiasedSampler(List<Map.Entry<Integer, Integer>>  
discreteDistribution, int max) {
+this.discreteDistribution = createDistribution(discreteDistribution, 
max);
+  }
+
+  public static List<Map.Entry<Integer, Integer>> readDistribution(File 
distrFile) throws IOException {
+List<Map.Entry<Integer, Integer>> ret = new ArrayList<>();
+System.out.println("Using biased sampler with the following biases:");
+try(BufferedReader br = new BufferedReader(new FileReader(distrFile))) 
{
+  int sumLeft = 0;
+  int sumRight = 0;
+  for(String line = null;(line = br.readLine()) != null;) {
+if(line.startsWith("#")) {
+  continue;
+}
+Iterable<String> it = Splitter.on(",").split(line.trim());
+int left = Integer.parseInt(Iterables.getFirst(it, null));
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186700
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/monitor/writers/Writer.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load.monitor.writers;
+
+import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.Results;
+
+import java.util.ArrayList;
+import java.util.Date;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Optional;
+import java.util.function.Consumer;
+
+public class Writer {
+
+  private int summaryLookback;
+  private List<LinkedList<Double>> summaries = new ArrayList<>();
+  private List<Consumer<Writable>> writers;
+  private List<AbstractMonitor> monitors;
+
+  public Writer(List<AbstractMonitor> monitors, int summaryLookback, 
List<Consumer<Writable>> writers) {
+this.summaryLookback = summaryLookback;
+this.writers = writers;
+this.monitors = monitors;
+for(AbstractMonitor m : monitors) {
+  this.summaries.add(new LinkedList<>());
+}
+  }
+
+  public void writeAll() {
+int i = 0;
+Date dateOf = new Date();
+List<Results> results = new ArrayList<>();
+for(AbstractMonitor m : monitors) {
+  Long eps = m.get();
+  if(eps != null) {
+if (summaryLookback > 0) {
+  LinkedList<Double> summary = summaries.get(i);
+  addToLookback(eps == null ? Double.NaN : eps.doubleValue(), 
summary);
+  results.add(new Results(m.format(), m.name(), eps, 
Optional.of(getStats(summary))));
+}
+else {
+  results.add(new Results(m.format(),m.name(), eps, 
Optional.empty()));
+}
+  }
+  else {
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186800
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/sampler/UnbiasedSampler.java
 ---
@@ -0,0 +1,28 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.sampler;
+
+import java.util.Random;
+
+public class UnbiasedSampler implements Sampler {
+
+  @Override
+  public int sample(Random rng, int limit) {
+return rng.nextInt(limit);
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186641
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/monitor/writers/Writer.java
 ---
@@ -0,0 +1,91 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load.monitor.writers;
+
+import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.Results;
+
+import java.util.ArrayList;
+import java.util.Date;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Optional;
+import java.util.function.Consumer;
+
+public class Writer {
+
+  private int summaryLookback;
+  private List<LinkedList<Double>> summaries = new ArrayList<>();
+  private List<Consumer<Writable>> writers;
+  private List<AbstractMonitor> monitors;
+
+  public Writer(List<AbstractMonitor> monitors, int summaryLookback, 
List<Consumer<Writable>> writers) {
+this.summaryLookback = summaryLookback;
+this.writers = writers;
+this.monitors = monitors;
+for(AbstractMonitor m : monitors) {
+  this.summaries.add(new LinkedList<>());
+}
+  }
+
+  public void writeAll() {
+int i = 0;
+Date dateOf = new Date();
+List<Results> results = new ArrayList<>();
+for(AbstractMonitor m : monitors) {
+  Long eps = m.get();
+  if(eps != null) {
+if (summaryLookback > 0) {
+  LinkedList<Double> summary = summaries.get(i);
+  addToLookback(eps == null ? Double.NaN : eps.doubleValue(), 
summary);
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186597
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/monitor/writers/ConsoleWriter.java
 ---
@@ -0,0 +1,67 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load.monitor.writers;
+
+import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
+import org.apache.metron.performance.load.monitor.Results;
+
+import java.text.DateFormat;
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Date;
+import java.util.List;
+import java.util.function.Consumer;
+
+public class ConsoleWriter implements Consumer<Writable> {
+
+  private String getSummary(DescriptiveStatistics stats) {
+return String.format("Mean: %d, Std Dev: %d", (int)stats.getMean(), 
(int)Math.sqrt(stats.getVariance()));
+  }
+
+  @Override
+  public void accept(Writable writable) {
+List<String> parts = new ArrayList<>();
+Date date = writable.getDate();
+for(Results r : writable.getResults()) {
+  Long eps = r.getEps();
+  if(eps != null) {
+String part = String.format(r.getFormat(), eps);
+if (r.getHistory().isPresent()) {
+  part += " (" + getSummary(r.getHistory().get()) + ")";
+}
+parts.add(part);
+  }
+}
+if(date != null) {
+  DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
+  String header = dateFormat.format(date) + " - ";
+  String emptyHeader = "";
+  for (int i = 0; i < header.length(); ++i) {
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186529
  
--- Diff: 
metron-contrib/metron-performance/src/test/java/org/apache/metron/performance/load/SendToKafkaTest.java
 ---
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.ForkJoinPool;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+
+public class SendToKafkaTest {
+
+  @Test
+  public void testWritesCorrectNumber() throws InterruptedException {
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186458
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadOptions.java
 ---
@@ -0,0 +1,504 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.*;
+import org.apache.metron.common.utils.JSONUtils;
+import org.apache.metron.common.utils.cli.CLIOptions;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+import org.apache.metron.common.utils.cli.OptionHandler;
+
+import javax.annotation.Nullable;
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.List;
+import java.util.Optional;
+
+public enum LoadOptions implements CLIOptions {
+  HELP(new OptionHandler() {
+
+@Override
+public String getShortCode() {
+  return "h";
+}
+
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  return new Option(s, "help", false, "Generate Help screen");
+}
+  }),
+  ZK(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "zk_quorum", true, "zookeeper quorum");
+  o.setArgName("QUORUM");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(option.has(cli)) {
+return Optional.ofNullable(option.get(cli));
+  }
+  else {
+return Optional.empty();
+  }
+}
+
+@Override
+public String getShortCode() {
+  return "z";
+}
+  }),
+  CONSUMER_GROUP(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "consumer_group", true, "Consumer Group.  The default is " + LoadGenerator.CONSUMER_GROUP);
+  o.setArgName("GROUP_ID");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(option.has(cli)) {
+return Optional.ofNullable(option.get(cli));
+  }
+  else {
+return Optional.of(LoadGenerator.CONSUMER_GROUP);
+  }
+}
+
+@Override
+public String getShortCode() {
+  return "cg";
+}
+  }),
+  BIASED_SAMPLE(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "sample_bias", true, "The discrete distribution to bias the sampling. " +
+  "This is a CSV of 2 columns.  The first column is the % of the templates " +
+  "and the 2nd column is the probability (0-100) that it's chosen.  For instance:\n" +
+  "  20,80\n" +
+  "  80,20\n" +
+  "implies that 20% of the templates will comprise 80% of the output and the remaining 80% of the templates will comprise 20% of the output.");
+  o.setArgName("BIAS_FILE");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(!option.has(cli)) {
+return Optional.empty();
+  }
+ 

[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186359
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadGenerator.java
 ---
@@ -0,0 +1,165 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.PosixParser;
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.EPSGeneratedMonitor;
+import org.apache.metron.performance.load.monitor.EPSThroughputWrittenMonitor;
+import org.apache.metron.performance.load.monitor.MonitorTask;
+import org.apache.metron.performance.load.monitor.writers.CSVWriter;
+import org.apache.metron.performance.load.monitor.writers.ConsoleWriter;
+import org.apache.metron.performance.load.monitor.writers.Writable;
+import org.apache.metron.performance.load.monitor.writers.Writer;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.performance.sampler.Sampler;
+import org.apache.metron.performance.sampler.UnbiasedSampler;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Consumer;
+
+public class LoadGenerator
+{
+  public static String CONSUMER_GROUP = "load.group";
+  public static long SEND_PERIOD_MS = 100;
+  public static long MONITOR_PERIOD_MS = 1000*10;
+  private static ExecutorService pool;
+  private static ThreadLocal kafkaProducer;
+  public static AtomicLong numSent = new AtomicLong(0);
+
+  public static void main( String[] args ) throws Exception {
+CommandLine cli = LoadOptions.parse(new PosixParser(), args);
+EnumMap<LoadOptions, Optional> evaluatedArgs = LoadOptions.createConfig(cli);
+Map<String, Object> kafkaConfig = new HashMap<>();
+kafkaConfig.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
+kafkaConfig.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
+kafkaConfig.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
+kafkaConfig.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
+if(LoadOptions.ZK.has(cli)) {
+  String zkQuorum = (String) evaluatedArgs.get(LoadOptions.ZK).get();
+  kafkaConfig.put("bootstrap.servers", Joiner.on(",").join(KafkaUtils.INSTANCE.getBrokersFromZookeeper(zkQuorum)));
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186403
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadGenerator.java
 ---
@@ -0,0 +1,165 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.PosixParser;
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.EPSGeneratedMonitor;
+import org.apache.metron.performance.load.monitor.EPSThroughputWrittenMonitor;
+import org.apache.metron.performance.load.monitor.MonitorTask;
+import org.apache.metron.performance.load.monitor.writers.CSVWriter;
+import org.apache.metron.performance.load.monitor.writers.ConsoleWriter;
+import org.apache.metron.performance.load.monitor.writers.Writable;
+import org.apache.metron.performance.load.monitor.writers.Writer;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.performance.sampler.Sampler;
+import org.apache.metron.performance.sampler.UnbiasedSampler;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Consumer;
+
+public class LoadGenerator
+{
+  public static String CONSUMER_GROUP = "load.group";
+  public static long SEND_PERIOD_MS = 100;
+  public static long MONITOR_PERIOD_MS = 1000*10;
+  private static ExecutorService pool;
+  private static ThreadLocal kafkaProducer;
+  public static AtomicLong numSent = new AtomicLong(0);
+
+  public static void main( String[] args ) throws Exception {
+CommandLine cli = LoadOptions.parse(new PosixParser(), args);
+EnumMap<LoadOptions, Optional> evaluatedArgs = LoadOptions.createConfig(cli);
+Map<String, Object> kafkaConfig = new HashMap<>();
+kafkaConfig.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
+kafkaConfig.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
+kafkaConfig.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
+kafkaConfig.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
+if(LoadOptions.ZK.has(cli)) {
+  String zkQuorum = (String) evaluatedArgs.get(LoadOptions.ZK).get();
+  kafkaConfig.put("bootstrap.servers", Joiner.on(",").join(KafkaUtils.INSTANCE.getBrokersFromZookeeper(zkQuorum)));
+}
+String groupId = evaluatedArgs.get(LoadOptions.CONSUMER_GROUP).get().toString();
+System.out.println("Consumer Group: " + groupId);
+kafkaConfig.put("group.id", groupId);
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186422
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadOptions.java
 ---
@@ -0,0 +1,504 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.*;
+import org.apache.metron.common.utils.JSONUtils;
+import org.apache.metron.common.utils.cli.CLIOptions;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.stellar.common.utils.ConversionUtils;
+import org.apache.metron.common.utils.cli.OptionHandler;
+
+import javax.annotation.Nullable;
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.List;
+import java.util.Optional;
+
+public enum LoadOptions implements CLIOptions {
+  HELP(new OptionHandler() {
+
+@Override
+public String getShortCode() {
+  return "h";
+}
+
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  return new Option(s, "help", false, "Generate Help screen");
+}
+  }),
+  ZK(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "zk_quorum", true, "zookeeper quorum");
+  o.setArgName("QUORUM");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(option.has(cli)) {
+return Optional.ofNullable(option.get(cli));
+  }
+  else {
+return Optional.empty();
+  }
+}
+
+@Override
+public String getShortCode() {
+  return "z";
+}
+  }),
+  CONSUMER_GROUP(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "consumer_group", true, "Consumer Group.  The default is " + LoadGenerator.CONSUMER_GROUP);
+  o.setArgName("GROUP_ID");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(option.has(cli)) {
+return Optional.ofNullable(option.get(cli));
+  }
+  else {
+return Optional.of(LoadGenerator.CONSUMER_GROUP);
+  }
+}
+
+@Override
+public String getShortCode() {
+  return "cg";
+}
+  }),
+  BIASED_SAMPLE(new OptionHandler() {
+@Nullable
+@Override
+public Option apply(@Nullable String s) {
+  Option o = new Option(s, "sample_bias", true, "The discrete distribution to bias the sampling. " +
+  "This is a CSV of 2 columns.  The first column is the % of the templates " +
+  "and the 2nd column is the probability (0-100) that it's chosen.  For instance:\n" +
+  "  20,80\n" +
+  "  80,20\n" +
+  "implies that 20% of the templates will comprise 80% of the output and the remaining 80% of the templates will comprise 20% of the output.");
+  o.setArgName("BIAS_FILE");
+  o.setRequired(false);
+  return o;
+}
+
+@Override
+public Optional getValue(LoadOptions option, CommandLine cli) {
+  if(!option.has(cli)) {
+return Optional.empty();
+  }
+ 

[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186490
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/MessageGenerator.java
 ---
@@ -0,0 +1,48 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.metron.performance.sampler.Sampler;
+
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Supplier;
+
+public class MessageGenerator implements Supplier {
+  private static ThreadLocal rng = ThreadLocal.withInitial(() -> new Random());
+  private static AtomicLong guidOffset = new AtomicLong(0);
+  private static String guidPrefix = "6141faf6-a8ba-4044-ab80-";
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186316
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadGenerator.java
 ---
@@ -0,0 +1,165 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.PosixParser;
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.EPSGeneratedMonitor;
+import org.apache.metron.performance.load.monitor.EPSThroughputWrittenMonitor;
+import org.apache.metron.performance.load.monitor.MonitorTask;
+import org.apache.metron.performance.load.monitor.writers.CSVWriter;
+import org.apache.metron.performance.load.monitor.writers.ConsoleWriter;
+import org.apache.metron.performance.load.monitor.writers.Writable;
+import org.apache.metron.performance.load.monitor.writers.Writer;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.performance.sampler.Sampler;
+import org.apache.metron.performance.sampler.UnbiasedSampler;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Consumer;
+
+public class LoadGenerator
+{
+  public static String CONSUMER_GROUP = "load.group";
--- End diff --

done


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174186275
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/LoadGenerator.java
 ---
@@ -0,0 +1,165 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+
+import com.google.common.base.Joiner;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.PosixParser;
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.metron.common.utils.KafkaUtils;
+import org.apache.metron.performance.load.monitor.AbstractMonitor;
+import org.apache.metron.performance.load.monitor.EPSGeneratedMonitor;
+import org.apache.metron.performance.load.monitor.EPSThroughputWrittenMonitor;
+import org.apache.metron.performance.load.monitor.MonitorTask;
+import org.apache.metron.performance.load.monitor.writers.CSVWriter;
+import org.apache.metron.performance.load.monitor.writers.ConsoleWriter;
+import org.apache.metron.performance.load.monitor.writers.Writable;
+import org.apache.metron.performance.load.monitor.writers.Writer;
+import org.apache.metron.performance.sampler.BiasedSampler;
+import org.apache.metron.performance.sampler.Sampler;
+import org.apache.metron.performance.sampler.UnbiasedSampler;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Timer;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Consumer;
+
+public class LoadGenerator
+{
+  public static String CONSUMER_GROUP = "load.group";
+  public static long SEND_PERIOD_MS = 100;
+  public static long MONITOR_PERIOD_MS = 1000*10;
+  private static ExecutorService pool;
+  private static ThreadLocal kafkaProducer;
+  public static AtomicLong numSent = new AtomicLong(0);
+
+  public static void main( String[] args ) throws Exception {
+CommandLine cli = LoadOptions.parse(new PosixParser(), args);
+EnumMap<LoadOptions, Optional> evaluatedArgs = LoadOptions.createConfig(cli);
+Map<String, Object> kafkaConfig = new HashMap<>();
+kafkaConfig.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
--- End diff --

done


---


[GitHub] metron issue #958: METRON-1483: Create a tool to monitor performance of the ...

2018-03-13 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/958
  
@justinleet thanks for the review.  I reacted to your comments either by 
fixing them or suggesting why I prefer what is there.  I will add a new set of 
tests in the next commit.


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-13 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174181759
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/SendToKafka.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Supplier;
+
+public class SendToKafka extends TimerTask {
--- End diff --

well, I mean, I suppose it could be, but I didn't see value in it since all 
of our inputs currently are strings and the message generator exclusively 
generates strings, not other messages.  It'd be a more serious refactoring to 
genericize this to make the message generator pluggable.  Also, for utility 
code, I think I'd rather avoid prematurely generalizing it.  Let me know what 
you think.
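
For reference, a rough sketch of what a genericized version might look like is below. Everything in it is hypothetical: `GenericSendTask`, its constructor arguments, and the `Consumer<T>` sink that stands in for the Kafka producer are illustrative names and are not code from this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical generic variant of a send task; NOT the SendToKafka class in this PR.
final class GenericSendTask<T> extends TimerTask {
  private final long numMessages;            // messages to emit per run()
  private final Supplier<T> messageSupplier; // pluggable message generator
  private final Consumer<T> sink;            // assumption: stands in for the Kafka producer
  private final AtomicLong numSent;          // shared counter of messages sent

  GenericSendTask(long numMessages, Supplier<T> messageSupplier,
                  Consumer<T> sink, AtomicLong numSent) {
    this.numMessages = numMessages;
    this.messageSupplier = messageSupplier;
    this.sink = sink;
    this.numSent = numSent;
  }

  @Override
  public void run() {
    // Pull each message from the supplier, hand it to the sink, and count it.
    for (long i = 0; i < numMessages; i++) {
      sink.accept(messageSupplier.get());
      numSent.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    List<String> out = new ArrayList<>();
    AtomicLong sent = new AtomicLong(0);
    new GenericSendTask<String>(3, () -> "msg", out::add, sent).run();
    System.out.println(sent.get() + " messages sent: " + out);
  }
}
```

The type parameter only buys anything once there is a second message type to send, which is the premature-generalization concern above.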


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-12 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r173942355
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/monitor/AbstractMonitor.java
 ---
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load.monitor;
+
+import java.util.Optional;
+import java.util.function.Supplier;
+
+public abstract class AbstractMonitor implements Supplier, MonitorNaming {
+  private static final double EPSILON = 1e-6;
+  protected Optional kafkaTopic;
+  protected long timestampPrevious = 0;
+  public AbstractMonitor(Optional kafkaTopic) {
+this.kafkaTopic = kafkaTopic;
+  }
+
+  protected abstract Long monitor(double deltaTs);
+
+  @Override
+  public Long get() {
+long timeStarted = System.currentTimeMillis();
--- End diff --

To my knowledge, `currentTimeMillis()` is not affected by DST, as it returns 
the milliseconds since the Unix epoch (UTC).  Cursory googling seems to bear 
this out.  Did I miss something?
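
A minimal sketch of that point, assuming the `America/New_York` zone and the 2018-03-11 spring-forward transition as an illustrative case (the class and method names are made up for the example and are not part of the PR): two wall-clock times that look two hours apart across the DST jump are only one hour apart in epoch milliseconds, which is the quantity `currentTimeMillis()` reports.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Illustrative only: shows that epoch-millisecond deltas ignore DST wall-clock jumps.
final class EpochMillisDemo {

  // Elapsed milliseconds between two zoned times, computed from their epoch values.
  static long elapsedMillis(ZonedDateTime start, ZonedDateTime end) {
    return end.toInstant().toEpochMilli() - start.toInstant().toEpochMilli();
  }

  public static void main(String[] args) {
    ZoneId ny = ZoneId.of("America/New_York");
    // US DST began at 02:00 local time on 2018-03-11; the wall clock jumped to 03:00.
    ZonedDateTime before = ZonedDateTime.of(2018, 3, 11, 1, 30, 0, 0, ny);
    ZonedDateTime after = ZonedDateTime.of(2018, 3, 11, 3, 30, 0, 0, ny);
    // Wall clock reads two hours apart, but only one real hour elapsed.
    System.out.println(elapsedMillis(before, after)); // prints 3600000
  }
}
```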


---


[GitHub] metron pull request #961: METRON-1487 Define Performance Benchmarks for Enri...

2018-03-12 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/961#discussion_r173928377
  
--- Diff: metron-platform/metron-enrichment/Performance.md ---
@@ -0,0 +1,522 @@
+
+
+# Enrichment Performance
+
+This guide defines a set of benchmarks used to measure the performance of 
the Enrichment topology.  The guide also provides detailed steps on how to 
execute those benchmarks along with advice for tuning the Enrichment topology.
+
+* [Benchmarks](#benchmarks)
+* [Benchmark Execution](#benchmark-execution)
+* [Performance Tuning](#performance-tuning)
+* [Benchmark Results](#benchmark-results)
+
+## Benchmarks
+
+* [Geo IP Enrichment](#geo-ip-enrichment)
+* [HBase Enrichment](#hbase-enrichment)
+* [Stellar Enrichment](#stellar-enrichment)
+
+### Geo IP Enrichment
+
+This benchmark measures the performance of executing a Geo IP enrichment.  
Given a valid IP address the enrichment will append detailed location 
information for that IP.  The location information is sourced from an external 
Geo IP data source like [Maxmind](https://github.com/maxmind/GeoIP2-java). 
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define a Geo IP enrichment.
+```
+geo := GEO_GET(ip_dst_addr)
+```
+
+After the enrichment process completes, the telemetry message will 
contain a set of fields with location information for the given IP address.
+```
+{
+   "ip_dst_addr":"151.101.129.140",
+   ...
+   "geo.city":"San Francisco",
+   "geo.country":"US",
+   "geo.dmaCode":"807",
+   "geo.latitude":"37.7697",
+   "geo.location_point":"37.7697,-122.3933",
+   "geo.locID":"5391959",
+   "geo.longitude":"-122.3933",
+   "geo.postalCode":"94107"
+}
+```
+
+### HBase Enrichment
+
+This benchmark measures the performance of executing an enrichment that 
retrieves data from an external HBase table. This type of enrichment is useful 
for enriching telemetry from an Asset Database or other source of relatively 
static data.
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define an HBase enrichment.  This looks up the 'ip_dst_addr' 
within an HBase table 'top-1m' and returns a hostname.
+```
+top1m := ENRICHMENT_GET('top-1m', ip_dst_addr, 'top-1m', 't')
+```
+
+After the telemetry has been enriched, it will contain the host and IP 
elements that were retrieved from the HBase table.
+```
+{
+   "ip_dst_addr":"151.101.2.166",
+   ...
+   "top1m.host":"earther.com",
+   "top1m.ip":"151.101.2.166"
+}
+```
+
+### Stellar Enrichment
+
+This benchmark measures the performance of executing a basic Stellar 
expression.  In this benchmark, the enrichment is purely a computational task 
that has no dependence on an external system like a database.  
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define a basic Stellar enrichment.  The following returns 
true if the IP is in the given subnet and false otherwise. 
+```
+local := IN_SUBNET(ip_dst_addr, '192.168.0.0/24')
+```
+
+After the telemetry has been enriched, it will contain a field with a 
boolean value indicating whether the IP was within the given subnet.
+```
+{
+   "ip_dst_addr":"151.101.2.166",
+   ...
+   "local":false
+}
+```
+   
+## Benchmark Execution
+
+* [Prepare Enrichment Data](#prepare-enrichment-data)
+* [Load HBase with Enrichment Data](#load-hbase-with-enrichment-data)
+* [Configure the Enrichments](#configure-the-enrichments)
+* [Create Input Telemetry](#create-input-telemetry)
+* [Cluster Setup](#cluster-setup)
+* [Monitoring](#monitoring)
+
+### Prepare Enrichment Data
+
+The Alexa Top 1 Million was used as a data source for these benchmarks.
+
+1. Download the [Alexa Top 1 
Million](http://s3.amazonaws.com/alexa-static/top-1m.csv.zip).
+
+2. For each hostname, query DNS to retrieve an associated IP address.  
+
+   A script like the following can be used for this.  There is no need to 
do this for all 1 million entries in the data set. Doing this for around 10,000 
records is sufficient.
+
+   ```python
+   import dns.resolver
+   import csv
+
+   resolver = dns.resolver.Resolver()
+   resolver.nameservers = ['8.8.8.8', 

[GitHub] metron pull request #962: METRON-1488: user_settings hbase table does not ha...

2018-03-12 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/962

METRON-1488: user_settings hbase table does not have acls set up for 
kerberos

## Contributor Comments
Currently some REST calls will fail because we do not set ACLs on the new 
user_settings table.

To exercise this, kerberize full-dev (or whatever cluster you please) and 
go to swagger and try to do a search via `api/v1/search/search` with the 
following arg:
```
{
 "indices": [],
 "facetFields":[],
 "query": "*",
 "from": 0,
 "size": 20
}
```
This should succeed now.  Before we got the following exception:
```
{
  "responseCode": 500,
  "message": "org.apache.hadoop.hbase.security.AccessDeniedException: 
Insufficient permissions for user 'metron' (table=user_settings, 
action=READ)\n\tat 
org.apache.hadoop.hbase.security.access.AccessController.internalPreRead(AccessController.java:1616)\n\tat
 
org.apache.hadoop.hbase.security.access.AccessController.preGetOp(AccessController.java:1624)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$26.call(RegionCoprocessorHost.java:816)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1692)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:812)\n\tat
 org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6917)\n\tat 
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6905)\n\tat 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2026)\n\tat
 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32381)\n\tat
 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)\n\tat 
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)\n\tat 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)\n\tat 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)\n",
  "fullMessage": "RemoteWithExtrasException: 
org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
permissions for user 'metron' (table=user_settings, action=READ)\n\tat 
org.apache.hadoop.hbase.security.access.AccessController.internalPreRead(AccessController.java:1616)\n\tat
 
org.apache.hadoop.hbase.security.access.AccessController.preGetOp(AccessController.java:1624)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$26.call(RegionCoprocessorHost.java:816)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1692)\n\tat
 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:812)\n\tat
 org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6917)\n\tat 
org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6905)\n\tat 
org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2026)\n\tat
 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32381)\n\tat
 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)\n\tat 
org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)\n\tat 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)\n\tat 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)\n"
}
```
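
For anyone who wants to script this check rather than click through swagger, here is a rough sketch. The helper names (`build_search_request`, `is_acl_failure`) are made up for illustration and are not part of Metron; only the request fields and the error text are taken from the output above.

```python
import json


def build_search_request(query="*", start=0, size=20, indices=None, facet_fields=None):
    """Build the JSON body used in the search call above (hypothetical helper)."""
    return {
        "indices": list(indices or []),
        "facetFields": list(facet_fields or []),
        "query": query,
        "from": start,
        "size": size,
    }


def is_acl_failure(response):
    """Return True if a decoded REST response looks like the missing-ACL error.

    Assumption: the response is a dict shaped like the error body shown above.
    """
    message = response.get("message", "")
    return response.get("responseCode") == 500 and "AccessDeniedException" in message


body = json.dumps(build_search_request())
print(body)
```

Posting `body` to `api/v1/search/search` on a kerberized cluster and passing the decoded reply to `is_acl_failure` should return `False` once the ACLs are in place.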


## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR t

[GitHub] metron issue #958: METRON-1483: Create a tool to monitor performance of the ...

2018-03-08 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/958
  
# Testing Plan:
We presume
* `ZOOKEEPER` is an environment variable set to the zk quorum (e.g. 
`node1:2181`)
* `BROKER` is an environment variable set to the broker (e.g. `node1:6667`)

## Unbiased Generation

In order to test this, we will:
1. create a new topic
```
# Create dummy_12_unbiased with 12 partitions
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER 
--create --topic dummy_12_unbiased --partitions 12 --replication-factor 1
```
2. Create some dummy data to send into kafka
```
# Generate some templates of dummy data that look like CSV in 
~/dummy.templates
# With a number, a string, a GUID and the timestamp
for i in $(seq 1 100);do echo "$i,foo,\$METRON_GUID,\$METRON_TS";done > 
~/dummy.templates
```
3. Write into that topic from a set of templates at a specified rate
```
# Write out 1000 messages per second to dummy_12_unbiased
# Monitor dummy_12_unbiased
# Each message is drawn from the set of templates in step 2.
# Stop after 60 seconds and report how much you've written
$METRON_HOME/bin/load_tool.sh -p 5 -ot dummy_12_unbiased -mt 
dummy_12_unbiased -t ~/dummy.templates -eps 1000 -z $ZOOKEEPER -tl 6
```
  * It should indicate that it generated something like 60k messages
4. Measure if the messages generated roughly match the requested rate.
```
# You will have to ctrl-c this in a few seconds
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh 
--bootstrap-server node1:6667 --topic dummy_12_unbiased --from-beginning | wc -l
```
  * It should output something like `Processed a total of 60400 messages`, 
which would be about 1000 messages per second.
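
To make the "roughly match" check in step 4 less hand-wavy, the arithmetic can be sketched as below. The 5% tolerance is my own assumption and not something the tool enforces, and the function names are hypothetical.

```python
def observed_eps(message_count, duration_seconds):
    """Average events per second over the run."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return message_count / duration_seconds


def within_tolerance(observed, requested, tolerance=0.05):
    """True if the observed rate is within +/- tolerance of the requested rate."""
    return abs(observed - requested) <= tolerance * requested


# Numbers from the example above: 60400 messages over a 60 second run.
eps = observed_eps(60400, 60)
print(round(eps, 1))                  # 1006.7
print(within_tolerance(eps, 1000))    # True
```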



---


[GitHub] metron pull request #949: METRON-1471: Migrate shuffle connections to local ...

2018-03-08 Thread cestella
Github user cestella closed the pull request at:

https://github.com/apache/metron/pull/949


---


[GitHub] metron pull request #949: METRON-1471: Migrate shuffle connections to local ...

2018-03-08 Thread cestella
GitHub user cestella reopened a pull request:

https://github.com/apache/metron/pull/949

METRON-1471: Migrate shuffle connections to local or shuffle

## Contributor Comments
Currently, we use shuffle groupings when we do not want to group by field.  
We should, instead, use local or shuffle groupings.

Manual tests should include ensuring data continues to flow as before.
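
For reviewers less familiar with the grouping, the routing rule can be sketched roughly as follows. This is a simplified simulation and not Storm's code: `choose_task` and its arguments are hypothetical names, and a plain random choice stands in for Storm's shuffle.

```python
import random


def choose_task(target_tasks, local_tasks, rng=random):
    """Pick a destination task the way a local-or-shuffle grouping would.

    If any target task lives in the same worker process (local_tasks), choose
    only among those; otherwise fall back to shuffling across all targets.
    """
    if not target_tasks:
        raise ValueError("target_tasks must not be empty")
    local = [t for t in target_tasks if t in local_tasks]
    candidates = local if local else list(target_tasks)
    return rng.choice(candidates)


rng = random.Random(0)
# Tasks 2 and 3 are in-process, so tuples never leave the worker.
print({choose_task([1, 2, 3, 4], {2, 3}, rng) for _ in range(100)} <= {2, 3})  # True
```

The benefit is avoiding a network hop and serialization whenever a downstream task is available in the same worker.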

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron 
local_or_shuffle_indexing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #949


commit 0ab26fa172a90eaf8eb441c8c085308649efdfb9
Author: cstella <cestella@...>
Date:   2018-03-06T16:18:57Z

Updating the rest of the topologies to prefer local or shuffle over shuffle

commit 82a7c823ab27441ddda5c043dee3cef8aa8f03be
Author: cstella <cestella@...>
Date:   2018-03-08T14:25:19Z

Merge branch 'master' into local_or_shuffle_indexing

commit 2d3a7f68dc7570ca1a44b9b6a4b2880292e18824
Author: cstella <cestella@...>
Date:   2018-03-08T14:26:48Z

Fixed profiler




---


[GitHub] metron pull request #958: METRON-1483: In performance evaluation, generating...

2018-03-08 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/958

METRON-1483: In performance evaluation, generating synthetic load and 
monitoring the write throughput of our kafka-to-kafka topologies has required a 
lot of custom scripting.  We should have a tool that can (1) take a file 
representing a message template and generate synthetic load at a given events 
per second and (2) monitor the kafka offsets of a topic and report throughput 
numbers in events per second.

## Contributor Comments
In performance evaluation, generating synthetic load and monitoring the 
write throughput of our kafka-to-kafka topologies has required a lot of custom 
scripting.  We should have a tool that could do the following:

* Take a file representing a message template and generate synthetic load 
at a given events per second
* Monitor the kafka offsets of a topic and report throughput numbers in 
events per second
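
As a rough illustration of the two pieces (not the tool's actual implementation), the sketch below expands a template line and derives throughput from two offset snapshots. The function names are hypothetical, and I'm assuming `$METRON_TS` becomes an epoch-millisecond timestamp and that offsets are available as a partition-to-offset map.

```python
import time
import uuid
from string import Template


def expand_template(line, now_ms=None, guid=None):
    """Fill in $METRON_GUID and $METRON_TS in one template line."""
    return Template(line).safe_substitute(
        METRON_GUID=guid or str(uuid.uuid4()),
        METRON_TS=str(now_ms if now_ms is not None else int(time.time() * 1000)),
    )


def throughput_eps(before, after, elapsed_seconds):
    """Events per second from two {partition: offset} snapshots of one topic."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    written = sum(after.values()) - sum(before.values())
    return written / elapsed_seconds


print(expand_template("1,foo,$METRON_GUID,$METRON_TS", now_ms=1520500000000, guid="abc"))
# 1,foo,abc,1520500000000

before = {0: 100, 1: 250, 2: 50}
after = {0: 1100, 1: 1250, 2: 1050}
print(throughput_eps(before, after, 3.0))  # 1000.0
```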

Currently pending:
* Test plan
* Documentation

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron perf_tool

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #958


commit 8ba8c57576ed318d1ff3f69cda1dc9020d62352a
Author: cstella <cestella@...>
Date:   2018-03-08T00:03:33Z

Build load testing tool

commit 638086f2f30cfe543b624721c1fb467062571eab
Author: cstella <cestella@...>
Date:   2018-03-08T15:31:55Z

refactoring monitoring to be more sensible.

commit 55be01de033d0ea6b489930cb97fefcca80efdb5
Author: cstella <cestella@...>
Date:   2018-03-08T16:20:52Z

Added summarization




---


[GitHub] metron issue #945: METRON-1464: Convert schemas to be compatible with Solr 5...

2018-03-08 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/945
  
Ok, I'm cool with it.  +1 by inspection; great work.


---


[GitHub] metron issue #949: METRON-1471: Migrate shuffle connections to local or shuf...

2018-03-08 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/949
  
@merrimanr Ah, I had forgotten to merge in master yesterday afternoon.  
There should be no more shuffles in any of the topologies.


---


[GitHub] metron issue #949: METRON-1471: Migrate shuffle connections to local or shuf...

2018-03-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/949
  
@merrimanr As of now, there should be no more SHUFFLE groupings in 
enrichments, but I'll change the profiler for sure.


---


[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-07 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/947#discussion_r172870992
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/JoinBolt.java
 ---
@@ -89,29 +91,25 @@ public void prepare(Map map, TopologyContext 
topologyContext, OutputCollector ou
 if (this.maxTimeRetain == null) {
   throw new IllegalStateException("maxTimeRetain must be specified");
 }
-loader = new CacheLoader<String, Map<String, Tuple>>() {
-  @Override
-  public Map<String, Tuple> load(String key) throws Exception {
-return new HashMap<>();
-  }
-};
-cache = CacheBuilder.newBuilder().maximumSize(maxCacheSize)
-.expireAfterWrite(maxTimeRetain, 
TimeUnit.MINUTES).removalListener(new JoinRemoveListener())
-.build(loader);
+loader = s -> new HashMap<>();
+cache = Caffeine.newBuilder().maximumSize(maxCacheSize)
+ .expireAfterWrite(maxTimeRetain, TimeUnit.MINUTES)
+ .removalListener(new JoinRemoveListener())
--- End diff --

So, I believe this was done intentionally before this PR (I only migrated it 
to the new caching strategy).  The idea is that if an entry is removed from the 
join cache under specific circumstances, we want to know about it, because it 
could mean a message is being dropped because the cache is overwhelmed.  
@merrimanr Can you chime in here on the rationale?
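
To illustrate the kind of signal I mean, here is a toy analog in Python. It is not the JoinBolt code and not Caffeine's eviction policy: `BoundedCache` is a hypothetical class, and the "SIZE" and "EXPLICIT" strings merely mimic the idea of a removal cause.

```python
from collections import OrderedDict


class BoundedCache:
    """Tiny size-bounded cache that reports every removal to a listener."""

    def __init__(self, max_size, on_removal):
        self._max_size = max_size
        self._on_removal = on_removal
        self._entries = OrderedDict()

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        # Evict the oldest entries once the cache is over capacity.
        while len(self._entries) > self._max_size:
            old_key, old_value = self._entries.popitem(last=False)
            self._on_removal(old_key, old_value, "SIZE")

    def invalidate(self, key):
        if key in self._entries:
            self._on_removal(key, self._entries.pop(key), "EXPLICIT")


dropped = []


def warn_on_pressure(key, value, cause):
    # Only size-driven removals suggest the cache is overwhelmed.
    if cause == "SIZE":
        dropped.append(key)


cache = BoundedCache(2, warn_on_pressure)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)      # pushes "a" out because of size
cache.invalidate("b")  # explicit removal, not a warning sign
print(dropped)         # ['a']
```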


---


[GitHub] metron issue #945: METRON-1464: Convert schemas to be compatible with Solr 5...

2018-03-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/945
  
@merrimanr you ran this up in Solr 5.5 as well as 6.6, right?  If so, then 
I'm content with the change and give a +1, pending an answer to Simon's 
question, which I was wondering about as well.



---


[GitHub] metron pull request #952: Metron-1480 Add yarn as default build tool for the...

2018-03-07 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/952#discussion_r172858774
  
--- Diff: metron-interface/metron-alerts/package-lock.json ---
@@ -1,6427 +0,0 @@
-{
--- End diff --

I suspect the build failure is due to the lack of a license header on this 
file.  Since JSON cannot take comments, you'll need to adjust the top level pom 
to add an exclusion for `**/package-lock.json` 
[here](https://github.com/apache/metron/blob/master/pom.xml#L304).  Just add a 
quick comment above it noting that this is yarn collateral.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-06 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172711543
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/ParallelEnricher.java
 ---
@@ -0,0 +1,281 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import com.github.benmanes.caffeine.cache.stats.CacheStats;
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.utils.EnrichmentUtils;
+import org.json.simple.JSONObject;
+
+import java.util.AbstractMap;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.EnumMap;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.function.BinaryOperator;
+import java.util.function.Supplier;
+
+/**
+ * This is an independent component which will accept a message and a set 
of enrichment adapters as well as a config which defines
+ * how those enrichments should be performed and fully enrich the message. 
 The result will be the enriched message
+ * unified together and a list of errors which happened.
+ */
+public class ParallelEnricher {
+
+  private Map<String, EnrichmentAdapter> enrichmentsByType = new 
HashMap<>();
+  private EnumMap<EnrichmentStrategies, CacheStats> cacheStats = new 
EnumMap<>(EnrichmentStrategies.class);
+
+  /**
+   * The result of an enrichment.
+   */
+  public static class EnrichmentResult {
+private JSONObject result;
+private List<Map.Entry<Object, Throwable>> enrichmentErrors;
+
+public EnrichmentResult(JSONObject result, List<Map.Entry<Object, 
Throwable>> enrichmentErrors) {
+  this.result = result;
+  this.enrichmentErrors = enrichmentErrors;
+}
+
+/**
+ * The unified fully enriched result.
+ * @return
+ */
+public JSONObject getResult() {
+  return result;
+}
+
+/**
+ * The errors that happened in the course of enriching.
+ * @return
+ */
+public List<Map.Entry<Object, Throwable>> getEnrichmentErrors() {
+  return enrichmentErrors;
+}
+  }
+
+  private ConcurrencyContext concurrencyContext;
+
+  /**
+   * Construct a parallel enricher with a set of enrichment adapters 
associated with their enrichment types.
+   * @param enrichmentsByType
+   */
+  public ParallelEnricher( Map<String, EnrichmentAdapter> 
enrichmentsByType
+ , ConcurrencyContext concurrencyContext
+ , boolean logStats
+ )
+  {
+this.enrichmentsByType = enrichmentsByType;
+this.concurrencyContext = concurrencyContext;
+if(logStats) {
+  for(EnrichmentStrategies s : EnrichmentStrategies.values()) {
+cacheStats.put(s, null);
+  }
+}
+  }
+
+  /**
+   * Fully enriches a message.  Each enrichment is done in parallel via a 
threadpool.
+   * Each enrichment is fronted with a LRU cache.
+   *
+   * @param message the message to enrich
+   * @param strategy The enrichment strategy to use (e.g. enrichment or 
threat intel)
+   * @param config The sensor enri

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-06 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172656809
  

[GitHub] metron pull request #949: METRON-1471: Migrate shuffle connections to local ...

2018-03-06 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/949

METRON-1471: Migrate shuffle connections to local or shuffle

## Contributor Comments
Currently, we use shuffle groupings when we do not want to group by field.  
We should, instead, use local or shuffle groupings.

Manual tests should include ensuring data continues to flow as before.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron 
local_or_shuffle_indexing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #949


commit 0ab26fa172a90eaf8eb441c8c085308649efdfb9
Author: cstella <cestella@...>
Date:   2018-03-06T16:18:57Z

Updating the rest of the topologies to prefer local or shuffle over shuffle




---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-06 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
Ok, README is updated with the new topology diagram.  Let me know if 
there's anything else.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383791
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java
 ---
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+
+/**
+ * Enrichment strategy.  This interface provides a mechanism to interface 
with the enrichment config and any
+ * post processing steps that are needed to be done after-the-fact.
+ *
+ * The reasoning behind this is that the key difference between 
enrichments and threat intel is that they pull
+ * their configurations from different parts of the SensorEnrichmentConfig 
object and as a post-join step, they differ
+ * slightly.
+ *
+ */
+public interface Strategy {
+  Constants.ErrorType getErrorType();
--- End diff --

done


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383810
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java
 ---
@@ -0,0 +1,415 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * This bolt is a unified enrichment/threat intel bolt.  In contrast to 
the split/enrich/join
+ * bolts above, this handles the entire enrichment lifecycle in one bolt 
using a threadpool to
+ * enrich in parallel.
+ *
+ * From an architectural perspective, this is a divergence from the 
polymorphism based strategy we have
+ * used in the split/join bolts.  Rather, this bolt is provided a strategy 
to use, either enrichment or threat intel,
+ * through composition.  This allows us to move most of the implementation 
into components independent
+ * from Storm.  This will greater facilitate reuse.
+ */
+public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt {
+
+  public static class Perf {} // used for performance logging
+  private PerformanceLogger perfLog; // not static bc multiple bolts may 
exist in same worker
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String STELLAR_CONTEXT_CONF = "stellarContext";
+
+  /**
+   * The number of threads in the threadpool.  One threadpool is created 
per process.
+   * This is a topology-level configuration
+   */
+  public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = 
"metron.threadpool.size";
+  /**
+   * The type of threadpool to create. This is a topology-level 
configuration.
+   */
+  public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = 
"metron.threadpool.type";
+
+  /**
+   * The enricher implementation to use.  This will do the parallel 
enrichment via a thread pool.
+   */
+  protected ParallelEnricher enricher;
+
+  /**
+   * The strategy to use for this enrichment bolt.  Practically speak

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@nickwallen Ok, I refactored the abstraction to separate some concerns, 
name things a bit better, and collapse some of the more onerous abstractions.  
I also updated the javadocs.  

Can you give it another look and see what you think?  We probably should 
also give it another smoketest in the lab to make sure I didn't do something 
dumb.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383754
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

I decided that this is too onerous an abstraction and rethought it a 
bit.  Give it another look, please.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172377136
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

I might be answering a question that you're not asking, but this bit of 
awkwardness arises because we have merged the concepts of threat intel and 
enrichment, which differ really only in post-processing.  The approach 
presented here, in contrast to the inheritance-based approach in the bolts, 
allows for an abstraction through composition whereby we localize all the 
interactions with the sensor enrichment config in a strategy rather than bind 
the abstraction to Storm, our distributed processing engine.  That is the 
rationale behind this approach at least.
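
To make the composition point concrete, here is a minimal sketch in plain Java. None of these names come from the PR: `MessageStrategy`, `CoreEnricher`, `SketchStrategies`, the config section keys, and the `flagged` field are all hypothetical. The idea it illustrates is that the core logic receives a strategy object and has no dependency on the processing engine, so the only difference between the two flavors lives in the strategy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical strategy: which config section to read, and what to do after enriching. */
interface MessageStrategy {
  String configSection();

  Map<String, Object> postProcess(Map<String, Object> message);
}

/** Engine-independent core: it is handed a strategy instead of being subclassed per flavor. */
final class CoreEnricher {
  private final MessageStrategy strategy;

  CoreEnricher(MessageStrategy strategy) {
    this.strategy = strategy;
  }

  Map<String, Object> enrich(Map<String, Object> message,
                             Map<String, Map<String, Object>> config) {
    // Copy so the caller's message is left untouched.
    Map<String, Object> out = new HashMap<>(message);
    // All interaction with the config is localized behind the strategy.
    out.putAll(config.getOrDefault(strategy.configSection(),
        Collections.<String, Object>emptyMap()));
    return strategy.postProcess(out);
  }
}

/** Two illustrative strategies that differ only in config section and post-processing. */
final class SketchStrategies {
  static MessageStrategy enrichment() {
    return new MessageStrategy() {
      public String configSection() { return "enrichment"; }
      public Map<String, Object> postProcess(Map<String, Object> message) { return message; }
    };
  }

  static MessageStrategy threatIntel() {
    return new MessageStrategy() {
      public String configSection() { return "threatIntel"; }
      public Map<String, Object> postProcess(Map<String, Object> message) {
        message.put("flagged", Boolean.TRUE); // hypothetical post-join step
        return message;
      }
    };
  }
}
```

A bolt (or any other host) would then be a thin adapter that constructs `new CoreEnricher(SketchStrategies.threatIntel())` and forwards messages to it.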


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172373203
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java
 ---
@@ -0,0 +1,415 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * This bolt is a unified enrichment/threat intel bolt.  In contrast to 
the split/enrich/join
+ * bolts above, this handles the entire enrichment lifecycle in one bolt 
using a threadpool to
+ * enrich in parallel.
+ *
+ * From an architectural perspective, this is a divergence from the 
polymorphism based strategy we have
+ * used in the split/join bolts.  Rather, this bolt is provided a strategy 
to use, either enrichment or threat intel,
+ * through composition.  This allows us to move most of the implementation 
into components independent
+ * from Storm.  This will greater facilitate reuse.
+ */
+public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt {
+
+  public static class Perf {} // used for performance logging
+  private PerformanceLogger perfLog; // not static bc multiple bolts may 
exist in same worker
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String STELLAR_CONTEXT_CONF = "stellarContext";
+
+  /**
+   * The number of threads in the threadpool.  One threadpool is created 
per process.
+   * This is a topology-level configuration
+   */
+  public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = 
"metron.threadpool.size";
+  /**
+   * The type of threadpool to create. This is a topology-level 
configuration.
+   */
+  public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = 
"metron.threadpool.type";
+
+  /**
+   * The enricher implementation to use.  This will do the parallel 
enrichment via a thread pool.
+   */
+  protected ParallelEnricher enricher;
+
+  /**
+   * The strategy to use for this enrichment bolt.  Practically speak

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172369461
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

This is a strategy pattern using an enum.  The purpose of this class is to 
resolve the specific strategies possible.  It's broadly in line with other 
strategy patterns (e.g. Extractors). 
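
For anyone who has not seen the enum flavor of the pattern, a minimal sketch follows. The names are hypothetical (`FieldStrategy`, `FieldStrategies`, the two constants, and the prefix strings are made up for illustration and are not the contents of `EnrichmentStrategies`). Each constant supplies its own behavior, and the enum doubles as the place where a strategy is resolved by name.

```java
import java.util.Locale;

/** Hypothetical strategy interface for the sketch. */
interface FieldStrategy {
  String prefix();

  String apply(String field);
}

/** The enum both enumerates the possible strategies and implements the interface. */
enum FieldStrategies implements FieldStrategy {
  ENRICHMENT {
    @Override
    public String prefix() { return "enrichments"; }
  },
  THREAT_INTEL {
    @Override
    public String prefix() { return "threatintels"; }
  };

  /** Shared behavior, parameterized by the constant-specific prefix. */
  @Override
  public String apply(String field) {
    return prefix() + "." + field;
  }

  /** Resolve a strategy from a case-insensitive name, e.g. one read from configuration. */
  static FieldStrategy resolve(String name) {
    return valueOf(name.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // prints "threatintels.ip_src_addr"
    System.out.println(resolve("threat_intel").apply("ip_src_addr"));
  }
}
```

The trade-off is the usual one: the set of strategies is closed at compile time, which is fine when there are only a couple of known variants.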


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172369480
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java
 ---
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+
+/**
+ * Enrichment strategy.  This interface provides a mechanism to interface 
with the enrichment config and any
+ * post processing steps that are needed to be done after-the-fact.
+ *
+ * The reasoning behind this is that the key difference between 
enrichments and threat intel is that they pull
+ * their configurations from different parts of the SensorEnrichmentConfig 
object and as a post-join step, they differ
+ * slightly.
+ *
+ */
+public interface Strategy {
+  Constants.ErrorType getErrorType();
--- End diff --

Sure thing.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
Ahhh, that makes sense.  I bet we were getting killed by small allocations 
in the caching layer.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
I actually suspect GC as well.  We switched the garbage collector to 
G1GC and saw throughput gains, but nowhere near the gains we got from 
dropping in Caffeine to replace Guava.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
In this case, the loader isn't doing anything terribly expensive, though it 
may in the future (e.g. incur an HBase get or some more expensive computation).
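
As a reminder of what "the loader" refers to, here is a minimal sketch of a loading cache built on the JDK only. `LoadingCacheSketch` is a hypothetical class for illustration; it has no eviction and is not how Guava or Caffeine are implemented. It only shows that the loader runs on a miss and is skipped on a hit, which is why an expensive loader would make the hit rate matter much more.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustrative loading cache: unbounded, no eviction, JDK classes only. */
final class LoadingCacheSketch<K, V> {
  private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();
  private final Function<K, V> loader;
  private final AtomicInteger loads = new AtomicInteger();

  LoadingCacheSketch(Function<K, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value, invoking the loader only when the key is absent. */
  V get(K key) {
    return store.computeIfAbsent(key, k -> {
      loads.incrementAndGet(); // count how often the (possibly expensive) loader runs
      return loader.apply(k);
    });
  }

  /** Number of loader invocations so far, i.e. the number of misses. */
  int loadCount() {
    return loads.get();
  }
}
```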


---


[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/924
  
+1, sorry!


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
We actually did increase the concurrency level for guava to 64; that is 
what confused us as well.  The hash code is mostly standard and should be 
evenly distributed (the key is pretty much a POJO).
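
A quick way to sanity-check that kind of claim is to bucket a batch of keys by hash code and look at the counts. The sketch below is an assumption-laden illustration: `CacheKeySketch` is a hypothetical two-field POJO (not Metron's `CacheKey`), the key values are synthetic IP-like strings, and the simple modulo bucketing is not how Guava selects segments internally.

```java
import java.util.Objects;

/** Hypothetical POJO-style cache key with a conventional equals/hashCode. */
final class CacheKeySketch {
  private final String field;
  private final String value;

  CacheKeySketch(String field, String value) {
    this.field = field;
    this.value = value;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof CacheKeySketch)) {
      return false;
    }
    CacheKeySketch that = (CacheKeySketch) o;
    return field.equals(that.field) && value.equals(that.value);
  }

  @Override
  public int hashCode() {
    return Objects.hash(field, value);
  }

  /** Counts how many of `keys` synthetic keys land in each of `buckets` buckets. */
  static int[] bucketCounts(int keys, int buckets) {
    int[] counts = new int[buckets];
    for (int i = 0; i < keys; i++) {
      CacheKeySketch k = new CacheKeySketch("ip_src_addr", "10.0." + (i / 256) + "." + (i % 256));
      counts[Math.floorMod(k.hashCode(), buckets)]++;
    }
    return counts;
  }
}
```

Calling `CacheKeySketch.bucketCounts(10_000, 64)` and comparing the largest and smallest counts gives a rough feel for skew; a badly lopsided result would point at the hash rather than the cache.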


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
The interesting thing we found was that guava seems to do poorly when 
the number of items in the cache gets large.  When we scaled the test down 
(830 distinct IP addresses chosen randomly and sent in at a rate of 200k events 
per second with a cache size of 100), it kept up, but when we scaled the test 
up (300k distinct IP addresses chosen randomly and sent in at a rate of 200k 
events per second with a cache size of 100k), it didn't. 


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
We were being purposefully unkind to the cache in the tests.  The load 
simulation chose an IP address at random to present, so each IP had an equal 
probability of being selected.  In real traffic, by contrast, we expect a 
coherent working set.  I'm not sure of the exact hit rates, though.
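
For a rough idea of what uniform random keys do to a cache, here is a small simulation sketch. It is not the lab harness: `CacheHitSim` is a hypothetical class, it uses a JDK `LinkedHashMap` as a stand-in LRU rather than Guava or Caffeine, and it skips the first tenth of requests as warm-up. Under uniform access one would expect a full LRU cache to hit at roughly cache size divided by distinct keys, so about 12% for 830 keys with a cache of 100.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Illustrative LRU hit-rate simulation under uniformly random key selection. */
final class CacheHitSim {

  static double hitRate(int distinctKeys, int cacheSize, int requests, long seed) {
    // Access-ordered LinkedHashMap that evicts the least recently used entry when over capacity.
    Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
        return size() > cacheSize;
      }
    };
    Random rng = new Random(seed);
    int warmup = requests / 10; // ignore early requests while the cache is still filling
    long hits = 0;
    for (int i = 0; i < requests; i++) {
      int key = rng.nextInt(distinctKeys); // every key is equally likely
      boolean hit = lru.get(key) != null;
      if (!hit) {
        lru.put(key, Boolean.TRUE);
      }
      if (i >= warmup && hit) {
        hits++;
      }
    }
    return (double) hits / (requests - warmup);
  }
}
```

Replacing the uniform `rng.nextInt(distinctKeys)` with a skewed distribution is the obvious next step for approximating a coherent working set, where the hit rate should be considerably higher.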


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
I ran this up with vagrant and ensured:
* Normal Stellar still works in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live

Are there any other pending issues here beyond a report of the performance 
impact?


---


[GitHub] metron issue #944: METRON-1463: Adjust the groupings and shuffles in enrichm...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/944
  
I ran this up with vagrant and ensured:
* Normal Stellar still works in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live


---


[GitHub] metron issue #947: METRON-1467: Replace guava caches in places where the key...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/947
  
I ran this up with vagrant and ensured:
* Normal Stellar still works in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live


---


[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
GitHub user cestella reopened a pull request:

https://github.com/apache/metron/pull/947

METRON-1467: Replace guava caches in places where the keyspace might be 
large (NOTE: Review after METRON-1460)

## Contributor Comments
Based on the performance tuning exercise as part of METRON-1460, guava has 
difficulties with cache sizes over 10k.  We, unfortunately, are quite demanding 
of guava in this regard so we should transition a few uses of guava to Caffeine:

* Stellar processor cache
* The JoinBolt cache
* The Enrichment Bolt Cache

NOTE: This depends on METRON-1460 aka #940 and that PR is included in here. 
 It might be easier to review after #940 is merged.

Test plan pending

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that the format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify the changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron 
guava_cache_replacement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #947


commit a4f618a3ad895d62772366e0e93e5b8b37c5c964
Author: cstella <cestella@...>
Date:   2018-02-21T23:59:16Z

Adding parallel enrichment bolt.

commit 99fe0b86005fe04294b3851a17ae3d88f228c5d2
Author: cstella <cestella@...>
Date:   2018-02-22T00:21:06Z

Updating to include trace statements.

commit 79736c6f3fab04d01dd1eb998b308f438003a0e1
Author: cstella <cestella@...>
Date:   2018-02-22T15:35:44Z

Updating with some cleanup

commit cb4a527c9146865dafad1d597ba93032ef398d94
Author: cstella <cestella@...>
Date:   2018-02-22T15:48:11Z

Updating spec.

commit fb4d4383f366776f446e33a422652c3ec1f56bfa
Author: cstella <cestella@...>
Date:   2018-02-22T18:00:36Z

Updating threadpool creation

commit 87ef6a72827c31f8adee42ee71272a32c350bc1f
Author: cstella <cestella@...>
Date:   2018-02-22T18:04:37Z

better docs

commit 6ae9594ee4ae2b4d33e0feca398b527077dac0d3
Author: cstella <cestella@...>
Date:   2018-02-22T18:41:20Z

Updating readme.

commit 82ebc9550d759ea0bd06b48c586

[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
Github user cestella closed the pull request at:

https://github.com/apache/metron/pull/947


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
Just FYI, as part of the performance experimentation in the lab here, we 
found that one major impediment to scale was the guava cache in this topology 
when the cache becomes non-trivial in size (e.g. 10k+).  Swapping 
in [Caffeine](https://github.com/ben-manes/caffeine) immediately had a 
substantial effect.  I created #947 to migrate the split/join infrastructure to 
use caffeine as well and will look at the performance impact of that change.  I 
wanted to separate that work from here as it may be that guava performance is 
fine outside of an explicit threadpool like we have here.


---


[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-02 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/947

METRON-1467: Replace guava caches in places where the keyspace might be 
large

## Contributor Comments
Based on the performance tuning exercise as part of METRON-1460, guava has 
difficulties with cache sizes over 10k.  We, unfortunately, are quite demanding 
of guava in this regard so we should transition a few uses of guava to Caffeine:

* Stellar processor cache
* The JoinBolt cache
* The Enrichment Bolt Cache

NOTE: This depends on METRON-1460 aka #940
Test plan pending


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron guava_cache_replacement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #947


commit a4f618a3ad895d62772366e0e93e5b8b37c5c964
Author: cstella <cestella@...>
Date:   2018-02-21T23:59:16Z

Adding parallel enrichment bolt.

commit 99fe0b86005fe04294b3851a17ae3d88f228c5d2
Author: cstella <cestella@...>
Date:   2018-02-22T00:21:06Z

Updating to include trace statements.

commit 79736c6f3fab04d01dd1eb998b308f438003a0e1
Author: cstella <cestella@...>
Date:   2018-02-22T15:35:44Z

Updating with some cleanup

commit cb4a527c9146865dafad1d597ba93032ef398d94
Author: cstella <cestella@...>
Date:   2018-02-22T15:48:11Z

Updating spec.

commit fb4d4383f366776f446e33a422652c3ec1f56bfa
Author: cstella <cestella@...>
Date:   2018-02-22T18:00:36Z

Updating threadpool creation

commit 87ef6a72827c31f8adee42ee71272a32c350bc1f
Author: cstella <cestella@...>
Date:   2018-02-22T18:04:37Z

better docs

commit 6ae9594ee4ae2b4d33e0feca398b527077dac0d3
Author: cstella <cestella@...>
Date:   2018-02-22T18:41:20Z

Updating readme.

commit 82ebc9550d759ea0bd06b48c586fd5e53c6e553a
Author: cstella <cestella@...>
Date:   2018-02-22T20:53:31Z

Better documentation.

commit 235046d3d1

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@arunmahadevan Thanks for chiming in Arun.   I would say that most of the 
enrichment work is I/O bound and we try to avoid it whenever possible with a 
time-evicted LRU cache in front of the enrichments.  We don't always know a 
priori what enrichments users are doing, per se, as their individual 
enrichments may be expressed via stellar.  The threads here are entirely 
managed via the fixed threadpool service in storm and the threadpool is shared 
across all of the executors running in-process on the worker, so we try to 
minimize that.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@ottobackwards I haven't sent an email to the storm team, but I did run the 
PR past a storm committer that I know and asked his opinion prior to submitting 
the PR.  The general answer was something to the effect of `The overall goal 
should be to reduce the network shuffle unless its really required.`  Also, the 
notion of using an external threadpool didn't seem to be fundamentally 
offensive.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-02 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@mraliagha It's definitely a tradeoff.  This is why it is offered as a 
complement to the original split/join topology.  Keep in mind, also, that this 
architecture enables use-cases that the other would prevent or make extremely 
difficult and/or network intensive, such as multi-level stellar statements 
rather than the 2 levels we have now.  We are undergoing some preliminary 
testing in-lab right now, which @nickwallen alluded to, to compare the two 
approaches under at least synthetic load and will report back.

Ultimately this boils down to efficiencies gained by avoiding network hops 
and whether that's going to provide an outsized impact, I think.


---


[GitHub] metron issue #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-01 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/946
  
ah, crap, looked at the wrong setting.  That's what I meant instead of 
`storm.library.path`


---


[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-01 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/946#discussion_r171624949
  
--- Diff: metron-platform/elasticsearch-shaded/pom.xml ---
@@ -31,7 +43,7 @@
 
 
 <groupId>org.elasticsearch.client</groupId>
-<artifactId>transport</artifactId>
+<artifactId>x-pack-transport</artifactId>
--- End diff --

Yep, I concede the point, that is not appropriately licensed via apache and 
we cannot bundle that dependency.


---


[GitHub] metron issue #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-01 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/946
  
@mmiklavc Yeah, I think that's the approach, however, there's a snag.  
Storm requires us to create uber jars, so probably what we want to do is have 
users actually put the x-pack transport client on the storm.library.path.


---


[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-01 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/946#discussion_r171618015
  
--- Diff: metron-platform/elasticsearch-shaded/pom.xml ---
@@ -31,7 +43,7 @@
 
 
 <groupId>org.elasticsearch.client</groupId>
-<artifactId>transport</artifactId>
+<artifactId>x-pack-transport</artifactId>
--- End diff --

I believe this is dual licensed ES commercial and ASLv2.  It comes from 
[here](https://github.com/elastic/elasticsearch#license).


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-28 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@nickwallen Sounds good.  When scale tests are done, can we make sure that 
we also include #944 ?


---


[GitHub] metron issue #938: METRON-1457: Move ASF links to main page in the Metron we...

2018-02-27 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/938
  
+1 as well, looks great



---


[GitHub] metron issue #938: METRON-1457: Move ASF links to main page in the Metron we...

2018-02-27 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/938
  
Where is that powered by apache logo from?  Are we sure it doesn't mean 
that the apache web server serves it up?


---


[GitHub] metron pull request #944: METRON-1463: Adjust the groupings and shuffles in ...

2018-02-27 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/944

METRON-1463: Adjust the groupings and shuffles in enrichment to be more 
efficient

## Contributor Comments
Currently there are some deficiencies in our grouping approach in the 
enrichment topology:

* We have field grouping by key in places where we need LOCAL_OR_SHUFFLE 
groupings
* We have shuffle groupings in places where we need LOCAL_OR_SHUFFLE 
groupings
* We have field groupings by key in places where we need field grouping by 
message (specifically in the connections from the splitter to the enrichments).
  * Field grouping by key, as we had before, ensured that we had fewer 
cache hits as opposed to field grouping by message.

To test this, just run things up in full-dev and ensure that data continues 
to flow.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron better_grouping_in_splitjoin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/944.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #944


commit dd8821af912071e357c42c852361292c7c2c655c
Author: cstella <cestella@...>
Date:   2018-02-27T14:28:00Z

Adjusted groupings to be local or shuffle




---


[GitHub] metron issue #934: METRON-1423: Ambari work to handle Solr configuration

2018-02-26 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/934
  
+1 by inspection


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-02-23 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
The current architecture is described in the diagram below:

![Enrichment architecture](https://github.com/apache/metron/raw/master/metron-platform/metron-enrichment/enrichment_arch.png)

In short, for each message each splitter will
* inspect the configs for the sensor 
* For each sensor, extract the fields required for enrichment and send them 
to the appropriate enrichment bolt (e.g. hbase, geo, stellar)
  * If one enrichment enriches k fields, then k messages will be sent to 
the enrichment bolt
  * In the case of stellar, each stellar subgroup will be a separate message
  * the original message is sent directly to the join bolt
* The enrichment bolts do the enrichment and send the additional fields and 
values to the original message
* The join bolt will asynchronously collect the subresults and join them 
with the original message
  * The join bolt has an LRU cache to hold subresults until all results 
arrive
  * Tuning performance involves tuning this cache (max size and time until 
eviction)
  * Tuning this can be complex because it has to be large enough to handle 
spikes in traffic
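
To make the tuning problem concrete, here is a minimal sketch of a join buffer with the two knobs mentioned above (max size and time until eviction). It is not Metron's join bolt: the class, its fields, and the choice to measure age from creation time are assumptions made for illustration, and the clock is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy join buffer. All names are hypothetical; this is not Metron's join bolt. */
class JoinCacheSketch {

  /** Partial state for one message while its subresults trickle in. */
  static final class Pending {
    final Map<String, Object> fields = new HashMap<>();
    final long createdMillis;
    int received = 0;

    Pending(long createdMillis) {
      this.createdMillis = createdMillis;
    }
  }

  private final int maxSize;
  private final long ttlMillis;
  private final int expectedParts;
  private int dropped = 0;
  // accessOrder=true makes iteration start at the least recently used entry.
  private final LinkedHashMap<String, Pending> cache = new LinkedHashMap<>(16, 0.75f, true);

  JoinCacheSketch(int maxSize, long ttlMillis, int expectedParts) {
    if (maxSize < 1 || expectedParts < 1) {
      throw new IllegalArgumentException("maxSize and expectedParts must be positive");
    }
    this.maxSize = maxSize;
    this.ttlMillis = ttlMillis;
    this.expectedParts = expectedParts;
  }

  /** Adds one subresult; returns the joined message once all parts arrived, else null. */
  Map<String, Object> add(String guid, Map<String, Object> part, long nowMillis) {
    evictExpired(nowMillis);
    Pending pending = cache.get(guid);
    if (pending == null) {
      if (cache.size() >= maxSize) {
        // Cache is full: the least recently used partial message is lost.
        Iterator<String> eldest = cache.keySet().iterator();
        eldest.next();
        eldest.remove();
        dropped++;
      }
      pending = new Pending(nowMillis);
      cache.put(guid, pending);
    }
    pending.fields.putAll(part);
    pending.received++;
    if (pending.received == expectedParts) {
      cache.remove(guid);
      return pending.fields;
    }
    return null;
  }

  /** Drops partial messages older than the configured time to live. */
  private void evictExpired(long nowMillis) {
    Iterator<Pending> it = cache.values().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().createdMillis > ttlMillis) {
        it.remove();
        dropped++;
      }
    }
  }

  /** Number of partial messages that were evicted before they could be joined. */
  int dropped() {
    return dropped;
  }
}
```

In this sketch a burst of new messages larger than `maxSize` evicts entries that are still waiting for subresults, which is the failure mode that makes sizing the cache for traffic spikes hard.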




---


[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6

2018-02-23 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/933
  
piling on, +1 by inspection



---


[GitHub] metron issue #934: METRON-1423: Ambari work to handle Solr configuration

2018-02-23 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/934
  
Chiming in late.  I agree that we should not have an explicit dependency on 
an indexing Mpack, even one not of our own construction.  I think people will 
have a lot of different ways to install solr and Metron's mpack should just be 
configured to point to an existing solr instance.

I would generally be in favor of adding support for:
* solr.commitPerBatch
* solr.commit.waitSearcher
* solr.commit.waitFlush
* solr.commit.soft
* solr.collection
* solr.http.config

but sensible defaults are chosen there and people can adjust them in the 
global config, so I think they can wait for a follow-on.  Some of them may be 
problematic to encode in a UI (solr.http.config is a map, for instance), but 
most of them would be pretty trivial.  Frankly, I'm a bit hesitant to give 
people the ability to screw with transaction details easily.


---


[GitHub] metron pull request #934: METRON-1423: Ambari work to handle Solr configurat...

2018-02-23 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/934#discussion_r170269639
  
--- Diff: metron-platform/metron-solr/src/main/scripts/create_collection.sh ---
@@ -0,0 +1,27 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+METRON_VERSION=${project.version}
+METRON_HOME=/usr/metron/$METRON_VERSION
+SOLR_VERSION=6.6.2
--- End diff --

I notice that we're depending on maven artifacts of 6.6.0 and we set the 
solr version to 6.6.2 here, is there a reason for that?  If we can align them, 
then we could replace this line with SOLR_VERSION=`${global_solr_version}`

This question applies for the rest of the shell scripts too


---


[GitHub] metron pull request #940: Single bolt split join poc

2018-02-22 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r170089790
  
--- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java ---
@@ -0,0 +1,323 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
--- End diff --

Ok, I went through and did more rigorous documentation throughout the new 
classes.  Let me know if it makes sense or if there are still issues.


---


[GitHub] metron pull request #940: Single bolt split join poc

2018-02-22 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r170057327
  
--- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java ---
@@ -0,0 +1,323 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
--- End diff --

Yeah, good call.


---


[GitHub] metron pull request #940: Single bolt split join poc

2018-02-22 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/940

Single bolt split join poc

## Contributor Comments
There are some deficiencies to the split/join topology.

* It's hard to reason about
  * Understanding the latency of enriching a message requires looking at 
    multiple bolts that each give summary statistics
  * The join bolt's cache is really hard to reason about when performance 
    tuning
  * During spikes in traffic, you can overload the join bolt's cache and drop 
    messages if you aren't careful
  * In general, it's hard to associate a cache size and a duration kept in 
    cache with throughput and latency
* There are a lot of network hops per message
* Right now we are stuck at 2 stages of transformations being done 
  (enrichment and threat intel).  It's very possible that you might want stellar 
  enrichments to depend on the output of other stellar enrichments.  In order to 
  implement this in split/join you'd have to create a cycle in the storm topology

I propose that we move to a model where we do enrichments in a single bolt 
in parallel using a static threadpool (e.g. multiple executors in the same 
worker process would share the threadpool).  In all other ways, this would be 
backwards compatible: a transparent drop-in for the existing enrichment 
topology.
There are some pros/cons about this too:
* Pro
  * Easier to reason about from an individual message perspective
  * Architecturally decoupled from Storm
    * This sets us up if we want to consider other streaming technologies
  * Fewer bolts
    * spout -> enrichment bolt -> threatintel bolt -> output bolt
  * Way fewer network hops per message
    * currently 2n+1 where n is the number of enrichments used (if using 
      stellar subgroups, each subgroup is a hop)
  * Easier to reason about from a performance perspective
    * We trade cache size and eviction timeout for threadpool size
  * We set ourselves up to have stellar subgroups with dependencies
    * i.e. stellar subgroups that depend on the output of other subgroups
    * If we do this, we can shrink the topology to just spout -> 
      enrichment/threat intel -> output
* Con
  * We can no longer tune stellar enrichments independent from HBase 
    enrichments
    * To be fair, with enrichments moving to stellar, this is the case in 
      the split/join approach too
  * No idea about performance
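
As a rough illustration of the proposed model, the sketch below runs every enrichment for one message on a JVM-wide fixed threadpool and joins the results in-process. It is not the `ParallelEnricher` from this PR: the class name, the pool size of 4, the `constant` helper, and the sample field names are all hypothetical stand-ins. The `splitJoinHops` helper simply encodes the 2n+1 figure quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Toy parallel enricher. Names are hypothetical; this is not Metron's implementation. */
class ParallelEnrichSketch {

  // One pool per JVM, shared by every caller in the process (assumed size of 4).
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
    Thread t = new Thread(r, "enrichment-sketch");
    t.setDaemon(true); // daemon threads let the JVM exit when the demo finishes
    return t;
  });

  /** Hypothetical stand-in for a real enrichment such as a geo or HBase lookup. */
  static Function<Map<String, Object>, Map<String, Object>> constant(String field, Object value) {
    return message -> Collections.singletonMap(field, value);
  }

  /** Runs every enrichment against the message in parallel and merges the new fields. */
  static Map<String, Object> enrich(
      Map<String, Object> message,
      List<Function<Map<String, Object>, Map<String, Object>>> enrichments) {
    // Each task reads an unmodifiable snapshot so enrichments cannot race on the input.
    Map<String, Object> snapshot = Collections.unmodifiableMap(new HashMap<>(message));
    List<CompletableFuture<Map<String, Object>>> futures = new ArrayList<>();
    for (Function<Map<String, Object>, Map<String, Object>> enrichment : enrichments) {
      futures.add(CompletableFuture.supplyAsync(() -> enrichment.apply(snapshot), POOL));
    }
    Map<String, Object> joined = new HashMap<>(message);
    for (CompletableFuture<Map<String, Object>> future : futures) {
      joined.putAll(future.join()); // the join is a local wait, with no cache to size
    }
    return joined;
  }

  /** Hop count quoted in the PR description for split/join with n enrichments. */
  static int splitJoinHops(int enrichments) {
    return 2 * enrichments + 1;
  }

  public static void main(String[] args) {
    Map<String, Object> message = new HashMap<>();
    message.put("ip_src_addr", "10.0.0.1");
    Map<String, Object> enriched = enrich(message, Arrays.asList(
        constant("geo.country", "US"),
        constant("threat.is_alert", false)));
    System.out.println(enriched);
    System.out.println("split/join hops for 3 enrichments: " + splitJoinHops(3));
  }
}
```

The design point this tries to show is the trade described in the list: the only sizing decision left is the threadpool, because partial results never leave the process and so never need to be held in a cache.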

What I propose is to submit a PR that will deliver an alternative, 
completely backwards compatible topology for enrichment that you can use by 
adjusting the `start_enrichment_topology.sh` script to use 
`remote-unified.yaml` instead of `remote.yaml`.  If we live with it for a while 
and have some good experiences with it, maybe we can consider retiring the old 
enrichment topology.

To test this, spin up vagrant and edit 
`$METRON_HOME/bin/start_enrichment_topology.sh` to use `remote-unified.yaml` 
instead of `remote.yaml`.  Restart enrichment and you should see a topology 
that looks something like:

![image](https://user-images.githubusercontent.com/540359/36556636-e0ae092e-17d3-11e8-9e45-5160b4f23451.png)



## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
  ```

- [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [ ] If ad

[GitHub] metron issue #935: METRON-1386: Fix Metron Website Required Links

2018-02-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/935
  
This looks great!  +1 by inspection.


---


[GitHub] metron issue #937: METRON-1455: Patch and Replace methods in the REST Update...

2018-02-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/937
  
I verified this in full dev with kerberos as well.


---


[GitHub] metron pull request #937: METRON-1455: Patch and Replace methods in the REST...

2018-02-14 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/937

METRON-1455: Patch and Replace methods in the REST UpdateController return 
400

## Contributor Comments
Shading and relocating the Jackson library had the unexpected side effect of 
breaking patch requests in the REST API.  The solution is that we moved the 
remaining patch functionality inside `JSONUtils` and stopped relying on Jackson 
specific classes in our external API (e.g. `PatchRequest`).

To exercise this behavior, from the alerts UI, click on a message and 
click "Escalate."  From the console ensure that the patch request returns a 200 
code rather than a 400 or 500 code.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?


 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron METRON-1455

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/937.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #937


commit cce8fda5557d9360e43de8a12443fa67d79e3681
Author: cstella <cestella@...>
Date:   2018-02-14T19:15:38Z

METRON-1455: Patch and Replace methods in the REST UpdateController return 
400




---


[GitHub] metron pull request #929: METRON-1448: Update SolrWriter to conform to new c...

2018-02-09 Thread cestella
Github user cestella closed the pull request at:

https://github.com/apache/metron/pull/929


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-09 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
I updated the instructions and the `solr.zookeeper` business is legacy 
naming, but I think it does fit since it's specific to zookeeper rather than 
connecting to ES master.  I dunno, that's just what I told myself ;)


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-08 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
Ok, I wrote up a detailed testing plan to get this running in full-dev.  I 
want to add a couple of unit tests around the new properties that I added, but 
at that point, this is feature complete and ready to review.


---


[GitHub] metron pull request #929: METRON-1448: Update SolrWriter to conform to new c...

2018-02-07 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/929#discussion_r166798136
  
--- Diff: metron-platform/metron-solr/src/main/java/org/apache/metron/solr/writer/SolrWriter.java ---
@@ -33,17 +39,19 @@
 
 import java.io.IOException;
 import java.io.Serializable;
-import java.util.List;
-import java.util.Map;
+import java.lang.invoke.MethodHandles;
+import java.util.*;
--- End diff --

Ugh, it slipped back in again. Ok, removing it.


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
Ok, I updated the configs to expose all the settings that I could find for 
committing and documented them.  By default we still operate safely, but 
now you can use soft commits rather than durable commits if you so desire and 
wish to live dangerously.
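
To make the "safe by default" behavior concrete, here is a minimal Java sketch of how such settings might resolve to a commit mode.  It is an illustration under stated assumptions, not the code in this PR: only `solr.commitPerBatch` is named in this thread, while the `solr.commit.soft` key, the `CommitSettings` class, and the `Mode` enum are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CommitSettings {
  /** Hypothetical commit modes; the names are illustrative only. */
  enum Mode { DURABLE, SOFT, NONE }

  /**
   * Resolves a commit mode from a flat config map.
   * "solr.commitPerBatch" is named in this thread; "solr.commit.soft" is a
   * made-up key used only for this sketch.
   */
  static Mode fromConfig(Map<String, Object> config) {
    boolean commitPerBatch = asBoolean(config.get("solr.commitPerBatch"), true);
    boolean soft = asBoolean(config.get("solr.commit.soft"), false);
    if (!commitPerBatch) {
      return Mode.NONE;   // the writer never commits explicitly
    }
    return soft ? Mode.SOFT : Mode.DURABLE;
  }

  /** Missing values fall back to the supplied default. */
  static boolean asBoolean(Object value, boolean defaultValue) {
    return value == null ? defaultValue : Boolean.parseBoolean(value.toString());
  }

  public static void main(String[] args) {
    Map<String, Object> config = new HashMap<>();
    System.out.println("no settings: " + fromConfig(config));
    config.put("solr.commit.soft", true);
    System.out.println("soft requested: " + fromConfig(config));
  }
}
```

The point of the sketch is only that an empty configuration resolves to the durable mode, so anyone who wants the riskier behavior has to ask for it explicitly.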


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
LOL, nah, I was just being crotchety.  Let me see what I can do and you 
tell me what you think :)


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
Actually, I can make an option to let you do a soft commit instead of a 
hard commit if that'll be an extra layer of configuration.


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
You can set it to not manually commit (set `solr.commitPerBatch` to `false` 
and no committing happens), but you risk losing data if a worker dies.  
Honestly, you want a durable commit with an fsync before you ack the tuples in 
a batch; otherwise you're courting data loss.  This is the same strategy we use 
for ES and HDFS (though commit there is an fsync).  That being said, I'm 
sensitive to the performance issues people may have around that, so I let 
people turn it off with a strong warning in the docs (also this was legacy 
behavior in the SolrWriter).
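
A minimal Java sketch of the ordering described above: add the batch, commit, and only then hand the tuples back to be acked.  The `Index` interface and `InMemoryIndex` class are hypothetical stand-ins for a Solr client, not the SolrJ API and not the SolrWriter code in this PR; only the `commitPerBatch` flag mirrors a name from this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommitBeforeAck {
  /** Hypothetical stand-in for an index client; not the SolrJ API. */
  interface Index {
    void add(List<String> docs);
    void commit();
  }

  /** In-memory fake that separates buffered documents from durable ones. */
  static class InMemoryIndex implements Index {
    final List<String> buffered = new ArrayList<>();
    final List<String> durable = new ArrayList<>();

    @Override public void add(List<String> docs) { buffered.addAll(docs); }

    @Override public void commit() {
      durable.addAll(buffered);
      buffered.clear();
    }
  }

  /**
   * Writes one batch and returns the tuples the caller may now ack.
   * With commitPerBatch=true the documents are durable before the ack;
   * with false they are still only buffered when the ack happens.
   */
  static List<String> writeBatch(Index index, List<String> batch, boolean commitPerBatch) {
    index.add(batch);
    if (commitPerBatch) {
      index.commit();
    }
    return new ArrayList<>(batch);
  }

  public static void main(String[] args) {
    InMemoryIndex safe = new InMemoryIndex();
    writeBatch(safe, Arrays.asList("a", "b"), true);
    System.out.println("durable docs, commitPerBatch=true: " + safe.durable.size());

    InMemoryIndex risky = new InMemoryIndex();
    writeBatch(risky, Arrays.asList("a", "b"), false);
    System.out.println("durable docs, commitPerBatch=false: " + risky.durable.size());
  }
}
```

In the second case the tuples have been acked while the documents sit only in the buffer, which is the window in which a dying worker loses data.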


---


[GitHub] metron issue #929: METRON-1448: Update SolrWriter to conform to new collecti...

2018-02-07 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/929
  
What do you mean by "auto-commit"?  By default it will commit at the batch 
level as the other writers do.  You can turn that behavior off and it'll commit 
when the Solr client decides, at the cost of possibly losing data.

Regarding solr vs solrcloud, to my understanding solr is the product and 
solrcloud is a way of running Solr.  The other way is solr standalone.  
Originally we named things after the product, which I think is correct, and 
have made explicit that we support Solr running in Solr Cloud mode in the docs.


---


[GitHub] metron pull request #929: METRON-1448: Update SolrWriter to conform to new c...

2018-02-07 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/929#discussion_r166682417
  
--- Diff: 
metron-platform/metron-solr/src/main/java/org/apache/metron/solr/writer/SolrWriter.java
 ---
@@ -33,17 +39,19 @@
 
 import java.io.IOException;
 import java.io.Serializable;
-import java.util.List;
-import java.util.Map;
+import java.lang.invoke.MethodHandles;
+import java.util.*;
--- End diff --

You got it :)


---


[GitHub] metron pull request #929: METRON-1448: Update SolrWriter to conform to new c...

2018-02-07 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/929

METRON-1448: Update SolrWriter to conform to new collection strategy

## Contributor Comments
Currently the SolrWriter presumes a single collection to be written to.  
The new collection strategy for Solr implies a collection per sensor.  Also, 
there are a few rough edges in the writer which could stand smoothing:
* By default, we use Solr's implicit commit mechanism, rather than 
committing at the batch granularity.  This may result in lost data on worker 
failure.
* We do not use the batch add API, but rather add message by message.
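
As a rough illustration of the collection-per-sensor idea combined with batch adds, the Java sketch below groups a mixed batch by sensor so that each group could be sent to its own collection in a single add call.  The `Message` type and `groupBySensor` helper are hypothetical, and using the sensor type directly as the collection name is an assumption made for the example, not something this PR states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerSensorBatcher {
  /** A message tagged with the sensor that produced it (hypothetical type). */
  static class Message {
    final String sensorType;
    final String body;

    Message(String sensorType, String body) {
      this.sensorType = sensorType;
      this.body = body;
    }
  }

  /**
   * Groups a mixed batch by sensor type, preserving arrival order.
   * Assumption: the collection name equals the sensor type.
   */
  static Map<String, List<String>> groupBySensor(List<Message> messages) {
    Map<String, List<String>> byCollection = new LinkedHashMap<>();
    for (Message m : messages) {
      byCollection.computeIfAbsent(m.sensorType, k -> new ArrayList<>()).add(m.body);
    }
    return byCollection;
  }

  public static void main(String[] args) {
    List<Message> batch = Arrays.asList(
        new Message("bro", "m1"),
        new Message("yaf", "m2"),
        new Message("bro", "m3"));
    // One add call per collection instead of one per message.
    groupBySensor(batch).forEach((collection, docs) ->
        System.out.println(collection + ": " + docs.size() + " docs in one add"));
  }
}
```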

Testing plan pending.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that the format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron SOLR_writer_mod

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/929.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #929


commit 6bb30af9d2005414e3ee44c0bdb0ea14540ce13c
Author: cstella <cestella@...>
Date:   2018-02-01T21:33:56Z

METRON-1441: Create complementary Solr schemas for the main sensors

commit f4ff0c401eff23d9c1b2ca3b264bd9b0d4e8f381
Author: cstella <cestella@...>
Date:   2018-02-01T21:47:12Z

Updating dao

commit 7e2ecb0f2f55ea16529128fec14920bc2a546b07
Author: cstella <cestella@...>
Date:   2018-02-02T21:43:38Z

Migrated data to files, renamed test and added yaf and error.

commit 2aacd202ff1a2ebcbeb30300b30d080391cfe1cf
Author: cstella <cestella@...>
Date:   2018-02-02T21:45:08Z

Merge branch 'feature/METRON-1416-upgrade-solr' into SOLR_METRON-1441

commit 2e32e7ea4ef8cace764394c1dec693d8385a6b9a
Author: cstella <cestella@...>
Date:   2018-02-02T21:50:06Z

Added to readme.

commit e2901d4bd4b9787f668c2dccd2e4f8aa53a926d7
Author: cstella <cestella@...>
Date:   2018-02-05T14:39:31Z

Updating error to have a guid and removed docValues=true for bytes type.

commit 3c4319ec4581fdb259a697b548a267225316874a
Author: cstella <cestella@...>

[GitHub] metron pull request #922: METRON-1441: Create complementary Solr schemas for...

2018-02-07 Thread cestella
Github user cestella closed the pull request at:

https://github.com/apache/metron/pull/922


---

