[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/940
  
Maybe the issue has to do with our keys, and their distribution as the size 
get's larger?  Maybe when we get larger sizes we get more collisions and end up 
calling equals() more or something.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/940
  
This should have the equiv. diagram and documentation ( i believe as shown 
above ) to the original split join strategy.



---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383791
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java
 ---
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+
+/**
+ * Enrichment strategy.  This interface provides a mechanism to interface 
with the enrichment config and any
+ * post processing steps that are needed to be done after-the-fact.
+ *
+ * The reasoning behind this is that the key difference between 
enrichments and threat intel is that they pull
+ * their configurations from different parts of the SensorEnrichmentConfig 
object and as a post-join step, they differ
+ * slightly.
+ *
+ */
+public interface Strategy {
+  Constants.ErrorType getErrorType();
--- End diff --

done


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383810
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java
 ---
@@ -0,0 +1,415 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * This bolt is a unified enrichment/threat intel bolt.  In contrast to 
the split/enrich/join
+ * bolts above, this handles the entire enrichment lifecycle in one bolt 
using a threadpool to
+ * enrich in parallel.
+ *
+ * From an architectural perspective, this is a divergence from the 
polymorphism based strategy we have
+ * used in the split/join bolts.  Rather, this bolt is provided a strategy 
to use, either enrichment or threat intel,
+ * through composition.  This allows us to move most of the implementation 
into components independent
+ * from Storm.  This will greater facilitate reuse.
+ */
+public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt {
+
+  public static class Perf {} // used for performance logging
+  private PerformanceLogger perfLog; // not static bc multiple bolts may 
exist in same worker
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String STELLAR_CONTEXT_CONF = "stellarContext";
+
+  /**
+   * The number of threads in the threadpool.  One threadpool is created 
per process.
+   * This is a topology-level configuration
+   */
+  public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = 
"metron.threadpool.size";
+  /**
+   * The type of threadpool to create. This is a topology-level 
configuration.
+   */
+  public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = 
"metron.threadpool.type";
+
+  /**
+   * The enricher implementation to use.  This will do the parallel 
enrichment via a thread pool.
+   */
+  protected ParallelEnricher enricher;
+
+  /**
+   * The strategy to use for this enrichment bolt.  Practically speaking 
this is either
+   * enrichment or threat 

[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
@nickwallen Ok, I refactored the abstraction to separate some concerns, 
name things a bit, and collapse some of the more onerous abstractions.  Also 
updated javadocs.  

Can you give it another look and see what you think?  We probably should 
also give it another smoketest in the lab to make sure I didn't do something 
dumb.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172383754
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

I decided that this is too onerous of an abstraction and rethought it a 
bit.  Give it another look, please.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172377136
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

I might be answering a question that you're not asking, but this bit of 
awkwardness arises because we have merged the concepts of threat intel and 
enrichment, which differ really only in post-processing.  The approach 
presented here, in contrast to the inheritance-based approach in the bolts, 
allows for an abstraction through composition whereby we localize all the 
interactions with the sensor enrichment config in a strategy rather than bind 
the abstraction to Storm, our distributed processing engine.  That is the 
rationale behind this approach at least.


---


[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6

2018-03-05 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/933
  
Oh, I guess we need to reaffirm. Yes, +1 still stands.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172373203
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java
 ---
@@ -0,0 +1,415 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * This bolt is a unified enrichment/threat intel bolt.  In contrast to 
the split/enrich/join
+ * bolts above, this handles the entire enrichment lifecycle in one bolt 
using a threadpool to
+ * enrich in parallel.
+ *
+ * From an architectural perspective, this is a divergence from the 
polymorphism based strategy we have
+ * used in the split/join bolts.  Rather, this bolt is provided a strategy 
to use, either enrichment or threat intel,
+ * through composition.  This allows us to move most of the implementation 
into components independent
+ * from Storm.  This will greater facilitate reuse.
+ */
+public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt {
+
+  public static class Perf {} // used for performance logging
+  private PerformanceLogger perfLog; // not static bc multiple bolts may 
exist in same worker
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String STELLAR_CONTEXT_CONF = "stellarContext";
+
+  /**
+   * The number of threads in the threadpool.  One threadpool is created 
per process.
+   * This is a topology-level configuration
+   */
+  public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = 
"metron.threadpool.size";
+  /**
+   * The type of threadpool to create. This is a topology-level 
configuration.
+   */
+  public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = 
"metron.threadpool.type";
+
+  /**
+   * The enricher implementation to use.  This will do the parallel 
enrichment via a thread pool.
+   */
+  protected ParallelEnricher enricher;
+
+  /**
+   * The strategy to use for this enrichment bolt.  Practically speaking 
this is either
+   * enrichment or threat 

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172369461
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

This is a strategy pattern using an enum.  The purpose of this class is to 
resolve the specific strategies possible.  It's broadly in line with other 
strategy patterns (e.g. Extractors). 


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172369480
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java
 ---
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+
+/**
+ * Enrichment strategy.  This interface provides a mechanism to interface 
with the enrichment config and any
+ * post processing steps that are needed to be done after-the-fact.
+ *
+ * The reasoning behind this is that the key difference between 
enrichments and threat intel is that they pull
+ * their configurations from different parts of the SensorEnrichmentConfig 
object and as a post-join step, they differ
+ * slightly.
+ *
+ */
+public interface Strategy {
+  Constants.ErrorType getErrorType();
--- End diff --

Sure thing.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172359339
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java
 ---
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+
+/**
+ * Enrichment strategy.  This interface provides a mechanism to interface 
with the enrichment config and any
+ * post processing steps that are needed to be done after-the-fact.
+ *
+ * The reasoning behind this is that the key difference between 
enrichments and threat intel is that they pull
+ * their configurations from different parts of the SensorEnrichmentConfig 
object and as a post-join step, they differ
+ * slightly.
+ *
+ */
+public interface Strategy {
+  Constants.ErrorType getErrorType();
--- End diff --

Can we javadoc each method?  This seems like an important interface.


---


[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172353404
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java
 ---
@@ -0,0 +1,415 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.bolt;
+
+import org.apache.metron.common.Constants;
+import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt;
+import org.apache.metron.common.configuration.ConfigurationType;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import org.apache.metron.common.error.MetronError;
+import org.apache.metron.common.performance.PerformanceLogger;
+import org.apache.metron.common.utils.ErrorUtils;
+import org.apache.metron.common.utils.MessageUtils;
+import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase;
+import org.apache.metron.enrichment.configuration.Enrichment;
+import org.apache.metron.enrichment.interfaces.EnrichmentAdapter;
+import org.apache.metron.enrichment.parallel.EnrichmentContext;
+import org.apache.metron.enrichment.parallel.EnrichmentStrategies;
+import org.apache.metron.enrichment.parallel.ParallelEnricher;
+import org.apache.metron.enrichment.parallel.WorkerPoolStrategy;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.metron.stellar.dsl.StellarFunction;
+import org.apache.metron.stellar.dsl.StellarFunctions;
+import org.apache.storm.task.OutputCollector;
+import org.apache.storm.task.TopologyContext;
+import org.apache.storm.topology.OutputFieldsDeclarer;
+import org.apache.storm.tuple.Fields;
+import org.apache.storm.tuple.Tuple;
+import org.apache.storm.tuple.Values;
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.json.simple.parser.ParseException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+/**
+ * This bolt is a unified enrichment/threat intel bolt.  In contrast to 
the split/enrich/join
+ * bolts above, this handles the entire enrichment lifecycle in one bolt 
using a threadpool to
+ * enrich in parallel.
+ *
+ * From an architectural perspective, this is a divergence from the 
polymorphism based strategy we have
+ * used in the split/join bolts.  Rather, this bolt is provided a strategy 
to use, either enrichment or threat intel,
+ * through composition.  This allows us to move most of the implementation 
into components independent
+ * from Storm.  This will greater facilitate reuse.
+ */
+public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt {
+
+  public static class Perf {} // used for performance logging
+  private PerformanceLogger perfLog; // not static bc multiple bolts may 
exist in same worker
+
+  protected static final Logger LOG = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  public static final String STELLAR_CONTEXT_CONF = "stellarContext";
+
+  /**
+   * The number of threads in the threadpool.  One threadpool is created 
per process.
+   * This is a topology-level configuration
+   */
+  public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = 
"metron.threadpool.size";
+  /**
+   * The type of threadpool to create. This is a topology-level 
configuration.
+   */
+  public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = 
"metron.threadpool.type";
+
+  /**
+   * The enricher implementation to use.  This will do the parallel 
enrichment via a thread pool.
+   */
+  protected ParallelEnricher enricher;
+
+  /**
+   * The strategy to use for this enrichment bolt.  Practically speaking 
this is either
+   * enrichment or threat 

[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/940#discussion_r172363362
  
--- Diff: 
metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java
 ---
@@ -0,0 +1,79 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.enrichment.parallel;
+
+import org.apache.metron.common.Constants;
+import 
org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig;
+import 
org.apache.metron.common.configuration.enrichment.handler.ConfigHandler;
+import org.apache.metron.enrichment.bolt.CacheKey;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+
+import java.util.Map;
+import java.util.concurrent.Executor;
+
+public enum EnrichmentStrategies implements Strategy {
--- End diff --

I don't understand the purpose of this class.  Why have an 
`EnrichmentStrategy`, a `ThreatIntelStrategy`, and `EnrichmentStrategies`?


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
Ahhh, that makes sense.  I bet we were getting killed by small allocations 
in the caching layer.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Caffeine doesn't allocate on read, so that would make sense. I saw a [25x 
boost](https://github.com/google/guava/issues/2063#issuecomment-107169736) 
(compared to 
[current](https://github.com/google/guava/issues/2063#issue-82444927)) when 
porting the buffers to Guava.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
I actually suspect GC as well.  We adjusted the garbage collector to the 
G1GC and saw throughput gains, but not nearly the kinds of gains as we got with 
a drop-in of Caffeine to replace Guava.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Interesting. Then I guess the size must trigger the read bottleneck as 
larger than writes. Perhaps it is incurring a lot more GC overhead that causes 
more collections? The CLQ additions requires allocating a new queue node. That 
and the cache entry probably get promoted to old gen due to the high churn 
rate, causing everything to slow down. Probably isn't too interesting to 
investigate vs swapping libraries :)


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
In this case, the loader isn't doing anything terribly expensive, though it 
may in the future (incur a hbase get or some more expensive computation).


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to 
record its size, per segment. When a read occurs, it records that in the queue 
and then drains it under the segment's lock (via tryLock) to replay the events. 
This is similar to Caffeine, which uses optimized structures instead. I 
intended the CLQ & counter as baseline scaffolding for replacement, as it is an 
obvious bottleneck, but I could never get it replaced despite advocating for 
it. The penalty of draining the buffers is amortized, but unfortunately this 
buffer isn't capped.

Since there would be a higher hit rate with a larger cache, the reads would 
be recorded more often. Perhaps contention there and the penalty of draining 
the queue is more observable than a cache miss. That's still surprising since a 
cache miss is usually more expensive I/O. Is the loader doing expensive work in 
your case?

Caffeine gets around this problem by using more optimal buffers and being 
lossy (on reads only) if it can't keep up. By default it delegates the 
amortized maintenance work to a ForkJoinPool to avoid user-facing latencies, 
since you'll want those variances to be tight. Much of that can be back ported 
onto Guava for a nice boost.


---


[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/924
  
+1, sorry!


---


[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...

2018-03-05 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/924
  
@cestella Bump


---


[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6

2018-03-05 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/933
  
@mmiklavc @cestella Bump


---


[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-05 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/946#discussion_r172351786
  
--- Diff: pom.xml ---
@@ -97,7 +97,7 @@
 
${base_hadoop_version}
 ${base_hbase_version}
 ${base_flume_version}
-5.6.2
+5.6.7
--- End diff --

Just noticed this - what's the motivation for changing ES versions as part 
of this PR?


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
We actually did increase the concurrency level for guava to 64; that is 
what confused us as well.  The hash code is mostly standard, should be evenly 
distributed (the key is pretty much a POJO).


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to 
not abuse memory in low concurrent situations. You probably want to increase it 
to 64 in a heavy workload, which has a ~4x throughput gain on reads. It won't 
scale much higher, since it has internal bottlenecks and I could never get 
patches reviewed to fix those.

I've only noticed overall throughput be based on threads, and never 
realized there was a capacity constraint to its performance. One should expect 
some due to the older hash table design resulting in more collisions, whereas 
CHMv8 does much better there. Still, I would have expected it to even out 
enough unless have a bad hash code?


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
The interesting thing that we found was that guava seems to be doing poorly 
when the # of items in the cache gets large.  When we scaled the test down (830 
distinct IP addresses chosen randomly and sent in at a rate of 200k events per 
second with a cache size of 100) kept up but scaling the test up (300k distinct 
ip addresses chosen randomly and sent in at a rate of 200k events per second 
with a cache size of 100k) didn't. 


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
That makes sense. A uniform distribution will, of course, degrades all 
policies to random replacement so the test is then about how well the 
implementations handle concurrency. Most often caches exhibit a Zipfian 
distribution (80-20 rule), so our bias towards frequency is a net gain. We have 
observed a few rare cases where frequency is a poor signal and LRU is optimal, 
and we are exploring adaptive techniques to dynamically tune the cache based on 
the workload's characteristics. These cases don't seem to occur in many 
real-world scenarios that we know of, but it is always nice to know what users 
are experiencing and how much better (or worse) we perform than a standard LRU 
cache.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
We were being purposefully unkind to the cache in the tests.  The load 
simulation chose a IP address at random to present, so each IP had an equal 
probability of being selected.  Whereas, in real traffic, we expect a coherent 
working set.  Not sure of the exact hit rates, though.


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread ben-manes
Github user ben-manes commented on the issue:

https://github.com/apache/metron/pull/940
  
Do you know what the hit rates were, for the same data set, between Guava 
and Caffeine? The caches use different policies so it is always interesting to 
see how the handle given workloads. As we continue to refine our adaptive 
algorithm W-TinyLFU, its handy to know what types of workloads to investigate. 
(P.S. We have a simulator for re-running persisted traces if useful for your 
tuning)


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/940
  
I completed some fairly extensive performance testing comparing this new 
Unified topology against the existing Split-Join implementation.  The 
difference was dramatic. 

- The Unified topology _performed roughly 3.4 times faster than Split-Join._

Both topologies in this side-by-side test included the same fixes, 
including the Guava cache problem fixed in #947. The tests included two 
enrichments:
* GeoIP enrichment; `geo := GEO_GET(ip_dst_addr)`
* Compute-only Stellar enrichment; `local := IN_SUBNET(ip_dst_addr, 
'192.168.0.0/24')`

The number one driver of performance is the cache hit rate, which is 
heavily dependent on what your data looks-like.  With these enrichments, that's 
driven by how varied the `ip_dst_addr` is in the data.  

I tested both of these topologies with different sets of data intended to 
either increase or decrease that cache hit rate.  The differences between the 
two topologies were fairly consistent across the different data sets. 

When running these topologies, reasonably well-tuned, on the same data, I 
was able to consistently maintain 70,000 events per second with the Split/Join 
topology.  In the same environment, I was able to maintain 312,000 events per 
second using the Unified topology.  

The raw throughput numbers are relative and depend on how much hardware you 
are willing to throw at the problem.  I was running on 3 nodes dedicated to 
running the Enrichment topology only.  But with the same data, on the same 
hardware, the difference was 3.4 times.  That's big.

Pushing as much as you can into a single executor and avoiding network hops 
is definitely the way to go here.



---


[GitHub] metron pull request #941: METRON-1355: Convert metron-elasticsearch to new i...

2018-03-05 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/941#discussion_r172318991
  
--- Diff: metron-contrib/metron-docker-e2e/README.md ---
@@ -0,0 +1,94 @@
+
+# Metron Docker
+
+Metron Docker E2E is a [Docker 
Compose](https://docs.docker.com/compose/overview/) application that serves as 
a backend to integration tests.
+
+Metron Docker includes these images that have been customized for Metron:
+
+  - Kafka
+  - Zookeeper
+  - Elasticsearch
+  - Metron REST
+  - Metron UIs
+
+Setup
+-
+
+Install [Docker for Mac](https://docs.docker.com/docker-for-mac/) or 
[Docker for Windows](https://docs.docker.com/docker-for-windows/).  The 
following versions have been tested:
+
+  - Docker version 17.12.0-ce
+  - docker-machine version 0.13.0
+  - docker-compose version 1.18.0
+
+Build Metron from the top level directory with:
+```
+$ cd $METRON_HOME
+$ mvn clean install -DskipTests
+```
+
+Create a Docker machine:
+```
+$ export 
METRON_DOCKER_E2E_HOME=$METRON_HOME/metron-contrib/metron-docker-e2e
+$ cd $METRON_DOCKER_E2E_HOME
+$ ./scripts/create-docker-machine.sh
+```
+
+This will create a host called "metron-machine".  Anytime you want to run 
Docker commands against this host, make sure you run this first to set the 
Docker environment variables:
+```
+$ eval "$(docker-machine env metron-machine)"
+```
+
+If you wish to use a local docker-engine install, please set an 
environment variable BROKER_IP_ADDR to the IP address of your host machine. 
This cannot be the loopback address.
+
+Usage
+-
+
+Navigate to the compose application root:
+```
+$ cd $METRON_DOCKER_E2E_HOME/compose/
+```
+
+The Metron Docker environment lifecycle is controlled by the 
[docker-compose](https://docs.docker.com/compose/reference/overview/) command.  
The service names can be found in the docker-compose.yml file.  For example, to 
build and start the environment run this command:
+```
+$ eval "$(docker-machine env metron-machine)"
+$ docker-compose up -d
+```
+
+After all services have started list the containers and ensure their 
status is 'Up':
+```
+$ docker-compose ps
+ Name   Command   State
   Ports 

+
+metron_elasticsearch_1   /bin/bash bin/es-docker  Up  
0.0.0.0:9210->9200/tcp, 0.0.0.0:9310->9300/tcp
+metron_kafka_1   start-kafka.sh   Up  
0.0.0.0:9092->9092/tcp
+metron_metron-rest_1 /bin/sh -c ./bin/start.shUp  
0.0.0.0:8082->8082/tcp
+metron_metron-ui_1   /bin/sh -c ./bin/start.shUp  
0.0.0.0:4201->4201/tcp
+metron_zookeeper_1   /docker-entrypoint.sh zkSe ...   Up  
0.0.0.0:2181->2181/tcp, 2888/tcp, 3888/tcp
+```
+
+Various services are exposed through http on the Docker host.  Get the 
host ip from the URL property:
+```
+$ docker-machine ls
+NAME ACTIVE   DRIVER   STATE URL   
  SWARM   DOCKERERRORS
+metron-machine   *virtualbox   Running   tcp://192.168.99.100:2376 
  v1.12.5
+```
+
+The various integration tests can now be run against this environment.
+
+TODO: document how to set docker machine ip address for e2e tests
--- End diff --

Looks like a leftover TODO.  Would you mind adding this info?


---


[GitHub] metron issue #941: METRON-1355: Convert metron-elasticsearch to new infrastr...

2018-03-05 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/941
  
I'm unable to get the integration tests running locally.  I've been able to 
get the docker containers up and running, but ES isn't exposed at localhost, 
only through the explicit docker-machine ip.  This causes the integration tests 
to fail, as they are unable to successfully query against ES.
Any idea why this is happening? Am I missing an env config or something?

Also, is it worthwhile to wrap most of the setup in a script? There's 
enough steps that have to be taken that I'd really like a script that's just 
"setup_integration_env.sh".  Extra credit if it can be rolled into the maven 
build such that it just automagically takes care of it when we run the maven 
integration test target (or it's run by install or whatever).  It would be 
super nice if we didn't have to worry about anything other than just running 
the maven target.

Regardless of what wrapping we do, it would be really nice to clean some 
stuff up so the integration test instructions become a one-shot "Here's the 
step by step". Right now, I had to look at the Travis run to figure out 
(remember) the semi-odd way we split unit and integration test calls in maven, 
etc.


---


[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread dcode
Github user dcode commented on the issue:

https://github.com/apache/metron-bro-plugin-kafka/pull/6
  
That'd be great if you wouldn't mind to create a ticket for this.


---


[GitHub] metron-bro-plugin-kafka issue #7: METRON-1324: Increment metron-bro-plugin-k...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue:

https://github.com/apache/metron-bro-plugin-kafka/pull/7
  
# Testing
## Build and install manually
Some guideline commands to test:

```
mkdir tmp
cd tmp
git clone https://github.com/bro/bro
cd bro
git checkout v2.5.3
git submodule update --recursive --init
./configure && make
cd ..
git clone https://github.com/apache/metron-bro-plugin-kafka
cd metron-bro-plugin-kafka
./configure --bro-dist=../bro && make && sudo make install
bro -N Apache::Kafka
```

## Use `bro-pkg`

```
bro-pkg install metron-bro-plugin-kafka
bro -N Apache::Kafka
```


---


[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/940
  
I ran this up with vagrant and ensured:
* Normal stellar works still in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live

Are there any other pending issues here beyond a report of the performance 
impact?


---


[GitHub] metron issue #944: METRON-1463: Adjust the groupings and shuffles in enrichm...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/944
  
I ran this up with vagrant and ensured:
* Normal stellar works still in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live


---


[GitHub] metron issue #947: METRON-1467: Replace guava caches in places where the key...

2018-03-05 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/947
  
I ran this up with vagrant and ensured:
* Normal stellar works still in field transformations as well as enrichments
* swapped in and out new enrichments live
* swapped in and out new threat intel live


---


[GitHub] metron pull request #948: METRON-1468: Add support for apache/metron-bro-plu...

2018-03-05 Thread JonZeolla
GitHub user JonZeolla opened a pull request:

https://github.com/apache/metron/pull/948

METRON-1468: Add support for apache/metron-bro-plugin-kafka to 
prepare-commit

## Contributor Comments
This updates the prepare-commit script to work with 
`apache/metron-bro-plugin-kafka`.

To test, run the `prepare-commit` script and specify `metron`, `bro`, or 
blank when it asks for which repo.  It should use metron by default, but 
properly accounts for the bro repo if you specify it.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [X] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [X] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [X] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [X] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [X] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh
  ```

- [N/A] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [N/A] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [N/A] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [X] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JonZeolla/metron METRON-1468

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #948


commit 4ca17d5780729a61615e7ee6bc86e6ddf86c339e
Author: Nick Allen 
Date:   2017-11-27T20:29:38Z

METRON-1320 Cannot perform a bare-metal installation

commit b64606997307e233a4ef9741c2ad16e622eac952
Author: Nick Allen 
Date:   2017-11-27T20:55:12Z

Cleaning up after the C++ file(s) that we create

commit 551e3084c60346a6459101994043afa84869ab61
Author: Jon Zeolla 
Date:   2017-11-29T02:03:54Z

Merge branch 'METRON-1320' of https://github.com/nickwallen/metron into 
METRON-1320

commit 1040681a59907fdedf8784bf19fc025cc9125b1c
Author: Jon Zeolla 
Date:   2017-12-05T12:56:40Z

Merge branch 'master' of https://github.com/apache/metron

commit e916d3dcbeb744a426559d7e7e47695ebd37748b
Author: Jon Zeolla 
Date:   2017-12-06T14:00:51Z

Merge branch 'master' of https://github.com/apache/metron

commit a7b5bdfd5652035beb38aed52fd4e1e12e680439
Author: Jon Zeolla 
Date:   2017-12-07T20:14:46Z

Merge branch 'master' of https://github.com/apache/metron

commit f9af29979385877b90f7940561382c722404eff8
Author: Jon Zeolla 
Date:   2017-12-08T14:22:24Z

Merge branch 'master' of https://github.com/apache/metron

commit 5c6dc76d9fbdb58274171446bc75800a42466ba2
Author: Jon Zeolla 
Date:   

[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue:

https://github.com/apache/metron-bro-plugin-kafka/pull/6
  
It's a part of the `apache/metron` project (of which this is considered a 
component) and uses the open apache JIRA that I linked above.  In order to 
accept PRs we need to have a JIRA.  You should be able to register and submit 
something rather simply, but I also wouldn't mind handling this if you'd 
prefer, just let me know.


---


[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread dcode
Github user dcode commented on the issue:

https://github.com/apache/metron-bro-plugin-kafka/pull/6
  
I haven't created a JIRA ticket. Not sure if that's something internal.


---


[GitHub] metron-bro-plugin-kafka pull request #7: METRON-1324: Increment metron-bro-p...

2018-03-05 Thread JonZeolla
GitHub user JonZeolla opened a pull request:

https://github.com/apache/metron-bro-plugin-kafka/pull/7

METRON-1324:  Increment metron-bro-plugin-kafka version

We have some changes staged to upgrade the plugin, so we should increment 
the version.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JonZeolla/metron-bro-plugin-kafka METRON-1324

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron-bro-plugin-kafka/pull/7.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7


commit 41a4ee6c478f59676c5a1a3ce7d5f1e0e1a39b85
Author: Jon Zeolla 
Date:   2018-03-05T17:45:05Z

METRON-1324:  Increment metron-bro-plugin-kafka version




---


[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...

2018-03-05 Thread JonZeolla
Github user JonZeolla commented on the issue:

https://github.com/apache/metron-bro-plugin-kafka/pull/6
  
This is really coming together.  Is there a 
[JIRA](https://issues.apache.org/jira/browse/METRON-1325?filter=-5=project%20%3D%20METRON%20AND%20resolution%20%3D%20Unresolved%20order%20by%20priority%20DESC%2Cupdated%20DESC)
 for this?  I poked around for a bit and couldn't find one.  


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172248492
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
--- End diff --

Done.


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread ottobackwards
Github user ottobackwards commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172243410
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
--- End diff --

Maybe we should put an implementation comment explaining that in there


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172240860
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
--- End diff --

Ah, I see.  Thanks for clarifying.  Let's work with what you have.  I agree 
there is little documentation in these parts.


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229125
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
--- End diff --

I copied this approach from the ASCII log writer here: 

https://github.com/bro/bro/blob/fc33bf2014704fe0ae512b76dae64de7fc5e83ac/src/logging/writers/ascii/Ascii.cc#L166-L176

I don't think BinPac can pass through a BinPac enum to a C++ enum. Bro 
enums are defined at runtime, whereas C++ is a compile-time definition. If you 
can point to an example to make this work, I'm happy to do it. The C API for 
BinPac isn't extremely well documented :face_with_head_bandage: 


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229215
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
+  tf = threading::formatter::JSON::TS_EPOCH;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) {
+  tf = threading::formatter::JSON::TS_MILLIS;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) {
+  tf = threading::formatter::JSON::TS_ISO8601;
+}
+else
+{
--- End diff --

will do :+1: 



---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172228973
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
+  tf = threading::formatter::JSON::TS_EPOCH;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) {
+  tf = threading::formatter::JSON::TS_MILLIS;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) {
+  tf = threading::formatter::JSON::TS_ISO8601;
+}
+else
+{
--- End diff --

I copied this approach from the ASCII log writer here: 

https://github.com/bro/bro/blob/fc33bf2014704fe0ae512b76dae64de7fc5e83ac/src/logging/writers/ascii/Ascii.cc#L166-L176

I don't think BinPac can pass through a BinPac enum to a C++ enum. Bro 
enums are defined at runtime, whereas C++ is a compile-time definition. If you 
can point to an example to make this work, I'm happy to do it. The C API for 
BinPac isn't extremely well documented :face_with_head_bandage: 


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread dcode
Github user dcode commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172225023
  
--- Diff: README.md ---
@@ -37,10 +37,11 @@ The following examples highlight different ways that 
the plugin can be used.  Si
 
 ### Example 1
 
-The goal in this example is to send all HTTP and DNS records to a Kafka 
topic named `bro`. 
+The goal in this example is to send all HTTP and DNS records to a Kafka 
topic named `bro`.
  * Any configuration value accepted by librdkafka can be added to the 
`kafka_conf` configuration table.  
  * By defining `topic_name` all records will be sent to the same Kafka 
topic.
- * Defining `logs_to_send` will ensure that only HTTP and DNS records are 
sent.
+ * Defining `logs_to_send` will ensure that only HTTP and DNS records are 
sent. An empty set will default to all logs being
--- End diff --

willdo


---


[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
GitHub user cestella reopened a pull request:

https://github.com/apache/metron/pull/947

METRON-1467: Replace guava caches in places where the keyspace might be 
large (NOTE: Review after METRON-1460)

## Contributor Comments
Based on the performance tuning exercise as part of METRON-1460, guava has 
difficulties with cache sizes over 10k.  We, unfortunately, are quite demanding 
of guava in this regard so we should transition a few uses of guava to Caffeine:

* Stellar processor cache
* The JoinBolt cache
* The Enrichment Bolt Cache

NOTE: This depends on METRON-1460 aka #940 and that PR is included in here. 
 It might be easier to review after #940 is merged.

Test plan pending

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON- where  is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [x] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron 
guava_cache_replacement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #947


commit a4f618a3ad895d62772366e0e93e5b8b37c5c964
Author: cstella 
Date:   2018-02-21T23:59:16Z

Adding parallel enrichment bolt.

commit 99fe0b86005fe04294b3851a17ae3d88f228c5d2
Author: cstella 
Date:   2018-02-22T00:21:06Z

Updating to include trace statements.

commit 79736c6f3fab04d01dd1eb998b308f438003a0e1
Author: cstella 
Date:   2018-02-22T15:35:44Z

Updating with some cleanup

commit cb4a527c9146865dafad1d597ba93032ef398d94
Author: cstella 
Date:   2018-02-22T15:48:11Z

Updating spec.

commit fb4d4383f366776f446e33a422652c3ec1f56bfa
Author: cstella 
Date:   2018-02-22T18:00:36Z

Updating threadpool creation

commit 87ef6a72827c31f8adee42ee71272a32c350bc1f
Author: cstella 
Date:   2018-02-22T18:04:37Z

better docs

commit 6ae9594ee4ae2b4d33e0feca398b527077dac0d3
Author: cstella 
Date:   2018-02-22T18:41:20Z

Updating readme.

commit 82ebc9550d759ea0bd06b48c586fd5e53c6e553a
Author: cstella 
Date:   

[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...

2018-03-05 Thread cestella
Github user cestella closed the pull request at:

https://github.com/apache/metron/pull/947


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172192869
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
--- End diff --

Just curious, why do we treat `json_timestamps` as a string?  Why not just 
treat it as a `JSON::TimestampFormat`?  Wouldn't that simplify a lot of this 
logic and remove all of the string comparison and string copy logic?


---


[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...

2018-03-05 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:


https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172193204
  
--- Diff: src/KafkaWriter.cc ---
@@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): 
WriterBackend(frontend), for
 }
 
 KafkaWriter::~KafkaWriter()
-{}
+{
+
+// Cleanup all the things
+delete topic;
+delete producer;
+delete formatter;
+delete conf;
+delete topic_conf;
+
+}
 
 bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const 
threading::Field* const* fields)
 {
+// Timeformat object, default to TS_EPOCH
+threading::formatter::JSON::TimeFormat tf = 
threading::formatter::JSON::TS_EPOCH;
+
 // if no global 'topic_name' is defined, use the log stream's 'path'
 if(topic_name.empty()) {
 topic_name = info.path;
 }
 
+// format timestamps
+if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) {
+  tf = threading::formatter::JSON::TS_EPOCH;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) {
+  tf = threading::formatter::JSON::TS_MILLIS;
+}
+else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) {
+  tf = threading::formatter::JSON::TS_ISO8601;
+}
+else
+{
--- End diff --

Small nit: Can we join the open paren to the line above just to match the 
rest of the code style.  Gracias.


---