[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 Maybe the issue has to do with our keys, and their distribution as the size get's larger? Maybe when we get larger sizes we get more collisions and end up calling equals() more or something. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/940 This should have the equiv. diagram and documentation ( i believe as shown above ) to the original split join strategy. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383791 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; + +/** + * Enrichment strategy. This interface provides a mechanism to interface with the enrichment config and any + * post processing steps that are needed to be done after-the-fact. + * + * The reasoning behind this is that the key difference between enrichments and threat intel is that they pull + * their configurations from different parts of the SensorEnrichmentConfig object and as a post-join step, they differ + * slightly. + * + */ +public interface Strategy { + Constants.ErrorType getErrorType(); --- End diff -- done ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383810 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.bolt; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt; +import org.apache.metron.common.configuration.ConfigurationType; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.error.MetronError; +import org.apache.metron.common.performance.PerformanceLogger; +import org.apache.metron.common.utils.ErrorUtils; +import org.apache.metron.common.utils.MessageUtils; +import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase; +import org.apache.metron.enrichment.configuration.Enrichment; +import org.apache.metron.enrichment.interfaces.EnrichmentAdapter; +import org.apache.metron.enrichment.parallel.EnrichmentContext; +import org.apache.metron.enrichment.parallel.EnrichmentStrategies; +import org.apache.metron.enrichment.parallel.ParallelEnricher; +import org.apache.metron.enrichment.parallel.WorkerPoolStrategy; +import org.apache.metron.stellar.dsl.Context; +import org.apache.metron.stellar.dsl.StellarFunction; +import org.apache.metron.stellar.dsl.StellarFunctions; +import org.apache.storm.task.OutputCollector; +import org.apache.storm.task.TopologyContext; +import org.apache.storm.topology.OutputFieldsDeclarer; +import org.apache.storm.tuple.Fields; +import org.apache.storm.tuple.Tuple; +import org.apache.storm.tuple.Values; +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.json.simple.parser.ParseException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.UnsupportedEncodingException; +import java.lang.invoke.MethodHandles; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +/** + * This bolt is a unified enrichment/threat intel bolt. In contrast to the split/enrich/join + * bolts above, this handles the entire enrichment lifecycle in one bolt using a threadpool to + * enrich in parallel. + * + * From an architectural perspective, this is a divergence from the polymorphism based strategy we have + * used in the split/join bolts. Rather, this bolt is provided a strategy to use, either enrichment or threat intel, + * through composition. This allows us to move most of the implementation into components independent + * from Storm. This will greater facilitate reuse. + */ +public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt { + + public static class Perf {} // used for performance logging + private PerformanceLogger perfLog; // not static bc multiple bolts may exist in same worker + + protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + public static final String STELLAR_CONTEXT_CONF = "stellarContext"; + + /** + * The number of threads in the threadpool. One threadpool is created per process. + * This is a topology-level configuration + */ + public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = "metron.threadpool.size"; + /** + * The type of threadpool to create. This is a topology-level configuration. + */ + public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = "metron.threadpool.type"; + + /** + * The enricher implementation to use. This will do the parallel enrichment via a thread pool. + */ + protected ParallelEnricher enricher; + + /** + * The strategy to use for this enrichment bolt. Practically speaking this is either + * enrichment or threat
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 @nickwallen Ok, I refactored the abstraction to separate some concerns, name things a bit, and collapse some of the more onerous abstractions. Also updated javadocs. Can you give it another look and see what you think? We probably should also give it another smoketest in the lab to make sure I didn't do something dumb. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172383754 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.apache.metron.enrichment.bolt.CacheKey; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; +import java.util.concurrent.Executor; + +public enum EnrichmentStrategies implements Strategy { --- End diff -- I decided that this is too onerous of an abstraction and rethought it a bit. Give it another look, please. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172377136 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.apache.metron.enrichment.bolt.CacheKey; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; +import java.util.concurrent.Executor; + +public enum EnrichmentStrategies implements Strategy { --- End diff -- I might be answering a question that you're not asking, but this bit of awkwardness arises because we have merged the concepts of threat intel and enrichment, which differ really only in post-processing. The approach presented here, in contrast to the inheritance-based approach in the bolts, allows for an abstraction through composition whereby we localize all the interactions with the sensor enrichment config in a strategy rather than bind the abstraction to Storm, our distributed processing engine. That is the rationale behind this approach at least. ---
[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/933 Oh, I guess we need to reaffirm. Yes, +1 still stands. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172373203 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.bolt; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt; +import org.apache.metron.common.configuration.ConfigurationType; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.error.MetronError; +import org.apache.metron.common.performance.PerformanceLogger; +import org.apache.metron.common.utils.ErrorUtils; +import org.apache.metron.common.utils.MessageUtils; +import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase; +import org.apache.metron.enrichment.configuration.Enrichment; +import org.apache.metron.enrichment.interfaces.EnrichmentAdapter; +import org.apache.metron.enrichment.parallel.EnrichmentContext; +import org.apache.metron.enrichment.parallel.EnrichmentStrategies; +import org.apache.metron.enrichment.parallel.ParallelEnricher; +import org.apache.metron.enrichment.parallel.WorkerPoolStrategy; +import org.apache.metron.stellar.dsl.Context; +import org.apache.metron.stellar.dsl.StellarFunction; +import org.apache.metron.stellar.dsl.StellarFunctions; +import org.apache.storm.task.OutputCollector; +import org.apache.storm.task.TopologyContext; +import org.apache.storm.topology.OutputFieldsDeclarer; +import org.apache.storm.tuple.Fields; +import org.apache.storm.tuple.Tuple; +import org.apache.storm.tuple.Values; +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.json.simple.parser.ParseException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.UnsupportedEncodingException; +import java.lang.invoke.MethodHandles; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +/** + * This bolt is a unified enrichment/threat intel bolt. In contrast to the split/enrich/join + * bolts above, this handles the entire enrichment lifecycle in one bolt using a threadpool to + * enrich in parallel. + * + * From an architectural perspective, this is a divergence from the polymorphism based strategy we have + * used in the split/join bolts. Rather, this bolt is provided a strategy to use, either enrichment or threat intel, + * through composition. This allows us to move most of the implementation into components independent + * from Storm. This will greater facilitate reuse. + */ +public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt { + + public static class Perf {} // used for performance logging + private PerformanceLogger perfLog; // not static bc multiple bolts may exist in same worker + + protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + public static final String STELLAR_CONTEXT_CONF = "stellarContext"; + + /** + * The number of threads in the threadpool. One threadpool is created per process. + * This is a topology-level configuration + */ + public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = "metron.threadpool.size"; + /** + * The type of threadpool to create. This is a topology-level configuration. + */ + public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = "metron.threadpool.type"; + + /** + * The enricher implementation to use. This will do the parallel enrichment via a thread pool. + */ + protected ParallelEnricher enricher; + + /** + * The strategy to use for this enrichment bolt. Practically speaking this is either + * enrichment or threat
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172369461 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.apache.metron.enrichment.bolt.CacheKey; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; +import java.util.concurrent.Executor; + +public enum EnrichmentStrategies implements Strategy { --- End diff -- This is a strategy pattern using an enum. The purpose of this class is to resolve the specific strategies possible. It's broadly in line with other strategy patterns (e.g. Extractors). ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user cestella commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172369480 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; + +/** + * Enrichment strategy. This interface provides a mechanism to interface with the enrichment config and any + * post processing steps that are needed to be done after-the-fact. + * + * The reasoning behind this is that the key difference between enrichments and threat intel is that they pull + * their configurations from different parts of the SensorEnrichmentConfig object and as a post-join step, they differ + * slightly. + * + */ +public interface Strategy { + Constants.ErrorType getErrorType(); --- End diff -- Sure thing. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172359339 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/Strategy.java --- @@ -0,0 +1,47 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; + +/** + * Enrichment strategy. This interface provides a mechanism to interface with the enrichment config and any + * post processing steps that are needed to be done after-the-fact. + * + * The reasoning behind this is that the key difference between enrichments and threat intel is that they pull + * their configurations from different parts of the SensorEnrichmentConfig object and as a post-join step, they differ + * slightly. + * + */ +public interface Strategy { + Constants.ErrorType getErrorType(); --- End diff -- Can we javadoc each method? This seems like an important interface. ---
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172353404 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/bolt/UnifiedEnrichmentBolt.java --- @@ -0,0 +1,415 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.bolt; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.bolt.ConfiguredEnrichmentBolt; +import org.apache.metron.common.configuration.ConfigurationType; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.error.MetronError; +import org.apache.metron.common.performance.PerformanceLogger; +import org.apache.metron.common.utils.ErrorUtils; +import org.apache.metron.common.utils.MessageUtils; +import org.apache.metron.enrichment.adapters.geo.GeoLiteDatabase; +import org.apache.metron.enrichment.configuration.Enrichment; +import org.apache.metron.enrichment.interfaces.EnrichmentAdapter; +import org.apache.metron.enrichment.parallel.EnrichmentContext; +import org.apache.metron.enrichment.parallel.EnrichmentStrategies; +import org.apache.metron.enrichment.parallel.ParallelEnricher; +import org.apache.metron.enrichment.parallel.WorkerPoolStrategy; +import org.apache.metron.stellar.dsl.Context; +import org.apache.metron.stellar.dsl.StellarFunction; +import org.apache.metron.stellar.dsl.StellarFunctions; +import org.apache.storm.task.OutputCollector; +import org.apache.storm.task.TopologyContext; +import org.apache.storm.topology.OutputFieldsDeclarer; +import org.apache.storm.tuple.Fields; +import org.apache.storm.tuple.Tuple; +import org.apache.storm.tuple.Values; +import org.json.simple.JSONObject; +import org.json.simple.parser.JSONParser; +import org.json.simple.parser.ParseException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.UnsupportedEncodingException; +import java.lang.invoke.MethodHandles; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.UUID; + +/** + * This bolt is a unified enrichment/threat intel bolt. In contrast to the split/enrich/join + * bolts above, this handles the entire enrichment lifecycle in one bolt using a threadpool to + * enrich in parallel. + * + * From an architectural perspective, this is a divergence from the polymorphism based strategy we have + * used in the split/join bolts. Rather, this bolt is provided a strategy to use, either enrichment or threat intel, + * through composition. This allows us to move most of the implementation into components independent + * from Storm. This will greater facilitate reuse. + */ +public class UnifiedEnrichmentBolt extends ConfiguredEnrichmentBolt { + + public static class Perf {} // used for performance logging + private PerformanceLogger perfLog; // not static bc multiple bolts may exist in same worker + + protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + + public static final String STELLAR_CONTEXT_CONF = "stellarContext"; + + /** + * The number of threads in the threadpool. One threadpool is created per process. + * This is a topology-level configuration + */ + public static final String THREADPOOL_NUM_THREADS_TOPOLOGY_CONF = "metron.threadpool.size"; + /** + * The type of threadpool to create. This is a topology-level configuration. + */ + public static final String THREADPOOL_TYPE_TOPOLOGY_CONF = "metron.threadpool.type"; + + /** + * The enricher implementation to use. This will do the parallel enrichment via a thread pool. + */ + protected ParallelEnricher enricher; + + /** + * The strategy to use for this enrichment bolt. Practically speaking this is either + * enrichment or threat
[GitHub] metron pull request #940: METRON-1460: Create a complementary non-split-join...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/940#discussion_r172363362 --- Diff: metron-platform/metron-enrichment/src/main/java/org/apache/metron/enrichment/parallel/EnrichmentStrategies.java --- @@ -0,0 +1,79 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.metron.enrichment.parallel; + +import org.apache.metron.common.Constants; +import org.apache.metron.common.configuration.enrichment.SensorEnrichmentConfig; +import org.apache.metron.common.configuration.enrichment.handler.ConfigHandler; +import org.apache.metron.enrichment.bolt.CacheKey; +import org.json.simple.JSONObject; +import org.slf4j.Logger; + +import java.util.Map; +import java.util.concurrent.Executor; + +public enum EnrichmentStrategies implements Strategy { --- End diff -- I don't understand the purpose of this class. Why have an `EnrichmentStrategy`, a `ThreatIntelStrategy`, and `EnrichmentStrategies`? ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 Ahhh, that makes sense. I bet we were getting killed by small allocations in the caching layer. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Caffeine doesn't allocate on read, so that would make sense. I saw a [25x boost](https://github.com/google/guava/issues/2063#issuecomment-107169736) (compared to [current](https://github.com/google/guava/issues/2063#issue-82444927)) when porting the buffers to Guava. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I actually suspect GC as well. We adjusted the garbage collector to the G1GC and saw throughput gains, but not nearly the kinds of gains as we got with a drop-in of Caffeine to replace Guava. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Interesting. Then I guess the size must trigger the read bottleneck as larger than writes. Perhaps it is incurring a lot more GC overhead that causes more collections? The CLQ additions requires allocating a new queue node. That and the cache entry probably get promoted to old gen due to the high churn rate, causing everything to slow down. Probably isn't too interesting to investigate vs swapping libraries :) ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 In this case, the loader isn't doing anything terribly expensive, though it may in the future (incur a hbase get or some more expensive computation). ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Internally Guava uses a `ConcurrentLinkedQueue` and an `AtomicInteger` to record its size, per segment. When a read occurs, it records that in the queue and then drains it under the segment's lock (via tryLock) to replay the events. This is similar to Caffeine, which uses optimized structures instead. I intended the CLQ & counter as baseline scaffolding for replacement, as it is an obvious bottleneck, but I could never get it replaced despite advocating for it. The penalty of draining the buffers is amortized, but unfortunately this buffer isn't capped. Since there would be a higher hit rate with a larger cache, the reads would be recorded more often. Perhaps contention there and the penalty of draining the queue is more observable than a cache miss. That's still surprising since a cache miss is usually more expensive I/O. Is the loader doing expensive work in your case? Caffeine gets around this problem by using more optimal buffers and being lossy (on reads only) if it can't keep up. By default it delegates the amortized maintenance work to a ForkJoinPool to avoid user-facing latencies, since you'll want those variances to be tight. Much of that can be back ported onto Guava for a nice boost. ---
[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/924 +1, sorry! ---
[GitHub] metron issue #924: METRON-1299 In MetronError tests, don't test for HostName...
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/924 @cestella Bump ---
[GitHub] metron issue #933: METRON-1452 Rebase Dev Environment on Latest CentOS 6
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/933 @mmiklavc @cestella Bump ---
[GitHub] metron pull request #946: METRON-1465:Support for Elasticsearch X-pack
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/946#discussion_r172351786 --- Diff: pom.xml --- @@ -97,7 +97,7 @@ ${base_hadoop_version} ${base_hbase_version} ${base_flume_version} -5.6.2 +5.6.7 --- End diff -- Just noticed this - what's the motivation for changing ES versions as part of this PR? ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We actually did increase the concurrency level for guava to 64; that is what confused us as well. The hash code is mostly standard, should be evenly distributed (the key is pretty much a POJO). ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Guava defaults to a `concurrencyLevel` of 4, given its age and a desire to not abuse memory in low concurrent situations. You probably want to increase it to 64 in a heavy workload, which has a ~4x throughput gain on reads. It won't scale much higher, since it has internal bottlenecks and I could never get patches reviewed to fix those. I've only noticed overall throughput be based on threads, and never realized there was a capacity constraint to its performance. One should expect some due to the older hash table design resulting in more collisions, whereas CHMv8 does much better there. Still, I would have expected it to even out enough unless have a bad hash code? ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 The interesting thing that we found was that guava seems to be doing poorly when the # of items in the cache gets large. When we scaled the test down (830 distinct IP addresses chosen randomly and sent in at a rate of 200k events per second with a cache size of 100) kept up but scaling the test up (300k distinct ip addresses chosen randomly and sent in at a rate of 200k events per second with a cache size of 100k) didn't. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 That makes sense. A uniform distribution will, of course, degrades all policies to random replacement so the test is then about how well the implementations handle concurrency. Most often caches exhibit a Zipfian distribution (80-20 rule), so our bias towards frequency is a net gain. We have observed a few rare cases where frequency is a poor signal and LRU is optimal, and we are exploring adaptive techniques to dynamically tune the cache based on the workload's characteristics. These cases don't seem to occur in many real-world scenarios that we know of, but it is always nice to know what users are experiencing and how much better (or worse) we perform than a standard LRU cache. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 We were being purposefully unkind to the cache in the tests. The load simulation chose a IP address at random to present, so each IP had an equal probability of being selected. Whereas, in real traffic, we expect a coherent working set. Not sure of the exact hit rates, though. ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/940 Do you know what the hit rates were, for the same data set, between Guava and Caffeine? The caches use different policies so it is always interesting to see how the handle given workloads. As we continue to refine our adaptive algorithm W-TinyLFU, its handy to know what types of workloads to investigate. (P.S. We have a simulator for re-running persisted traces if useful for your tuning) ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/940 I completed some fairly extensive performance testing comparing this new Unified topology against the existing Split-Join implementation. The difference was dramatic. - The Unified topology _performed roughly 3.4 times faster than Split-Join._ Both topologies in this side-by-side test included the same fixes, including the Guava cache problem fixed in #947. The tests included two enrichments: * GeoIP enrichment; `geo := GEO_GET(ip_dst_addr)` * Compute-only Stellar enrichment; `local := IN_SUBNET(ip_dst_addr, '192.168.0.0/24')` The number one driver of performance is the cache hit rate, which is heavily dependent on what your data looks-like. With these enrichments, that's driven by how varied the `ip_dst_addr` is in the data. I tested both of these topologies with different sets of data intended to either increase or decrease that cache hit rate. The differences between the two topologies were fairly consistent across the different data sets. When running these topologies, reasonably well-tuned, on the same data, I was able to consistently maintain 70,000 events per second with the Split/Join topology. In the same environment, I was able to maintain 312,000 events per second using the Unified topology. The raw throughput numbers are relative and depend on how much hardware you are willing to throw at the problem. I was running on 3 nodes dedicated to running the Enrichment topology only. But with the same data, on the same hardware, the difference was 3.4 times. That's big. Pushing as much as you can into a single executor and avoiding network hops is definitely the way to go here. ---
[GitHub] metron pull request #941: METRON-1355: Convert metron-elasticsearch to new i...
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/941#discussion_r172318991 --- Diff: metron-contrib/metron-docker-e2e/README.md --- @@ -0,0 +1,94 @@ + +# Metron Docker + +Metron Docker E2E is a [Docker Compose](https://docs.docker.com/compose/overview/) application that serves as a backend to integration tests. + +Metron Docker includes these images that have been customized for Metron: + + - Kafka + - Zookeeper + - Elasticsearch + - Metron REST + - Metron UIs + +Setup +- + +Install [Docker for Mac](https://docs.docker.com/docker-for-mac/) or [Docker for Windows](https://docs.docker.com/docker-for-windows/). The following versions have been tested: + + - Docker version 17.12.0-ce + - docker-machine version 0.13.0 + - docker-compose version 1.18.0 + +Build Metron from the top level directory with: +``` +$ cd $METRON_HOME +$ mvn clean install -DskipTests +``` + +Create a Docker machine: +``` +$ export METRON_DOCKER_E2E_HOME=$METRON_HOME/metron-contrib/metron-docker-e2e +$ cd $METRON_DOCKER_E2E_HOME +$ ./scripts/create-docker-machine.sh +``` + +This will create a host called "metron-machine". Anytime you want to run Docker commands against this host, make sure you run this first to set the Docker environment variables: +``` +$ eval "$(docker-machine env metron-machine)" +``` + +If you wish to use a local docker-engine install, please set an environment variable BROKER_IP_ADDR to the IP address of your host machine. This cannot be the loopback address. + +Usage +- + +Navigate to the compose application root: +``` +$ cd $METRON_DOCKER_E2E_HOME/compose/ +``` + +The Metron Docker environment lifecycle is controlled by the [docker-compose](https://docs.docker.com/compose/reference/overview/) command. The service names can be found in the docker-compose.yml file. For example, to build and start the environment run this command: +``` +$ eval "$(docker-machine env metron-machine)" +$ docker-compose up -d +``` + +After all services have started list the containers and ensure their status is 'Up': +``` +$ docker-compose ps + Name Command State Ports + +metron_elasticsearch_1 /bin/bash bin/es-docker Up 0.0.0.0:9210->9200/tcp, 0.0.0.0:9310->9300/tcp +metron_kafka_1 start-kafka.sh Up 0.0.0.0:9092->9092/tcp +metron_metron-rest_1 /bin/sh -c ./bin/start.shUp 0.0.0.0:8082->8082/tcp +metron_metron-ui_1 /bin/sh -c ./bin/start.shUp 0.0.0.0:4201->4201/tcp +metron_zookeeper_1 /docker-entrypoint.sh zkSe ... Up 0.0.0.0:2181->2181/tcp, 2888/tcp, 3888/tcp +``` + +Various services are exposed through http on the Docker host. Get the host ip from the URL property: +``` +$ docker-machine ls +NAME ACTIVE DRIVER STATE URL SWARM DOCKERERRORS +metron-machine *virtualbox Running tcp://192.168.99.100:2376 v1.12.5 +``` + +The various integration tests can now be run against this environment. + +TODO: document how to set docker machine ip address for e2e tests --- End diff -- Looks like a leftover TODO. Would you mind adding this info? ---
[GitHub] metron issue #941: METRON-1355: Convert metron-elasticsearch to new infrastr...
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/941 I'm unable to get the integration tests running locally. I've been able to get the docker containers up and running, but ES isn't exposed at localhost, only through the explicit docker-machine ip. This causes the integration tests to fail, as they are unable to successfully query against ES. Any idea why this is happening? Am I missing an env config or something? Also, is it worthwhile to wrap most of the setup in a script? There's enough steps that have to be taken that I'd really like a script that's just "setup_integration_env.sh". Extra credit if it can be rolled into the maven build such that it just automagically takes care of it when we run the maven integration test target (or it's run by install or whatever). It would be super nice if we didn't have to worry about anything other than just running the maven target. Regardless of what wrapping we do, it would be really nice to clean some stuff up so the integration test instructions become a one-shot "Here's the step by step". Right now, I had to look at the Travis run to figure out (remember) the semi-odd way we split unit and integration test calls in maven, etc. ---
[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...
Github user dcode commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 That'd be great if you wouldn't mind to create a ticket for this. ---
[GitHub] metron-bro-plugin-kafka issue #7: METRON-1324: Increment metron-bro-plugin-k...
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/7 # Testing ## Build and install manually Some guideline commands to test: ``` mkdir tmp cd tmp git clone https://github.com/bro/bro cd bro git checkout v2.5.3 git submodule update --recursive --init ./configure && make cd .. git clone https://github.com/apache/metron-bro-plugin-kafka cd metron-bro-plugin-kafka ./configure --bro-dist=../bro && make && sudo make install bro -N Apache::Kafka ``` ## Use `bro-pkg` ``` bro-pkg install metron-bro-plugin-kafka bro -N Apache::Kafka ``` ---
[GitHub] metron issue #940: METRON-1460: Create a complementary non-split-join enrich...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/940 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new threat intel live Are there any other pending issues here beyond a report of the performance impact? ---
[GitHub] metron issue #944: METRON-1463: Adjust the groupings and shuffles in enrichm...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/944 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new threat intel live ---
[GitHub] metron issue #947: METRON-1467: Replace guava caches in places where the key...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/947 I ran this up with vagrant and ensured: * Normal stellar works still in field transformations as well as enrichments * swapped in and out new enrichments live * swapped in and out new threat intel live ---
[GitHub] metron pull request #948: METRON-1468: Add support for apache/metron-bro-plu...
GitHub user JonZeolla opened a pull request: https://github.com/apache/metron/pull/948 METRON-1468: Add support for apache/metron-bro-plugin-kafka to prepare-commit ## Contributor Comments This updates the prepare-commit script to work with `apache/metron-bro-plugin-kafka`. To test, run the `prepare-commit` script and specify `metron`, `bro`, or blank when it asks for which repo. It should use metron by default, but properly accounts for the bro repo if you specify it. ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [X] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [X] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [X] Has your PR been rebased against the latest commit within the target branch (typically master)? ### For code changes: - [X] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [X] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - [N/A] Have you written or updated unit tests and or integration tests to verify your changes? - [N/A] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [N/A] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - [X] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via `site-book/target/site/index.html`: ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JonZeolla/metron METRON-1468 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/948.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #948 commit 4ca17d5780729a61615e7ee6bc86e6ddf86c339e Author: Nick AllenDate: 2017-11-27T20:29:38Z METRON-1320 Cannot perform a bare-metal installation commit b64606997307e233a4ef9741c2ad16e622eac952 Author: Nick Allen Date: 2017-11-27T20:55:12Z Cleaning up after the C++ file(s) that we create commit 551e3084c60346a6459101994043afa84869ab61 Author: Jon Zeolla Date: 2017-11-29T02:03:54Z Merge branch 'METRON-1320' of https://github.com/nickwallen/metron into METRON-1320 commit 1040681a59907fdedf8784bf19fc025cc9125b1c Author: Jon Zeolla Date: 2017-12-05T12:56:40Z Merge branch 'master' of https://github.com/apache/metron commit e916d3dcbeb744a426559d7e7e47695ebd37748b Author: Jon Zeolla Date: 2017-12-06T14:00:51Z Merge branch 'master' of https://github.com/apache/metron commit a7b5bdfd5652035beb38aed52fd4e1e12e680439 Author: Jon Zeolla Date: 2017-12-07T20:14:46Z Merge branch 'master' of https://github.com/apache/metron commit f9af29979385877b90f7940561382c722404eff8 Author: Jon Zeolla Date: 2017-12-08T14:22:24Z Merge branch 'master' of https://github.com/apache/metron commit 5c6dc76d9fbdb58274171446bc75800a42466ba2 Author: Jon Zeolla Date:
[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 It's a part of the `apache/metron` project (of which this is considered a component) and uses the open apache JIRA that I linked above. In order to accept PRs we need to have a JIRA. You should be able to register and submit something rather simply, but I also wouldn't mind handling this if you'd prefer, just let me know. ---
[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...
Github user dcode commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 I haven't created a JIRA ticket. Not sure if that's something internal. ---
[GitHub] metron-bro-plugin-kafka pull request #7: METRON-1324: Increment metron-bro-p...
GitHub user JonZeolla opened a pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/7 METRON-1324: Increment metron-bro-plugin-kafka version We have some changes staged to upgrade the plugin, so we should increment the version. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JonZeolla/metron-bro-plugin-kafka METRON-1324 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron-bro-plugin-kafka/pull/7.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #7 commit 41a4ee6c478f59676c5a1a3ce7d5f1e0e1a39b85 Author: Jon ZeollaDate: 2018-03-05T17:45:05Z METRON-1324: Increment metron-bro-plugin-kafka version ---
[GitHub] metron-bro-plugin-kafka issue #6: Configurable JSON timestamps and default a...
Github user JonZeolla commented on the issue: https://github.com/apache/metron-bro-plugin-kafka/pull/6 This is really coming together. Is there a [JIRA](https://issues.apache.org/jira/browse/METRON-1325?filter=-5=project%20%3D%20METRON%20AND%20resolution%20%3D%20Unresolved%20order%20by%20priority%20DESC%2Cupdated%20DESC) for this? I poked around for a bit and couldn't find one. ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172248492 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { --- End diff -- Done. ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user ottobackwards commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172243410 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { --- End diff -- Maybe we should put an implementation comment explaining that in there ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172240860 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { --- End diff -- Ah, I see. Thanks for clarifying. Let's work with what you have. I agree there is little documentation in these parts. ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229125 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { --- End diff -- I copied this approach from the ASCII log writer here: https://github.com/bro/bro/blob/fc33bf2014704fe0ae512b76dae64de7fc5e83ac/src/logging/writers/ascii/Ascii.cc#L166-L176 I don't think BinPac can pass through a BinPac enum to a C++ enum. Bro enums are defined at runtime, whereas C++ is a compile-time definition. If you can point to an example to make this work, I'm happy to do it. The C API for BinPac isn't extremely well documented :face_with_head_bandage: ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172229215 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { + tf = threading::formatter::JSON::TS_EPOCH; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) { + tf = threading::formatter::JSON::TS_MILLIS; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) { + tf = threading::formatter::JSON::TS_ISO8601; +} +else +{ --- End diff -- will do :+1: ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172228973 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { + tf = threading::formatter::JSON::TS_EPOCH; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) { + tf = threading::formatter::JSON::TS_MILLIS; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) { + tf = threading::formatter::JSON::TS_ISO8601; +} +else +{ --- End diff -- I copied this approach from the ASCII log writer here: https://github.com/bro/bro/blob/fc33bf2014704fe0ae512b76dae64de7fc5e83ac/src/logging/writers/ascii/Ascii.cc#L166-L176 I don't think BinPac can pass through a BinPac enum to a C++ enum. Bro enums are defined at runtime, whereas C++ is a compile-time definition. If you can point to an example to make this work, I'm happy to do it. The C API for BinPac isn't extremely well documented :face_with_head_bandage: ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user dcode commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172225023 --- Diff: README.md --- @@ -37,10 +37,11 @@ The following examples highlight different ways that the plugin can be used. Si ### Example 1 -The goal in this example is to send all HTTP and DNS records to a Kafka topic named `bro`. +The goal in this example is to send all HTTP and DNS records to a Kafka topic named `bro`. * Any configuration value accepted by librdkafka can be added to the `kafka_conf` configuration table. * By defining `topic_name` all records will be sent to the same Kafka topic. - * Defining `logs_to_send` will ensure that only HTTP and DNS records are sent. + * Defining `logs_to_send` will ensure that only HTTP and DNS records are sent. An empty set will default to all logs being --- End diff -- willdo ---
[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...
GitHub user cestella reopened a pull request: https://github.com/apache/metron/pull/947 METRON-1467: Replace guava caches in places where the keyspace might be large (NOTE: Review after METRON-1460) ## Contributor Comments Based on the performance tuning exercise as part of METRON-1460, guava has difficulties with cache sizes over 10k. We, unfortunately, are quite demanding of guava in this regard so we should transition a few uses of guava to Caffeine: * Stellar processor cache * The JoinBolt cache * The Enrichment Bolt Cache NOTE: This depends on METRON-1460 aka #940 and that PR is included in here. It might be easier to review after #940 is merged. Test plan pending ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? ### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - [x] Have you written or updated unit tests and or integration tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - [x] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via `site-book/target/site/index.html`: ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/cestella/incubator-metron guava_cache_replacement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/947.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #947 commit a4f618a3ad895d62772366e0e93e5b8b37c5c964 Author: cstellaDate: 2018-02-21T23:59:16Z Adding parallel enrichment bolt. commit 99fe0b86005fe04294b3851a17ae3d88f228c5d2 Author: cstella Date: 2018-02-22T00:21:06Z Updating to include trace statements. commit 79736c6f3fab04d01dd1eb998b308f438003a0e1 Author: cstella Date: 2018-02-22T15:35:44Z Updating with some cleanup commit cb4a527c9146865dafad1d597ba93032ef398d94 Author: cstella Date: 2018-02-22T15:48:11Z Updating spec. commit fb4d4383f366776f446e33a422652c3ec1f56bfa Author: cstella Date: 2018-02-22T18:00:36Z Updating threadpool creation commit 87ef6a72827c31f8adee42ee71272a32c350bc1f Author: cstella Date: 2018-02-22T18:04:37Z better docs commit 6ae9594ee4ae2b4d33e0feca398b527077dac0d3 Author: cstella Date: 2018-02-22T18:41:20Z Updating readme. commit 82ebc9550d759ea0bd06b48c586fd5e53c6e553a Author: cstella Date:
[GitHub] metron pull request #947: METRON-1467: Replace guava caches in places where ...
Github user cestella closed the pull request at: https://github.com/apache/metron/pull/947 ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172192869 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { --- End diff -- Just curious, why do we treat `json_timestamps` as a string? Why not just treat it as a `JSON::TimestampFormat`? Wouldn't that simplify a lot of this logic and remove all of the string comparison and string copy logic? ---
[GitHub] metron-bro-plugin-kafka pull request #6: Configurable JSON timestamps and de...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron-bro-plugin-kafka/pull/6#discussion_r172193204 --- Diff: src/KafkaWriter.cc --- @@ -54,20 +66,49 @@ KafkaWriter::KafkaWriter(WriterFrontend* frontend): WriterBackend(frontend), for } KafkaWriter::~KafkaWriter() -{} +{ + +// Cleanup all the things +delete topic; +delete producer; +delete formatter; +delete conf; +delete topic_conf; + +} bool KafkaWriter::DoInit(const WriterInfo& info, int num_fields, const threading::Field* const* fields) { +// Timeformat object, default to TS_EPOCH +threading::formatter::JSON::TimeFormat tf = threading::formatter::JSON::TS_EPOCH; + // if no global 'topic_name' is defined, use the log stream's 'path' if(topic_name.empty()) { topic_name = info.path; } +// format timestamps +if ( strcmp(json_timestamps.c_str(), "JSON::TS_EPOCH") == 0 ) { + tf = threading::formatter::JSON::TS_EPOCH; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_MILLIS") == 0 ) { + tf = threading::formatter::JSON::TS_MILLIS; +} +else if ( strcmp(json_timestamps.c_str(), "JSON::TS_ISO8601") == 0 ) { + tf = threading::formatter::JSON::TS_ISO8601; +} +else +{ --- End diff -- Small nit: Can we join the open paren to the line above just to match the rest of the code style. Gracias. ---