[jira] [Commented] (METRON-1707) Port Profiler to Spark
[ https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578957#comment-16578957 ]

ASF GitHub Bot commented on METRON-1707:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1150#discussion_r209771180

    --- Diff: metron-analytics/metron-profiler-spark/src/main/java/org/apache/metron/profiler/spark/function/ProfileBuilderFunction.java ---
    @@ -0,0 +1,107 @@
    +/*
    + *
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + *
    + */
    +package org.apache.metron.profiler.spark.function;
    +
    +import org.apache.metron.profiler.DefaultMessageDistributor;
    +import org.apache.metron.profiler.MessageDistributor;
    +import org.apache.metron.profiler.MessageRoute;
    +import org.apache.metron.profiler.ProfileMeasurement;
    +import org.apache.metron.profiler.spark.ProfileMeasurementAdapter;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.apache.spark.api.java.function.MapGroupsFunction;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.lang.invoke.MethodHandles;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +import java.util.concurrent.TimeUnit;
    +import java.util.stream.Collectors;
    +import java.util.stream.Stream;
    +import java.util.stream.StreamSupport;
    +
    +import static java.util.Comparator.comparing;
    +import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION;
    +import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION_UNITS;
    +
    +/**
    + * The function responsible for building profiles in Spark.
    + */
    +public class ProfileBuilderFunction implements MapGroupsFunction<String, MessageRoute, ProfileMeasurementAdapter> {
    +
    +  protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private long periodDurationMillis;
    +  private Map<String, Object> globals;
    +
    +  public ProfileBuilderFunction(Properties properties, Map<String, Object> globals) {
    +    TimeUnit periodDurationUnits = TimeUnit.valueOf(PERIOD_DURATION_UNITS.get(properties, String.class));
    +    int periodDuration = PERIOD_DURATION.get(properties, Integer.class);
    +    this.periodDurationMillis = periodDurationUnits.toMillis(periodDuration);
    +    this.globals = globals;
    +  }
    +
    +  /**
    +   * Build a profile from a set of message routes.
    +   *
    +   * This assumes that all of the necessary routes have been provided
    +   *
    +   * @param group The group identifier.
    +   * @param iterator The message routes.
    +   * @return
    +   */
    +  @Override
    +  public ProfileMeasurementAdapter call(String group, Iterator<MessageRoute> iterator) throws Exception {
    +    // create the distributor; some settings are unnecessary because it is cleaned-up immediately after processing the batch
    +    int maxRoutes = Integer.MAX_VALUE;
    +    long profileTTLMillis = Long.MAX_VALUE;
    +    MessageDistributor distributor = new DefaultMessageDistributor(periodDurationMillis, profileTTLMillis, maxRoutes);
    +    Context context = TaskUtils.getContext(globals);
    +
    +    // sort the messages/routes
    +    List<MessageRoute> routes = toStream(iterator)
    +            .sorted(comparing(rt -> rt.getTimestamp()))
    +            .collect(Collectors.toList());
    +    LOG.debug("Building a profile for group '{}' from {} message(s)", group, routes.size());
    +
    +    // apply each message/route to build the profile
    +    for(MessageRoute route: routes) {
    +      distributor.distribute(route, context);
    +    }
    --- End diff --

    Or maybe we do not need to apply the messages in timestamp order? There are no strict guarantees of ordering when the Profiler runs in Storm, really. Hmm.

> Port Profiler to Spark
> ----------------------
>
>                 Key: METRON-1707
>                 URL: https://issues.apache.org/jira/browse/METRON-1707
>             Project: Metron
>
[jira] [Commented] (METRON-1735) Empty print status option causes NPE
[ https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578946#comment-16578946 ]

ASF GitHub Bot commented on METRON-1735:
----------------------------------------

Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1160

    @justinleet it was something that needed to be fixed and should be good now.

> Empty print status option causes NPE
> ------------------------------------
>
>                 Key: METRON-1735
>                 URL: https://issues.apache.org/jira/browse/METRON-1735
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Ryan Merriman
>            Assignee: Ryan Merriman
>            Priority: Major
>
> REST does not set a print job status property, causing an NPE in PcapJob
> because the property is never added to the config. The PcapJob should
> default to false.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
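The failure mode described in METRON-1735 is the common one where a missing configuration value is unboxed to a primitive. The following is a minimal sketch of that trap and of the "default to false" remedy the issue asks for. It is not Metron's PcapJob code: the class name, the key `print.job.status`, and the helper `getBoolean` are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Metron's PcapJob code.
class PrintStatusDefault {

  // Hypothetical key name, chosen only for this example.
  static final String PRINT_JOB_STATUS = "print.job.status";

  // Returns the boolean stored under 'key', or 'defaultValue' when the key is absent or null.
  static boolean getBoolean(Map<String, Object> config, String key, boolean defaultValue) {
    Object value = config.get(key);
    if (value == null) {
      return defaultValue;
    }
    return Boolean.parseBoolean(value.toString());
  }

  public static void main(String[] args) {
    // Simulates a caller (such as a REST layer) that never sets the property.
    Map<String, Object> config = new HashMap<>();

    // Unsafe: the lookup returns null, and unboxing null to boolean throws an NPE.
    try {
      boolean unsafe = (Boolean) config.get(PRINT_JOB_STATUS);
      System.out.println("unsafe = " + unsafe);
    } catch (NullPointerException e) {
      System.out.println("NPE when the property is missing");
    }

    // Safe: a missing property falls back to false.
    System.out.println("safe = " + getBoolean(config, PRINT_JOB_STATUS, false));
  }
}
```

Putting the default at the point where the value is read, rather than requiring every caller to populate the property, means a caller that omits the option cannot trigger the exception.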
[GitHub] metron pull request #1150: METRON-1707 Port Profiler to Spark [Feature Branc...
Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1150#discussion_r209764724

    --- Diff: metron-analytics/metron-profiler-spark/src/main/java/org/apache/metron/profiler/spark/function/ProfileBuilderFunction.java ---
    @@ -0,0 +1,107 @@
    +/*
    + *
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + *
    + */
    +package org.apache.metron.profiler.spark.function;
    +
    +import org.apache.metron.profiler.DefaultMessageDistributor;
    +import org.apache.metron.profiler.MessageDistributor;
    +import org.apache.metron.profiler.MessageRoute;
    +import org.apache.metron.profiler.ProfileMeasurement;
    +import org.apache.metron.profiler.spark.ProfileMeasurementAdapter;
    +import org.apache.metron.stellar.dsl.Context;
    +import org.apache.spark.api.java.function.MapGroupsFunction;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.lang.invoke.MethodHandles;
    +import java.util.Iterator;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.Properties;
    +import java.util.concurrent.TimeUnit;
    +import java.util.stream.Collectors;
    +import java.util.stream.Stream;
    +import java.util.stream.StreamSupport;
    +
    +import static java.util.Comparator.comparing;
    +import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION;
    +import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION_UNITS;
    +
    +/**
    + * The function responsible for building profiles in Spark.
    + */
    +public class ProfileBuilderFunction implements MapGroupsFunction<String, MessageRoute, ProfileMeasurementAdapter> {
    +
    +  protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
    +
    +  private long periodDurationMillis;
    +  private Map<String, Object> globals;
    +
    +  public ProfileBuilderFunction(Properties properties, Map<String, Object> globals) {
    +    TimeUnit periodDurationUnits = TimeUnit.valueOf(PERIOD_DURATION_UNITS.get(properties, String.class));
    +    int periodDuration = PERIOD_DURATION.get(properties, Integer.class);
    +    this.periodDurationMillis = periodDurationUnits.toMillis(periodDuration);
    +    this.globals = globals;
    +  }
    +
    +  /**
    +   * Build a profile from a set of message routes.
    +   *
    +   * This assumes that all of the necessary routes have been provided
    +   *
    +   * @param group The group identifier.
    +   * @param iterator The message routes.
    +   * @return
    +   */
    +  @Override
    +  public ProfileMeasurementAdapter call(String group, Iterator<MessageRoute> iterator) throws Exception {
    +    // create the distributor; some settings are unnecessary because it is cleaned-up immediately after processing the batch
    +    int maxRoutes = Integer.MAX_VALUE;
    +    long profileTTLMillis = Long.MAX_VALUE;
    +    MessageDistributor distributor = new DefaultMessageDistributor(periodDurationMillis, profileTTLMillis, maxRoutes);
    +    Context context = TaskUtils.getContext(globals);
    +
    +    // sort the messages/routes
    +    List<MessageRoute> routes = toStream(iterator)
    +            .sorted(comparing(rt -> rt.getTimestamp()))
    +            .collect(Collectors.toList());
    +    LOG.debug("Building a profile for group '{}' from {} message(s)", group, routes.size());
    +
    +    // apply each message/route to build the profile
    +    for(MessageRoute route: routes) {
    +      distributor.distribute(route, context);
    +    }
    --- End diff --

    > @simonellistonball: Do we have to use groupByKey in the spark implementation, is it not possible to use reduceByKey to build the profiles...

    What we do now is group by (profile, entity, period) to aggregate all of the messages needed to produce a measurement for any given profile period. Then those messages are sorted by timestamp and applied to the profile in that order.

    I didn't see an easy way to use `reduceByKey` and ensure that the messages are applied to the profile in timestamp order. Can you think of an alternative that maintains the ordering?
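The approach described in this comment (group by profile, entity, and period, sort each group by timestamp, then apply the messages in order) can be sketched in plain Java without Spark. This is only an illustration under stated assumptions: the `Route` class, the string key format, and the rule `period = timestamp / periodMillis` are hypothetical stand-ins, not Metron's `MessageRoute` or its period logic. The aggregate used here, "value of the last message in the period", is deliberately order-sensitive to show why the sort matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of "group, sort by timestamp, then apply in order".
// Route and the key format are hypothetical stand-ins, not Metron or Spark classes.
class GroupSortApply {

  static final class Route {
    final String profile;
    final String entity;
    final long timestamp;
    final int value;

    Route(String profile, String entity, long timestamp, int value) {
      this.profile = profile;
      this.entity = entity;
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  // Group key: (profile, entity, period). Assumes period = timestamp / periodMillis.
  static String key(Route r, long periodMillis) {
    return r.profile + "|" + r.entity + "|" + (r.timestamp / periodMillis);
  }

  // An order-sensitive aggregate: the value of the latest message in each period.
  static Map<String, Integer> lastValuePerPeriod(List<Route> routes, long periodMillis) {
    // 1. group the routes by key
    Map<String, List<Route>> groups = new TreeMap<>();
    for (Route r : routes) {
      groups.computeIfAbsent(key(r, periodMillis), k -> new ArrayList<>()).add(r);
    }

    Map<String, Integer> result = new TreeMap<>();
    for (Map.Entry<String, List<Route>> group : groups.entrySet()) {
      // 2. sort each group by timestamp
      List<Route> sorted = new ArrayList<>(group.getValue());
      sorted.sort(Comparator.comparingLong(r -> r.timestamp));

      // 3. apply the messages in timestamp order
      int last = 0;
      for (Route r : sorted) {
        last = r.value;
      }
      result.put(group.getKey(), last);
    }
    return result;
  }

  public static void main(String[] args) {
    // The first two messages arrive out of timestamp order.
    List<Route> routes = Arrays.asList(
        new Route("hello-world", "global", 30_000L, 2),
        new Route("hello-world", "global", 10_000L, 1),
        new Route("hello-world", "global", 70_000L, 3));

    // prints {hello-world|global|0=2, hello-world|global|1=3}
    System.out.println(lastValuePerPeriod(routes, 60_000L));
  }
}
```

Without the sort, the first period would yield 1 instead of 2, because the messages would be applied in arrival order. A purely commutative aggregate such as a count would give the same answer in any order, which is the case where a `reduceByKey`-style merge is straightforward; Spark documents that the function passed to `reduceByKey` should be associative and commutative.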
[jira] [Created] (METRON-1737) Document Job cleanup
Ryan Merriman created METRON-1737:
-------------------------------------

             Summary: Document Job cleanup
                 Key: METRON-1737
                 URL: https://issues.apache.org/jira/browse/METRON-1737
             Project: Metron
          Issue Type: Sub-task
            Reporter: Ryan Merriman


Pcap query results are written to HDFS. Over time, more HDFS file space will be used as queries are run. There is currently no automated cleanup feature, so we need to document how to do this in case a user needs to do it manually or with a script.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578913#comment-16578913 ]

ASF GitHub Bot commented on METRON-1733:
----------------------------------------

Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1158

    Can you close this @sardell?

> PCAP UI - PCAP queries don't work on Safari
> -------------------------------------------
>
>                 Key: METRON-1733
>                 URL: https://issues.apache.org/jira/browse/METRON-1733
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Shane Ardell
>            Assignee: Shane Ardell
>            Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen
> with Chrome or Firefox. After digging into the search request, it looks like
> the values for the startTime and endTime are 'NaN'. It looks like Safari
> cannot parse the format of the time we are passing to the getDate() function.
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578910#comment-16578910 ]

ASF GitHub Bot commented on METRON-1733:
----------------------------------------

Github user merrimanr commented on the issue:

    https://github.com/apache/metron/pull/1158

    I verified this in full dev. +1

> PCAP UI - PCAP queries don't work on Safari
> -------------------------------------------
>
>                 Key: METRON-1733
>                 URL: https://issues.apache.org/jira/browse/METRON-1733
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Shane Ardell
>            Assignee: Shane Ardell
>            Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen
> with Chrome or Firefox. After digging into the search request, it looks like
> the values for the startTime and endTime are 'NaN'. It looks like Safari
> cannot parse the format of the time we are passing to the getDate() function.
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty
[ https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578884#comment-16578884 ]

ASF GitHub Bot commented on METRON-1734:
----------------------------------------

Github user merrimanr closed the pull request at:

    https://github.com/apache/metron/pull/1159

> Src and Dst port filters are incorrect after changing to empty
> --------------------------------------------------------------
>
>                 Key: METRON-1734
>                 URL: https://issues.apache.org/jira/browse/METRON-1734
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Ryan Merriman
>            Assignee: Ryan Merriman
>            Priority: Major
>
> When changing a port filter after a job has run, setting it to empty causes
> the old value to be sent in the request.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (METRON-1736) Enhance Batch Profiler Integration Test
[ https://issues.apache.org/jira/browse/METRON-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578868#comment-16578868 ]

ASF GitHub Bot commented on METRON-1736:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1162

    METRON-1736 Enhance Batch Profiler Integration Test

    The integration test for the Batch Profiler should use the Profiler Client API and `PROFILE_GET` to validate the values that are produced. This is more effective end-to-end validation that the Batch Profiler is working.

    This is a pull request against the `METRON-1699-create-batch-profiler` feature branch.

    This is dependent on #1161. By filtering on the last commit, this PR can be reviewed before the others are reviewed and merged.

    ## Pull Request Checklist

    - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [x] Have you included steps or a guide to how the change may be verified and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
    - [x] Have you written or updated unit tests and or integration tests to verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1736

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1162

----
commit 6ce28594659928a8c87c57e22e1ab00d798d
Author: Nick Allen
Date:   2018-07-10T14:08:48Z

    METRON-1703 Make Core Profiler Components Serializable

commit 0051359cbb277881de896526345bb4fce1d5139c
Author: Nick Allen
Date:   2018-07-10T19:42:19Z

    METRON-1704 Message Timestamp Logic Should be Shared

commit 2413726bdf96221ec775a9c8de524e3ec92148b7
Author: Nick Allen
Date:   2018-07-27T17:20:15Z

    METRON-1706: HbaseClient.mutate should return the number of mutations

commit 21980ca764b98ddb96c4c8732e0ef7a6c5ea2c56
Author: Nick Allen
Date:   2018-07-24T18:02:36Z

    METRON-1705 Create ProfilePeriod Using Period ID

commit be15126419a2862864a7acd67349281b086f52cf
Author: Nick Allen
Date:   2018-07-31T19:26:20Z

    METRON-1707 Port Profiler to Spark

commit c410e412c50f4510f8674cd4fa5d4481f28a4a13
Author: Nick Allen
Date:   2018-08-09T15:54:41Z

    No need to handle packaging yet. That will come in a future PR

commit f1a8b49f99029e8d801dc62cfa9c2a0827a46cd8
Author: Nick Allen
Date:   2018-08-13T13:25:56Z

    Renamed execute() to run()

commit 7f585e0afaa76386934f785407eecc5d65175d8c
Author: Nick Allen
Date:   2018-08-13T14:52:17Z

    Reducing the size of the telemetry for the integration test. No need to have so much data

commit 6bce4797b33bee6c161b81188f94b4fa3e931a53
Author: Nick Allen
Date:   2018-08-13T19:14:48Z

    Only create an Hbase connection if there are measurements to write

commit ca038f9e4e65212158970046ce95c681a2ebda1b
Author: Nick Allen
Date:   2018-08-13T20:24:58Z

    METRON-1736 Enahnce Batch Profiler Integration Test

> Enhance Batch Profiler Integration Test
> ---------------------------------------
>
>                 Key: METRON-1736
>                 URL: https://issues.apache.org/jira/browse/METRON-1736
>             Project: Metron
>          Issue Type: Sub-task
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The integration test for the Batch Profiler should use the Profiler Client
> API and `PROFILE_GET` to validate the values that are produced. This is more
> effective end-to-end validation that the Batch Profiler is working.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (METRON-1736) Enhance Batch Profiler Integration Test
Nick Allen created METRON-1736:
----------------------------------

             Summary: Enhance Batch Profiler Integration Test
                 Key: METRON-1736
                 URL: https://issues.apache.org/jira/browse/METRON-1736
             Project: Metron
          Issue Type: Sub-task
            Reporter: Nick Allen
            Assignee: Nick Allen


The integration test for the Batch Profiler should use the Profiler Client API and `PROFILE_GET` to validate the values that are produced. This is more effective end-to-end validation that the Batch Profiler is working.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (METRON-1708) Run the Batch Profiler in Spark
[ https://issues.apache.org/jira/browse/METRON-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578777#comment-16578777 ] ASF GitHub Bot commented on METRON-1708: GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1161 METRON-1708 Run the Batch Profiler in Spark This adds the ability to run the Batch Profiler from the command line. This also packages up the Batch Profiler into a tarball. This is a pull request against the `METRON-1699-create-batch-profiler` feature branch. This is dependent on #1145 #1146 #1148 #1147 #1150 . By filtering on the last commit, this PR can be reviewed before the others are reviewed and merged. ## Testing 1. Start-up the development environment. Allow Metron to run for a bit so that a fair amount of telemetry is archived in HDFS. 1. Stop all Metron services. 1. Install Spark2 using Ambari. * Use Add Service > Spark2, then follow prompts. 1. Deploy the Batch Profiler to the development environment. From the host machine; outside the development VM, run the following. ``` cd metron-deployment/development/centos6 vagrant scp ../../../metron-analytics/metron-profiler-spark/target/metron-profiler-spark-0.5.1-archive.tar.gz /tmp ``` Then from the development VM, run the following. ``` source /etc/default/metron cd $METRON_HOME tar -xvf /tmp/metron-profiler-spark-0.5.1-archive.tar.gz ``` 1. Create a profile by editing `$METRON_HOME/config/zookeeper/profiler.json` as follows. ``` [root@node1 0.5.1]# cat $METRON_HOME/config/zookeeper/profiler.json { "profiles": [ { "profile": "hello-world", "foreach": "'global'", "init":{ "count": "0" }, "update": { "count": "count + 1" }, "result": "count" } ], "timestampField": "timestamp" } ``` 1. Count the number of messages in the 'indexing' topic. This should not be changing. 
``` [root@node1 ~]# /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \ --broker-list $BROKERLIST \ --topic indexing \ --time -1 indexing:0:8130 ``` In this case there are 8,130 messages. 1. Delete any previously written profile measurements from HBase. ``` [root@node1 ~]# hbase shell ... hbase(main):001:0> truncate 'profiler' Truncating 'profiler' table (it may take a while): - Disabling table... - Truncating table... 0 row(s) in 4.1070 seconds ``` 1. Confirm that all of the messages were successfully indexed in HDFS. ``` [root@node1 ~]# hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l 8130 ``` * Remember that we found 8,130 in the indexing topic previously. This shows that all of them were indexed in HDFS successfully. 1. Alter the `$METRON_HOME/config/batch-profiler.properties` as follows. ``` [root@node1 0.5.1]# cat config/batch-profiler.properties spark.master=local spark.app.name=Batch Profiler spark.sql.shuffle.partitions=8 profiler.period.duration=1 profiler.period.duration.units=MINUTES profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/* ``` 1. Fix-up some of the Spark configuration. ``` SPARK_HOME=/usr/hdp/current/spark2-client cp /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/ cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties echo "log4j.logger.org.apache.metron.profiler.spark=DEBUG" >> $SPARK_HOME/conf/log4j.properties ``` 1. You may need to create the Spark history directory in HDFS (if doing this in Full Dev.) ``` export HADOOP_USER_NAME=hdfs hdfs dfs -mkdir /spark2-history ``` 1. You may want to edit the log4j properties file that sits in your config directory in $SPARK_HOME, or create one. 
``` # Set everything to be logged to the console log4j.rootCategory=WARN, console log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n # Set the default spark-shell log level to WARN. When running the spark-shell, the # log level for this class is used to overwrite the root
[GitHub] metron pull request #1161: METRON-1708 Run the Batch Profiler in Spark
GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/1161 METRON-1708 Run the Batch Profiler in Spark This adds the ability to run the Batch Profiler from the command line. This also packages up the Batch Profiler into a tarball. This is a pull request against the `METRON-1699-create-batch-profiler` feature branch. This is dependent on #1145 #1146 #1148 #1147 #1150 . By filtering on the last commit, this PR can be reviewed before the others are reviewed and merged. ## Testing 1. Start-up the development environment. Allow Metron to run for a bit so that a fair amount of telemetry is archived in HDFS. 1. Stop all Metron services. 1. Install Spark2 using Ambari. * Use Add Service > Spark2, then follow prompts. 1. Deploy the Batch Profiler to the development environment. From the host machine, outside the development VM, run the following. ``` cd metron-deployment/development/centos6 vagrant scp ../../../metron-analytics/metron-profiler-spark/target/metron-profiler-spark-0.5.1-archive.tar.gz /tmp ``` Then from the development VM, run the following. ``` source /etc/default/metron cd $METRON_HOME tar -xvf /tmp/metron-profiler-spark-0.5.1-archive.tar.gz ``` 1. Create a profile by editing `$METRON_HOME/config/zookeeper/profiler.json` as follows. ``` [root@node1 0.5.1]# cat $METRON_HOME/config/zookeeper/profiler.json { "profiles": [ { "profile": "hello-world", "foreach": "'global'", "init":{ "count": "0" }, "update": { "count": "count + 1" }, "result": "count" } ], "timestampField": "timestamp" } ``` 1. Count the number of messages in the 'indexing' topic. This should not be changing. ``` [root@node1 ~]# /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \ --broker-list $BROKERLIST \ --topic indexing \ --time -1 indexing:0:8130 ``` In this case there are 8,130 messages. 1. Delete any previously written profile measurements from HBase. ``` [root@node1 ~]# hbase shell ... 
hbase(main):001:0> truncate 'profiler' Truncating 'profiler' table (it may take a while): - Disabling table... - Truncating table... 0 row(s) in 4.1070 seconds ``` 1. Confirm that all of the messages were successfully indexed in HDFS. ``` [root@node1 ~]# hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l 8130 ``` * Remember that we found 8,130 in the indexing topic previously. This shows that all of them were indexed in HDFS successfully. 1. Alter the `$METRON_HOME/config/batch-profiler.properties` as follows. ``` [root@node1 0.5.1]# cat config/batch-profiler.properties spark.master=local spark.app.name=Batch Profiler spark.sql.shuffle.partitions=8 profiler.period.duration=1 profiler.period.duration.units=MINUTES profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/* ``` 1. Fix-up some of the Spark configuration. ``` SPARK_HOME=/usr/hdp/current/spark2-client cp /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/ cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties echo "log4j.logger.org.apache.metron.profiler.spark=DEBUG" >> $SPARK_HOME/conf/log4j.properties ``` 1. You may need to create the Spark history directory in HDFS (if doing this in Full Dev.) ``` export HADOOP_USER_NAME=hdfs hdfs dfs -mkdir /spark2-history ``` 1. You may want to edit the log4j properties that sits in your config directory in $SPARK_HOME, or create one. ``` # Set everything to be logged to the console log4j.rootCategory=WARN, console log4j.appender.console=org.apache.log4j.ConsoleAppender log4j.appender.console.target=System.err log4j.appender.console.layout=org.apache.log4j.PatternLayout log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n # Set the default spark-shell log level to WARN. When running the spark-shell, the # log level for this class is used to overwrite the root logger's log level, so that # the user can have different defaults for the shell and regular Spark apps. 
log4j.logger.org.apache.spark.repl.Main=WARN # Settings to quiet third party logs that are too verbose
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578771#comment-16578771 ] ASF GitHub Bot commented on METRON-1733: Github user sardell commented on the issue: https://github.com/apache/metron/pull/1158 Closing and reopening to rerun Travis. > PCAP UI - PCAP queries don't work on Safari > --- > > Key: METRON-1733 > URL: https://issues.apache.org/jira/browse/METRON-1733 > Project: Metron > Issue Type: Sub-task >Reporter: Shane Ardell >Assignee: Shane Ardell >Priority: Major > > On Safari, PCAP queries fail with a 500 internal server error. No issues seen > with Chrome or Firefox. After digging into the search request, it looks like > the values for the startTime and endTime are 'NaN'. It looks like Safari > cannot parse the format of the time we are passing to the getDate() function. > For more on this issue: > https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1158: METRON-1733: PCAP UI - PCAP queries don't work on...
GitHub user sardell reopened a pull request: https://github.com/apache/metron/pull/1158 METRON-1733: PCAP UI - PCAP queries don't work on Safari ## Contributor Comments This PR fixes a bug where Safari cannot read the format of the date we are passing to the startTimeMs and endTimeMs parameters. To resolve this, I used moment js (which was already being used in the project) to get the numeric value of the time strings instead of new Date().getTime(). ## Testing Using Safari, run a PCAP query in the Alerts UI. If you check the request payload, it should contain the correct numeric values for the startTimeMs and endTimeMs instead of NaN, and your search results should complete the same as they would in Chrome or another browser. ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? ### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? 
- [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - n/a ~~Have you written or updated unit tests and or integration tests to verify your changes?~~ - n/a ~~If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?~~ - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - n/a ~~Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via `site-book/target/site/index.html`:~~ ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/sardell/metron METRON-1733 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1158 commit 348d70da6b95ecd65f434f13805f0c95b0c62161 Author: Shane Ardell Date: 2018-08-10T15:34:52Z fix safari date NaN issue commit 027520ffba1577f3c0c8216c418d469465d496e7 Author: Shane Ardell Date: 2018-08-12T05:57:29Z Merge branch 'feature/METRON-1554-pcap-query-panel' into METRON-1733 ---
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578773#comment-16578773 ] ASF GitHub Bot commented on METRON-1733: GitHub user sardell reopened a pull request: https://github.com/apache/metron/pull/1158 METRON-1733: PCAP UI - PCAP queries don't work on Safari ## Contributor Comments This PR fixes a bug where Safari cannot read the format of the date we are passing to the startTimeMs and endTimeMs parameters. To resolve this, I used moment js (which was already being used in the project) to get the numeric value of the time strings instead of new Date().getTime(). ## Testing Using Safari, run a PCAP query in the Alerts UI. If you check the request payload, it should contain the correct numeric values for the startTimeMs and endTimeMs instead of NaN, and your search results should complete the same as they would in Chrome or another browser. ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? 
### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - n/a ~~Have you written or updated unit tests and or integration tests to verify your changes?~~ - n/a ~~If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?~~ - [x] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - n/a ~~Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via `site-book/target/site/index.html`:~~ ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/sardell/metron METRON-1733 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1158 commit 348d70da6b95ecd65f434f13805f0c95b0c62161 Author: Shane Ardell Date: 2018-08-10T15:34:52Z fix safari date NaN issue commit 027520ffba1577f3c0c8216c418d469465d496e7 Author: Shane Ardell Date: 2018-08-12T05:57:29Z Merge branch 'feature/METRON-1554-pcap-query-panel' into METRON-1733 > PCAP UI - PCAP queries don't work on Safari > --- > > Key: METRON-1733 > URL: https://issues.apache.org/jira/browse/METRON-1733 > Project: Metron > Issue Type: Sub-task >Reporter: Shane Ardell >Assignee: Shane Ardell >Priority: Major > > On Safari, PCAP queries fail with a 500 internal server error. No issues seen > with Chrome or Firefox. After
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578772#comment-16578772 ] ASF GitHub Bot commented on METRON-1733: Github user sardell closed the pull request at: https://github.com/apache/metron/pull/1158 > PCAP UI - PCAP queries don't work on Safari > --- > > Key: METRON-1733 > URL: https://issues.apache.org/jira/browse/METRON-1733 > Project: Metron > Issue Type: Sub-task >Reporter: Shane Ardell >Assignee: Shane Ardell >Priority: Major > > On Safari, PCAP queries fail with a 500 internal server error. No issues seen > with Chrome or Firefox. After digging into the search request, it looks like > the values for the startTime and endTime are 'NaN'. It looks like Safari > cannot parse the format of the time we are passing to the getDate() function. > For more on this issue: > https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari
Github user sardell commented on the issue: https://github.com/apache/metron/pull/1158 Closing and reopening to rerun Travis. ---
[GitHub] metron pull request #1158: METRON-1733: PCAP UI - PCAP queries don't work on...
Github user sardell closed the pull request at: https://github.com/apache/metron/pull/1158 ---
[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty
[ https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578758#comment-16578758 ] ASF GitHub Bot commented on METRON-1734: Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1159 +1 by inspection, thanks for the explanation on the string vs. number and the casting. > Src and Dst port filters are incorrect after changing to empty > -- > > Key: METRON-1734 > URL: https://issues.apache.org/jira/browse/METRON-1734 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > When changing a port filter after a job has run, setting it to empty causes > the old value to be sent in the request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1159: METRON-1734: Src and Dst port filters are incorrect afte...
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1159 +1 by inspection, thanks for the explanation on the string vs. number and the casting. ---
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578732#comment-16578732 ] ASF GitHub Bot commented on METRON-1733: Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/1158 @sardell can you close and reopen this PR to trigger another travis run? > PCAP UI - PCAP queries don't work on Safari > --- > > Key: METRON-1733 > URL: https://issues.apache.org/jira/browse/METRON-1733 > Project: Metron > Issue Type: Sub-task >Reporter: Shane Ardell >Assignee: Shane Ardell >Priority: Major > > On Safari, PCAP queries fail with a 500 internal server error. No issues seen > with Chrome or Firefox. After digging into the search request, it looks like > the values for the startTime and endTime are 'NaN'. It looks like Safari > cannot parse the format of the time we are passing to the getDate() function. > For more on this issue: > https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari
Github user merrimanr commented on the issue: https://github.com/apache/metron/pull/1158 @sardell can you close and reopen this PR to trigger another travis run? ---
[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty
[ https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578693#comment-16578693 ] ASF GitHub Bot commented on METRON-1734: Github user merrimanr commented on a diff in the pull request: https://github.com/apache/metron/pull/1159#discussion_r209698673 --- Diff: metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts --- @@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, OnChanges { this.model.endTimeMs = new Date(this.endTimeStr).getTime(); if (this.ipSrcPort !== '') { this.model.ipSrcPort = +this.ipSrcPort; +} else { --- End diff -- They are different types, one is a string and one is a number. I believe we created the this.ipSrcPort and this.ipDstPort string variables for regex validation purposes. As such we can't just assign an empty string to a number type. The + operator converts a string to a number. > Src and Dst port filters are incorrect after changing to empty > -- > > Key: METRON-1734 > URL: https://issues.apache.org/jira/browse/METRON-1734 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > When changing a port filter after a job has run, setting it to empty causes > the old value to be sent in the request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1735) Empty print status option causes NPE
[ https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578695#comment-16578695 ] ASF GitHub Bot commented on METRON-1735: GitHub user merrimanr reopened a pull request: https://github.com/apache/metron/pull/1160 METRON-1735: Empty print status option causes NPE ## Contributor Comments This is a regression in the feature branch introduced by https://github.com/apache/metron/pull/1138. The default behavior of PcapJob is that it should not print status by default and not fail when that setting is missing. ### Changes Included - Changed the default behavior of the Pcap CLI to print status by default - Removed the print status flag from the CLI - Fixed bug in getting print status option in PcapJob - Added getter/setter methods to PcapJob for testing purposes - Added test cases ### Testing Still testing in full dev. - You should get results in the Pcap UI now - The print status option in the CLI should be missing - The CLI should print status every time ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. 
- [x] Has your PR been rebased against the latest commit within the target branch (typically master)? ### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - [x] Have you written or updated unit tests and or integration tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via `site-book/target/site/index.html`: ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/merrimanr/incubator-metron METRON-1735 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1160.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1160 commit d13db64c9ae816fded89128a99cd5d6a8a71648c Author: merrimanr Date: 2018-08-10T22:03:25Z initial commit > Empty print status option causes NPE > > > Key: METRON-1735 > URL: https://issues.apache.org/jira/browse/METRON-1735 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > REST does not set a print job status property causing a NPE in PcapJob > because the property is never added to the config. The PcapJob should > default to
[jira] [Commented] (METRON-1735) Empty print status option causes NPE
[ https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578694#comment-16578694 ] ASF GitHub Bot commented on METRON-1735: Github user merrimanr closed the pull request at: https://github.com/apache/metron/pull/1160 > Empty print status option causes NPE > > > Key: METRON-1735 > URL: https://issues.apache.org/jira/browse/METRON-1735 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > REST does not set a print job status property causing a NPE in PcapJob > because the property is never added to the config. The PcapJob should > default to false. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1160: METRON-1735: Empty print status option causes NPE
GitHub user merrimanr reopened a pull request: https://github.com/apache/metron/pull/1160 METRON-1735: Empty print status option causes NPE ## Contributor Comments This is a regression in the feature branch introduced by https://github.com/apache/metron/pull/1138. The default behavior of PcapJob is that it should not print status by default and not fail when that setting is missing. ### Changes Included - Changed the default behavior of the Pcap CLI to print status by default - Removed the print status flag from the CLI - Fixed bug in getting print status option in PcapJob - Added getter/setter methods to PcapJob for testing purposes - Added test cases ### Testing Still testing in full dev. - You should get results in the Pcap UI now - The print status option in the CLI should be missing - The CLI should print status every time ## Pull Request Checklist Thank you for submitting a contribution to Apache Metron. Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions. Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides. In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [x] Does your PR title start with METRON- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? 
### For code changes: - [x] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [x] Have you included steps or a guide to how the change may be verified and tested manually? - [x] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via: ``` mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh ``` - [x] Have you written or updated unit tests and or integration tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? ### For documentation related changes: - [ ] Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and then verify changes via `site-book/target/site/index.html`: ``` cd site-book mvn site ``` Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/merrimanr/incubator-metron METRON-1735 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/1160.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1160 commit d13db64c9ae816fded89128a99cd5d6a8a71648c Author: merrimanr Date: 2018-08-10T22:03:25Z initial commit ---
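For readers following the METRON-1735 discussion, here is a minimal sketch of how a missing boolean option can lead to an NPE and how a null-safe lookup can default it to false. This is not the PcapJob code; the class name `PrintStatusDefault`, the key `printJobStatus`, and the plain `Map` config are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class PrintStatusDefault {

    // Hypothetical option key; the real PcapJob option name may differ.
    static final String PRINT_JOB_STATUS = "printJobStatus";

    // Unsafe lookup: when the key is absent, get() returns null and
    // auto-unboxing the null Boolean throws a NullPointerException.
    static boolean unsafeShouldPrintStatus(Map<String, Object> config) {
        return (Boolean) config.get(PRINT_JOB_STATUS);
    }

    // Null-safe lookup: a missing (or non-Boolean) value is treated as false.
    static boolean shouldPrintStatus(Map<String, Object> config) {
        Object value = config.get(PRINT_JOB_STATUS);
        return value instanceof Boolean && (Boolean) value;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();

        // The option was never set, as in the REST case described in the ticket.
        System.out.println(shouldPrintStatus(config));

        config.put(PRINT_JOB_STATUS, true);
        System.out.println(shouldPrintStatus(config));
    }
}
```

With the null-safe variant, a caller that never sets the option gets the quiet default, while a caller that sets it explicitly still gets status output.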
[GitHub] metron pull request #1160: METRON-1735: Empty print status option causes NPE
Github user merrimanr closed the pull request at: https://github.com/apache/metron/pull/1160 ---
[GitHub] metron pull request #1159: METRON-1734: Src and Dst port filters are incorre...
Github user merrimanr commented on a diff in the pull request: https://github.com/apache/metron/pull/1159#discussion_r209698673 --- Diff: metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts --- @@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, OnChanges { this.model.endTimeMs = new Date(this.endTimeStr).getTime(); if (this.ipSrcPort !== '') { this.model.ipSrcPort = +this.ipSrcPort; +} else { --- End diff -- They are different types, one is a string and one is a number. I believe we created the this.ipSrcPort and this.ipDstPort string variables for regex validation purposes. As such we can't just assign an empty string to a number type. The + operator converts a string to a number. ---
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578642#comment-16578642 ] ASF GitHub Bot commented on METRON-1732: Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209689930 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java --- @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) { } return this; } -mrJob.submit(); -jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString()); +synchronized (this) { --- End diff -- Will do. This lock is about thread visibility as opposed to actual issues with concurrent modification. It may be that this lock is not needed with getStatus being synchronized. I will double check and report back via modified code and/or code comment on this. > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209689930 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java --- @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) { } return this; } -mrJob.submit(); -jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString()); +synchronized (this) { --- End diff -- Will do. This lock is about thread visibility as opposed to actual issues with concurrent modification. It may be that this lock is not needed with getStatus being synchronized. I will double check and report back via modified code and/or code comment on this. ---
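For readers following the visibility point, here is a minimal sketch of the general Java rule the comment relies on: releasing a monitor happens-before any later acquisition of the same monitor, so a reader that synchronizes on the same object sees the writer's updates. `StatusHolder` and its fields are hypothetical stand-ins, not Metron's `PcapJob` or `JobStatus`, and whether the extra block is needed in the real class depends on code not shown in this thread.

```java
/**
 * Hypothetical holder (not Metron's PcapJob). The writer and the reader
 * both take the monitor of this object, so the reader is guaranteed to
 * see the most recent completed write.
 */
class StatusHolder {
  private String state = "NOT_RUNNING";
  private String jobId;

  /** Writer side: update both fields while holding the lock on this. */
  void markSubmitted(String id) {
    synchronized (this) {
      this.jobId = id;
      this.state = "SUBMITTED";
    }
  }

  /** Reader side: a synchronized method locks the same monitor as above. */
  synchronized String describe() {
    return state + ":" + jobId;
  }
}
```

If every write to the shared fields already happens inside a synchronized method on the same object, an additional `synchronized (this)` block around them adds no further guarantee, which is the possibility raised in the comment above.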
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578627#comment-16578627 ] ASF GitHub Bot commented on METRON-1732: Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/1157 Good feedback @nickwallen, I'll make adjustments. > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578626#comment-16578626 ] ASF GitHub Bot commented on METRON-1732: Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209687780 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { --- End diff -- As I understand it, submit is effectively submitting the set of tasks for the parallel stream to execute within this threadpool, e.g. https://www.baeldung.com/java-8-parallel-streams-custom-threadpool. 
As a side note, the reason for a custom threadpool at all is so that this doesn't cause issues with other streams since the default in Java is to use a global context for this sort of thing. Liveness issues may arise when using the shared global context. > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1157: METRON-1732: Fix job status liveness bug and parallelize...
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/1157 Good feedback @nickwallen, I'll make adjustments. ---
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user mmiklavc commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209687780 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { --- End diff -- As I understand it, submit is effectively submitting the set of tasks for the parallel stream to execute within this threadpool, e.g. https://www.baeldung.com/java-8-parallel-streams-custom-threadpool. As a side note, the reason for a custom threadpool at all is so that this doesn't cause issues with other streams since the default in Java is to use a global context for this sort of thing. Liveness issues may arise when using the shared global context. ---
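The pattern described in this comment can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's code: `CustomPoolSketch` and `nonEmptyKeys` are made-up names, and the map of strings to integer lists stands in for the (path, data) pairs. It also relies on parallel streams running in the pool of the task that starts them, which is widely used behavior but, as far as I know, not something the stream Javadoc promises.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

class CustomPoolSketch {

  /** Returns the keys whose value list is non-empty, visited in a dedicated pool. */
  static List<String> nonEmptyKeys(Map<String, List<Integer>> toWrite, int parallelism) {
    // Worker threads append concurrently, so the result list must be thread-safe.
    List<String> done = Collections.synchronizedList(new ArrayList<>());
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      // A single submit wraps the whole pipeline; get() blocks until it finishes.
      pool.submit(() ->
          toWrite.entrySet().parallelStream().forEach(e -> {
            if (!e.getValue().isEmpty()) {
              done.add(e.getKey());
            }
          })
      ).get();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the pool", ie);
    } catch (ExecutionException ee) {
      throw new IllegalStateException("Parallel task failed", ee.getCause());
    } finally {
      // Release the worker threads; the pool is private to this call.
      pool.shutdown();
    }
    return done;
  }
}
```

Submitting once rather than once per entry leaves the splitting of work to the stream, while the private pool keeps slow writes from occupying the shared common pool.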
[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty
[ https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578615#comment-16578615 ] ASF GitHub Bot commented on METRON-1734: Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/1159#discussion_r209683673 --- Diff: metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts --- @@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, OnChanges { this.model.endTimeMs = new Date(this.endTimeStr).getTime(); if (this.ipSrcPort !== '') { this.model.ipSrcPort = +this.ipSrcPort; +} else { --- End diff -- This is probably a dumb question, but why do we have to specifically delete the value? Don't we treat empty string the same as missing? If that's the case, why isn't this just ``` this.model.ipSrcPort = +this.ipSrcPort; this.model.ipDstPort = +this.ipDstPort; ``` Sidenote, are the pluses in here even doing anything? Could it just be ``` this.model.ipSrcPort = this.ipSrcPort; this.model.ipDstPort = this.ipDstPort; ``` > Src and Dst port filters are incorrect after changing to empty > -- > > Key: METRON-1734 > URL: https://issues.apache.org/jira/browse/METRON-1734 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > When changing a port filter after a job has run, setting it to empty causes > the old value to be sent in the request. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1159: METRON-1734: Src and Dst port filters are incorre...
Github user justinleet commented on a diff in the pull request: https://github.com/apache/metron/pull/1159#discussion_r209683673 --- Diff: metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts --- @@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, OnChanges { this.model.endTimeMs = new Date(this.endTimeStr).getTime(); if (this.ipSrcPort !== '') { this.model.ipSrcPort = +this.ipSrcPort; +} else { --- End diff -- This is probably a dumb question, but why do we have to specifically delete the value? Don't we treat empty string the same as missing? If that's the case, why isn't this just ``` this.model.ipSrcPort = +this.ipSrcPort; this.model.ipDstPort = +this.ipDstPort; ``` Sidenote, are the pluses in here even doing anything? Could it just be ``` this.model.ipSrcPort = this.ipSrcPort; this.model.ipDstPort = this.ipDstPort; ``` ---
[jira] [Commented] (METRON-1735) Empty print status option causes NPE
[ https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578600#comment-16578600 ] ASF GitHub Bot commented on METRON-1735: Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1160 @merrimanr Could you bump Travis? Looks like a maven connection issue, rather than something to actually be fixed. > Empty print status option causes NPE > > > Key: METRON-1735 > URL: https://issues.apache.org/jira/browse/METRON-1735 > Project: Metron > Issue Type: Sub-task >Reporter: Ryan Merriman >Assignee: Ryan Merriman >Priority: Major > > REST does not set a print job status property causing a NPE in PcapJob > because the property is never added to the config. The PcapJob should > default to false. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1160: METRON-1735: Empty print status option causes NPE
Github user justinleet commented on the issue: https://github.com/apache/metron/pull/1160 @merrimanr Could you bump Travis? Looks like a maven connection issue, rather than something to actually be fixed. ---
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578569#comment-16578569 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209649851 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Can we document the default value for this? > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209674410 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { --- End diff -- Shouldn't we be calling `tp.submit` for each (path, data)? ---
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578570#comment-16578570 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209651293 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Should we mention that 1C, 4C are valid values in addition to integers? Perhaps just copy the text you have in the Ambari description into the README. Good stuff. > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209650724 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Do you have any advice on when a user should increase/decrease this value? Are there errors I might see that would be resolved by increasing/decreasing this value? If you don't have a good understanding of this, then we don't need to worry about it. ---
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209671313 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { + try { +Path path = e.getKey(); +List data = e.getValue(); +if (data.size() > 0) { + write(getResultsWriter(), hadoopConfig, data, path); + outFiles.add(path); +} + } catch (IOException ioe) { +throw new RuntimeException("Failed to write results", ioe); --- End diff -- Can we add the path that failed to write to the exception message? ---
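One way to act on this suggestion, sketched outside the PR: wrap the checked exception and put the offending path in the message. `WriteFailureSketch`, its `Writer` interface, and the `path=` message format are hypothetical; the real code would call the finalizer's own `write` method with a Hadoop `Path`.

```java
import java.io.IOException;

class WriteFailureSketch {

  /** Hypothetical stand-in for the finalizer's write call. */
  interface Writer {
    void write(String path) throws IOException;
  }

  /** Runs the write and, on failure, reports which path could not be written. */
  static void writeOrFail(Writer writer, String path) {
    try {
      writer.write(path);
    } catch (IOException ioe) {
      // Keep the original exception as the cause and name the path in the message.
      throw new RuntimeException("Failed to write results; path=" + path, ioe);
    }
  }
}
```

With several pages written in parallel, the path in the message is what tells an operator which file to look at.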
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578574#comment-16578574 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209665613 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java --- @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) { } return this; } -mrJob.submit(); -jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString()); +synchronized (this) { --- End diff -- Can we add a comment about why we need the lock here? > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578571#comment-16578571 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209655011 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); --- End diff -- Should we add a catch block for when a user enters an invalid value? We should catch and provide a helpful exception message like "Invalid value for property 'pcap.finalizer.threadpool.size'; value='3CCC'". > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209665613 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java --- @@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) { } return this; } -mrJob.submit(); -jobStatus.withState(State.SUBMITTED).withDescription("Job submitted").withJobId(mrJob.getJobID().toString()); +synchronized (this) { --- End diff -- Can we add a comment about why we need the lock here? ---
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578572#comment-16578572 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209674410 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { --- End diff -- Shouldn't we be calling `tp.submit` for each (path, data)? 
> Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578573#comment-16578573 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209671313 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. 
+ */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); + return factor * Runtime.getRuntime().availableProcessors(); +} else { + return Integer.parseInt(numThreadsStr); +} + } + + protected List writeParallel(Configuration hadoopConfig, Map> toWrite, + int parallelism) throws IOException { +List outFiles = Collections.synchronizedList(new ArrayList<>()); +ForkJoinPool tp = new ForkJoinPool(parallelism); +try { + tp.submit(() -> { +toWrite.entrySet().parallelStream().forEach(e -> { + try { +Path path = e.getKey(); +List data = e.getValue(); +if (data.size() > 0) { + write(getResultsWriter(), hadoopConfig, data, path); + outFiles.add(path); +} + } catch (IOException ioe) { +throw new RuntimeException("Failed to write results", ioe); --- End diff -- Can we add the path that failed to write to the exception message? > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578568#comment-16578568 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209650724 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Do you have any advice on when a user should increase/decrease this value? Are there errors I might see that would be resolved by increasing/decreasing this value? If you don't have a good understanding of this, then we don't need to worry about it. > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209655011 --- Diff: metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java --- @@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() { LOG.warn("Unable to cleanup files in HDFS", e); } } +LOG.info("Done finalizing results"); return new PcapPages(outFiles); } - protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List data, Path outputPath) throws IOException; + /** + * Figure out how many threads to use in the thread pool. If it's a string and ends with "C", + * then strip the C and treat it as an integral multiple of the number of cores. If it's a + * string and does not end with a C, then treat it as a number in string form. + */ + private static int getNumThreads(String numThreads) { +String numThreadsStr = ((String) numThreads).trim().toUpperCase(); +if (numThreadsStr.endsWith("C")) { + Integer factor = Integer.parseInt(numThreadsStr.replace("C", "")); --- End diff -- Should we add a catch block for when a user enters an invalid value? We should catch and provide a helpful exception message like "Invalid value for property 'pcap.finalizer.threadpool.size'; value='3CCC'". ---
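A sketch of the validation asked for here, under assumptions: `ThreadCountSketch` and `parse` are hypothetical names, and the core count is passed in as a parameter so the behavior does not depend on the machine. Note that `String.replace("C", "")` removes every `C`, so the code in the diff would read `3CCC` as `3` instead of failing; the sketch strips only the trailing `C` so that such input is rejected.

```java
class ThreadCountSketch {

  static final String PROPERTY = "pcap.finalizer.threadpool.size";

  /**
   * Parses a thread count. "4" means four threads; "2C" means two threads
   * per core. Anything else is rejected with a message naming the property.
   */
  static int parse(String raw, int cores) {
    String value = raw == null ? "" : raw.trim().toUpperCase();
    try {
      if (value.endsWith("C")) {
        // Strip only the final C, so "3CCC" fails in parseInt below.
        int factor = Integer.parseInt(value.substring(0, value.length() - 1));
        return factor * cores;
      }
      return Integer.parseInt(value);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Invalid value for property '" + PROPERTY + "'; value='" + raw + "'", e);
    }
  }
}
```

In the finalizer itself the second argument would presumably be `Runtime.getRuntime().availableProcessors()`, as in the diff.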
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209651293 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Should we mention that 1C, 4C are valid values in addition to integers? Perhaps just copy the text you have in the Ambari description into the README. Good stuff. ---
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209649851 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Can we document the default value for this? ---
[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari
[ https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578499#comment-16578499 ] ASF GitHub Bot commented on METRON-1733: Github user ruffle1986 commented on the issue: https://github.com/apache/metron/pull/1158 I think triggering a Travis rebuild should be enough to make it pass. > PCAP UI - PCAP queries don't work on Safari > --- > > Key: METRON-1733 > URL: https://issues.apache.org/jira/browse/METRON-1733 > Project: Metron > Issue Type: Sub-task >Reporter: Shane Ardell >Assignee: Shane Ardell >Priority: Major > > On Safari, PCAP queries fail with a 500 internal server error. No issues seen > with Chrome or Firefox. After digging into the search request, it looks like > the values for the startTime and endTime are 'NaN'. It looks like Safari > cannot parse the format of the time we are passing to the getDate() function. > For more on this issue: > https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari
Github user ruffle1986 commented on the issue: https://github.com/apache/metron/pull/1158 I think triggering a Travis rebuild should be enough to make it pass. ---
[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing
[ https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578426#comment-16578426 ] ASF GitHub Bot commented on METRON-1732: Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209649720 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Can we document the default value for this? > Fix job status liveness bug and parallelize finalizer file writing > -- > > Key: METRON-1732 > URL: https://issues.apache.org/jira/browse/METRON-1732 > Project: Metron > Issue Type: Sub-task >Reporter: Michael Miklavcic >Assignee: Michael Miklavcic >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...
Github user nickwallen commented on a diff in the pull request: https://github.com/apache/metron/pull/1157#discussion_r209649720 --- Diff: metron-interface/metron-rest/README.md --- @@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through standard in and expects P Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value. +Pcap query jobs have a finalization routine that writes their results out to HDFS in pages. There is a threadpool used for this finalization that can be configured to use a specified number of threads. +This setting is exposed as the Spring property `pcap.finalizer.threadpool.size` --- End diff -- Can we document the default value for this? ---
[jira] [Commented] (METRON-1707) Port Profiler to Spark
[ https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578366#comment-16578366 ] ASF GitHub Bot commented on METRON-1707: Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1150 > @simonellistonball: Do we have to use groupByKey in the spark implementation, is it not possible to use reduceByKey to build the profiles... I had in the back of my mind that groupByKey might not be the most performant option, but I just didn't focus any energy on that for the first pass. I will take a look and see whether we can use your advice. Thanks for the pointer, @simonellistonball! > Port Profiler to Spark > -- > > Key: METRON-1707 > URL: https://issues.apache.org/jira/browse/METRON-1707 > Project: Metron > Issue Type: Sub-task >Reporter: Nick Allen >Assignee: Nick Allen >Priority: Major > > Create a port of the Profiler that runs in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1150: METRON-1707 Port Profiler to Spark [Feature Branch]
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1150 > @simonellistonball: Do we have to use groupByKey in the spark implementation, is it not possible to use reduceByKey to build the profiles... I had in the back of my mind that groupByKey might not be the most performant option, but I just didn't focus any energy on that for the first pass. I will take a look and see whether we can use your advice. Thanks for the pointer, @simonellistonball! ---
[jira] [Commented] (METRON-1707) Port Profiler to Spark
[ https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578343#comment-16578343 ] ASF GitHub Bot commented on METRON-1707: Github user simonellistonball commented on the issue: https://github.com/apache/metron/pull/1150 Do we have to use groupByKey in the Spark implementation? Is it not possible to use reduceByKey to build the profiles, since profilers are by definition reducible? I've seen groupByKey cause performance problems (see https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html for a good discussion of this). > Port Profiler to Spark > -- > > Key: METRON-1707 > URL: https://issues.apache.org/jira/browse/METRON-1707 > Project: Metron > Issue Type: Sub-task >Reporter: Nick Allen >Assignee: Nick Allen >Priority: Major > > Create a port of the Profiler that runs in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] metron issue #1150: METRON-1707 Port Profiler to Spark [Feature Branch]
Github user simonellistonball commented on the issue: https://github.com/apache/metron/pull/1150 Do we have to use groupByKey in the Spark implementation? Is it not possible to use reduceByKey to build the profiles, since profilers are by definition reducible? I've seen groupByKey cause performance problems (see https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html for a good discussion of this). ---
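The difference behind this suggestion can be illustrated without Spark. The sketch below is a plain-Java analogy using `java.util.stream`, not the Spark API and not the Profiler's code: `Route` and both method names are hypothetical, and a simple sum stands in for a profile. Grouping first collects every value for a key before aggregating (the groupByKey shape), whereas an associative merge function lets values be combined as they arrive (the reduceByKey shape). Whether real profiles can be merged this way is exactly the question raised in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReduceVsGroup {

  // Hypothetical stand-in for a message routed to a profiled entity.
  static final class Route {
    final String entity;
    final long value;

    Route(String entity, long value) {
      this.entity = entity;
      this.value = value;
    }
  }

  // groupByKey shape: materialize all values per key, then aggregate.
  static Map<String, Long> groupThenSum(List<Route> routes) {
    Map<String, List<Route>> grouped =
        routes.stream().collect(Collectors.groupingBy(r -> r.entity));
    return grouped.entrySet().stream().collect(Collectors.toMap(
        e -> e.getKey(),
        e -> e.getValue().stream().mapToLong(r -> r.value).sum()));
  }

  // reduceByKey shape: merge values pairwise with an associative function,
  // so no per-key list ever needs to be held in memory.
  static Map<String, Long> reduceAsYouGo(List<Route> routes) {
    return routes.stream()
        .collect(Collectors.toMap(r -> r.entity, r -> r.value, Long::sum));
  }

  static List<Route> sample() {
    return Arrays.asList(
        new Route("10.0.0.1", 2), new Route("10.0.0.2", 5), new Route("10.0.0.1", 3));
  }
}
```

Both methods return the same totals for the sample data; the second only works because the aggregation (a sum here) is associative, which is the property the comment assumes profiles have.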