[jira] [Commented] (METRON-1707) Port Profiler to Spark

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578957#comment-16578957
 ] 

ASF GitHub Bot commented on METRON-1707:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1150#discussion_r209771180
  
--- Diff: metron-analytics/metron-profiler-spark/src/main/java/org/apache/metron/profiler/spark/function/ProfileBuilderFunction.java ---
@@ -0,0 +1,107 @@
+/*
+ *
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ */
+package org.apache.metron.profiler.spark.function;
+
+import org.apache.metron.profiler.DefaultMessageDistributor;
+import org.apache.metron.profiler.MessageDistributor;
+import org.apache.metron.profiler.MessageRoute;
+import org.apache.metron.profiler.ProfileMeasurement;
+import org.apache.metron.profiler.spark.ProfileMeasurementAdapter;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.spark.api.java.function.MapGroupsFunction;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.invoke.MethodHandles;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+import static java.util.Comparator.comparing;
+import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION;
+import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION_UNITS;
+
+/**
+ * The function responsible for building profiles in Spark.
+ */
+public class ProfileBuilderFunction implements MapGroupsFunction<String, MessageRoute, ProfileMeasurementAdapter> {
+
+  protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private long periodDurationMillis;
+  private Map globals;
+
+  public ProfileBuilderFunction(Properties properties, Map globals) {
+    TimeUnit periodDurationUnits = TimeUnit.valueOf(PERIOD_DURATION_UNITS.get(properties, String.class));
+    int periodDuration = PERIOD_DURATION.get(properties, Integer.class);
+    this.periodDurationMillis = periodDurationUnits.toMillis(periodDuration);
+    this.globals = globals;
+  }
+
+  /**
+   * Build a profile from a set of message routes.
+   *
+   * This assumes that all of the necessary routes have been provided
+   *
+   * @param group The group identifier.
+   * @param iterator The message routes.
+   * @return
+   */
+  @Override
+  public ProfileMeasurementAdapter call(String group, Iterator<MessageRoute> iterator) throws Exception {
+    // create the distributor; some settings are unnecessary because it is cleaned-up immediately after processing the batch
+    int maxRoutes = Integer.MAX_VALUE;
+    long profileTTLMillis = Long.MAX_VALUE;
+    MessageDistributor distributor = new DefaultMessageDistributor(periodDurationMillis, profileTTLMillis, maxRoutes);
+    Context context = TaskUtils.getContext(globals);
+
+    // sort the messages/routes
+    List<MessageRoute> routes = toStream(iterator)
+            .sorted(comparing(rt -> rt.getTimestamp()))
+            .collect(Collectors.toList());
+    LOG.debug("Building a profile for group '{}' from {} message(s)", group, routes.size());
+
+    // apply each message/route to build the profile
+    for(MessageRoute route: routes) {
+      distributor.distribute(route, context);
+    }
--- End diff --

Or maybe we do not need to apply the messages in timestamp order?  There are no 
strict guarantees of ordering when the Profiler runs in Storm, really. Hmm.
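
Whether ordering matters depends on what a profile's update expressions do. The following is a minimal sketch of that point, not Metron code: `Msg`, `count` and `lastValue` are hypothetical stand-ins for a routed message and two kinds of profile update. A counter gives the same result in any order, while an update that keeps the most recently applied value does not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OrderSensitivityDemo {

  /** Hypothetical stand-in for a routed message: a timestamp and a value. */
  static final class Msg {
    final long timestamp;
    final int value;

    Msg(long timestamp, int value) {
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  /** Order-insensitive update, comparable to "count + 1". */
  static int count(List<Msg> msgs) {
    int count = 0;
    for (Msg ignored : msgs) {
      count = count + 1;
    }
    return count;
  }

  /** Order-sensitive update: keeps whichever value was applied last. */
  static int lastValue(List<Msg> msgs) {
    int last = 0;
    for (Msg m : msgs) {
      last = m.value;
    }
    return last;
  }

  public static void main(String[] args) {
    // messages in arrival order, which is not timestamp order
    List<Msg> arrival = new ArrayList<>();
    arrival.add(new Msg(30L, 3));
    arrival.add(new Msg(10L, 1));
    arrival.add(new Msg(20L, 2));

    // the same messages sorted by timestamp
    List<Msg> sorted = new ArrayList<>(arrival);
    sorted.sort(Comparator.comparingLong((Msg m) -> m.timestamp));

    System.out.println(count(arrival) == count(sorted)); // true
    System.out.println(lastValue(arrival));              // 2
    System.out.println(lastValue(sorted));               // 3
  }
}
```

If every profile in use were order-insensitive, the sort could be dropped; the sketch only shows why that cannot be assumed in general.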


> Port Profiler to Spark
> --
>
> Key: METRON-1707
> URL: https://issues.apache.org/jira/browse/METRON-1707
> Project: Metron
>  

[jira] [Commented] (METRON-1735) Empty print status option causes NPE

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578946#comment-16578946
 ] 

ASF GitHub Bot commented on METRON-1735:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1160
  
@justinleet it was something that needed to be fixed and should be good now.


> Empty print status option causes NPE
> 
>
> Key: METRON-1735
> URL: https://issues.apache.org/jira/browse/METRON-1735
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> REST does not set the print job status property, which causes an NPE in 
> PcapJob because the property is never added to the config.  PcapJob should 
> default the property to false.
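
The failure mode described above can be sketched as follows. This is an illustration only: the property name `print.job.status` and the `Map`-based config are assumptions, since the actual key and config type used by `PcapJob` are not shown here. Reading a missing `Boolean` and auto-unboxing it throws a `NullPointerException`, while a read that falls back to `false` does not.

```java
import java.util.HashMap;
import java.util.Map;

class PrintStatusDefaultDemo {

  // Hypothetical property name, used only for this illustration.
  static final String PRINT_JOB_STATUS = "print.job.status";

  /** Unsafe: auto-unboxing a missing (null) Boolean throws NullPointerException. */
  static boolean unsafeRead(Map<String, Object> config) {
    return (Boolean) config.get(PRINT_JOB_STATUS);
  }

  /** Safe: falls back to false when the property was never set. */
  static boolean safeRead(Map<String, Object> config) {
    Object value = config.get(PRINT_JOB_STATUS);
    return value instanceof Boolean ? (Boolean) value : false;
  }

  public static void main(String[] args) {
    // a config in which the property was never added
    Map<String, Object> config = new HashMap<>();

    try {
      unsafeRead(config);
    } catch (NullPointerException e) {
      System.out.println("unsafe read failed"); // prints "unsafe read failed"
    }
    System.out.println(safeRead(config)); // false
  }
}
```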



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] metron pull request #1150: METRON-1707 Port Profiler to Spark [Feature Branc...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1150#discussion_r209764724
  
--- Diff: metron-analytics/metron-profiler-spark/src/main/java/org/apache/metron/profiler/spark/function/ProfileBuilderFunction.java ---
@@ -0,0 +1,107 @@
+/*
+ *
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ */
+package org.apache.metron.profiler.spark.function;
+
+import org.apache.metron.profiler.DefaultMessageDistributor;
+import org.apache.metron.profiler.MessageDistributor;
+import org.apache.metron.profiler.MessageRoute;
+import org.apache.metron.profiler.ProfileMeasurement;
+import org.apache.metron.profiler.spark.ProfileMeasurementAdapter;
+import org.apache.metron.stellar.dsl.Context;
+import org.apache.spark.api.java.function.MapGroupsFunction;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.lang.invoke.MethodHandles;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
+import java.util.stream.StreamSupport;
+
+import static java.util.Comparator.comparing;
+import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION;
+import static org.apache.metron.profiler.spark.BatchProfilerConfig.PERIOD_DURATION_UNITS;
+
+/**
+ * The function responsible for building profiles in Spark.
+ */
+public class ProfileBuilderFunction implements MapGroupsFunction<String, MessageRoute, ProfileMeasurementAdapter> {
+
+  protected static final Logger LOG = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private long periodDurationMillis;
+  private Map globals;
+
+  public ProfileBuilderFunction(Properties properties, Map globals) {
+    TimeUnit periodDurationUnits = TimeUnit.valueOf(PERIOD_DURATION_UNITS.get(properties, String.class));
+    int periodDuration = PERIOD_DURATION.get(properties, Integer.class);
+    this.periodDurationMillis = periodDurationUnits.toMillis(periodDuration);
+    this.globals = globals;
+  }
+
+  /**
+   * Build a profile from a set of message routes.
+   *
+   * This assumes that all of the necessary routes have been provided
+   *
+   * @param group The group identifier.
+   * @param iterator The message routes.
+   * @return
+   */
+  @Override
+  public ProfileMeasurementAdapter call(String group, Iterator<MessageRoute> iterator) throws Exception {
+    // create the distributor; some settings are unnecessary because it is cleaned-up immediately after processing the batch
+    int maxRoutes = Integer.MAX_VALUE;
+    long profileTTLMillis = Long.MAX_VALUE;
+    MessageDistributor distributor = new DefaultMessageDistributor(periodDurationMillis, profileTTLMillis, maxRoutes);
+    Context context = TaskUtils.getContext(globals);
+
+    // sort the messages/routes
+    List<MessageRoute> routes = toStream(iterator)
+            .sorted(comparing(rt -> rt.getTimestamp()))
+            .collect(Collectors.toList());
+    LOG.debug("Building a profile for group '{}' from {} message(s)", group, routes.size());
+
+    // apply each message/route to build the profile
+    for(MessageRoute route: routes) {
+      distributor.distribute(route, context);
+    }
--- End diff --

> @simonellistonball: Do we have to use groupByKey in the spark implementation, is it not possible to use reduceByKey to build the profiles...

What we do now is group by (profile, entity, period) to aggregate all of 
the messages needed to produce a measurement for any given profile period.  
Then those messages are sorted by timestamp and applied to the profile in that 
order.

I didn't see an easy way to use `reduceByKey` and ensure that the messages 
are applied to the profile in timestamp order.  Can you think of an alternative 
that maintains the ordering?
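
The group, sort, apply sequence described above can be sketched in plain Java without Spark. Everything here is a hypothetical stand-in: `Route` is not Metron's `MessageRoute`, the string key is just one way to encode (profile, entity, period), and "applying" a message is reduced to recording its timestamp. The point is that grouping hands over all messages for a period at once, so they can be sorted before being applied, whereas a pairwise reduce combines values in whatever order the framework chooses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupSortApplyDemo {

  /** Hypothetical stand-in for a message route. */
  static final class Route {
    final String profile;
    final String entity;
    final long timestamp;

    Route(String profile, String entity, long timestamp) {
      this.profile = profile;
      this.entity = entity;
      this.timestamp = timestamp;
    }

    /** Group key made of (profile, entity, period). */
    String key(long periodMillis) {
      return profile + "|" + entity + "|" + (timestamp / periodMillis);
    }
  }

  /** Returns, per group, the timestamps in the order they would be applied. */
  static Map<String, List<Long>> build(List<Route> routes, long periodMillis) {
    // group by (profile, entity, period)
    Map<String, List<Route>> groups = routes.stream()
        .collect(Collectors.groupingBy(
            (Route r) -> r.key(periodMillis), TreeMap::new, Collectors.toList()));

    // within each group, sort by timestamp and then "apply" in that order
    Map<String, List<Long>> applied = new TreeMap<>();
    for (Map.Entry<String, List<Route>> entry : groups.entrySet()) {
      List<Long> order = entry.getValue().stream()
          .sorted(Comparator.comparingLong((Route r) -> r.timestamp))
          .map(r -> r.timestamp)
          .collect(Collectors.toList());
      applied.put(entry.getKey(), order);
    }
    return applied;
  }

  public static void main(String[] args) {
    List<Route> routes = new ArrayList<>();
    routes.add(new Route("hello-world", "global", 1500L));
    routes.add(new Route("hello-world", "global", 1200L));
    routes.add(new Route("hello-world", "global", 2100L));
    routes.add(new Route("hello-world", "global", 1900L));

    // prints {hello-world|global|1=[1200, 1500, 1900], hello-world|global|2=[2100]}
    System.out.println(build(routes, 1000L));
  }
}
```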




[jira] [Created] (METRON-1737) Document Job cleanup

2018-08-13 Thread Ryan Merriman (JIRA)
Ryan Merriman created METRON-1737:
-

 Summary: Document Job cleanup
 Key: METRON-1737
 URL: https://issues.apache.org/jira/browse/METRON-1737
 Project: Metron
  Issue Type: Sub-task
Reporter: Ryan Merriman


Pcap query results are written to HDFS.  Over time, more HDFS space will be 
used as queries are run.  There is currently no automated cleanup feature, so we 
need to document how to do this in case a user needs to clean up manually or 
with a script.
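
As a rough illustration of what a scripted cleanup could look like, the sketch below deletes result files older than a retention cutoff. It is an assumption-laden stand-in: it runs against a local temporary directory using `java.nio.file` rather than HDFS, and the file names and 7-day retention are made up. On a real cluster the same age-based selection would have to be applied to the HDFS results directory with HDFS tooling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ResultCleanupDemo {

  /** Deletes regular files directly under dir that were last modified before cutoff. */
  static List<String> deleteOlderThan(Path dir, Instant cutoff) throws IOException {
    // snapshot the directory listing first so nothing is deleted mid-iteration
    List<Path> files;
    try (Stream<Path> listing = Files.list(dir)) {
      files = listing.collect(Collectors.toList());
    }

    List<String> deleted = new ArrayList<>();
    for (Path file : files) {
      boolean expired = Files.getLastModifiedTime(file).toInstant().isBefore(cutoff);
      if (Files.isRegularFile(file) && expired) {
        Files.delete(file);
        deleted.add(file.getFileName().toString());
      }
    }
    return deleted;
  }

  public static void main(String[] args) throws IOException {
    // hypothetical results directory with one old and one recent file
    Path dir = Files.createTempDirectory("pcap-results");
    Path old = Files.createFile(dir.resolve("old-result.pcap"));
    Path recent = Files.createFile(dir.resolve("recent-result.pcap"));
    Files.setLastModifiedTime(old, FileTime.from(Instant.now().minus(Duration.ofDays(30))));

    // keep the last 7 days of results
    Instant cutoff = Instant.now().minus(Duration.ofDays(7));
    System.out.println(deleteOlderThan(dir, cutoff)); // [old-result.pcap]
    System.out.println(Files.exists(recent));         // true
  }
}
```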





[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578913#comment-16578913
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1158
  
Can you close this @sardell?


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss



[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578910#comment-16578910
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1158
  
I verified this in full dev.  +1


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss



[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578884#comment-16578884
 ] 

ASF GitHub Bot commented on METRON-1734:


Github user merrimanr closed the pull request at:

https://github.com/apache/metron/pull/1159


> Src and Dst port filters are incorrect after changing to empty
> --
>
> Key: METRON-1734
> URL: https://issues.apache.org/jira/browse/METRON-1734
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> When changing a port filter after a job has run, setting it to empty causes 
> the old value to be sent in the request.



[jira] [Commented] (METRON-1736) Enhance Batch Profiler Integration Test

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578868#comment-16578868
 ] 

ASF GitHub Bot commented on METRON-1736:


GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/1162

METRON-1736 Enhance Batch Profiler Integration Test

The integration test for the Batch Profiler should use the Profiler Client 
API and `PROFILE_GET` to validate the values that are produced.  This is more 
effective end-to-end validation that the Batch Profiler is working.

This is a pull request against the `METRON-1699-create-batch-profiler` 
feature branch.

This is dependent on #1161.  By filtering on the last commit, this PR can 
be reviewed before the others are reviewed and merged.

## Pull Request Checklist

- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
- [x] Have you written or updated unit tests and/or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nickwallen/metron METRON-1736

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/1162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1162


commit 6ce28594659928a8c87c57e22e1ab00d798d
Author: Nick Allen 
Date:   2018-07-10T14:08:48Z

METRON-1703 Make Core Profiler Components Serializable

commit 0051359cbb277881de896526345bb4fce1d5139c
Author: Nick Allen 
Date:   2018-07-10T19:42:19Z

METRON-1704 Message Timestamp Logic Should be Shared

commit 2413726bdf96221ec775a9c8de524e3ec92148b7
Author: Nick Allen 
Date:   2018-07-27T17:20:15Z

METRON-1706: HbaseClient.mutate should return the number of mutations

commit 21980ca764b98ddb96c4c8732e0ef7a6c5ea2c56
Author: Nick Allen 
Date:   2018-07-24T18:02:36Z

METRON-1705 Create ProfilePeriod Using Period ID

commit be15126419a2862864a7acd67349281b086f52cf
Author: Nick Allen 
Date:   2018-07-31T19:26:20Z

METRON-1707 Port Profiler to Spark

commit c410e412c50f4510f8674cd4fa5d4481f28a4a13
Author: Nick Allen 
Date:   2018-08-09T15:54:41Z

No need to handle packaging yet. That will come in a future PR

commit f1a8b49f99029e8d801dc62cfa9c2a0827a46cd8
Author: Nick Allen 
Date:   2018-08-13T13:25:56Z

Renamed execute() to run()

commit 7f585e0afaa76386934f785407eecc5d65175d8c
Author: Nick Allen 
Date:   2018-08-13T14:52:17Z

Reducing the size of the telemetry for the integration test. No need to 
have so much data

commit 6bce4797b33bee6c161b81188f94b4fa3e931a53
Author: Nick Allen 
Date:   2018-08-13T19:14:48Z

Only create an Hbase connection if there are measurements to write

commit ca038f9e4e65212158970046ce95c681a2ebda1b
Author: Nick Allen 
Date:   2018-08-13T20:24:58Z

METRON-1736 Enahnce Batch Profiler Integration Test




> Enhance Batch Profiler Integration Test
> ---
>
> Key: METRON-1736
> URL: https://issues.apache.org/jira/browse/METRON-1736
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Nick Allen
>Assignee: Nick Allen
>Priority: Major
>
> The integration test for the Batch Profiler should use the Profiler Client 
> API and `PROFILE_GET` to validate the values that are produced.  This is more 
> effective end-to-end validation that the Batch Profiler is working.



[jira] [Created] (METRON-1736) Enhance Batch Profiler Integration Test

2018-08-13 Thread Nick Allen (JIRA)
Nick Allen created METRON-1736:
--

 Summary: Enhance Batch Profiler Integration Test
 Key: METRON-1736
 URL: https://issues.apache.org/jira/browse/METRON-1736
 Project: Metron
  Issue Type: Sub-task
Reporter: Nick Allen
Assignee: Nick Allen


The integration test for the Batch Profiler should use the Profiler Client API 
and `PROFILE_GET` to validate the values that are produced.  This is more 
effective end-to-end validation that the Batch Profiler is working.





[jira] [Commented] (METRON-1708) Run the Batch Profiler in Spark

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578777#comment-16578777
 ] 

ASF GitHub Bot commented on METRON-1708:


GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/1161

METRON-1708 Run the Batch Profiler in Spark

This adds the ability to run the Batch Profiler from the command line.  
This also packages up the Batch Profiler into a tarball.

This is a pull request against the `METRON-1699-create-batch-profiler` 
feature branch.

This depends on #1145, #1146, #1148, #1147, and #1150.  By filtering on the 
last commit, this PR can be reviewed before the others are reviewed and merged.

## Testing


1. Start up the development environment.  Allow Metron to run for a bit so 
that a fair amount of telemetry is archived in HDFS.

1. Stop all Metron services.

1. Install Spark2 using Ambari.
* Use Add Service > Spark2, then follow prompts.

1. Deploy the Batch Profiler to the development environment.

From the host machine, outside the development VM, run the following.
```
cd metron-deployment/development/centos6
vagrant scp ../../../metron-analytics/metron-profiler-spark/target/metron-profiler-spark-0.5.1-archive.tar.gz /tmp
```
Then from the development VM, run the following.
```
source /etc/default/metron
cd $METRON_HOME
tar -xvf /tmp/metron-profiler-spark-0.5.1-archive.tar.gz
```

1. Create a profile by editing 
`$METRON_HOME/config/zookeeper/profiler.json` as follows.

```
[root@node1 0.5.1]# cat $METRON_HOME/config/zookeeper/profiler.json
{
  "profiles": [
    {
      "profile": "hello-world",
      "foreach": "'global'",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ],
  "timestampField": "timestamp"
}
```
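
To make the profile definition concrete, here is a minimal Python sketch of what the `hello-world` profile computes. It is not Metron's implementation: the Stellar expressions are replaced by plain Python, the function name `run_profile` is hypothetical, and profile periods are ignored so that only the `init`/`update`/`result` flow is shown.

```python
def run_profile(messages):
    """Simulate the 'hello-world' profile over a list of message dicts."""
    state = {}  # one state dict per entity
    for _message in messages:
        entity = "global"                    # "foreach": "'global'"
        if entity not in state:
            state[entity] = {"count": 0}     # "init":   { "count": "0" }
        state[entity]["count"] += 1          # "update": { "count": "count + 1" }
    # "result": "count"
    return {entity: s["count"] for entity, s in state.items()}


# Five illustrative messages; the timestamps are made up for the example.
messages = [{"timestamp": 1534180000000 + i} for i in range(5)]
print(run_profile(messages))
```

Because every message maps to the single entity `global`, the result is simply the number of messages seen, which is what makes this profile convenient for checking the Batch Profiler against a known message count.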

1. Count the number of messages in the 'indexing' topic.  This should not 
be changing.
```
[root@node1 ~]# /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
   --broker-list $BROKERLIST \
   --topic indexing \
   --time -1

indexing:0:8130
```
In this case there are 8,130 messages.

1. Delete any previously written profile measurements from HBase.
```
[root@node1 ~]# hbase shell
...

hbase(main):001:0> truncate 'profiler'
Truncating 'profiler' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.1070 seconds
```

1. Confirm that all of the messages were successfully indexed in HDFS.

```
[root@node1 ~]# hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l
8130
```
 * Remember that we found 8,130 in the indexing topic previously.  This 
shows that all of them were indexed in HDFS successfully.
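
The comparison between the Kafka offsets and the HDFS line count can also be scripted. The sketch below is only an illustration: `total_offsets` is a hypothetical helper, and it assumes the earliest offset of every partition is 0, so that the latest offset equals the number of messages.

```python
def total_offsets(get_offset_output):
    """Sum the latest offsets across partitions from lines like 'indexing:0:8130'."""
    total = 0
    for line in get_offset_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line has the form topic:partition:offset.
        _topic, _partition, offset = line.rsplit(":", 2)
        total += int(offset)
    return total


kafka_output = "indexing:0:8130\n"   # output of GetOffsetShell shown above
hdfs_line_count = 8130               # value reported by `wc -l` above
print(total_offsets(kafka_output) == hdfs_line_count)
```

If the topic had more than one partition, or if older messages had been removed by retention, the earliest offsets would need to be subtracted before comparing.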

1. Alter the `$METRON_HOME/config/batch-profiler.properties` as follows.
```
[root@node1 0.5.1]# cat config/batch-profiler.properties
spark.master=local
spark.app.name=Batch Profiler
spark.sql.shuffle.partitions=8

profiler.period.duration=1
profiler.period.duration.units=MINUTES


profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/*
```
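
With a period duration of 1 minute, each message falls into a period determined by its timestamp. The following Python sketch shows one way to picture that bucketing. It assumes the period index is the epoch-millisecond timestamp divided by the period length (integer division); the names `PERIOD_MILLIS` and `count_per_period` are hypothetical and this is not the Profiler's own code.

```python
from collections import Counter

# profiler.period.duration=1, profiler.period.duration.units=MINUTES
PERIOD_MILLIS = 1 * 60 * 1000


def count_per_period(timestamps_ms):
    """Count messages per period, keyed by period index (epoch millis // period length)."""
    return Counter(ts // PERIOD_MILLIS for ts in timestamps_ms)


# Illustrative timestamps in epoch milliseconds.
timestamps = [0, 59_999, 60_000, 61_000, 180_000]
counts = count_per_period(timestamps)
print(sorted(counts.items()))
```

For a counting profile, the per-period values should add up to the total number of messages read from the input path, which gives a quick sanity check on the measurements written to HBase.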

1. Fix up some of the Spark configuration.

```
SPARK_HOME=/usr/hdp/current/spark2-client
cp  /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
echo "log4j.logger.org.apache.metron.profiler.spark=DEBUG" >> $SPARK_HOME/conf/log4j.properties
```

1. You may need to create the Spark history directory in HDFS (if doing 
this in Full Dev).

```
export HADOOP_USER_NAME=hdfs
hdfs dfs -mkdir /spark2-history
```

1. You may want to edit the log4j properties file that sits in the config 
directory under $SPARK_HOME, or create one.

```
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root 

[GitHub] metron pull request #1161: METRON-1708 Run the Batch Profiler in Spark

2018-08-13 Thread nickwallen
GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/1161

METRON-1708 Run the Batch Profiler in Spark

This adds the ability to run the Batch Profiler from the command line.  
This also packages up the Batch Profiler into a tarball.

This is a pull request against the `METRON-1699-create-batch-profiler` 
feature branch.

This depends on #1145, #1146, #1148, #1147, and #1150.  By filtering on the 
last commit, this PR can be reviewed before the others are reviewed and merged.

## Testing


1. Start up the development environment.  Allow Metron to run for a bit so 
that a fair amount of telemetry is archived in HDFS.

1. Stop all Metron services.

1. Install Spark2 using Ambari.
* Use Add Service > Spark2, then follow prompts.

1. Deploy the Batch Profiler to the development environment.

From the host machine, outside the development VM, run the following.
```
cd metron-deployment/development/centos6
vagrant scp ../../../metron-analytics/metron-profiler-spark/target/metron-profiler-spark-0.5.1-archive.tar.gz /tmp
```
Then from the development VM, run the following.
```
source /etc/default/metron
cd $METRON_HOME
tar -xvf /tmp/metron-profiler-spark-0.5.1-archive.tar.gz
```

1. Create a profile by editing 
`$METRON_HOME/config/zookeeper/profiler.json` as follows.

```
[root@node1 0.5.1]# cat $METRON_HOME/config/zookeeper/profiler.json
{
  "profiles": [
    {
      "profile": "hello-world",
      "foreach": "'global'",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ],
  "timestampField": "timestamp"
}
```

1. Count the number of messages in the 'indexing' topic.  This should not 
be changing.
```
[root@node1 ~]# /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
   --broker-list $BROKERLIST \
   --topic indexing \
   --time -1

indexing:0:8130
```
In this case there are 8,130 messages.

1. Delete any previously written profile measurements from HBase.
```
[root@node1 ~]# hbase shell
...

hbase(main):001:0> truncate 'profiler'
Truncating 'profiler' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 4.1070 seconds
```

1. Confirm that all of the messages were successfully indexed in HDFS.

```
[root@node1 ~]# hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l
8130
```
 * Remember that we found 8,130 in the indexing topic previously.  This 
shows that all of them were indexed in HDFS successfully.

1. Alter the `$METRON_HOME/config/batch-profiler.properties` as follows.
```
[root@node1 0.5.1]# cat config/batch-profiler.properties
spark.master=local
spark.app.name=Batch Profiler
spark.sql.shuffle.partitions=8

profiler.period.duration=1
profiler.period.duration.units=MINUTES


profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/*
```

1. Fix up some of the Spark configuration.

```
SPARK_HOME=/usr/hdp/current/spark2-client
cp  /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/
cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
echo "log4j.logger.org.apache.metron.profiler.spark=DEBUG" >> $SPARK_HOME/conf/log4j.properties
```

1. You may need to create the Spark history directory in HDFS (if doing 
this in Full Dev).

```
export HADOOP_USER_NAME=hdfs
hdfs dfs -mkdir /spark2-history
```

1. You may want to edit the log4j properties file that sits in the config 
directory under $SPARK_HOME, or create one.

```
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN

# Settings to quiet third party logs that are too verbose
 

[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578771#comment-16578771
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user sardell commented on the issue:

https://github.com/apache/metron/pull/1158
  
Closing and reopening to rerun Travis.


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss





[GitHub] metron pull request #1158: METRON-1733: PCAP UI - PCAP queries don't work on...

2018-08-13 Thread sardell
GitHub user sardell reopened a pull request:

https://github.com/apache/metron/pull/1158

METRON-1733: PCAP UI - PCAP queries don't work on Safari

## Contributor Comments
This PR fixes a bug where Safari cannot read the format of the date we are 
passing to the startTimeMs and endTimeMs parameters. To resolve this, I used 
moment js (which was already being used in the project) to get the numeric 
value of the time strings instead of new Date().getTime().
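
The underlying idea is to parse the time string with an explicit format instead of relying on implementation-defined date parsing. The PR does this with Moment.js in TypeScript; the sketch below shows the same idea in Python only as an analogy. The format string, the use of UTC, and the name `to_epoch_millis` are assumptions made for the example and do not come from the PR.

```python
from datetime import datetime, timezone


def to_epoch_millis(time_str, fmt="%Y-%m-%d %H:%M"):
    """Parse a time string with an explicit format and return epoch milliseconds (UTC)."""
    parsed = datetime.strptime(time_str, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


start_ms = to_epoch_millis("2018-08-13 00:00")
end_ms = to_epoch_millis("2018-08-13 01:00")
print(start_ms, end_ms)
```

A string that does not match the format raises `ValueError` rather than silently producing an invalid value, which is the failure mode the NaN timestamps were hiding.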

## Testing
Using Safari, run a PCAP query in the Alerts UI. If you check the request 
payload, it should contain the correct numeric values for the startTimeMs and 
endTimeMs instead of NaN, and your search results should complete the same as 
they would in Chrome or another browser.

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- n/a ~~Have you written or updated unit tests and or integration tests to 
verify your changes?~~
- n/a ~~If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?~~
- [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- n/a ~~Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:~~

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sardell/metron METRON-1733

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/1158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1158


commit 348d70da6b95ecd65f434f13805f0c95b0c62161
Author: Shane Ardell 
Date:   2018-08-10T15:34:52Z

fix safari date NaN  issue

commit 027520ffba1577f3c0c8216c418d469465d496e7
Author: Shane Ardell 
Date:   2018-08-12T05:57:29Z

Merge branch 'feature/METRON-1554-pcap-query-panel' into METRON-1733




---


[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578772#comment-16578772
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user sardell closed the pull request at:

https://github.com/apache/metron/pull/1158


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss





[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread sardell
Github user sardell commented on the issue:

https://github.com/apache/metron/pull/1158
  
Closing and reopening to rerun Travis.


---


[GitHub] metron pull request #1158: METRON-1733: PCAP UI - PCAP queries don't work on...

2018-08-13 Thread sardell
Github user sardell closed the pull request at:

https://github.com/apache/metron/pull/1158


---


[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578758#comment-16578758
 ] 

ASF GitHub Bot commented on METRON-1734:


Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1159
  
+1 by inspection, thanks for the explanation on the string vs. number and 
the casting.


> Src and Dst port filters are incorrect after changing to empty
> --
>
> Key: METRON-1734
> URL: https://issues.apache.org/jira/browse/METRON-1734
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> When changing a port filter after a job has run, setting it to empty causes 
> the old value to be sent in the request.





[GitHub] metron issue #1159: METRON-1734: Src and Dst port filters are incorrect afte...

2018-08-13 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1159
  
+1 by inspection, thanks for the explanation on the string vs. number and 
the casting.


---


[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578732#comment-16578732
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1158
  
@sardell can you close and reopen this PR to trigger another travis run?


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss





[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread merrimanr
Github user merrimanr commented on the issue:

https://github.com/apache/metron/pull/1158
  
@sardell can you close and reopen this PR to trigger another travis run?


---


[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578693#comment-16578693
 ] 

ASF GitHub Bot commented on METRON-1734:


Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/1159#discussion_r209698673
  
--- Diff: 
metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts
 ---
@@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, 
OnChanges {
 this.model.endTimeMs = new Date(this.endTimeStr).getTime();
 if (this.ipSrcPort !== '') {
   this.model.ipSrcPort = +this.ipSrcPort;
+} else {
--- End diff --

They are different types, one is a string and one is a number.  I believe 
we created the this.ipSrcPort and this.ipDstPort string variables for regex 
validation purposes.  As such we can't just assign an empty string to a number 
type.  The + operator converts a string to a number. 
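
The pattern described here, a string field kept for validation and a numeric field on the model, can be sketched as follows. This is a Python analogy rather than the component's TypeScript. The body of the `else` branch is not visible in the diff above, so clearing the model value is an assumption, as are the validation pattern and the name `apply_port_filter`.

```python
import re

# Hypothetical validation rule: one to five digits.
PORT_PATTERN = re.compile(r"\d{1,5}")


def apply_port_filter(model, field, text):
    """Copy a validated port string into the numeric model, or clear it when empty."""
    if text != "":
        if not PORT_PATTERN.fullmatch(text):
            raise ValueError(f"invalid port: {text!r}")
        model[field] = int(text)      # string -> number, like the unary + operator
    else:
        model.pop(field, None)        # assumed: drop the stale value instead of keeping it
    return model


model = {"ipSrcPort": 80}             # value left over from a previous job
apply_port_filter(model, "ipSrcPort", "")
print(model)
```

Without the empty-string branch the old port would stay on the model and be sent with the next request, which is the behavior reported in METRON-1734.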


> Src and Dst port filters are incorrect after changing to empty
> --
>
> Key: METRON-1734
> URL: https://issues.apache.org/jira/browse/METRON-1734
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> When changing a port filter after a job has run, setting it to empty causes 
> the old value to be sent in the request.





[jira] [Commented] (METRON-1735) Empty print status option causes NPE

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578695#comment-16578695
 ] 

ASF GitHub Bot commented on METRON-1735:


GitHub user merrimanr reopened a pull request:

https://github.com/apache/metron/pull/1160

METRON-1735: Empty print status option causes NPE

## Contributor Comments
This is a regression in the feature branch introduced by 
https://github.com/apache/metron/pull/1138.  By default, PcapJob should not 
print status, and it should not fail when that setting is missing.  

### Changes Included

- Changed the default behavior of the Pcap CLI to print status by default
- Removed the print status flag from the CLI
- Fixed bug in getting print status option in PcapJob
- Added getter/setter methods to PcapJob for testing purposes
- Added test cases
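
The intended handling of a missing setting can be illustrated with a short sketch. It is written in Python rather than PcapJob's Java, and the option key `print_job_status` and the function name are hypothetical; the point is only that an absent value falls back to `False` instead of being dereferenced.

```python
def should_print_status(config, key="print_job_status"):
    """Return the print-status flag, defaulting to False when the option is absent."""
    value = config.get(key)
    if value is None:
        return False                  # missing setting: do not print, do not fail
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


print(should_print_status({}))                            # REST-style config without the option
print(should_print_status({"print_job_status": True}))    # CLI-style config with the option set
```

Callers that want status output, such as the CLI, set the option explicitly, while callers that never set it, such as REST, get the quiet default.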

### Testing
Still testing in full dev.

- You should get results in the Pcap UI now
- The print status option in the CLI should be missing
- The CLI should print status every time

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merrimanr/incubator-metron METRON-1735

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/1160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1160


commit d13db64c9ae816fded89128a99cd5d6a8a71648c
Author: merrimanr 
Date:   2018-08-10T22:03:25Z

initial commit




> Empty print status option causes NPE
> 
>
> Key: METRON-1735
> URL: https://issues.apache.org/jira/browse/METRON-1735
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> REST does not set a print job status property causing a NPE in PcapJob 
> because the property is never added to the config.  The PcapJob should 
> default to 

[jira] [Commented] (METRON-1735) Empty print status option causes NPE

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578694#comment-16578694
 ] 

ASF GitHub Bot commented on METRON-1735:


Github user merrimanr closed the pull request at:

https://github.com/apache/metron/pull/1160


> Empty print status option causes NPE
> 
>
> Key: METRON-1735
> URL: https://issues.apache.org/jira/browse/METRON-1735
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> REST does not set a print job status property causing a NPE in PcapJob 
> because the property is never added to the config.  The PcapJob should 
> default to false.





[GitHub] metron pull request #1160: METRON-1735: Empty print status option causes NPE

2018-08-13 Thread merrimanr
GitHub user merrimanr reopened a pull request:

https://github.com/apache/metron/pull/1160

METRON-1735: Empty print status option causes NPE

## Contributor Comments
This is a regression in the feature branch introduced by 
https://github.com/apache/metron/pull/1138.  By default, PcapJob should not 
print status, and it should not fail when that setting is missing.  

### Changes Included

- Changed the default behavior of the Pcap CLI to print status by default
- Removed the print status flag from the CLI
- Fixed bug in getting print status option in PcapJob
- Added getter/setter methods to PcapJob for testing purposes
- Added test cases

### Testing
Still testing in full dev.

- You should get results in the Pcap UI now
- The print status option in the CLI should be missing
- The CLI should print status every time

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not, then run 
the following commands and then verify the changes via 
`site-book/target/site/index.html`:

  ```
  cd site-book
  mvn site
  ```

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merrimanr/incubator-metron METRON-1735

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/1160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1160


commit d13db64c9ae816fded89128a99cd5d6a8a71648c
Author: merrimanr 
Date:   2018-08-10T22:03:25Z

initial commit




---


[GitHub] metron pull request #1160: METRON-1735: Empty print status option causes NPE

2018-08-13 Thread merrimanr
Github user merrimanr closed the pull request at:

https://github.com/apache/metron/pull/1160


---


[GitHub] metron pull request #1159: METRON-1734: Src and Dst port filters are incorre...

2018-08-13 Thread merrimanr
Github user merrimanr commented on a diff in the pull request:

https://github.com/apache/metron/pull/1159#discussion_r209698673
  
--- Diff: 
metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts
 ---
@@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, 
OnChanges {
 this.model.endTimeMs = new Date(this.endTimeStr).getTime();
 if (this.ipSrcPort !== '') {
   this.model.ipSrcPort = +this.ipSrcPort;
+} else {
--- End diff --

They are different types, one is a string and one is a number.  I believe 
we created the this.ipSrcPort and this.ipDstPort string variables for regex 
validation purposes.  As such we can't just assign an empty string to a number 
type.  The + operator converts a string to a number. 


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578642#comment-16578642
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209689930
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java
 ---
@@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
   }
   return this;
 }
-mrJob.submit();
-jobStatus.withState(State.SUBMITTED).withDescription("Job 
submitted").withJobId(mrJob.getJobID().toString());
+synchronized (this) {
--- End diff --

Will do. This lock is about thread visibility as opposed to actual issues 
with concurrent modification. It may be that this lock is not needed with 
getStatus being synchronized. I will double check and report back via modified 
code and/or code comment on this.
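
A minimal sketch of the visibility point, not the actual PcapJob code; the `StatusHolder` class and its fields are hypothetical. Under the Java memory model, a write made while holding a monitor is guaranteed to be visible to a thread that later acquires the same monitor, so the guarantee depends on the writer and the reader both synchronizing on the same object:

```java
class StatusHolder {
  private String state = "NOT_RUNNING";
  private String jobId;

  // Writer: both fields are updated while holding this object's monitor.
  void markSubmitted(String id) {
    synchronized (this) {
      state = "SUBMITTED";
      jobId = id;
    }
  }

  // Reader: a synchronized method locks the same monitor, so it observes
  // either the old pair of values or the new pair, never a mix.
  synchronized String describe() {
    return state + ":" + jobId;
  }
}
```

If only the reader were synchronized, a polling thread would have no guarantee of ever seeing the update, which is one way a status liveness bug can show up.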


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209689930
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java
 ---
@@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
   }
   return this;
 }
-mrJob.submit();
-jobStatus.withState(State.SUBMITTED).withDescription("Job 
submitted").withJobId(mrJob.getJobID().toString());
+synchronized (this) {
--- End diff --

Will do. This lock is about thread visibility as opposed to actual issues 
with concurrent modification. It may be that this lock is not needed with 
getStatus being synchronized. I will double check and report back via modified 
code and/or code comment on this.


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578627#comment-16578627
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/1157
  
Good feedback @nickwallen, I'll make adjustments.


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578626#comment-16578626
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209687780
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
--- End diff --

As I understand it, submit is effectively submitting the set of tasks for 
the parallel stream to execute within this threadpool, e.g. 
https://www.baeldung.com/java-8-parallel-streams-custom-threadpool. As a side 
note, the reason for a custom threadpool at all is so that this doesn't cause 
issues with other streams since the default in Java is to use a global context 
for this sort of thing. Liveness issues may arise when using the shared global 
context.
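
The pattern described above can be sketched as follows. This is an illustration under assumptions, not the PcapFinalizer code: `CustomPoolStream` and `processAll` are hypothetical names, and the work items are plain strings and integers. It relies on the widely used, though not formally specified, behavior that a parallel stream started from a `ForkJoinPool` worker thread runs its tasks in that pool instead of the common pool:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

class CustomPoolStream {
  // Processes every entry with a parallel stream that is started from inside
  // a dedicated pool, and returns the keys whose value was positive.
  static List<String> processAll(Map<String, Integer> work, int parallelism) {
    List<String> done = Collections.synchronizedList(new ArrayList<>());
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      pool.submit(() ->
          work.entrySet().parallelStream().forEach(e -> {
            if (e.getValue() > 0) {
              done.add(e.getKey());
            }
          })
      ).get(); // block until the whole stream has finished
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while processing", ie);
    } catch (ExecutionException ee) {
      throw new IllegalStateException("Processing failed", ee.getCause());
    } finally {
      pool.shutdown();
    }
    return done;
  }
}
```

The single `submit` hands over one task; the stream then splits the entries into subtasks within the pool that is running it.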


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron issue #1157: METRON-1732: Fix job status liveness bug and parallelize...

2018-08-13 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/1157
  
Good feedback @nickwallen, I'll make adjustments.


---


[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209687780
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
--- End diff --

As I understand it, submit is effectively submitting the set of tasks for 
the parallel stream to execute within this threadpool, e.g. 
https://www.baeldung.com/java-8-parallel-streams-custom-threadpool. As a side 
note, the reason for a custom threadpool at all is so that this doesn't cause 
issues with other streams since the default in Java is to use a global context 
for this sort of thing. Liveness issues may arise when using the shared global 
context.


---


[jira] [Commented] (METRON-1734) Src and Dst port filters are incorrect after changing to empty

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578615#comment-16578615
 ] 

ASF GitHub Bot commented on METRON-1734:


Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/1159#discussion_r209683673
  
--- Diff: 
metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts
 ---
@@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, 
OnChanges {
 this.model.endTimeMs = new Date(this.endTimeStr).getTime();
 if (this.ipSrcPort !== '') {
   this.model.ipSrcPort = +this.ipSrcPort;
+} else {
--- End diff --

This is probably a dumb question, but why do we have to specifically delete 
the value?  Don't we treat empty string the same as missing?  If that's the 
case, why isn't this just 
```
this.model.ipSrcPort = +this.ipSrcPort;
this.model.ipDstPort = +this.ipDstPort;
```

Sidenote, are the pluses in here even doing anything? Could it just be
```
this.model.ipSrcPort = this.ipSrcPort;
this.model.ipDstPort = this.ipDstPort;
```




> Src and Dst port filters are incorrect after changing to empty
> --
>
> Key: METRON-1734
> URL: https://issues.apache.org/jira/browse/METRON-1734
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> When changing a port filter after a job has run, setting it to empty causes 
> the old value to be sent in the request.





[GitHub] metron pull request #1159: METRON-1734: Src and Dst port filters are incorre...

2018-08-13 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/1159#discussion_r209683673
  
--- Diff: 
metron-interface/metron-alerts/src/app/pcap/pcap-filters/pcap-filters.component.ts
 ---
@@ -63,9 +63,13 @@ export class PcapFiltersComponent implements OnInit, 
OnChanges {
 this.model.endTimeMs = new Date(this.endTimeStr).getTime();
 if (this.ipSrcPort !== '') {
   this.model.ipSrcPort = +this.ipSrcPort;
+} else {
--- End diff --

This is probably a dumb question, but why do we have to specifically delete 
the value?  Don't we treat empty string the same as missing?  If that's the 
case, why isn't this just 
```
this.model.ipSrcPort = +this.ipSrcPort;
this.model.ipDstPort = +this.ipDstPort;
```

Sidenote, are the pluses in here even doing anything? Could it just be
```
this.model.ipSrcPort = this.ipSrcPort;
this.model.ipDstPort = this.ipDstPort;
```




---


[jira] [Commented] (METRON-1735) Empty print status option causes NPE

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578600#comment-16578600
 ] 

ASF GitHub Bot commented on METRON-1735:


Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1160
  
@merrimanr Could you bump Travis? Looks like a maven connection issue, 
rather than something to actually be fixed.


> Empty print status option causes NPE
> 
>
> Key: METRON-1735
> URL: https://issues.apache.org/jira/browse/METRON-1735
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Ryan Merriman
>Assignee: Ryan Merriman
>Priority: Major
>
> REST does not set a print job status property causing a NPE in PcapJob 
> because the property is never added to the config.  The PcapJob should 
> default to false.





[GitHub] metron issue #1160: METRON-1735: Empty print status option causes NPE

2018-08-13 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/metron/pull/1160
  
@merrimanr Could you bump Travis? Looks like a maven connection issue, 
rather than something to actually be fixed.


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578569#comment-16578569
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209649851
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Can we document the default value for this?


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209674410
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
--- End diff --

Shouldn't we be calling `tp.submit` for each (path, data)?  
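
For comparison, a sketch of the alternative raised here, with one task submitted per (path, data) entry. All names are hypothetical: `PerEntrySubmit` is not part of Metron, paths are plain strings, and the actual write is replaced by a placeholder that just returns the path:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerEntrySubmit {
  // Submits one task per non-empty entry and returns the paths written,
  // in the order the tasks were submitted.
  static List<String> writeAll(Map<String, List<byte[]>> toWrite, int parallelism) {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    try {
      List<Future<String>> pending = new ArrayList<>();
      for (Map.Entry<String, List<byte[]>> e : toWrite.entrySet()) {
        if (!e.getValue().isEmpty()) {
          // Placeholder: a real task would write e.getValue() to e.getKey().
          Callable<String> task = () -> e.getKey();
          pending.add(pool.submit(task));
        }
      }
      List<String> written = new ArrayList<>();
      for (Future<String> f : pending) {
        written.add(f.get()); // rethrows a task failure as ExecutionException
      }
      return written;
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while writing", ie);
    } catch (ExecutionException ee) {
      throw new IllegalStateException("Write failed", ee.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

This version needs no shared synchronized list, because results come back through the futures.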


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578570#comment-16578570
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209651293
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Should we mention that 1C, 4C are valid values in addition to integers?  
Perhaps just copy the text you have in the Ambari description into the README.  
Good stuff.


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209650724
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Do you have any advice on when a user should increase/decrease this value?  
Are there errors I might see that would be resolved by increasing/decreasing 
this value?

If you don't have a good understanding of this, then we don't need to worry 
about it.


---


[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209671313
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
+  try {
+Path path = e.getKey();
+List data = e.getValue();
+if (data.size() > 0) {
+  write(getResultsWriter(), hadoopConfig, data, path);
+  outFiles.add(path);
+}
+  } catch (IOException ioe) {
+throw new RuntimeException("Failed to write results", ioe);
--- End diff --

Can we add the path that failed to write to the exception message?
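
One way this could look, as a sketch only: `WriteFailure` is a hypothetical helper, and the path is a plain string instead of a Hadoop `Path`:

```java
import java.io.IOException;

class WriteFailure {
  // Wraps the checked IOException and names the output path that failed,
  // keeping the original exception as the cause.
  static RuntimeException forPath(String path, IOException cause) {
    return new RuntimeException(
        String.format("Failed to write results; path='%s'", path), cause);
  }
}
```

Inside the lambda, the catch block would then read `throw WriteFailure.forPath(path.toString(), ioe);`, provided `path` is declared before the `try` so it is still in scope.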


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578574#comment-16578574
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209665613
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java
 ---
@@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
   }
   return this;
 }
-mrJob.submit();
-jobStatus.withState(State.SUBMITTED).withDescription("Job 
submitted").withJobId(mrJob.getJobID().toString());
+synchronized (this) {
--- End diff --

Can we add a comment about why we need the lock here?


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578571#comment-16578571
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209655011
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
--- End diff --

Should we add a catch block for when a user enters an invalid value?  We 
should catch and provide a helpful exception message like "Invalid value for 
property 'pcap.finalizer.threadpool.size'; value='3CCC'".
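
A sketch of the suggested validation, under assumptions: `ThreadPoolSize` and `parse` are hypothetical names, and the core count is passed in as a parameter to make the behavior easy to test. One deliberate difference from the diff: only the trailing `C` is stripped. `replace("C", "")` removes every `C`, so an input such as `3CCC` would be accepted there as `3`.

```java
class ThreadPoolSize {
  static final String PROPERTY = "pcap.finalizer.threadpool.size";

  // Parses "4" as 4 threads and "2C" as 2 x cores; anything else is rejected
  // with a message that names the property and the offending value.
  static int parse(String raw, int cores) {
    String value = raw == null ? "" : raw.trim().toUpperCase();
    try {
      if (value.endsWith("C")) {
        // Strip only the final C so that "3CCC" fails instead of parsing as 3.
        int factor = Integer.parseInt(value.substring(0, value.length() - 1));
        return factor * cores;
      }
      return Integer.parseInt(value);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          String.format("Invalid value for property '%s'; value='%s'", PROPERTY, raw), e);
    }
  }
}
```

A caller would typically pass `Runtime.getRuntime().availableProcessors()` as `cores`.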


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209665613
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/mr/PcapJob.java
 ---
@@ -307,8 +307,11 @@ public void setCompleteCheckInterval(long interval) {
   }
   return this;
 }
-mrJob.submit();
-jobStatus.withState(State.SUBMITTED).withDescription("Job 
submitted").withJobId(mrJob.getJobID().toString());
+synchronized (this) {
--- End diff --

Can we add a comment about why we need the lock here?


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578572#comment-16578572
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209674410
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
--- End diff --

Shouldn't we be calling `tp.submit` for each (path, data)?  


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578573#comment-16578573
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209671313
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, 
Configuration hadoopConfig, List data, Path outputPath) throws 
IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a 
string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number 
of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string 
form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
+  return factor * Runtime.getRuntime().availableProcessors();
+} else {
+  return Integer.parseInt(numThreadsStr);
+}
+  }
+
+  protected List writeParallel(Configuration hadoopConfig, Map> toWrite,
+  int parallelism) throws IOException {
+List outFiles = Collections.synchronizedList(new ArrayList<>());
+ForkJoinPool tp = new ForkJoinPool(parallelism);
+try {
+  tp.submit(() -> {
+toWrite.entrySet().parallelStream().forEach(e -> {
+  try {
+Path path = e.getKey();
+List data = e.getValue();
+if (data.size() > 0) {
+  write(getResultsWriter(), hadoopConfig, data, path);
+  outFiles.add(path);
+}
+  } catch (IOException ioe) {
+throw new RuntimeException("Failed to write results", ioe);
--- End diff --

Can we add the path that failed to write to the exception message?
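
One way the suggestion above might look, as a minimal sketch rather than the actual change. It uses a plain `String` in place of the Hadoop `Path` so it compiles on its own, and `write` and `writeOrFail` here are hypothetical stand-ins, not the PcapFinalizer methods.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

final class PathInErrorSketch {

  /** Hypothetical stand-in for the real write(...); fails on purpose for paths containing "bad". */
  static void write(String path, List<byte[]> data) throws IOException {
    if (path.contains("bad")) {
      throw new IOException("simulated HDFS failure");
    }
  }

  /** Wraps the checked exception and names the path that could not be written. */
  static void writeOrFail(String path, List<byte[]> data) {
    try {
      write(path, data);
    } catch (IOException ioe) {
      // The path goes into the message so the failing page can be identified from the log alone.
      throw new RuntimeException("Failed to write results; path=" + path, ioe);
    }
  }

  public static void main(String[] args) {
    try {
      writeOrFail("/tmp/bad-page-1", Collections.singletonList(new byte[] {1}));
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The original `IOException` is kept as the cause, so the stack trace still shows where the write failed.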


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578568#comment-16578568
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209650724
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Do you have any advice on when a user should increase/decrease this value?  
Are there errors I might see that would be resolved by increasing/decreasing 
this value?

If you don't have a good understanding of this, then we don't need to worry 
about it.


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209655011
  
--- Diff: 
metron-platform/metron-pcap/src/main/java/org/apache/metron/pcap/finalizer/PcapFinalizer.java
 ---
@@ -99,10 +104,55 @@ protected PcapResultsWriter getResultsWriter() {
 LOG.warn("Unable to cleanup files in HDFS", e);
   }
 }
+LOG.info("Done finalizing results");
 return new PcapPages(outFiles);
   }
 
-  protected abstract void write(PcapResultsWriter resultsWriter, Configuration hadoopConfig, List<byte[]> data, Path outputPath) throws IOException;
+  /**
+   * Figure out how many threads to use in the thread pool. If it's a string and ends with "C",
+   * then strip the C and treat it as an integral multiple of the number of cores.  If it's a
+   * string and does not end with a C, then treat it as a number in string form.
+   */
+  private static int getNumThreads(String numThreads) {
+String numThreadsStr = ((String) numThreads).trim().toUpperCase();
+if (numThreadsStr.endsWith("C")) {
+  Integer factor = Integer.parseInt(numThreadsStr.replace("C", ""));
--- End diff --

Should we add a catch block for when a user enters an invalid value?  We 
should catch and provide a helpful exception message like "Invalid value for 
property 'pcap.finalizer.threadpool.size'; value='3CCC'".
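
A rough sketch of what that validation could look like, not the actual patch. The class name, the extra `cores` parameter (passed in so the logic is testable), and the positive-value check are my assumptions. Note that `replace("C", "")` in the diff would turn `3CCC` into `3` and accept it silently, so the sketch strips only one trailing `C`.

```java
final class ThreadPoolSizeSketch {

  static final String PROPERTY = "pcap.finalizer.threadpool.size";

  private static String invalid(String raw) {
    return "Invalid value for property '" + PROPERTY + "'; value='" + raw + "'";
  }

  /** Parses "8" as 8 threads and "2C" as 2 x cores; anything else is rejected. */
  static int getNumThreads(String raw, int cores) {
    String value = raw == null ? "" : raw.trim().toUpperCase();
    int threads;
    try {
      if (value.endsWith("C")) {
        // Strip exactly one trailing C, so "3CCC" fails to parse instead of becoming 3.
        int factor = Integer.parseInt(value.substring(0, value.length() - 1));
        threads = Math.multiplyExact(factor, cores);
      } else {
        threads = Integer.parseInt(value);
      }
    } catch (NumberFormatException | ArithmeticException e) {
      throw new IllegalArgumentException(invalid(raw), e);
    }
    if (threads < 1) {
      // Assumption: a thread pool needs at least one thread.
      throw new IllegalArgumentException(invalid(raw));
    }
    return threads;
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println(getNumThreads("2C", cores));
  }
}
```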


---


[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209651293
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Should we mention that 1C, 4C are valid values in addition to integers?  
Perhaps just copy the text you have in the Ambari description into the README.  
Good stuff.


---


[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209649851
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Can we document the default value for this?


---


[jira] [Commented] (METRON-1733) PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578499#comment-16578499
 ] 

ASF GitHub Bot commented on METRON-1733:


Github user ruffle1986 commented on the issue:

https://github.com/apache/metron/pull/1158
  
I think it's enough to trigger a Travis rebuild to make it pass.


> PCAP UI - PCAP queries don't work on Safari
> ---
>
> Key: METRON-1733
> URL: https://issues.apache.org/jira/browse/METRON-1733
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Shane Ardell
>Assignee: Shane Ardell
>Priority: Major
>
> On Safari, PCAP queries fail with a 500 internal server error. No issues seen 
> with Chrome or Firefox. After digging into the search request, it looks like 
> the values for the startTime and endTime are 'NaN'. It looks like Safari 
> cannot parse the format of the time we are passing to the getDate() function. 
> For more on this issue:
> https://stackoverflow.com/questions/21883699/safari-javascript-date-nan-issue--mm-dd-hhmmss





[GitHub] metron issue #1158: METRON-1733: PCAP UI - PCAP queries don't work on Safari

2018-08-13 Thread ruffle1986
Github user ruffle1986 commented on the issue:

https://github.com/apache/metron/pull/1158
  
I think it's enough to trigger a Travis rebuild to make it pass.


---


[jira] [Commented] (METRON-1732) Fix job status liveness bug and parallelize finalizer file writing

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578426#comment-16578426
 ] 

ASF GitHub Bot commented on METRON-1732:


Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209649720
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Can we document the default value for this?


> Fix job status liveness bug and parallelize finalizer file writing
> --
>
> Key: METRON-1732
> URL: https://issues.apache.org/jira/browse/METRON-1732
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Michael Miklavcic
>Assignee: Michael Miklavcic
>Priority: Major
>






[GitHub] metron pull request #1157: METRON-1732: Fix job status liveness bug and para...

2018-08-13 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/1157#discussion_r209649720
  
--- Diff: metron-interface/metron-rest/README.md ---
@@ -223,6 +223,9 @@ REST will supply the script with raw pcap data through 
standard in and expects P
 
 Pcap query jobs can be configured for submission to a YARN queue.  This 
setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, 
the REST application will set the `mapreduce.job.queuename` Hadoop property to 
that value.
 
+Pcap query jobs have a finalization routine that writes their results out 
to HDFS in pages. There is a threadpool used for this finalization that can be 
configured to use a specified number of threads.
+This setting is exposed as the Spring property 
`pcap.finalizer.threadpool.size`
--- End diff --

Can we document the default value for this?


---


[jira] [Commented] (METRON-1707) Port Profiler to Spark

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578366#comment-16578366
 ] 

ASF GitHub Bot commented on METRON-1707:


Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1150
  
> @simonellistonball: Do we have to use groupByKey in the Spark 
implementation? Is it not possible to use reduceByKey to build the profiles...

I had in the back of my mind that groupByKey might not be the most 
performant option, but I just didn't focus any energy on that for the first 
pass.

I will take a look and see if we can't use your advice.  Thanks for the 
pointer @simonellistonball !


> Port Profiler to Spark
> --
>
> Key: METRON-1707
> URL: https://issues.apache.org/jira/browse/METRON-1707
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Nick Allen
>Assignee: Nick Allen
>Priority: Major
>
> Create a port of the Profiler that runs in Spark.





[GitHub] metron issue #1150: METRON-1707 Port Profiler to Spark [Feature Branch]

2018-08-13 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/1150
  
> @simonellistonball: Do we have to use groupByKey in the Spark 
implementation? Is it not possible to use reduceByKey to build the profiles...

I had in the back of my mind that groupByKey might not be the most 
performant option, but I just didn't focus any energy on that for the first 
pass.

I will take a look and see if we can't use your advice.  Thanks for the 
pointer @simonellistonball !


---


[jira] [Commented] (METRON-1707) Port Profiler to Spark

2018-08-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/METRON-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578343#comment-16578343
 ] 

ASF GitHub Bot commented on METRON-1707:


Github user simonellistonball commented on the issue:

https://github.com/apache/metron/pull/1150
  
Do we have to use groupByKey in the Spark implementation? Is it not 
possible to use reduceByKey to build the profiles, since profilers are by 
definition reducible? I've seen groupByKey cause performance problems (see 
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
 for a good discussion on this).
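
To illustrate the difference in plain Java (an analogy only; this is not the Spark API and not the Profiler code): the group-style version materializes every value for a key before aggregating, while the reduce-style version folds each value into a running aggregate. The `Message` type and the sum aggregate are hypothetical, and the reduce form only applies if the profile update is associative and commutative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReduceVsGroupSketch {

  /** Hypothetical (entity, value) pair standing in for a routed telemetry message. */
  static final class Message {
    final String entity;
    final long value;

    Message(String entity, long value) {
      this.entity = entity;
      this.value = value;
    }
  }

  /** Group style: collect all values per key into a list, then aggregate each list. */
  static Map<String, Long> groupThenSum(List<Message> messages) {
    Map<String, List<Long>> grouped = messages.stream().collect(
        Collectors.groupingBy(m -> m.entity,
            Collectors.mapping(m -> m.value, Collectors.toList())));
    return grouped.entrySet().stream().collect(
        Collectors.toMap(e -> e.getKey(),
            e -> e.getValue().stream().mapToLong(Long::longValue).sum()));
  }

  /** Reduce style: merge each value into the running aggregate; no per-key list is built. */
  static Map<String, Long> reduceByKey(List<Message> messages) {
    return messages.stream().collect(
        Collectors.toMap(m -> m.entity, m -> m.value, Long::sum));
  }

  public static void main(String[] args) {
    List<Message> messages = Arrays.asList(
        new Message("10.0.0.1", 3L),
        new Message("10.0.0.2", 5L),
        new Message("10.0.0.1", 4L));
    System.out.println(groupThenSum(messages).equals(reduceByKey(messages)));
  }
}
```

Both produce the same result; in Spark the difference is that the reduce form can combine values on each partition before the shuffle, which is where the savings described in the linked article come from.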


> Port Profiler to Spark
> --
>
> Key: METRON-1707
> URL: https://issues.apache.org/jira/browse/METRON-1707
> Project: Metron
>  Issue Type: Sub-task
>Reporter: Nick Allen
>Assignee: Nick Allen
>Priority: Major
>
> Create a port of the Profiler that runs in Spark.





[GitHub] metron issue #1150: METRON-1707 Port Profiler to Spark [Feature Branch]

2018-08-13 Thread simonellistonball
Github user simonellistonball commented on the issue:

https://github.com/apache/metron/pull/1150
  
Do we have to use groupByKey in the Spark implementation? Is it not 
possible to use reduceByKey to build the profiles, since profilers are by 
definition reducible? I've seen groupByKey cause performance problems (see 
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
 for a good discussion on this).


---