[GitHub] metron issue #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-15 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/946
  
thanks @mmiklavc 


---


[GitHub] metron issue #946: METRON-1465:Support for Elasticsearch X-pack

2018-03-15 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/946
  
@ottobackwards - I have some open changes in a PR against this PR. Just to 
confirm, the expected result of this PR should be that users can choose
1. Which client they want to instantiate
2. Whether they want X-Pack enabled or not

The default ootb functionality should be the same as what we currently do. 
X-Pack is a special case.


---


[GitHub] metron pull request #965: METRON-590 Enable Use of Event Time in Profiler

2018-03-15 Thread nickwallen
GitHub user nickwallen opened a pull request:

https://github.com/apache/metron/pull/965

METRON-590 Enable Use of Event Time in Profiler

This enables the use of event time processing in the Profiler.

By default, the Profiler will still use processing time.  If you configure 
the profiler with a `timestampField` then it will extract the timestamps from 
that field contained within the incoming telemetry.
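
As a rough illustration of the rule described above (not the Profiler's actual code), the choice between processing time and event time can be sketched in Python. The function name `resolve_timestamp`, the `now_ms` parameter, and the handling of a message that lacks the field are all assumptions made for this example.

```python
import time
from typing import Optional


def resolve_timestamp(message: dict,
                      timestamp_field: Optional[str] = None,
                      now_ms: Optional[int] = None) -> Optional[int]:
    """Return the timestamp (epoch millis) a message would be profiled under.

    Hypothetical helper: with no `timestamp_field` configured, fall back to
    processing time; otherwise read event time from the message itself.
    """
    if timestamp_field is None:
        # Processing time: the wall clock at the moment of processing.
        return now_ms if now_ms is not None else int(time.time() * 1000)
    # Event time: the value carried by the telemetry.  Returning None for a
    # message that lacks the field is an assumption of this sketch.
    value = message.get(timestamp_field)
    return int(value) if value is not None else None


# A made-up message whose event time is one day before "now".
message = {"source.type": "bro", "timestamp": 1521036000000}
now_ms = 1521122400000

print(resolve_timestamp(message, now_ms=now_ms))               # processing time
print(resolve_timestamp(message, "timestamp", now_ms=now_ms))  # event time
```

Passing `now_ms` explicitly keeps the sketch deterministic; a real implementation would read the clock itself.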

## Manual Testing



1. Launch a development environment.  Shut down Indexing, Elasticsearch, 
Kibana, YARN, and MapReduce2 to avoid any resource issues.

1. Using Ambari, change the following settings and restart the Profiler.

Set the "Period Duration" to 1 minute.
Set the "Window Duration" to 15 seconds.
Set the "Window Lag" to 30 seconds.

1. Replace `/opt/sensor-stubs/bin/start-bro-stub` with the following.

Instead of adding the current time into each Bro message, this will add 
a timestamp from 1 day ago.
```
#
# how long to delay between each 'batch' in seconds.
#
DELAY=${1:-2}

#
# how many messages to send in each 'batch'.  the messages are drawn
# randomly from the entire set of canned data.
#
COUNT=${2:-10}

INPUT="/opt/sensor-stubs/data/bro.out"
PRODUCER="/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh"
TOPIC="bro"

while true; do

  # transform the bro timestamp and push to kafka
  SEARCH="\"ts\"\:[0-9]\+\."
  REPLACE="\"ts\"\:`date -d '1 day ago' +'%s'`\."
  shuf -n $COUNT $INPUT | sed -e "s/$SEARCH/$REPLACE/g" | $PRODUCER --broker-list node1:6667 --topic $TOPIC

  sleep $DELAY
done
```

1. Restart the Bro Sensor Stub.

```
service sensor-stubs stop
service sensor-stubs start bro
```

1. Open up the REPL and configure the Profiler like so.

Notice that we are setting the 'timestampField' within the Profiler 
configuration.  This will tell the Profiler to extract the timestamp from this 
field rather than using system time.
```
[Stellar]>>> conf := SHELL_EDIT(conf)
{
  "profiles": [
    {
      "profile": "hello-world",
      "onlyif":  "source.type == 'bro'",
      "foreach": "'global'",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ],
  "timestampField": "timestamp"
}
[Stellar]>>>
[Stellar]>>>
[Stellar]>>> CONFIG_PUT("PROFILER",conf)
```

1. Query the Profiler data store.  This will take a minute or so until you 
see a value written.

```
[Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
[]
[Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
[200]
```

1. Now query back just a couple of hours instead.  You should get no 
results.  This indicates that the Profiler successfully used the timestamps 
from the Bro data, which contained day-old values.

```
[Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "HOURS"))
[]
```

1. Now change the Profiler configuration to remove the "timestampField" 
setting.  This will switch the Profiler back to using system aka processing 
time.

```
[Stellar]>>> conf := SHELL_EDIT(conf)
{
  "profiles": [
    {
      "profile": "hello-world",
      "onlyif":  "source.type == 'bro'",
      "foreach": "'global'",
      "init":    { "count": "0" },
      "update":  { "count": "count + 1" },
      "result":  "count"
    }
  ]
}
[Stellar]>>>
[Stellar]>>> CONFIG_PUT("PROFILER",conf)
```

1. The Profiler will pick up the change after the next flush event.  Query 
for profile data in the past few minutes.  This shows that the Profiler has 
switched back to using system time, aka processing time.

```
[Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "MINUTES"))
[180, 190]
```

1. In Storm you can also set logging to DEBUG for 
"org.apache.metron.profiler". This will output detailed worker logs that allow 
you to verify that the profiler is using the correct timestamps.
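
The reasoning behind the `PROFILE_GET` checks in the steps above can be sketched in Python. This is a simplified model under stated assumptions, not Metron code: `rewrite_ts` mimics the `sed` substitution in the stub script, `within_lookback` is a hypothetical stand-in for the window that `PROFILE_FIXED` describes, and the sample Bro record is made up.

```python
import re
from datetime import datetime, timedelta, timezone

# Same target as the stub's sed pattern: "ts":<whole seconds>.
TS_PATTERN = re.compile(r'"ts":[0-9]+\.')


def rewrite_ts(line: str, now: datetime) -> str:
    """Replace the whole-second part of the Bro "ts" field with one day ago."""
    day_ago = int((now - timedelta(days=1)).timestamp())
    return TS_PATTERN.sub('"ts":%d.' % day_ago, line)


def within_lookback(event_ms: int, now_ms: int, lookback: timedelta) -> bool:
    """True if an event timestamp falls inside [now - lookback, now]."""
    start_ms = now_ms - int(lookback.total_seconds() * 1000)
    return start_ms <= event_ms <= now_ms


# Fixed "now" so the example is deterministic.
now = datetime(2018, 3, 15, 14, 0, 0, tzinfo=timezone.utc)
now_ms = int(now.timestamp() * 1000)

line = '{"ts":1216706983.387664,"uid":"example"}'  # made-up sample record
rewritten = rewrite_ts(line, now)
event_ms = int(float(re.search(r'"ts":([0-9.]+)', rewritten).group(1)) * 1000)

# Day-old event time is visible in a 2-day window but not a 2-hour one.
print(within_lookback(event_ms, now_ms, timedelta(days=2)))
print(within_lookback(event_ms, now_ms, timedelta(hours=2)))
```

With event time in use, the day-old records land outside a two-hour lookback, which is why the `HOURS` query is expected to come back empty.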



## Pull Request Checklist

- [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON- where  is the JI

[GitHub] metron pull request #914: METRON-1397 Support for JSON Path and complex docu...

2018-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/914


---


[GitHub] metron issue #914: METRON-1397 Support for JSON Path and complex documents i...

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/914
  
I love this, sorry I didn't review it sooner @ottobackwards +1 by 
inspection.  This is great.


---


[GitHub] metron pull request #959: METRON-1485 Upgrade vagrant for dev environments

2018-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/959


---


[GitHub] metron pull request #924: METRON-1299 In MetronError tests, don't test for H...

2018-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/924


---


[GitHub] metron issue #959: METRON-1485 Upgrade vagrant for dev environments

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/959
  
+1 by inspection, thanks Jon!


---


[GitHub] metron pull request #962: METRON-1488: user_settings hbase table does not ha...

2018-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/962


---


[GitHub] metron pull request #963: METRON-1490: Better error message when user specif...

2018-03-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/963


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/964
  
Ok, created 
[METRON-1492](https://issues.apache.org/jira/browse/METRON-1492). 


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/964
  
Yes, let's get it in there


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/964
  
@ottobackwards yeah, definitely; I think that's ultimately where we want to 
go.  The first step to that is separating out these functions like I have in 
this PR.  The next is doing the ambari work which will utilize it.  Shall I 
create a follow-on JIRA?


---


[GitHub] metron issue #964: METRON-1491: The indexing topology restart logic is wrong

2018-03-15 Thread ottobackwards
Github user ottobackwards commented on the issue:

https://github.com/apache/metron/pull/964
  
Should we think about exposing them as separate things in Ambari overall?  
Go all the way with this?


---


[GitHub] metron pull request #964: METRON-1491: The indexing topology restart logic i...

2018-03-15 Thread cestella
GitHub user cestella opened a pull request:

https://github.com/apache/metron/pull/964

METRON-1491: The indexing topology restart logic is wrong

## Contributor Comments
If either topology is down, Ambari shows all of Indexing as dead. Clicking 
start attempts to start them both and fails if either is still running. 
Furthermore, it appears to retry 3 times before finally failing the command.

In order to test this, kill one of the indexing topologies then attempt to 
start indexing from ambari.  Things should go smoothly.  Also, try restarting 
and starting in various states.
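
One way to picture the intended behaviour is a start routine that only launches the topologies that are not already running. The sketch below is illustrative Python, not the Ambari service script changed in this PR; the topology names and the `is_running` and `start` callables are hypothetical.

```python
from typing import Callable, Iterable, List


def start_indexing(topologies: Iterable[str],
                   is_running: Callable[[str], bool],
                   start: Callable[[str], None]) -> List[str]:
    """Start every topology that is down; leave running ones untouched.

    Returns the names that were actually started, so a caller can tell a
    no-op from a real start.
    """
    started = []
    for name in topologies:
        if is_running(name):
            # Already up: trying to start it again is the failure case
            # described above, so skip it.
            continue
        start(name)
        started.append(name)
    return started


# Example: one topology was killed, the other is still running.
running = {"batch_indexing"}
started = start_indexing(
    ["random_access_indexing", "batch_indexing"],
    is_running=lambda name: name in running,
    start=running.add,
)
print(started)
```

Checking each topology independently also makes a repeated start a harmless no-op, which is the property the manual test above exercises.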

## Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.  
Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  


In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:

### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
- [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?


### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified 
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
  ```
  mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
  ```

- [x] Have you written or updated unit tests and or integration tests to 
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?

 Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cestella/incubator-metron METRON-1491

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/metron/pull/964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #964


commit 423e9a074492ae2071022b9a4adb347d2a109633
Author: cstella 
Date:   2018-03-15T14:27:09Z

METRON-1491: The indexing topology restart logic is wrong




---


[GitHub] metron issue #963: METRON-1490: Better error message when user specifies an ...

2018-03-15 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/963
  
+1 LGTM


---


[GitHub] metron issue #963: METRON-1490: Better error message when user specifies an ...

2018-03-15 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/963
  
+1 via inspection


---


[GitHub] metron issue #962: METRON-1488: user_settings hbase table does not have acls...

2018-03-15 Thread nickwallen
Github user nickwallen commented on the issue:

https://github.com/apache/metron/pull/962
  
+1 looks good. thanks


---


[GitHub] metron pull request #958: METRON-1483: Create a tool to monitor performance ...

2018-03-15 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/958#discussion_r174787158
  
--- Diff: 
metron-contrib/metron-performance/src/main/java/org/apache/metron/performance/load/SendToKafka.java
 ---
@@ -0,0 +1,107 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.performance.load;
+
+import org.apache.kafka.clients.producer.KafkaProducer;
+import org.apache.kafka.clients.producer.ProducerRecord;
+
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.TimerTask;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Supplier;
+
+public class SendToKafka extends TimerTask {
--- End diff --

alright, done and done.


---


[GitHub] metron pull request #961: METRON-1487 Define Performance Benchmarks for Enri...

2018-03-15 Thread nickwallen
Github user nickwallen commented on a diff in the pull request:

https://github.com/apache/metron/pull/961#discussion_r174772238
  
--- Diff: metron-platform/metron-enrichment/Performance.md ---
@@ -0,0 +1,527 @@
+
+
+# Enrichment Performance
+
+This guide defines a set of benchmarks used to measure the performance of 
the Enrichment topology.  The guide also provides detailed steps on how to 
execute those benchmarks along with advice for tuning the Unified Enrichment 
topology.
+
+* [Benchmarks](#benchmarks)
+* [Benchmark Execution](#benchmark-execution)
+* [Performance Tuning](#performance-tuning)
+* [Benchmark Results](#benchmark-results)
+
+## Benchmarks
+
+The following section describes a set of enrichments that will be used to 
benchmark the performance of the Enrichment topology.
+
+* [Geo IP Enrichment](#geo-ip-enrichment)
+* [HBase Enrichment](#hbase-enrichment)
+* [Stellar Enrichment](#stellar-enrichment)
+
+### Geo IP Enrichment
+
+This benchmark measures the performance of executing a Geo IP enrichment.  
Given a valid IP address the enrichment will append detailed location 
information for that IP.  The location information is sourced from an external 
Geo IP data source like [Maxmind](https://github.com/maxmind/GeoIP2-java).
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define a Geo IP enrichment.
+```
+geo := GEO_GET(ip_dst_addr)
+```
+
+After the enrichment process completes, the telemetry message will 
contain a set of fields with location information for the given IP address.
+```
+{
+   "ip_dst_addr":"151.101.129.140",
+   ...
+   "geo.city":"San Francisco",
+   "geo.country":"US",
+   "geo.dmaCode":"807",
+   "geo.latitude":"37.7697",
+   "geo.location_point":"37.7697,-122.3933",
+   "geo.locID":"5391959",
+   "geo.longitude":"-122.3933",
+   "geo.postalCode":"94107",
+ }
+```
+
+### HBase Enrichment
+
+This benchmark measures the performance of executing an enrichment that 
retrieves data from an external HBase table. This type of enrichment is useful 
for enriching telemetry from an Asset Database or other source of relatively 
static data.
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define an HBase enrichment.  This looks up the 'ip_dst_addr' 
within an HBase table 'top-1m' and returns a hostname.
+```
+top1m := ENRICHMENT_GET('top-1m', ip_dst_addr, 'top-1m', 't')
+```
+
+After the telemetry has been enriched, it will contain the host and IP 
elements that were retrieved from the HBase table.
+```
+{
+   "ip_dst_addr":"151.101.2.166",
+   ...
+   "top1m.host":"earther.com",
+   "top1m.ip":"151.101.2.166"
+}
+```
+
+### Stellar Enrichment
+
+This benchmark measures the performance of executing a basic Stellar 
expression.  In this benchmark, the enrichment is purely a computational task 
that has no dependence on an external system like a database.  
+
+#### Configuration
+
+Adding the following Stellar expression to the Enrichment topology 
configuration will define a basic Stellar enrichment.  The following returns 
true if the IP is in the given subnet and false otherwise.
+```
+local := IN_SUBNET(ip_dst_addr, '192.168.0.0/24')
+```
+
+After the telemetry has been enriched, it will contain a field with a 
boolean value indicating whether the IP was within the given subnet.
+```
+{
+   "ip_dst_addr":"151.101.2.166",
+   ...
+   "local":false
+}
+```
+
+## Benchmark Execution
+
+This section describes the steps necessary to execute the performance 
benchmarks for the Enrichment topology.
+
+* [Prepare Enrichment Data](#prepare-enrichment-data)
+* [Load HBase with Enrichment Data](#load-hbase-with-enrichment-data)
+* [Configure the Enrichments](#configure-the-enrichments)
+* [Create Input Telemetry](#create-input-telemetry)
+* [Cluster Setup](#cluster-setup)
+* [Monitoring](#monitoring)
+
+### Prepare Enrichment Data
+
+The Alexa Top 1 Million was used as a data source for these benchmarks.
+
+1. Download the [Alexa Top 1 
Million](http://s3.amazonaws.com/alexa-static/top-1m.csv.zip).
+
+2. For each hostname, query DNS to retrieve an associated IP address.  
+
+   A script like the following can be used for this.  There is no need to 
do this for all 1 million entries in the data set. Doing this for around 10,000 
records is sufficient.
+
+   ```python
+   import dns.resolver
+   import csv
+
+   resolver = dns.resolver.Resolver()
+   resolver.nameservers = ['8.8.8.8', '8.8.4.4']
+