[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-27 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella My +1 stands with the testing issues ironed out.  Thanks for 
looking into it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-27 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
Ok, I ran it 20 times independently over the weekend and again through 
travis whenever I had a moment and the tests appear stable after I kill the 
slots manually as a last resort in the FluxComponent.
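
For anyone who wants to repeat this kind of stability check, a minimal sketch of a repeat-runner follows. It is not what was actually used here; `count_failures` is a hypothetical helper, and the command you pass in (for example your local integration-test invocation) is left as a placeholder because the exact command is not given in this thread.

```python
import subprocess
import sys

def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        # A non-zero exit code is treated as a failed run.
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures

if __name__ == "__main__":
    # Placeholder command: replace with the real test invocation.
    placeholder_cmd = [sys.executable, "-c", "import sys; sys.exit(0)"]
    print(count_failures(placeholder_cmd, 20))
```

Running each attempt as a separate process keeps the runs independent, so leftover state from one run cannot mask a failure in the next.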




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-26 Thread james-sirota
Github user james-sirota commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
I was able to validate this on a single-node Suse-based Hadoop cluster, once 
with Kerberos disabled and a second time with Kerberos enabled.  That's a good 
first step.  I will try to run this up in AWS tomorrow on a larger cluster to 
see if it still works.  But so far so good. 




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
Ok, update on the integration tests.  It appears that the issue is that 
storm 1.0.3 does not consistently kill the slots when shutting down.  It times 
out after a minute and dies.  The fix that I am testing is to directly close 
the slot in that case.  I've run the travis build about 8 times with no 
failures.  I'm running locally 20 times before I claim it's fixed.
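
As a rough illustration of the pattern described above (not the actual FluxComponent change, which is Java), the idea is: ask for a graceful shutdown, wait a bounded time, and close the resource directly only if the graceful path hangs or fails. The names `shutdown_with_fallback`, `graceful`, and `force` are hypothetical.

```python
import threading

def shutdown_with_fallback(graceful, force, timeout_s):
    """Try graceful() in a worker thread; call force() if it raises or
    has not finished within timeout_s seconds.
    Returns 'graceful' or 'forced'."""
    errors = []

    def run():
        try:
            graceful()
        except Exception as exc:  # remember the failure for the caller thread
            errors.append(exc)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive() or errors:
        # Graceful shutdown timed out or failed: close directly as a last resort.
        force()
        return "forced"
    return "graceful"
```

The forced path only runs when the graceful one has demonstrably not worked, which matches the "last resort" intent.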




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
I found a couple of spots where we might not be cleaning things up, and 
some exceptions being thrown that kill the spout when they shouldn't.  I've run 
the test 10 times locally and will continue to run it in travis for the rest of 
the day to suss out any other lingering issues.  I think it is almost certainly 
correlated with shutting things down during load.




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@justinleet It is unclear and that test has had some intermittent issues, 
especially under load, I've noticed.  I added a more descriptive message to 
help diagnose.




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@justinleet Thanks for checking this out - modified the instructions with 
the ACL command corrections. Must have copy-pasted the wrong commands from 
history, so thanks for that!




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella Just noticed Travis after I commented.  I'm moderately surprised 
that the most recent PR would break it.  Do you know what the issue is?




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
+1, was able to follow Mike's instructions, with a couple caveats.

- Group authorization command was missing
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer \
  kafka.security.auth.SimpleAclAuthorizer --authorizer-properties \
  zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster \
  --allow-principal User:justin --group jsonMap_parser
```
- Topic authorization command on the enrichments topic side was missing.
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer \
  kafka.security.auth.SimpleAclAuthorizer --authorizer-properties \
  zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster \
  --allow-principal User:justin --topic enrichments
```
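
The two commands differ only in the resource flag (`--group` versus `--topic`), so they can be generated from one helper. This is a minimal sketch, assuming the HDP path, ZooKeeper address, and principals shown above; `acl_command` is a hypothetical helper that only builds the argument list and does not run anything.

```python
KAFKA_ACLS = "/usr/hdp/current/kafka-broker/bin/kafka-acls.sh"

def acl_command(resource_flag, resource_name, principals, zookeeper="node1:2181"):
    """Build the kafka-acls.sh argument list for one topic or group ACL."""
    if resource_flag not in ("--topic", "--group"):
        raise ValueError("resource_flag must be --topic or --group")
    cmd = [
        KAFKA_ACLS,
        "--authorizer", "kafka.security.auth.SimpleAclAuthorizer",
        "--authorizer-properties", "zookeeper.connect=" + zookeeper,
        "--add",
    ]
    # One --allow-principal flag per user, as in the commands above.
    for principal in principals:
        cmd += ["--allow-principal", "User:" + principal]
    cmd += [resource_flag, resource_name]
    return cmd

if __name__ == "__main__":
    users = ["storm-metron_cluster", "justin"]
    for flag, name in [("--group", "jsonMap_parser"), ("--topic", "enrichments")]:
        print(" ".join(acl_command(flag, name, users)))
```

The returned list can be handed to `subprocess.run` on the broker node if you would rather execute the commands than print them.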




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-23 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
I just finished running some parser tests in a Kerberized environment to 
confirm that this will address both kerberized and non-kerberized 
configurations. Here is what I had to do to get this running. Note, I'm using 
Ambari to configure everything for Hadoop, but not the Metron components. There 
is separate work that will need to be done for Metron in the MPack for Ambari.

1. Spin up full dev as normal (ensure that the profile is correctly set to 
-P HDP-2.5.0.0 in metron-deployment/roles/metron-builder/tasks/main.yml). As of 
this writing, the current full-dev ansible install will do a complete build as 
part of the deployment.
2. Run `service monit stop` and kill all the topologies. You can go ahead and 
kill the other sensors per the commands from @cestella as well (bro, pycapa, 
etc.). We'll spin up our own simple test topology and populate the kafka topic 
manually to test this out.
3. Setup symlinks
```
ln -s /usr/metron/0.3.1/lib/metron-*-uber.jar \
  /usr/hdp/2.5.3.0-37/storm/extlib-daemon/
ln -s /usr/hdp/current/hadoop-client/conf/core-site.xml /etc/storm/conf
ln -s /usr/hdp/current/hbase-client/conf/hbase-site.xml /etc/storm/conf
ln -s \
  /usr/hdp/2.5.3.0-37/storm/contrib/storm-hbase/storm-hbase-1.0.1.2.5.3.0-37.jar \
  /usr/hdp/2.5.3.0-37/storm/extlib
ln -s \
  /usr/hdp/2.5.3.0-37/storm/contrib/storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar \
  /usr/hdp/2.5.3.0-37/storm/extlib
```
4. Check that the jce security is setup in 
/usr/jdk64/jdk1.8.0_77/jre/lib/security/. If not, then you'll want to run 
through the following steps:
4.1 Download the jce policy from: 
http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html
 and copy it to your vm.
4.2 unzip -o -j -q /path-to-the-copied-zip/jce_policy-8.zip -d 
/usr/jdk64/jdk1.8.0_77/jre/lib/security/
5. Configure MIT Kerberos
```
yum -y install krb5-server krb5-libs krb5-workstation
sed -i 's/kerberos.example.com/node1/g' /etc/krb5.conf
cp /etc/krb5.conf /var/lib/ambari-server/resources/scripts
kdb5_util create -s
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
chkconfig krb5kdc on
chkconfig kadmin on

```
6. Setup admin and personal user principals. You'll kinit as the "{your 
user name}" user. For my tests, I did "addprinc mike". I just made the 
passwords "password" for the sake of easy testing.
```
kadmin.local -q "addprinc admin/admin"
kadmin.local -q "addprinc {your user name}"
```
7. In Ambari, setup AutoHDFS by adding the following properties to 
custom-storm-site
```

nimbus.autocredential.plugins.classes=['org.apache.storm.hdfs.common.security.AutoHDFS','org.apache.storm.hbase.security.AutoHBase']
nimbus.credential.renewers.classes=['org.apache.storm.hdfs.common.security.AutoHDFS','org.apache.storm.hbase.security.AutoHBase']
hdfs.keytab.file=/etc/security/keytabs/hdfs.headless.keytab
hdfs.kerberos.principal=hdfs-metron_clus...@example.com
hbase.keytab.file=/etc/security/keytabs/hbase.headless.keytab
hbase.kerberos.principal=hbase-metron_clus...@example.com
nimbus.credential.renewers.freq.secs=82800
```
8. Kerberize the cluster via Ambari. The wizard is fairly straightforward, 
but further documentation can be found 
[here](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_enabling_kerberos_security_in_ambari.html).
 For this exercise, choose existing MIT KDC (this is what we setup and 
installed in previous steps.) Realm is EXAMPLE.COM. The admin principal will 
end up as admin/ad...@example.com when testing the KDC.
9. Let the cluster spin up, but don't worry about starting up Metron via 
Ambari - we're going to run the parsers manually against the rest of the Hadoop 
cluster Kerberized.
10. Kinit and provide the password you chose from earlier.
```
kinit 
```
11. Create a kafka topic for the jsonMap parser.
```
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 \
  --create --topic jsonMap --replication-factor 1 --partitions 1
```
12. Setup the ACLs for the topic AND the consumer group. The consumer group 
config is new in this API as far as I can tell because we didn't have to do 
this in the "storm-kafka" version of the api. Note that the docs around kafka 
0.10 will suggest using a "--consumer-group" option, but in HDP 2.5.x the 
option needs to be "--group". Also make sure you use your username in 
"--allow-principal User:" instead of "mike". 
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer \
  kafka.security.auth.SimpleAclAuthorizer --authorizer-properties \
  zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster \
  --allow-principal User:mike --topic jsonMap
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --author

[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-23 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
Follow-on JIRA to move back to apache storm for storm-kafka-client is at 
https://issues.apache.org/jira/browse/METRON-794




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-23 Thread justinleet
Github user justinleet commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@cestella I'm good with keeping the extension points especially after the 
points you made. I think the TODOs are valuable, I just wanted to know the 
thought behind potentially building it out.

Given the API instability, unfortunately it seems like our dependencies 
aren't going to provide that insulation layer.  I'd rather have that be 
provided in a stable manner upstream from us, but that's not something we have 
any control over.




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-22 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
@justinleet good points overall.  Yeah, I'm not thrilled with the HDP 
dependency either, as I stated already.

Regarding "If flux gets support for the Builder items, would we just axe 
those classes entirely?"
I believe that the extension will live on.  It has a couple of benefits 
beyond the builder pattern in flux:
* The parsers aren't using flux and do need a way to expose the 
configuration of the spout via properties.  This is part of the extension here.
* A component of extension in the storm-kafka-client Builder is via 
polymorphism (e.g. the TupleBuilder, etc).  This will just never work well in 
flux, I think.
* There is an assumption in the storm-kafka-client builder that we're 
passing in brokers directly, rather than reading them from zookeeper.
* This API is shifting dramatically.  It's totally different in storm 1.0.3 
vs 1.0.1.  Creating a layer here insulates us from API changes and localizes 
their impact.
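
To make the insulation argument concrete, here is a minimal sketch of the idea in Python (the real extension is Java and looks different). It maps flat properties to the settings a spout builder needs and looks brokers up instead of taking them directly. Every name here (`build_spout_settings`, `lookup_brokers`, the property keys, and the `<topic>_parser` default group) is an assumption for illustration, not Metron's actual API.

```python
def build_spout_settings(props, lookup_brokers):
    """Translate flat properties into Kafka spout settings.
    Only this function needs to change if the builder API shifts."""
    topic = props["topic"]
    # Brokers are discovered (e.g. from ZooKeeper) rather than passed in.
    brokers = lookup_brokers(props["zookeeper"])
    if not brokers:
        raise ValueError("no brokers found via " + props["zookeeper"])
    settings = {
        "bootstrap.servers": ",".join(brokers),
        # Hypothetical default: consumer group named after the topic.
        "group.id": props.get("group.id", topic + "_parser"),
        "topics": [topic],
    }
    # Pass through extra consumer overrides given with a 'kafka.' prefix.
    for key, value in props.items():
        if key.startswith("kafka."):
            settings[key[len("kafka."):]] = value
    return settings

if __name__ == "__main__":
    # Stub lookup standing in for a real ZooKeeper query.
    stub_lookup = lambda zk: ["node1:6667"]
    print(build_spout_settings({"topic": "jsonMap", "zookeeper": "node1:2181"}, stub_lookup))
```

Callers depend only on the property names, so a change in the underlying builder stays localized to one place.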

I made the TODOs because, while what we have here matched the capabilities 
of what we had before, we could do better in supporting multiple topics.  I 
wanted to bring out the places that would need to shift to support that.




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-22 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
I was planning to make a JIRA around the multiple topic bit.




[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-22 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
# Testing Plan
## Preliminaries

* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1` 


## Ensure Data Flows into the Indices
Ensure that with a basic full-dev we get data into the elasticsearch
indices and into HDFS.

## (Optional) Free Up Space on the virtual machine

First, let's free up some headroom on the virtual machine.  If you are 
running this on a
multinode cluster, you would not have to do this.
* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`

## Test the PCAP topology

A new kafka spout necessitates testing pcap.
### Install and start pycapa 
```
# set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root

# Install these packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl 
yum -y install "@Development tools" python27 python27-scldevel \
  python27-python-virtualenv libpcap-devel libselinux-python

# Setup directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME

# Create virtualenv (inside $PYCAPA_HOME, where it is activated from below)
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
cd ${PYCAPA_HOME}
${PYTHON27_HOME}/usr/bin/virtualenv pycapa-venv

# Copy pycapa
# Copy incubator-metron/metron-sensors/pycapa from the Metron source tree
# into $PYCAPA_HOME on the node you would like to install pycapa on.

# Build it
cd ${PYCAPA_HOME}/pycapa
# activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run it
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
```
### Ensure pycapa can write to HDFS
* Ensure that `/apps/metron/pcap` exists and can be written to by the
  storm user.  If not, then:
```
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
``` 
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
* Watch the topology in the Storm UI and kill the packet capture utility 
from before once the number of packets ingested is over 3k.  Ensure that at 
least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the 
pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the protocol.
  * Note that when you run the commands below, the resulting file will be 
placed in the execution directory where you kicked off the job from.
* Run a Stellar filter query by executing a command similar to the 
following, with the values noted above (match your start_time format to the 
date format provided - default is to use millis since epoch):
```
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == 6" -rpf 500
```
* Verify the MR job finishes successfully. Upon completion, you should see 
multiple files named with relatively current datestamps in your current 
directory, e.g. pcap-data-20160617160549737+.pcap
* Copy the files to your local machine and verify you can open them in 
Wireshark. I chose a middle file and the last file. The middle file should have 
500 records (per the records_per_file option), and the last one will likely 
have a number of records <= 500.
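
As a quick way to know what to expect before opening the files, the arithmetic behind the last bullet can be sketched as follows. This assumes the query fills each output file up to the `-rpf` limit before starting the next one, which is consistent with the description above but is not taken from the implementation; `expected_file_sizes` is a hypothetical helper.

```python
def expected_file_sizes(matching_packets, records_per_file=500):
    """Return the record count expected in each output file, in order."""
    if matching_packets <= 0:
        return []
    full_files, remainder = divmod(matching_packets, records_per_file)
    sizes = [records_per_file] * full_files
    if remainder:
        # The last file holds whatever is left over (< records_per_file).
        sizes.append(remainder)
    return sizes

if __name__ == "__main__":
    print(expected_file_sizes(1234))  # [500, 500, 234]
```

So with 1234 matching packets you would expect three files, the first two full and the last holding 234 records.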

## Test the Profiler

### Setup
* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`

* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

def main():
  # Mean, standard deviation, and emit interval (seconds) from the command line
  mu = float(sys.argv[1])
  sigma = float(sys.argv[2])
  freq_s = int(sys.argv[3])
  while True:
    # Emit one JSON map with a Gaussian-distributed numeric field
    out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
    print(out)
    sys.stdout.flush()
    time.sleep(freq_s)

if __name__ == '__main__':
  main()
```
This will generate random JSON maps with a numeric field called `value`

* Set the profiler to use 1 minute tick durations:
  * Edit `$METRON_HOME/config

[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-22 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/incubator-metron/pull/486
  
I owe:
* Better Javadocs around some of the pcap infrastructure
* An acceptance test plan

