[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...

2017-03-27 Thread cestella
Github user cestella commented on the issue:

Ok, I ran it 20 times independently over the weekend and again through 
travis whenever I had a moment and the tests appear stable after I kill the 
slots manually as a last resort in the FluxComponent.

2017-03-26 Thread james-sirota
Github user james-sirota commented on the issue:

I was able to validate this on a single-node SUSE-based Hadoop cluster, once 
with Kerberos disabled and a second time with Kerberos enabled.  That's a good 
first step.  I will try to run this in AWS tomorrow on a larger cluster to 
see if it still works.  But so far so good. 

2017-03-24 Thread cestella
Github user cestella commented on the issue:

Ok, update on the integration tests.  It appears that the issue is that 
storm 1.0.3 does not consistently kill the slots when shutting down.  It times 
out after a minute and dies.  The fix that I am testing is to directly close 
the slot in that case.  I've run the travis build about 8 times with no 
failures.  I'm running locally 20 times before I claim it's fixed.
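
The fallback described above (wait for a clean shutdown, then close the slot directly if it times out) can be sketched generically. This is a Python illustration of the pattern only, not the actual FluxComponent change, which is Java; `shutdown_with_fallback`, `graceful_stop` and `force_close` are hypothetical names, and the 60 second default simply mirrors the one-minute timeout mentioned above.

```python
import threading


def shutdown_with_fallback(graceful_stop, force_close, timeout_s=60.0):
    """Run graceful_stop() and wait up to timeout_s seconds for it.

    Returns True if graceful_stop() returned (or raised) in time.
    Otherwise calls force_close() as a last resort and returns False.
    """
    done = threading.Event()

    def worker():
        try:
            graceful_stop()
        finally:
            done.set()

    t = threading.Thread(target=worker)
    t.daemon = True  # a hung graceful_stop must not keep the process alive
    t.start()

    if done.wait(timeout_s):
        return True
    # Graceful path timed out: fall back to closing the resource directly.
    force_close()
    return False
```

The graceful path runs in a daemon thread so that a hung shutdown cannot block the caller past the timeout.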

2017-03-24 Thread cestella
Github user cestella commented on the issue:

@justinleet It is unclear and that test has had some intermittent issues, 
especially under load, I've noticed.  I added a more descriptive message to 
help diagnose.

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

@cestella Just noticed Travis after I commented.  I'm moderately surprised 
that the most recent PR would break it.  Do you know what the issue is?

2017-03-24 Thread justinleet
Github user justinleet commented on the issue:

+1, was able to follow Mike's instructions, with a couple caveats.

- Group authorization command was missing
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer 
kafka.security.auth.SimpleAclAuthorizer --authorizer-properties 
zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster 
--allow-principal User:justin --group jsonMap_parser
- Topic authorization command on the enrichments topic side was missing.
 /usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer 
kafka.security.auth.SimpleAclAuthorizer --authorizer-properties 
zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster 
--allow-principal User:justin --topic enrichments

2017-03-23 Thread mmiklavc
Github user mmiklavc commented on the issue:

I just finished running some parser tests in a Kerberized environment to 
confirm that this will address both kerberized and non-kerberized 
configurations. Here is what I had to do to get this running. Note, I'm using 
Ambari to configure everything for Hadoop, but not the Metron components. There 
is separate work that will need to be done for Metron in the MPack for Ambari.

1. Spin up full dev as normal (ensure that the profile is correctly set to 
-P HDP- in metron-deployment/roles/metron-builder/tasks/main.yml). As of 
this writing, the current full-dev ansible install will do a complete build as 
part of the deployment.
2. Stop monit (service monit stop) and kill all the topologies. You can go 
ahead and kill the other sensors per the commands from @cestella as well (bro, 
pycapa, etc.). We'll spin up our own simple test topology and populate the 
kafka topic manually to test this out.
3. Setup symlinks
ln -s /usr/metron/0.3.1/lib/metron-*-uber.jar 
ln -s /usr/hdp/current/hadoop-client/conf/core-site.xml /etc/storm/conf
ln -s /usr/hdp/current/hbase-client/conf/hbase-site.xml /etc/storm/conf
ln -s 
ln -s 
4. Check that the jce security is set up in 
/usr/jdk64/jdk1.8.0_77/jre/lib/security/. If not, then you'll want to run 
through the following steps:
4.1 Download the jce policy from: 
 and copy it to your vm.
4.2 unzip -o -j -q /path-to-the-copied-zip/jce_policy-8.zip -d 
5. Configure MIT Kerberos
yum -y install krb5-server krb5-libs krb5-workstation
sed -i 's/kerberos.example.com/node1/g' /etc/krb5.conf
cp /etc/krb5.conf /var/lib/ambari-server/resources/scripts
kdb5_util create -s
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
chkconfig krb5kdc on
chkconfig kadmin on

6. Setup admin and personal user principals. You'll kinit as the "{your 
user name}" user. For my tests, I did "addprinc mike". I just made the 
passwords "password" for the sake of easy testing.
kadmin.local -q "addprinc admin/admin"
kadmin.local -q "addprinc {your user name}"
7. In Ambari, setup AutoHDFS by adding the following properties to 


8. Kerberize the cluster via Ambari. The wizard is fairly straightforward, 
but further documentation can be found 
 For this exercise, choose existing MIT KDC (this is what we setup and 
installed in previous steps.) Realm is EXAMPLE.COM. The admin principal will 
end up as admin/ad...@example.com when testing the KDC.
9. Let the cluster spin up, but don't worry about starting up Metron via 
Ambari - we're going to run the parsers manually against the rest of the Hadoop 
cluster Kerberized.
10. Kinit and provide the password you chose from earlier.
11. Create a kafka topic for the jsonMap parser.
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 
--create --topic jsonMap --replication-factor 1 --partitions 1
12. Setup the ACLs for the topic AND the consumer group. The consumer group 
config is new in this API as far as I can tell because we didn't have to do 
this in the "storm-kafka" version of the api. Note that the docs around kafka 
0.10 will suggest using a "--consumer-group" option, but in HDP 2.5.x the 
option needs to be "--group". Also make sure you use your username in 
"--allow-principal User:" instead of "mike". 
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer 
kafka.security.auth.SimpleAclAuthorizer --authorizer-properties 
zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster 
--allow-principal User:mike --topic jsonMap
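
The topic and consumer-group ACL commands differ only in their last option, so both can be generated from one helper. A minimal sketch in Python, assuming the same authorizer, ZooKeeper address and principals as above; `acl_command` is a hypothetical helper, and the group name `jsonMap_parser` is taken from the follow-up comment in this thread rather than from these steps.

```python
KAFKA_ACLS = "/usr/hdp/current/kafka-broker/bin/kafka-acls.sh"


def acl_command(principals, topic=None, group=None, zookeeper="node1:2181"):
    """Build the kafka-acls.sh argument list for one topic OR one group."""
    if (topic is None) == (group is None):
        raise ValueError("specify exactly one of topic or group")
    cmd = [
        KAFKA_ACLS,
        "--authorizer", "kafka.security.auth.SimpleAclAuthorizer",
        "--authorizer-properties", "zookeeper.connect=" + zookeeper,
        "--add",
    ]
    for name in principals:
        cmd += ["--allow-principal", "User:" + name]
    if topic is not None:
        cmd += ["--topic", topic]
    else:
        # HDP 2.5.x expects --group here, not --consumer-group.
        cmd += ["--group", group]
    return cmd


if __name__ == "__main__":
    users = ["storm-metron_cluster", "mike"]  # replace "mike" with your user
    print(" ".join(acl_command(users, topic="jsonMap")))
    print(" ".join(acl_command(users, group="jsonMap_parser")))
```

The returned list can be passed to `subprocess.check_call` as-is, or printed and pasted into a shell as shown.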

2017-03-23 Thread cestella
Github user cestella commented on the issue:

Follow-on JIRA to move back to apache storm for storm-kafka-client is at 

2017-03-23 Thread justinleet
Github user justinleet commented on the issue:

@cestella I'm good with keeping the extension points especially after the 
points you made. I think the TODOs are valuable, I just wanted to know the 
thought behind potentially building it out.

Given the API instability, unfortunately it seems like our dependencies 
aren't going to provide that insulation layer.  I'd rather have that be 
provided in a stable manner upstream from us, but that's not something we have 
any control over.

2017-03-22 Thread cestella
Github user cestella commented on the issue:

I was planning to make a JIRA around the multiple topic bit.

2017-03-22 Thread cestella
Github user cestella commented on the issue:

# Testing Plan
## Preliminaries

* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1` 

## Ensure Data Flows into the Indices
Ensure that with a basic full-dev we get data into the Elasticsearch
indices and into HDFS.

## (Optional) Free Up Space on the virtual machine

First, let's free up some headroom on the virtual machine.  If you are 
running this on a
multinode cluster, you would not have to do this.
* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print 
$2}');do kill -9 $i;done`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 
$i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 
$i;done`

## Test the PCAP topology

A new kafka spout necessitates testing pcap.
### Install and start pycapa 
# set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root

# Install these packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl 
yum -y install "@Development tools" python27 python27-scldevel 
python27-python-virtualenv libpcap-devel libselinux-python

# Setup directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME

# Create virtualenv (inside $PYCAPA_HOME, where it is activated from below)
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
cd ${PYCAPA_HOME}
${PYTHON27_HOME}/usr/bin/virtualenv pycapa-venv

# Copy pycapa
# copy incubator-metron/metron-sensors/pycapa from the Metron source tree 
into $PYCAPA_HOME on the node you would like to install pycapa on.

# Build it
cd ${PYCAPA_HOME}/pycapa
# activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run it
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
### Ensure pycapa can write to HDFS
* Ensure that `/apps/metron/pcap` exists and can be written to by the
  storm user.  If not, then:
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa 
--producer --topic pcap -i eth1 -k node1:6667`
* Watch the topology in the Storm UI and kill the packet capture utility 
from before when the number of packets ingested is over 3k.  Ensure that at 
least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the 
pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the protocol.
  * Note that when you run the commands below, the resulting file will be 
placed in the execution directory where you kicked off the job from.
* Run a Stellar filter query by executing a command similar to the 
following, with the values noted above (match your start_time format to the 
date format provided - default is to use millis since epoch):
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query 
"protocol == 6" -rpf 500
* Verify the MR job finishes successfully. Upon completion, you should see 
multiple files named with relatively current datestamps in your current 
directory, e.g. pcap-data-20160617160549737+.pcap
* Copy the files to your local machine and verify you can open them in 
Wireshark. I chose a middle file and the last file. The middle file should have 
500 records (per the records_per_file option), and the last one will likely 
have a number of records <= 500.
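
The record counts can also be checked without opening Wireshark. A minimal sketch, assuming the files written by pcap_query.sh are in the classic libpcap format (this is an assumption; pcapng files are rejected); `count_pcap_records` is a hypothetical helper, not part of Metron.

```python
import struct

# Magic numbers of the classic libpcap file format, keyed by the first four
# bytes of the file; the value is the struct byte-order prefix to use.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def count_pcap_records(path):
    """Count packet records by walking the 16-byte per-record headers."""
    with open(path, "rb") as f:
        header = f.read(24)  # fixed-size global header
        if len(header) < 24 or header[:4] not in _MAGICS:
            raise ValueError("not a classic pcap file: %s" % path)
        endian = _MAGICS[header[:4]]
        count = 0
        while True:
            rec = f.read(16)
            if len(rec) < 16:
                break  # end of file (or a truncated trailing header)
            # Fields: ts_sec, ts_frac, incl_len, orig_len
            _, _, incl_len, _ = struct.unpack(endian + "IIII", rec)
            f.seek(incl_len, 1)  # skip the captured bytes of this packet
            count += 1
        return count
```

With `-rpf 500`, calling `count_pcap_records` on a middle file should return 500, and on the last file a value no greater than 500.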

## Test the Profiler

### Setup
* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`

* Open `~/rand_gen.py` and paste the following:
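import random
import sys
import time

def main():
  # Usage: rand_gen.py <mu> <sigma> <freq_s>
  mu = float(sys.argv[1])
  sigma = float(sys.argv[2])
  freq_s = int(sys.argv[3])
  while True:
    # Emit one JSON map per line, then wait freq_s seconds
    out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
    print(out)
    sys.stdout.flush()
    time.sleep(freq_s)

if __name__ == '__main__':
  main()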
This will generate random JSON maps with a numeric field called `value`
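
Before wiring the generator into Kafka, it can help to confirm that its output parses and roughly matches the requested distribution. A minimal sketch, assuming one JSON map per line as described above; `summarize` is a hypothetical helper and is not part of the test plan.

```python
import json
import math


def summarize(lines):
    """Return (count, mean, population std dev) of the `value` fields."""
    values = [float(json.loads(line)["value"]) for line in lines if line.strip()]
    n = len(values)
    if n == 0:
        raise ValueError("no values found")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return n, mean, math.sqrt(variance)
```

For example, capture a few dozen lines of generator output to a file and pass the open file object to `summarize`; the mean and standard deviation should land near the `mu` and `sigma` arguments given to the script.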

* Set the profiler to use 1 minute tick durations:
  * Edit 

2017-03-22 Thread cestella
Github user cestella commented on the issue:

I owe:
* Better Javadocs around some of the pcap infrastructure
* An acceptance test plan

