[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user justinleet commented on the issue: https://github.com/apache/incubator-metron/pull/486 @cestella My +1 stands with the testing issues ironed out. Thanks for looking into it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 Ok, I ran it 20 times independently over the weekend and again through travis whenever I had a moment and the tests appear stable after I kill the slots manually as a last resort in the FluxComponent.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user james-sirota commented on the issue: https://github.com/apache/incubator-metron/pull/486 I was able to validate this on a single-node Suse-based Hadoop cluster, once with Kerberos disabled and a second time with Kerberos enabled. That's a good first step. I will try to run this up in AWS tomorrow on a larger cluster to see if it still works. But so far so good.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 Ok, update on the integration tests. It appears that the issue is that storm 1.0.3 does not consistently kill the slots when shutting down. It times out after a minute and dies. The fix that I am testing is to directly close the slot in that case. I've run the travis build about 8 times with no failures. I'm running locally 20 times before I claim it's fixed.
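The fallback described in this comment (attempt a normal shutdown, and close the slot directly if the shutdown does not complete in time) can be illustrated with a minimal sketch. This is not the FluxComponent code from the PR, which is Java; the function name `shutdown_with_fallback`, the two callables, and the 60-second default are hypothetical stand-ins chosen for illustration.

```python
import threading


def shutdown_with_fallback(graceful_stop, force_close, timeout_s=60.0):
    """Run graceful_stop(); if it fails or exceeds timeout_s, call force_close().

    Both callables are hypothetical stand-ins for "shut the cluster down"
    and "close the slot directly". Returns "graceful" or "forced".
    """
    outcome = {"ok": False}
    done = threading.Event()

    def run():
        try:
            graceful_stop()
            outcome["ok"] = True
        except Exception:
            pass  # a failed shutdown is handled like a timed-out one
        finally:
            done.set()

    worker = threading.Thread(target=run)
    worker.daemon = True  # do not keep the process alive for a hung shutdown
    worker.start()

    done.wait(timeout_s)
    if outcome["ok"]:
        return "graceful"
    force_close()  # last resort
    return "forced"
```

The forced path only runs when the graceful path has demonstrably not succeeded, so a healthy shutdown is never interrupted.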
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 I found a couple of spots where we might not be cleaning things up, and some exceptions being thrown that kill the spout when they shouldn't. I've run the test 10 times locally and will continue to run it in travis for the rest of the day to suss out any other lingering issues. I think it is almost assuredly correlated with shutting things down during load.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 @justinleet It is unclear and that test has had some intermittent issues, especially under load, I've noticed. I added a more descriptive message to help diagnose.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user mmiklavc commented on the issue: https://github.com/apache/incubator-metron/pull/486 @justinleet Thanks for checking this out - modified the instructions with the ACL command corrections. Must have copy-pasted the wrong commands from history, so thanks for that!
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user justinleet commented on the issue: https://github.com/apache/incubator-metron/pull/486 @cestella Just noticed Travis after I commented. I'm moderately surprised that the most recent PR would break it. Do you know what the issue is?
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user justinleet commented on the issue: https://github.com/apache/incubator-metron/pull/486

+1, was able to follow Mike's instructions, with a couple caveats.

- Group authorization command was missing
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster --allow-principal User:justin --group jsonMap_parser
```
- Topic authorization command on the enrichments topic side was missing.
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster --allow-principal User:justin --topic enrichments
```
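Since both a topic ACL and a consumer-group ACL are needed for each parser, one way to avoid missing a command is to generate them. The sketch below only assembles the argument list; it does not run anything. It uses just the flags that appear in the commands above, and the helper name `kafka_acl_command` and its defaults are hypothetical.

```python
def kafka_acl_command(principals, resource_flag, resource_name,
                      zookeeper="node1:2181",
                      script="/usr/hdp/current/kafka-broker/bin/kafka-acls.sh"):
    """Build the kafka-acls.sh argument list for one topic or consumer group.

    resource_flag must be "--topic" or "--group" (HDP 2.5.x spelling, per the
    thread). All defaults mirror the full-dev commands quoted above.
    """
    if resource_flag not in ("--topic", "--group"):
        raise ValueError("resource_flag must be --topic or --group")
    args = [
        script,
        "--authorizer", "kafka.security.auth.SimpleAclAuthorizer",
        "--authorizer-properties", "zookeeper.connect=" + zookeeper,
        "--add",
    ]
    for principal in principals:
        args += ["--allow-principal", "User:" + principal]
    args += [resource_flag, resource_name]
    return args


if __name__ == "__main__":
    users = ["storm-metron_cluster", "justin"]
    # One topic ACL and one group ACL, matching the two missing commands.
    print(" ".join(kafka_acl_command(users, "--group", "jsonMap_parser")))
    print(" ".join(kafka_acl_command(users, "--topic", "enrichments")))
```

Returning a list rather than a string keeps the result safe to hand to a process launcher without shell quoting concerns.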
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user mmiklavc commented on the issue: https://github.com/apache/incubator-metron/pull/486

I just finished running some parser tests in a Kerberized environment to confirm that this will address both kerberized and non-kerberized configurations. Here is what I had to do to get this running. Note, I'm using Ambari to configure everything for Hadoop, but not the Metron components. There is separate work that will need to be done for Metron in the MPack for Ambari.

1. Spin up full dev as normal (ensure that the profile is correctly set to -P HDP-2.5.0.0 in metron-deployment/roles/metron-builder/tasks/main.yml). As of this writing, the current full-dev ansible install will do a complete build as part of the deployment.

2. Stop monit (`service monit stop`) and kill all the topologies. You can go ahead and kill the other sensors per the commands from @cestella as well (bro, pycapa, etc.). We'll spin up our own simple test topology and populate the kafka topic manually to test this out.

3. Set up symlinks
```
ln -s /usr/metron/0.3.1/lib/metron-*-uber.jar /usr/hdp/2.5.3.0-37/storm/extlib-daemon/
ln -s /usr/hdp/current/hadoop-client/conf/core-site.xml /etc/storm/conf
ln -s /usr/hdp/current/hbase-client/conf/hbase-site.xml /etc/storm/conf
ln -s /usr/hdp/2.5.3.0-37/storm/contrib/storm-hbase/storm-hbase-1.0.1.2.5.3.0-37.jar /usr/hdp/2.5.3.0-37/storm/extlib
ln -s /usr/hdp/2.5.3.0-37/storm/contrib/storm-hdfs/storm-hdfs-1.0.1.2.5.3.0-37.jar /usr/hdp/2.5.3.0-37/storm/extlib
```

4. Check that the jce security is set up in /usr/jdk64/jdk1.8.0_77/jre/lib/security/. If not, then you'll want to run through the following steps:

    4.1 Download the jce policy from: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html and copy it to your vm.

    4.2 unzip -o -j -q /path-to-the-copied-zip/jce_policy-8.zip -d /usr/jdk64/jdk1.8.0_77/jre/lib/security/

5. Configure MIT Kerberos
```
yum -y install krb5-server krb5-libs krb5-workstation
sed -i 's/kerberos.example.com/node1/g' /etc/krb5.conf
cp /etc/krb5.conf /var/lib/ambari-server/resources/scripts
kdb5_util create -s
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start
chkconfig krb5kdc on
chkconfig kadmin on
```

6. Set up admin and personal user principals. You'll kinit as the "{your user name}" user. For my tests, I did "addprinc mike". I just made the passwords "password" for the sake of easy testing.
```
kadmin.local -q "addprinc admin/admin"
kadmin.local -q "addprinc {your user name}"
```

7. In Ambari, set up AutoHDFS by adding the following properties to custom-storm-site
```
nimbus.autocredential.plugins.classes=['org.apache.storm.hdfs.common.security.AutoHDFS','org.apache.storm.hbase.security.AutoHBase']
nimbus.credential.renewers.classes=['org.apache.storm.hdfs.common.security.AutoHDFS','org.apache.storm.hbase.security.AutoHBase']
hdfs.keytab.file=/etc/security/keytabs/hdfs.headless.keytab
hdfs.kerberos.principal=hdfs-metron_clus...@example.com
hbase.keytab.file=/etc/security/keytabs/hbase.headless.keytab
hbase.kerberos.principal=hbase-metron_clus...@example.com
nimbus.credential.renewers.freq.secs=82800
```

8. Kerberize the cluster via Ambari. The wizard is fairly straightforward, but further documentation can be found [here](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/_enabling_kerberos_security_in_ambari.html). For this exercise, choose existing MIT KDC (this is what we set up and installed in previous steps.) Realm is EXAMPLE.COM. The admin principal will end up as admin/ad...@example.com when testing the KDC.

9. Let the cluster spin up, but don't worry about starting up Metron via Ambari - we're going to run the parsers manually against the rest of the Hadoop cluster Kerberized.

10. Kinit and provide the password you chose earlier.
```
kinit
```

11. Create a kafka topic for the jsonMap parser.
```
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic jsonMap --replication-factor 1 --partitions 1
```

12. Set up the ACLs for the topic AND the consumer group. The consumer group config is new in this API as far as I can tell because we didn't have to do this in the "storm-kafka" version of the api. Note that the docs around kafka 0.10 will suggest using a "--consumer-group" option, but in HDP 2.5.x the option needs to be "--group". Also make sure you use your username in "--allow-principal User:" instead of "mike".
```
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=node1:2181 --add --allow-principal User:storm-metron_cluster --allow-principal User:mike --topic jsonMap
/usr/hdp/current/kafka-broker/bin/kafka-acls.sh --author
```
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 Follow-on JIRA to move back to apache storm for storm-kafka-client is at https://issues.apache.org/jira/browse/METRON-794
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user justinleet commented on the issue: https://github.com/apache/incubator-metron/pull/486 @cestella I'm good with keeping the extension points especially after the points you made. I think the TODOs are valuable, I just wanted to know the thought behind potentially building it out. Given the API instability, unfortunately it seems like our dependencies aren't going to provide that insulation layer. I'd rather have that be provided in a stable manner upstream from us, but that's not something we have any control over.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486

@justinleet good points overall. Yeah, I'm not thrilled with the HDP dependency either, as I stated already. Regarding "If flux gets support for the Builder items, would we just axe those classes entirely?" I believe that the extension will live on. It has a couple of benefits beyond the builder pattern in flux:

* The parsers aren't using flux and do need a way to expose the configuration of the spout via properties. This is part of the extension here.
* A component of extension in the storm-kafka-client Builder is via polymorphism (e.g. the TupleBuilder, etc). This will just never work well in flux, I think.
* There is an assumption in the storm-kafka-client builder that we're passing in brokers directly, rather than reading them from zookeeper.
* This API is shifting dramatically. It's totally different in storm 1.0.3 vs 1.0.1. Creating a layer here insulates us from API changes and localizes their impact.

I made the TODOs because, while what we have here matched the capabilities of what we had before, we could do better in supporting multiple topics. I wanted to bring out the places that would need to shift to support that.
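The idea of an insulating layer that takes flat properties and resolves brokers from zookeeper before handing anything to the underlying builder can be sketched roughly as follows. This is an illustration of the pattern only, not the Metron classes from the PR: `SpoutConfig`, `build_spout_config`, the `topic` and `zookeeper.quorum` keys, and the `lookup_brokers` callable are all hypothetical names.

```python
from collections import namedtuple

# Hypothetical value object standing in for whatever the real spout builder needs.
SpoutConfig = namedtuple("SpoutConfig", ["brokers", "topic", "consumer_props"])


def build_spout_config(props, lookup_brokers):
    """Translate flat properties into a spout configuration.

    props          -- flat dict, e.g. taken from a parser's configuration
    lookup_brokers -- callable(zk_quorum) -> list of "host:port" strings;
                      a stand-in for reading the broker list from zookeeper
    Callers depend only on this function, so a change in the underlying
    builder API would be absorbed here.
    """
    remaining = dict(props)  # copy so the caller's dict is left untouched
    topic = remaining.pop("topic")
    zk_quorum = remaining.pop("zookeeper.quorum")
    brokers = ",".join(lookup_brokers(zk_quorum))
    # Whatever is left is passed through as consumer properties.
    return SpoutConfig(brokers=brokers, topic=topic, consumer_props=remaining)
```

A single-topic signature mirrors the current capability described in the comment; supporting multiple topics would mean changing `topic` to a collection in this one place.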
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486 I was planning to make a JIRA around the multiple topic bit.
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486

# Testing Plan

## Preliminaries

* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`: `export METRON_HOME=/usr/metron/0.3.1`

## Ensure Data Flows into the Indices

Ensure that with a basic full-dev we get data into the elasticsearch indices and into HDFS.

## (Optional) Free Up Space on the virtual machine

First, let's free up some headroom on the virtual machine. If you are running this on a multinode cluster, you would not have to do this.

* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
* Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
* Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`

## Test the PCAP topology

A new kafka spout necessitates testing pcap.

### Install and start pycapa

```
# set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root

# Install these packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl
yum -y install "@Development tools" python27 python27-scldevel python27-python-virtualenv libpcap-devel libselinux-python

# Setup directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME

# Create virtualenv (under $PYCAPA_HOME, where the activate step below expects it)
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
${PYTHON27_HOME}/usr/bin/virtualenv ${PYCAPA_HOME}/pycapa-venv

# Copy pycapa
# copy incubator-metron/metron-sensors/pycapa from the Metron source tree into $PYCAPA_HOME on the node you would like to install pycapa on.

# Build it
cd ${PYCAPA_HOME}/pycapa
# activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run it
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
```

### Ensure pycapa can write to HDFS

* Ensure that `/apps/metron/pcap` exists and can be written to by the storm user. If not, then:
```
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
```
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
* Watch the topology in the Storm UI and kill the packet capture utility from before, when the number of packets ingested is over 3k. Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the protocol.
* Note that when you run the commands below, the resulting file will be placed in the directory from which you kicked off the job.
* Run a Stellar filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch):
```
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == 6" -rpf 500
```
* Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+.pcap
* Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.

## Test the Profiler

### Setup

* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table `echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time


def main():
    # Arguments: mean, standard deviation, seconds to sleep between messages
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        # Emit one JSON map per line with a Gaussian-distributed "value"
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print(out)
        sys.stdout.flush()
        time.sleep(freq_s)


if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value`
* Set the profiler to use 1 minute tick durations:
  * Edit `$METRON_HOME/config
[GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Github user cestella commented on the issue: https://github.com/apache/incubator-metron/pull/486

I owe:

* Better Javadocs around some of the pcap infrastructure
* An acceptance test plan