[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing
Github user cestella commented on the issue: https://github.com/apache/metron/pull/585 +1 by inspection, great addition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/585 I see that. Regarding the keys and the methods for retrieving and saving them: I'll leave that refactoring and cleanup for a separate PR.
[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing
Github user cestella commented on the issue: https://github.com/apache/metron/pull/585 Regarding the test data, it is not a sequence file in a format that PcapInspector can read. Depending on the test case, we construct the appropriate Kafka representation. Only the value is used: it is modified to suit the test case (e.g. with headers and no key, or without headers and with a key) and then fed into Kafka. This is done in the `readPcaps` method of the integration test.
[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing
Github user cestella commented on the issue: https://github.com/apache/metron/pull/585

@mmiklavc it depends on which test case you're talking about. We have two modes of operation in the pcap topology and two test cases in the integration test, and these are defined by the flux property `kafka.pcap.ts_scheme`. These modes define the deserialization logic used in the topology to convert Kafka key/values to bytes suitable for writing to HDFS:

* `FROM_PACKET`: expects a fully-formed packet (with headers), parses the packet, and extracts the timestamp from the value. This is a legacy mode, which functioned with pycapa prior to rewriting. We should eventually deprecate this and remove it. This is associated with the `FromPacketDeserializer`.
* `FROM_KEY`: expects raw data and a timestamp from the key. This is by far the dominant mode of operation and the one you will see in `pycapa` or `fastcapa`. This is associated with the `FromKeyDeserializer`.

It appears that you are doing the null check in the `HDFSWriterCallback`. I would recommend doing this null check in `FromKeyDeserializer`, as a null key is not an illegal state for the `FromPacketDeserializer`.
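The recommendation above, checking for a null key only in the key-based deserializer, can be illustrated with a minimal sketch. This is not Metron's actual code: the interface `KeyValueDeserializerSketch`, the classes `FromKeySketch` and `FromPacketSketch`, and the method `extractTimestamp` are hypothetical names invented for illustration, and the "packet parsing" is a placeholder that simply reads a long from the start of the value. It also assumes the key carries the timestamp as an 8-byte big-endian long. Only the exception message is taken from the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and signatures are NOT Metron's real API.
final class DeserializerSketch {

    /** Hypothetical stand-in for the per-mode deserialization contract. */
    interface KeyValueDeserializerSketch {
        long extractTimestamp(byte[] key, byte[] value);
    }

    /** FROM_KEY-style mode: the timestamp lives in the Kafka key, so a null key is illegal here. */
    static final class FromKeySketch implements KeyValueDeserializerSketch {
        @Override
        public long extractTimestamp(byte[] key, byte[] value) {
            if (key == null) {
                throw new IllegalArgumentException("Expected a key but none provided");
            }
            if (key.length < Long.BYTES) {
                throw new IllegalArgumentException("Key too short to hold a timestamp");
            }
            // Assumption: the key is an 8-byte big-endian long.
            return ByteBuffer.wrap(key).getLong();
        }
    }

    /** FROM_PACKET-style mode: the key is ignored, so a null key is acceptable. */
    static final class FromPacketSketch implements KeyValueDeserializerSketch {
        @Override
        public long extractTimestamp(byte[] key, byte[] value) {
            if (value == null || value.length < Long.BYTES) {
                throw new IllegalArgumentException("Value too short to hold a timestamp");
            }
            // Placeholder only: real pcap header parsing is more involved than this.
            return ByteBuffer.wrap(value).getLong();
        }
    }

    public static void main(String[] args) {
        byte[] key = ByteBuffer.allocate(Long.BYTES).putLong(1494932739L).array();
        System.out.println(new FromKeySketch().extractTimestamp(key, new byte[0]));
        // A null key is fine in the packet-based mode.
        System.out.println(new FromPacketSketch().extractTimestamp(null, key));
    }
}
```

Keeping the check inside the key-based class means the shared callback does not need to know which mode is active, and the packet-based test case can keep sending records without keys.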
[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/585

I found some additional issues with error handling in the `HDFSWriterCallback`. I changed it to throw an `IllegalArgumentException` when the key is null, but that revealed further problems in our test infrastructure. `PcapTopologyIntegrationTest` seems to rely on data that does not provide a key. Was this by design? I get the following exception, which is the one I added as a null check on the key:

```
Running org.apache.metron.pcap.integration.PcapTopologyIntegrationTest
Formatting using clusterid: testClusterID
2017-05-16 11:05:39 ERROR util:0 - Async loop died!
java.lang.IllegalArgumentException: Expected a key but none provided
    at org.apache.metron.spout.pcap.HDFSWriterCallback.apply(HDFSWriterCallback.java:121)
    at org.apache.storm.kafka.CallbackCollector.emit(CallbackCollector.java:59)
    at org.apache.storm.kafka.spout.KafkaSpoutStream.emit(KafkaSpoutStream.java:79)
    at org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics.emit(KafkaSpoutStreamsNamedTopics.java:101)
    at org.apache.storm.kafka.spout.KafkaSpout.emitTupleIfNotEmitted(KafkaSpout.java:280)
    at org.apache.storm.kafka.spout.KafkaSpout.emit(KafkaSpout.java:265)
    at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:212)
    at org.apache.storm.daemon.executor$fn__6503$fn__6518$fn__6549.invoke(executor.clj:651)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
2017-05-16 11:05:39 ERROR executor:0 -
java.lang.IllegalArgumentException: Expected a key but none provided
    at org.apache.metron.spout.pcap.HDFSWriterCallback.apply(HDFSWriterCallback.java:121)
    at org.apache.storm.kafka.CallbackCollector.emit(CallbackCollector.java:59)
    at org.apache.storm.kafka.spout.KafkaSpoutStream.emit(KafkaSpoutStream.java:79)
    at org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics.emit(KafkaSpoutStreamsNamedTopics.java:101)
    at org.apache.storm.kafka.spout.KafkaSpout.emitTupleIfNotEmitted(KafkaSpout.java:280)
    at org.apache.storm.kafka.spout.KafkaSpout.emit(KafkaSpout.java:265)
    at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:212)
    at org.apache.storm.daemon.executor$fn__6503$fn__6518$fn__6549.invoke(executor.clj:651)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
2017-05-16 11:05:39 ERROR util:0 - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
    at clojure.lang.RestFn.invoke(RestFn.java:423)
    at org.apache.storm.daemon.worker$fn__7172$fn__7173.invoke(worker.clj:761)
    at org.apache.storm.daemon.executor$mk_executor_data$fn__6388$fn__6389.invoke(executor.clj:275)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:494)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
```

When I attempt to view the PCAP file with the `PcapInspector` in the IDE, I get this exception:

```
Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2254)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2306)
    at org.apache.metron.utils.PcapInspector.main(PcapInspector.java:142)

Process finished with exit code 1
```