[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-17 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/585
  
+1 by inspection, great addition.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/585
  
I see that. Regarding the keys and the methods for retrieving and saving them, I'll save the refactoring and cleanup for a separate PR.




[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/585
  
Regarding the test data: it is not a sequence file in a format suitable for reading with `PcapInspector`. Depending on the test case, we construct the appropriate Kafka representation. The value is what is used, modified to suit the test case (e.g. with headers and no key, or without headers and with a key), and fed into Kafka. This is done in the `readPcaps` method of the integration test.
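For illustration only, the two message shapes described above could be modeled like this (the class and method names here are hypothetical, not the actual `readPcaps` code; the sketch assumes the key, when present, is a big-endian 8-byte timestamp):

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the two Kafka message shapes used by the test cases.
public class PcapTestMessage {
    public final byte[] key;    // null in the headers-in-value case
    public final byte[] value;

    private PcapTestMessage(byte[] key, byte[] value) {
        this.key = key;
        this.value = value;
    }

    // Headers-in-value shape: a fully-formed packet in the value, no key.
    public static PcapTestMessage withHeadersNoKey(byte[] fullPacket) {
        return new PcapTestMessage(null, fullPacket);
    }

    // Key-carries-timestamp shape: raw packet data in the value,
    // the timestamp encoded as an 8-byte long in the key.
    public static PcapTestMessage withKeyNoHeaders(long timestamp, byte[] rawData) {
        byte[] key = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();
        return new PcapTestMessage(key, rawData);
    }

    // Decode the timestamp back out of a key-bearing message.
    public static long timestampFromKey(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }
}
```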




[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread cestella
Github user cestella commented on the issue:

https://github.com/apache/metron/pull/585
  
@mmiklavc it depends on which test case you're talking about. We have two modes of operation in the pcap topology and two test cases in the integration test, defined by the flux property `kafka.pcap.ts_scheme`. These modes define the deserialization logic used in the topology to convert Kafka keys/values to bytes suitable for writing to HDFS:
* `FROM_PACKET`: expects a fully-formed packet (with headers) in the value, parses the packet, and extracts the timestamp from it. This is a legacy mode that functioned with pycapa prior to its rewrite; we should eventually deprecate and remove it. It is associated with the `FromPacketDeserializer`.
* `FROM_KEY`: expects raw data in the value and a timestamp in the key. This is by far the dominant mode of operation and the one you will see with `pycapa` or `fastcapa`. It is associated with the `FromKeyDeserializer`.

It appears that you are doing the null check in the `HDFSWriterCallback`. I would recommend doing it in the `FromKeyDeserializer` instead, since a null key is not an illegal state for the `FromPacketDeserializer`.
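One way the suggested placement might look, as a sketch only (the real `FromKeyDeserializer` signature may differ, and `extractTimestamp` is a hypothetical helper name): the null check lives in the key-requiring deserializer, so the headers-in-value path is never affected.

```java
import java.nio.ByteBuffer;

// Sketch: reject a null key here, where a key is required, rather than in
// HDFSWriterCallback, where FROM_PACKET-style data legitimately has no key.
public class FromKeyDeserializerSketch {
    // Returns the timestamp carried in the key; assumes an 8-byte long key.
    public static long extractTimestamp(byte[] key) {
        if (key == null) {
            throw new IllegalArgumentException(
                "Expected a key containing the timestamp but none provided");
        }
        return ByteBuffer.wrap(key).getLong();
    }
}
```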




[GitHub] metron issue #585: METRON-936: Fixes to pcap for performance and testing

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on the issue:

https://github.com/apache/metron/pull/585
  
I found some additional issues with error handling in the `HDFSWriterCallback`, so I fixed it to throw an `IllegalArgumentException` when the key is null, but that revealed further problems in our test infrastructure. `PcapTopologyIntegrationTest` seems to be relying on data that does not provide a key. Was this by design?

The following exception is thrown; it is the one I added for the null check on the key:
```
Running org.apache.metron.pcap.integration.PcapTopologyIntegrationTest
Formatting using clusterid: testClusterID
2017-05-16 11:05:39 ERROR util:0 - Async loop died!
java.lang.IllegalArgumentException: Expected a key but none provided
    at org.apache.metron.spout.pcap.HDFSWriterCallback.apply(HDFSWriterCallback.java:121)
    at org.apache.storm.kafka.CallbackCollector.emit(CallbackCollector.java:59)
    at org.apache.storm.kafka.spout.KafkaSpoutStream.emit(KafkaSpoutStream.java:79)
    at org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics.emit(KafkaSpoutStreamsNamedTopics.java:101)
    at org.apache.storm.kafka.spout.KafkaSpout.emitTupleIfNotEmitted(KafkaSpout.java:280)
    at org.apache.storm.kafka.spout.KafkaSpout.emit(KafkaSpout.java:265)
    at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:212)
    at org.apache.storm.daemon.executor$fn__6503$fn__6518$fn__6549.invoke(executor.clj:651)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
2017-05-16 11:05:39 ERROR executor:0 -
java.lang.IllegalArgumentException: Expected a key but none provided
    at org.apache.metron.spout.pcap.HDFSWriterCallback.apply(HDFSWriterCallback.java:121)
    at org.apache.storm.kafka.CallbackCollector.emit(CallbackCollector.java:59)
    at org.apache.storm.kafka.spout.KafkaSpoutStream.emit(KafkaSpoutStream.java:79)
    at org.apache.storm.kafka.spout.KafkaSpoutStreamsNamedTopics.emit(KafkaSpoutStreamsNamedTopics.java:101)
    at org.apache.storm.kafka.spout.KafkaSpout.emitTupleIfNotEmitted(KafkaSpout.java:280)
    at org.apache.storm.kafka.spout.KafkaSpout.emit(KafkaSpout.java:265)
    at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:212)
    at org.apache.storm.daemon.executor$fn__6503$fn__6518$fn__6549.invoke(executor.clj:651)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
2017-05-16 11:05:39 ERROR util:0 - Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
    at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
    at clojure.lang.RestFn.invoke(RestFn.java:423)
    at org.apache.storm.daemon.worker$fn__7172$fn__7173.invoke(worker.clj:761)
    at org.apache.storm.daemon.executor$mk_executor_data$fn__6388$fn__6389.invoke(executor.clj:275)
    at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:494)
    at clojure.lang.AFn.run(AFn.java:22)
    at java.lang.Thread.run(Thread.java:745)
```

When I attempt to view the pcap file with `PcapInspector` in the IDE, I get this exception:
```
Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.LongWritable is not class org.apache.hadoop.io.IntWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2254)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2306)
    at org.apache.metron.utils.PcapInspector.main(PcapInspector.java:142)

Process finished with exit code 1
```
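That "wrong key class" error happens because `SequenceFile.Reader.next` verifies that the key object supplied by the caller matches the key class recorded in the file's header (here the inspector passes a `LongWritable` but the file was written with `IntWritable` keys). A minimal, Hadoop-free sketch of that kind of check (the class here is illustrative, not Hadoop's actual implementation):

```java
import java.io.IOException;

// Minimal illustration of the check behind the "wrong key class" IOException:
// the reader rejects a caller-supplied key whose class does not match the
// key class recorded when the file was written.
public class KeyClassCheck {
    private final Class<?> fileKeyClass;

    public KeyClassCheck(Class<?> fileKeyClass) {
        this.fileKeyClass = fileKeyClass;
    }

    public void next(Object key) throws IOException {
        if (!fileKeyClass.isInstance(key)) {
            throw new IOException("wrong key class: " + key.getClass().getName()
                + " is not class " + fileKeyClass.getName());
        }
        // ... read the next record into key ...
    }
}
```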

