[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/metron/pull/585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116883407
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java
 ---
@@ -28,6 +26,17 @@
 public abstract class KeyValueDeserializer implements Serializable {
   protected TimestampConverter converter;
 
+  public static class Result {
--- End diff --

I looked at this and decided to tweak the terms a bit in this inner class. 
After stepping away from it a couple days I couldn't immediately tell what 
Result.result actually meant :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116866077
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java
 ---
@@ -116,7 +117,11 @@ public HDFSWriterCallback withConfig(HDFSWriterConfig 
config) {
 public List apply(List tuple, EmitContext context) {
 byte[] key = (byte[]) tuple.get(0);
 byte[] value = (byte[]) tuple.get(1);
-if(!config.getDeserializer().deserializeKeyValue(key, value, 
KeyValue.key.get(), KeyValue.value.get())) {
--- End diff --

Looks that way, good catch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116865536
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -102,14 +106,43 @@ public void sync(FSDataOutputStream input) throws 
IOException {
   private SyncHandler syncHandler;
   private long batchStartTime;
   private long numWritten;
+  private Configuration fsConfig = new Configuration();
 
   public PartitionHDFSWriter(String topic, int partition, String uuid, 
HDFSWriterConfig config) {
 this.topic = topic;
 this.partition = partition;
 this.uuid = uuid;
 this.config = config;
+
 try {
-  this.fs = FileSystem.get(new Configuration());
+  int replicationFactor = config.getReplicationFactor();
+  if (replicationFactor != -1) {
+fsConfig.set("dfs.replication", 
(String.valueOf(replicationFactor)));
+  }
+  if(config.getHDFSConfig() != null && 
!config.getHDFSConfig().isEmpty()) {
+for(Map.Entry entry : 
config.getHDFSConfig().entrySet()) {
+  if(entry.getValue() instanceof Integer) {
--- End diff --

String.valueOf() takes a primitive and we don't specifically know the type 
of X in the incoming config map when calling fsConfig.setX (where X is a type, 
e.g. Boolean, Integer, etc.). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread mmiklavc
Github user mmiklavc commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116848618
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -183,14 +219,14 @@ private void turnoverIfNecessary(long ts, boolean 
force) throws IOException {
 }
 
   }
-  writer = SequenceFile.createWriter(new Configuration()
+  writer = SequenceFile.createWriter(this.fsConfig
   , SequenceFile.Writer.keyClass(LongWritable.class)
   , SequenceFile.Writer.valueClass(BytesWritable.class)
   , SequenceFile.Writer.stream(outputStream)
   , 
SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE)
   );
   //reset state
-  LOG.info("Turning over and writing to " + path);
+  LOG.info(String.format("Turning over and writing to %s: [duration=%s 
NS, force=%s, initial=%s, overDuration=%s, tooManyPackets=%s]", path, duration, 
force, initial, overDuration, tooManyPackets));
--- End diff --

That sounds great. I wasn't planning to switch the logging implementation 
from Log4j in this PR, but I can change the deps if you think we need it now.
`org.apache.log4j.Category#info(java.lang.Object, java.lang.Throwable)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116823693
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/HDFSWriterCallback.java
 ---
@@ -116,7 +117,11 @@ public HDFSWriterCallback withConfig(HDFSWriterConfig 
config) {
 public List apply(List tuple, EmitContext context) {
 byte[] key = (byte[]) tuple.get(0);
 byte[] value = (byte[]) tuple.get(1);
-if(!config.getDeserializer().deserializeKeyValue(key, value, 
KeyValue.key.get(), KeyValue.value.get())) {
--- End diff --

KeyValue as a whole is unused now, right?  Can we just delete the class 
entirely at this point?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116795288
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java
 ---
@@ -28,6 +26,17 @@
 public abstract class KeyValueDeserializer implements Serializable {
   protected TimestampConverter converter;
 
+  public static class Result {
--- End diff --

Good call, I'm fine with it given the context.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116791165
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -183,14 +219,14 @@ private void turnoverIfNecessary(long ts, boolean 
force) throws IOException {
 }
 
   }
-  writer = SequenceFile.createWriter(new Configuration()
+  writer = SequenceFile.createWriter(this.fsConfig
   , SequenceFile.Writer.keyClass(LongWritable.class)
   , SequenceFile.Writer.valueClass(BytesWritable.class)
   , SequenceFile.Writer.stream(outputStream)
   , 
SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE)
   );
   //reset state
-  LOG.info("Turning over and writing to " + path);
+  LOG.info(String.format("Turning over and writing to %s: [duration=%s 
NS, force=%s, initial=%s, overDuration=%s, tooManyPackets=%s]", path, duration, 
force, initial, overDuration, tooManyPackets));
--- End diff --

+1 to that


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread cestella
Github user cestella commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116791055
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java
 ---
@@ -28,6 +26,17 @@
 public abstract class KeyValueDeserializer implements Serializable {
   protected TimestampConverter converter;
 
+  public static class Result {
--- End diff --

Well, to be fair, it's `KeyValueDeserializer.Result` since it's a static 
inner class, not just `Result`.  I'm inclined to think the naming is fine given 
the parent class, but I'm open to being convinced otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116742092
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/deserializer/KeyValueDeserializer.java
 ---
@@ -28,6 +26,17 @@
 public abstract class KeyValueDeserializer implements Serializable {
   protected TimestampConverter converter;
 
+  public static class Result {
--- End diff --

Can we call this something more informative than Result?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116741150
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -102,14 +106,43 @@ public void sync(FSDataOutputStream input) throws 
IOException {
   private SyncHandler syncHandler;
   private long batchStartTime;
   private long numWritten;
+  private Configuration fsConfig = new Configuration();
 
   public PartitionHDFSWriter(String topic, int partition, String uuid, 
HDFSWriterConfig config) {
 this.topic = topic;
 this.partition = partition;
 this.uuid = uuid;
 this.config = config;
+
 try {
-  this.fs = FileSystem.get(new Configuration());
+  int replicationFactor = config.getReplicationFactor();
+  if (replicationFactor != -1) {
+fsConfig.set("dfs.replication", 
(String.valueOf(replicationFactor)));
+  }
+  if(config.getHDFSConfig() != null && 
!config.getHDFSConfig().isEmpty()) {
+for(Map.Entry entry : 
config.getHDFSConfig().entrySet()) {
+  if(entry.getValue() instanceof Integer) {
--- End diff --

Under the hood, don't all these fsConfig.setX calls just call 
String.valueOf()?  I'm not sure if that relationship (being all String under 
the hood is guaranteed or just an implementation detail), but if it is, this 
if-else chain simplifies significantly.  I'll dig in a bit to see if there is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116741758
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -183,14 +219,14 @@ private void turnoverIfNecessary(long ts, boolean 
force) throws IOException {
 }
 
   }
-  writer = SequenceFile.createWriter(new Configuration()
+  writer = SequenceFile.createWriter(this.fsConfig
   , SequenceFile.Writer.keyClass(LongWritable.class)
   , SequenceFile.Writer.valueClass(BytesWritable.class)
   , SequenceFile.Writer.stream(outputStream)
   , 
SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE)
   );
   //reset state
-  LOG.info("Turning over and writing to " + path);
+  LOG.info(String.format("Turning over and writing to %s: [duration=%s 
NS, force=%s, initial=%s, overDuration=%s, tooManyPackets=%s]", path, duration, 
force, initial, overDuration, tooManyPackets));
--- End diff --

Couldn't this be formatted using the Logger's built-in lazy token 
replacement, rather than the String.format?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] metron pull request #585: METRON-936: Fixes to pcap for performance and test...

2017-05-16 Thread justinleet
Github user justinleet commented on a diff in the pull request:

https://github.com/apache/metron/pull/585#discussion_r116739444
  
--- Diff: 
metron-platform/metron-pcap-backend/src/main/java/org/apache/metron/spout/pcap/PartitionHDFSWriter.java
 ---
@@ -102,14 +106,43 @@ public void sync(FSDataOutputStream input) throws 
IOException {
   private SyncHandler syncHandler;
   private long batchStartTime;
   private long numWritten;
+  private Configuration fsConfig = new Configuration();
 
   public PartitionHDFSWriter(String topic, int partition, String uuid, 
HDFSWriterConfig config) {
 this.topic = topic;
 this.partition = partition;
 this.uuid = uuid;
 this.config = config;
+
 try {
-  this.fs = FileSystem.get(new Configuration());
+  int replicationFactor = config.getReplicationFactor();
+  if (replicationFactor != -1) {
--- End diff --

Could we make this > 0?  Nothing 0 or less makes sense to actually use.  
Might also make sense to log any replication factor actually used, for 
debugging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---