RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-23 Thread Venkat, Ankam
Spark Committers: Please advise the way forward for this issue. Thanks for your support. Regards, Venkat From: Venkat, Ankam Sent: Thursday, January 22, 2015 9:34 AM To: 'Frank Austin Nothaft'; 'user@spark.apache.org' Cc: 'Nick Allen' Subject: RE: How to 'Pip

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Silvio Fiorito
Nick, Have you tried https://github.com/kaitoy/pcap4j? I’ve used this in a Spark app already and didn’t have any issues. My use case was slightly different than yours, but you should give it a try. From: Nick Allen <n...@nickallen.org> Date: Friday, January 16, 2015 at 10:09 AM To: "user@

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
d as new enhancement Jira request? Nick: What's your take on this? Regards, Venkat Ankam From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subj

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Frank Austin Nothaft
Regards, > Venkat Ankam > > > From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] > Sent: Wednesday, January 21, 2015 12:30 PM > To: Venkat, Ankam > Cc: Nick Allen; user@spark.apache.org > Subject: Re: How to 'Pipe' Binary Data in Apache Spark > >

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-22 Thread Venkat, Ankam
: What's your take on this? Regards, Venkat Ankam From: Frank Austin Nothaft [mailto:fnoth...@berkeley.edu] Sent: Wednesday, January 21, 2015 12:30 PM To: Venkat, Ankam Cc: Nick Allen; user@spark.apache.org Subject: Re: How to 'Pipe' Binary Data in Apache Spark Hi Venkat/Nick, The

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Frank Austin Nothaft
, '-t' > >>> 'wav', '-', '-n', 'stats'])).collect() <-- Does not work. Tried different > >>> options. > AttributeError: 'function' object has no attribute 'read' > > Any suggestions? > > Regar

RE: How to 'Pipe' Binary Data in Apache Spark

2015-01-21 Thread Venkat, Ankam
'/usr/local/bin/sox', '-t' >>> 'wav', '-', '-n', 'stats'])).collect() <-- Does not work. Tried different >>> options. AttributeError: 'function' object has no attribute 'read' Any suggestions? Regards, V

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
I just wanted to reiterate the solution for the benefit of the community. The problem is not from my use of 'pipe', but that 'textFile' cannot be used to read in binary data. (Doh) There are a couple of options to move forward. 1. Implement a custom 'InputFormat' that understands the binary input da
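
As a rough illustration of the point above (the piping itself is fine as long as the data stays as raw bytes), here is a minimal sketch in plain Python, without Spark. The helper name `pipe_bytes` and the stand-in echo command are hypothetical and not from the thread; in a Spark job the payload would have to come from an API that yields bytes rather than text lines, which is an assumption here.

```python
import subprocess
import sys


def pipe_bytes(payload: bytes, command: list) -> bytes:
    """Send raw bytes to an external command's stdin and return its raw stdout.

    Hypothetical helper for illustration; nothing is decoded as text.
    """
    result = subprocess.run(
        command,
        input=payload,            # bytes go to the child's stdin unchanged
        stdout=subprocess.PIPE,   # capture stdout as bytes
        check=True,               # raise if the command exits non-zero
    )
    return result.stdout


# Stand-in for a real tool: a Python one-liner that copies stdin to stdout
# in binary mode (assumption: any command reading binary stdin would do).
echo_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

payload = bytes(range(256))       # every possible byte value, including 0x0A
echoed = pipe_bytes(payload, echo_cmd)
print(echoed == payload)
```

Because the payload is never converted to a string, bytes such as 0x0A or 0xFF reach the external command intact.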

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Nick Allen
Per your last comment, it appears I need something like this: https://github.com/RIPE-NCC/hadoop-pcap Thanks a ton. That gets me oriented in the right direction. On Fri, Jan 16, 2015 at 10:20 AM, Sean Owen wrote: > Well it looks like you're reading some kind of binary file as text. > That isn

Re: How to 'Pipe' Binary Data in Apache Spark

2015-01-16 Thread Sean Owen
Well it looks like you're reading some kind of binary file as text. That isn't going to work, in Spark or elsewhere, as binary data is not even necessarily the valid encoding of a string. There are no line breaks to delimit lines and thus elements of the RDD. Your input has some record structure (
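
The two failure modes described above (bytes that are not a valid string encoding, and no real line breaks to delimit records) can be sketched in plain Python. The 8-byte record layout below is made up for illustration and has nothing to do with any actual pcap or wav format.

```python
import struct

# Hypothetical record layout: two big-endian unsigned 32-bit integers (8 bytes).
RECORD = struct.Struct(">II")

records = [RECORD.pack(10, 0xFFFE0A0D), RECORD.pack(2570, 7)]
blob = b"".join(records)

# 1. Decoding as text is lossy: bytes that are not valid UTF-8 get replaced,
#    so encoding the text again does not give back the original bytes.
as_text = blob.decode("utf-8", errors="replace")
lossy = as_text.encode("utf-8") != blob

# 2. Splitting on newline bytes cuts wherever 0x0A happens to occur,
#    which has nothing to do with the real record boundaries.
lines = blob.split(b"\n")

# Parsing by the known record size recovers the original values.
parsed = [
    RECORD.unpack(blob[i:i + RECORD.size])
    for i in range(0, len(blob), RECORD.size)
]

print(lossy, len(lines), parsed)
```

Here two records turn into a different number of "lines", while fixed-size parsing returns the original pairs, which is why a reader that understands the record structure is needed.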