You should be able to load them as a custom Hadoop file:

sc.newAPIHadoopFile(...)

Use a FileInputFormat subclass with LongWritable as the key class and BytesWritable as the value class. This will read the files from an input directory, which can be on the local file system for testing. If you get stuck, take a look at the code for sc.textFile to see how it sets up the input format and Writable classes.

On Tue, Feb 4, 2014 at 10:55 PM, David Thomas <[email protected]> wrote:
> I have a set of binary files and I would like to create an RDD out of them
> and pipe them through an external process. So how do I create an RDD of
> such objects? For quick prototyping, can I do it without using HDFS?
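
To make the suggestion above a bit more concrete, here is a rough sketch rather than a tested recipe. It assumes the binary files consist of fixed-size records and that the Hadoop version on the classpath ships FixedLengthInputFormat (a FileInputFormat subclass keyed by LongWritable with BytesWritable values). The input path, the 1024-byte record length, and the object and app names are placeholders I made up for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{BytesWritable, LongWritable}
import org.apache.hadoop.mapreduce.lib.input.FixedLengthInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object BinaryFilesToRdd {
  def main(args: Array[String]): Unit = {
    // Local master, so no cluster or HDFS is needed for prototyping.
    val sparkConf = new SparkConf().setAppName("binary-files").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)

    // Copy the context's Hadoop configuration and set the record size.
    // ASSUMPTION: every input file is a whole number of 1024-byte records.
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    FixedLengthInputFormat.setRecordLength(hadoopConf, 1024)

    // ASSUMPTION: hypothetical local directory holding the binary files.
    val inputDir = "file:///tmp/binary-input"

    val records = sc.newAPIHadoopFile(
      inputDir,
      classOf[FixedLengthInputFormat],
      classOf[LongWritable],   // key type
      classOf[BytesWritable],  // value: the raw bytes of one record
      hadoopConf
    ).map { case (key, value) =>
      // Hadoop reuses Writable instances, so copy the data out
      // before caching or collecting anything.
      (key.get, value.copyBytes())
    }

    println(s"Read ${records.count()} records")
    sc.stop()
  }
}
```

If each file should instead become a single record, you would need a different FileInputFormat subclass that does not split files and reads the whole file into one BytesWritable. Also keep in mind that rdd.pipe hands each element to the external process as a line of text, so raw bytes would need some text encoding first. Spark releases from 1.2 onward also provide sc.binaryFiles and sc.binaryRecords, which may cover this case without a custom input format.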
