Hey guys,

What is the best way to get an RDD[(K,V)] from a MapFile created by MapFile.Writer? The MapFile has a Text key and MyArrayWritable as the value.
I'm looking for something akin to sc.textFile($path). So far I have tried two approaches, sc.hadoopFile and sc.sequenceFile.

#1:

    val rdd = sc.hadoopFile[Text, MyArrayWritable, SequenceFileInputFormat[Text, MyArrayWritable]]($path)
    val count = rdd.count()

This gives me a runtime error:

    14/06/04 12:05:22 WARN TaskSetManager: Loss was due to java.io.EOFException
    java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
        at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:190)

#2:

    val rdd: RDD[(Text, MyArrayWritable)] = sc.sequenceFile[Text, MyArrayWritable]($path)
    val count = rdd.count()

Same exception as above.

I don't see any MapFile-specific InputFormat, and my understanding was that SequenceFileInputFormat should be correct:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/InputFormat.html

What am I missing?

Many thanks again,
Amit

In case it is pertinent, MyArrayWritable is:

    public class MyArrayWritable extends ArrayWritable {
        public MyArrayWritable() {
            super(CustomWritable.class);
        }

        public MyArrayWritable(CustomWritable[] values) {
            super(CustomWritable.class, values);
        }
    }

and CustomWritable implements WritableComparable<CustomWritable>.
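In case the write side matters, here is a minimal sketch of how such a MapFile gets created with MapFile.Writer. The "/tmp/mymapfile" path and the keys are hypothetical placeholders, not my actual job, and MyArrayWritable/CustomWritable are the classes shown above:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem
    import org.apache.hadoop.io.{MapFile, Text}

    // Write-side sketch: path and keys are placeholders. MapFile.Writer
    // requires keys to be appended in sorted order, and the result on disk
    // is a directory containing "data" and "index" SequenceFiles.
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    val writer = new MapFile.Writer(conf, fs, "/tmp/mymapfile",
      classOf[Text], classOf[MyArrayWritable])
    try {
      writer.append(new Text("a"), new MyArrayWritable(Array.empty[CustomWritable]))
      writer.append(new Text("b"), new MyArrayWritable(Array.empty[CustomWritable]))
    } finally {
      writer.close()
    }

Since the on-disk MapFile is really a directory with "data" and "index" inside it, I am also unsure whether $path should point at the directory itself or at the data file within it.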