I am having problems reading Hadoop sequence files produced from a combination of 
MergeContent -> CreateHadoopSequenceFile -> PutHDFS.  I have tried both a 
small dataset and a larger dataset, and the result is the same.  I can read a few 
of the records, but it seems like the sequence files haven't been correctly 
closed or completely written to HDFS, as I get errors when processing the entire file.


Example:

 val testSF = sc.sequenceFile(pathToFile,
     classOf[org.apache.hadoop.io.Text],
     classOf[org.apache.hadoop.io.BytesWritable])
   .map(x => (x._1.toString, new String(x._2.copyBytes)))
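In case it helps narrow this down, one thing you could try is inspecting the header metadata of one of the files directly with Hadoop's SequenceFile.Reader, to confirm the declared key/value classes and compression match what Spark expects (a rough sketch; pathToFile is the same placeholder as above, and the Configuration is assumed to pick up your cluster config from the classpath):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.SequenceFile

val conf = new Configuration()  // assumes core-site.xml/hdfs-site.xml are on the classpath
// pathToFile is assumed to point at one of the files written by PutHDFS
val reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(new Path(pathToFile)))
try {
  // These come straight from the file header; they should report Text / BytesWritable
  println(s"key class:   ${reader.getKeyClassName}")
  println(s"value class: ${reader.getValueClassName}")
  println(s"compressed:  ${reader.isCompressed}")
} finally {
  reader.close()
}
```

If the header itself is truncated or garbled, the Reader constructor should fail immediately, which would point at the write side rather than the Spark read.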

Reading a few records works and has the content I expect:
testSF.take(10)

However, when trying to process the entire file such as doing a count I get:

 java.io.IOException: apache.hadoop.io.Text"org.apache.hadoop.io.Byt read 47 
bytes, should read 426734183
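To see how far into the file things go wrong, one option might be to walk the file record by record with SequenceFile.Reader and print the byte position where the read fails (again a sketch; pathToFile is a placeholder, and this assumes Text keys and BytesWritable values as above):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, SequenceFile, Text}

val conf = new Configuration()
val reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(new Path(pathToFile)))
val key = new Text()
val value = new BytesWritable()
var count = 0L
try {
  while (reader.next(key, value)) {
    count += 1
  }
  println(s"read all $count records cleanly")
} catch {
  case e: java.io.IOException =>
    // reader.getPosition reports the byte offset at which the failure occurred
    println(s"failed after $count records at byte ${reader.getPosition}: ${e.getMessage}")
} finally {
  reader.close()
}
```

If the failure position is close to a file or block boundary, that would support the theory that the file wasn't completely flushed/closed when it was written to HDFS.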
