I am having problems reading Hadoop sequence files produced by a NiFi flow of MergeContent -> CreateHadoopSequenceFile -> PutHDFS. I have tried both a small and a larger dataset, and the result is the same: I can read a few of the records, but it seems the sequence files haven't been correctly closed or completely written to HDFS, because I get errors when processing the entire file.
Example:

  val testSF = sc.sequenceFile(pathToFile,
      classOf[org.apache.hadoop.io.Text],
      classOf[org.apache.hadoop.io.BytesWritable])
    .map(x => (x._1.toString, new String(x._2.copyBytes)))

Reading a few records works and has the content I expect:

  testSF.take(10)

However, when trying to process the entire file, such as doing a count, I get:

  java.io.IOException: apache.hadoop.io.Text"org.apache.hadoop.io.Byt read 47 bytes, should read 426734183
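One way to narrow down whether the file itself is malformed (as opposed to a Spark-side issue) is to inspect the sequence file header directly, outside of Spark and Hadoop. The sketch below is a minimal, hedged diagnostic: it reads the "SEQ" magic bytes, the version byte, and the key/value class names from the start of a local copy of the file. It assumes the class names are short enough that their length fits in a single vint byte (true for the normal Hadoop Writable classes); it is not a full SequenceFile parser.

```scala
import java.io.{DataInputStream, FileInputStream}
import java.nio.charset.StandardCharsets

// Reads just the SequenceFile header: the 3-byte magic "SEQ", a version
// byte, then the key and value class names (each stored as a vint length
// followed by UTF-8 bytes). Sketch only: assumes class names are short
// enough (< 128 bytes) that the vint length occupies a single byte.
def readSeqFileHeader(path: String): (Int, String, String) = {
  val in = new DataInputStream(new FileInputStream(path))
  try {
    val magic = new Array[Byte](3)
    in.readFully(magic)
    require(new String(magic, StandardCharsets.UTF_8) == "SEQ",
      s"$path does not start with the SequenceFile magic bytes")
    val version = in.readByte().toInt
    def readClassName(): String = {
      val len = in.readByte().toInt // single-byte vint; fine for normal class names
      val buf = new Array[Byte](len)
      in.readFully(buf)
      new String(buf, StandardCharsets.UTF_8)
    }
    val keyClass = readClassName()
    val valueClass = readClassName()
    (version, keyClass, valueClass)
  } finally in.close()
}
```

If the header reads cleanly and reports org.apache.hadoop.io.Text / org.apache.hadoop.io.BytesWritable, the corruption is likely mid-file rather than in the header, which would point at an incomplete or interrupted write.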
