Hi, I've just noticed that reading sequence files doesn't appear to be implemented yet in the PySpark API. Is that right?
Would it be difficult for me to add this feature without being familiar with the code base? Alternatively, is there a workaround? My data is in a single, very large sequence file containing more than 250,000 elements, and my code is already in Python. I'm writing the sequence file with Pydoop, so perhaps there is a way to build an RDD by reading the file in via Pydoop?

Thanks,
Peter
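A minimal sketch of the Pydoop route described above, under stated assumptions: the (key, value) pairs are read on the driver by some reader function, and the resulting list is handed to `SparkContext.parallelize`. The reader (`read_pairs`) and the helper name `sequence_file_to_rdd` are hypothetical; `read_pairs` is not a real Pydoop call and stands in for whatever Pydoop-based reader is available. Only `parallelize(c, numSlices)` is taken from the PySpark API.

```python
from typing import Any, Callable, Iterable, Optional, Tuple

# Hypothetical reader type: any callable that takes a path and yields
# (key, value) pairs from a sequence file, e.g. one built on top of Pydoop.
# It is an assumption, not a real Pydoop function.
PairReader = Callable[[str], Iterable[Tuple[Any, Any]]]


def sequence_file_to_rdd(sc, read_pairs: PairReader, path: str,
                         num_slices: Optional[int] = None):
    """Read every (key, value) pair on the driver, then distribute them.

    `sc` is expected to be a pyspark.SparkContext; only its `parallelize`
    method is used. All records pass through driver memory, so this only
    suits files that fit there.
    """
    # Materialize all records in the driver process.
    pairs = list(read_pairs(path))
    # Hand the local list to Spark, split into num_slices partitions
    # (Spark chooses a default partition count when num_slices is None).
    return sc.parallelize(pairs, num_slices)
```

With a real context this would be called as `rdd = sequence_file_to_rdd(sc, my_reader, path, num_slices=100)`, where `my_reader` and `path` are placeholders. The design trades scalability for simplicity: every record is funnelled through the driver before Spark distributes it, which may be workable for a few hundred thousand elements but does not read the file in parallel the way a native sequence-file input would.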
