Hi

I've just noticed that reading sequence files does not appear to be 
implemented yet in the PySpark API. Is that right?

Would it be a difficult task for me to add this feature without being familiar 
with the code base?

Alternatively, is there any workaround for this? My data is in a single very 
large sequence file containing more than 250,000 elements. My code is already 
in Python. I'm writing the sequence file using Pydoop, so perhaps there is a 
way to build an RDD by reading it in via Pydoop?
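
To make that last idea concrete, here is a rough sketch of what I have in 
mind. It assumes I can get a plain Python iterable of (key, value) records 
out of the sequence file on the driver; how to do that with Pydoop is exactly 
the part I'm unsure about, so the reader is left out. The names batched and 
build_rdd are my own, and sc is an existing SparkContext. Only 
sc.parallelize and sc.union are actual PySpark calls.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield lists of at most batch_size records from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def build_rdd(sc, records, batch_size=10000):
    """Parallelize records batch by batch and union the pieces.

    sc is an existing SparkContext. records is any iterable of
    (key, value) pairs, e.g. from a hypothetical Pydoop-based reader.
    """
    rdds = [sc.parallelize(batch) for batch in batched(records, batch_size)]
    if not rdds:
        # No records at all: return an empty RDD rather than an empty union.
        return sc.parallelize([])
    return sc.union(rdds)
```

This still pushes every record through the driver, so it would only be a 
stopgap until sequence files can be read directly.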

Thanks,
Peter
