PySpark sequence file support

Peter Aberline Fri, 18 Oct 2013 02:11:45 -0700

Hi

I've just noticed that the PySpark API does not appear to support reading 
sequence files yet.


Would it be a difficult task for me to add this feature without being familiar 
with the code base?

Alternatively, is there a workaround for this? My data is in a single very 
large sequence file containing more than 250,000 elements. My code is already 
in Python. I'm writing the sequence file using Pydoop, so perhaps there is a 
way to build an RDD by reading it in via Pydoop?

Thanks,
Peter
