Note that Spark uses the HDFS API to access files.
The HDFS API has KeyValueTextInputFormat, which addresses Aureliano’s requirement.

I am just not sure whether KeyValueTextInputFormat has been pulled into the latest
version of Spark yet.
Without it, it may be messy to make sure that partition boundaries fall on
newline characters.
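To make the boundary problem concrete, here is a small pure-Python simulation (not Spark or Hadoop code) of the rule that Hadoop's line-based readers follow, as I understand it. A reader whose split does not start at byte 0 skips its first, possibly partial, line. Every reader keeps going until it has read the line that begins at or before its split end, even if that line runs past the end. The function name split_records and the split sizes are made up for illustration, and the key it yields is a byte offset, not a line number.

```python
from typing import Iterator, Tuple


def split_records(data: bytes, start: int, end: int) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, line) for the lines owned by the split [start, end).

    Hypothetical helper that simulates the line-reader boundary rule;
    it is not a Hadoop or Spark API.
    """
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line: the previous split's
        # reader reads past its own end to finish it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    # Read every line that *begins* at or before `end`, even if it ends later.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        yield pos, data[pos:stop].decode()
        pos = stop + 1


data = b"alpha\nbeta\ngamma\ndelta\n"
splits = [(0, 7), (7, 14), (14, 21), (21, 23)]
for s, e in splits:
    print((s, e), list(split_records(data, s, e)))
```

Whatever the split size, each line comes out exactly once, in file order, which is the property a naive byte-range split would not give you.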

I think this usage pattern is important; if it is not yet available, I can try
to pull it in.
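In the meantime, one way to get (line number, line) pairs is two passes: count the lines in each partition, turn the counts into starting indices with a prefix sum, then number each partition locally. The sketch below is an assumption-laden illustration, not Spark code: partitions are modeled as plain Python lists assumed to be in file order, and zip_with_index is a hypothetical name. In Spark, the counting pass would be a separate job and the numbering pass something like mapPartitionsWithIndex; later Spark releases added RDD.zipWithIndex, which works along these lines.

```python
from itertools import accumulate
from typing import List, Tuple


def zip_with_index(partitions: List[List[str]]) -> List[List[Tuple[int, str]]]:
    """Assign global 0-based line numbers to lines spread across partitions.

    Hypothetical helper; partitions are plain lists assumed to be in file order.
    """
    # Pass 1: count the lines in each partition.
    counts = [len(p) for p in partitions]
    # Exclusive prefix sum: the global index of each partition's first line.
    starts = [0] + list(accumulate(counts))[:-1]
    # Pass 2: number lines locally within each partition, shifted by its start.
    return [
        [(start + i, line) for i, line in enumerate(part)]
        for start, part in zip(starts, partitions)
    ]


parts = [["a", "b"], [], ["c"], ["d", "e", "f"]]
print(zip_with_index(parts))
```

The cost is one extra pass over the data just to get the counts, which is part of why having this built in would be nicer.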

--------------------------------------------
Michael (Bach) Bui, PhD,
Senior Staff Architect, ADATAO Inc.
www.adatao.com




On Dec 30, 2013, at 6:28 AM, Aureliano Buendia wrote:

> Hi,
> 
> When reading a simple text file in spark, what's the best way of mapping each 
> line to (line number, line)? RDD doesn't seem to have an equivalent of 
> zipWithIndex.
