Note that Spark uses the Hadoop API to access the file. The Hadoop API has KeyValueTextInputFormat, which addresses Aureliano's requirement.
I am just not sure whether KeyValueTextInputFormat has been pulled into the latest version of Spark yet. Without it, it may be messy to make sure that each partition boundary falls on a newline character. I think this usage pattern is important; if it is not yet available, I can try to pull it in.

--------------------------------------------
Michael (Bach) Bui, PhD,
Senior Staff Architect, ADATAO Inc.
www.adatao.com

On Dec 30, 2013, at 6:28 AM, Aureliano Buendia <[email protected]> wrote:

> Hi,
>
> When reading a simple text file in spark, what's the best way of mapping each
> line to (line number, line)? RDD doesn't seem to have an equivalent of
> zipWithIndex.
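
For what it's worth, the (line number, line) mapping asked about above can be sketched independently of any input format. The snippet below is only an illustration in plain Python, not Spark code: partitions are modeled as ordinary lists that already end on line boundaries, and `number_lines` is a made-up helper name. The idea is two passes: count the lines in each partition, turn the counts into starting offsets, then add each partition's offset to its local line index.

```python
from itertools import accumulate


def number_lines(partitions):
    """Attach a global, zero-based line number to every line.

    `partitions` is an ordered list of lists of lines. This stands in for
    the partitions of a distributed dataset and is an assumption of this
    sketch, not a real Spark structure.
    """
    # Pass 1: how many lines does each partition hold?
    counts = [len(part) for part in partitions]

    # Exclusive prefix sum: the global number of each partition's first line.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: shift each local index by the partition's starting offset.
    return [
        [(offset + i, line) for i, line in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]


if __name__ == "__main__":
    parts = [["alpha", "beta"], [], ["gamma"]]
    for part in number_lines(parts):
        print(part)
```

In Spark itself, the counting pass would presumably be a small per-partition job whose results are collected on the driver, and the second pass a per-partition map that knows its partition index (for example via mapPartitionsWithIndex). That mapping is my assumption and is not something I have tested against a specific release.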
