On Mon, Dec 30, 2013 at 4:24 PM, Michael (Bach) Bui <[email protected]> wrote:

> Note that Spark uses the HDFS API to access the file.
> The HDFS API has KeyValueTextInputFormat, which addresses Aureliano’s
> requirement.
>

It shouldn't be specific to text files; the same should happen with binary files.

>
> I am just not sure if KeyValueTextInputFormat has been pulled into the
> latest version of Spark yet.
> Without that, it may be messy to make sure that the partition boundary is
> a newline character.
>
> I think this usage pattern is important. If it is not yet available, I can
> try to pull it in.
>

I agree. It'd be super useful to have this feature.

>
> --------------------------------------------
> Michael (Bach) Bui, PhD,
> Senior Staff Architect, ADATAO Inc.
> www.adatao.com
>
>
> On Dec 30, 2013, at 6:28 AM, Aureliano Buendia <[email protected]>
> wrote:
>
> Hi,
>
> When reading a simple text file in Spark, what's the best way of mapping
> each line to (line number, line)? RDD doesn't seem to have an equivalent of
> zipWithIndex.
>
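
For the (line number, line) mapping asked about above, one possible approach that does not depend on a particular input format is a two-pass scheme: first count the lines in each partition, then turn those counts into a starting offset per partition and number the lines locally. The sketch below only illustrates the idea in plain Python. The nested lists stand in for RDD partitions, and `zip_with_index` is a hypothetical helper name, not a Spark API. It also assumes that partitions come back in file order and that both passes see the same partitioning. In Spark itself, both passes could probably be expressed with `mapPartitionsWithIndex`.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Pair every line with its global line number.

    `partitions` is a list of lists of lines, a stand-in for the
    partitions of an RDD (hypothetical, not a Spark API).
    """
    # Pass 1: count the lines in each partition. On a cluster only
    # these small counts would need to travel back to the driver.
    counts = [len(part) for part in partitions]

    # The first line number of a partition is the total number of
    # lines in all partitions before it.
    offsets = [0] + list(accumulate(counts))[:-1]

    # Pass 2: number lines locally, shifted by the partition offset.
    return [
        [(offset + i, line) for i, line in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


# Four partitions of uneven size, including an empty one.
parts = [["a", "b"], [], ["c"], ["d", "e", "f"]]
indexed = zip_with_index(parts)
```

The extra counting pass means the file is read twice, but it avoids having to reason about where partition boundaries fall relative to newline characters, and the same scheme would work for records from binary files.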
