Re: How to map each line to (line number, line)?

Aureliano Buendia Mon, 30 Dec 2013 08:29:09 -0800

On Mon, Dec 30, 2013 at 4:24 PM, Michael (Bach) Bui <[email protected]> wrote:


> Note that Spark uses the HDFS API to access the file.
> The HDFS API has KeyValueTextInputFormat, which addresses Aureliano’s
> requirement.
>

It shouldn't be specific to text files; the same should work for binary
files.
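
To illustrate why the numbering need not depend on the file format, here
is a minimal sketch in plain Python, not Spark code. Lists stand in for
partitions, and the helper names (partition_offsets, zip_with_line_number)
are hypothetical. The idea is a two-pass scheme: count the records in each
partition, turn the counts into starting offsets, then number the records
locally. In Spark both passes could presumably be written with
mapPartitionsWithIndex, but that wiring is an assumption and is not tested
here.

```python
from itertools import accumulate


def partition_offsets(partitions):
    """First pass: count records per partition and derive start offsets."""
    counts = [len(part) for part in partitions]
    # Exclusive prefix sum: partition i starts at the total of all
    # earlier partition sizes.
    return [0] + list(accumulate(counts))[:-1]


def zip_with_line_number(partitions):
    """Second pass: pair each record with its global, zero-based index."""
    offsets = partition_offsets(partitions)
    return [
        [(offsets[i] + j, record) for j, record in enumerate(part)]
        for i, part in enumerate(partitions)
    ]


# Hypothetical sample data: four partitions, one of them empty.
partitions = [["a", "b"], [], ["c"], ["d", "e"]]
numbered = zip_with_line_number(partitions)
print(numbered)
```

The records can be text lines or binary records; only the per-partition
counts matter. The price is one extra pass over the data to get the counts.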


>
> I am just not sure if KeyValueTextInputFormat has been pulled into the
> latest version of Spark yet.
> Without that, it may be messy to make sure that the partition boundary
> falls on a newline character.
>
> I think this usage pattern is important; if it is not yet available, I can
> try to pull it in.
>

I agree. It'd be super useful to have this feature.
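
On the partition boundary concern, here is a rough sketch of one common
convention for split-based line readers: a split that does not start at
byte 0 skips ahead to the first line beginning at or after its start, and
every split finishes the last line it began even if that runs past its
end. As far as I understand, Hadoop's line-oriented readers follow roughly
this convention, but treat that as an assumption. This is plain Python
over an in-memory byte string, and read_split is a hypothetical helper.

```python
def read_split(data: bytes, start: int, end: int):
    """Return the lines that begin inside the byte range [start, end)."""
    pos = start
    if start != 0:
        # Find the newline at or after start - 1. Searching from
        # start - 1 keeps a line that begins exactly at start.
        newline = data.find(b"\n", start - 1)
        if newline == -1:
            return []  # no line begins inside this split
        pos = newline + 1

    lines = []
    # A line belongs to this split if its first byte lies before end,
    # so the final line may be read past the split boundary.
    while pos < end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            lines.append(data[pos:])  # last line has no trailing newline
            break
        lines.append(data[pos:newline])
        pos = newline + 1
    return lines


# Hypothetical sample: a boundary at byte 8 falls in the middle of "beta".
data = b"alpha\nbeta\ngamma\n"
first = read_split(data, 0, 8)
second = read_split(data, 8, len(data))
print(first, second)
```

With this rule every line is read by exactly one split, wherever the
boundaries fall.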


>
> --------------------------------------------
> Michael (Bach) Bui, PhD,
> Senior Staff Architect, ADATAO Inc.
> www.adatao.com
>
>
>
>
> On Dec 30, 2013, at 6:28 AM, Aureliano Buendia <[email protected]>
> wrote:
>
> Hi,
>
> When reading a simple text file in Spark, what's the best way of mapping
> each line to (line number, line)? RDD doesn't seem to have an equivalent of
> zipWithIndex.
>
>
>
