You can use mapPartitionsWithIndex and look at the partition index (0 will be
the first partition) to decide whether to skip the first line.
Matei
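A minimal sketch of the `mapPartitionsWithIndex` approach, assuming Spark's Scala API. `SkipHeader` and `dropHeader` are hypothetical names, and `sc.parallelize` with two slices stands in for `sc.textFile` so the example runs without an input file:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SkipHeader {
  // Hypothetical helper: drop the first `n` records of partition 0 only.
  // Every other partition passes through untouched.
  // Assumes all header lines sit at the start of partition 0.
  def dropHeader(lines: RDD[String], n: Int = 1): RDD[String] =
    lines.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 0) iter.drop(n) else iter
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("skip-header").setMaster("local[2]"))
    try {
      // Stand-in for sc.textFile(path): two slices, header lands in slice 0.
      val lines = sc.parallelize(Seq("id,name", "1,a", "2,b", "3,c"), 2)
      dropHeader(lines).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With real data you would call `dropHeader(sc.textFile(path))`. This relies on the header being the first record of partition 0. That holds for `textFile` on a single file, because the first input split starts at byte 0, but for a directory or glob of several files that each carry a header, only the first file's header is removed.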
On Apr 14, 2014, at 8:50 AM, Ethan Jewett wrote:
> We have similar needs but IIRC, I came to the conclusion that this would
> only work on ordered RDDs, and then you would still have to figure out
> which partition is the first one. I ended up deciding it would be best to
> just drop the header lines from a Scala iterator before creating an RDD
> based on
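The alternative described in the reply above, stripping the header from a plain Scala iterator before the RDD exists, could be sketched as follows. The message is cut off before saying what the RDD was built from, so reading a local file with `scala.io.Source` on the driver is an assumption, as are the names `DropBeforeRdd` and `bodyLines`:

```scala
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}

object DropBeforeRdd {
  // Hypothetical helper: read a local file on the driver and drop `n`
  // header lines from the plain Scala iterator before any RDD exists.
  def bodyLines(path: String, n: Int = 1): Vector[String] = {
    val src = Source.fromFile(path, "UTF-8")
    try src.getLines().drop(n).toVector
    finally src.close()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("drop-before-rdd").setMaster("local[2]"))
    try {
      // args(0) is an assumed local path to a file with one header line.
      val rdd = sc.parallelize(bodyLines(args(0)))
      println(rdd.count())
    } finally {
      sc.stop()
    }
  }
}
```

The trade-off is that every line passes through the driver and is then shipped out by `parallelize`, so this only suits files small enough to hold in driver memory.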
> > Has there been any thought to adding a tail() method to RDD? It would
> > be really handy to skip over the first item in an RDD when it contains
> > header information. Even better would be a drop(int) function that
> > would allow you to skip over several lines of header information. Our
> > attempts to do