You can use mapPartitionsWithIndex and look at the partition index (0 will be the first partition) to decide whether to skip the first line.
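
A minimal sketch of that idea, assuming the header really does sit at the start of partition 0 (true for a single text file or another ordered source; with several input files, each file's header would land in a different partition and this would not catch them). The object name, the sample data, and the local[2] master below are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DropHeaderSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for this illustration.
    val conf = new SparkConf().setAppName("DropHeaderSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for something like sc.textFile(...): one header line, then data.
      val lines = sc.parallelize(Seq("name,value", "a,1", "b,2", "c,3"), 2)

      // Number of header lines to skip; assumed to all fall in partition 0.
      val headerLines = 1

      val body = lines.mapPartitionsWithIndex { (idx, iter) =>
        // Only the first partition (index 0) can contain the header.
        if (idx == 0) iter.drop(headerLines) else iter
      }

      body.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Raising headerLines gives the drop(int) behaviour asked about, as long as every header line falls inside the first partition.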

Matei

On Apr 14, 2014, at 8:50 AM, Ethan Jewett <esjew...@gmail.com> wrote:

> We have similar needs but IIRC, I came to the conclusion that this would only
> work on ordered RDDs, and then you would still have to figure out which
> partition is the first one. I ended up deciding it would be best to just drop
> the header lines from a Scala iterator before creating an RDD based on it.
> Not sure if this was the "right" thing to do, but would that work for you?
>
> Regards,
> Ethan
>
>
> On Mon, Apr 14, 2014 at 10:24 AM, Philip Ogren <philip.og...@oracle.com>
> wrote:
> Has there been any thought to adding a tail() method to RDD? It would be
> really handy to skip over the first item in an RDD when it contains header
> information. Even better would be a drop(int) function that would allow you
> to skip over several lines of header information. Our attempts to do
> something equivalent with a filter() call seem a bit contorted. Any thoughts?
>
> Thanks,
> Philip