I was assuming a header line in each file - that's what I've usually seen. But now he has an answer for either case.
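
For reference, a minimal sketch of the two cases, written as plain Scala function values so they can be tried without a cluster. The names dropDatasetHeader and dropPartitionHeader are made up for illustration, and the second one assumes each file ends up as exactly one partition, which will not hold for files large enough to be split.

```scala
// Case 1: one header line at the very start of the dataset.
// Only partition 0 begins with it, so only partition 0 loses a line.
// The (Int, Iterator[T]) shape is what RDD.mapPartitionsWithIndex expects.
val dropDatasetHeader: (Int, Iterator[String]) => Iterator[String] =
  (index, iter) => if (index == 0) iter.drop(1) else iter

// Case 2: a header line at the start of every partition.
// Assumption: one file per partition; a file split across several
// partitions would lose one data line per extra split.
val dropPartitionHeader: Iterator[String] => Iterator[String] =
  iter => iter.drop(1)

// With an RDD[String] named `data`, as in the thread:
//   data.mapPartitionsWithIndex(dropDatasetHeader)
//   data.mapPartitions(dropPartitionHeader)
```

Either way only the head of each iterator is touched, so nothing scans every row the way a filter would.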

On Wed, Sep 25, 2013 at 2:15 AM, Reynold Xin <[email protected]> wrote:

> Note that this drops the first line of every partition. You probably want
> to use the index to drop only the first line of the first partition.
>
> i.e.
>
> data.mapPartitionsWithIndex { case (index, iter) =>
>   if (index == 0) iter.drop(1) else iter
> }
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
> On Tue, Sep 24, 2013 at 11:10 PM, Nathan Kronenfeld <[email protected]> wrote:
>
>> You shouldn't even need the index.
>>
>> Just:
>>
>> data.mapPartitions(_.drop(1))
>>
>> should work, I think.
>>
>> On Wed, Sep 25, 2013 at 1:52 AM, Michael Kun Yang <[email protected]> wrote:
>>
>>> Thank you! But can you explain in more detail? I only want to skip the
>>> first line, not the whole block.
>>>
>>> On Tue, Sep 24, 2013 at 8:54 PM, Jason Lenderman <[email protected]> wrote:
>>>
>>>> Perhaps you could use mapPartitionsWithIndex to do this.
>>>>
>>>> On Tue, Sep 24, 2013 at 4:52 PM, Michael Kun Yang <[email protected]> wrote:
>>>>
>>>>> Spark's filter can do this job, but it needs to scan every line (row).
>>>>> Is there a way to just skip the first line in the file?
>>>>>
>>>>> Any feedback?
>>>>>
>>>>> On Tue, Sep 24, 2013 at 4:14 PM, Michael Kun Yang <[email protected]> wrote:
>>>>>
>>>>>> Dataframes usually have headers in the first row. How can I avoid
>>>>>> reading the first row?
>>>>>> I know in Hadoop I can figure it out by the line number.
>>>>>>
>>>>>> Best
>>
>> --
>> Nathan Kronenfeld
>> Senior Visualization Developer
>> Oculus Info Inc
>> 2 Berkeley Street, Suite 600,
>> Toronto, Ontario M5A 4J5
>> Phone: +1-416-203-3003 x 238
>> Email: [email protected]

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: [email protected]
