Awesome, it works! Thank you so much for the help.

On Tue, Sep 24, 2013 at 11:19 PM, Nathan Kronenfeld <[email protected]> wrote:

> I was assuming a header line in each file - that's what I've usually seen.
>
> But now he has an answer for either case.
>
>
> On Wed, Sep 25, 2013 at 2:15 AM, Reynold Xin <[email protected]> wrote:
>
>> Note that this drops the first line of every partition. You probably want
>> to use the partition index to drop only the first line of the first
>> partition.
>>
>> i.e.
>>
>> data.mapPartitionsWithIndex { case (index, iter) =>
>>   if (index == 0) iter.drop(1) else iter
>> }
>>
>>
>> --
>> Reynold Xin, AMPLab, UC Berkeley
>> http://rxin.org
>>
>>
>> On Tue, Sep 24, 2013 at 11:10 PM, Nathan Kronenfeld <[email protected]> wrote:
>>
>>> You shouldn't even need the index.
>>>
>>> Just:
>>>
>>> data.mapPartitions(_.drop(1))
>>>
>>> should work, I think.
>>>
>>>
>>> On Wed, Sep 25, 2013 at 1:52 AM, Michael Kun Yang <[email protected]> wrote:
>>>
>>>> Thank you! But can you explain in more detail? I only want to skip the
>>>> first line, not the whole block.
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 8:54 PM, Jason Lenderman <[email protected]> wrote:
>>>>
>>>>> Perhaps you could use mapPartitionsWithIndex to do this.
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 4:52 PM, Michael Kun Yang <[email protected]> wrote:
>>>>>
>>>>>> Spark's filter can do this job, but it needs to scan every line (row).
>>>>>> Is there a way to just skip the first line in the file?
>>>>>>
>>>>>> Any feedback?
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2013 at 4:14 PM, Michael Kun Yang <[email protected]> wrote:
>>>>>>
>>>>>>> Data frames usually have a header in the first row. How can I avoid
>>>>>>> reading the first row?
>>>>>>> I know that in Hadoop I can figure it out from the line number.
>>>>>>>
>>>>>>> Best
>>>
>>>
>>> --
>>> Nathan Kronenfeld
>>> Senior Visualization Developer
>>> Oculus Info Inc
>>> 2 Berkeley Street, Suite 600,
>>> Toronto, Ontario M5A 4J5
>>> Phone: +1-416-203-3003 x 238
>>> Email: [email protected]
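
For anyone finding this thread later, here is a rough sketch of the difference between the two suggestions. It is plain Scala with no Spark dependency: the `partitions` value is a made-up stand-in for an RDD's partitions, and the object and helper names are my own, not part of any Spark API. The one thing worth noting is that the function passed to `mapPartitionsWithIndex` receives the partition index first and the iterator second.

```scala
// Plain-Scala sketch of the two header-dropping strategies from this thread.
// All names below are hypothetical; only Iterator.drop is a library call.
object DropHeaderSketch {

  // Header appears once, at the very start of the data:
  // drop one line from partition 0 and leave the other partitions alone.
  // Argument order matches mapPartitionsWithIndex: (index, iterator).
  def dropFirstLineOfFirstPartition(index: Int, iter: Iterator[String]): Iterator[String] =
    if (index == 0) iter.drop(1) else iter

  // Header appears at the start of every partition
  // (for example, one small file with its own header per partition).
  def dropFirstLineOfEachPartition(iter: Iterator[String]): Iterator[String] =
    iter.drop(1)

  // Made-up data: one file with a single header, split into two partitions.
  val partitions: Seq[List[String]] = Seq(
    List("name,age", "alice,30"),
    List("bob,25", "carol,41")
  )

  // Only the header is removed.
  val headerDroppedOnce: Seq[String] =
    partitions.zipWithIndex.flatMap { case (part, index) =>
      dropFirstLineOfFirstPartition(index, part.iterator).toList
    }

  // The header is removed, but so is the first data line of partition 1.
  val headerDroppedEverywhere: Seq[String] =
    partitions.flatMap(part => dropFirstLineOfEachPartition(part.iterator).toList)
}
```

On a real RDD the same helpers should be usable as `data.mapPartitionsWithIndex(dropFirstLineOfFirstPartition _)` and `data.mapPartitions(dropFirstLineOfEachPartition)`. The index-based version assumes the header line ends up in partition 0, which I believe holds when reading a single text file but is worth checking for your input.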
