Re: how to avoid reading the first line of dataframe?

Reynold Xin Tue, 24 Sep 2013 23:16:33 -0700

Note that this drops the first line of every partition, not just the first
line of the file. You probably want to use the partition index so that only
the first partition's first line is dropped.


i.e.

data.mapPartitionsWithIndex { case (index, iter) =>
  if (index == 0) iter.drop(1) else iter
}
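
To make the difference concrete, here is a minimal sketch that models two
partitions with plain Scala iterators, so it runs without Spark. The object
name, the sample rows, and the withIndex helper are all made up for
illustration; withIndex only mimics the (index, iterator) argument order of
mapPartitionsWithIndex and is not Spark's API. It also assumes the header
sits in partition 0.

```scala
object DropHeaderSketch {
  // Hypothetical data: two "partitions" modelled as plain iterators (no Spark involved).
  // Defined as a def so every use gets fresh, unconsumed iterators.
  def partitions: Vector[Iterator[String]] = Vector(
    Iterator("name,age", "alice,30"), // partition 0 starts with the header
    Iterator("bob,25", "carol,41")    // partition 1 holds only data rows
  )

  // Stand-in helper that passes (index, iterator) to f, in that order,
  // and concatenates the resulting partitions.
  def withIndex(parts: Vector[Iterator[String]])(
      f: (Int, Iterator[String]) => Iterator[String]): Vector[String] =
    parts.zipWithIndex.flatMap { case (iter, index) => f(index, iter) }

  // Like mapPartitions(_.drop(1)): every partition loses its first line.
  val dropFromAll: Vector[String] =
    withIndex(partitions)((_, iter) => iter.drop(1))

  // Index-aware version: only partition 0 loses its first line.
  val dropFromFirst: Vector[String] =
    withIndex(partitions) { (index, iter) =>
      if (index == 0) iter.drop(1) else iter
    }

  def main(args: Array[String]): Unit = {
    println(dropFromAll)
    println(dropFromFirst)
  }
}
```

In this toy data the first variant silently loses the "bob,25" row along
with the header, while the index-aware variant removes only the header.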


--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org



On Tue, Sep 24, 2013 at 11:10 PM, Nathan Kronenfeld <
[email protected]> wrote:

> You shouldn't even need the index.
>
> Just:
>
> data.mapPartitions(_.drop(1))
>
> should work, I think.
>
>
> On Wed, Sep 25, 2013 at 1:52 AM, Michael Kun Yang <[email protected]> wrote:
>
>> thank you! But can you explain in more detail? I only want to skip the
>> first line, not the whole block.
>>
>>
>> On Tue, Sep 24, 2013 at 8:54 PM, Jason Lenderman
>> <[email protected]> wrote:
>>
>>> Perhaps you could use mapPartitionsWithIndex to do this.
>>>
>>>
>>> On Tue, Sep 24, 2013 at 4:52 PM, Michael Kun Yang
>>> <[email protected]> wrote:
>>>
>>>> Spark's filter can do this job, but it needs to scan every line (row). Is
>>>> there a way to just skip the first line in the file?
>>>>
>>>> any feedback?
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 4:14 PM, Michael Kun Yang <[email protected]
>>>> > wrote:
>>>>
>>>>> Dataframes usually have headers in the first row, how can I avoid
>>>>> reading the first row?
>>>>> I know in hadoop, I can figure it out by the line number.
>>>>>
>>>>> Best
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  [email protected]
>
