Re: how to avoid reading the first line of dataframe?

Nathan Kronenfeld Tue, 24 Sep 2013 23:20:27 -0700

I was assuming a header line in each file - that's what I've usually seen.

But now he has an answer for either case.
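The two suggestions differ only in which partitions lose their first element. Here is a minimal pure-Scala sketch that assumes no Spark dependency. Each partition is modeled as a plain Seq of lines. The `partitions` data and the `simulate*` helpers are hypothetical stand-ins for an RDD and its `collect()`. The two per-partition functions have the same shape as what you would pass to `rdd.mapPartitions` and `rdd.mapPartitionsWithIndex`, where Spark supplies the partition index first and the iterator second.

```scala
object SkipHeaderSketch {
  // Shape of a function for rdd.mapPartitions:
  // drops the first element of EVERY partition.
  def dropFirstOfEach[T](iter: Iterator[T]): Iterator[T] = iter.drop(1)

  // Shape of a function for rdd.mapPartitionsWithIndex (index first, then iterator):
  // drops the first element of partition 0 only.
  def dropFirstOfFirst[T](index: Int, iter: Iterator[T]): Iterator[T] =
    if (index == 0) iter.drop(1) else iter

  // Hypothetical data: one CSV file read as two partitions; only partition 0 holds the header.
  val partitions: Seq[Seq[String]] = Seq(
    Seq("id,name", "1,alice", "2,bob"),
    Seq("3,carol", "4,dave")
  )

  // Local stand-in for rdd.mapPartitions(f).collect()
  def simulateMapPartitions(f: Iterator[String] => Iterator[String]): Seq[String] =
    partitions.flatMap(p => f(p.iterator).toList)

  // Local stand-in for rdd.mapPartitionsWithIndex(f).collect()
  def simulateWithIndex(f: (Int, Iterator[String]) => Iterator[String]): Seq[String] =
    partitions.zipWithIndex.flatMap { case (p, i) => f(i, p.iterator).toList }

  def main(args: Array[String]): Unit = {
    println(simulateMapPartitions(dropFirstOfEach[String]))
    println(simulateWithIndex(dropFirstOfFirst[String]))
  }
}
```

With a single file split across two partitions, the per-partition version also throws away the data row "3,carol". Dropping one line per partition is only safe when every partition is known to start with a header, e.g. one small file per partition. Otherwise the index check is what you want.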



On Wed, Sep 25, 2013 at 2:15 AM, Reynold Xin <[email protected]> wrote:

> Note that this drops the first line of every partition. You probably want to
> use the partition index so that only the first partition loses its first line.
>
> i.e.
>
> // Spark passes the partition index first, then the partition's iterator.
> data.mapPartitionsWithIndex { (index, iter) =>
>   if (index == 0) iter.drop(1) else iter
> }
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Tue, Sep 24, 2013 at 11:10 PM, Nathan Kronenfeld <
> [email protected]> wrote:
>
>> You shouldn't even need the index.
>>
>> Just:
>>
>> data.mapPartitions(_.drop(1))
>>
>> should work, I think.
>>
>>
>> On Wed, Sep 25, 2013 at 1:52 AM, Michael Kun Yang <[email protected]> wrote:
>>
>>> thank you! But can you explain in more detail? I only want to skip the
>>> first line, not the whole block.
>>>
>>>
>>> On Tue, Sep 24, 2013 at 8:54 PM, Jason Lenderman <[email protected]> wrote:
>>>
>>>> Perhaps you could use mapPartitionsWithIndex to do this.
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 4:52 PM, Michael Kun Yang <[email protected]> wrote:
>>>>
>>>>> Spark's filter can do this job, but it needs to scan every line (row).
>>>>> Is there a way to just skip the first line of the file?
>>>>>
>>>>> Any feedback?
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 4:14 PM, Michael Kun Yang <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Dataframes usually have a header in the first row. How can I avoid
>>>>>> reading the first row?
>>>>>> I know that in Hadoop I can figure it out from the line number.
>>>>>>
>>>>>> Best
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Nathan Kronenfeld
>> Senior Visualization Developer
>> Oculus Info Inc
>> 2 Berkeley Street, Suite 600,
>> Toronto, Ontario M5A 4J5
>> Phone:  +1-416-203-3003 x 238
>> Email:  [email protected]
>>
>
>


