awesome, it works!

Thank you so much for the help.


On Tue, Sep 24, 2013 at 11:19 PM, Nathan Kronenfeld <
[email protected]> wrote:

> I was assuming a header line in each file - that's what I've usually seen.
>
> But now he has an answer for either case.
>
>
> On Wed, Sep 25, 2013 at 2:15 AM, Reynold Xin <[email protected]> wrote:
>
>> Note that this drops the first line of every partition. You probably want
>> to use the index to drop the first line of only the first partition.
>>
>> i.e.
>>
>> data.mapPartitionsWithIndex { case (index, iter) =>
>>   if (index == 0) iter.drop(1) else iter
>> }
>>
>>
>> --
>> Reynold Xin, AMPLab, UC Berkeley
>> http://rxin.org
>>
>>
>>
>> On Tue, Sep 24, 2013 at 11:10 PM, Nathan Kronenfeld <
>> [email protected]> wrote:
>>
>>> You shouldn't even need the index.
>>>
>>> Just:
>>>
>>> data.mapPartitions(_.drop(1))
>>>
>>> should work, I think.
>>>
>>>
>>> On Wed, Sep 25, 2013 at 1:52 AM, Michael Kun Yang
>>> <[email protected]> wrote:
>>>
>>>> thank you! But can you explain in more detail? I only want to skip the
>>>> first line, not the whole block.
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 8:54 PM, Jason Lenderman <[email protected]
>>>> > wrote:
>>>>
>>>>> Perhaps you could use mapPartitionsWithIndex to do this.
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 4:52 PM, Michael Kun Yang <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Spark's filter can do this job, but it needs to scan every line (row).
>>>>>> Is there a way to just skip the first line in the file?
>>>>>>
>>>>>> any feedback?
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2013 at 4:14 PM, Michael Kun Yang <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Data frames usually have headers in the first row. How can I avoid
>>>>>>> reading the first row?
>>>>>>> I know that in Hadoop I can figure it out from the line number.
>>>>>>>
>>>>>>> Best
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Nathan Kronenfeld
>>> Senior Visualization Developer
>>> Oculus Info Inc
>>> 2 Berkeley Street, Suite 600,
>>> Toronto, Ontario M5A 4J5
>>> Phone:  +1-416-203-3003 x 238
>>> Email:  [email protected]
>>>
>>
>>
>
>
> --
> Nathan Kronenfeld
> Senior Visualization Developer
> Oculus Info Inc
> 2 Berkeley Street, Suite 600,
> Toronto, Ontario M5A 4J5
> Phone:  +1-416-203-3003 x 238
> Email:  [email protected]
>
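
To make the difference between the two suggestions in this thread concrete, here is a minimal sketch in plain Scala. It does not use Spark itself: the nested Seq is only a stand-in for an RDD's partitions, and the names SkipHeaderSketch, dropHeader, and partitions are made up for illustration. The function has the (index, iterator) argument order that mapPartitionsWithIndex expects.

```scala
object SkipHeaderSketch {
  // Same shape as the function passed to mapPartitionsWithIndex:
  // (partition index, iterator over that partition's lines).
  def dropHeader(index: Int, lines: Iterator[String]): Iterator[String] =
    if (index == 0) lines.drop(1) else lines

  // Hypothetical stand-in for an RDD split into two partitions;
  // only the first partition starts with a header line.
  val partitions: Seq[Seq[String]] = Seq(
    Seq("name,age", "ann,31"),
    Seq("bob,27", "cy,45")
  )

  // Index-aware variant: only partition 0 loses its first line,
  // so ann,31 / bob,27 / cy,45 all survive.
  val firstPartitionOnly: Seq[String] =
    partitions.zipWithIndex.flatMap { case (part, index) =>
      dropHeader(index, part.iterator)
    }

  // mapPartitions(_.drop(1)) analogue: every partition loses its
  // first line, so the data row bob,27 is dropped as well.
  val everyPartition: Seq[String] =
    partitions.flatMap(_.iterator.drop(1))

  def main(args: Array[String]): Unit = {
    println(firstPartitionOnly)
    println(everyPartition)
  }
}
```

Assuming data is an RDD[String] as in the thread, the same function could presumably be applied with data.mapPartitionsWithIndex((index, lines) => SkipHeaderSketch.dropHeader(index, lines)). Dropping the first line of every partition only matches the "header in each file" case when each file ends up in exactly one partition.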
