Sorry, I didn't realize that zipWithIndex() is not in v0.9.1. It is in
the master branch and will be included in v1.0. It first counts the
number of records in each partition and then assigns indices starting
from 0.
-Xiangrui
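
The count-then-index approach described above can be sketched without Spark. This is only an illustration on plain Scala collections, not Spark's actual implementation: nested Seqs stand in for RDD partitions, and the names ZipWithIndexSketch, zipWithIndexLocal, and skipFirst are hypothetical.

```scala
// Plain Scala collections stand in for RDD partitions; no Spark is needed.
object ZipWithIndexSketch {
  // Count the records in each partition, turn the counts into start
  // offsets, then number every record from its partition's offset.
  def zipWithIndexLocal[T](partitions: Seq[Seq[T]]): Seq[Seq[(T, Long)]] = {
    val counts = partitions.map(_.size.toLong)
    // offsets(i) is the number of records in all partitions before i.
    val offsets = counts.scanLeft(0L)(_ + _)
    partitions.zip(offsets).map { case (part, start) =>
      part.zipWithIndex.map { case (x, i) => (x, start + i) }
    }
  }

  // Skip the first n records overall, even when n is larger than the
  // number of records in partition 0.
  def skipFirst[T](partitions: Seq[Seq[T]], n: Long): Seq[Seq[T]] =
    zipWithIndexLocal(partitions).map(_.filter(_._2 >= n).map(_._1))
}
```

Because the filter works on a global index rather than on a partition number, skipping 3 records from partitions of sizes 2 and 3 empties the first partition and removes one record from the second.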

On Wed, Apr 23, 2014 at 9:56 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> Also, zipWithIndex() is not valid. Did you mean zipPartitions?
>
>
> On Wed, Apr 23, 2014 at 9:55 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>>
>> Xiangrui,
>>   So, is the full suggested code:
>> val trigger = rddData.zipWithIndex().filter(
>> _._2 >= 10L).map(_._1)
>>
>> and then what DB Tsai recommended:
>> trigger.mapPartitionsWithIndex((partitionIdx: Int, lines:
>> Iterator[String]) => {
>>   // Return the iterator produced by drop; do not reuse `lines` after it.
>>   if (partitionIdx == 0) lines.drop(n) else lines
>> })
>>
>> Is that the full operation?
>>
>> What happens if the number of records I have to drop exceeds the
>> number of records in partition 0?
>> How do I handle that case?
>>
>>
>>
>>
>> On Wed, Apr 23, 2014 at 9:51 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> If the first partition doesn't have enough records, then it may not
>>> drop enough lines. Try
>>>
>>> rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)
>>>
>>> It might trigger a job.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Wed, Apr 23, 2014 at 9:46 AM, DB Tsai <dbt...@stanford.edu> wrote:
>>> > Hi Chengi,
>>> >
>>> > If you just want to skip the first n lines in an RDD, you can do
>>> >
>>> > rddData.mapPartitionsWithIndex((partitionIdx: Int, lines:
>>> > Iterator[String])
>>> > => {
>>> >   // Return the iterator produced by drop; do not reuse `lines` after it.
>>> >   if (partitionIdx == 0) lines.drop(n) else lines
>>> > })
>>> >
>>> >
>>> > Sincerely,
>>> >
>>> > DB Tsai
>>> > -------------------------------------------------------
>>> > My Blog: https://www.dbtsai.com
>>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>>> >
>>> >
>>> > On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hi,
>>> >>   What is the easiest way to skip the first n lines in an RDD?
>>> >> I am not able to figure this one out.
>>> >> Thanks
>>> >
>>> >
>>
>>
>
