It worked for me:

a = []
for i in range(0, 10000):
    a.append(i)

# f yields one value per partition: the number of elements in that partition
def f(iterator):
    yield sum(1 for _ in iterator)

print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()


13/09/27 17:58:26 INFO spark.SparkContext: Job finished: collect at NativeMethodAccessorImpl.java:-2, took 0.172441 s
[625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625, 625]
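
Each of the 16 partitions gets exactly 10000 / 16 = 625 elements. If you just
want to check partition sizes, glom() gives the same answer (a quick sketch;
note that glom() materializes each partition as a list, so only use it on data
small enough to fit in memory):

# same check via glom(), which turns each partition into a list of its elements
print sc.parallelize(a, 16).glom().map(len).collect()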



--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org



On Thu, Sep 26, 2013 at 10:08 PM, Shangyu Luo <[email protected]> wrote:

> I can see the test for ParallelCollectionRDD.slice().
> But how do I explain the result of my test?
> The following is the simple code I used for the test:
>     a = []
>     for i in range(0, 10000):
>         a.append(i)
>     print sc.parallelize(a, 16).mapPartitions(lambda x: f(x)).collect()
> and the result is [0, 523776, 0, 1572352, 2620928, 0, 3669504, 4718080, 0,
> 5766656, 0, 6815232, 7863808, 0, 8912384, 7532280]
>
>
> 2013/9/26 Mike <[email protected]>
>
>> > It does in fact attempt to do that, and the tests check for that, but
>> > there's no guarantee in its API.  Of course "equally" here means +/-
>> > one element.
>>
>> Correction: *except* when the Seq is a NumericRange: then it shortchanges
>> the last partition.  E.g., 91 elements split 10 ways -> 10 elements in each
>> of the first 9 partitions, one in the last.
>>
>
>
>
> --
>
> Shangyu, Luo
> Department of Computer Science
> Rice University
>
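
A minimal sketch of the shorted-last-partition behavior Mike describes above,
assuming the slicer simply uses a fixed step of ceil(n / numSlices)
(illustrative Python only, not the actual Scala code behind
ParallelCollectionRDD.slice()):

import math

def slice_like_numeric_range(seq, num_slices):
    # fixed step of ceil(n / numSlices): every slice but possibly the last
    # gets `step` elements; the last slice gets whatever remains
    step = int(math.ceil(len(seq) / float(num_slices)))
    return [seq[i:i + step] for i in range(0, len(seq), step)]

print [len(s) for s in slice_like_numeric_range(range(91), 10)]
# prints [10, 10, 10, 10, 10, 10, 10, 10, 10, 1]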
